It's not only rendering the game twice, it's rendering a different version of the game: lighting, textures, and sound are all different. Then take into account AI, physics, etc.: two games being calculated simultaneously.
OK, but is there even all that much of each to justify this?
Lighting is part of rendering the scene; if you have two cameras, it's "different" by definition. So, no different than a normal split screen.
Same for textures. They're already in RAM, so you might need more RAM in total to hold both sections of the game, but that's it. Rendering two half screens with different textures in each shouldn't be any harder than rendering one full screen with twice the textures in view.
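To make that intuition concrete, here's a toy cost model (entirely my own assumption, not any engine's actual profiler): if per-frame GPU work scales roughly with pixels shaded plus per-object overhead, two half-screen viewports cost about the same as one full screen showing twice the objects.

```python
# Toy cost model: shading work (per pixel) + draw overhead (per object).
# The constants are made up; only the proportionality matters.
SCREEN_W, SCREEN_H = 1920, 1080

def render_cost(viewport_pixels, visible_objects, per_pixel=1.0, per_object=500.0):
    """Very rough: pixel shading work plus per-object draw overhead."""
    return viewport_pixels * per_pixel + visible_objects * per_object

# One full screen with 2000 visible objects...
full = render_cost(SCREEN_W * SCREEN_H, visible_objects=2000)

# ...versus two half-width viewports with 1000 objects each.
split = 2 * render_cost((SCREEN_W // 2) * SCREEN_H, visible_objects=1000)

print(full == split)  # same total pixels, same total objects: True
```

Real renderers have fixed per-pass costs too (shadow maps, culling), so split screen isn't perfectly free, but it's nowhere near "running the game twice".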
Sound probably has nothing to do with the GPU at all, and if it somehow does, and it's tanking an RTX 3080, then something went VERY wrong in the optimization phase.
As for AI and physics, is there even that much? Just like with textures, having a bunch of active entities per screen shouldn't be any harder than having twice as many on a single screen. And either way, that work is hardly impactful on the GPU.
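Same arithmetic as before, sketched out (again my own toy model, nothing from the actual game): an update loop is linear in entity count, so it doesn't care whether N entities live in each of two areas or 2N live in one.

```python
# Toy check: simulation cost is linear in entity count, regardless of
# how the entities are grouped into areas of the world.
def tick_entities(entities):
    ops = 0
    for e in entities:
        e["x"] += e["vx"]  # one physics step per entity
        ops += 1
    return ops

world_a = [{"x": 0.0, "vx": 1.0} for _ in range(100)]
world_b = [{"x": 0.0, "vx": 1.0} for _ in range(100)]
single  = [{"x": 0.0, "vx": 1.0} for _ in range(200)]

two_worlds = tick_entities(world_a) + tick_entities(world_b)
one_world = tick_entities(single)
print(two_worlds == one_world)  # True: same total work either way
```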
Look at it this way: imagine that, in infinite 3D space, you have two enclosed 3D environments, and you place a camera and a character in each. The fact that the environments are designed to look similar in layout, and that your controller inputs are bound to both characters and cameras at the same time, doesn't mean you are running "two games"; it's just two cameras placed at two different points of the world, each rendering half the screen. So, as far as the renderer is concerned, it's just like playing a split-screen game on a large map where the two players always stay very far from one another.
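The setup described above can be sketched in a few lines (every name here is mine, invented for illustration, not anything from the game or a real engine): one shared input drives both characters, and each camera just renders its own half of the screen from a different region of the same world.

```python
from dataclasses import dataclass

@dataclass
class Character:
    x: float
    y: float

@dataclass
class Camera:
    target: Character
    viewport: tuple  # (x, y, w, h) in screen pixels

def apply_input(dx, dy, characters):
    # One controller input is bound to every character at once.
    for c in characters:
        c.x += dx
        c.y += dy

# Two enclosed environments, far apart in the same world space.
hero_a = Character(x=0.0, y=0.0)
hero_b = Character(x=10_000.0, y=0.0)  # similar layout, different location

cams = [
    Camera(hero_a, viewport=(0, 0, 960, 1080)),    # left half of the screen
    Camera(hero_b, viewport=(960, 0, 960, 1080)),  # right half of the screen
]

apply_input(1.0, 0.0, [hero_a, hero_b])
# Both characters moved identically; the renderer just draws two viewports.
print(hero_a.x, hero_b.x)  # 1.0 10001.0
```

From the engine's point of view this is bog-standard split screen; the "two realities" are a level-design trick, not a second running game.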
At least, this is how I assume they did it. If they came up with a different system entirely, one that assigns multiple meshes and visual properties to the same objects and uses one or the other depending on the camera, it's clearly needlessly bleeding performance compared to the simpler solution.