XBO to PS4 "Power" Differential: 40%
PS4P to XBO-X "Power" Differential: ~43%
PS5 to Series X "Power" Differential: 18%
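If anyone wants to sanity-check those percentages, here's some quick napkin math from the publicly quoted compute figures (1.31/1.84 TF for XBO/PS4, 4.2/6.0 TF for PS4 Pro/XBO X, 10.28/12.15 TF for PS5/XSX). Raw FLOPs only, nothing else factored in:

```python
# Napkin math: raw compute differentials from the publicly quoted TFLOP figures.
# FLOPs only -- ignores architecture, clock behavior, memory, I/O, etc.
specs_tf = {
    "XBO": 1.31, "PS4": 1.84,        # base last gen (GCN)
    "PS4 Pro": 4.2, "XBO X": 6.0,    # mid-gen refreshes (GCN)
    "PS5": 10.28, "XSX": 12.15,      # next gen (RDNA2)
}

def differential(stronger: str, weaker: str) -> float:
    """Percent advantage of the stronger box over the weaker one, in raw TFLOPs."""
    return (specs_tf[stronger] / specs_tf[weaker] - 1) * 100

print(f"PS4 over XBO:       {differential('PS4', 'XBO'):.0f}%")        # ~40%
print(f"XBO X over PS4 Pro: {differential('XBO X', 'PS4 Pro'):.0f}%")  # ~43%
print(f"XSX over PS5:       {differential('XSX', 'PS5'):.0f}%")        # ~18%
```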
One thing everyone using these percentages needs to keep in mind is that they're measured on different architectures. If you try to express the next-gen differential in last-gen (GCN) terms, you'd get something more along the lines of 31.5% (RDNA being roughly 50% better than GCN, and RDNA2 being perhaps another 25% over RDNA1; AMD has cited a 50% improvement for RDNA2, but it's doubtful that translates into a 50% real-world efficiency gain, except maybe on 7nm EUV, which neither system seems to be using).
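To show where I'm pulling ~31.5% from: it's just the 18% gap scaled by the combined architectural factor above (50% for RDNA over GCN plus another 25% for RDNA2, taken together as roughly 1.75x). That factor is my rough assumption, not a measured number:

```python
# Same napkin math, reusing the ~18% raw gap from above. The 1.75x
# "GCN-equivalent" factor is an assumption (RDNA +50% over GCN, RDNA2 +25%
# over RDNA1, combined), not a measured efficiency figure from AMD.
RDNA2_OVER_GCN_FACTOR = 1.75

raw_gap_pct = 18.0  # XSX over PS5, in RDNA2 FLOPs
gcn_equiv_gap_pct = raw_gap_pct * RDNA2_OVER_GCN_FACTOR

print(f"GCN-equivalent gap: ~{gcn_equiv_gap_pct:.1f}%")  # ~31.5%
```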
The other, more important thing, though, is that thanks to modern programming techniques, the front-end architectural improvements of RDNA2, more advanced algorithms, hardware specifically customized to suit those techniques and algorithms, and overall improvements to the system I/O pipeline from SSD storage, you WILL be able to "do more" per FLOP on the next-gen systems than you ever could on XBO and PS4, or even the mid-gen upgrades.
This is where specific system customizations will make a difference, and both systems have VERY tailored customizations that will give them an edge in certain areas. We already know Sony's by and large; MS's are more mysterious ATM, but Hot Chips should hopefully bring more of theirs to light. I think the main takeaway is that both systems have customizations that will help them "punch above their weight", and people getting caught up in dick-measuring contests over their system doing it "more so" (with no proof that's actually the case, because you'd need full disclosure from all parties to do accurate comparisons) are just shaking pom-poms at this point.
Because both the lead architect and developers have said it can sustain that frequency if need be?
PS5 can sustain max clocks on the CPU & GPU simultaneously during processing loads that actually require that amount of power. The question has consistently been what types of workloads generate enough of a power excess to tip the budget into requiring a downshift of the CPU or GPU, and how big that downshift would actually be.
Cerny's brief example was a best-case estimate, and we also didn't get an estimate of how long the CPU and GPU can handle heavy workloads at max clock before the power load needs to come down. For example, would a continuous stream of AVX-128 instructions, equivalent to a smaller load of AVX-256 instructions, cause the same scenario Cerny described in the presentation? All else being equal, both would eventually exceed the power budget; but if a big stream of additional instructions for the game logic is being processed on top of that, could the percentage drop actually go lower than Cerny's quoted amount (which, again, was a best-case scenario)?
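To make the downshift mechanics a bit more concrete, here's a toy model of a fixed power budget with the clock scaled down until the load fits. Everything in it is an assumption for illustration, including the cubic power-vs-frequency relationship and the example numbers; it is NOT Sony's actual boost/SmartShift logic:

```python
# Toy model only -- not Sony's actual boost/SmartShift algorithm.
# Assumes dynamic power scales roughly with the cube of frequency
# (f * V^2 with voltage tracking frequency), which is just a rule of thumb.
GPU_MAX_CLOCK_GHZ = 2.23   # PS5 GPU frequency cap
POWER_EXPONENT = 3.0       # assumed power-vs-frequency exponent

def clock_under_budget(demanded_power: float, power_budget: float,
                       max_clock_ghz: float = GPU_MAX_CLOCK_GHZ,
                       exponent: float = POWER_EXPONENT) -> float:
    """Clock (GHz) needed to fit the demanded power into the fixed budget."""
    if demanded_power <= power_budget:
        return max_clock_ghz  # workload fits the budget: sustain max clocks
    scale = (power_budget / demanded_power) ** (1.0 / exponent)
    return max_clock_ghz * scale

# Hypothetical example: a heavy AVX/GPU frame pushes demand 10% over budget.
clock = clock_under_budget(demanded_power=1.10, power_budget=1.00)
print(f"{clock:.2f} GHz ({(1 - clock / GPU_MAX_CLOCK_GHZ) * 100:.1f}% downshift)")
# ~2.16 GHz, i.e. roughly a 3% downshift in this toy model.
```

Under that (assumed) cubic relationship, even a meaningful power overage only costs a few percent of clock, which is the general shape of the trade-off Cerny described.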
We don't have clear answers on this, and I think there's a reason why they didn't elaborate further on these kinds of scenarios. One is that they were pressed for time; the other is perhaps that some unsavory scenarios and outcomes would need to be openly acknowledged. If the clocks could be sustained at max the vast majority of the time on workloads requiring max clocks, then the clocks could have simply been locked to a fixed frequency.