
Microsoft Xbox Series X's AMD Architecture Deep Dive at Hot Chips 2020

splattered

Member
Now we can officially definitively calculate the raytracing performance of Xbox Series X





XSX - 4 x 52 x 1.825 = 379.6 billion ray-triangle intersections per second

compared to PS5

PS5 - 4 x 36 x 2.23 = 321.12 billion per second
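For anyone who wants to check the math, here's the same calculation as a quick Python sketch (the 4 intersections per CU per clock figure is from the slide; treating both clocks as sustained peaks is an assumption, especially for PS5's variable clock):

```python
# Peak ray-triangle intersection rates implied by the Hot Chips slide.
# Assumes 4 intersections per CU per clock and fully sustained clocks.
def peak_intersections(cus, clock_ghz, per_cu_per_clock=4):
    """Peak ray-triangle intersections, in billions per second."""
    return per_cu_per_clock * cus * clock_ghz

print(f"XSX: {peak_intersections(52, 1.825):.1f} billion/s")  # 379.6
print(f"PS5: {peak_intersections(36, 2.23):.2f} billion/s")   # 321.12
```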

THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.
 

M1chl

Currently Gif and Meme Champion
In Road to PS5 Cerny said the Tempest Engine was about equivalent to a single CU core. PS5 TF is 10.275. 10.275 / 36 = 285 GFLOPs. So that's roughly what Tempest Engine is. They also said it can consume about 20 GB/s of memory bandwidth.

The Jaguar FLOPs I had to dig up from an older GAF thread; I'm taking that person's numbers at face value, but they put One X's CPU at about 148 GFLOPs. However, MS have used "raw" numbers before that underplay their GPU capabilities (e.g. Series X being "2x the GPU of One X" doesn't account for architecture changes, which actually put it well above "just" 2x One X's GPU). RDNA1's IPC gain over GCN was 50%; RDNA2's over RDNA1 is roughly 25%.

Assuming Zen 2's gains over the Jaguar architecture at least mirror RDNA1 over GCN, that puts Series X's audio solution at roughly 222-225 GFLOPs. However, I also remember MS saying Series X has "4x CPU performance" over One X and XBO. Taken to that extreme, Series X's audio solution would be equivalent to 592 GFLOPs of One X's CPU cluster. That's probably the extreme end, though, and I'm not nearly as sure about that figure as I am about the 222-225 GFLOPs one.

So another way of seeing the audio solutions: PS5 at 285 GFLOPs of RDNA2 equivalent, and Series X at 222 GFLOPs (or 225 GFLOPs) of GCN equivalent, or perhaps closer to 592 GFLOPs of GCN equivalent if you take MS's "4x One X CPU" statement at face value.
It's weird to say that it's equivalent to a single CU when those CUs come in clusters of two (one WGP is two CUs)... Just pointing that out.
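For reference, here are the quoted estimates worked through as a sketch (the 148 GFLOPs Jaguar figure and the IPC multipliers are the quoted poster's assumptions, not confirmed specs):

```python
# The quoted audio-block estimates, spelled out. The One X Jaguar figure
# and the IPC multipliers are forum estimates, not confirmed specs.
ps5_gpu_tflops = 10.275
tempest_gflops = ps5_gpu_tflops * 1000 / 36  # "one CU" equivalent, ~285 GFLOPs

one_x_cpu_gflops = 148                       # quoted One X Jaguar estimate
xsx_audio_low = one_x_cpu_gflops * 1.5       # RDNA1-over-GCN-style gain, ~222
xsx_audio_high = one_x_cpu_gflops * 4.0      # "4x CPU" reading, ~592

print(f"Tempest ~{tempest_gflops:.0f} GFLOPs")
print(f"XSX audio ~{xsx_audio_low:.0f} to ~{xsx_audio_high:.0f} GFLOPs")
```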

THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.

It's okay to ask questions. Obviously performance is going to be lower when you limit the frequency the GPU can reach. However, we don't know how well, and how much, Smartshift is going to throttle the GPU's performance.
 
THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.

The answer is yes.

Further explanation is as follows: there is enough power for both the CPU and GPU to potentially run at their max clocks. In cases where they do not, it is more likely that the CPU will be throttled back rather than the GPU, because the GPU is far more likely to be the bottleneck.
 

Tripolygon

Banned
THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.
They are both theoretical max specs regardless of whether the frequency is locked or variable.

Everything in the GPU scales with clock. Smartshift does not divert power away from the CPU or GPU; it diverts unused power from the overall power budget to wherever it is needed.
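To make the clock scaling concrete, a toy sketch (the downclocked frequencies below are made up for illustration; Sony hasn't published how far Smartshift actually drops the GPU clock):

```python
# The intersection rate tracks GPU clock linearly; sample clocks are hypothetical.
def rt_rate(cus, clock_ghz):
    """Billions of ray-triangle intersections per second."""
    return 4 * cus * clock_ghz

for clock_ghz in (2.23, 2.10, 2.00):  # hypothetical PS5 GPU clocks
    print(f"{clock_ghz:.2f} GHz -> {rt_rate(36, clock_ghz):.1f} billion/s")
```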

Hope that helps.
 

splattered

Member
So they have an audio chip we didn't know about?

Other than that, sounds like some actual GPU customization, specifically regarding ray tracing?

Someone summarize what was unknown before vs. known plz lol

It's always been known the XsX had an audio chip, we just didn't have any actual details on it yet.

CONGRATS Xbox gamers! You finally see the architecture! Price is definitely going higher with each generation. It only makes sense though. More power, more money.

More than happy to pay asking price considering what we're getting. I've always said i would pay $600 if i had to and i have to buy two at launch for my household.

My kids definitely aren't getting a Series X though, they're going to be happy with Lockharts come Christmas haha
 

TBiddy

Member
It's always been known the XsX had an audio chip, we just didn't have any actual details on it yet.

I gotta say. This post aged well in that regard.

Nope, their solution is barely better than current gen. For TrueAudio Next, AMD reserves 4 CUs for it, and it supports only 128 sources:

They'll use ordinary, CPU-based, semi-3D audio like UE4 already has, but will mostly offload the CPU onto their own dedicated processing chip:



XSX would have to spare its CUs for that to match Tempest, and I'm not sure it'll work out properly.
 

silent head

Member
THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.
13.50
 

GHG

Member
Can anyone explain in layman's terms why one would go with 52 CUs at a lower clock speed rather than 36 CUs at higher clocks? Die size, yields and cost are the first topic of discussion in the slides, signifying their importance. What are the tradeoffs in both cases?

Assuming all else is equal:
  • 36 CU at higher clocks = better yields, but more effort required to keep temperatures under control
  • 52 CU at lower clocks = worse yields, but fewer temperature concerns
Worse yields = more cost at the point of chip manufacturing

Higher temperatures = more cost for cooling requirements
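To put a rough number on the yield side, here's a toy Poisson-defect yield model (the defect density and die areas are illustrative guesses, not TSMC figures, and real console yields are also helped by shipping with some CUs disabled):

```python
import math

# Classic Poisson yield model: the probability a die has zero defects
# falls off exponentially with die area.
def die_yield(area_mm2, defects_per_mm2):
    return math.exp(-area_mm2 * defects_per_mm2)

D0 = 0.001  # hypothetical defect density, defects per mm^2
for name, area_mm2 in (("~300 mm^2 die", 300.0), ("~360 mm^2 die", 360.0)):
    print(f"{name}: {die_yield(area_mm2, D0):.1%} defect-free")
```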
 

M1chl

Currently Gif and Meme Champion
Assuming all else is equal:
  • 36 CU at higher clocks = better yields, but more effort required to keep temperatures under control
  • 52 CU at lower clocks = worse yields, but fewer temperature concerns
Worse yields = more cost at the point of chip manufacturing

Higher temperatures = more cost for cooling requirements
Also, with 52 CUs you get way more cache, which could be beneficial, but it could also mean a longer pipeline. I'm not sure about that last point, though.
 
Assuming all else is equal:
  • 36 CU at higher clocks = better yields, but more effort required to keep temperatures under control
  • 52 CU at lower clocks = worse yields, but fewer temperature concerns
Worse yields = more cost at the point of chip manufacturing

Higher temperatures = more cost for cooling requirements

Hmm, interesting. I assumed the higher clocked chips would be harder to hit big yields with because the silicon quality needs to be better to ensure the clocks can be reached and maintained as expected.

Overall I think on the APU front Sony and MS basically cancel each other out WRT costs; larger chip will cost more wafer budget but higher-clocked GPU will similarly eat away more due to needing better silicon "quality" therefore tighter yields.

I know people take the report of Sony increasing PS5 production to 10 million as evidence the yields are better than expected, and maybe they are, but that figure's been in reference to the run-up to the fiscal year's end, right? Which originally was 6 million. Dunno if production capacities for the launch batch or end-of-2020 units are really jumping up to match the revised production numbers, or if the bulk of those extra units are for 2021 shipments.
 
I'm not even sure anyone will discern the differences, it will be in positional audio and other QoL aspects that the new solutions will shine.

Yes, and to really appreciate that you either need a kick-ass audio system in your home (TV speakers simply will not cut it), or extremely expensive headphones. Or at least really good headphones that'll probably set you back $150 - $200.

It's little things like that which'll probably make next-gen a bit more expensive than it'll appear on paper for people who want the ultimate experience (that isn't PC).
 

Entroyp

Member
Do we know if the PS5 die is smaller? It does have all the I/O logic in it, but that might not take much space.
 

anothertech

Member
It doesn't bode well for pricing when, in a tech deep dive for the damn SoC, they're damage-controlling costs...

The optimist in me hopes it is just their way of explaining only a 2X increase in RAM and other Moore's Law brick walls. At least Flash looks like it should decrease in price YoY.
Pretty much what I got out of it too

$$$
 

M1chl

Currently Gif and Meme Champion
So I would guess a 36 CU die would be about the same cost as the Xbox One X die? So $ + extra cooling costs?
Probably cheaper, due to being a smaller chip (probably); the cooling, and how to get the heat out of there, is going to be a different story. A small die means less area to conduct heat, which means worse heat dissipation.
 

Entroyp

Member
I/O logic is obviously here too, you can see I/O parts on the diagram of the chip. It's probably going to be smaller, I would guess...

I meant the decompressor blocks, co-processors, cache scrubbers, DMA units, SRAM, etc.
 
Now we can officially definitively calculate the raytracing performance of Xbox Series X





XSX - 4 x 52 x 1.825 = 379.6 billion ray-triangle intersections per second

compared to PS5

PS5 - 4 x 36 x 2.23 = 321.12 billion per second
I wonder how this is possible, considering the 2080 Ti only yields 10 gigarays per second:
I hope it's not some BS inflated numbers (like RSX @ 1.8TF).
 

M1chl

Currently Gif and Meme Champion
I meant the decompressor blocks, co-processors, cache scrubbers, DMA units, SRAM, etc.
Well, XSX still has some decompression blocks; they didn't specify in detail what the "Velocity Architecture" is, only what it does. So I'm not sure how it's going to stack up against the PS5 chip. It's probably a very different-looking chip.
 

Rob_27

Member
So this Hot Chips event is at 7pm PST? So I can watch it on YouTube tomorrow? As that's quite late in the UK?
 

geordiemp

Member
Hmm, interesting. I assumed the higher clocked chips would be harder to hit big yields with because the silicon quality needs to be better to ensure the clocks can be reached and maintained as expected.

Overall I think on the APU front Sony and MS basically cancel each other out WRT costs; larger chip will cost more wafer budget but higher-clocked GPU will similarly eat away more due to needing better silicon "quality" therefore tighter yields.

I know people take the report of Sony increasing PS5 production to 10 million as evidence the yields are better than expected, and maybe they are, but that figure's been in reference to the run-up to the fiscal year's end, right? Which originally was 6 million. Dunno if production capacities for the launch batch or end-of-2020 units are really jumping up to match the revised production numbers, or if the bulk of those extra units are for 2021 shipments.

No, not really. You have parametric yield and yield loss due to particulates (defects per wafer), which is really just a die-size effect plus the probability of a defect falling on a CU that can be disabled.

Parametric yield: if a die fails at 2.2 GHz it's not a good die anyway, so why use it at 1.9 GHz? It's a poor specimen and would be a low-performance part regardless. With EUV on critical layers, parametrics should be better anyway, except for SOME dies at the edges of wafers, but who wants a dud?

And I was correct. EUV litho at TSMC is astronomical and costs more than god; die size is the big driver of next-gen console cost if they're using enhanced EUV litho for the gates. All the estimates from Bloomberg and others assumed RDNA1 and DUV costs, and were way off on die costs. Next-gen TSMC-sourced EUV is going to cost a lot.

So XSX will be pushing $600 at a discount, and depending on the PS5 die size, it won't be too far behind unless serious losses are taken by either.
 
THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.

Neither will operate at its maximum rate. The WGPs can either execute ray intersection tests OR texture filtering; the two cannot be done concurrently on the same unit.
Some of the blocks will do texture filtering while others handle ray tracing.
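A toy way to picture that contention (the 30% filtering share is a made-up example, not a measured workload split):

```python
# If some fraction of the shared unit's cycles go to texture filtering,
# the effective intersection rate drops proportionally. Purely illustrative.
def effective_rt_rate(peak_billion_per_s, texture_fraction):
    return peak_billion_per_s * (1.0 - texture_fraction)

print(f"{effective_rt_rate(379.6, 0.30):.1f} billion/s")  # XSX peak, 30% filtering
```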
 
I wonder how this is possible, considering the 2080 Ti only yields 10 gigarays per second:
I hope it's not some BS inflated numbers (like RSX @ 1.8TF).
Like Tom's Hardware put it... they don't give any context. Is this the theoretical max? Is it typical? They said it seemed to be on par with the 2080 Ti.
 
I wonder how this is possible, considering the 2080 Ti only yields 10 gigarays per second:
I hope it's not some BS inflated numbers (like RSX @ 1.8TF).

Nvidia have their own strange measurements. It's not literally 10 billion ray triangle intersections. AMD's numbers are straight to the point and numerically what is actually the case.
 
I meant the decompressor blocks, co-processors, cache scrubbers, DMA units, SRAM, etc.

All I remember on that front for PS5 was the diagram in the presentation and that I/O block was friggin massive. Bigger than CPU and GPU portion combined (do have to keep in mind PS5's GPU is physically smaller than Series X's though).

So this Hot Chips event is at 7pm PST? So I can watch it on YouTube tomorrow? As that's quite late in the UK?

I'd like to know this, too.

The CPU is downgraded compared to desktop Zen 2 CPUs, but with the dedicated audio chip and HW decompression the XSX CPU should punch above its weight (the same goes for the PS5 CPU).

Yeah, there were rumors (lol) floating about an "unusually large CPU cache", but 8 MB is literally a quarter the size of the largest Zen 2 desktop CPUs'. But as you say, these systems don't need such big L3 caches, since they have dedicated hardware for many tasks CPUs otherwise run in software (which is where the larger L3 caches become mandatory).

No, not really. You have parametric yield and yield loss due to particulates (defects per wafer), which is really just a die-size effect plus the probability of a defect falling on a CU that can be disabled.

Parametric yield: if a die fails at 2.2 GHz it's not a good die anyway, so why use it at 1.9 GHz? It's a poor specimen and would be a low-performance part regardless. With EUV on critical layers, parametrics should be better anyway, except for SOME dies at the edges of wafers, but who wants a dud?

And I was correct. EUV litho at TSMC is astronomical and costs more than god; die size is the big driver of next-gen console cost if they're using enhanced EUV litho for the gates. All the estimates from Bloomberg and others assumed RDNA1 and DUV costs, and were way off on die costs. Next-gen TSMC-sourced EUV is going to cost a lot.

So XSX will be pushing $600 at a discount, and depending on the PS5 die size, it won't be too far behind unless serious losses are taken by either.

Ah okay, that helps clear some of this up. I think the big surprise for me was them going EUV; I honestly thought it would be enhanced DUV. That alone will push these prices higher than I anticipated and makes some of these price rumors look like they could stick.
 

Entroyp

Member
All I remember on that front for PS5 was the diagram in the presentation and that I/O block was friggin massive. Bigger than CPU and GPU portion combined (do have to keep in mind PS5's GPU is physically smaller than Series X's though).



I'd like to know this, too.

Yeah, my impression was that it was for illustration purposes only, but I feel like the I/O unit does eat a considerable amount of die space.
 

Tripolygon

Banned
I wonder how this is possible, considering the 2080 Ti only yields 10 gigarays per second:
I hope it's not some BS inflated numbers (like RSX @ 1.8TF).
That's because they are different metrics, and a 2080 Ti is more powerful. One is rays per second while mine is ray-triangle intersections per second.
 

Marlenus

Member
Can anyone explain in layman's terms why one would go with 52 CUs at a lower clock speed rather than 36 CUs at higher clocks? Die size, yields and cost are the first topic of discussion in the slides, signifying their importance. What are the tradeoffs in both cases?

MS wanted a 12 TFLOP console. When this was being designed 2+ years ago, there was no guarantee that 2+ GHz GPU clock speeds would be viable in a console power envelope, so they built a chip that only required clock speeds that were doable at the time, which meant 52 active CUs @ 1.825 GHz.

The specs confirm 64 ROPs, so PS5 has a pixel fillrate advantage.
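For the fillrate comparison, a quick sketch (assuming 64 ROPs on both consoles and that each sustains its peak clock, which for PS5 is the variable-frequency maximum):

```python
# Peak pixel fillrate: ROPs x clock, in gigapixels per second.
def fillrate(rops, clock_ghz):
    return rops * clock_ghz

print(f"XSX: {fillrate(64, 1.825):.1f} Gpix/s")  # ~116.8
print(f"PS5: {fillrate(64, 2.23):.1f} Gpix/s")   # ~142.7
```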
 
I wonder how is this possible, considering 2080 Ti only yields 10 Gigarays per second:
I hope it's not some BS inflated numbers (like RSX @ 1.8TF).

This is all I can do for you



Series X has similar RT performance to an RTX 2060 in this particular demo.

An 86% increase on 30.5 gets you to 2080 Ti performance.

1.86 x 379.6 gives you roughly 706 billion for the 2080 Ti.
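The scaling arithmetic from that post, as a sketch (the 86% gap between the 2060-class demo result and a 2080 Ti is the poster's estimate, not a measured figure):

```python
# Scale XSX's peak intersection rate by the estimated 2060 -> 2080 Ti gap.
xsx_peak = 379.6        # billions of intersections/s, from the slide math
scale_2080ti = 1.86     # +86%, the poster's estimate from the demo
print(f"~{xsx_peak * scale_2080ti:.0f} billion/s")  # ~706
```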
 

LordOfChaos

Member
THIS IS NOT A TROLL POST...

But is the "PS5 - 4 x 36 x 2.23 = 321.12 billion" equation a theoretical max performance spec considering the PS5 is working with Smartshift here? I wonder if the GPU clock scales down RT performance as it drops lower to give power to the CPU? Or is it just fixed at 321.12?

Probably a silly question but figured one worth asking.


Like most other hardware on the GPU, RT performance scales with its relevant units (the intersection engines) x clock speed, so yes, RT performance would vary by the same amount as raster.
 

Aceofspades

Banned
It has custom hardware on the GPU that can allow the shaders to run BVH traversals in parallel with other things like materials calculation, etc. But not all RT workloads can be done in parallel with non-RT workloads.

I bet it's the standard RDNA2 stuff. The RT performance we are getting from both PS5/Series X is more than enough for me. Actually, they both surpassed the goals I'd set for next gen, especially on the CPUs and SSDs.
 