
Simulating Gonzalo (Rumoured NextGen/PS5 leak)

This thread is a work in progress


First things first:
  • I'm not saying Gonzalo is the PS5, though it might be!
  • I'm not suggesting this simulated Gonzalo is equivalent to the PS5's power in games. This thread is about how much computational power you can put in a console-sized box, not how efficiently you can use that power.


So what's Gonzalo?

"Gonzalo" was a leak from the 3DMark database earlier this year that pointed to a semi-custom (i.e. not PC) APU, whose internal AMD product code told us that it uses a CPU boost clock of 3.2GHz and a GPU clock of 1.8GHz.

DECODE3.png



8d7b1542898939.png



You can read a summary of all that here:

https://www.eurogamer.net/articles/digitalfoundry-2019-is-amd-gonzalo-the-ps5-processor-in-theory

So the string "13E9" at the end of the code refers to a driver entry for Navi10 Lite. Navi10 is also the architecture name of AMD's recent RX 5700 series of desktop GPUs, which use up to 40 Compute Units.


So, confronted with these numbers, the question has to be asked: would the power requirements of such a chip even fit within the thermal and power constraints of a console-sized box?



Simulating Gonzalo:

So here we have the spec sheet of the 5700XT vs. the data we can derive from the leaks:

specsdekos.png



For the CPU part of the APU we expect an 8-core variant. On 7nm the nearest AMD processors would be the 3700X or 3800X; I snapped up the former.

As for the testing conditions: I put the 5700XT and 3700X with 16GB DDR4 on a B350 motherboard in a well-ventilated case. As you can see from the spec sheet, the first problem we are confronted with is that the GPU alone is capable of drawing 225W.



So how do we get this combination of gear to a console-comparable setup?

First we underclock and undervolt the GPU to what the Gonzalo leak suggests. You can achieve that by changing the parameters in AMD's Wattman:

wattman100k13.png


We start by setting the power limit to -30%, which gives us a physical limit on the current drawn; from my pre-testing this equates to around 125W of GPU die power (the die only, without VRAM and aux).

After that we limit the voltage-frequency curve to just 1800MHz at 975mV (stock is 2032MHz @ 1198mV):

wattman277kif.png
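As a rough sanity check on those settings, dynamic power scales approximately with frequency times voltage squared. A quick sketch; the 180W stock die power is my own assumption (roughly the 225W TBP minus VRAM/board overhead), not a measured value:

```python
# Rule-of-thumb dynamic power scaling: P ~ f * V^2.
# The 180 W stock die power is an assumption (roughly the 225 W TBP
# minus VRAM/board overhead), not a measured value.
def scaled_power(p_stock, f_stock_mhz, v_stock, f_new_mhz, v_new):
    """Scale a baseline power figure by the f * V^2 rule of thumb."""
    return p_stock * (f_new_mhz / f_stock_mhz) * (v_new / v_stock) ** 2

# Stock 2032 MHz @ 1.198 V -> Gonzalo 1800 MHz @ 0.975 V
p_gonzalo = scaled_power(180.0, 2032, 1.198, 1800, 0.975)
print(round(p_gonzalo))  # ~106 W die power
```

A crude estimate, but it shows the chosen V/f point lands in the right region for a die that stays under the ~125W cap.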



Secondly we underclock and undervolt the CPU via AMD's Ryzen Master:

ryzenmasterecksd.png


We cap the clock at 3.2GHz and undervolt to 1000mV (stock is 1.4V, iirc)
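The same f·V² rule of thumb gives a feel for what those CPU limits buy us. The 65W stock package power and ~4.0GHz stock all-core clock below are my assumptions, not measurements:

```python
# Same f * V^2 rule of thumb, applied to the CPU package.
# The 65 W stock package power and ~4.0 GHz stock all-core clock
# are assumptions, not measurements.
p_stock, f_stock, v_stock = 65.0, 4.0, 1.4   # W, GHz, V
f_new, v_new = 3.2, 1.0                      # the Ryzen Master settings above
p_new = p_stock * (f_new / f_stock) * (v_new / v_stock) ** 2
print(round(p_new))  # ~27 W package power
```

That lines up reasonably with the ~24W CPU figure derived in the Interpretation chapter.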





Benchmarks and Testing

3D Mark - Fire Strike:


drumroll
.
.
.

firestrikegonzalolowwgjoc.png


Exactly what the last leak from TumApisak suggested.


So how does that actually compare?

firestrikeresults4ijuv.png


First off, wow, the 3700X is a beast. With the 1600 I used before, you couldn't dream of coming close to 20K overall at the 5700XT's stock settings. The overall score of the Gonzalo configuration is around 10% down from stock. The graphics score is down just 6%, but the CPU-dependent physics score takes a hit from the limitations and is down nearly 20%.


But what does it take to get there?

So here is the system power drawn from the wall. That's repeated peak power, not the mean; a wall-side TDP equivalent would probably be around 10W lower:

firestrikepowersijyq.png


That's in Fire Strike's Graphics Test 1, which represents the maximum power load. Graphics Test 2, which stresses other aspects of the graphics pipeline more, is around 10 watts lower.
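To see what actually arrives at the components, you have to back the PSU losses out of the wall reading. A small sketch, assuming the ~90% efficiency my PSU is rated for at roughly 1/3 load:

```python
# Back PSU conversion losses out of the wall-side reading.
wall_power = 208.0      # measured peak system draw in GT1 (W)
psu_efficiency = 0.90   # assumption: PSU's rated efficiency at ~1/3 load
dc_power = wall_power * psu_efficiency   # what the components actually receive
psu_loss = wall_power - dc_power         # heat dissipated inside the PSU
print(round(dc_power), round(psu_loss))  # ~187 W and ~21 W
```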


Here are the GPU clock rates, stock vs. Gonzalo:

firestrikegpuclockz2kam.png


As we can see, we don't hit the full 1800MHz under this load because we reach the power limit. In GT2, where power constraints are lower, we reach steadier and higher clocks.


Here's the over time chart of the GPU die power:

firestrikegpudiepowerm7kv7.png




Disclaimer: I did this in quite a hurry. The settings used seem to run stable, but because of said time constraints I haven't done due diligence on everything (e.g. whether the CPU undervolt registered correctly). So take all results with a huge grain of salt.


Got to go to work now. More to follow...
 
Last edited:
Further Testing:

To get an idea of how Navi's power characteristics change under different circumstances, I repeated the over-time testing in Fire Strike at different frequencies. The goal was to get separate parameters for every frequency point, so I undervolted individually for each one and ran Fire Strike's GT1 and GT2 with those settings. I logged the data of every run at a high sampling rate so that I could produce over-time charts like the ones above. I then repeated the testing for every data point with a second 5700XT, to ensure I hadn't strayed into unstable territory.

So here's the most meaningful result of that testing: the power drawn by the GPU die alone over the target frequency set in Wattman [green curve]. That's the average power draw over the duration of Graphics Test 1, which is power-wise the most demanding part of Fire Strike. To show how performance changes in relation to the power draw, I plotted the Fire Strike graphics score against it [red curve]:

powerscalinggpuonlyuljwr.png


As you can see, power draw looks exponential and really takes off above a 1.9GHz target frequency, all while the Fire Strike score rises not even quite linearly. The yellow dot is the Gonzalo simulation, for reference.
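One way to put a number on how steep that curve is: compute the effective scaling exponent n between two measured points, where P2/P1 = (f2/f1)^n. Using the averaged die powers at 1.5GHz and 1.8GHz from this testing:

```python
import math

# Effective power-scaling exponent n between two measured points,
# where P2 / P1 == (f2 / f1) ** n.
f1, p1 = 1.5, 87.0    # GHz, average GPU die power (W)
f2, p2 = 1.8, 119.0
n = math.log(p2 / p1) / math.log(f2 / f1)
print(round(n, 2))  # ~1.72 between these two points
```

Between these two points the exponent is still moderate; per the chart it climbs much faster above 1.9GHz, which is exactly why chasing the last couple hundred MHz is so expensive.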


Another way to show this is to compare the scaling factors of the Fire Strike score and the GPU die power, with the Gonzalo simulation as the base (100%):

powerscalingfactor5kkho.png


I summarized the results of the testing in the table below:

resultsshjg4.png




Interpretation:

So what does that all mean?

OK, as seen above, my PC drew 208W from the wall socket at the Gonzalo settings. Does that mean Gonzalo would be a >200W console? Let's check the data we have...

For said settings we have a second measurement: the GPU die power over time, as described above. That gets us an average of 119W in Fire Strike Graphics Test 1. Furthermore, I know the power efficiency of my PSU at 1/3 load: it's 90%. We also know the difference between die power and TBP of the 5700XT at stock. That delta should be significantly lower at Gonzalo settings, as VRM losses grow disproportionately with temperature and fans draw disproportionately more current at high rpm, so we also need to account for that.

That framework culminates in the following:

firestrikepower-compoh4jkq.png


So black are measured values, yellow are pure estimates (whew! only one), and the rest [red] can be calculated or somewhat derived. Granted, that's no exact science, but I bet it won't be too far off from what you would measure if you had the means.


So if we merge the actual computing components into one hypothetical APU, we see that we get under 150W of TDP. Thermal Design Power: the stuff that, when it eventually turns to heat, you have to cool away. Keep in mind that the GPU and CPU contain some redundant components, like memory controllers, that an APU would have fewer of; those draw power the APU wouldn't need. Moving data between RAM, CPU, VRAM and GPU also consumes lots of juice that wouldn't be needed if big parts of the data just stayed in on-die caches.
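The arithmetic behind that can be sketched as follows. The measured inputs are the 208W wall draw and the 119W GPU die power; the 90% PSU efficiency and the ~24W CPU package figure are my estimate and derivation:

```python
# Work back from the wall reading to a hypothetical APU TDP.
wall = 208.0               # measured peak system draw (W)
dc = wall * 0.90           # assumption: ~90% PSU efficiency
gpu_die = 119.0            # measured average GPU die power in GT1
cpu = 24.0                 # estimated CPU package power
rest = dc - gpu_die - cpu  # VRAM, board, fans, drives: whatever is left
apu_tdp = gpu_die + cpu    # the parts that would merge into one APU
print(round(rest), round(apu_tdp))  # ~44 W of "rest", ~143 W APU TDP
```

So the computing parts alone sit around 143W, i.e. under the 150W mark, before you even credit the APU for dropping redundant components.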



Speculation – What would be possible next gen?

This data is useful not only for figuring out what Gonzalo could be, but also for what would be possible with Navi in general. So how much more power than this could we fit into a console, provided we have to stay on the same process node?

This depends mainly on two aspects: one, how much power it would draw and therefore how much heat it would produce (you can't heat up a console-sized volume indefinitely); and two, how much beef you can fit on an affordable APU die. Let's start with the second one.

Die layout:

To get an idea of a hypothetical next-gen die, we first have to take a look at the Navi10 layout and its dimensions. We have the outer dimensions of the die, and we have this die shot / render showing the on-die components, directly from AMD.

navi10die2jrksb.png


With that information I used relative scaling to figure out how big each part is, and came up with the following:

navi10dielayoutf9kpu.png
<----->
navi10dielayout28oj25.png



So we have the rough dimensions (rough meaning within a fraction of a mm in this case) of how big the CUs need to be. From several AMD presentations we also know the hierarchy in which Navi is organized. From this we can extrapolate what a bigger APU would look like (as a dimensional constraint I used the die length that proelite from Beyond3D derived from the Scarlett trailer: 24+mm):

consoledielayoutqojie.png


So as you can see, I added 8 workgroup processors / dual CUs in total, 4 on each side to keep symmetry. I also adjusted the L2 graphics cache and widened the bus to 384-bit. Furthermore, I added two 4-core CCXs (the dimensions are from AnandTech, scaled proportionally). The additional memory controllers would likely wrap around the lower edge, as seen on the X1X's die, but I kept everything symmetrical and tidy to show how much empty space is left on the die. Empty space which could accommodate stuff like ray-tracing hardware.

For yield reasons this die would not have all 56 CUs enabled. On the RX 5700 (non-XT) there are 4 CUs disabled. At this point it's not clear how they disable those. In the past, AMD had to disable one CU per shader array, presumably to keep the load symmetrical. Since the introduction of dual CUs it's not clear whether you can now just disable half a dual CU, or whether symmetry can be broken, requiring only one disabled dual CU per shader engine in Navi. If I had to guess, I would bet on the former. For visualization purposes I disabled / greyed out one dual CU per shader engine in the pic above anyway.

All in all that would give you a 52CU APU at around 350mm². At this point, such a die should cost at most $60 more than similarly sized dies did when the 16nm or 28nm console SoCs launched. Long story short: such a die should absolutely NOT be cost-prohibitive at $449 or $499, even today.
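As a rough cross-check of that area figure: the per-component areas below are my own estimates from the die shot and the AnandTech-derived CCX dimensions, so treat them as illustrative only:

```python
# Illustrative area budget for the hypothetical 56CU die.
# The ~4.6 mm^2 per dual CU is my own rough estimate from the die shot;
# the ~31.3 mm^2 per Zen2 CCX follows the AnandTech-derived dimensions.
navi10_area = 251.0        # Navi10 die area (mm^2)
extra_wgps  = 8 * 4.6      # 8 added dual CUs / workgroup processors
ccx_pair    = 2 * 31.3     # two 4-core Zen2 CCXs
apu_area = navi10_area + extra_wgps + ccx_pair
print(round(apu_area))  # ~350 mm^2
```

The extra L2 and memory controllers aren't budgeted separately here; in the layout above they eat into the remaining empty space.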

Now let's check the other aspect: power requirements.


Power Prognosis:

So we learned from the testing that if we clock Navi down slightly, power efficiency improves disproportionately. At F_t = 1500MHz the Navi10 GPU die consumed just 87W. Let's round that up to 90W to ensure our underclock would be viable for a wider range of silicon quality.

So 90W for 40 active CUs. What would happen if we scale that up to the 52CU console APU shown above?

OK, let's just assume that power at the same frequency scales linearly with CU count. That would be something of a worst-case scenario, though, because the ratio of front-end components to CUs would not stay constant in the higher-CU GPU, meaning those parts would contribute proportionally less to the total power requirements of the die.

So linear scaling to 52 CUs brings us to 117W of die power. For the CPU side we take the 24W we derived in the previous chapter.
That said, there are some redundant components in those GPU and CPU figures that wouldn't be present twice in an APU (memory controllers, for example). So from that perspective this is yet again a worst case.

Following our method from the Interpretation chapter, that would give us the following:

GPU: 117W
CPU: 24W
RAM: 12 x 1.5W = 18W (12 x 2GB GDDR6 modules)
PCB/AUX: 15W
PSU losses: 31W
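Summing that budget up:

```python
# Sum of the power budget above.
budget = {
    "GPU die": 117.0,       # 52 CUs, linear scaling from 90 W / 40 CUs
    "CPU": 24.0,
    "RAM": 12 * 1.5,        # 12 x 2 GB GDDR6 modules
    "PCB/AUX": 15.0,
    "PSU losses": 31.0,
}
total = sum(budget.values())
apu_tdp = budget["GPU die"] + budget["CPU"]
print(round(total), round(apu_tdp))  # ~205 W at the wall, ~141 W in the APU
```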

powerbalancenbknz.png


Conclusion:

So ~210W. Sounds like a lot, but not impossible. In my opinion the real hard barrier is the heat density of the die; the other components are manageable passively. But you really can't cool away much more than 160-170W from the die itself in an acceptable manner, even with a vapor chamber plus blower cooler, which I guess you would have to use in a console form factor. Since we are safely under 150W even with our rather pessimistic assumptions, we should be good.
 
Last edited:

Dr.D00p

Gold Member
Interesting but ultimately pointless really when there is simply no way to simulate all the custom tweaks to the console silicon and the much lower API overheads of a dedicated console development environment and teams of programmers whose only goal is to optimise code down to the bare metal which will easily cancel out any advantages the PC equivalent parts have with raw clockspeeds on the CPU & GPU.
 

Von Hugh

Member
Interesting but ultimately pointless really when there is simply no way to simulate all the custom tweaks to the console silicon and the much lower API overheads of a dedicated console development environment and teams of programmers whose only goal is to optimise code down to the bare metal which will easily cancel out any advantages the PC equivalent parts have with raw clockspeeds on the CPU & GPU.

That's one hell of a long sentence, buddy.
 

_sqn_

Member
Very good work. The only problem with the Fire Strike overall score is that the CPU strongly determines the result. What result would you get if you OC the 5700XT to 2GHz with this 3.2GHz 3700X?
 
Update:

fire strike results stock vs gonzalo:


firestrikeresults4ijuv.png


First off, wow, the 3700X is a beast. With the 1600 I used before, you couldn't dream of coming close to 20K overall at the 5700XT's stock settings. The overall score of the Gonzalo configuration is around 10% down from stock. The graphics score is down just 6%, but the CPU-dependent physics score takes a hit from the limitations and is down nearly 20%.


firestrikepowersijyq.png


So what to make of the above data?

Let's start with the fact that we can save 25% TDP for 10% less performance, maybe even just 5-10% less in less CPU-dependent tasks (going by the Fire Strike graphics score).
 
Interesting but ultimately pointless really when there is simply no way to simulate all the custom tweaks to the console silicon and the much lower API overheads of a dedicated console development environment and teams of programmers whose only goal is to optimise code down to the bare metal which will easily cancel out any advantages the PC equivalent parts have with raw clockspeeds on the CPU & GPU.

Well, it's absolutely not the point of this thread to show how performant a next-gen console with said specs would be compared to PCs, but to show how much horsepower you can put in a console-sized box, with all its constraints, using AMD's recent tech.
 
Last edited:
Very good work. The only problem with the Fire Strike overall score is that the CPU strongly determines the result. What result would you get if you OC the 5700XT to 2GHz with this 3.2GHz 3700X?

You're right about Fire Strike. Do you mean a 2GHz real clock, or a 2GHz clock limit on the V(f) curve [Wattman]? For the latter: it will do roughly the same, because it runs into its power limit before reaching such clocks. For the former: I've done overclocking on the 5700XT in the Navi topic, but not with the 3700X yet. It draws a lot more power if you want it to reach 2GHz.
 

_sqn_

Member
You're right about Fire Strike. Do you mean a 2GHz real clock, or a 2GHz clock limit on the V(f) curve [Wattman]? For the latter: it will do roughly the same, because it runs into its power limit before reaching such clocks. For the former: I've done overclocking on the 5700XT in the Navi topic, but not with the 3700X yet. It draws a lot more power if you want it to reach 2GHz.
2GHz with no power limit for the 5700XT, but still with the 3700X underclocked to 3.2GHz
 

_sqn_

Member
Also interesting would be to find the lowest 5700XT clock that still gives above 20K with the 3700X @ 3.2GHz
 
You should also have posted that famous programmer tweet saying consoles usually get twice the performance of an identical PC setup (just to stop people from complaining that this setup will be weaker than a PC with the same specs).

I'm not sure who it was though.
 
Last edited:
Brilliant thread. Could we do game benchmarks? That would be revelatory. In addition, I think you need to lock the GPU frequency; it would be interesting to see the outcome with the frequency locked at 1.8GHz, like typical consoles.

GPU overclocking is different from CPU overclocking nowadays. You can't just lock the clock frequency of a GPU; on recent AMD cards you can only define a frequency maximum. Clocks will always auto-adjust to environmental conditions (power limit, thermal limit, bandwidth constraints).

Can you do a GPU watts per clock graph? Maybe by 50mhz increments?

Yeah, I will definitely do something like that (it's basically the whole goal of this exercise to get a feeling for the power/perf sweet spot of RDNA). It's a lot of work, and I'm still figuring out the best way to do it (probably stepping the power limit up in increments; stepping up the frequency won't work as soon as you run into a power-limited scenario. What makes it even harder is that you have to find a stable voltage to complement the corresponding clock rate. Too many variables...). I hope to get there on the weekend, but no promises.

I've got 200Mb internet yet the images are taking forever to download...

Sorry, man. Maybe the image host isn't up to the task. Is anyone else having this problem?
 
Last edited:

xool

Member
Excellent!!

just over 200W at wall and gets a firestrike of 20,000+ (as given in one of the twitter leaks) ..

If we get 14Gbps GDDR6 I think the optimum clocks are 3.5GHz CPU / 1.75 GHz GPU (up to 4 GHz/2Ghz with top end 16Gbps GDDR6)

If it were me I'd test how far you can undervolt and still achieve these clocks... (actually your test was pretty close at 1.75GHz, but I'd be interested in the premium case)

Anyone got any ideas of board contribution to these figures, and whether PS5 would be higher/lower relatively with a custom PCB .. ?
 

vpance

Member
I feel like something closer to 1.6Ghz will be optimal for GPU. Maybe 1.7 if on EUV?

Excellent!!

just over 200W at wall and gets a firestrike of 20,000+ (as given in one of the twitter leaks) ..

If we get 14Gbps GDDR6 I think the optimum clocks are 3.5GHz CPU / 1.75 GHz GPU (up to 4 GHz/2Ghz with top end 16Gbps GDDR6)

If it were me I'd test how far you can undervolt and still achieve these clocks... (actually your test was pretty close at 1.75GHz, but I'd be interested in the premium case)

Anyone got any ideas of board contribution to these figures, and whether PS5 would be higher/lower relatively with a custom PCB .. ?
 
A narrow/fast chip is not optimal for consoles, 9.2TF RDNA let alone 8 won't cut it for next gen graphics at 4k

Yeah, I still think broad and lower-clocked would be the way to go, but I also thought you wouldn't gain that much efficiency by simply capping the GPU at roughly 1.75GHz. That was a main reason why I discarded the 36-40CU idea. I'm still confident we will see at least one console going broad. The question I'm trying to answer is whether that's even possible with 7nm DUV.
 
Last edited:

_sqn_

Member
The problem with the wider-GPU prediction is that Gonzalo is a perfect naming match for PS5, and it's Navi10 Lite
 

SonGoku

Member
The question I'm trying to answer is whether that's even possible with 7nm DUV.
A 60CU chip with a 320-bit bus would be ~380mm² on 7nm DUV; with 2 DCUs disabled that leaves 56 CUs @ 1600MHz = 11.4TF.
Or did you mean whether 1.8GHz is even possible in a console on DUV?
Btw, can you try undervolting and clocking to 1680MHz like the one posted on REE?
gonzalo is perfect naming match for ps5
How so?
 
Last edited:

xool

Member
The problem with the wider-GPU prediction is that Gonzalo is a perfect naming match for PS5, and it's Navi10 Lite

maaayybee the "lite" bit doesn't mean fewer CUs than Navi10, but actually means that it's built on a low-power version of 7nm, for increased efficiency.

This idea is quite compelling to me ...
 

_sqn_

Member
A 60CU chip with a 320-bit bus would be ~380mm² on 7nm DUV; with 2 DCUs disabled that leaves 56 CUs @ 1600MHz = 11.4TF.
Or did you mean whether 1.8GHz is even possible in a console on DUV?
Btw, can you try undervolting and clocking to 1680MHz like the one posted on REE?

How so?
From beyond3d:
DG1000FGF84HT-PS4.
DG1101SKF84HV-PS4.
DG1201SLF87HW-PS4 Pro.
DG1301SML87HY-PS4 Pro.
DG14__________ - ???.
DG15__________ - ???.
2G16002CE8JA2_32/10/10_13E9 - Gonzalo
ZG16702AE8JB2_32/10/18_13F8 - Gonzalo engineering sample

DG3001FEG84HR - Durango
DG4010T3G87E1 - Arlene SoC ??? Not sure what this is.
DG4001FYG87IA - XB1 S
1G5211T8A87E9 - Scorpio
 
This seems like a fun thing to do, but can someone explain to me what useful information we get from these tests? Or is it just a more intellectual 'wow, the next-gen consoles are going to be beastly' type of thing?
 

_sqn_

Member
maaayybee the "lite" bit doesn't mean fewer CUs than Navi10, but actually means that it's built on a low-power version of 7nm, for increased efficiency.

This idea is quite compelling to me ...
But it's hard to argue that "lite" means more CUs ;)
 

xool

Member
[more] If the Gonzalo/PS5 APU or whatever is made using the low-power version of 7nm rather than the high-performance version, transistor density increases from 65M/mm² to 91M/mm². I assume everyone was expecting the high-performance node version..?


Think about it: switch to the lower-power version and get 40% more transistors per mm² of die (we know the chips weren't going to have their clocks pushed hard, because of thermals/efficiency)

....
 

McHuj

Member
Fantastic. Thanks.

My expectation is a 150-200W console from Sony so I think this aligns very well.
 

xool

Member
This seems like a fun thing to do, but can someone explain to me what useful information we get from these tests? Or is it just a more intellectual 'wow, the next-gen consoles are going to be beastly' type of thing?

It achieves a 20,000 Fire Strike score, similar to the one quoted for Gonzalo. So, in short, it means next gen might be roughly equivalent to a Ryzen 3700 + Radeon 5700XT combo: not much better, not much worse.
 
It achieves a 20,000 Fire Strike score, similar to the one quoted for Gonzalo. So, in short, it means next gen might be roughly equivalent to a Ryzen 3700 + Radeon 5700XT combo: not much better, not much worse.

Yes, but even before this test that's about what I was expecting. Personally I think that's a really great sign. I mean, it's amazing to see some really cutting-edge current tech going into the next consoles, when I've always felt consoles were quite a bit behind the best PCs in terms of raw power. I suppose that's the same again now, comparing the 5700 to the 2080 Ti, but personally I'm excited to see what devs can do with that much power in dedicated games, because I still don't believe we've seen games on PC looking better than some of the best PS4 stuff, and those games are working with huge limitations compared to the brute-force power of current PCs. Couple that with the integrated SSD and I'm really excited for next-gen hardware.
 

SonGoku

Member
From beyond3d:
DG1000FGF84HT-PS4.
DG1101SKF84HV-PS4.
DG1201SLF87HW-PS4 Pro.
DG1301SML87HY-PS4 Pro.
DG14__________ - ???.
DG15__________ - ???.
2G16002CE8JA2_32/10/10_13E9 - Gonzalo
ZG16702AE8JB2_32/10/18_13F8 - Gonzalo engineering sample

DG3001FEG84HR - Durango
DG4010T3G87E1 - Arlene SoC ??? Not sure what this is.
DG4001FYG87IA - XB1 S
1G5211T8A87E9 - Scorpio
A single digit is far from conclusive evidence; it could be entirely coincidental and mean any number of things.
 

McCheese

Member
It's pretty nuts how we went from a single character in a serial number to performance charts.

I have no idea how correct or accurate any of that stuff will turn out to be, but kudos to OP for the effort!
 

SonGoku

Member
maaayybee the "lite" bit doesn't mean fewer CUs than Navi10, but actually means that it's built on a low-power version of 7nm, for increased efficiency.

This idea is quite compelling to me ...
Even if it has more CUs, the 20K score is too low for a next-gen console
 

Boss Mog

Member
It's completely worthless to try to simulate next gen console performance using Windows PC benchmarks as a yardstick.
 

Dontero

Banned
Even if it has more CUs, the 20K score is too low for a next-gen console

And who decides that?
People need to remember that this gen's spec base is 1.2TF. So getting a 10TF GPU would be almost a 10x improvement.

Other point: no way in hell will a console have a 200W-TDP GPU. At best you are looking at 100-150W max TDP for the SoC.
 

SonGoku

Member
And who decides that?
People need to remember that this gen's spec base is 1.2TF.
The 5700 is just not enough for next-gen games at 4K; going with the 5700 for next gen would be the equivalent of going with a GTX 280/460 for the PS4.
The 7850, which the PS4 gets compared to (and even the 7770), was destroying games at 1080p; the 5700XT is piss-poor at 4K in current-gen games, let alone an underclocked variant.
So getting a 10TF GPU would be almost a 10x improvement.
The Gonzalo FS score is slightly below a 9TF GPU.
Other point: no way in hell will a console have a 200W-TDP GPU.
A wider, slower design won't need to, and will be more power efficient.
At best you are looking at 100-150W max TDP for the SoC.
lol, not even launch consoles were 100W. The X pulls 200W depending on the model (hobbit method).

A vapor chamber can easily cool a 200W SoC
 
Last edited: