
Microsoft Xbox Series X's AMD Architecture Deep Dive at Hot Chips 2020

anothertech

Member
Imo NXGamer is heavily invested in Playstation, so I would take his statements with a grain of salt.
You should take everything online with a grain of salt.

That said, NXGamer seems to have a better grasp of things than most imo.

Also Craig says hi.

 

Redlight

Member
Clocks in the PS5 aren't boosted. If they were boosted, PS5 would be a 9 TF console, not 10. Of course XSX doesn't need SmartShift when it has locked clocks. MS designed XSX with locked clocks in mind. Go figure.
The PS5 is boosted. It's just using a system that manages that boosting intelligently.

Mark Cerny...
"Released PS5 games always get boosted frequencies"


 
The PS5 is boosted. It's just using a system that manages that boosting intelligently.

Mark Cerny...
"Released PS5 games always get boosted frequencies"



Yes, but PS5 has continuous boost, not the same boost clock as current GPUs.

Boost clock is when the frequency goes up when the GPU needs more.

Continuous boost is when the frequency goes down when you don't need it.

Boost clock is something like a sustained overclock that you can't maintain for a long time.

Continuous boost is when the base and the boost are under the same clock.
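
As an aside for readers following the boost discussion, here is a minimal sketch of the distinction being described. It is a purely hypothetical model with made-up numbers (1.8/2.23 GHz, 85°C, 180 W are illustrative only), not Sony's or AMD's actual algorithm.

```cpp
// Hypothetical contrast between a thermally driven "boost" governor and a
// power-budget-driven "continuous boost" governor. Illustrative only.
#include <algorithm>

// PC-style thermal boost: the clock depends on die temperature, so the same
// workload can run at different speeds in different environments.
double thermal_boost_mhz(double die_temp_c) {
    const double base_mhz = 1800.0, boost_mhz = 2230.0, throttle_at_c = 85.0;
    return die_temp_c < throttle_at_c ? boost_mhz : base_mhz;  // drops when hot
}

// Continuous boost as Cerny describes it: the clock depends only on the
// workload's modeled power against a fixed budget, so every console behaves
// the same regardless of ambient temperature, and the cap is the norm.
double continuous_boost_mhz(double modeled_power_w) {
    const double cap_mhz = 2230.0, power_budget_w = 180.0;  // made-up budget
    if (modeled_power_w <= power_budget_w) return cap_mhz;
    // Linear scaling is a gross simplification; per Cerny, dropping a couple
    // of percent of frequency recovers roughly 10% of power.
    return cap_mhz * std::min(1.0, power_budget_w / modeled_power_w);
}
```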
 

jimbojim

Banned
The PS5 is boosted. It's just using a system that manages that boosting intelligently.

Not boosted in the way you think it is. Otherwise, the PS5 CPU and GPU wouldn't be capped at 3.5 GHz and 2.23 GHz.
Cerny said this in the DF interview:

It's really important to clarify the PlayStation 5's use of variable frequencies. It's called 'boost' but it should not be compared with similarly named technologies found in smartphones, or even PC components like CPUs and GPUs. There, peak performance is tied directly to thermal headroom, so in higher temperature environments, gaming frame-rates can be lower - sometimes a lot lower. This is entirely at odds with expectations from a console, where we expect all machines to deliver the exact same performance. To be abundantly clear from the outset, PlayStation 5 is not boosting clocks in this way. According to Sony, all PS5 consoles process the same workloads with the same performance level in any environment, no matter what the ambient temperature may be

Yeah, he doesn't want it to be compared with PC GPUs because these "boosted" clocks work differently.
 
Last edited:
Looks like NXGamer also agrees with Lady Gaia's post:




What's your call on this, NXGamer?

Check post #966 and onwards.

In isolation their post has some merit, but it's also a post from someone who doesn't really know the full extent of MS's design. In fact, MS already has something in place to cut down on CPU usage for issuing draw calls to the GPU, and they've had it for a few years now. It's called ExecuteIndirect. Here's an article that speaks a bit more on it; some quotes:

ExecuteIndirect is said to perform multiple draws with a single API call, and gives the ability to both the CPU and the GPU to control draw calls, as well as change bindings between draw calls.

Lastly, when switching to ExecuteIndirect, an epic 90 FPS result was achieved in the benchmark. This is where we see a significant reduction of CPU usage as well compared to the previous two DX12 benchmarks, making the feature one of the best solutions for delivering high-quality graphics at the lowest possible hardware usage.

Bolded emphasis mine. As you can tell, Lady Gaia's speculation (and NXGamer's, for that matter) seems completely oblivious to this. If the CPU's role in issuing draw calls for the GPU is being reduced, then that significantly cuts down on the scenario they're speculating about, i.e. the full wider bus being "tied down" by the CPU.

And that isn't even considering the likelihood MS would already be aware of that problem they present and have designed their memory controller at the hardware and software (kernel) level to alleviate much of that issue.
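
For readers who haven't used it, here is a hedged sketch of what an ExecuteIndirect submission looks like through the public D3D12 API; the buffer names are placeholders, and this is PC-side code, not anything Xbox-specific.

```cpp
// One ExecuteIndirect call submits up to maxDraws draws whose arguments live
// in a GPU buffer, so a GPU pass (e.g. compute culling) can decide what gets
// drawn without a CPU round trip.
#include <d3d12.h>

void IssueIndirectDraws(ID3D12Device* device,
                        ID3D12GraphicsCommandList* cmdList,
                        ID3D12Resource* argBuffer,   // D3D12_DRAW_ARGUMENTS[] written on the GPU
                        ID3D12Resource* countBuffer, // GPU-written draw count
                        UINT maxDraws)
{
    // Command signature: each record in the argument buffer is a plain draw.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DRAW_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ID3D12CommandSignature* cmdSig = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr,
                                   __uuidof(ID3D12CommandSignature),
                                   reinterpret_cast<void**>(&cmdSig));

    // Replaces up to maxDraws individual DrawInstanced calls; the GPU reads
    // the real draw count from countBuffer at execution time.
    cmdList->ExecuteIndirect(cmdSig, maxDraws, argBuffer, 0, countBuffer, 0);

    cmdSig->Release();
}
```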

Ehh his videos seem pretty objective. And he has a significant amount of technical knowledge and experience.

Much more often than not, yes. Unfortunately he seems to associate with guys like Moore's Law Is Dead, who is definitely a PlayStation guy when it comes to consoles and has gotten info wrong on Series X multiple times in the past even when correct info was just a few clicks away.

I know we shouldn't necessarily judge people by the company they keep, but it is something to keep in the back of your head. FWIW though, I genuinely enjoy NX's retro gaming series and related content; there's not enough of it on YT at that quality. The only immediate others off the top of my head are DF's John's retro series stuff, Sega Lord X, Jenovi etc.
 
Last edited:

PaintTinJr

Member
I have not seen this diagram before, but I saw a very similar one from an Xbox official on Twitter. So it's safe to say that this is accurate.

There is no "High" or "Low" chips. The chips that are marked yellow are not separate, and are part of the red. Each 'cylinder' in the image represent a memory chip, where the short ones that are only covered in red are 1GB chips, while the long ones that are covered in both red and yellow are 2GB chips. So for the GPU, you're using all 1GB chips and half of each 2GB chip, while the rest uses the other half that is left on each of the 2GB chips. I'm sure you don't have to use it like that, but that is likely the most efficient way if you want/need to use all the RAM available.

At this point it is still unclear how the memory controller works. And the disk I/O is not through the CPU. There's a separate I/O hub. But the lanes of the 2GB chips are shared between the GPU and CPU, and keep in mind that the GPU can also access the CPU pool, but vice versa is not possible apparently, at least not natively.
I did read most of your older post. I'm not sure it has aged so well with the newer info, but it was definitely an interesting read when thought of in the context of the info you had at the time.

When I said the disk IO goes through the CPU - correct me if I'm wrong - I was remembering that Microsoft said the decompressor uses a tenth of a CPU core - which would seem tricky without passing through the CPU, and through the memory pool associated with the CPU - and if you are now telling me that the CPU can't access the 10GB directly - which makes sense because 320-bit exceeds even the width of AVX2 - then (IMO) it seems the only sensible way that decompressed data can get into the 10GB is by the GPU initiating a CPU request to copy from the 6GB to the 10GB.
 
- and if you are now telling me that the CPU can't access the 10GB directly - which makes sense because 320-bit exceeds even the width of AVX2

Hmm, is this actually the case? I mean, AVX2's width is 256-bit, yet the One X had a 384-bit bus and a similar APU design to Series X (conceptually) outside of lack of dedicated I/O decompression hardware and a more unified memory pool. I don't recall hearing any issues of difficult memory management there though or anything regarding copying data from one pool to another.
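
To make that point concrete: SIMD register width and physical memory bus width are independent things. The snippet below only ever deals in 256-bit AVX2 registers, whether the bus behind the memory controller happens to be 256-, 320- or 384-bit wide.

```cpp
// AVX2 works on 256-bit registers; the GDDR6 bus width is handled by the
// memory controller and is invisible at the instruction level.
#include <immintrin.h>

float sum8(const float* data)  // expects 32-byte-aligned input
{
    __m256 v  = _mm256_load_ps(data);            // one 256-bit (8 x float) load
    __m128 lo = _mm256_castps256_ps128(v);       // lower 128 bits
    __m128 hi = _mm256_extractf128_ps(v, 1);     // upper 128 bits
    __m128 s  = _mm_add_ps(lo, hi);              // pairwise reduce
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```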

Again, it doesn't have faster & slower mem pools like Series X, but Series X is still using the same GDDR6 memory tech in both pools. Fundamentally the memory functions the same in both in terms of how the data is handled. The reason you need to copy data from system RAM to VRAM on PC is partly because the CPU and GPU are separated over a PCIe bus (which is ultimately narrow compared to stuff like NVLink and Infinity Fabric, with much higher latency), and also partly because the DRAM and GDDR technology are functionally different even if the latter started as a modification of the former (in the same way HBM basically started as stacked DRAM but is functionally much different).

But there's one thing I don't think is being considered here which some newer GPU cards and Series systems will have: HBCC and DirectStorage. With GPUDirectStorage the GPU doesn't need to copy data from system RAM to VRAM because it has direct access to storage to copy and load as required. Series X has DirectStorage, which is essentially GPUDirectStorage (GPUDirectStorage is Nvidia's branding for it with their cards), so theoretically it'll never need to copy data from the slower 6 GB pool to the faster 10 GB pool because it can simply select that data to place in its 10 GB pool from storage, knowing pretty much any data in the 10 GB pool is going to be GPU-optimized graphics data anyway. Any changes to initially loaded data the GPU makes, it can just write back in its 10 GB pool, coherency is maintained at all times anyway between CPU and GPU (plus among other things, the GPU can snoop the CPU caches; not sure if it can snoop the data in its 6 GB pool though, need to re-read that part of the slides).
 
Last edited:

Redlight

Member
Not boosted in the way you think it is. Otherwise, the PS5 CPU and GPU wouldn't be capped at 3.5 GHz and 2.23 GHz.
Cerny said this in the DF interview:

Yeah, he doesn't want it to be compared with PC GPUs because these "boosted" clocks work differently.
They're boosted in exactly the way I think they are. I said nothing about PCs.
 

Redlight

Member
Yes, but PS5 has continuous boost, not the same boost clock as current GPUs.

Boost clock is when the frequency goes up when the GPU needs more.

Continuous boost is when the frequency goes down when you don't need it.

Boost clock is something like a sustained overclock that you can't maintain for a long time.

Continuous boost is when the base and the boost are under the same clock.
The clocks are boosted. I get that some people might misrepresent what it means in a negative way; however, that doesn't make it OK to counter-misrepresent them as 'not boosted'.
 
The clocks are boosted. I get that some people might misrepresent what it means in a negative way; however, that doesn't make it OK to counter-misrepresent them as 'not boosted'.

OK, but continuous boost doesn't have a base clock. When people say "oh, PS5 has a 2GHz base clock and it's 9.2 TFLOPS (2GHz x 36 CUs x 2 x 64)", it's wrong.

I guess it's impossible to have a fixed 2.23GHz GPU clock in RDNA 2, but I believe that, statistically, you can have the 2.23GHz GPU clock for the time you really need it.
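
For anyone following along, the arithmetic behind those figures (the 2.0 GHz "base" is the hypothetical number from the post above, not an official spec):

```cpp
// TFLOPS = CUs x 64 FP32 lanes x 2 ops per clock (FMA) x clock.
#include <cstdio>

int main() {
    const double lanes = 64.0, fma_ops = 2.0;
    double ps5_hypothetical_base = 36 * lanes * fma_ops * 2.00e9  / 1e12; // ~9.22 TF at a hypothetical 2.0 GHz
    double ps5_cap               = 36 * lanes * fma_ops * 2.23e9  / 1e12; // ~10.28 TF at the 2.23 GHz cap
    double xsx_fixed             = 52 * lanes * fma_ops * 1.825e9 / 1e12; // ~12.15 TF fixed
    std::printf("%.2f  %.2f  %.2f TF\n", ps5_hypothetical_base, ps5_cap, xsx_fixed);
    return 0;
}
```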
 
Last edited:

GODbody

Member
They usually seem to talk longer on the positives for Sony and longer on the negatives for Xbox; the info seems correct, but the tone is always there.

Yeah, that kind of makes sense, though; PlayStation is king in the UK. It doesn't take away from his credentials and experience, though. He doesn't stray from the facts, from what I've seen. I think we're all guilty of some form of bias. I'll admit I find the Series X architecture to be much more robust than that of the PS5 myself, but that's probably due to the lack of details we have on the PS5.

A good example of this is the Xbox Series X hardware. Microsoft has two separate pools of RAM, the same mistake they made with the Xbox One. One pool of RAM has high bandwidth and the other pool has lower bandwidth. As a result, coding for the console is sometimes problematic, because the number of things we have to put in the faster pool of RAM is so large that it will be annoying again, and to add insult to injury the 4K output needs even more bandwidth. So there will be some factors which bottleneck XSX's GPU.

Ali Salehi, Crytek Rendering Engineer.

Ali Salehi Complete Interview

Memory isn't separate, it's unified.

"Memory performance is asymmetrical - it's not something we could have done with the PC," explains Andrew Goossen "10 gigabytes of physical memory [runs at] 560GB/s. We call this GPU optimal memory. Six gigabytes [runs at] 336GB/s. We call this standard memory. GPU optimal and standard offer identical performance for CPU audio and file IO. The only hardware component that sees a difference in the GPU."
In terms of how the memory is allocated, games get a total of 13.5GB in total, which encompasses all 10GB of GPU optimal memory and 3.5GB of standard memory. This leaves 2.5GB of GDDR6 memory from the slower pool for the operating system and the front-end shell. From Microsoft's perspective, it is still a unified memory system, even if performance can vary. "In conversations with developers, it's typically easy for games to more than fill up their standard memory quota with CPU, audio data, stack data, and executable data, script data, and developers like such a trade-off when it gives them more potential bandwidth," says Goossen.
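
For reference, the 560GB/s and 336GB/s figures fall straight out of the chip counts, assuming the commonly reported 14Gbps GDDR6 on a 320-bit bus (ten x32 devices):

```cpp
// Peak bandwidth = number of chips x 32 data pins x pin rate / 8 bits per byte.
#include <cstdio>

int main() {
    const double pin_rate_gbps = 14.0, pins_per_chip = 32.0;
    double gpu_optimal = 10 * pins_per_chip * pin_rate_gbps / 8.0; // all ten chips: 560 GB/s (10 GB)
    double standard    =  6 * pins_per_chip * pin_rate_gbps / 8.0; // only the six 2 GB chips: 336 GB/s
    std::printf("%.0f GB/s, %.0f GB/s\n", gpu_optimal, standard);
    return 0;
}
```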

XSX has a 320-bit bus, hence a 32-bit data type can't be striped across the 320-bit bus; you need ten 32-bit data payloads to fully populate the 320-bit bus.


One problem: GPUs don't operate on full 64-, 128-, 192- or 320-bit data types. XSX has a 320-bit bus, hence a 32-bit data type can't be striped across the 320-bit bus; you need ten 32-bit data payloads to fully populate the 320-bit bus.

Compared to the RX 5700 XT, XSX's on-chip L0/L1 cache and instruction queue storage are 25% larger, and the XSX GPU is backed by a 5 MB L2 cache while the RX 5700 XT has a 4 MB L2 cache.

For Gears 5, the XSX GPU is superior to the RX 5700 XT by about 25%.

Framebuffers are among the heaviest consumers of memory bandwidth while needing relatively little memory storage.

Yeah I wasn't taking data types into consideration in my estimations.
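
To illustrate the data-type point in the quote above: memory controllers interleave addresses across channels at a coarse granularity, so a single 32-bit access touches one device and the full 320-bit width only shows up when many accesses spread across all channels. The 256-byte granularity below is an assumption for illustration; Microsoft has not published the Series X controller's actual interleave scheme.

```cpp
// Toy address-interleaving model across ten x32 GDDR6 devices (assumed
// 256-byte granularity, for illustration only).
#include <cstdint>
#include <cstdio>

int channel_for(uint64_t phys_addr) {
    const uint64_t interleave_bytes = 256;  // assumption
    const int      num_channels     = 10;   // ten 32-bit-wide chips
    return static_cast<int>((phys_addr / interleave_bytes) % num_channels);
}

int main() {
    // Two adjacent 32-bit words land on the same chip; words 256 bytes apart
    // land on different chips, which is how the aggregate bandwidth appears.
    std::printf("%d %d %d\n", channel_for(0x1000), channel_for(0x1004), channel_for(0x1100));
    return 0;
}
```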
 
Last edited:

hm... let's see this DF article:

"There are customisations to the CPU core - specifically for security, power and performance, and with 76MB of SRAM across the entire SoC, it's reasonable to assume that the gigantic L3 cache found in desktop Zen 2 chips has been somewhat reduced."

Where are these 'gigantic L3 cache' and the 76MB of SRAM?

Ali is working with devkits, he's not creating 'magical theories' like many people.
 
hm... let's see this DF article:

"There are customisations to the CPU core - specifically for security, power and performance, and with 76MB of SRAM across the entire SoC, it's reasonable to assume that the gigantic L3 cache found in desktop Zen 2 chips has been somewhat reduced."

Where are these 'gigantic L3 cache' and the 76MB of SRAM?

Ali is working with devkits, he's not creating 'magical theories' like many people.

Ali is also one developer out of MANY developers. Hell, there were even questions about whether he was working on any next-gen projects. That's still up for question, in fact. And as expected, fanboys took his statements and exaggerated them while also not having enough understanding of their own to apply context to what was being said.

Ali was never implying PS5 was outright better (or certainly stronger) than Series X; he spoke from his personal dev preferences and history and said Sony's machine - for him - was the easier of the two to work with. And there are some things, like the memory setup, that do make it relatively easier (if perhaps with a more stunted ceiling for performance extraction as the generation goes on, IMHO). However, this does not suddenly mean the most elaborate or exaggerated performance-bottleneck takes are suddenly validated; quite the opposite, in fact. We've already gone into why Lady Gaia's post (which was, in isolation, a sensible post) is most likely wrong WRT Series X, because of design features they were not even considering that would obviously be present to circumvent those types of bandwidth drops in effective practice.

Now you can take what we're discussing here, and either genuinely consider it as the most practical solution, or put sand in your ears pretending that somehow a company with MS's resources in expenditure and engineering staff (as well as funding in their own Xbox division) would not have easily anticipated certain bottlenecks well early in the planning stages of their system design and developed contingencies and solutions for those. The choice is yours, but it'll say a lot from here on out.

P.S. To answer the 76MB SRAM question, you can look over the discussions at B3D and (I believe) the Anandtech comments section, which rather easily state where the 76MB of SRAM in the system is. Short answer: it's spread over many, many different parts of the whole APU, including the registers and the L1/L2/L3$s for the various cores, compute units etc. and more. Also, 8MB of L3$ is rather significant for home consoles, since consoles before this gen didn't have L3 caches (on the CPUs) whatsoever. Again, you can easily find this stuff in front-page results on a search engine.
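
A rough accounting, using only the figures cited in this thread (8MB CPU L3, 5MB GPU L2) plus Zen 2's well-known 512KB of L2 per core; the breakdown of the remainder is not public, so treat it as illustrative rather than definitive.

```cpp
// Where (roughly) the 76 MB of SoC SRAM could sit; the "unaccounted" remainder
// covers GPU L0/L1, register files, instruction caches and assorted buffers.
#include <cstdio>

int main() {
    double total_mb  = 76.0;      // per the DF article
    double cpu_l3_mb = 8.0;       // 4 MB per 4-core CCX, two CCXs
    double cpu_l2_mb = 8 * 0.5;   // 512 KB per Zen 2 core
    double gpu_l2_mb = 5.0;       // quoted earlier in the thread
    double rest_mb   = total_mb - cpu_l3_mb - cpu_l2_mb - gpu_l2_mb;
    std::printf("~%.0f MB in other caches, register files and buffers\n", rest_mb);
    return 0;
}
```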
 
Last edited:
Ali is also one developer out of MANY developers. Hell, there were even questions about whether he was working on any next-gen projects. That's still up for question, in fact. And as expected, fanboys took his statements and exaggerated them while also not having enough understanding of their own to apply context to what was being said.

Ali was never implying PS5 was outright better (or certainly stronger) than Series X; he spoke from his personal dev preferences and history and said Sony's machine - for him - was the easier of the two to work with. And there are some things, like the memory setup, that do make it relatively easier (if perhaps with a more stunted ceiling for performance extraction as the generation goes on, IMHO). However, this does not suddenly mean the most elaborate or exaggerated performance-bottleneck takes are suddenly validated; quite the opposite, in fact. We've already gone into why Lady Gaia's post (which was, in isolation, a sensible post) is most likely wrong WRT Series X, because of design features they were not even considering that would obviously be present to circumvent those types of bandwidth drops in effective practice.

Now you can take what we're discussing here, and either genuinely consider it as the most practical solution, or put sand in your ears pretending that somehow a company with MS's resources in expenditure and engineering staff (as well as funding in their own Xbox division) would not have easily anticipated certain bottlenecks well early in the planning stages of their system design and developed contingencies and solutions for those. The choice is yours, but it'll say a lot from here on out.

P.S. To answer the 76MB SRAM question, you can look over the discussions at B3D and (I believe) the Anandtech comments section, which rather easily state where the 76MB of SRAM in the system is. Short answer: it's spread over many, many different parts of the whole APU, including the registers and the L1/L2/L3$s for the various cores, compute units etc. and more. Also, 8MB of L3$ is rather significant for home consoles, since consoles before this gen didn't have L3 caches (on the CPUs) whatsoever. Again, you can easily find this stuff in front-page results on a search engine.

I don't care if Ali, you, Trump or anyone else thinks A or B is better. And I wasn't talking about specs in general. My point is about Ali's statement about the RAM pools: here.

And Ali is not just anyone. He works directly with the consoles' APIs to refine his engine's algorithms. He's not a developer who uses a ready-made engine.

Strange that you're talking about fanboy statements and then use Microsoft and their "tons of money" to justify your point. Who is using fanboy arguments here?

So, 8MB of L3 is "gigantic" for you? Really? And you compare it with Jaguar to prove your point? Come on. I like John and respect DF, but they're not gods; they can make mistakes sometimes, like me, you and everyone else.
 
hm... let's see this DF article:

"There are customisations to the CPU core - specifically for security, power and performance, and with 76MB of SRAM across the entire SoC, it's reasonable to assume that the gigantic L3 cache found in desktop Zen 2 chips has been somewhat reduced."

Where are these 'gigantic L3 cache' and the 76MB of SRAM?

Ali is working with devkits, he's not creating 'magical theories' like many people.

Please, stop posting fake news.
 

GODbody

Member
hm... let's see this DF article:

"There are customisations to the CPU core - specifically for security, power and performance, and with 76MB of SRAM across the entire SoC, it's reasonable to assume that the gigantic L3 cache found in desktop Zen 2 chips has been somewhat reduced."

Where are these 'gigantic L3 cache' and the 76MB of SRAM?

Ali is working with devkits, he's not creating 'magical theories' like many people.
It literally says it's likely been reduced in that quote... I'm not sure what point you're trying to make here.
 
Last edited:
I don't care if Ali, you, Trump or anyone else thinks A or B is better. And I wasn't talking about specs in general. My point is about Ali's statement about the RAM pools: here.

And Ali is not just anyone. He works directly with the consoles' APIs to refine his engine's algorithms. He's not a developer who uses a ready-made engine.

Strange that you're talking about fanboy statements and then use Microsoft and their "tons of money" to justify your point. Who is using fanboy arguments here?

So, 8MB of L3 is "gigantic" for you? Really? And you compare it with Jaguar to prove your point? Come on. I like John and respect DF, but they're not gods; they can make mistakes sometimes, like me, you and everyone else.

So Ali is the only person in the entire industry who works with console APIs? He is the only one in the entire industry who refines their engine algorithms? With the way you're talking, we should assume such. Do you know how bad this assertion is making you look right now? At the end of the day it is only his opinion, and there are many people equally (if not more) knowledgeable who would say he's exaggerating the complexity of the RAM pool setup or imply he's outright wrong about it. When you consider the gamut of truly perplexing memory setups in consoles of prior generations and how devs were still able to work with them (and beautifully in many cases), Ali is quite easily in the minority on that front.

Referencing that MS has a lot of money is literally me being realistic; they have the means to employ top-of-industry engineers, staff, designers etc. to work on their products and handle R&D. You don't get into the ballpark of valuation of companies like Microsoft, Google, Apple, Amazon etc. without employing the best of the best in the field. And that's no slight to people like Mark Cerny; he's a genius, but he is hardly alone in that department when it comes to contemporary system engineers. The way some people go on with quoting him (to the point where "Cerny Said" should be a quote meme) would lead the less knowledgeable to think otherwise, however.

When did I say 8MB of L3 was gigantic? If you bothered reading a few pages in the thread (or hell, searched my post history), you'd see I was a bit surprised at the 8MB since desktop variants of Zen 2 can have up to 32MB. However, you also don't seem to consider a lot about certain engineering when it comes to game consoles. Garden-variety desktop CPUs often need a lot of cache because they are doing a LOT more varied types of workloads, and don't have dedicated hardware for certain tasks, therefore requiring the CPU to pick up the slack.

That is not so much the case with the next-gen systems, as both have a lot of dedicated hardware for tasks the CPU would normally perform. Therefore, they don't need 32MB of L3$, so given the Series X's design, for example, yes, even 8MB is quite a lot, especially compared to the current-gen systems. There's nothing wrong in comparing them to their previous-gen systems because those are the designs they are superseding; by your logic, no console specs in terms of CPU or GPU would ever look impressive when compared against top-end PC counterparts, because the PC will always have something better before the consoles ever launch.

DF made no mistakes in their assessment of the data they were provided at the time, and even now it mostly holds true. They weren't wrong about the 76MB of SRAM in the system either; you are simply looking at it from the narrow POV of some sort of L3$ or L4$ and not considering all the other components in the system (CPU/GPU/audio registers (L0$), L1$, L2$ etc.). It's all accounted for.
 
So Ali is the only person in the entire industry who works with console APIs? He is the only one in the entire industry who refines their engine algorithms? With the way you're talking, we should assume such. Do you know how bad this assertion is making you look right now? At the end of the day it is only his opinion, and there are many people equally (if not more) knowledgeable who would say he's exaggerating the complexity of the RAM pool setup or imply he's outright wrong about it. When you consider the gamut of truly perplexing memory setups in consoles of prior generations and how devs were still able to work with them (and beautifully in many cases), Ali is quite easily in the minority on that front.

Referencing that MS has a lot of money is literally me being realistic; they have the means to employ top-of-industry engineers, staff, designers etc. to work on their products and handle R&D. You don't get into the ballpark of valuation of companies like Microsoft, Google, Apple, Amazon etc. without employing the best of the best in the field. And that's no slight to people like Mark Cerny; he's a genius, but he is hardly alone in that department when it comes to contemporary system engineers. The way some people go on with quoting him (to the point where "Cerny Said" should be a quote meme) would lead the less knowledgeable to think otherwise, however.

When did I say 8MB of L3 was gigantic? If you bothered reading a few pages in the thread (or hell, searched my post history), you'd see I was a bit surprised at the 8MB since desktop variants of Zen 2 can have up to 32MB. However, you also don't seem to consider a lot about certain engineering when it comes to game consoles. Garden-variety desktop CPUs often need a lot of cache because they are doing a LOT more varied types of workloads, and don't have dedicated hardware for certain tasks, therefore requiring the CPU to pick up the slack.

That is not so much the case with the next-gen systems, as both have a lot of dedicated hardware for tasks the CPU would normally perform. Therefore, they don't need 32MB of L3$, so given the Series X's design, for example, yes, even 8MB is quite a lot, especially compared to the current-gen systems. There's nothing wrong in comparing them to their previous-gen systems because those are the designs they are superseding; by your logic, no console specs in terms of CPU or GPU would ever look impressive when compared against top-end PC counterparts, because the PC will always have something better before the consoles ever launch.

DF made no mistakes in their assessment of the data they were provided at the time, and even now it mostly holds true. They weren't wrong about the 76MB of SRAM in the system either; you are simply looking at it from the narrow POV of some sort of L3$ or L4$ and not considering all the other components in the system (CPU/GPU/audio registers (L0$), L1$, L2$ etc.). It's all accounted for.

My goodness. You really like this discussion haha


So, maybe one day you can stop posting an average of 15~20 messages every day and start a job at Crytek to teach them.


Good luck.
 
But it's not an 18% gap in performance, it's just an 18% gap in compute, while PS5 will have around the same gap when it comes to pixel fill rate & triangle output.

It's more than 18% in favor of the Series X, and it's not only compute.
Series X also has more than an 18% advantage in texture rate, bandwidth and ray tracing.
Also, it's less than 18% in favor of the PS5 for pixel fill rate and triangle output.
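
Back-of-envelope math for those percentages, using the commonly reported ROP counts (64 on both machines, which neither vendor has itemised officially):

```cpp
// Compute, bandwidth and fill-rate ratios from the public clock/CU figures.
#include <cstdio>

int main() {
    double xsx_tf   = 52 * 64 * 2 * 1.825e9 / 1e12;  // ~12.15 TF
    double ps5_tf   = 36 * 64 * 2 * 2.23e9  / 1e12;  // ~10.28 TF
    double xsx_fill = 64 * 1.825;                     // Gpixels/s, assuming 64 ROPs
    double ps5_fill = 64 * 2.23;
    std::printf("compute:   +%.0f%% Series X\n", (xsx_tf / ps5_tf - 1) * 100);     // ~18%
    std::printf("bandwidth: +%.0f%% Series X\n", (560.0 / 448.0 - 1) * 100);       // 25%
    std::printf("fill rate: +%.0f%% PS5\n",      (ps5_fill / xsx_fill - 1) * 100); // ~22%
    return 0;
}
```

Note the fill-rate and triangle gaps read as roughly 18% if measured down from PS5's clock and roughly 22% if measured up from Series X's, which may be where the "more than" versus "less than" 18% disagreement comes from.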
 
My goodness. You really like this discussion haha


So, maybe one day you can stop posting an average of 15~20 messages every day and start a job at Crytek to teach them.


Good luck.
And this is apparently the point where you ran out of comebacks, lol. I don't have a fraction of Thicc's knowledge base... but in today's world it is hard to take the word of any company or developer as gospel. Between NDAs, marketing agreements, and aspirations of partnerships or opportunities, the line between fact and fiction gets pretty blurry at times.

I find it funny that both sides latch on to a few rogue developer voices who express opinions in a sea of developers. If they are speaking then they aren't under NDA. If they aren't under NDA then they probably don't have access to dev kits.
 

geordiemp

Member
If the XSX CPU's base clock is 3.6GHz, then it can sustain AVX2 at 3.6GHz.

OK, so you know that the XSX CPU is better at heat dissipation than PS5, Intel and Zen 2. Sources for the AVX 256 claims?

AVX 256 has not been discussed by anyone other than Cerny, who explained that a CPU fixed at 3.5 GHz or 3 GHz can be problematic (the edge case of being bombarded with AVX 256 constantly), and that PS5 has liquid metal to the heat sink.

And XSX can do constant AVX 256 as a CPU melter at 3.8 GHz because a GAF Xbox fan says so....

Whatever.
 
Last edited:

geordiemp

Member
XSX has a 320-bit bus, hence a 32-bit data type can't be striped across the 320-bit bus; you need ten 32-bit data payloads to fully populate the 320-bit bus.


One problem: GPUs don't operate on full 64-, 128-, 192- or 320-bit data types. XSX has a 320-bit bus, hence a 32-bit data type can't be striped across the 320-bit bus; you need ten 32-bit data payloads to fully populate the 320-bit bus.

Compared to the RX 5700 XT, XSX's on-chip L0/L1 cache and instruction queue storage are 25% larger, and the XSX GPU is backed by a 5 MB L2 cache while the RX 5700 XT has a 4 MB L2 cache.

For Gears 5, the XSX GPU is superior to the RX 5700 XT by about 25%.

Framebuffers are among the heaviest consumers of memory bandwidth while needing relatively little memory storage.

Consoles also have less CPU cache.

Lady Gaia is still correct: when the CPU is accessing memory, the GPU is not, and if CPU access is slower in terms of cycles, that frame time is lost.

When comparing performance, it's about the frame time needed to do things: XSX will spend more time accessing CPU RAM and less time accessing GPU RAM. The result will depend on how much time is taken by each, relatively, per game.

Also, there is a lot of misunderstanding of cache and memory on these APUs; people should realise these caches are everywhere - every vertex shader, every pixel shader... everywhere.
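
As a hedged illustration of that contention argument, here is the usual time-slicing back-of-envelope; real memory controllers overlap and reorder requests, so treat this as an upper bound on the effect rather than a measurement.

```cpp
// Simplified model: while the bus serves CPU traffic out of the 336 GB/s
// region, it is not serving the GPU, so GPU-visible bandwidth shrinks.
#include <cstdio>

double gpu_effective_bw(double cpu_traffic_gbps) {
    const double slow_pool_bw = 336.0, full_bw = 560.0;
    double cpu_share = cpu_traffic_gbps / slow_pool_bw;  // fraction of bus time spent on the CPU
    return (1.0 - cpu_share) * full_bw;                  // what remains for the GPU
}

int main() {
    std::printf("%.0f GB/s\n", gpu_effective_bw(0.0));   // 560: GPU alone
    std::printf("%.0f GB/s\n", gpu_effective_bw(48.0));  // ~480 with ~48 GB/s of CPU traffic
    return 0;
}
```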
 
Last edited:
Slowly this became a PS5 thread. Dat @#$% panic, concern and nervousness in the air are so adorable.

What's really weird is that in every MS thread with legit, backed-up-by-facts news these goddamn lunatics go nuts to defend their gummy plastic of choice. Relax, 40-year-olds, y'all have NOT seen anything yet! Wait for the DF analysis, the holiday line-up, and actually seeing games running on XSX... that shit will make you humble, facts!!

Tired of this BS, but at the end of the day we're at SonyGAF, after all.
 

geordiemp

Member
Slowly this became a PS5 thread. Dat @#$% panic, concern and nervousness in the air are so adorable.

What's really weird is that in every MS thread with legit, backed-up-by-facts news these goddamn lunatics go nuts to defend their gummy plastic of choice. Relax, 40-year-olds, y'all have NOT seen anything yet! Wait for the DF analysis, the holiday line-up, and actually seeing games running on XSX... that shit will make you humble, facts!!

Tired of this BS, but at the end of the day we're at SonyGAF, after all.

If posters stop referring to PS5 in every post because they cannot help themselves, we can talk about the XSX Hot Chips material.

The problem is nobody is discussing the technical inputs we got; most don't understand them anyway and just want to war.
 

aries_71

Junior Member
Slowly this became a PS5 thread. Dat @#$% panic, concern and nervousness in the air are so adorable.

What's really weird is that in every MS thread with legit, backed-up-by-facts news these goddamn lunatics go nuts to defend their gummy plastic of choice. Relax, 40-year-olds, y'all have NOT seen anything yet! Wait for the DF analysis, the holiday line-up, and actually seeing games running on XSX... that shit will make you humble, facts!!

Tired of this BS, but at the end of the day we're at SonyGAF, after all.
This is GAF, my friend. As you already know, the amount of love and concern towards Sony is beyond any rational parameter. Enjoy it, it's not going to change anytime soon.
 

jimbojim

Banned
Slowly this became a PS5 thread. Dat @#$% panic, concern and nervousness in the air are so adorable.

What's really weird is that in every MS thread with legit, backed-up-by-facts news these goddamn lunatics go nuts to defend their gummy plastic of choice. Relax, 40-year-olds, y'all have NOT seen anything yet! Wait for the DF analysis, the holiday line-up, and actually seeing games running on XSX... that shit will make you humble, facts!!

Tired of this BS, but at the end of the day we're at SonyGAF, after all.

Wait till next E3.
 

Allandor

Member
A good example of this is the Xbox Series X hardware. Microsoft has two separate pools of RAM, the same mistake they made with the Xbox One. One pool of RAM has high bandwidth and the other pool has lower bandwidth. As a result, coding for the console is sometimes problematic, because the number of things we have to put in the faster pool of RAM is so large that it will be annoying again, and to add insult to injury the 4K output needs even more bandwidth. So there will be some factors which bottleneck XSX's GPU.

Ali Salehi, Crytek Rendering Engineer.

Ali Salehi Complete Interview
Oh, not Ali again.
Well, yes, the two-pool concept is... well, strange, and it makes things harder to optimize. But it is not the same situation as on the Xbox One. There, the ESRAM was just too tiny to be really useful. Now the big pool is the fast pool, while only 2.5-3 GB for games sits in the slow pool (which is not really slow, just slower); the rest of it is the OS memory. That is a big difference.
Could it be easier? Sure, but it is a whole different situation.
 

MrFunSocks

Banned
Thanks, not only that, I dug up a post from last year where Matt corrected himself:


Considering how accurate Matt has been on numerous things, and the reports from multiple sources, it does appear that Microsoft cancelled Lockhart and then un-cancelled it.

It is a perfect explanation for the dev kits issue.
Matt has pretty much never leaked anything though. All he ever does is go "yeah, I've heard that too" or something similar when other people leak something. He even "confirms" wrong things and then just goes "things change". He's not an insider, he has a lowly position at a third-party company.
 

PaintTinJr

Member
Hmm, is this actually the case? I mean, AVX2's width is 256-bit, yet the One X had a 384-bit bus and a similar APU design to Series X (conceptually) outside of lack of dedicated I/O decompression hardware and a more unified memory pool. I don't recall hearing any issues of difficult memory management there though or anything regarding copying data from one pool to another.

Again, it doesn't have faster & slower mem pools like Series X, but Series X is still using the same GDDR6 memory tech in both pools. Fundamentally the memory functions the same in both in terms of how the data is handled. The reason you need to copy data from system RAM to VRAM on PC is partly because the CPU and GPU are separated over a PCIe bus (which is ultimately narrow compared to stuff like NVLink and Infinity Fabric, with much higher latency), and also partly because the DRAM and GDDR technology are functionally different even if the latter started as a modification of the former (in the same way HBM basically started as stacked DRAM but is functionally much different).

But there's one thing I don't think is being considered here which some newer GPU cards and Series systems will have: HBCC and DirectStorage. With GPUDirectStorage the GPU doesn't need to copy data from system RAM to VRAM because it has direct access to storage to copy and load as required. Series X has DirectStorage, which is essentially GPUDirectStorage (GPUDirectStorage is Nvidia's branding for it with their cards), so theoretically it'll never need to copy data from the slower 6 GB pool to the faster 10 GB pool because it can simply select that data to place in its 10 GB pool from storage, knowing pretty much any data in the 10 GB pool is going to be GPU-optimized graphics data anyway. Any changes to initially loaded data the GPU makes, it can just write back in its 10 GB pool, coherency is maintained at all times anyway between CPU and GPU (plus among other things, the GPU can snoop the CPU caches; not sure if it can snoop the data in its 6 GB pool though, need to re-read that part of the slides).

The main reason I would assume AVX2 is used for the decompressor on the XSX is that (AFAIK) you get one unit per core, and if they are using a tenth of a CPU core, it is the most capable feature to do the job (IMO).

Your GPUDirectStorage point is interesting, but how does that work with regard to the strong indication that the XSX is really using lots of decompression tricks with BCPack - which uses zlib with RDO? The CPU would either have to decompress on a CPU core as we were told (IIRC) and write the decompressed BCPack data back to a temporary disk store prior to the GPU requesting it uncompressed, directly - which will cost bandwidth on each read/write - or they'd need to have CUs in the GPU do the zlib decompression to make the BC files ready to use.

I know it was suggested that BCPack may include a random-access ability to partially decompress BCPack into BC blocks. After researching, I worked out that current random access for zlib has an overhead of about 64KB per access point, which is expensive for a potentially 6:1-compressed 4x4-pixel BC1 block that is 8 bytes, or 48 bytes on access.
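
For context on why zlib random access is awkward: inflate() consumes a stream sequentially and relies on up to 32KB of preceding history, so random access means either saving a window per access point or re-inflating from the last reset/block boundary. Below is a minimal streaming-inflate sketch with plain zlib (BCPack's actual container format is not public).

```cpp
// Sequentially inflate a zlib stream; there is no cheap way to start in the
// middle without the preceding history window.
#include <zlib.h>
#include <cstring>
#include <vector>

std::vector<unsigned char> inflate_all(const unsigned char* src, std::size_t src_len)
{
    std::vector<unsigned char> out;
    unsigned char buf[16384];

    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    if (inflateInit(&zs) != Z_OK) return out;

    zs.next_in  = const_cast<unsigned char*>(src);
    zs.avail_in = static_cast<uInt>(src_len);

    int ret;
    do {
        zs.next_out  = buf;
        zs.avail_out = sizeof(buf);
        ret = inflate(&zs, Z_NO_FLUSH);  // needs all prior history to proceed
        out.insert(out.end(), buf, buf + (sizeof(buf) - zs.avail_out));
    } while (ret == Z_OK);

    inflateEnd(&zs);
    return out;  // empty or truncated on error; fine for a sketch
}
```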
 
Last edited:

Allandor

Member
OK, so you know that the XSX CPU is better at heat dissipation than PS5, Intel and Zen 2. Sources for the AVX 256 claims?

AVX 256 has not been discussed by anyone other than Cerny, who explained that a CPU fixed at 3.5 GHz or 3 GHz can be problematic (the edge case of being bombarded with AVX 256 constantly), and that PS5 has liquid metal to the heat sink.

And XSX can do constant AVX 256 as a CPU melter at 3.8 GHz because a GAF Xbox fan says so....

Whatever.
You forget some things.
1. It is not an Intel CPU. Even Zen+ could handle >3GHz under full AVX load.
2. At 3.8 GHz you just have 7 cores (without HT).
3. At 3.6 GHz you just have 14 threads.
4. The CPU has less cache, so it is likely that "full load" is not the same as "full load" on a 3700X, because it must refill the caches more often and therefore has less time to work in the AVX2 unit.
5. The APUs will use a better process than existing Zen 2 processors. We just don't know how this will change the power needed.
6. The APU is bigger, so there is more die area to transfer the heat.

So you can never fully put the CPU under AVX2 (edge-case) load, and there are big unknowns.

And lastly: who cares? The MS architect said that it can handle it, so why do we even question that unrealistic edge case? If Cerny had said something like that, people wouldn't even bother to question it.


What I want to know is, where in the APU is the 24MB of cache that is missing relative to the Zen 2 cache? MS has already stated how much cache is on the APU (if I remember correctly it was 72 or 76 MB), and all the calculations I've seen counted a 32MB third-level CPU cache into this account. But 24MB is now "missing" in those calculations. Is the rest I/O buffers (e.g. for compression/decompression, ...)? Or do the RDNA2 CUs have much more cache than RDNA1?
 
Last edited: