
Does PS5 Have Potential Bandwidth Issue? Does The SSD Alleviate Things If So? Let's Talk About It



So the other day I was running through some numbers on both systems' specifications, and I decided to try calculating each system's memory bandwidth against its physical memory amount. Looking at it this way, I think I can understand MS's decision to go with two pools.

See, the PS5 has 448 GB/s of bandwidth to 16 GB of physical memory. Each of those memory modules is 2 GB and provides 56 GB/s. The BW-to-TF ratio breaks down to 43.6 GB/s per TF, but if you break down the BW-to-physical-memory ratio, you get 28 GB/s per 1 GB. This is particularly important when you consider that, more often than not, you're going to want your RAM occupied with unique assets while shaving down on duplicates, so please keep that in mind when reading further (tho TBF, I would need more time to elaborate on how this would fully shake out in practice).

Going back to bandwidth-per-TF for a moment: for XSX, you get about 36.88 GB/s per TF if you average the bandwidths of both pools and divide by the TF amount, though a more realistic way of specifying the ratio in XSX's case is 73.77 GB/s per TF on the TF share matching the physical memory optimized for the GPU (10 GB physical memory, 7.591 TF), and 73.76 GB/s per TF on the TF share matching the physical memory optimized for the CPU/OS etc. (6 GB physical memory, 4.555 TF).

But then there's also the bandwidth-per-physical-GB ratio that factors into the picture. With PS5 having a unified pool at 448 GB/s, you get 28 GB/s per 1 GB of physical memory, or exactly half the 56 GB/s bandwidth of a single 14 Gbps GDDR6 module. Once again, for XSX you could technically just average the two pools, but it's clear that MS went with a "split" (not a split in the traditional sense, like DDR4/GDDR on PC or older consoles like PS3, GC, or even XBO's ESRAM/DDR3 pools) to stress maximizing bandwidth to the system processors. 560 GB/s divided by 10 GB of physical memory gives you... 56 GB/s per 1 GB. And 336 GB/s divided by 6 GB of physical memory also gives you 56 GB/s per 1 GB. In other words, you have a 1:1 ratio between bandwidth per physical GB and the actual per-module bandwidth of GDDR6.

While we're at it, we can also calculate the TF-per-physical-GB ratio. Once again, for XSX we'd have to "split" the calculation due to its memory setup, but it leaves you with 759.1 GFLOPs per physical GB (the TF share apportioned to the 10 GB, 560 GB/s pool), and the same 759.1 GFLOPs per physical GB on the 6 GB, 336 GB/s pool. Again, a 1:1 ratio of balance. With PS5, we end up with 642.1 GFLOPs per physical GB, thanks to its memory setup.
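If you want to sanity-check those ratios, here's a quick Python sketch (the numbers are the publicly quoted specs; the per-GB framing is this thread's, not an official metric):

```python
# Back-of-the-envelope ratios from the publicly quoted specs.
ps5_bw, ps5_ram, ps5_tf = 448.0, 16.0, 10.275   # GB/s, GB, TFLOPs
xsx_tf = 12.147
xsx_fast_bw, xsx_fast_ram = 560.0, 10.0          # GPU-optimized pool
xsx_slow_bw, xsx_slow_ram = 336.0, 6.0           # standard pool

print(ps5_bw / ps5_tf)             # ~43.6 GB/s per TF
print(ps5_bw / ps5_ram)            # 28.0 GB/s per physical GB
print(ps5_tf * 1000 / ps5_ram)     # ~642 GFLOPs per physical GB

print(xsx_fast_bw / xsx_fast_ram)  # 56.0 GB/s per physical GB (fast pool)
print(xsx_slow_bw / xsx_slow_ram)  # 56.0 GB/s per physical GB (slow pool)
print(xsx_tf * 1000 / 16.0)        # ~759 GFLOPs per physical GB
```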

This is a pretty notable difference between the two systems, and it will be interesting to see how things play out in this regard. However, we should also remember that with the above numbers, you are more realistically looking at total GPU-bound bandwidth figures of 560 GB/s @ 10 physical GB on XSX and, assuming the ratio of non-GPU-bound memory is the same on both systems (i.e., both using 6 GB of physical memory for non-GPU-oriented data, including some reserved for the OS), 280 GB/s @ 10 physical GB on PS5. AGAIN, this is taking into account GPU TF per physical GB numbers, nothing else. This is why I pontificate that PS5 might have a bandwidth problem, though admittedly this is before looking into the SSDs.

(**To the above paragraph, we can also adjust the PS5's bandwidth-to-physical-memory ratio to reflect an equal amount of physical memory for GPU-bound tasks as on XSX, aka 10 GB, which puts the PS5's number closer to 44.8 GB/s per physical GB. However, this assumes good faith in PS5's memory setup (which is most likely the case) and also factors in asymmetrical access, i.e., the full system bus would be accessing 8 chips at 44.8 GB/s each, or 358.4 GB/s altogether, on the first pass, and would then need to access the other 2 GB at 89.6 GB/s on a second pass.

Given PS5's bandwidth is over a unified pool of 16 GB, this is not likely to be the case: the unified pool also facilitates accessing the first 8 GB @ 448 GB/s and then the second 8 GB @ 448 GB/s, which would be essentially analogous to XSX's setup. And since there is no asymmetric mix of modules at different capacities, there are no potential bus-access issues that would require this type of implementation (and no physical layout that would allow it to be implemented, either).**)

With the SSDs, you're looking at a maximum RAM reoccupy rate of 2.9 seconds on PS5 if streaming in 5.5 GB of raw data per second, whereas on XSX you are looking at closer to 6.6 seconds. That's a 2.27x advantage for PS5's SSD, so it can technically saturate its RAM with unique assets more quickly. I stress technically, however, because there are quite a few aspects of the systems' designs we don't know about when asking whether this really cuts down on a bandwidth deficit compared to XSX. For example, we know the XSX has a very strong emphasis on ML and DLSS-style upscaling techniques, so the possibility is very much there for developers to populate the RAM with lower-quality (and therefore much smaller) texture and data assets which can then be upscaled through dedicated hardware in the GPU. While there is going to be a small latency penalty incurred (any sort of processing incurs a bit of a timing penalty, some more than others), it seems safe to say the memory bandwidth alone helps mitigate a lot of this.
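The refill arithmetic, for anyone following along (raw, uncompressed rates only; compressed throughput would shift both numbers):

```python
# Full-RAM refill time from SSD at the quoted *raw* sequential rates.
ram_gb = 16.0
ps5_raw_gbs, xsx_raw_gbs = 5.5, 2.4

ps5_refill = ram_gb / ps5_raw_gbs   # ~2.9 s
xsx_refill = ram_gb / xsx_raw_gbs   # ~6.7 s
print(ps5_refill, xsx_refill, xsx_refill / ps5_refill)  # ratio ~2.3x

# Same arithmetic for the Spider-Man demo mentioned later in the thread:
# 0.8 s at 5.5 GB/s moves roughly 4.4 GB.
print(0.8 * ps5_raw_gbs)
```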

While PS5 will likely feature its own implementation of such techniques, it remains to be seen whether they'll be as big a focus in the platform's architectural design. It could also be the case that a lot of this type of work is handled in the I/O complex before data is sent to system RAM, though from what's been shown of the design so far, I personally don't think so, as much of the customization in the PS5's I/O and flash memory controller is designed around efficient flow, access, and organization of data.

Anyways, I just wanted to make this post because I'd been thinking on all of this a couple days ago while writing something else, and found it very interesting. It really does go to show that both platform holders have made very calculated and deliberate design decisions for their respective systems, and what's most interesting is to see how the games, both 3rd-party and 1st-party, will make use of that hardware. And thankfully, we won't have to wait too much longer to start seeing some next-gen gameplay to drive all of this home 👍
 

Once again, for XSX we'd have to "split" the calculation due to its memory setup, but it leaves you with 759.1 GFLOPs per physical GB. With PS5, we end up with 642.1 GFLOPs per physical GB, thanks to its memory setup.

With the SSDs, you're looking at a maximum RAM reoccupy rate of 2.9 seconds on PS5 if streaming in 5.5 GB of raw data per second, whereas on XSX you are looking at closer to 6.6 seconds. That's a 2.27x advantage for PS5's SSD, so it can technically saturate its RAM with unique assets more quickly.

Based on this, each console seems to have a different advantage, but which one will more directly translate into "better gaming"? I think I know how the faster SSD/faster maximum RAM reoccupy rate can translate into what I experience when gaming on PS5 (at least, I've read about it a lot here on GAF recently), but what will more GFLOPs per physical GB on XSX look/feel like in practice? (Basically, what does that DO, exactly?)
 
Based on this, each console seems to have a different advantage, but which one will more directly translate into "better gaming"? I think I know how the faster SSD/faster maximum RAM reoccupy rate can translate into what I experience when gaming on PS5 (at least, I've read about it a lot here on GAF recently), but what will more GFLOPs per physical GB on XSX look/feel like in practice? (Basically, what does that DO, exactly?)

Well, the first thing to keep in mind is that GFLOPs refer to the computational performance of the vector ALUs, as Cerny said in his presentation. Each FP32 (32-bit floating point) operand is 4 bytes, so you're looking at a maximum theoretical throughput of 3.0364 trillion bytes per 1 GB of physical memory on XSX, or 3.0364 TB of computational performance per 1 GB of physical memory. Conversely, with PS5 it comes out to 2.568 TB of computational performance per 1 GB.
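A quick sketch of that conversion, taking the 4-bytes-per-FP32-operand framing above at face value:

```python
# GFLOPs-per-GB x 4 bytes = the "TB of compute per GB of RAM" figure above.
BYTES_PER_FP32 = 4

xsx_gflops_per_gb = 759.1
ps5_gflops_per_gb = 642.1

print(xsx_gflops_per_gb * BYTES_PER_FP32 / 1000)  # ~3.04 TB per GB (XSX)
print(ps5_gflops_per_gb * BYTES_PER_FP32 / 1000)  # ~2.57 TB per GB (PS5)
```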

It basically means XSX can (theoretically, mind you) run more operations on the contents of its memory than PS5. But it really comes down to what the tasks at hand call for. Not all tasks are the same, and there are more instances where both systems WON'T be maximizing their theoretical compute output than instances where they will.

As for the RAM reoccupy rate from SSD, that one's honestly a bit harder to answer. What I do know is that while PS5 easily has the faster stream rate from SSD to RAM of the two, the XSX is also targeting server markets, so it's going to need a big focus on ML and related techniques, arguably more than PS5. It wouldn't surprise me if it's the more capable of the two at texture upscaling through DLSS-like techniques, using APIs like DirectML and GPU hardware built to facilitate them. And depending on the efficiency of BCPack texture compression, they can squeeze more unique data into the RAM vs. PS5 and upscale those textures at runtime through the GPU.

There are already some examples of this in YouTube videos showing off DLSS technology, and it's quite impressive. I know PS5 will feature similar capabilities, but this is one of those areas where XSX having spare GPU headroom (due to the larger GPU) comes in handy IMHO.
This is a great discussion, and one that needed its own thread.

Thank you, you brave brave man, for creating this thread and dedicating the time and effort to such a detailed op.

Thank you too xD. It's probably considered blasphemous to do a thread like this right now, but it was just stuff on my mind I wanted to share. At least it wasn't yet another SSD thread (not 100% of the way, anyhow).

(P.S can't tell if my sarcasm detection is wonky today but hey, none of this is gonna be on people's minds tomorrow so let's just do it now I say).
 
Thank you too xD. It's probably considered blasphemous to do a thread like this right now, but it was just stuff on my mind I wanted to share. At least it wasn't yet another SSD thread (not 100% of the way, anyhow).

(P.S can't tell if my sarcasm detection is wonky today but hey, none of this is gonna be on people's minds tomorrow so let's just do it now I say).
A mix of both. One thing NeoGAF was missing was concerned SSD threads.
 


JareBear: Remastered Here's a vid that touches on some of the DLSS features (through the RTX side of things) next-gen can bring. If techniques like DLSS can do real-time upscaling and processing of textures at big multipliers, that can mean much smaller file sizes, keeping more data in RAM and simply upscaling textures to 4K-level output at runtime when the game code needs them.

We've seen MS focus on this a good bit in particular, less so Sony, but I think we'll see at least somewhat analogous techniques on that side as well. Maybe not to the level we see on XSX (which has both more GPU headroom and possibly other customizations focused on ML techniques, since it's also aiming at the server market), but it should still be a factor.

A mix of both. One thing NeoGAF was missing was concerned SSD threads.

To be fair, those threads are such clockwork now it's weird NOT to have one pop up on a meal break.
 

MCplayer

Member
good technical thread (y)

I would like to know more about the bus width. What is it exactly?
XSX has a 320-bit bus width while PS5 is 256-bit. What does that mean? Is bus width "how much data it can swallow at once"? Therefore 320 being wider and faster with 560 GB/s? Or does bus width not really give any advantage?
 

Jigsaah

Gold Member
How the hell did I get brain freeze from reading this? I thought that came from slurpees.

I kinda follow... but from what I can tell, it feels like Xbox is attacking the memory issue in a more... balanced way? Also, Sony's SSD may somehow make up for what it lacks in raw power by letting the SSD do a lot of the heavy lifting?

Am I in the ballpark?
 

Silver Wattle

Gold Member
Your math is wrong: you used the Xbox's full 560GB/s bandwidth on just the first 10GB, then added a nonexistent 336GB/s for the remaining 6GB.
 

Azurro

Banned
Like someone else mentioned, you can't just add the bandwidth of each pool. I don't remember the layout, but because of their configuration, some of the RAM chips share the same physical lanes, I believe, so the bandwidth actually gets divided depending on the use case. There was an interesting write-up somewhere; I'll see if I can find it.

In any case, does it even make sense to just straight up divide TF performance over bandwidth? You are not loading assets into memory right before rendering all the time; some stay in memory and some leave, and not every single aspect of rendering is covered by the TF figure.

My hardware knowledge is very basic though, so it'd be nice to have someone here with more hardware knowledge put in their 2 cents.
 
good technical thread (y)

I would like to know more about the bus width. What is it exactly?
XSX has a 320-bit bus width while PS5 is 256-bit. What does that mean? Is bus width "how much data it can swallow at once"? Therefore 320 being wider and faster with 560 GB/s? Or does bus width not really give any advantage?

Something to that effect, yes. For example, a 256-bit bus means 32 bytes of data can be transmitted across the bus at once; OTOH, a 320-bit bus means 40 bytes can be transmitted at once. In both cases, the memory's data rate determines how many times per second those 32-byte (PS5) or 40-byte (XSX) chunks move across the bus. However, not all data operations need the full bus width at all times; that applies to both systems.
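A rough sketch of how width and per-pin data rate combine (assuming 14 Gbps GDDR6 on both consoles, which matches the quoted figures):

```python
# Peak bandwidth = (bus width in bits / 8 bits per byte) * per-pin data rate.
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits / 8 * gbps_per_pin

print(peak_bandwidth_gbs(256, 14.0))  # 448.0 -> PS5
print(peak_bandwidth_gbs(320, 14.0))  # 560.0 -> XSX, all ten chips striped
print(peak_bandwidth_gbs(192, 14.0))  # 336.0 -> XSX, six-chip subset
```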

Bus width does come with advantages; the Radeon VII, for example, has a 4096-bit bus thanks to using HBM2 memory. It can especially help with 4K textures and uncompressed data. However, all modern GPUs use compression algorithms for data going across the bus; that way more data can get through without necessarily increasing the physical bus size.

That's about the extent of my knowledge on GPU data compression though; I'd assume that, being even more modern designs, the Navi GPUs in the next-gen systems will have even more efficient compression for data coming along the bus, backed by more sophisticated algorithms.


How the hell did I get brain freeze from reading this? I thought that came from slurpees.

I kinda follow... but from what I can tell, it feels like Xbox is attacking the memory issue in a more... balanced way? Also, Sony's SSD may somehow make up for what it lacks in raw power by letting the SSD do a lot of the heavy lifting?

Am I in the ballpark?

Yeah, when it comes to the memory, I'd say even with the compromises made due to rising DRAM prices, XSX's setup is very balanced and smart. Even if there are essentially two different pools, they aren't split like in a typical PC, so there's no need to explicitly copy assets between the two pools unless a developer sees a use case for doing so to squeeze out extra performance.

But we won't be seeing examples of that in the early games. And also yes, to an extent PS5's SSD does seem to be doing some of the work XSX might be letting the GPU pick up. The stream rate is very impressive (again, it can refresh the entire contents of its RAM in 2.9 seconds; btw, it would mean the Spider-Man demo they showed off with the 0.8-second load time was loading in ~4.4 GB of data in that time frame), but it really does come down to whether a game actually needs the RAM contents fully replaced in that span of time, and how often it needs to do so.

There's also the question of data asset sizes going into next gen; it's very likely they'll get smaller and more granular in their design. Techniques can also be implemented where a developer would rather do real-time dynamic alteration of graphics data, using blends of textured and non-textured methods, yet can essentially "fake" the experience of streaming in more unique pre-baked textures faster.

Your math is wrong: you used the Xbox's full 560GB/s bandwidth on just the first 10GB, then added a nonexistent 336GB/s for the remaining 6GB.

No, the 560 GB/s applies only to 10 GB of physical memory; the 336 GB/s is for the remaining 6 GB. I just specified the bandwidths for those amounts of physical memory; I wasn't implying the XSX can access both pools simultaneously.

But that's also the reason I didn't calculate an average bandwidth by summing the two pools and dividing: that would assume the system spends 50% of its time in each pool, when in reality we don't know what the ratio would be. Plus, it would change on a game-to-game basis, so it's practically impossible to state what the "average" would be without doing it game by game.
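To illustrate why, here's a toy time-weighted average; the time split is a made-up parameter, which is exactly the problem:

```python
# "Average" bandwidth as a function of how much of its time the bus spends
# on each pool. The split is game-dependent and unknown.
def avg_bw(time_on_fast: float, fast=560.0, slow=336.0) -> float:
    return time_on_fast * fast + (1.0 - time_on_fast) * slow

for f in (1.0, 0.8, 0.5):
    print(f, avg_bw(f))   # 560.0, 515.2, 448.0
```

Note the coincidence: a 50/50 time split lands exactly on PS5's 448 GB/s.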


Was expecting a couple of posts moaning about 'another SSD thread', and I'm pleasantly surprised to see people discussing the differences between the two consoles rather than trolling and/or console warring.

Thanks OP. I like your post.

If investing in SSD threads worked like stock, we would probably be millionaires by now xD.

I don't mind the SSD talk in particular, but people gravitating to it like it's the only aspect of next-gen with any significance is kind of crazy IMO. There are some nuanced differences between the systems, and they have their own unique strengths and weaknesses, but I also like trying to see if their unique strengths can help mitigate some of the weaknesses as well.

But it's hard to do, because there are so many factors involved.
TL;DR

Thicc_girls_are, in fact, indeed teh_best.

Interesting read, exciting times.

Lol, reminds me to change up the avatar for tomorrow ;) I'm excited too; the AC Valhalla trailer looked really good, and there's a very strong chance there'll be some Elden Ring gameplay there too. Plus we're getting a PS5 article in that magazine (is it Edge?) this month and, IIRC, the PS5 event in June, which should hopefully be their big blowout. And the MS 1st-party event in July right after that, too.

Now if the corona mess would just go away, especially for people in the big cities, because they don't really get to enjoy a lot of natural scenery and big open skies unless they travel out, which is probably really difficult right now.

Like someone else mentioned, you can't just add the bandwidth of each pool. I don't remember the layout, but because of their configuration, some of the RAM chips share the same physical lanes, I believe, so the bandwidth actually gets divided depending on the use case. There was an interesting write-up somewhere; I'll see if I can find it.

In any case, does it even make sense to just straight up divide TF performance over bandwidth? You are not loading assets into memory right before rendering all the time; some stay in memory and some leave, and not every single aspect of rendering is covered by the TF figure.

My hardware knowledge is very basic though, so it'd be nice to have someone here with more hardware knowledge put in their 2 cents.

That's just it though: I never added the bandwidths. I just stated that the 10 GB pool has its bandwidth and the 6 GB pool has its own. I never implied both could be accessed simultaneously, either (it would be nice if they could, but doing that while enforcing some type of heterogeneous memory scheme would require wide-scale use of a relatively new cache-coherence standard like CXL, which was only established last year, on PCIe 5.0 IIRC).

It is true that assets aren't constantly being loaded into memory. Some (quite a bit, in fact) just stay there; others are used sparingly and then leave. And the TF figure doesn't cover everything. However, it's an easy metric to use, and it's just meant to show that if data in physical memory were operated on at maximum theoretical TF rates, you'd get results similar to what I mentioned in the OP.

It's a theoretical example, but looking at the numbers that way also helps to see some of the design and engineering decisions MS and Sony might've taken into consideration when designing their systems and determining certain setups to roll with.
 

Silver Wattle

Gold Member
Since you have denied adding the bandwidths:

though a more realistic way of specifying the ratio in XSX's case is 73.77 GB/s per TF on the TF share matching the physical memory optimized for the GPU (10 GB physical memory, 7.591 TF), and 73.76 GB/s per TF on the TF share matching the physical memory optimized for the CPU/OS etc. (6 GB physical memory, 4.555 TF)
Right here.

560 GB/s divided by 10 GB of physical memory gives you... 56 GB/s per 1 GB. And 336 GB/s divided by 6 GB of physical memory also gives you 56 GB/s per 1 GB. In other words, you have a 1:1 ratio between bandwidth per physical GB and the actual per-module bandwidth of GDDR6.
And here.

While we're at it, we can also calculate the TF-per-physical-GB ratio. Once again, for XSX we'd have to "split" the calculation due to its memory setup, but it leaves you with 759.1 GFLOPs per physical GB (the TF share apportioned to the 10 GB, 560 GB/s pool), and the same 759.1 GFLOPs per physical GB on the 6 GB, 336 GB/s pool. Again, a 1:1 ratio of balance. With PS5, we end up with 642.1 GFLOPs per physical GB, thanks to its memory setup.
Here too.

560GB/s for the first 10GB of RAM means ZERO bandwidth for the remaining 6GB.
 
Most people seem to overlook this Xbox Series X feature:

"Sampler Feedback Streaming (SFS) – A component of the Xbox Velocity Architecture, SFS is a feature of the Xbox Series X hardware that allows games to load into memory, with fine granularity, only the portions of textures that the GPU needs for a scene, as it needs it. This enables far better memory utilization for textures, which is important given that every 4K texture consumes 8MB of memory. Because it avoids the wastage of loading into memory the portions of textures that are never needed, it is an effective 2x or 3x (or higher) multiplier on both amount of physical memory and SSD performance."

A 2X to 3X reduction in effective texture size would have a big impact on bandwidth utilization.
 

Tripolygon

Banned
Here's a funny way to look at it.

Parent A has 12 children to feed and 12 loaves of bread.

Parent B has 10 children to feed and 10 loaves of bread.

Are any of the children starving? No, they have enough loaves of bread proportional to the number of children they have to feed, give or take 20%, as some children are liable to eat more than others.

On a more semi-technical level: you can't arbitrarily decide XSX only has 10GB of memory. It has 16GB of unified memory on a unified 320-bit bus. Also, the memory is asymmetrical, whereby 6GB of that unified 16GB can only be accessed at 336GB/s while the rest is accessed at 560GB/s.

This is a hUMA memory system, so all memory is shared and can be accessed by both CPU and GPU at "the same" time. That means accessing the 6GB at 336GB/s reduces the overall bandwidth relative to a setup where all 16GB were accessible at 560GB/s. Yes, Microsoft is keen on labeling 10GB as graphics-optimized, and yes, developers get to decide what they put on each side of the 16GB, but it remains that the 6GB still has to be accessed at some point every frame, which will bring the overall memory bandwidth down. There is just no way around that. You're either reading all of it at full bandwidth or some of it at less than full bandwidth.
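A toy sketch of that effect, weighting by how much data a hypothetical frame pulls from each region (the per-frame amounts are made up):

```python
# Blended bandwidth for a frame that reads `fast_gb` from the 560 GB/s
# region and `slow_gb` from the 336 GB/s region: total data / total time.
def effective_bw(fast_gb: float, slow_gb: float,
                 fast=560.0, slow=336.0) -> float:
    return (fast_gb + slow_gb) / (fast_gb / fast + slow_gb / slow)

print(effective_bw(10, 0))  # 560.0 -- slow region never touched
print(effective_bw(10, 2))  # 504.0 -- a little slow-pool traffic drags it
print(effective_bw(10, 6))  # 448.0 -- touching all 16 GB once lands at PS5's figure
```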
 

Aceofspades

Banned
Is this another case of "let me write huge walls of nonsense so I can sound clever and push my agenda"?

Really? For some strange reason you only started averaging the 10GB portion (supposedly going only to the GPU). Why not do the same for PS5?

C'mon man, what a load of nonsense. What's next? Go to Best Buy and start cheering people who bought Xboxes? ...oh wait 🤣 that job is already occupied by another person. Lol
 
Don't forget PS5 has a pool of SRAM which carries out all the SSD tasks, leaving the 16GB for the games themselves + maybe the OS.

I'm probably missing it, but I Google-searched and can't find confirmation that this is wholly accurate, as far as all 16 gigs being available for games. Again, I'm not saying you're wrong, just wondering if you have a link that confirms this? My memory has gone to shit lately, so maybe I read about it in March and forgot.
 

Aceofspades

Banned
I'm probably missing it, but I Google-searched and can't find confirmation that this is wholly accurate, as far as all 16 gigs being available for games. Again, I'm not saying you're wrong, just wondering if you have a link that confirms this? My memory has gone to shit lately, so maybe I read about it in March and forgot.

(attached screenshot)
 

Tripolygon

Banned
Most people seem to overlook this Xbox Series X feature:

"Sampler Feedback Streaming (SFS) – A component of the Xbox Velocity Architecture, SFS is a feature of the Xbox Series X hardware that allows games to load into memory, with fine granularity, only the portions of textures that the GPU needs for a scene, as it needs it. This enables far better memory utilization for textures, which is important given that every 4K texture consumes 8MB of memory. Because it avoids the wastage of loading into memory the portions of textures that are never needed, it is an effective 2x or 3x (or higher) multiplier on both amount of physical memory and SSD performance."

A 2X to 3X reduction in effective texture size would have a big impact on bandwidth utilization.
No, nobody is overlooking it; it's been mentioned ad nauseam. What you have described, in a broad sense, is what everybody already does; it's called partial resident textures, tiled resources, sparse textures, etc. You only load into memory what is needed for that particular frame, divided into pages. What sampler feedback does is make that a lot more granular: instead of loading 4 pages into memory because you have to keep around what you might need, sampler feedback (supported by all DX12U-compatible GPUs) adds an extra step where you get a map of what was actually sampled, and based on that map you load those specific pages. So perhaps now you only need 3 pages in memory, because you know you only need those 3 rather than 4. What XSX brings to the table are new texture filters that support this.
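A toy sketch of that idea (the page IDs and feedback map are made up; real sampler feedback produces a usage/min-mip map on the GPU):

```python
# Without feedback, you conservatively keep every page the view *might*
# touch; with a feedback map, you keep only the pages actually sampled.
conservative_guess = {0, 1, 2, 3}   # pages you might need, so you keep 4
sampled_last_frame = {0, 1, 3}      # feedback says only these were touched

resident = conservative_guess & sampled_last_frame
print(len(conservative_guess))  # 4 pages without feedback
print(len(resident))            # 3 pages with feedback -- the SFS saving
```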
 
Since you have denied adding the bandwidths;

Right here.

and here.

here too.

560GB/s for the first 10GB of RAM means ZERO bandwidth for the remaining 6GB.

What's your point? I never said the bandwidth pools would be accessed concurrently/simultaneously, so you're kind of making a mountain out of a molehill here. If system resources don't need the other 6 GB of physical memory, it doesn't matter that it can't be accessed. If it does need to be accessed, then the 6 GB pool is accessed in lieu of the 10 GB pool for that given time.

I think you kind of missed the point of those metrics I posted :S


Don't forget PS5 has a pool of SRAM which carries out all the SSD tasks, leaving the 16GB for the games themselves + maybe the OS.

It depends on what we mean by "large". DRAM caches on most SSDs are usually between 32 MB and 256 MB, to balance out prefetches and writes. And that's considering DRAM is generally cheaper than SRAM.

So the question is how large the SRAM cache on the PS5's flash memory controller is. Realistically, we can assume it's somewhere in that 32 MB - 256 MB range, with the added bonus that SRAM is faster. Though there's plenty of variation in SRAM capabilities; you have the high-end stuff, the low-end, and everything in between.

I’ve seen this.

This proves that all 16 gigs are available for games?

No, it doesn't. You're still looking at about 2 GB reserved for OS tasks and features, and of course the CPU and audio data need some of the RAM as well. But overall, at maximum, in most situations that's about 14 GB for gaming applications, which is a big step up from the 5.5 GB the OG PS4 had.

Is this another case of "let me write huge walls of nonsense so I can sound clever and push my agenda"?

Really? For some strange reason you only started averaging the 10GB portion (supposedly going only to the GPU). Why not do the same for PS5?

C'mon man, what a load of nonsense. What's next? Go to Best Buy and start cheering people who bought Xboxes? ...oh wait 🤣 that job is already occupied by another person. Lol

....?!?

So did you read past the thread title or didn't you? I'm guessing you didn't.

...because if you did, you'd realize I did specify a like-for-like 10 GB physical-memory scenario for PS5. It's... right there, in the OP. Doesn't change anything.

Here's a funny way to look at it.

Parent A has 12 children to feed and 12 loaves of bread.

Parent B has 10 children to feed and 10 loaves of bread.

Are any of the children starving? No, they have enough loaves of bread proportional to the number of children they have to feed, give or take 20%, as some children are liable to eat more than others.

On a more semi-technical level: you can't arbitrarily decide XSX only has 10GB of memory. It has 16GB of unified memory on a unified 320-bit bus. Also, the memory is asymmetrical, whereby 6GB of that unified 16GB can only be accessed at 336GB/s while the rest is accessed at 560GB/s.

This is a hUMA memory system, so all memory is shared and can be accessed by both CPU and GPU at "the same" time. That means accessing the 6GB at 336GB/s reduces the overall bandwidth relative to a setup where all 16GB were accessible at 560GB/s. Yes, Microsoft is keen on labeling 10GB as graphics-optimized, and yes, developers get to decide what they put on each side of the 16GB, but it remains that the 6GB still has to be accessed at some point every frame, which will bring the overall memory bandwidth down. There is just no way around that. You're either reading all of it at full bandwidth or some of it at less than full bandwidth.

None of this was denied in the OP. However, you must also realize that the rate of access between the 10 GB and 6 GB pools will vary on a game-by-game basis. Also, from the little that's out there right now, it can be surmised the system has features in place to mitigate most bandwidth drags that could occur from switching access between the two pools.

The other thing is, they have a team of engineers working on the system who likely would've come to these conclusions (and much more complicated ones) well ahead of time, and designed aspects of the system with them in mind to cut down on potential bottlenecks. Not that bottlenecks are 100% out of the picture, but the concern over full system bandwidth being dragged down by nightmare-fuel levels of access on the slower memory pool doesn't seem warranted IMO.
 

Tripolygon

Banned
None of this was denied in the OP. However, you must also realize that the rate of access between the 10 GB and 6 GB pools will vary on a game-by-game basis. Also, from the little that's out there right now, it can be surmised the system has features in place to mitigate most bandwidth drags that could occur from switching access between the two pools.

I don't deny that, and at no point have I postulated that it is going to be a problem. Developers have known how to optimize around various memory access methods: PS2 WTF (lol), PS3 NUMA, PS4 hUMA. Somehow they'll have a problem with PS5, which has 20% less bandwidth than XSX's 10GB pool and 33% more than its 6GB pool? Xbox One had 68GB/s of memory bandwidth with a small, fast 32MB of ESRAM; think about that.

The other thing is, they have a team of engineers working on the system who likely would've come to these conclusions (and much more complicated ones) well ahead of time, and designed aspects of the system with them in mind to cut down on potential bottlenecks. Not that bottlenecks are 100% out of the picture, but the concern over full system bandwidth being dragged down by nightmare-fuel levels of access on the slower memory pool doesn't seem warranted IMO.

That's all well and good; extend the same courtesy to Sony, because they have teams of some of the best engineers working on the system who would've just as likely come to these conclusions (and much more complicated ones) well ahead of time and designed aspects of the system to help mitigate some of these potential bottlenecks. I mean, they went to the effort of designing one of the fastest consumer SSDs, and designed a system to accurately pinpoint and flush parts of cache that are no longer needed to avoid wasting memory.
 

bitbydeath

Member
It depends on what we mean by "large". DRAM caches on most SSDs are usually between 32 MB and 256 MB, to balance out prefetches and writes. And that's considering DRAM is generally cheaper than SRAM.

So the question is how large the SRAM cache on the PS5's flash memory controller is. Realistically, we can assume it's somewhere in that 32 MB - 256 MB range, with the added bonus that SRAM is faster. Though there's plenty of variation in SRAM capabilities; you have the high-end stuff, the low-end, and everything in between.

I don't think size matters in this case, heh, as we know its purpose and what it is restricted to.

But the point was that alleviating this from main memory means more bandwidth could be used elsewhere.

Edit: I suppose the greater question is how much memory this usually consumes.

Edit 2: It is definitely an ongoing cost, meaning it is continuously running, as it's a requirement for game streaming.
 

Ascend

Member
Interesting analysis. A while back I did a calculation with some speculation on how they would practically use the RAM. I came to the same 56GB/s per 1 GB, but I didn't think much of it in comparison to the PS5, since I was focusing on analyzing the XSX memory system itself. For those interested:


The Xbox Series X RAM Setup

How is it really set up?

After thinking about it, I think the XSX memory setup is a bit deceiving in its advertising... They are basically telling you it is like this (each number is a memory chip with the amount of GB it has);
1 1 1 1 1 1 1 1 1 1 + 1 1 1 1 1 1 (560GB/s + 336 GB/s)

While in reality the memory config is more like this;
2 2 2 2 2 2 + 1 1 1 1
That makes it look like 336 GB/s + 224 GB/s, but technically that's not true either... Because the lanes from the 2GB chips and the 1GB chips are not 'separate'. The RAM is not split, but one pool. So like this;

2 2 2 2 2 2 1 1 1 1
The question is, why don't they simply advertise 560 GB/s? That looks like a perfectly viable 10 x 56GB/s setup... right? Well... they are aware that if you do not allocate RAM efficiently, you'll run into problems. If you fill only the 2GB chips first, you get 336GB/s. If you fill the 1GB chips first, you get 224GB/s. If you fill them randomly, you'll get inconsistent performance, with the effective bandwidth constantly changing on you. They want developers to use the RAM the way they are advertising it, which is entirely possible. The more lanes you use for data, the better, obviously. Even though it is not physically configured that way, by artificially 'splitting' the 2GB modules into two 1GB halves, you achieve the same result as what they are advertising.

Aaaand here's where the complexity starts...
There is one caveat though. Obviously the 2GB modules only have the lanes that they have. So even if you artificially split them, there aren't magically additional lanes for data transfer; the lanes need to be shared by the two sections of the 2GB chip. To put it another way, the 1GB chips get the full 56GB/s per chip and thus per GB (please stick with me here). The 2GB chips, if not used correctly, rather than getting the advertised 56GB/s per GB to reach the total of 560 GB/s, will get 28GB/s per GB in the worst-case scenario. So you can't really advertise it as 560GB/s + 336GB/s here. In the worst-case scenario, you are talking about 280 GB/s + 336 GB/s. Now that is REALLY atrocious bandwidth.
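To put numbers on the fill-order scenarios (a sketch assuming the chip layout described above, each chip on a 32-bit slice worth 56 GB/s):

```python
# Ten chips on a 320-bit bus: six 2 GB and four 1 GB.
chips_gb = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
PER_CHIP_BW = 56.0

print(PER_CHIP_BW * len(chips_gb))                       # 560 -- striped over all ten
print(PER_CHIP_BW * sum(1 for c in chips_gb if c == 2))  # 336 -- only 2 GB chips filled
print(PER_CHIP_BW * sum(1 for c in chips_gb if c == 1))  # 224 -- only 1 GB chips filled
print(PER_CHIP_BW / 2)  # 28 -- per GB when both halves of a 2 GB chip contend
```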

Is the RAM split or not?
The reality is that the RAM will work like a hybrid between a split and a unified pool. What do I mean by that? It will work as a unified pool in the sense that both the GPU and the CPU have access to all the data in all 16GB. However, it will work as a split pool in terms of data allocation. There will have to be two priority levels in the 2GB RAM chips: only when there are no 1st-priority calls on the RAM can the 2nd priority be serviced. So whatever uses high bandwidth (like textures) will need to be given 1st priority, and whatever uses low bandwidth can go into 2nd priority.

It's getting more complicated...
And sadly, once again, it is not that simple either... Because if you need something on screen right now that is low bandwidth, and it is set to low priority, you will get pop-ins, for example. Or if you allocate all sound to the low-priority section, you'll get weird sound delays, etc... Yes, that is quite complicated. If it is like that, I can see why many developers are saying they like the PS5 more: it's simply much simpler. Despite the power of the XSX, it will require some creativity to learn and work with its RAM system. If it really is like this, it's actually possible (if not inevitable) that we initially see PS5 games looking better than XSX games, unless developers decide to keep all RAM usage under 10GB on both for ease of development. I don't think it will actually work that way. I certainly hope it doesn't...

Free cache lesson for you. It's relevant, I promise
The best way to really do it is to use the 2GB RAM chips as a sort of L1 and L2 cache. I don't know if people here know how cache works... I'll try to explain it briefly...
Say you have a processor, and the processor has two levels of cache. The first level, L1, can store two letters, and the second level can store four letters (typically the L2 cache is larger, but for the XSX RAM it would be smaller). A cache basically keeps the most frequently used data in it for fast access. The 'closer' the cache is to the CPU, the faster it is.
In the beginning the caches are empty, and as the processor does jobs, it fills them and updates their contents accordingly.
Now imagine I am typing a long word, like pneumonoultramicroscopicsilicovolcanoconiosis (yeah, that's the longest word in the English language lol). The CPU has nothing in cache at the beginning, but it doesn't know that. It checks L1: no data. Then L2: no data. Then RAM: no data. Ultimately it arrives at the storage device and copies all the used letters of the program into the caches and the RAM, in order. The most common letter will be saved in L1. The second most common letter will also be saved in L1; now L1 is full. The 3rd, 4th, 5th, and 6th most common letters will be saved in L2. The rest are in RAM. Next time I type that word, the CPU can read much of the data from L1, then L2, then RAM. It will do it much faster than before.

So if we go letter for letter, first the L1 and L2 caches will look like this;
L1 [p,n]
L2 [e,u,m,o]

Now as we type further, things start to change... As we type pneumono, the o and the n become the most common, so p is 'downgraded' to L2, and o is added to L1. n stays;
L1 [n, o]
L2 [e,u,m,p]

As we type ultra, u has been used as often as the other letters, but L1 is full, so it stays in L2, and everything stays the same. And so on. When a letter has been used three times, it will push one of the letters in L1 down toward L2, and since L2 is full, the least common letter will be shifted out to RAM if it's not already there.

Enough caches. I want some more RAM sweetness!
So that was the short lesson on caches... Going back to the XSX RAM: if they let it work like a cache, then everything is allocated to the 10GB first as data is accessed. When there's a data lookup, it will always look in the 10GB first. That means the 1GB RAM chips and the first 'tier' of the 2GB chips have priority; only when the required data is not found there does the lookup take place in the 2nd tier of the 2GB chips. Done this way, the bandwidths will not interfere with each other, realistically giving you 56GB/s per GB in the 2GB RAM chips as well, for both tiers. Now the 560 GB/s is practically guaranteed, and so is the 336 GB/s.
If it works that way, it's actually a really smart design... and the XSX will have practically zero issues with RAM allocation. Then the XSX will truly have a great bandwidth advantage over the PS5. This is most likely the solution MS came up with; having developers manually tune it would be a nightmare. You might have to be a bit more careful with RAM than on the PS5, but it would not be a huge issue.
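A minimal sketch of that fast-region-first policy (region sizes are XSX's; the asset names and sizes are made up):

```python
# Allocate into the fast 10 GB region first; spill to the slow 6 GB region.
FAST_CAP_GB, SLOW_CAP_GB = 10.0, 6.0
fast, slow = {}, {}   # asset name -> size in GB

def allocate(name: str, size_gb: float) -> str:
    """Place an asset in the 560 GB/s region if it fits, else spill to 336."""
    if sum(fast.values()) + size_gb <= FAST_CAP_GB:
        fast[name] = size_gb
        return f"{name}: fast region (560 GB/s)"
    if sum(slow.values()) + size_gb <= SLOW_CAP_GB:
        slow[name] = size_gb
        return f"{name}: slow region (336 GB/s)"
    raise MemoryError("out of RAM")

print(allocate("textures", 7.0))   # lands in the fast region
print(allocate("geometry", 2.5))   # still fits in fast (9.5/10 used)
print(allocate("audio+sim", 3.0))  # spills to the slow region
```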


That indeed gives twice the bandwidth per GB compared to the PS5. They probably found that the GPU would otherwise be bottlenecked by bandwidth.
In other words, the XSX can sacrifice amount of RAM for increased bandwidth. And it's not an either/or per game, but rather per moment of gameplay: the full RAM pool can be used at will, at the cost of bandwidth. It's actually quite a smart design.
It kind of makes me wonder if Sony ran into the same bandwidth issue and, rather than finding a way around it, designed the GPU around that bandwidth instead, which would explain its weaker GPU implementation.
 
I don't deny that, and at no point have I postulated that it is going to be a problem. Developers have known how to optimize around various memory access methods: PS2 WTF (lol), PS3 NUMA, PS4 hUMA. Somehow they'll have a problem with PS5, which has 20% less bandwidth than XSX's 10GB pool and 33% more than its 6GB pool? Xbox One had 68GB/s of memory bandwidth with a small, fast 32MB of ESRAM; think about that.

That's all well and good; extend the same courtesy to Sony, because they have teams of some of the best engineers working on the system who would've just as likely come to these conclusions (and much more complicated ones) well ahead of time and designed aspects of the system to help mitigate some of these potential bottlenecks. I mean, they went to the effort of designing one of the fastest consumer SSDs, and designed a system to accurately pinpoint and flush parts of cache that are no longer needed to avoid wasting memory.

The 2nd half of this response made me chuckle, but I can't even be mad xD. I highly respect Sony's engineering team, and historically they've always been an innovative company. Looking back at stuff like Betamax, even if it was a commercial failure, seeing how they iterated on the tech back in the day is really impressive.

But at least in the console space, particularly in the modern day, I sometimes see the notion that MS has no hardware expertise because they're mainly known for software, like Windows. It's true they're mostly known for that (though I think their cloud services and business software are their biggest revenue sources these days), but I don't want to downplay their engineering prowess either. Cerny's a great guy, but some people tend to think he's on a whole other level all by his lonesome; I don't quite see it that way.

Interesting analysis. A while back I did a calculation with some speculation on how they would practically use the RAM. I came to the same 56GB/s per 1 GB, but I didn't think much of it in comparison to the PS5, since I was focusing on analyzing the XSX memory system itself.

Click to expand...

Damn, that's a really good analysis you wrote up there. Kinda mad I didn't see the thread earlier xD. It's quite technical, too, but still easy enough to follow (IMO). Good stuff!

I also really like the cache analogy, because it seems like a pretty common-sense approach to managing the two memory pools the way XSX has them, relieving devs of a lot of the work on their end, though not 100% of it (I agree it'll come with a learning curve, and I can see PS5 being easier to work with for not having this type of memory setup. Though, as you hint, it's not like devs haven't dealt with these, let alone far more complicated, memory setups in consoles prior). It seems like exactly the kind of thing MS would want the OS to take care of.

As for Sony and PS5: IIRC there is the Oberon C0/E0 revision that made changes to the memory controller to support 512 GB/s of bandwidth, but as we know it now, the system is using 448 GB/s. There was a reason they made a revision supporting 512 GB/s, and most likely it was the Japanese side wanting to keep costs down that led to the slower GDDR6 modules. I'd even speculate that a good deal of the SSD's design might've been about addressing the potential bandwidth issues you allude to. And the SSD being as fast as it is will help in that regard, for sure, but at the end of the day it's still "just" an SSD; it's still NAND, and it's still "just" a transfer of data over PCIe 4.0.

Those things and volatile memory like GDDR6, especially at the bandwidths these systems have, are apples to oranges. You've given me a few other things to think about as well, but it's probably gonna take a while to sort all of that out xD.

I don't think size matters in this case, heh, as we know its purpose and what it is restricted to.

But the point was that alleviating this from main memory means more bandwidth could be used elsewhere.

Edit: I suppose the greater question is how much memory this usually consumes.

Edit 2: It is definitely an ongoing cost, meaning it is continuously running, as it's a requirement for game streaming.

You could be right in that regard; even 32 MB is "good enough", and since Sony is sourcing these components at a scale SSD manufacturers don't get to, they can secure them more cheaply, which means they can match or exceed the DRAM caches of larger/higher-grade SSDs with an equivalent or greater amount of SRAM cache.

It's going to be interesting to see to what extent the main memory bandwidth is alleviated thanks to the SSD, and how that can be applied to other processes. Although one thing I'm wondering about regarding PS5 is whether the SSD's power draw plays any meaningful role in the balancing of the system's variable-frequency power budget. These kinds of SSDs aren't necessarily light on power, especially at peak performance, and PS5's cooling solution will need to be suitable not just for the internal SSD but also for any 3rd-party expansion SSDs compatible with the system.

I've no doubt the cooling system will be sufficient for that; the question is at what cost. Could be a hefty bite of the BOM, all things considered.

Signature request addressed to Jim Ryan for the Playstation 5 online to be free again


These companies are NOT giving up all dem online revenue and profits :LOL:
 
Last edited:
Based on what?
Based on knowledge? XSX has one and only one peak bandwidth, 560GB/s, over a split set of memory chips with different capacities.
If the GPU has already fully utilized the memory bandwidth, nothing extra will materialize. The only logical explanation for this asymmetric separation is to keep the CPU out of the 4x1GB chips, because that would really hurt the bandwidth available to the GPU.
 

Aceofspades

Banned
Now the XSX memory bandwidth has magically become twice that of the PS5.

I swear, when people say that kind of thing about the PS5's SSD and controller (without mental gymnastics, mind you), they are mocked.

True, for some people PS5 MUST be inferior at EVERYTHING. They ignore a 2.27x advantage in raw SSD speed by bringing out weird calculations and spreading misinformation all over the web.

Hell, look at the lengths OP went to just to justify the strange RAM setup of the Series X. Mind you, we had an Nvidia card with a similar setup a few years back, the internet went crazy, its performance was negatively affected by it, and Nvidia had to issue a formal apology and fix it later. But hey... X fans take whatever Microsoft dumps on them as "BeSt ThInG EvAr".
 