
Xbox Series X’s BCPack Texture Compression Technique 'might be' better than the PS5’s Kraken

Panajev2001a

GAF's Pleasant Genius
I'm not so sure I missed on the calculation. I definitely know I messed up on something I said earlier in a pretty hilarious fashion that even I myself started laughing at, but I suspect my current understanding may be accurate now.

So here is my thinking.

If the game requests 14GB of texture data, Sampler Feedback Streaming's efficiency should cut that 14GB texture demand to a more efficient 5.6GB due to the 2.5x efficiency advantage.

And here comes the part where some think I messed up or double counted compression. Keep in mind what SFS has done initially is not considered "compression." Cutting the demand to 5.6GB of actual texture data isn't compression, that's just SFS intelligently informing the system of all it will need for the current scene.

One way to look at this is to say that SFS is telling the system it needs 5.6GB of texture data.
Another way to look at this is SFS is telling the system it needs 2.8GB of BCPack format input data decompressed into main memory.


The number for the calculation must be 2.8GB otherwise after decompression it's no longer 5.6GB of textures. If the number used for the calculation is 5.6GB then that's 11.2GB worth of textures after decompression, double what SFS suggested is needed.

That's why I don't think the below calculation in the quoted post by Rea can work 5.6 / 2.4 / 2, which would equal 1.16 seconds. That calculation is the Series X decompressing 11.2GB of textures into main memory, way more than what was called for.

5.6GB of data being decompressed is actually 11.2GB of texture data, not the 5.6GB of texture data that Sampler Feedback Streaming suggests is actually required.

So the calculation is actually 2.8GB / 2.4GB/s / 2 = 0.58 seconds.
Another way to do it is to get rid of the 2 at the end and simply do this 2.8GB / 4.8GB/s = 0.58 seconds.


What are people missing? Just because Sampler Feedback Streaming says 14GB of data isn't required and cuts it down 5.6GB of texture data, do not confuse that 5.6GB to be the COMPRESSED data size. The compressed data size for 5.6GB worth of textures using BCPack is lower still at 2.8GB. It only becomes 5.6GB of texture data after decompression.




Going off of my post above, I think people only think my calculation is wrong because they are using the wrong data point. People are using the end result rather than the compressed size of 5.6GB worth of textures, which with BCPack is 2.8GB.

Even with Cerny's 2GB / 5GB / 1.5 example. That data is only 2GB in its compressed form, but once it's decompressed into main memory, it's actually 3GB of data. 2GB * 1.5 compression ratio gets you 3GB. In the same Cerny example, the 5GB/s SSD actually becomes 7.5GB/s with compression. This is why if you do 2GB / 7.5GB/s you get the exact same 0.27 result.

This is my understanding of how it works. 5.6GB is what will be in RAM after decompression, but it shouldn't be confused to be the same as the data size in its 50% compressed form, 2.8GB.
Look at it from this angle, since we clarified the maths before already. If BCPACK compression yields a 2:1 compression ratio and lowers the number to 2.8 GB then you cannot transfer it in 1s.

2.8GB / 4.8GB/s is wrong. It should be 2.8 GB / 2.4 GB/s, as 2.4 GB/s is the maximum SSD I/O speed (which, if you take decompression into account, becomes 4.8 GB/s on average). You are factoring in the compression going from 5.6 GB to 2.8 GB and then applying the same factor again to boost bandwidth. You should also consider the effects of SFS/PRT+ after compression, as you stream compressed data (that gets uncompressed by the I/O unit).

Counting all the multiplication factors in an optimal scenario you could transfer the following amount of data in 1s: 2.4 GB * 2 (BCPACK) * 2.5 (PRT/SFS) = 12 GB.
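
As a rough sketch of the arithmetic above: the function name and defaults are illustrative only, and the 2:1 BCPack ratio and 2.5x SFS factor are the figures assumed in this thread, not measured values. The multipliers stack like this:

```python
def effective_gb_per_s(raw_gb_s, compression_ratio=1.0, sfs_multiplier=1.0):
    """Equivalent bandwidth: the rate a drive would need to match this one
    if it were moving uncompressed, fully resident textures."""
    return raw_gb_s * compression_ratio * sfs_multiplier

RAW = 2.4  # GB/s, Series X SSD raw figure quoted in the thread

print(effective_gb_per_s(RAW))            # raw only
print(effective_gb_per_s(RAW, 2.0))       # with 2:1 BCPack (assumed average)
print(effective_gb_per_s(RAW, 2.0, 2.5))  # plus 2.5x SFS (optimal scenario)
```

Note that the physical link never moves more than the raw figure; the larger numbers describe what an uncompressed, non-SFS system would need in order to keep up.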
 
Last edited:

PaintTinJr

Member
Interesting conversation.


I'm pretty sure he's incorrect on what he means, rather than what he said.

Obviously if you make it per second, then you've taken latency out of the equation. He's mixing dimensions - a physics way of describing things - inconsistently with the problem being described.

The data will be compressed at a given compression ratio, not a given rate. The rate of decompression is dependent on the effectiveness of the decompression hardware and the hardware's latency characteristics - and in that case, the speed and latency of decompression is directly impacted by the data lines from the SSD feeding the decompression hardware.

The PS5's IO complex will decompress it faster than the Ryzen cores/VA and with less latency (going by the specs we have to compare), and still have time to do other things in the remaining time.
 

Thief1987

Member
That's why I don't think the below calculation in the quoted post by Rea can work 5.6 / 2.4 / 2
5.6/2.4/2 = 2.8/2.4 = 1.16667

2.8GB / 2.4GB/s / 2 = 0.58
Why do you divide it by two when your 2.8GB is already compressed? The 2 at the end would be the compression ratio, which would be 1 in your case because the data is already compressed. Rea used the raw data figure; that's why the ratio in his example was 2.
 
Last edited:

twilo99

Member
you mean like Sony with cold storage, 1440p, vrr support, and expansion bay working? At least one console maker released a fully operational hardware where only dev tools are behind, while other is still enabling things on the hardware side.

Yeah, what is up with that? Isn't VRR support part of RDNA2 or whatever? I don't get why it's taking them so long to implement.

1440p makes zero sense as well, it can't be that hard to have it working.

The storage thing is really bizarre.

If the roles on this were reversed.... oh my
 

elliot5

Member
Yeah, what is up with that? Isn't vrr support part of RDNA2 or whatever? I don't get why its taking them so long to implement.

1440p makes zero sense as well, it can't be that hard to have it working.

The storage thing is really bizarre.

If the roles on this were reversed.... oh my
Aftermarket SSDs that can match the internal drive are probably hard to make or expensive, thus no support yet until a product is mass produced and tested.

1440p cuts in to their 4K TV division sales. They want to push their TVs.

VRR isn't supported on their TVs yet, so there's no point in them updating it because again, they want to push their TVs. Allowing VRR would let someone buy an LG OLED and not a Sony.
 

Three

Member
Wait wait hold on, let me get this straight.
☑️So the Series consoles launched out of the gate with expandable storage solutions, and are offering more 3rd party options down the line

☑️Sony launches with less storage, and here we are Six months later with no viable compatible expandable storage options

❎Both MS and Sony promise more options soon, but only MS is wrong? Are they lying? I'm confused
He is saying both have external options for PS4/Xbox One games and there is little point of getting an SSD internal storage especially for xbox as no games require it at the moment, they are all xbox one games. He is saying both of them are saying to wait for something in the future.

The less storage is also offset by smaller game sizes on PS5 anyhow.
 
Look at it from this angle, since we clarified the maths before already. If BCPACK compression yields a 2:1 compression ratio and lowers the number to 2.8 GB then you cannot transfer it in 1s.

2.8GB / 4.8GB/s —> this is wrong… 2.8 GB / 2.4 GB/s as 2.4 GB/s is the maximum SSD I/O speed which (if you take decompression into account becomes 4.8 GB/s on average). You are factoring the compression going from 5.6 GB to 2.8 GB and then you are applying the same factor to boost bandwidth again. You should also consider the effects of SFS/PRT+ after compression as you stream compressed data (that gets uncompressed by the I/O unit).

Counting all the multiplication factors in an optimal scenario you could transfer the following amount of data in 1s: 2.4 GB * 2 (BCPACK) * 2.5 (PRT/SFS) = 12 GB.

I think where we are getting confused is the 5.6GB figure. This is what the game needs for textures in this hypothetical scenario, but it is by no means the amount of compressed data the decompressor will work on. If the decompressed result will be 5.6GB of textures, then the amount of data being decompressed to create that 5.6GB must be a 2.8GB block of data.

2.8GB divided by 2.4GB/s divided by 2 giving you 0.58 seconds

Any scenario that uses 5.6GB in the calculation is basically loading 11.2GB of textures when SFS made it so that only 5.6GB is required aka.

4GB of requested texture data is only 2GB compressed. 8GB of requested texture data is only 4GB compressed. Hence 5.6GB of texture data following the 2.4GB/s (raw) 4.8GB/s (compressed) rules would be transferred to main ram in only 0.58 seconds.
 

Panajev2001a

GAF's Pleasant Genius
I think where we are getting confused is the 5.6GB figure. This is what the game needs for textures in this hypothetical scenario, but it is by no means the amount of compressed data the decompressor will work on. If the decompressed result will be 5.6GB of textures, then the amount of data being decompressed to create that 5.6GB must be a 2.8GB block of data.

2.8GB divided by 2.4GB/s divided by 2 giving you 0.58 seconds

Any scenario that uses 5.6GB in the calculation is basically loading 11.2GB of textures when SFS made it so that only 5.6GB is required aka.

4GB of requested texture data is only 2GB compressed. 8GB of requested texture data is only 4GB compressed. Hence 5.6GB of texture data following the 2.4GB/s (raw) 4.8GB/s (compressed) rules would be transferred to main ram in only 0.58 seconds.
I am not getting confused: provided you have a compression ratio of 2:1 (meaning you amplify/inflate your data from compressed to uncompressed by a factor of 2: it enters the BCPACK decoder and exits it at 2x the rate), and provided you have a physical drive bandwidth of 2.4 GB/s, you can only carry over 4.8 GB worth of data in 1 second and 2.4 GB in 0.5s.

Now, if you take PRT into account you can multiply those last two figures by 2-2.5x as that is the equivalent bandwidth a system downloading full textures instead of portions would need.

You seem to be thinking that the system can transfer at 2x the speed if you pass compressed data to it, but that does not make sense. They call it equivalent bandwidth for the SSD because they assume the data enters the decoder block from the SSD at 2.4 GB/s and then it gets amplified by the compression ratio factor (~2).

This is why they say that the drive has the equivalent compressed bandwidth of 4.8 GB/s because that is the bandwidth it would need if it were transferring uncompressed data.
 
Last edited:

jroc74

Phone reception is more important to me than human rights
It would have been pretty expensive for Sony to go above 1TB with their SSD. I understand why they did it since it was the only way to offer that speed at an affordable price. As for Sony not allowing any expandable memory options it's probably because they need to make sure the SSDs on the market offer the same experience. Microsoft uses a proprietary solution so it's easy for them to offer that option while with Sony it's alot more difficult given the nature of their SSD.

It's definitely coming in the future and will offer the same experience as the soldered drive but the question is when?

Edit: I prefer Sonys solution because it will give PS5 owners access to a wide variety of NvMes on the market at a cheaper price. But I do admit that Microsofts solution is more convenient even though you pay a higher price for it.

Edit 2: I actually thought about a way that Microsoft can easily allow Series owners to access off the shelf NVME market. All they need to do is release a shell that allows people to put approved NVMEs into it. Since the Series I/O isn't as fast as the PS5s those options should be even cheaper. I think it's a great idea for them to explore.
Agree. Each solution has pros n cons. As with damn near everything in life, lol. Sony's way will be cheaper, sooner and probably in the long run as prices go down. 7GB/s drives are already $199. But...MS's way is Plug n Play which is a lot more convenient.
Look at it from this angle, since we clarified the maths before already. If BCPACK compression yields a 2:1 compression ratio and lowers the number to 2.8 GB then you cannot transfer it in 1s.

2.8GB / 4.8GB/s —> this is wrong… 2.8 GB / 2.4 GB/s as 2.4 GB/s is the maximum SSD I/O speed which (if you take decompression into account becomes 4.8 GB/s on average). You are factoring the compression going from 5.6 GB to 2.8 GB and then you are applying the same factor to boost bandwidth again. You should also consider the effects of SFS/PRT+ after compression as you stream compressed data (that gets uncompressed by the I/O unit).

Counting all the multiplication factors in an optimal scenario you could transfer the following amount of data in 1s: 2.4 GB * 2 (BCPACK) * 2.5 (PRT/SFS) = 12 GB.
So, would 12GB/s be considered the best case scenario or the average scenario?
aftermarket SSDs that can match parity with the internal is prob hard / expensive thus no support yet until a product is mass produced and tested.

1440p cuts in to their 4K TV division sales. They want to push their TVs.

VRR isn't supported on their TVs yet, so there's no point in them updating it because again, they want to push their TVs. Allowing VRR would let someone buy an LG OLED and not a Sony.
Yup, NVMe's with 7GB/s just launched late last year. ( I think most of us assume thats gonna be the needed drive speed) Why are some folks acting like its a regular mechanical hard drive?
 
Agree. Each solution has pros n cons. As with damn near everything in life, lol. Sony's way will be cheaper, sooner and probably in the long run as prices go down. 7GB's drives are already $199. But...MS's way is Plug n Play which is alot more convenient.

Pretty much. My plan is to use a huge external for cold storage and playing PS4 games and then I'll just expand the internal once the price is low enough.

I understand that Microsofts solution is very convenient but I'm not so dumb that I'm incapable of installing an NVME into the PS5. It's a great solution for those that are not capable of doing those types of things.
 

Boglin

Member
Well I'm certainly confused at the confusion. From what we know, the Xbox features:
・An SSD capable of 2.4 GB/s (Raw) or 4.8 GB/s (2x compression with custom hardware decompression block).
・SFS which acts as a 2.5x multiplier so the 4.8 GB/s could perform like 12 GB/s.



So going step-by-step, here is what a game requesting 14 GB looks like:

Requested 14 GB -> 5.6 GB SFS. After which you have 4.8 GB/s speed using the compression block to load the 5.6 GB which takes 1.16 seconds.

Or

14GB requested -> 5.6 GB SFS = 2.8 GB of data after 2x compression. You have 2.4 GB/s raw speed to load the 2.8 GB of compressed data which, again, would take 1.16 seconds.
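
The two routes above can be checked with a small sketch. The helper name is made up for illustration, and the 2.5x SFS factor and 2:1 ratio are the thread's assumed figures. The point is that compression is applied exactly once, either to the data size or to the bandwidth, never to both:

```python
def load_time_s(requested_gb, raw_gb_s, compression_ratio, sfs_multiplier=1.0):
    needed_gb = requested_gb / sfs_multiplier    # textures actually needed in RAM
    on_disk_gb = needed_gb / compression_ratio   # compressed bytes read from the SSD
    return on_disk_gb / raw_gb_s                 # the physical link runs at the raw rate

# Route 1: 5.6 GB of textures at the 4.8 GB/s "equivalent" rate
route_1 = 5.6 / (2.4 * 2.0)
# Route 2: 2.8 GB of compressed data at the 2.4 GB/s raw rate
route_2 = (5.6 / 2.0) / 2.4
# Full pipeline starting from the 14 GB request
full = load_time_s(14.0, 2.4, 2.0, sfs_multiplier=2.5)

print(round(route_1, 3), round(route_2, 3), round(full, 3))
```

All three values agree, which is why both descriptions in the post land on the same load time.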
 

Dozer831

Neo Member
Sampler Feedback is a reactive process. Once a higher MIP level becomes visibly needed, then you send an IO request to stream in the higher MIP level. It would take in theory twice as long to stream in the higher texture on Xbox.
 
5.6/2.4/2 = 2.8/2.4 = 1.16667


Why you divide it by two when your 2.8gb is already compressed? Your 2 in the end would be compress ratio, which would be 1 in your case because data is already compressed. Rea used raw data figure, that's why in his example ratio was 2.
The divide by 2 is meant to produce the effect of the 2.4GB/s becoming 4.8GB/s after decompression, otherwise there's zero benefit from compression at all.

The 2 has nothing to do with the first number in the equation, and more to do with the SSD bandwidth. But there are multiple ways to arrive at the same number.

But the main point is if the game needs 5.6GB of textures, then surely the decompression unit isn't grabbing 5.6GB, it's grabbing 2.8GB, which once decompressed becomes 5.6GB of textures in main memory.

I'm fairly certain using the 5.6GB is the inaccurate way here because 5.6GB of compressed texture data when decompressed will become 11.2GB, which is not the amount we are after. It's 5.6GB.

Series X decompresses 11.2GB of texture data into memory in 1.16 seconds using Sampler Feedback Streaming and BCPack. But for just 5.6GB of texture data, that takes just 0.58 seconds.
I am not getting confused: provided you have a compression ratio of 2:1 (meaning you amplify/inflate your data from compressed to uncompressed by a factor of 2: it enters the BCPACK decoder and exits it at 2x the rate), and provided you have a physical drive bandwidth of 2.4 GB/s, you can only carry over 4.8 GB worth of data in 1 second and 2.4 GB in 0.5s.

Now, if you take PRT into account you can multiply those last two figures by 2-2.5x as that is the equivalent bandwidth a system downloading full textures instead of portions would need.

You seem to be thinking that the system can transfer at 2x the speed if you pass compressed data to it, but that does not make sense. They call it equivalent bandwidth for the SSD because they assume the data enters the decoder block from the SSD at 2.4 GB/s and then it gets amplified by the compression ratio factor (~2).

This is why they say that the drive has the equivalent compressed bandwidth of 4.8 GB/s because that is the bandwidth it would need if it were transferring uncompressed data.

Okay, is not 4GB of textures compressed with BCPack equal to 2GB on disk based on the compression ratio? And isn't 3GB of textures equal to 1.5GB on disk compressed?
 
Well I'm certainly confused at the confusion. From what we know, the Xbox features:
・An SSD capable of 2.4 GB/s (Raw) or 4.8 GB/s (2x compression with custom hardware decompression block).
・SFS which acts as a 2.5x multiplier so the 4.8 GB/s could perform like 12 GB/s.



So going step-by-step, here is what a game requesting 14 GB looks like:

Requested 14 GB -> 5.6 GB SFS. After which you have 4.8 GB/s speed using the compression block to load the 5.6 GB which takes 1.16 seconds.

Or

14GB requested -> 5.6 GB SFS = 2.8 GB of data after 2x compression. You have 2.4 GB/s raw speed to load the 2.8 GB of compressed data which, again, would take 1.16 seconds.


In both scenarios you have the Series X's SSD sending 11.2GB of textures to main memory, not 5.6GB.

It seems you guys aren't actually factoring in the Series X's compression at all. You are stopping one step before doing so by keeping the 5.6GB worth of textures in its uncompressed form.

Let's tackle this a slightly different way. Tell me how long the PS5 would take to transfer 5.6GB worth of textures into main memory using a compression ratio of 1.6.
 

PaintTinJr

Member
Well I'm certainly confused at the confusion. From what we know, the Xbox features:
・An SSD capable of 2.4 GB/s (Raw) or 4.8 GB/s (2x compression with custom hardware decompression block).
・SFS which acts as a 2.5x multiplier so the 4.8 GB/s could perform like 12 GB/s.



So going step-by-step, here is what a game requesting 14 GB looks like:

Requested 14 GB -> 5.6 GB SFS. After which you have 4.8 GB/s speed using the compression block to load the 5.6 GB which takes 1.16 seconds.

Or

14GB requested -> 5.6 GB SFS = 2.8 GB of data after 2x compression. You have 2.4 GB/s raw speed to load the 2.8 GB of compressed data which, again, would take 1.16 seconds.
I think the biggest mess in unravelling how the effectiveness of SFS/BCpack + VA + SSD raw stacks up against PS5's solution is that both SFS and BCpack's benefits have alternatives on PS5 with PRT and Oodle textures.

So the actual gain of SFS/BCpack isn't really noteworthy IMO - unless comparing to PC - because the PS5 may even have the advantage to SFS because of lower latency asset check-in/cache scrubbers. And Oodle textures is at least on par with BCpack for rate adaption compression - if not better, and both use the same underlying block compression anyway.

Which then just brings the comparison back to the decompression units and SSD raw speeds, which gives the PS5 much lower latency (x5, from Road to PS5 info and VA's info - indirectly via RTX I/O reveal) and at least double the decompression bandwidth - but probably closer to 4 times in real software, going by John at DF's recent tweets comparing 1 second to 4 second loading, or 2 seconds to 8 seconds.
 

Boglin

Member
In both scenarios you have the Series X's SSD sending 11.2GB of textures to main memory, not 5.6GB.

It seems you guys aren't actually factoring in the Series X's compression at all. You are stopping one step before doing so by keeping the 5.6GB worth of textures in its uncompressed form.

Let's tackle this a slightly different way. Tell me how long the PS5 would take to transfer 5.6GB worth of textures into main memory using a compression ratio of 1.6.
You're wrong. We are factoring compression, but you're factoring in compression twice for 4x compression for some reason.

Lets ignore SFS for a moment.

You want to load 2.8GB of compressed data and the SSD loads 2.4GB/s raw. That takes 1.16 seconds, correct?

That 2.8 GB of data decompresses to 5.6 GB.
So that equals 1.16 seconds to load 5.6GB of data after it is uncompressed.

I just demonstrated 2x compression.
After decompression, the 2.4 GB/s drive has loaded 5.6 GB of data in 1.16 seconds.
Without compression it would have taken 2.33 seconds. Twice as long.


You're saying the 2.8GB turns into 11.2GB after decompression. Let's put that to rest right now.

11.2 GB / 4 = 2.8 GB. You would need 4x compression for the numbers you are getting.
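
One way to sanity-check a claimed load time is to work backwards to the compression ratio it implies. This is only a sketch with a made-up helper name, using the 2.4 GB/s raw figure from the thread:

```python
def implied_compression_ratio(texture_gb, raw_gb_s, seconds):
    """Ratio the decompressor would need for texture_gb to land in RAM
    within `seconds`, given the drive's raw read rate."""
    read_from_disk_gb = raw_gb_s * seconds
    return texture_gb / read_from_disk_gb

# The ~1.16 s answer (5.6 GB at 4.8 GB/s equivalent)
print(implied_compression_ratio(5.6, 2.4, 5.6 / 4.8))
# The ~0.58 s answer (2.8 GB at 4.8 GB/s equivalent)
print(implied_compression_ratio(5.6, 2.4, 2.8 / 4.8))
```

The first case works out to the stated 2x ratio; the second requires roughly 4x, which is the double counting described above.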
 
Last edited:

Boglin

Member
Let's tackle this a slightly different way. Tell me how long the PS5 would take to transfer 5.6GB worth of textures into main memory using a compression ratio of 1.6.
Your math for Xbox is:
2.8GB(5.6GB compressed by 2x)/ 2.4GB/s (raw) / 2 (4.8/2.4 = compressed rate of 2x) = 0.58

Using your math. The PS5
3.5GB(5.6GB compressed by 1.6x with no oodle) / 5.5GB/s(raw) / 1.63 (9/5.5 = compressed rate of 1.63x with no oodle) = 0.39

*The above math doesn't represent reality, it was an answer to a question.
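
For comparison, here is a sketch of the same question with compression counted only once. The class is illustrative, the 1.6 ratio is the one posed in the question, and the raw rates are the figures quoted in this thread; real ratios vary per game.

```python
from dataclasses import dataclass

@dataclass
class Console:
    name: str
    raw_gb_s: float   # raw SSD read rate
    ratio: float      # assumed average compression ratio

    def load_time_s(self, texture_gb):
        # Compressed size on disk, read at the raw rate
        return (texture_gb / self.ratio) / self.raw_gb_s

for c in (Console("Series X", 2.4, 2.0), Console("PS5", 5.5, 1.6)):
    print(f"{c.name}: {c.load_time_s(5.6):.2f} s for 5.6 GB of textures")
```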
 
Last edited:

PaintTinJr

Member
Definitely not how it works. We have a real-time demo showing otherwise.
I wasn't sure exactly what Dozer831 meant by take twice as long, but they are definitely correct about SFS being reactive, with the sampler locations being fed back to bring higher quality (PRT) mip texels into memory - late - and blended by the SFS hardware units to hide the late-ness.

In theory, the late-ness could be twice as long because the feedback request completes later than the bandwidth alone suggests; bandwidth isn't the only limiting factor. The demo doesn't exactly represent real-world game loading conditions, where latency is worse because of contention for bandwidth, and bandwidth is reduced because requests from other processors are competing for it. Without excess headroom in raw or decompressed bandwidth, it will likely fare worse than a system with significantly more raw and decompression unit headroom and latency-alleviating capabilities.
 
Last edited:

PaintTinJr

Member
In both scenarios you have the Series X's SSD sending 11.2GB of textures to main memory, not 5.6GB.

...
I think this is where you are getting confused. It sends the PRT reduced data, that would be equivalent of 11.2GB of standard BC5/DXT5 texture data if it was loading fully resident texture mipmaps, for the geometry being texture mapped.
 
You're wrong. We are factoring compression, but you're factoring in compression twice for 4x compression for some reason.

Lets ignore SFS for a moment.

You want to load 2.8GB of compressed data and the SSD loads 2.4GB raw. That takes 1.16 seconds, correct?

That 2.8 GB of data decompresses to 5.6 GB.
So that equals 1.16 seconds to load 5.6GB of data after it is uncompressed.

I just demonstrated 2x compression.
After decompression, the 2.4 GB/s loaded in 5.6 GB of data in 1.16 seconds.
Without compression it would have taken 2.33 seconds. Twice as long.


You're saying the 2.4GB/s turns into 11.2GB after decompression. Lets put that to rest right now.

11.2 GB / 4 = 2.8 GB. You would need 4x compression for the numbers you are getting.

Correction, you guys are indeed applying compression, but just to the wrong number of Gigabytes.

To get 5.6GB of texture data into Series X main memory it does not require 5.6GB of compressed data off the SSD. It requires only 2.8GB worth of compressed data off the SSD.

I applied compression only once, but you guys don't appear to be factoring at all what the proper size of the textures are in their compressed form when on the SSD BEFORE decompression.
 

Boglin

Member
Correction, you guys are indeed applying compression, but just to the wrong number of Gigabytes.

To get 5.6GB of texture data into Series X main memory it does not require 5.6GB of compressed data off the SSD. It requires only 2.8GB worth of compressed data off the SSD.

I applied compression only once, but you guys don't appear to be factoring at all what the proper size of the textures are in their compressed form when on the SSD BEFORE decompression.
I don't know how else to put it so if my next paragraph doesn't clear it up then we should agree to disagree.

If you agree that the Xbox has 2x compression then 2.8GB on the SSD will decompress to 5.6GB. You cannot decompress it again from 5.6GB to 11.2GB.

In order for you to not have double compression, your algorithm has to be either:
2.8GB(compressed) / 2.4GB/s = 1.16
Or
5.6GB / 2.4GB/s / 2(compression) = 1.16

What you're saying is you have 2.8(compressed)GB/2.4GB/s/2(more compression) = 0.58

I don't know where the confusion lies but you are double compressing.

I honestly think you're just having a brain fart which happens to the best of us.
 
Last edited:

jroc74

Phone reception is more important to me than human rights
Every once in awhile it seems old talking points come back to the surface. I thought I somewhat had a general idea about this but I'll just leave this here so some smarter minds can digest it:




Sony has previously published that the SSD is capable of 5.5 GB/s and expected decompressed bandwidth around 8-9 GB/s, based on measurements of average compression ratios of games around 1.5 to 1. While Kraken is an excellent generic compressor, it struggled to find usable patterns on a crucial type of content : GPU textures, which make up a large fraction of game content. Since then we've made huge progress on improving the compression ratio of GPU textures, with Oodle Texture which encodes them such that subsequent Kraken compression can find patterns it can exploit. The result is that we expect the average compression ratio of games to be much better in the future, closer to 2 to 1.
Since then, Sony has licensed our new technology Oodle Texture for all games on the PS4 and PS5. Oodle Texture lets games encode their textures so that they are drastically more compressible by Kraken, but with high visual quality . Textures often make up the majority of content of large games and prior to Oodle Texture were difficult to compress for general purpose compressors like Kraken.

Code:
Zip                       1.64 to 1
Kraken                    1.82 to 1
Zip + Oodle Texture       2.69 to 1
Kraken + Oodle Texture    3.16 to 1

This is why I asked would 12GB/s be considered the average or best case scenario for Series consoles.....because there is a best case for PS5, Cerny mentioned 22GB/s.

In my mind it's gonna be similar to this: (GB/s; raw/average compressed/best case scenario)

Series consoles:
2.4, 4.8, over 6 or up to 12

PS5:
5.5, 8-9 or 17, 22. I have seen the 17 number before (8-9 is official PS5 specs; I assume 17 is 5.5 x Kraken + Oodle, based on the Kraken dev article), but Cerny mentioned "capable of outputting as much as 22GB/s if the data happened to compress particularly well". So....anyone care to try to break this down in relation to this topic?
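
A quick check of those figures, assuming (as guessed above, not stated by Sony) that the 17 number is simply the raw rate multiplied by the Kraken + Oodle Texture ratio from the quoted table. The ratios come from one benchmark data set, so treat the output as indicative only:

```python
PS5_RAW_GB_S = 5.5

# Ratios as listed in the quoted table (one data set, not a guarantee)
ratios = {
    "Zip": 1.64,
    "Kraken": 1.82,
    "Zip + Oodle Texture": 2.69,
    "Kraken + Oodle Texture": 3.16,
}

def equivalent_gb_s(raw_gb_s, ratio):
    # Decompressed output rate if the drive is read at its raw speed
    return raw_gb_s * ratio

for name, ratio in ratios.items():
    print(f"{name}: {equivalent_gb_s(PS5_RAW_GB_S, ratio):.1f} GB/s")

# Compression ratio that the 22 GB/s peak figure would imply
print(22 / PS5_RAW_GB_S)
```

On these assumptions the Kraken + Oodle Texture row lands near 17 GB/s, and the 22 GB/s peak corresponds to data that compresses at about 4 to 1.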
 
Last edited:
I wasn't sure exactly what Dozer831 meant by take twice as long, but they are definitely correct about SFS being reactive, with the sampler locations being fed back to bring higher quality (PRT) mip texels into memory - late - and blended by the SFS hardware units to hide the late-ness.

In theory, the late-ness could be twice as long because the feedback request completes later than the bandwidth alone suggests; bandwidth isn't the only limiting factor. The demo doesn't exactly represent real-world game loading conditions, where latency is worse because of contention for bandwidth, and bandwidth is reduced because requests from other processors are competing for it. Without excess headroom in raw or decompressed bandwidth, it will likely fare worse than a system with significantly more raw and decompression unit headroom and latency-alleviating capabilities.

Very well said, but why is everybody assuming Xbox Series X possesses no or lackluster latency alleviating capabilities? Microsoft in the demo actually confirm that the multiplier and speed bear out all the same in an actual complex title also. Why do we assume DirectStorage isn't backed up, supported by real hardware? DirectStorage, for example, is what controls the built in decompression unit of Series X

  • New DirectStorage API: Standard File I/O APIs were developed more than 30 years ago and are virtually unchanged while storage technology has made significant advancements since then. As we analyzed game data access patterns as well as the latest hardware advancements with SSD technology, we knew we needed to advance the state of the art to put more control in the hands of developers. We added a brand new DirectStorage API to the DirectX family, providing developers with fine grain control of their I/O operations empowering them to establish multiple I/O queues, prioritization and minimizing I/O latency. These direct, low level access APIs ensure developers will be able to take full advantage of the raw I/O performance afforded by the hardware, resulting in virtually eliminating load times or fast travel systems that are just that . . . fast.

The constant mentions of reducing overhead refer, in fact, to latency-removing mechanisms in the hardware that DirectStorage manages. Sony has their own, but simply hasn't revealed the public name of it.
Existing APIs require the application to manage and handle each of these requests one at a time first by submitting the request, waiting for it to complete, and then handling its completion. The overhead of each request is not very large and wasn’t a choke point for older games running on slower hard drives, but multiplied tens of thousands of times per second, IO overhead can quickly become too expensive preventing games from being able to take advantage of the increased NVMe drive bandwidths.

On top of that, many of these assets are compressed. In order to be used by the CPU or GPU, they must first be decompressed. A game can pull as much data off the disk as it wants, but you still need an efficient way to decompress and get it to the GPU for rendering. By using DirectStorage, your games are able to leverage the best current and upcoming decompression technologies.

In a world where a game knows it needs to load and decompress thousands of blocks for the next frame, the one-at-a-time model results in loss of efficiency at various points in the data block’s journey. The DirectStorage API is architected in a way that takes all this into account and maximizes performance throughout the entire pipeline from NVMe drive all the way to the GPU.

It does this in several ways: by reducing per-request NVMe overhead, enabling batched many-at-a-time parallel IO requests which can be efficiently fed to the GPU, and giving games finer grain control over when they get notified of IO request completion instead of having to react to every tiny IO completion.

In this way, developers are given an extremely efficient way to submit/handle many orders of magnitude more IO requests than ever before ultimately minimizing the time you wait to get in game, and bringing you larger, more detailed virtual worlds that load in as fast as your game character can move through it.
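To put a rough shape on the "overhead multiplied tens of thousands of times" point, here is a toy cost model. It is not the DirectStorage API, and every number in it (request count, request size, per-submission overhead, batch size) is a made-up assumption; only the 2.4GB/s raw figure comes from this thread. It simply treats overhead as a fixed cost paid once per submission.

```python
# Toy cost model only. This is NOT DirectStorage; all constants are assumptions.

def io_time_seconds(num_requests, bytes_per_request, bandwidth_bytes_per_s,
                    overhead_s_per_submission, batch_size=1):
    """Estimate total time when a fixed overhead is paid once per submission."""
    submissions = -(-num_requests // batch_size)  # ceiling division
    transfer = num_requests * bytes_per_request / bandwidth_bytes_per_s
    overhead = submissions * overhead_s_per_submission
    return transfer + overhead

REQUESTS = 50_000    # hypothetical number of small reads
SIZE = 64 * 1024     # hypothetical 64 KiB per read
BANDWIDTH = 2.4e9    # 2.4GB/s raw, the Series X figure quoted in this thread
OVERHEAD = 20e-6     # hypothetical 20 microseconds per submission

one_at_a_time = io_time_seconds(REQUESTS, SIZE, BANDWIDTH, OVERHEAD)
batched = io_time_seconds(REQUESTS, SIZE, BANDWIDTH, OVERHEAD, batch_size=1000)
print(f"one at a time: {one_at_a_time:.3f} s, batched: {batched:.3f} s")
```

The transfer time is identical in both cases; only the number of times the fixed overhead is paid changes, which is the part batching is meant to attack.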



Why NVMe?

NVMe devices are not only extremely high bandwidth SSD based devices, but they also have hardware data access pipes called NVMe queues which are particularly suited to gaming workloads. To get data off the drive, an OS submits a request to the drive and data is delivered to the app via these queues. An NVMe device can have multiple queues and each queue can contain many requests at a time. This is a perfect match to the parallel and batched nature of modern gaming workloads. The DirectStorage programming model essentially gives developers direct control over that highly optimized hardware.

In addition, existing storage APIs also incur a lot of ‘extra steps’ between an application making an IO request and the request being fulfilled by the storage device, resulting in unnecessary request overhead. These extra steps can be things like data transformations needed during certain parts of normal IO operation. However, these steps aren’t required for every IO request on every NVMe drive on every gaming machine. With a supported NVMe drive and properly configured gaming machine, DirectStorage will be able to detect up front that these extra steps are not required and skip all the unnecessary checks/operations making every IO request cheaper to fulfill.
 
You're wrong. We are factoring compression, but you're factoring in compression twice for 4x compression for some reason.

Let's ignore SFS for a moment.

You want to load 2.8GB of compressed data and the SSD reads 2.4GB/s raw. That takes 1.16 seconds, correct?

That 2.8 GB of data decompresses to 5.6 GB.
So that equals 1.16 seconds to load 5.6GB of data after it is uncompressed.

I just demonstrated 2x compression.
After decompression, the 2.4GB/s drive has loaded 5.6GB of data in 1.16 seconds.
Without compression it would have taken 2.33 seconds. Twice as long.
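The arithmetic above can be sketched in a few lines of Python. This is only a restatement of the numbers in this post, assuming a flat 2x compression ratio and a decompressor that keeps up with the drive; the function name is made up for illustration. Note the thread truncates 1.1667 to 1.16.

```python
def load_seconds(uncompressed_gb, raw_gb_per_s, compression_ratio=1.0):
    """Time to read enough compressed data to yield `uncompressed_gb` in memory."""
    on_disk_gb = uncompressed_gb / compression_ratio  # compression is applied once, here
    return on_disk_gb / raw_gb_per_s

RAW = 2.4  # GB/s, raw SSD speed

with_compression = load_seconds(5.6, RAW, compression_ratio=2.0)  # reads 2.8GB
without_compression = load_seconds(5.6, RAW)                      # reads 5.6GB

# The disputed 0.58 s figure: starting from the already-compressed 2.8GB and
# dividing by the ratio again. That only delivers 2.8GB into memory, not 5.6GB.
double_counted = load_seconds(2.8, RAW, compression_ratio=2.0)

print(f"{with_compression:.2f} s, {without_compression:.2f} s, {double_counted:.2f} s")
```

The first argument is always the size after decompression, so passing the on-disk size there is exactly the double-counting mistake.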


You're saying the 2.8GB turns into 11.2GB after decompression. Let's put that to rest right now.

11.2 GB / 4 = 2.8 GB. You would need 4x compression for the numbers you are getting.
There's COMPRESSION, and then there's SFS on top of it which acts as a multiplier.

Very simple.

SSD and I/O speed:
2.4GB(raw) = 4.8GB(uncompressed) per second


Game requirement = 9.6GB:
Without SFS - 9.6GB of data (4.8GB/s) = 2 seconds
With SFS - 4.8GB of data (4.8GB/s) = 1 second


Throughput PER SECOND:
2x multiplier with SFS = 9.6GB effective per second
2.5x multiplier with SFS = 12GB effective per second
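The same breakdown as a small Python sketch, assuming the 2x average compression ratio and the 2x/2.5x SFS multipliers used in this thread (none of these are measured values, and the function name is hypothetical):

```python
def effective_gb_per_s(raw_gb_per_s, compression_ratio, sfs_multiplier=1.0):
    """Texture data 'delivered' per second once compression and SFS savings are counted."""
    return raw_gb_per_s * compression_ratio * sfs_multiplier

RAW = 2.4          # GB/s off the SSD
COMPRESSION = 2.0  # assumed average ratio used in the thread

decompressed = effective_gb_per_s(RAW, COMPRESSION)
print(f"decompressed: {decompressed:.1f} GB/s")
for multiplier in (2.0, 2.5):
    rate = effective_gb_per_s(RAW, COMPRESSION, multiplier)
    print(f"SFS {multiplier}x: {rate:.1f} GB/s effective")

# The 9.6GB example: SFS shrinks what must be loaded, not how fast the drive is.
requirement_gb = 9.6
seconds_without_sfs = requirement_gb / decompressed
seconds_with_sfs = (requirement_gb / 2.0) / decompressed
print(f"{seconds_without_sfs:.1f} s without SFS, {seconds_with_sfs:.1f} s with SFS")
```

The "effective" rate is a bookkeeping figure: the drive still moves 2.4GB/s and the decompressor still outputs 4.8GB/s; SFS just avoids loading data that would not have been sampled.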
 

Boglin

Member
There's COMPRESSION, and then there's SFS on top of it which acts as a multiplier.

Very simple.

SSD and I/O speed:
2.4GB(raw) = 4.8GB(uncompressed) per second


Game requirement = 9.6GB:
Without SFS - 9.6GB of data (4.8GB/s) = 2 seconds
With SFS - 4.8GB of data (4.8GB/s) = 1 second


Throughput PER SECOND:
2x multiplier with SFS = 9.6GB effective per second
2.5x multiplier with SFS = 12GB effective per second

I don't disagree with anything you're saying and nothing I said contradicts it. I was discussing compression alone due to some confusion happening.

Let's ignore SFS for a moment.
 

PaintTinJr

Member
Very well said, but why is everybody assuming Xbox Series X possesses no, or only lackluster, latency-alleviating capabilities? Microsoft in the demo actually confirms that the multiplier and speed bear out all the same in an actual complex title also. Why do we assume DirectStorage isn't backed by real hardware? DirectStorage, for example, is what controls the built-in decompression unit of Series X.



The constant mentions of reducing overhead in fact refer to latency-removing mechanisms in the hardware that DirectStorage manages. Sony has its own, but simply hasn't revealed its public name.
IIRC DirectStorage/VA by Xbox is essentially a solution shared with Nvidia's RTX I/O add-on card for the RTX 3xxx series GPUs, and the latency reduction isn't 100x versus old interfaces - like PS5 is versus PS4 - but just a 20x improvement at best.

The I/O complex in the PS5 is a lot of hardware just for I/O, and even Tim schooled that YouTube influencer who tried to imply PC could get close to PS5 SSD bandwidth and check-in latency. Carmack even briefly made a comment regarding bypassing kernel mapped memory (IIRC) on PC to reduce latency - before probably being given a PS5 devkit or a phone call from Tim, and quietly saying no more - suggesting that PS5's I/O is the paradigm shift that PS5 owners mostly think it is/will be.
 

BeardGawd

Banned
While I feel MS has the more intriguing solution with SFS, it can't be denied the PS5 absolutely has the superior I/O solution.

But there are several things that aren't really taken into account. For one, how much RAM does the PS5 reserve for its OS? If the PS5 uses 1 or 2GB more RAM for its OS, that leaves more room on the Xbox to buffer data. The UE5 demo on PS5 was only using 768MB of RAM for streaming data. That space could be doubled on Series X to compensate for the slower I/O, for instance. Both solutions are orders of magnitude better than last gen. I doubt many devs will even get close to the limits.
 

Gurney

Neo Member
This is why I asked would 12GB/s be considered the average or best case scenario for Series consoles.....because there is a best case for PS5, Cerny mentioned 22GB/s.

                Xbox S|X   PS5
RAW             2.4GB/s    5.5GB/s
Compressed Max  ~6GB/s     ~22GB/s

RAW is the maximum bandwidth of the SSD.

Compressed Max is the maximum decompression speed of the hardware decompression chip.

So the 22GB/s figure needs to be compared to the 6GB/s figure.



If you want to compare the 12GB/s "effective" bandwidth of the Xbox you would first have to figure out how effective PRT+ is on the PS5.
 

jroc74

Phone reception is more important to me than human rights
And skimming the thread...I see the Kraken Oodle stuff was already posted.... numerous times. lol
 
I don't know how else to put it, so if my next paragraph doesn't clear it up then we should agree to disagree.

If you agree that the Xbox has 2x compression then 2.8GB on the SSD will decompress to 5.6GB. You cannot decompress it again from 5.6GB to 11.2GB.

In order for you to not have double compression, your algorithm has to be either:
2.8GB(compressed) / 2.4GB/s = 1.16
Or
5.6GB / 2.4GB/s / 2(compression) = 1.16

You have 2.8GB(compressed) / 2.4GB/s / 2(more compression) = 0.58

I don't know where the confusion lies but you are double compressing.

I honestly think you're just having a brain fart which happens to the best of us.

Hmm, I might finally see what you're saying now. I was mostly stuck on why you guys insisted on only using the 5.6GB figure rather than the 2.8GB figure. I wasn't viewing the divide by two at the end as extra compression. I literally just now caught it and realize where I personally got confused. Despite me pointing out these figures to highlight why I think Sampler Feedback Streaming is such a big deal, I quite literally got distracted on this 5.6GB vs 2.8GB stuff that I lost track of the point I was trying to make in the first place: Sampler Feedback Streaming is making possible a scenario in which the effective equivalent of 14GB of textures are loaded into main memory in just 1.16 seconds, but only needing to fill main memory with 5.6GB of data in order to get the job done, a massive memory savings.

That's what I meant to get at the entire time, Sampler Feedback Streaming's memory efficiency. I got lost on the 2.8GB vs 5.6GB stuff. I acknowledge, though, that I was indeed wrong with my calculations. Turns out I really was double compressing, and in doing so totally forgetting my main point about SFS.

I used the huge 14GB figure (exceeding Series X usable game memory) as an example to make a larger point about SFS, but lost that point in my mistaken belief that the 5.6GB of texture data was moved into main memory in just 0.58 seconds. I was wrong the whole time. It really is 1.16, just like you guys said. The point I was getting at, but lost track of, was this: if a title requested 5.6GB worth of textures (not 14GB this time), SFS's 2.5x efficiency would turn that into 2.24GB of data, which Series X would load into main memory in just 0.46 seconds. I began thinking of the 5.6GB as an actual 5.6GB, and not as part of a larger effective figure, which was the whole point for me at least.
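For anyone wanting to check both scenarios stage by stage, here is a minimal sketch. It assumes the thread's working numbers (2.5x SFS efficiency, 2x compression, 2.4GB/s raw) and a hypothetical helper name; real efficiency will vary per scene and per texture.

```python
def stream_request(requested_gb, sfs_efficiency=2.5, compression_ratio=2.0, raw_gb_per_s=2.4):
    """Return (GB resident in memory, GB read from SSD, seconds) for one texture request."""
    in_memory_gb = requested_gb / sfs_efficiency   # SFS: load only what gets sampled
    on_ssd_gb = in_memory_gb / compression_ratio   # compression: counted exactly once
    seconds = on_ssd_gb / raw_gb_per_s             # the drive only ever moves raw bytes
    return in_memory_gb, on_ssd_gb, seconds

for requested in (14.0, 5.6):
    in_memory, on_ssd, seconds = stream_request(requested)
    print(f"{requested}GB requested -> {in_memory:.2f}GB in memory, "
          f"{on_ssd:.2f}GB read, {seconds:.2f}s")
```

The 14GB case lands on 5.6GB in memory and roughly 1.17 seconds (the 1.16 above, truncated), and the 5.6GB case on 2.24GB in memory and roughly 0.47 seconds.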

Thank you guys for being patient with my ass lol. It wasn't easy I'm sure.

 

PaintTinJr

Member
Every once in a while it seems old talking points come back to the surface. I thought I somewhat had a general idea about this but I'll just leave this here so some smarter minds can digest it:







Code:
Zip    1.64 to 1
Kraken    1.82 to 1
Zip + Oodle Texture    2.69 to 1
Kraken + Oodle Texture    3.16 to 1

This is why I asked would 12GB/s be considered the average or best case scenario for Series consoles.....because there is a best case for PS5, Cerny mentioned 22GB/s.

In my mind it's gonna be similar to this (GB/s; raw / average compressed / best case scenario):

Series consoles:
2.4, 4.8, over 6 or up to 12

PS5:
5.5, 8-9 or 17, 22. I have seen the 17 number before (8-9 is official PS5 specs; I assume 17 is 5.5 x Kraken + Oodle, based on the Kraken dev article), but Cerny mentioned "capable of outputting as much as 22GB/s if the data happened to compress particularly well". So... anyone care to try to break this down in relation to this topic?
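One way to sanity-check those guesses is to multiply the raw speed by the ratios listed under "Code:" above. This sketch assumes the ratio converts directly into throughput and that the decompressor is never the bottleneck, which is a simplification; the helper name is made up.

```python
RATIOS = {
    "Zip": 1.64,
    "Kraken": 1.82,
    "Zip + Oodle Texture": 2.69,
    "Kraken + Oodle Texture": 3.16,
}
PS5_RAW = 5.5  # GB/s

effective = {name: PS5_RAW * ratio for name, ratio in RATIOS.items()}
for name, rate in effective.items():
    print(f"{name}: {rate:.1f} GB/s")

def implied_ratio(output_gb_per_s, raw_gb_per_s):
    """Compression ratio needed for the raw drive speed to yield a given output rate."""
    return output_gb_per_s / raw_gb_per_s

print(f"22GB/s on PS5 needs {implied_ratio(22.0, PS5_RAW):.1f} to 1")
print(f"6GB/s on Series X needs {implied_ratio(6.0, 2.4):.1f} to 1")
```

Under those assumptions, Kraken + Oodle Texture at 3.16 to 1 comes out a little over 17GB/s, which fits the guess about where the 17 figure comes from, and 22GB/s would need data that compresses 4 to 1.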

All textures already use DXT/BC block compression (DXT5/BC3, 3Dc/BC5, etc.) at fixed compression ratios, and have done for generations, but both BCPack and Oodle Texture do rate adaptation, which is just a fancy way of saying they take the DXT/BC encoders and provide alternative lossy encodings of the textures to help Kraken/zlib losslessly compress those textures better.

The main problem is that zlib (and the enhanced zlib-style compressor Kraken) use block encoding, which means they aren't always so good at taking DXT/BC data (block encoded too, but lossy) and compressing it further.

By using rate adaptation you can marginally tweak the quality of the lossy texture encoding and potentially find bigger zlib compression gains. The downside is that searching for these gains is computationally time-consuming: you generate new DXT/BC texture blocks from RAW, then use zlib/Kraken to generate a compressed version to work out the compression gain. You may still need a human to choose and trade off ratio against acceptable quality loss for the lossy texture, because the signal-to-noise ratio might not be right for the context in which the texture is used.
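A crude way to see the principle, using only Python's standard zlib module. To be clear, this is not how BCPack or Oodle Texture work internally: the "texture" is just seeded random bytes standing in for high-entropy block data, and the hypothetical `coarsen` step (zeroing low bits) is a stand-in for choosing a slightly lossier encoding. It only shows that giving up some precision can make the same lossless compressor do better.

```python
import random
import zlib

rng = random.Random(0)
# Stand-in for already block-compressed texture data: high-entropy bytes.
blocks = bytes(rng.randrange(256) for _ in range(64 * 1024))

def zlib_ratio(data):
    """Lossless compression ratio achieved by zlib at its highest level."""
    return len(data) / len(zlib.compress(data, 9))

def coarsen(data, keep_bits):
    """Hypothetical 'lossier encoding': keep only the top `keep_bits` of every byte."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)

# Try each quality level, then measure what the lossless stage gains from it.
ratios = {bits: zlib_ratio(coarsen(blocks, bits)) for bits in (8, 6, 4)}
for bits, ratio in ratios.items():
    print(f"keep {bits} bits: {ratio:.2f} to 1")
```

The loop is the expensive search described above in miniature: encode at a candidate quality, run the lossless compressor, compare the gain, and leave the "is this quality loss acceptable" call to someone looking at the result.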
 
You should read the whole discussion from the beginning, because SenjutsuSage actually transferred 14GB of data in just 0.58 sec, and we are trying to explain to him that he is wrong.

I meant "effective" 14GB, not actual. I lost the overall point in us disagreeing over the 5.6GB vs 2.8GB. I was just stuck on those damn numbers, and lost what I was trying to say altogether. :messenger_grinning_sweat:

I'm man enough to admit when I was wrong. I was wrong lol.
 
Threads like this bring out the worst parts of the forum sometimes.

Not really, I would say this was the forum at its best. Respectful disagreement and patience even when I was completely off base and insistent on a specific point. We ended on respectful terms, and I personally respect each person I interacted with all the more come the end. Plus I admitted I was in the wrong. This is how discussion should be on here when we disagree.
 

Boglin

Member
An effective 14GB in 1.16 seconds is still ridiculously fast. I really hope multiplatform games take advantage of these new I/O systems because it would be a shame if it were only relegated to first party stuff.
 

LiquidMetal14

hide your water-based mammals
Not really, I would say this was the forum at its best. Respectful disagreement and patience even when I was completely off base and insistent on a specific point. We ended on respectful terms, and I personally respect each person I interacted with all the more come the end. Plus I admitted I was in the wrong. This is how discussion should be on here when we disagree.
I've been at this long enough to have perhaps too high a standard, one where less drive-by posting is tolerated. If it's conducive to the conversation then great.

If it wasn't the case we wouldn't have had a mod post about it earlier to settle things down.

I only read and process what I see. I'm sure others can observe it all and stay above it as can I but you still have to sift through it to get to the good posts most of the time.
 
I don't disagree with anything you're saying and nothing I said contradicts it. I was discussing compression alone due to some confusion happening.

Ah yes, my mistake. Sorry about that! I read the entire conversation. Yea, he's wrong.

Correction, you guys are indeed applying compression, but just to the wrong number of Gigabytes.

To get 5.6GB of texture data into Series X main memory it does not require 5.6GB of compressed data off the SSD. It requires only 2.8GB worth of compressed data off the SSD.

I applied compression only once, but you guys don't appear to be factoring at all what the proper size of the textures are in their compressed form when on the SSD BEFORE decompression.

You applied a 2x compression two times. The raw speed of the drive is the raw speed... you do a single decompression into memory, anything before that is raw data.

Simple Form
11.2 = 5.6 raw data from SSD
Series X SSD = 2.4GB/s raw
5.6/2.4 = 2.3 seconds



Without SFS:
11.2GB required by the game (decompressed)
11.2GB decompressed = 5.6GB raw
5.6GB(SSD) > 11.2GB(RAM)
To transfer 5.6GB raw from the SSD
5.6/2.4(raw drive speed) = 2.3 seconds

With SFS:

5.6GB required by the game (decompressed)
5.6GB decompressed = 2.8GB raw
2.8GB(SSD) > 5.6GB(RAM)
To transfer 2.8GB raw from the SSD
2.8/2.4(raw drive speed) = 1.16 seconds
 
An effective 14GB in 1.16 seconds is still ridiculously fast. I really hope multiplatform games take advantage of these new I/O systems because it would be a shame if it were only relegated to first party stuff.

It'll get used once they transition to purely next gen development. Lots of crazy potential for this stuff. And, as is always the case, Microsoft needs to show it in the games for Xbox because Sony is certainly showing it for PS5.
 
Ah yes, my mistake. Sorry about that! I read the entire conversation. Yea, he's wrong.



You applied a 2x compression two times. The raw speed of the drive is the raw speed... you do a single decompression into memory, anything before that is raw data.

Simple Form
11.2 = 5.6 raw data from SSD
Series X SSD = 2.4GB/s raw
5.6/2.4 = 2.3 seconds



Without SFS:
11.2GB required by the game (decompressed)
11.2GB decompressed = 5.6GB raw
5.6GB(SSD) > 11.2GB(RAM)
To transfer 5.6GB raw from the SSD
5.6/2.4(raw drive speed) = 2.3 seconds

With SFS:

5.6GB required by the game (decompressed)
5.6GB decompressed = 2.8GB raw
2.8GB(SSD) > 5.6GB(RAM)
To transfer 2.8GB raw from the SSD
2.8/2.4(raw drive speed) = 1.16 seconds

Yep, correct on the 1.16 figure. I was totally wrong there. I got mixed up thinking about something else. As to the without SFS side, we started from a 14GB figure, but I get what you mean.

Without SFS such a texture load couldn't even be attempted is my thinking.
 

BeardGawd

Banned
The max amount of textures Series X|S can pull in is 4.8GB of compressed textures. Due to the efficiency of SFS, those 4.8GB of textures are equivalent to 14.4GB of textures without SFS.

SFS gives on average 3x efficiency.
 

Rea

Member
Going off of my post above, I think people only think my calculation is wrong because they are using the wrong data point. People are using the end result rather than the compressed size of 5.6GB worth of textures, which with BCPack is 2.8GB.
1st, your understanding of compression ratio is wrong. Let's say BCPack has 50% compression efficiency; that means the ratio is 1.5 to 1. So decompressing a BCPack-format texture gives 2.6 × 1.5 = 3.9GB, not 5.6GB.
2nd, when SFS requests data from the SSD, the data on the SSD is always compressed (raw data). So you cannot use the 4.8GB/s speed. That's after applying the compression ratio of zlib+BCPack; it is not for BCPack only.
 