
Xbox Series X’s BCPack Texture Compression Technique 'might be' better than the PS5’s Kraken

Dozer831

Neo Member
Yes, the GPU could brute force through the data but textures etc must be processed, applied to polygons, shaded, .....
It just seems that some in this forum think that the data must just be loaded and is ready to display. Games are not movies where this is possible. Here we have just raw data that must be processed.

The other thing is really memory contention. The good old Jaguar cores did less than 20GB/s in normal cases but stole much more bandwidth that the GPU could have used. The HDD in the old systems was not capable enough to really make a bandwidth difference. But now the SSD is one of the "big players" that can get bandwidth hungry quite fast. In exchange you save a lot of memory space. It always depends on what you want to do. 10GB/s of bandwidth can do a lot of stuff; so can a few gigs of saved texture buffer.

Don't get me wrong, this topic is ... well not really the best, as it is clear that PS5 has the edge in theory. But in real life applications it won't make that much of a difference. That is my whole point. And I only answered here, because someone wanted to make a really unrealistic special case claim. That is just not how the whole thing works. Console games won't be IO limited anymore and that is what all current-gen machines deliver.
Btw, we should also not forget, the diminishing returns you get from even higher asset quality. At some point it doesn't really matter how much higher res some textures are, because of diminishing returns. We already saw that with the last 2 generations. Games looked better, for sure, but the steps get smaller and smaller.
 
Last edited:

Thief1987

Member
Next post, I added "in a meaningful way" ;)
Yes, the GPU could brute force through the data but textures etc must be processed, applied to polygons, shaded, .....
It just seems that some in this forum think that the data must just be loaded and is ready to display. Games are not movies where this is possible. Here we have just raw data that must be processed.

The other thing is really memory contention. The good old jaguar cores did less than 20GB/s in normal cases but stole much more bandwidth the GPU could need. The HDD in the old systems was not capable enough to really make a bandwidth difference. But now the SSD is one of the "big players" that can get bandwidth hungry quite fast. But therefore you save much memory space. Always depends on what you want to do. 10GB/s Bandwidth can do a lot of stuff. A few gigs of saved texture buffer also.

Don't get me wrong, this topic is ... well not really the best, as it is clear that PS5 has the edge in theory. But in real life applications it won't make that much of a difference. That is my whole point. And I only answered here, because someone wanted to make a really unrealistic special case claim. That is just not how the whole thing works. Console games won't be IO limited anymore and that is what all current-gen machines deliver.
Btw, we should also not forget, the diminishing returns you get from even higher asset quality. At some point it doesn't really matter how much higher res some textures are, because of diminishing returns. We already saw that with the last 2 generations. Games looked better, for sure, but the steps get smaller and smaller.
The GPU might be constrained if it had to operate on tens or even hundreds of GB of texture data at a time, but in the case of consoles we have only 12GB of VRAM max, and much less in real-life scenarios because there is much more data in RAM than just textures. I don't know how you can call yourself a software developer and spout this bullshit.
 
Last edited:
My best guess is that the average is around 4.8GB/s, considering a compression ratio of 2 when combining Zlib+BCPack. For some best cases it will be a 2.5 compression ratio, so 2.4×2.5 = 6. We don't know exactly what the compression efficiency of BCPack is. My personal guess is no more than 3, so it can probably hit 2.4×3 = 7.2GB/s.
So basically you're guessing?
 
Last edited:

MonarchJT

Banned
As Leviathan said on Twitter, the thing is there is a limit to the data size and assets produced for a game, which makes both subsystems overkill, at least for this gen. No matter how much PR there is behind it.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
As Leviathan said on Twitter, the thing is there is a limit to the data size and assets produced for a game, which makes both subsystems overkill, at least for this gen. No matter how much PR there is behind it.
Again, that was a flawed argument and as partisan smelling as saying that anything beyond 10 TFLOPS is in and of itself overkill.
 

MonarchJT

Banned
Again, that was a flawed argument and as partisan smelling as saying that anything beyond 10 TFLOPS is in and of itself overkill.
Then one day maybe you will explain to me for how many seconds you intend to saturate an I/O capable of 22GB/s, considering the average for this gen of 100GB per game (and we both know the real figure will be more or less half that). Amaze me.
 
Last edited:

Dr Bass

Member
Educated guesses based on information available from multiple channels (including tech presentations on either) is fair game, not sure why the pearl clutching here
He wasn't pearl clutching at all, he was simply pointing out that you pulled a wild-ass guess in your statement. He might as well have asked you how many licks it takes to get to the center of a Tootsie Pop…

“3”
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
He wasn’t pearl clutching at all, simply pointing out you pulled a wild ass guess in your statement. He might as well have asked you how many licks does it take to get to the center of a tootsie pop …


“3”
It was not a wild guess as people have been using the numbers both companies have presented as well as data coming from RAD Game Tools (Oodle Texture folks). You can also look at MS’s Hot Chips presentation for reference too (before people take > 6 GB/s as meaning any number between 6 and 200 common sense is advised :p).
Sometimes people do not post as if they were making research papers with all the references linked and vetted.

Oodle Texture blog:

Hot Chips references:
[Hot Chips slide images]

etc…

If you think the numbers do not work, you can take those numbers and compare them to the official numbers Sony gave (I do not think that on XSX|S you go through BCPACK and then zlib, for example). The Oodle folks confirmed Sony's initial numbers did not include the use of Oodle Texture, btw; see also the later official Oodle Texture technical blog articles by its devs.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
Then one day maybe you will explain to me for how many seconds you intend to saturate an I/O capable of 22GB/s, considering the average for this gen of 100GB per game (and we both know the real figure will be more or less half that). Amaze me.
You tend to provision a unit to ensure you can cover edge cases unless they are 0.000001% of the scenarios you envision and Oodle folks report games hitting over 3:1 compression ratios (even 3 * 5.5 GB/s is quite a lot…).

Current consoles are designed to have very minimal prefetch buffers meant to store “seconds of gameplay” (they try to maximise the detail used to paint the actual picture on screen as much as possible instead) and they deal with RAM seeing a very modest increase over last generation consoles too.
Still, this does not mean every frame has unique texture and models data… nor that textures are never reused throughout the game, but that data is constantly and very quickly swapped in and out. Data is reused but not kept in main RAM for long.

22 GB/s is the maximum speed the decoder can inflate the compressed input stream: if you were to assume a 4:1 compression ratio (which is a not very likely edge case as Cerny himself said on stage, but Oodle did report practical scenarios hitting 3.16:1) that would require a 5.5 GB/s data rate coming in.
Also, SSD space is at a premium, so high compression ratios for data on disk are of the utmost importance. You will see PS5 games bucking the trend and becoming smaller than people thought, but with asset quality going ever higher and higher (aka they seem to have a solution that compresses well and is very fast at decompressing ;)).
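To put rough numbers on that, here is a back-of-the-envelope sketch (the 5.5 GB/s raw rate and 22 GB/s decoder cap are the published PS5 figures; the ratios are just the ones discussed in this thread, nothing measured):

```python
# Back-of-the-envelope: effective output = raw SSD rate x compression ratio,
# capped by the maximum rate the decompressor can emit (22 GB/s per Sony).
# The ratios below are the ones discussed in this thread, not measurements.

RAW_GBPS = 5.5           # PS5 raw SSD rate
DECODER_CAP_GBPS = 22.0  # peak decompressed output

def effective_rate(ratio: float) -> float:
    return min(RAW_GBPS * ratio, DECODER_CAP_GBPS)

for ratio in (2.0, 3.16, 4.0):
    print(f"{ratio}:1 -> {effective_rate(ratio):.1f} GB/s effective")
# 2.0:1 -> 11.0, 3.16:1 -> 17.4, 4.0:1 -> 22.0 (the ratio at which the decoder caps out)
```

The 4:1 row is exactly where the decoder ceiling kicks in, which is why it reads as an edge case.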
 
Last edited:

MonarchJT

Banned
You tend to provision a unit to ensure you can cover edge cases unless they are 0.000001% of the scenarios you envision and Oodle folks report games hitting over 3:1 compression ratios (even 3 * 5.5 GB/s is quite a lot…).

Current consoles are designed to have very minimal prefetch buffers meant to store “seconds of gameplay” (they try to maximise the detail used to paint the actual picture on screen as much as possible instead) and they deal with RAM seeing a very modest increase over last generation consoles too.
Still, this does not mean every frame has unique texture and models data… nor that textures are never reused throughout the game, but that data is constantly and very quickly swapped in and out.
22 GB/s is the maximum speed the decoder can inflate the compressed input stream: if you were to assume a 4:1 compression ratio (which is a not very likely edge case as Cerny himself said on stage, but Oodle did report practical scenarios hitting 3.16:1) that would require a 5.5 GB/s data rate coming in.

Also, SSD space is at a premium, so high compression ratios for data on disk are of the utmost importance. You will see PS5 games bucking the trend and becoming smaller than people thought, but with asset quality going ever higher and higher (aka they seem to have a solution that compresses well and is very fast at decompressing ;)).
Pana, it is amazing that the I/O of the PS5 is so fast, but it is easy math: 100GB (some of the biggest games) × 3 = 300GB. Imagining that by pure fantasy we can quintuple this figure by reusing assets, you have 1500GB (haha, and this is honestly impossible); you would have 68 seconds of game maximized at 22GB/s. Acting silly and doubling this number with alien PS6 compression tech, we go from 100GB of data to 3000GB, and at 22GB/s we could have an incredible 2.2 minutes of gameplay. I repeat that the I/O is overkill, exactly like the one in Series X/S, and it is because of the size of the games, not because it is useless to have that speed. Exactly as Leviathan said. Now back with our feet on the ground please... and let's say that next-gen games probably need less than 500MB/s during streaming, and that's already more than 10 times what Naughty Dog needed in The Last of Us 2.
And I don't even want to go into the power of the GPU, which should be able to rework all this data.
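In case the arithmetic isn't obvious, here is the same sum as a quick sketch (every input is an assumption from this post, not a measurement):

```python
# The streaming-duration sum from this post, spelled out.
# Every input here is an assumption from the post, not a measurement.

GAME_SIZE_GB = 100      # compressed size of one of the biggest games
COMPRESSION_RATIO = 3   # assumed average compression ratio
REUSE_FACTOR = 5        # "pure fantasy" asset-reuse multiplier
PEAK_IO_GBPS = 22       # PS5 peak decompressed output

streamed_gb = GAME_SIZE_GB * COMPRESSION_RATIO * REUSE_FACTOR   # 1500 GB
print(streamed_gb / PEAK_IO_GBPS)   # ~68 seconds of streaming at the absolute peak
```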
 
Last edited:
Could very well be possible that the I/O in the PS5 is so fast that it doesn't need those screens to hide the loading to the main menu. While on Xbox you still need them. Sort of like how RE8 on the PS5 doesn't have loading screens while the Series version still has them.

Or it could be like you said just a feature built into the OS.
That kind of stuff, based on what we know about the PS5 I/O setup, comes easier because its drive is just plain faster, combined with Kraken.

As for Series X, though it is still pretty fast and has its own excellent texture compression solution in BCPack, it has a slower SSD, which is why the overall I/O solution with the DirectStorage API is designed to be utilized in conjunction with Sampler Feedback Streaming and BCPack.
Marketing fluff has polluted this conversation to a hilarious degree.

If the Xbox's SSD has a bandwidth of 2.4GB/s then there is nothing - and I mean nothing you can do to increase that bandwidth other than 'overclocking' it.
The bandwidth is the bandwidth.

All that shit like Zlib, SFS, Kraken or whatever does is reduce the size of a given dataset that needs to be moved, by compressing it. It doesn't multiply the bandwidth; it just divides the amount of data that has to be moved.

Yes, you aren't changing the raw bandwidth ever, but you are definitely getting better "effective" bandwidth with compression and memory efficiency techniques such as SFS, which technically works out to being the same thing as if the raw did change because you actually are getting increased I/O performance. That's not imaginary. This is how the PS5 SSD can reach 8-9GB/s typical with compression or potentially as high as 22GB/s with really well compressed data. This is how Series X's 2.4GB/s raw becomes 4.8GB/s effective with compression or can be as much as 12GB/s effective (and even higher) with Sampler Feedback streaming.

Based on the Series X's SSD spec it should be able to move 10GB into memory without compression in 4.16 seconds. With compression, that same feat is accomplished in 2.08 seconds. The same work was done, but it was done an entire 2 seconds faster because the effective performance increased. How fast would that be with Sampler Feedback Streaming? 0.83 seconds. The effective bandwidth of the SSD then becomes roughly 12GB/s, because Sampler Feedback Streaming immediately decided that 6 out of that 10GB of data was unnecessary.
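If anyone wants to check that arithmetic, here it is spelled out (a rough sketch; the 2:1 compression is Microsoft's quoted average, while the "only 40% of the request is actually needed" SFS figure is an assumption for illustration):

```python
# Effective-bandwidth arithmetic for the 10 GB example above.
# 2:1 compression is Microsoft's quoted average; the SFS figure (only 40% of the
# requested texture data actually needed) is an assumption for illustration.

RAW_GBPS = 2.4
COMPRESSION = 2.0
SFS_NEEDED_FRACTION = 0.4        # 4 GB actually needed out of the 10 GB request

request_gb = 10.0
print(request_gb / RAW_GBPS)                      # 4.17 s raw
print(request_gb / (RAW_GBPS * COMPRESSION))      # 2.08 s with compression
needed_gb = request_gb * SFS_NEEDED_FRACTION      # SFS trims the request to 4 GB
seconds = needed_gb / (RAW_GBPS * COMPRESSION)    # 0.83 s
print(seconds, request_gb / seconds)              # ~12 GB/s effective for the original 10 GB
```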

The Technical Director for Dirt 5 has said he was able to move 10GB into RAM on Series X in only 2 seconds WITHOUT using any of the compression hardware. He did it with the raw performance of the Series X SSD. He said you have to make many requests at once and organize your data properly and know where it's going in memory, and this is what you can achieve. So,

Significantly cutting down on the amount of data that needs to be brought into main memory while still getting the same visual result is one of the best ways possible to improve effective I/O.
 

Godfavor

Member
Next post, I added "in a meaningful way" ;)
Yes, the GPU could brute force through the data but textures etc must be processed, applied to polygons, shaded, .....
It just seems that some in this forum think that the data must just be loaded and is ready to display. Games are not movies where this is possible. Here we have just raw data that must be processed.

The other thing is really memory contention. The good old jaguar cores did less than 20GB/s in normal cases but stole much more bandwidth the GPU could need. The HDD in the old systems was not capable enough to really make a bandwidth difference. But now the SSD is one of the "big players" that can get bandwidth hungry quite fast. But therefore you save much memory space. Always depends on what you want to do. 10GB/s Bandwidth can do a lot of stuff. A few gigs of saved texture buffer also.

Don't get me wrong, this topic is ... well not really the best, as it is clear that PS5 has the edge in theory. But in real life applications it won't make that much of a difference. That is my whole point. And I only answered here, because someone wanted to make a really unrealistic special case claim. That is just not how the whole thing works. Console games won't be IO limited anymore and that is what all current-gen machines deliver.
Btw, we should also not forget, the diminishing returns you get from even higher asset quality. At some point it doesn't really matter how much higher res some textures are, because of diminishing returns. We already saw that with the last 2 generations. Games looked better, for sure, but the steps get smaller and smaller.
I think that's what a traditional engine would do; it cannot handle complex meshes, and LOD transitions would have pop-in. Mesh shaders / the geometry engine are going to counter that by dynamically reducing or adding vertices on the fly according to the player's distance, without having to rely on traditional LODs; that could go as small as a polygon per pixel if possible.

See Futuremark's latest benchmark, which has mesh shading in action. They could push 10x the detail in a scene, so vertices would no longer be a bottleneck.

The second part of the problem is shading all these polygons. PRT+/SFS would help here, as they only have to shade the visible parts on the screen by adding or removing texture mips on the fly, again based on camera distance. A distant object would not have to use the full texture: if it takes 100 pixels on screen, then those are the only ones that need to be shaded. It is a waste of RAM space otherwise.
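Something like this toy calculation, purely to illustrate the idea (real engines pick mips from UV derivatives and sampler feedback, not a crude distance/coverage heuristic like this):

```python
import math

def required_mip(texture_size: int, pixels_on_screen: int) -> int:
    """Lowest-resolution mip that still covers the object's on-screen footprint.
    Illustrative only; real engines derive this from UV gradients / sampler feedback."""
    footprint = max(1, round(math.sqrt(pixels_on_screen)))  # approx. texels per side needed
    mip, size = 0, texture_size
    while size // 2 >= footprint:
        size //= 2
        mip += 1
    return mip

# A 4096x4096 texture on an object covering only ~100 pixels needs a tiny mip:
print(required_mip(4096, 100))   # -> mip 8, i.e. a 16x16 mip instead of the full 4K texture
```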
 

jroc74

Phone reception is more important to me than human rights
So basically you're guessing?
Usually that's what "my best guess" means...
Educated guesses based on information available from multiple channels (including tech presentations on either) is fair game, not sure why the pearl clutching here
Exactly.
By his own admission he's guessing. Everyone is an expert these days 😅

If you read "my best guess" then what exactly is the problem...?
It was not a wild guess as people have been using the numbers both companies have presented as well as data coming from RAD Game Tools (Oodle Texture folks). You can also look at MS’s Hot Chips presentation for reference too (before people take > 6 GB/s as meaning any number between 6 and 200 common sense is advised :p).
Sometimes people do not post as if they were making research papers with all the references linked and vetted.

Oodle Texture blog:

Hot Chips references:
[Hot Chips slide images]

etc…

If you think the numbers do not work, you can take those numbers and compare them to the official numbers Sony gave (I do not think that on XSX|S you go through BCPACK and then zlib, for example). The Oodle folks confirmed Sony's initial numbers did not include the use of Oodle Texture, btw; see also the later official Oodle Texture technical blog articles by its devs.
Yup.

I swear I have no idea what some folks are trying to get at, lol.

Cerny gave a best-case scenario number. We have averages officially from Sony and from the Kraken folks, which include pre- and post-Oodle. I even raised the question of what the average vs best-case scenario is for the Series consoles. That is something I don't think anyone knows.

I even said 'over 6GB/s, up to 12GB/s' in trying to determine what the average and best-case scenarios are for the Series consoles.
 
Last edited:

Jose92

Member
There are multiple ways of improving texture streaming I/O performance besides just making the file size smaller or having a faster SSD. Sampler Feedback Streaming happens to be a method for significantly cutting down a game's streaming requirements. That fact alone changes what can be done with 2.4GB/s raw / 4.8GB/s compressed. We have seen some evidence; it's called Quick Resume. It's the only thing thus far actually taking proper advantage of the Xbox Velocity Architecture that we know of.

Much of what you're saying seems premature in light of the fact that the generation has barely started and we have yet to truly see what XVA can do when fully utilized in a game built around it. But it'll come, I'm certain. Long story short, for what Microsoft designed around, 2.4GB/s raw and 4.8GB/s compressed is all they will ever need.


Xbox Velocity Architecture is the whole I/O architecture, which includes the SSD, SFS, the decompression engines, even the Zen 2 architecture, etc.

What he is saying is that SFS can't magically increase the amount of compressed data transmitted beyond the numbers Xbox stated.

Quick Resume dumps a compressed (maybe selectively uncompressed) image of the RAM the game uses onto the SSD. It works nearly identically to the way save states work on SNES/PS1/PS2/etc. emulators... I am guessing here, but I'm pretty certain it is similar.

The advances Zen 2 and the decompression/compression capabilities the Series consoles bring to the table allow such a feature to run without hiccups/stutters.
 
Last edited:
As to the game sizes argument, the Xbox focus isn't just about smaller game sizes. It's about using Sampler Feedback Streaming to cut down on the amount of texture data that needs to be copied into main memory in the first place. So game sizes aren't an accurate means of judging Series X I/O performance. Their strategy is RAM efficiency and in getting RAM efficiency it's a big effective boost on their SSD I/O beyond the raw spec.
 
Xbox Velocity Architecture is the whole i/o architecture which includes the SSD SFS Decompression engines even the Zen 2 architecture etc..

What he is saying SFS can't magically increase the amount of compressed transmitted data beyond the numbers XBox stated.

Quick Resume dumps a compressed, maybe a selective uncompressed, image of the ram, the game uses into the SSD. It works nearly identical to the way save states works on Snes/ps1/ps2/etc.. emulators...,I am guessing here but pretty certain it is similar.

The advances Zen 2 and the decompression/compression capabilities the Series consoles bring to the table allow such a feature to run without hiccups/stutters.

Full details of how it works, and what hardware or tech plays a role is still not fully clear.
 
Um, I'm not talking about the initial loading of the game, but about the delivery of data during the game by constant and continuous streaming. You can't wait 1-2-3 seconds here, um.

Both systems comfortably handle both scenarios with ease as long as they're designed for it. Prior to Series X|S and PS5 devs have been using a variety of tricks to stream in more continuous data than what you might otherwise think makes sense for the spec. Those tricks haven't suddenly disappeared, and the hardware to assist has only gotten significantly better.

Dirt 5 tech director said the Series X drive is fast enough for him to be able to load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame. These things are fast, but they have to be properly targeted.
 

Panajev2001a

GAF's Pleasant Genius
Pana, it is amazing that the I/O of the PS5 is so fast, but it is easy math: 100GB (some of the biggest games) × 3 = 300GB. Imagining that by pure fantasy we can quintuple this figure by reusing assets, you have 1500GB (haha, and this is honestly impossible); you would have 68 seconds of game maximized at 22GB/s. Acting silly and doubling this number with alien PS6 compression tech, we go from 100GB of data to 3000GB, and at 22GB/s we could have an incredible 2.2 minutes of gameplay. I repeat that the I/O is overkill, exactly like the one in Series X/S, and it is because of the size of the games, not because it is useless to have that speed. Exactly as Leviathan said. Now back with our feet on the ground please... and let's say that next-gen games probably need less than 500MB/s during streaming, and that's already more than 10 times what Naughty Dog needed in The Last of Us 2.
And I don't even want to go into the power of the GPU, which should be able to rework all this data.

Not sure you fully read what was posted earlier… Data being reused means the game is not streaming unique data over, but streaming data in, displaying it, streaming other data over, using it, streaming some of the old data again, etc… and doing it fast. How much you can transfer influences how fast characters can move and how deep these buffers have to be (aka how big they are).

It is about how fast data needs to be moved (and what it means when you look at how that number looks "per second"). That you can transfer 0.55 GB in 100ms is more important to look at than that you can transfer 5.5 GB in 1s.

Next you are going to use the same calculations to prove that 400-500 GB/s of memory bandwidth is even more wasted, then ;). So we have people saying the SSDs are too slow to be used to "extend RAM" and a similar group of people saying the SSDs are too fast for what they are needed for
:rolleyes:
 

MonarchJT

Banned
Not sure you fully read what was posted earlier… Data being reused means the game is not streaming unique data over, but streaming data in, displaying it, streaming other data over, using it, streaming some of the old data again, etc… and doing it fast. How much you can transfer influences how fast characters can move and how deep these buffers have to be (aka how big they are).

It is about how fast data needs to be moved (and what it means when you look at how that number looks "per second"). That you can transfer 0.55 GB in 100ms is more important to look at than that you can transfer 5.5 GB in 1s.

Next you are going to use the same calculations to prove that 400-500 GB/s of memory bandwidth is even more wasted, then ;). So we have people saying the SSDs are too slow to be used to "extend RAM" and a similar group of people saying the SSDs are too fast for what they are needed for
:rolleyes:
I do not question the merits of the speed of the I/O, but there is a limit to the reuse of assets before they start to get repetitive. I have really big doubts that during next-gen open-world games the consoles will stream more than 100MB/s of data.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
I do not question the merits of the speed of the I/O, but there is a limit to the reuse of assets before they start to get repetitive. I have really big doubts that during next-gen open-world games the consoles will stream more than 100MB/s of data.
It is part reuse and part, big part, the time it takes to move data in quickly (if you have 100-160 ms to fetch the data, or less, how much data can you move over? This solution on both MS's and Sony's side is not just about large data transfers but low-latency fetches of smaller blocks of data… see the DRAM paired with the SSD, see the SRAM in the I/O block, see the cache scrubbers and the coherency engines taking the brunt of the overhead of mapping and unmapping memory, etc…).
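Rough numbers on that, as a sketch (the windows are the 100-160 ms figures above; the rates are just the public raw/compressed numbers plus a higher "typical with compression" figure for comparison):

```python
# How much data actually fits in a short fetch window at a given effective rate.
# Windows are the 100-160 ms figures above; rates are the public raw/compressed
# numbers plus a higher "typical with compression" figure for comparison.

def window_budget_mb(rate_gbps: float, window_ms: float) -> float:
    return rate_gbps * window_ms            # GB/s * ms == MB (since 1 GB/s = 1 MB/ms)

for rate in (2.4, 4.8, 5.5, 9.0):
    budgets = [round(window_budget_mb(rate, w)) for w in (100, 160)]
    print(f"{rate} GB/s -> {budgets} MB in a 100/160 ms window")
# Even at 9 GB/s you only get ~900-1440 MB per window, which is why low-latency
# fetches of small blocks matter as much as the headline GB/s figure.
```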
 

MonarchJT

Banned
It is part reuse and part, big part, the time it takes to move data in quickly (if you have 100-160 ms to fetch the data, or less, how much data can you move over? This solution on both MS's and Sony's side is not just about large data transfers but low-latency fetches of smaller blocks of data… see the DRAM paired with the SSD, see the SRAM in the I/O block, see the cache scrubbers and the coherency engines taking the brunt of the overhead of mapping and unmapping memory, etc

Sure, latency is a BIG thing too, but I was talking about data streaming (for open worlds, for example) and I think that both consoles are really overkill.
 
Maybe not an expert but being able to read and connect things without double counting? People are posting a bit longer than one liners…
Yes and it's certainly worth discussing, but with a reasonable amount of guess work what tends to happen is a difference of opinion. Then of course without having some sort of expertise within this field, who is right and who is wrong?
 
Sure, latency is a BIG thing too, but I was talking about data streaming (for open worlds, for example) and I think that both consoles are really overkill.

Both systems, when games are properly designed for the hardware, WILL prove overkill because the hardware is all there. I don't think Microsoft is going all in on RPGs and open-world games for nothing, for example. Bethesda is key, but then you also have those Xbox publishing deals they're working on. Guess time will tell though. Exciting times ahead for both machines.
 

Panajev2001a

GAF's Pleasant Genius
Yes and it's certainly worth discussing, but with a reasonable amount of guess work what tends to happen is a difference of opinion. Then of course without having some sort of expertise within this field, who is right and who is wrong?
I would start with trying to follow the argument logically and at least moving on from double counting compression ratios ;). Beyond going philosophical or talking down to people… you could play the same game and try to show why the points made do not make sense.

There are people with a modicum of experience and even some game devs on the board (not just one veteran), btw.
 
I would start with trying to follow the argument logically and at least moving on from double counting compression ratios ;). Beyond going philosophical or talking down to people… you could play the same game and try to show why the points made do not make sense.

There are people with a modicum of experience and even some game devs on the board (not just one veteran), btw.
Are you a hardware engineer? Or is anyone posting here an engineer? You see what I was talking about? You believe you are correct but you have no first hand knowledge of what you're talking about. The poster you are alluding to is in the same boat as you, third hand information, which to put it frankly is unreliable.

Try and stick to what you know, and you may not come across so patronising.
 

Panajev2001a

GAF's Pleasant Genius
Are you a hardware engineer? Or is anyone posting here an engineer?
I graduated as an ECE major actually and I develop for a living (albeit transitioned to an Engineering Manager role a while ago, not that hands off, but if we meet IRL one day we can chat more :p, no I do not work on AAA console titles). There are console developers (beyond low level PS2/PS3 Linux kit development [VU assembly, DMA chains handling, etc…] or GBA home brew, or mobile I mean) posting in this thread, you can do the rest of the homework though.

Do not recall you being this aggressive in other threads, not sure what is triggering it, but some of this discussion is not rocket science and more similar to “how can you say these are 2 apples, do you have a maths degree?” IMHO.

The thread was revived by getting basic maths a bit wrong and is now turning into ad hominem instead of discussing the data reasonably, so I think it is a bit odd and condescending to say the least… ironic that you are calling others out for sounding patronising…
You are also not establishing a reason to doubt the third-party information being presented or how it was used, nor claiming you have knowledge or insights on the matter that suggest what is occurring is problematic.
 
Last edited:
I graduated as an ECE major actually and I develop for a living (albeit transitioned as an Engineering Manager a while ago, not that hands off, but if we meet IRL one day we can chat more :p, no I do not work on AAA console titles). There are console developers (beyond low level PS2/PS3 Linux kit development [VU assembly, DMA chains handling, etc…] or GBA home brew, or mobile I mean) posting in this thread, you can do the rest of the homework though.

Do not recall you being this aggressive in other threads, not sure what is triggering it, but some of this discussion is not rocket science and more similar to “how can you say these are 2 apples, do you have a maths degree?” IMHO.

The thread revived by getting basic maths a bit wrong is now turning into ad hominem instead of discussing the data reasonably and so think it is a bit odd and condescending to say the least… ironic that you are calling others out as sounding patronising…
You are also not establishing a reason to doubt the third party information being presented or how it was used nor claiming you have knowledge or insights on the matter that suggests what is occurring is problematic.

lol damnit my math was accurate man, it was my thought process and logic behind the math that was way wrong lol. I thought what I was doing wasn't double compression until it was presented to me a certain way where it just clicked. Then I was like wait... :messenger_tears_of_joy:

I was thinking if a game requests 14GB of texture data, SFS cuts the streaming requirement down to 5.6GB, but then my dumbass was like

"AH HA! That's just the size of the requested texture data AFTER decompression. Prior to decompression it's 2.8GB on disk! So now let me bust out my trusty calculator! 2.8GB / 2.4GB/s = 1.16 seconds. Shoulda stopped there.. but then I was like...



Divide that shit by 2, cause compression, which makes effective speed 4.8GB/s, which brings me to... 0.58 seconds! :messenger_grinning_sweat:

Took me a minute to realize I was double compressing. If I'm going to use the compressed size for the calculation, I can no longer divide by two. If I use the proper uncompressed size, then I can divide by two. So all the math was accurate, I was just applying it wrong.
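For anyone who wants to see the right and wrong versions side by side, here's the whole thing as a tiny sketch (the sizes are the hypothetical ones from my example above):

```python
# The double-counting mistake, side by side (sizes are the hypothetical ones above).

RAW_GBPS = 2.4
EFFECTIVE_GBPS = 4.8        # raw x 2:1 compression
on_disk_gb = 2.8            # compressed size on disk
uncompressed_gb = 5.6       # what SFS actually needs, after decompression

print(on_disk_gb / RAW_GBPS)              # ~1.16 s (compressed size over the RAW rate: correct)
print(uncompressed_gb / EFFECTIVE_GBPS)   # ~1.16 s (uncompressed size over the EFFECTIVE rate: same answer)
print(on_disk_gb / EFFECTIVE_GBPS)        # ~0.58 s (compressed size over the effective rate: compression counted twice)
```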

 
Right, so the I/O is overkill now. That went from fake secret sauce to overkill very quickly.

Literally none of Microsoft's biggest new titles have released yet. There hasn't yet really been anything that discounts what Microsoft says is going to be possible when Xbox Velocity Architecture is fully utilized with Sampler Feedback Streaming. We have two examples of Sampler Feedback Streaming demos running: one video on Series S, where it's running in the 200fps range, and another on Series X where the same sections are running in the 600-800 and even 1000+fps range. No, I'm not saying Series X titles will get those framerates. I'm simply pointing out that the massive performance gap adds weight to what they said: the video from last year was running on Series S and the one from April of this year was running on Series X.
 
People here still believe in the SFS PR? It's just old tech with a new marketing name... really, never seen anything like that before :messenger_tears_of_joy:

Let's put it this way. There isn't a single Xbox title that uses what Microsoft demonstrated here.





4A Games is one of the most technologically proficient studios out there, and even they aren't using it.


Out of the features available in DirectX 12 Ultimate, which one do you believe will be most useful in terms of performance? Do you plan to utilize them all in the next 4A Games project?

Currently, we use DXR 1.1 inline raytracing and VRS. I like sampler-feedback - I've asked hardware vendors about this for years and it will be utilized for our future projects. Not sure if we'd go for mesh shaders in the future as we are not that dependent on traditional vertex/primitive/raster processing anymore on recent architectures. Our current frames are only about 10% raster and 90% compute on PlayStation 5 and Xbox Series X. And raster pairs well with async compute.

So clearly the tech must not be that old.

Also PRT has weaknesses Sampler Feedback Streaming does not.



They are not the same thing. They are different capabilities.



Sampler Feedback is also more than just SFS; it also enables some incredible stuff with another powerful feature called texture space shading, which gives you the ability to do more complex lighting without the usual costs associated with doing so. A very useful feature.


The hardware capability, to my knowledge, was first introduced with Nvidia Turing, but I think we all know none of these consoles were running Nvidia RTX 2000 series GPUs when they introduced Sampler Feedback.


Enter texture-space-shading, or TSS, also known as object-space shading. TSS is a technique where you do your expensive lighting computations in object space, and write them to a texture— maybe, something that looks like a UVW unwrapping of your object. Since nothing is being rasterized you could do the shading using compute, without the graphics pipeline at all. Then, in a separate step, bind the texture and rasterize to screen space, performing a dead simple sample. This way, you have some opportunities for visual quality improvements. Plus, the ability to get away with computing lighting less often than you rasterize, if you want to do that.

One obstacle in getting TSS to work well is figuring out what in object space to shade for each object. Everything? You could, but hopefully not. What if only the left-hand side of an object is visible? With the power of sampler feedback, your rasterization step could simply record what texels are being requested and only perform the application’s expensive lighting computation on those.

Now that we’ve discussed scenarios where sampler feedback is useful, what follows are some more details how it’s exposed in Direct3D.


Not the kind of feature I'd try to downplay.
 
Last edited:

Fafalada

Fafracer forever
It just seems that some in this forum think that the data must just be loaded and is ready to display.
That's kinda the idea behind storage this fast and ASICs that handle the compression: moving away from the complexities of bespoke file formats and 40-year-old I/O access patterns into treating graphics data on disk as an extension of memory. That's what MS meant when they referenced 'virtual memory', and it's also the backbone of the things Cerny referenced in his talks. It's a paradigm shift though - and it doesn't come for free; software stacks and pipelines need to be rewritten/adapted etc.
 
That's kinda the idea behind storage this fast and ASICs that handle the compression: moving away from the complexities of bespoke file formats and 40-year-old I/O access patterns into treating graphics data on disk as an extension of memory. That's what MS meant when they referenced 'virtual memory', and it's also the backbone of the things Cerny referenced in his talks. It's a paradigm shift though - and it doesn't come for free; software stacks and pipelines need to be rewritten/adapted etc.

Sounds like HDDs and old file systems have to be abandoned for that to work. It could explain why games like Ratchet look that good, since they don't have to support the old way of doing things.
 
With a single difference that it has some hardware support, and that's all.

Incorrect. PRT and Sampler Feedback are not the same capability at all.

And that "some hardware support" is a much bigger deal than you think. You dismiss it carelessly.


"Think about an in-game cinematic when you see a character's face up close. Sampler Feedback Streaming allows that character to have incredible detail loaded extremely quickly and not waste time loading detail on the other side of the character you can't see.

"Now I'd like to show you how our Sampler Feedback Hardware feature works (exact time-stamp below)"




"Sampler Feedback is configured per texture, and we're looking at just one of the 475 textures in this scene that are producing feedback" They chose to focus on one texture, but Sampler Feedback's hardware is seeing and analyzing every single possible texture in a scene with the same accuracy and speed as it's doing for that one texture they're focusing on.

"The GPU is capable of providing an enormous volume of sampler feedback data, and we can carefully tune the level of output to best balance accuracy and performance. We have found that we can discard over 99% of the raw data and still get great results." And it can be specifically tuned as per developer needs.

That's the "some hardware" people are dismissing. That's insanely impressive stuff. This feature, when it starts seeing wider adoption, will be huge.
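If it helps, here is a toy model of what that feedback data is used for (purely conceptual, this is not the D3D12 API; the tile size and format assumptions are mine): the feedback map records which tiles of which mip were actually sampled, and the streamer only loads those.

```python
# Toy model of feedback-driven streaming (conceptual only, not the D3D12 API).
# Assumption for illustration: a 64 KB tile covers 128x128 texels of a 32-bit format.

TILE_BYTES = 64 * 1024
TILE_TEXELS = 128

def tiles_in_mip(width: int, height: int) -> int:
    return max(1, width // TILE_TEXELS) * max(1, height // TILE_TEXELS)

# Pretend the feedback map reported that only 37 tiles of the 4K mip were sampled this frame.
sampled_tiles = 37
whole_mip_mb = tiles_in_mip(4096, 4096) * TILE_BYTES / 2**20     # load everything: 64 MB
feedback_mb = sampled_tiles * TILE_BYTES / 2**20                 # load only what was sampled: ~2.3 MB
print(whole_mip_mb, "MB vs", round(feedback_mb, 1), "MB")
```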
 
Last edited:

Rea

Member
So basically you're guessing?
Yeah, because Microsoft never told us what the maximum bandwidth of the decompressor is, or what the compression ratio of BCPack is. All they are spreading is SFS-multiplier PR spin bullshit. SFS has nothing to do with the hardware decompressor; it helps in saving RAM space, but the SSD speed will always be the same. And their official spec of 4.8GB/s is with a compression ratio of 2, including BCPack with Zlib (Zlib+BCPack). In the future they might improve the compression efficiency of BCPack, so the ratio might become 2.5 to 3, or maybe not.
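In other words, the earlier "2.4 × ratio" guesswork boils down to this (only the 2:1 / 4.8GB/s figure is official; 2.5 and 3 are pure speculation):

```python
# The "2.4 GB/s x compression ratio" guesswork from earlier in the thread.
# Only the 2:1 ratio (4.8 GB/s) is an official Microsoft figure; 2.5 and 3 are speculation.

RAW_GBPS = 2.4
for ratio in (2.0, 2.5, 3.0):
    print(f"{RAW_GBPS} x {ratio} = {RAW_GBPS * ratio:.1f} GB/s effective")
# 4.8, 6.0 and 7.2 GB/s -- the numbers being argued about above.
```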
 

Boglin

Member
Yeah, because Microsoft never told us what the maximum bandwidth of the decompressor is, or what the compression ratio of BCPack is. All they are spreading is SFS-multiplier PR spin bullshit. SFS has nothing to do with the hardware decompressor; it helps in saving RAM space, but the SSD speed will always be the same. And their official spec of 4.8GB/s is with a compression ratio of 2, including BCPack with Zlib (Zlib+BCPack). In the future they might improve the compression efficiency of BCPack, so the ratio might become 2.5 to 3, or maybe not.
I agree that the "multiplier" talking point can be misleading at times, but it made sense in the context of textures. SFS saves on bandwidth for the same reasons it saves on RAM space. If you previously needed 3GB/s because you had to stream in an entire mip, vs 1GB/s streaming in only the parts of the mip you need, then in practice, in this situation, it's acting like a 3x multiplier.
 

Rea

Member
I agree that the "multiplier" talking point can be misleading at times but it made sense in the context of textures. SFS saves on bandwidth for the same reasons it saves on RAM space. If you were previously needing 3GB/s second because you had to stream in an entire mip vs 1GB/s streaming in only the parts of the mip you need then in practice, in this situation, it's acting like a 3x multiplier.
Agreed, but my point is that SFS won't magically increase the raw data throughput of the SSD. It just helps the game not load unnecessary data into RAM. In the future, what if an open-world game NEEDED more texture data than the SSD I/O throughput can deliver? SFS can't increase the bandwidth of the SSD, and I/O will become the bottleneck. Same goes for PS5.
 

Boglin

Member
You're correct of course that SFS doesn't compress or decompress anything to artificially increase bandwidth but even still, the end result of texture compression and SFS can be the same.

In a hypothetical scenario, if you need a 3GB texture loaded and you're limited to 1GB/s raw transfer speed, then 3x compression could get the job done in 1 second. Alternatively, if SFS needs only a third of the bandwidth on average, then through no magic it will also get the job done in 1 second. Both help the SSD get the data you actually need into VRAM faster.

Because they tackle a similar problem in a completely different way, you can use both simultaneously and load what would have taken 9GB/s of raw bandwidth using no compression and streaming textures with older techniques.

Using your example, if an open world game needed more bandwidth for textures beyond its raw I/O then the only way SFS would not help in that situation is if the entire mip needed to be loaded because none of it was being occluded.
Conversely, texture compression would help in that scenario regardless of how much of the texture needed to be loaded, so it's more reliable.
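Putting the two effects together with the same hypothetical numbers (just a sketch of the example above, not a claim about any real game):

```python
# The hypothetical above, combined: a 3 GB texture over a 1 GB/s raw drive.
# Compression and SFS attack different parts of the problem, so their savings multiply.

RAW_GBPS = 1.0
TEXTURE_GB = 3.0
COMPRESSION = 3.0        # 3:1 texture compression
SFS_FRACTION = 1 / 3     # SFS only needs a third of the mip data

disk_read_gb = TEXTURE_GB * SFS_FRACTION / COMPRESSION    # ~0.33 GB actually read from disk
seconds = disk_read_gb / RAW_GBPS                         # ~0.33 s
print(round(seconds, 2), "s ->", round(TEXTURE_GB / seconds, 1), "GB/s effective")   # ~9 GB/s
```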
 
Last edited:

Three

Member
As Leviathan said on Twitter, the thing is there is a limit to the data size and assets produced for a game, which makes both subsystems overkill, at least for this gen. No matter how much PR there is behind it.
Nonsense, he says the exact opposite. He said the drives are too slow to have any kind of diminishing return.

Incorrect. PRT and Sampler Feedback are not the same capability at all.

And that "some hardware support" is a much bigger deal than you think. You dismiss it carelessly.


"Think about an in-game cinematic when you see a character's face up close. Sampler Feedback Streaming allows that character to have incredible detail loaded extremely quickly and not waste time loading detail on the other side of the character you can't see.

"Now I'd like to show you how our Sampler Feedback Hardware feature works (exact time-stamp below)"




"Sampler Feedback is configured per texture, and we're looking at just one of the 475 textures in this scene that are producing feedback" They chose to focus on one texture, but Sampler Feedback's hardware is seeing and analyzing every single possible texture in a scene with the same accuracy and speed as it's doing for that one texture they're focusing on.

"The GPU is capable of providing an enormous volume of sampler feedback data, and we can carefully tune the level of output to best balance accuracy and performance. We have found that we can discard over 99% of the raw data and still get great results." And it can be specifically tuned as per developer needs.

That's the "some hardware" people are dismissing. That's insanely impressive stuff. This feature, when it starts seeing wider adoption, will be huge.

PRT+ and SFS both do this. i.e. they load only what is visible.
 

Three

Member
Are you a hardware engineer? Or is anyone posting here an engineer? You see what I was talking about? You believe you are correct but you have no first hand knowledge of what you're talking about. The poster you are alluding to is in the same boat as you, third hand information, which to put it frankly is unreliable.

Try and stick to what you know, and you may not come across so patronising.
You don't need to be an engineer to apply logic. I'm an 'engineer', but that doesn't mean I need to dismiss other people's posts simply because they don't have a title. As long as the logic is sound, what difference would it make? If somebody has a counter-argument or better knowledge then they can present it; that is the literal definition of a forum.
 

muteZX

Banned
Both systems comfortably handle both scenarios with ease as long as they're designed for it. Prior to Series X|S and PS5 devs have been using a variety of tricks to stream in more continuous data than what you might otherwise think makes sense for the spec. Those tricks haven't suddenly disappeared, and the hardware to assist has only gotten significantly better.

Dirt 5 tech director said the Series X drive is fast enough for him to be able to load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame. These things are fast, but they have to be properly targeted.

I'll tell you otherwise. Take the same game, but the memory of the graphics card has a speed of 250 or 500 or 750 GB/s. What will that do to what we see on the screen? It will definitely not turn out the same for both. Exactly the same goes for the 200-500% difference in SSD IOPS between PS5 and XSX.
 