
Xbox Series X’s BCPack Texture Compression Technique 'might be' better than the PS5’s Kraken

Lethal01

Member
Do I believe SFS is roughly 2.5x more memory efficient than PRT on last gen consoles and 2.5x more memory efficient than PRT on next gen consoles? Absolutely! Because no matter how much faster the SSD gets for next gen consoles, without Sampler Feedback you still have that fundamental lack of accuracy about what is missing and what to stream next.

This would be a solid and understandable theory if not for the fact that we literally have a demo from Microsoft showing what they are 2.5x better than, and they are comparing against a system that isn't using PRT. So it could be 2.4x better than PRT for all we know, but I think it's crazy to think that PRT is literally useless without sampler feedback.
 

IntentionalPun

Ask me about my wife's perfect butthole
This would be a solid and understandable theory if not for the fact that we literally have a demo from Microsoft showing what they are 2.5x better than, and they are comparing against a system that isn't using PRT. So it could be 2.4x better than PRT for all we know, but I think it's crazy to think that PRT is literally useless without sampler feedback.
Yeah, this is my point; their engineer says it's more efficient than PRT, and I believe them, but they didn't quantify it.

MS has only quantified the advancement over non-PRT streaming w/ XVA.

I'd be curious to see what SFS in particular does for the efficiency, actually quantified. I wonder if we'll ever get that.
 
Last edited:

IntentionalPun

Ask me about my wife's perfect butthole
Since you mentioned future frames I have a question about that.

In The Road to PS5, it was mentioned that developers have to load assets for the next X seconds of gameplay.

How are they able to predict what those assets will be? It's not like they know what the player will see for the next 30 seconds.

My understanding is that the engine is always a few frames ahead of actually responding to control inputs.

Because of that, they have the data for the next few frames before they start processing what comes next. That's how they use it for things like TAA.

SFS uses this data really efficiently to make decisions about what to load/unload, so there's little wasted cache and few cache "misses" (that's my understanding, at least).
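
To make that idea concrete, here's a rough Python sketch (purely illustrative, not how XVA/SFS is actually implemented in hardware): a residency cache keyed by (texture, mip, tile), where per-frame feedback marks what was actually sampled, misses become load requests, and tiles that haven't been touched for a while get evicted.

```python
from collections import OrderedDict

class TileResidencyCache:
    """Toy model of feedback-driven texture tile streaming (illustrative only)."""

    def __init__(self, capacity_tiles, max_idle_frames=60):
        self.capacity = capacity_tiles
        self.max_idle = max_idle_frames
        self.resident = OrderedDict()  # (texture_id, mip, tile) -> last_used_frame

    def apply_feedback(self, frame, sampled_tiles):
        """sampled_tiles: tiles the GPU reported actually sampling this frame."""
        to_load = []
        for key in sampled_tiles:
            if key in self.resident:
                self.resident.move_to_end(key)     # refresh LRU position
            else:
                to_load.append(key)                # cache miss -> request from SSD
            self.resident[key] = frame
        self._evict(frame)
        return to_load

    def _evict(self, frame):
        # Drop tiles that haven't been sampled recently, then trim to capacity (LRU).
        stale = [k for k, last in self.resident.items() if frame - last > self.max_idle]
        for k in stale:
            del self.resident[k]
        while len(self.resident) > self.capacity:
            self.resident.popitem(last=False)

# Example: frame 1 samples two tiles, frame 2 samples one of them plus a new one.
cache = TileResidencyCache(capacity_tiles=1024)
print(cache.apply_feedback(1, [("rock_albedo", 3, (5, 7)), ("rock_albedo", 3, (5, 8))]))
print(cache.apply_feedback(2, [("rock_albedo", 3, (5, 8)), ("rock_albedo", 2, (10, 14))]))
```

The point of the sketch is just that accurate feedback shrinks both the "wasted cache" (stale tiles get evicted quickly) and the miss list (only tiles that were truly sampled are requested).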
 

muteZX

Banned
This would be a solid and understandable theory if not for the fact that we literally have a demo from Microsoft showing what they are 2.5x better than, and they are comparing against a system that isn't using PRT. So it could be 2.4x better than PRT for all we know, but I think it's crazy to think that PRT is literally useless without sampler feedback.

Very simply put: if you lack HW sampler feedback, it is quite easy to replace it with code and light assistance from the GPU. I personally rate the decline in performance as marginal.
 

IntentionalPun

Ask me about my wife's perfect butthole
Very simply put: if you lack HW sampler feedback, it is quite easy to replace it with code and light assistance from the GPU. I personally rate the decline in performance as marginal.
Well, all PRT attempts to guess what data will and won't be used; the question is how accurate those predictions are vs. SFS, which is incredibly accurate. It could be more than marginal, but it won't be 2.5-3x. It's going to vary by engine/technique/implementation (and CPU usage would vary too).
 
Last edited:

Fafalada

Fafracer forever
How are they able to predict what those assets will be? It's not like they know what the player will see for the next 30 seconds.
Depends on the type of game. If player traversal is linear (racing, platformers, action adventures, non-open-world RPGs, etc.) you know exactly what the player will see if they keep progressing, and you know the max movement speed, so you can compute how much space you need to store 'X seconds of forward movement' and just constantly fill a look-ahead buffer that way (usually several seconds' worth, to accommodate variations).
For open-world games, the heuristics are loosely similar to what 2D scrolling engines used to do: you have the currently 'visible'/'on-screen' stuff (which is usually just a radius/bounding box) and then prefetch the edge tiles based on I/O speed and available memory (so you have a small buffer in 'every' possible direction).
There are other, more bespoke scenarios (like world-split, where one world is loading in the background while the player plays the first) etc. - but it's all just variations on the two themes.
You wouldn't try to estimate anything based on camera visibility, for obvious reasons - the latency of I/O requests is in the hundreds of frames, more or less.

The reason this all works with mechanical/optical storage is that in most games, movement is temporally coherent (like scrolling on a map), so the delta from one frame to the next is relatively tiny, and all I/O needs to keep up with is that, not larger changes (for those, you put up load screens or live with low-res assets for a few seconds after a teleport).

What 'needs to be loaded' comes from a view request you just made, which is based on the current game state. Guessing the future (a couple of frames ahead, as per the MS demonstration) is a software problem (not unlike the old-school examples above).
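
A back-of-envelope sketch of the linear-traversal case described above (all the numbers are made up for illustration): the look-ahead buffer size is driven by max movement speed, data density per metre of world, and how many seconds of look-ahead you want resident.

```python
def lookahead_budget_mb(max_speed_mps, mb_per_metre, lookahead_s):
    """Memory needed to keep 'lookahead_s' seconds of forward travel resident."""
    return max_speed_mps * lookahead_s * mb_per_metre

def refill_rate_mbps(max_speed_mps, mb_per_metre):
    """Sustained I/O rate needed just to keep the look-ahead buffer topped up."""
    return max_speed_mps * mb_per_metre

# Hypothetical racing-game numbers: 90 m/s top speed, 1.5 MB of unique data per metre.
speed, density = 90.0, 1.5
for seconds in (2, 5, 10):
    print(f"{seconds:>2}s look-ahead: {lookahead_budget_mb(speed, density, seconds):7.1f} MB buffer, "
          f"{refill_rate_mbps(speed, density):6.1f} MB/s sustained refill")
```

Faster storage mostly lets you shrink the look-ahead window (and the RAM set aside for it) rather than change the basic scheme.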
 

Godfavor

Member
What 'needs to be loaded' comes from a view request you just made, which is based on the current game state. Guessing the future (a couple of frames ahead, as per the MS demonstration) is a software problem (not unlike the old-school examples above).
Yes, bad choice of words from my side. SFS mapping is instantaneous and draws info from the feedback map that determines which mips are to be shaded.

Guessing the world state 1 second ahead is a completely different story.
 

PaintTinJr

Member
if it doesn't matter then why does it say Nanite memory?
[slide image: Nanite memory]
I had the impression the UE5 Demo data was just 768MB.

Going by that slide, it reads more like the "current view" - presumably what the camera sees at the current frame, 1 of 30 per second - is using 768MB.

If that's true, then the theoretical maximum memory bandwidth use per second in the UE5 demo would be 30 x 768MB = 22.5GB/s 🤔 which looks like a very familiar figure, but I very much doubt it.

However, assuming that figure were correct, then in reality - because they're probably using a very effective PRT, and assuming Nanite heavily uses signed distance field volumetric rendering that feeds back 4 exact sample locations per pixel - you'd multiply that number by 0.7 (= 15.75GB, assuming ~70% of it is texture data), halve that portion (= 7.9GB, an underestimate of the PRT saving), and add back the other 30% of the 22.5GB, which gives a memory bandwidth use of roughly 14.6GB per second for the demo through the decompressor, and would indicate a compression ratio of around 3:1 for Kraken in the demo.

Probably I've misunderstood the context of the 768MB in the slide, but it seems interesting that the numbers all fit inside the PS5 I/O complex specs.
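
For anyone who wants to follow that back-of-envelope maths, here it is spelled out in a few lines of Python (these are the assumptions from the post above, not measured figures):

```python
# Back-of-envelope numbers from the post above (assumptions, not measurements).
frames_per_second = 30
view_mb = 768                                   # "Nanite memory" per frame, per the slide reading
peak_gbps = frames_per_second * view_mb / 1024  # 22.5 GB/s theoretical worst case

texture_share = 0.70                            # assume ~70% of that is texture data
prt_saving = 2.0                                # assume PRT halves the texture portion

texture_gbps = peak_gbps * texture_share / prt_saving       # ~7.9 GB/s
other_gbps = peak_gbps * (1 - texture_share)                # ~6.75 GB/s
through_decompressor = texture_gbps + other_gbps            # ~14.6 GB/s

print(f"peak: {peak_gbps:.1f} GB/s, after the PRT assumption: {through_decompressor:.1f} GB/s")
print(f"implied Kraken ratio vs the 5.5 GB/s raw SSD: {through_decompressor / 5.5:.1f}:1")
```

Every factor in there (the 70% texture share, the 2x PRT saving) is a guess, so the output is only as good as the slide interpretation it starts from.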
 
Last edited:
This would be a solid and understandable theory if not for the fact that we literally have a demo from Microsoft showing what they are 2.5x better than, and they are comparing against a system that isn't using PRT. So it could be 2.4x better than PRT for all we know, but I think it's crazy to think that PRT is literally useless without sampler feedback.

You're right, I just had it confirmed by a dev that it is indeed not being compared to PRT. But they said that doesn't matter, since what it's being compared against is what most AAA games are doing anyway, and would likely have continued to be comfortable doing. Making something like this easier and more convenient will help greatly with adoption, which makes the multiplier true in practice, especially since they think SFS is being intentionally undersold by Microsoft.

What's more, they said PRT on its own simply isn't as fast as SFS. So despite fundamental similarities (because PRT is a foundational element of SFS), and despite PRT being able to contribute over 2x memory efficiency on its own and many, many times more when virtual texturing is factored in, the SFS memory efficiency advantage still mostly holds because of the speed and accuracy factor. It's able to sample and inform the streaming system of the things PRT can't see or tell the game anywhere near as effectively or as quickly.

SFS cleans up PRT's blindspots while also just being faster and better for the task. Sampler Feedback Streaming completes PRT.
 

Lethal01

Member
You're right, I just had it confirmed by a dev that it is indeed not being compared to PRT. But they said that doesn't matter, since what it's being compared against is what most AAA games are doing anyway, and would likely have continued to be comfortable doing. Making something like this easier and more convenient will help greatly with adoption, which makes the multiplier true in practice, especially since they think SFS is being intentionally undersold by Microsoft.

What's more, they said PRT on its own simply isn't as fast as SFS. So despite fundamental similarities (because PRT is a foundational element of SFS), and despite PRT being able to contribute over 2x memory efficiency on its own and many, many times more when virtual texturing is factored in, the SFS memory efficiency advantage still mostly holds because of the speed and accuracy factor. It's able to sample and inform the streaming system of the things PRT can't see or tell the game anywhere near as effectively or as quickly.

SFS cleans up PRT's blindspots while also just being faster and better for the task. Sampler Feedback Streaming completes PRT.

I absolutely believe that it's faster, but I am not really willing to put a number on how much better it is. It could be 30% better and I would consider that a lot, but right now all we can do is play guessing games. I am happy regardless since, as you said, most games weren't using PRT before, but I personally am interested in how much more of a benefit SFS is over the old method of "just" PRT. Did your source give a number on the multiplier you would expect from PRT vs SFS?

Also, could you remind me what hardware customizations are specific to Series X? The only one I remember offhand is the special filters.
 
I absolutely believe that it's faster, but I am not really willing to put a number on how much better it is. It could be 30% better and I would consider that a lot, but right now all we can do is play guessing games. I am happy regardless since, as you said, most games weren't using PRT before, but I personally am interested in how much more of a benefit SFS is over the old method of "just" PRT. Did your source give a number on the multiplier you would expect from PRT vs SFS?

Also, could you remind me what hardware customizations are specific to Series X? The only one I remember offhand is the special filters.

That's fair. Those texture filters are a much bigger deal than people think. They're literally integral to the process, but because of how basic they sound they're being overlooked, as I've learned. They're a more important factor in the speed than I thought.

They said a lot more, but unfortunately it would probably end up derailing the thread due to how bullish they were on thinking Microsoft undersold not just the multiplier for SFS, but the texture compression ratios for BCPack, so that's why I'm intentionally resisting blabbing. They confirmed to me who they were, surprisingly, and where they worked. It's a AAA studio and they specifically work on exactly this type of stuff that we're discussing. They were paying attention to the thread and also laughing at my double compression mistake lol. But they were cool enough to give me some info, and they seem to be of the frame of mind that SFS is a much bigger deal than people think because of how easy they said it is to implement, and because Microsoft pretty much anticipated all the usual problems with something like this and has answers for various scenarios.
 
Last edited:

IntentionalPun

Ask me about my wife's perfect butthole
I had the impression the UE5 Demo data was just 768MB.

Going by that slide, it reads more like the "current view" - presumably what the camera sees at the current frame, 1 of 30 per second - is using 768MB.

I think you have it backwards; they only need 768MB for cache data, and the rest of the RAM is used for the current view.

Hence why they talk about optimizing it, aka increasing the amount of memory dedicated for the view, and lowering/optimizing the cache.

That was always my guess at that slide at least, as only using 768MB for what is in view makes no sense to me (and in computing, a memory pool is a type of cache, called a lookaside cache).
 
Last edited:

Boglin

Member
But they were cool enough to give me some info, and they seem to be of the frame of mind that SFS is a much bigger deal than people think because of how easy they said it is to implement, and because Microsoft pretty much anticipated all the usual problems with something like this and has answers for various scenarios.

I was getting the impression that Sampler Feedback's primary purpose was more of a quality-of-life improvement, which may sound underwhelming by itself, but if third party developers this generation utilize SFS more than they did PRT, then I think that alone would prove Microsoft made a very smart and forward-thinking move. After all, it doesn't matter if PRT could stream in textures just as efficiently as SFS if games don't utilize it because it's too difficult to implement.
 

Panajev2001a

GAF's Pleasant Genius
That's fair. Those texture filters are a much bigger deal than people think. They're literally integral to the process, but because of how basic they sound they're being overlooked, as I've learned. They're a more important factor in the speed than I thought.

They said a lot more, but unfortunately it would probably end up derailing the thread due to how bullish they were on thinking Microsoft undersold not just the multiplier for SFS, but the texture compression ratios for BCPack, so that's why I'm intentionally resisting blabbing.
As a whole, SF is a big deal: an API finally exposing what the GPU has been accessing for you, and how fully, is a big deal.

Again, we shall see how that solution stacks up against the predictions and against PS5 in time. Developer(s) in this thread (not a tiny sap either) have also expressed their opinions and were not as explosively bullish as your personal source seems to be… glad we've moved to the point where we are almost on the same page about whether MS is doing SFS comparisons against PRT-based virtual texturing implementations or not… and about how close devs can get with optimised PRT-based implementations, which is important for comparisons, and we are avoiding multipliers that magically close the wide SSD I/O gap.

You can construct demos where you make PRT work 10x less efficiently than SFS, but proving you can build such edge cases is about as meaningful in and of itself as saying Kraken can output 22 GB/s, or other e-peen waving that even tech experts from both sides will still do.

BCPACK might be greater than initially advertised (but greater than when they advertised the 2:1 average ratio, or than when they said at Hot Chips that the maximum decompression rate of their I/O block was 6+ GB/s, and by how much would the latter supposedly be underestimating the data?), and SF might make it much easier to implement a virtual texturing and streaming engine, but developers might be able to get super close by doing work in compute shaders to replicate what SF is automating (and it also does not say much about comparisons between consoles).

I do believe that the different consoles are more customised and unlike each other in terms of tradeoffs than we thought and some of these differences are undersold and underrated (higher clockspeed, less Shader Array cache contention, cache scrubbers, etc…).
 

Darius87

Member
I had the impression the UE5 Demo data was just 768MB.
No, I see many people in here who think for some reason it's 768MB for everything; it's just for geometry (Nanite memory).
Going by that slide, it reads more like the "current view" - presumably what the camera sees at the current frame, 1 of 30 per second - is using 768MB.
Yes, streaming is for the current view, aka screen space, but it doesn't need to stream all 768MB of the mesh pool every frame.
If that's true, then the theoretical maximum memory bandwidth use per second in the UE5 demo would be 30 x 768MB = 22.5GB/s 🤔 which looks like a very familiar figure, but I very much doubt it.
It's far from true; the PS5 SSD, or any SSD, couldn't handle such speeds. Actual streaming depends on camera movement, so on a per-frame basis only some fraction of the 768MB streaming pool would actually be streamed to RAM.
However, assuming that figure were correct, then in reality - because they're probably using a very effective PRT, and assuming Nanite heavily uses signed distance field volumetric rendering that feeds back 4 exact sample locations per pixel - you'd multiply that number by 0.7 (= 15.75GB, assuming ~70% of it is texture data), halve that portion (= 7.9GB, an underestimate of the PRT saving), and add back the other 30% of the 22.5GB, which gives a memory bandwidth use of roughly 14.6GB per second for the demo through the decompressor, and would indicate a compression ratio of around 3:1 for Kraken in the demo.
It's not correct; there are a lot of triangles in the UE5 demo. Basically, the 768MB pool is the total size of 20 million triangles, compressed.
Probably I've misunderstood the context of the 768MB in the slide, but it seems interesting that the numbers all fit inside the PS5 I/O complex specs.
I see that many misunderstood that slide, I don't know why; it literally says Nanite memory, meaning geometry. But you're right in the sense that the SSD isn't fully utilized in the UE5 demo (I think it gets closer to heavy utilization in the flying part of the demo).
I hope this makes it clear for some of you once and for all.
 

Rea

Member
It's not correct; there are a lot of triangles in the UE5 demo. Basically, the 768MB pool is the total size of 20 million triangles, compressed.
I'm curious, how does it get decompressed? Is it by the GPU, using asynchronous compute? Is the mesh data streamed from SSD to RAM in a compressed format, bypassing the hardware decompressor?
 

Darius87

Member
I'm curious, how does it get decompressed? Is it by the GPU, using asynchronous compute? Is the mesh data streamed from SSD to RAM in a compressed format, bypassing the hardware decompressor?
It's unclear; most likely, like you said, it bypasses the decompressor and then the GPU decompresses the data into its L3 cache when needed. It shows just how much geometry there is when they already have to hold compressed geometry in RAM, though geometry compresses very well and that might be the real reason.
 
Last edited:
As a whole, SF is a big deal: an API finally exposing what the GPU has been accessing for you, and how fully, is a big deal.

Again, we shall see how that solution stacks up against the predictions and against PS5 in time. Developer(s) in this thread (not a tiny sap either) have also expressed their opinions and were not as explosively bullish as your personal source seems to be… glad we've moved to the point where we are almost on the same page about whether MS is doing SFS comparisons against PRT-based virtual texturing implementations or not… and about how close devs can get with optimised PRT-based implementations, which is important for comparisons, and we are avoiding multipliers that magically close the wide SSD I/O gap.

You can construct demos where you make PRT work 10x less efficiently than SFS, but proving you can build such edge cases is about as meaningful in and of itself as saying Kraken can output 22 GB/s, or other e-peen waving that even tech experts from both sides will still do.

BCPACK might be greater than initially advertised (but greater than when they advertised the 2:1 average ratio, or than when they said at Hot Chips that the maximum decompression rate of their I/O block was 6+ GB/s, and by how much would the latter supposedly be underestimating the data?), and SF might make it much easier to implement a virtual texturing and streaming engine, but developers might be able to get super close by doing work in compute shaders to replicate what SF is automating (and it also does not say much about comparisons between consoles).

I do believe that the different consoles are more customised and unlike each other in terms of tradeoffs than we thought and some of these differences are undersold and underrated (higher clockspeed, less Shader Array cache contention, cache scrubbers, etc…).

More or less in agreement with you. I do think people confuse what Microsoft meant when they said their decompression block can deliver over 6GB/s. I think that's referring to how much input data it can take. It means decompression will never be a bottleneck for Series X, because there's no way a 2.4GB/s raw SSD will ever operate at a speed where decompression would end up holding anything back due to not being able to keep up, so it's pretty well matched in that regard. Decompression far exceeds what the raw drive can feed it.

"Our second component is a high-speed hardware decompression block that can deliver over 6GB/s," reveals Andrew Goossen. "This is a dedicated silicon block that offloads decompression work from the CPU and is matched to the SSD so that decompression is never a bottleneck. The decompression hardware supports Zlib for general data and a new compression [system] called BCPack that is tailored to the GPU textures that typically comprise the vast majority of a game's package size."

I really suspect that's what they meant, not that their decompression hardware isn't capable of decompressing at "effective" speeds beyond 6GB/s. Cerny just went further in providing a number for how far Kraken decompression could theoretically be pushed at higher compression ratios. It's difficult to believe otherwise.
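
Just to show how the two readings of that "over 6GB/s" quote differ numerically, here's a rough sketch using the publicly stated figures (the 2:1 ratio is the advertised average, everything else is arithmetic, not a claim about what the silicon actually does):

```python
# Two ways to read "decompression block can deliver over 6 GB/s".
raw_ssd_gbps = 2.4          # Series X raw SSD throughput (official figure)
avg_ratio = 2.0             # ~2:1 average compression per Microsoft's messaging

# Reading 1: 6 GB/s is the *input* rate the block can ingest.
output_if_input_spec = 6.0 * avg_ratio          # implied decompressed output: ~12 GB/s
# Reading 2: 6 GB/s is the *output* (decompressed) rate.
input_needed_if_output_spec = 6.0 / avg_ratio   # implied compressed input: ~3 GB/s

print(f"Reading 1 -> up to ~{output_if_input_spec:.0f} GB/s decompressed output")
print(f"Reading 2 -> only ~{input_needed_if_output_spec:.0f} GB/s of compressed input needed, "
      f"so the {raw_ssd_gbps} GB/s drive never saturates the block")
```

Either way the block outruns the drive; the disagreement in this thread is only about which of the two numbers the 6GB/s figure describes.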

I also just noticed this from the Hot Chips PDF. While they don't offer a number, they mention supporting higher ratios.

[Hot Chips slide image]
 
As far as the Unreal 5 demo goes guys, that's virtualized geometry and textures, so don't be surprised at how small the streaming requirements may have been in some respects compared to what you saw.

They utilize both disk compression and runtime compression to keep everything so small, plus the geometry they're generating is virtual in nature, almost like how with virtual texturing it's possible to shrink 32GB of texture data down to just 16MB as demonstrated years ago in Microsoft's Tiled Resources talk at Build.
 

Three

Member
The point I continue to make is that Microsoft was comparing to all Xbox One generation games, period: with PRT, without PRT, doesn't matter. And they came away from all that extensive monitoring and analysis with this one conclusion.

What did they do with that information?

Finally, what people aren't appreciating is that there are 3 bars in that real-time tech demo: Xbox One X with an HDD; a 9th gen console running XVA with an SSD and a well-optimized streaming system, but without SFS (basically Series X); and finally the 3rd bar showcasing Series X running XVA with SFS enabled. Microsoft is saying we are getting a 2.5x effective multiplier on system memory and I/O bandwidth compared to optimized Xbox Series X games. The multiplier isn't coming from comparing the Xbox One X to Series X with SFS, but from comparing Xbox Series X XVA without SFS to Xbox Series X XVA with SFS.

[slide image: SFS effective bandwidth comparison]



Microsoft planned for, created and implemented a solution they feel is integral to the architecture of Xbox Series X, and that will make all Xbox games better unless a developer out there can come up with something superior. Maybe PRT saw limited adoption because this extensive degree of work was never done before to ensure its adoption and make it more accessible and useful, and Microsoft saw a way to make that happen with their custom Xbox implementation of Sampler Feedback. Some are stuck on comparing to PS5. I know this thread is titled in a way that gets attention, but the larger point here has nothing to do with Playstation 5. This is exciting for the future of Xbox games period. If the majority of them were not using PRT, and would have likely never started using PRT, then Sampler Feedback Streaming by Microsoft is a way to make that happen once Xbox and PC games start making use of it.
PRT+ in the form you see now was never used extensively before because consoles and PCs never had low latency high speed drives. The engines written for those consoles therefore took this into account and kept things in memory even if they were not visible in the scene.

Gen 9 engines will use PRT+/SFS.

It has nothing to do with the 'extensive degree of work not being done' before. It's not secret sauce if that's what you're getting at. It simply was not as beneficial to do it before due to the slow drive.

I'm just as excited for engines written for gen 9; Unreal Engine 5's virtualised geometry and virtualised 8K textures looked insane. If your only point is to say that you are excited for the possibilities this gen, then so am I.
 
PRT+ in the form you see now was never used extensively before because consoles and PCs never had low latency high speed drives. The engines written for those consoles therefore took this into account and kept things in memory even if they were not visible in the scene.

Gen 9 engines will use PRT+/SFS.

It has nothing to do with the 'extensive degree of work not being done' before. It's not secret sauce if that's what you're getting at. It simply was not as beneficial to do it before due to the slow drive.

I'm just as excited for engines written for gen 9; Unreal Engine 5's virtualised geometry and virtualised 8K textures looked insane. If your only point is to say that you are excited for the possibilities this gen, then so am I.

I said nothing about secret sauce lol, but it's a very big fucking deal that I won't dare downplay if more games end up taking advantage of this. It's going to open up a mountain of incredible possibilities for Xbox games.

Low latency, high speed drives weren't the only problem according to official documentation.


Why Feedback: A Streaming Scenario

Suppose you are shading a complicated 3D scene. The camera moves swiftly throughout the scene, causing some objects to be moved into different levels of detail. Since you need to aggressively optimize for memory, you bind resources to cope with the demand for different LODs. Perhaps you use a texture streaming system; perhaps it uses tiled resources to keep those gigantic 4K mip 0s non-resident if you don’t need them. Anyway, you have a shader which samples a mipped texture using A Very Complicated sampling pattern. Pick your favorite one, say anisotropic.

The sampling in this shader has you asking some questions.

What mip level did it ultimately sample? Seems like a very basic question. In a world before Sampler Feedback there’s no easy way to know. You could cobble together a heuristic. You can get to thinking about the sampling pattern, and make some educated guesses. But 1) You don’t have time for that, and 2) there’s no way it’d be 100% reliable.

Where exactly in the resource did it sample?
More specifically, what you really need to know is— which tiles? Could be in the top left corner, or right in the middle of the texture. Your streaming system would really benefit from this so that you’d know which mips to load up next. Yeah while you could always use HLSL CheckAccessFullyMapped to determine yes/no did-a-sample-try-to-get-at-something-nonresident, it’s definitely not the right tool for the job.

Direct3D Sampler Feedback answers these powerful questions.


At times, the accuracy of sampling information is everything. In the screencap shown below, this demo-scene compares a “bad” feedback approximation to an accurate one. The bad feedback approximation loads higher-detailed mips than necessary:



Bad feedback approximation showing ten times the memory usage as good feedback approximation
The difference in committed memory is very high— 524,288 versus 51,584 kilobytes! About a tenth the space for this tiled resource-based, full-mip-chain-based texturing system. Although this demo comparison is a bit silly, it confirms something you probably suspected: good judgments about what to load next can mean dramatic memory savings. And even if you’re using a partial-mip-chain-based system, accurate sampler feedback can still allow you to make better judgments about what to load and when.


In this Microsoft dev blog post they directly compare to PRT-based solutions and state that there are still advantages to knowing exactly what was sampled. I'm not saying low-latency, high-speed drives don't help, but notice that in this entire blog describing the benefits of Sampler Feedback for texture streaming, the focus is on the accuracy of knowing what to load, when, and what's definitely not required because it won't be seen. That single issue was a problem not only for many texture streaming systems, but especially for PRT-based solutions, even if PRT-based solutions do help save memory. The point is that more system memory can be saved, and Microsoft making this a big, highly publicized part of their Xbox APIs, and making it easier than tiled resources implementations have been in the past, does represent an extensive degree of work on Microsoft's end to drive adoption.
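
To illustrate the blog's point in toy form (this is a simulation of the idea, not the D3D12 Sampler Feedback API, and the texture sizes are arbitrary): a streamer that can't tell which mip was actually sampled has to keep high-detail mips resident "just in case", while accurate feedback lets you keep only the finest mip that was really used plus the cheap tail of the chain.

```python
def mip_size_kb(base_kb, mip):
    """Each mip level is a quarter of the one above it."""
    return base_kb // (4 ** mip)

def resident_kb(finest_resident_mip, base_kb, mip_count):
    """Keep the given mip and everything coarser (the cheap tail of the chain)."""
    return sum(mip_size_kb(base_kb, m) for m in range(finest_resident_mip, mip_count))

BASE_KB, MIPS = 64 * 1024, 8        # hypothetical 64 MB mip 0, 8-level chain

# Accurate feedback reports mip 3 as the finest level actually sampled for this texture;
# a "bad" approximation can't tell, so it conservatively keeps everything from mip 0.
accurate = resident_kb(finest_resident_mip=3, base_kb=BASE_KB, mip_count=MIPS)
conservative = resident_kb(finest_resident_mip=0, base_kb=BASE_KB, mip_count=MIPS)

# The exact ratio depends on the scene (the blog's demo saw roughly 10x); the point is
# that better judgments about what to load translate directly into committed memory.
print(f"conservative: {conservative} KB, feedback-driven: {accurate} KB, "
      f"ratio: {conservative / accurate:.0f}x")
```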

Another example is the Agility SDK, which is designed to open up access to newer DirectX features to a far wider audience of Windows 10 users, as opposed to keeping them exclusive to whoever upgraded to the latest version of Windows 10. That, too, is an example of the work Microsoft has put in to get more developers to adopt these features, because now they have a guarantee that a lot of people who can benefit actually will, and for those who can't, the game will operate within the parameters of that hardware as designed.
 
Last edited:
22 pages on a texture decompression discussion. The amount of console warring around here should be epic lol.

We've actually been having pretty good discussion. Very little console warring.

I did threaten to compress someone's face and unleash the kraken though. :messenger_tears_of_joy:

So tread carefully, it gets pretty hardcore around these parts. All the biggest badasses that frequent this site post in this thread. Compression and effective I/O is a hot topic.
 
Last edited:

Corndog

Banned
It's unclear; most likely, like you said, it bypasses the decompressor and then the GPU decompresses the data into its L3 cache when needed. It shows just how much geometry there is when they already have to hold compressed geometry in RAM, though geometry compresses very well and that might be the real reason.
Wouldn’t it just use the lossless compression or does it not work well with vertex data?
 

Three

Member
Got any links to documentation of any other PRT+ implementations?

The only thing that comes back is Microsoft's own reference doc for SFS..

Where are these games using PRT+ or hardware w/ SFS?

I'm not saying it doesn't exist.. but I don't think it's some prevalent thing.
Part of the reason you see so much coverage of SFS is the SSD difference and attempts to nullify it, so it has really come into the limelight.

PRT+ and SFS aren't something you would usually hear a lot about or find a lot of information on, partly because 'PRT+' is colloquial and is implemented differently in various engines.
One example of an implementation of it is the middleware Granite for UE4


Notice that in their demo (which is actually a UE4 demo), with the use of Granite they got VRAM usage down from 4GB+ to 1GB, for example, and even increased texture res.

UE5 and this demo go even further by virtualising everything: virtual shadow maps, virtual 8K textures and even virtualised geometry.

 
Part of the reason you see so much coverage of SFS is the SSD difference and attempts to nullify it, so it has really come into the limelight.

PRT+ and SFS aren't something you would usually hear a lot about or find a lot of information on, partly because 'PRT+' is colloquial and is implemented differently in various engines.
One example of an implementation of it is the middleware Granite for UE4


Notice that in their demo (which is actually a UE4 demo), with the use of Granite they got VRAM usage down from 4GB+ to 1GB, for example, and even increased texture res.

UE5 and this demo go even further by virtualising everything: virtual shadow maps, virtual 8K textures and even virtualised geometry.



Some people are choosing to view it this way, and the thread is titled as such, but that's not the focus of the discussion. There is no PS5 comparison happening here. This is something Microsoft has been telling us about from the very start. The Sampler Feedback feature is a key pillar of the next DirectX, it's a key pillar of Sampler Feedback Streaming in the Xbox Velocity Architecture. How it compares to the PS5 is irrelevant and not more important than the question of what can it do for Xbox Series X games.

If it works like Microsoft says it does, and appears it does, Xbox owners have a great deal to be excited about when the most anticipated titles start releasing.
 

THE:MILKMAN

Member
Can't believe the debate around the effective throughput is still raging. I fall back to what Spencer and Ronald said right at the start in December 2019: 40X boost over One.

One = 120MB/s, and 120MB/s x 40 = 4800MB/s effective after XVA is applied.

The XVA marketing might try and blind us with a 2X multiplier here and stack a multiplier there but the bottom line will always be 4.8GB/s all in because that is what they ultimately commit to claim!

And without doing a comparison I'm sure the same logic applies to PS5. Raw speed + Kraken/Oodle = 11GB/s.
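
The arithmetic behind both of those "all in" figures, spelled out (official raw speeds plus the vendors' own claimed multipliers; nothing here is a measurement):

```python
# "Effective" throughput claims, reconstructed from each vendor's own numbers.
xbox_one_hdd = 0.120        # GB/s, typical Xbox One HDD streaming figure cited above
series_x_raw = 2.4          # GB/s raw SSD
ps5_raw = 5.5               # GB/s raw SSD

print(f"40x over Xbox One: {xbox_one_hdd * 40 * 1000:.0f} MB/s")
print(f"Series X effective: {series_x_raw * 2:.1f} GB/s at a ~2:1 average XVA compression ratio")
print(f"PS5 effective: {ps5_raw * 2:.0f} GB/s if the same ~2:1 average ratio is applied to its raw speed")
```

Both "effective" numbers are just raw speed times an assumed average compression ratio, which is why the post above treats them as the bottom-line commitments.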
 

quest

Not Banned from OT
Some people are choosing to view it this way, and the thread is titled as such, but that's not the focus of the discussion. There is no PS5 comparison happening here. This is something Microsoft has been telling us about from the very start. The Sampler Feedback feature is a key pillar of the next DirectX, it's a key pillar of Sampler Feedback Streaming in the Xbox Velocity Architecture. How it compares to the PS5 is irrelevant and not more important than the question of what can it do for Xbox Series X games.

If it works like Microsoft says it does, and appears it does, Xbox owners have a great deal to be excited about when the most anticipated titles start releasing.

It will also be PC gamers who benefit. That is the reason Microsoft went this direction. They release on 2 platforms day 1 and needed both at a baseline. As great as the Sony solution is, there's no way it was going to be adopted on the PC quickly. What Microsoft has done is take several modest ideas that all 3 hardware vendors could agree to. That should see fast adoption on the PC side.
 

Three

Member
Some people are choosing to view it this way, and the thread is titled as such, but that's not the focus of the discussion. There is no PS5 comparison happening here. This is something Microsoft has been telling us about from the very start. The Sampler Feedback feature is a key pillar of the next DirectX, it's a key pillar of Sampler Feedback Streaming in the Xbox Velocity Architecture. How it compares to the PS5 is irrelevant and not more important than the question of what can it do for Xbox Series X games.

If it works like Microsoft says it does, and appears it does, Xbox owners have a great deal to be excited about when the most anticipated titles start releasing.
I don't know, man. You say it's not secret sauce and not to view it that way. You say there is no comparison happening here, yet all you seem to talk about is possibilities for 'Xbox games' and 'Xbox users' and Microsoft making a huge technical breakthrough. Not only that, but this is a quote from you in this thread from yesterday:

Microsoft placed a premium on getting much better than normal system memory efficiency, to the point where the end result is that their SSD can get away with doing 2.5x less work yet still get the same exact results onto screen. That's not me saying what Sony is doing isn't smart. If it were a foot race Microsoft isn't concerned about outrunning anybody, but thinking about how they can make it so they can still finish faster

Granite has existed for 3+ years. They have demos of reducing VRAM usage from 4GB down to 1GB. A 4x multiplier. It's an implementation of PRT+. Some games use it, but not a lot.

What does a quote like this from their site tell you?:

"For an HD screen, Granite requires roughly 650MB of VRAM at any given time, no matter how much texture content you use."

Think about it constructively.
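
That Granite quote is the defining property of virtual texturing: the resident working set scales with screen resolution (how many texels you can actually see), not with how much texture content ships on disk. A crude way to see why, with made-up overhead factors purely for illustration:

```python
def vt_working_set_mb(width, height, layers=8, bytes_per_texel=4,
                      mip_overhead=1.33, cache_slack=3.0):
    """Crude upper bound for a virtual-texturing working set: roughly one texel per
    screen pixel, per material layer, plus mip-chain and tile-cache slack.
    Every overhead factor here is a guess for illustration, not a Granite figure."""
    texels = width * height * layers
    return texels * bytes_per_texel * mip_overhead * cache_slack / (1024 ** 2)

# The budget is driven by the screen, not by whether the game ships 10GB or 500GB of textures.
for w, h in [(1280, 720), (1920, 1080), (3840, 2160)]:
    print(f"{w}x{h}: ~{vt_working_set_mb(w, h):.0f} MB resident")
```

Whatever the exact constants, the output only moves when the resolution does, which is why a middleware vendor can quote a flat VRAM number "no matter how much texture content you use".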
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
Microsoft meant when they said their decompression block can deliver over 6GB/s. I think that's referring to how much input data it can take
Sorry, but I do not think that is the case… nobody really reports numbers that way.
That is normally the decompression bandwidth of the unit: if their average decompression rate is 4.8 GB/s then it makes sense that the unit is designed to support 6-7 GB/s hence over 6 GB/s.

I also just noticed this from the Hot Chips PDF. While they don't offer a number, they mention supporting higher ratios.
The number does not need to be expressed in a redundant way in that slide too: they already mention “over 6 GB/s” when talking about the decompression rate for the I/O HW decoder (zlib + BCPack = over 6 GB/s).
 
Last edited:

jroc74

Phone reception is more important to me than human rights
Can't believe the debate around the effective throughput is still raging. I fall back to what Spencer and Ronald said right at the start in December 2019: 40X boost over One.

One = 120MB/s, and 120MB/s x 40 = 4800MB/s effective after XVA is applied.

The XVA marketing might try and blind us with a 2X multiplier here and stack a multiplier there but the bottom line will always be 4.8GB/s all in because that is what they ultimately commit to claim!

And without doing a comparison I'm sure the same logic applies to PS5. Raw speed + Kraken/Oodle = 11GB/s.
The wild part is that it seems Sony and Cerny were being modest with the official numbers. Which IMO is good; it helps keep expectations in check.
Sorry, but I do not think that is the case… nobody really reports numbers that way.
That is normally the decompression bandwidth of the unit: if their average decompression rate is 4.8 GB/s then it makes sense that the unit is designed to support 6-7 GB/s hence over 6 GB/s.


The number does not need to be expressed in a redundant way in that slide too: they already mention “over 6 GB/s” when talking about the decompression rate for the I/O HW decoder (zlib + BCPack = over 6 GB/s).

Basically Sony is showing a realistic number while Microsoft is showing an ideal number.
Yup, once again.... the average vs the best case scenarios.
 

Lethal01

Member
Some people are choosing to view it this way, and the thread is titled as such, but that's not the focus of the discussion. There is no PS5 comparison happening here. This is something Microsoft has been telling us about from the very start. The Sampler Feedback feature is a key pillar of the next DirectX, it's a key pillar of Sampler Feedback Streaming in the Xbox Velocity Architecture. How it compares to the PS5 is irrelevant and not more important than the question of what can it do for Xbox Series X games.

If it works like Microsoft says it does, and appears it does, Xbox owners have a great deal to be excited about when the most anticipated titles start releasing.

I think how it compares to PS5 is important, or at least interesting, since they are both extremely similar systems. Like, I'm less interested in how SFS works without an SSD; if Sony has something comparable then it means both consoles have a brighter future than expected. Comparing also lets us understand the benefits of Microsoft's specific implementation. It's not like we are forced to only talk about Xbox and totally ignore everything else.
 
Last edited:

Lethal01

Member
Basically Sony is showing a realistic number while Microsoft is showing an ideal number.
I don't really see it that way,

Sony and Microsoft put out realistic numbers of what their SSD can transfer both raw and if compressed.
Microsoft outlined how efficient their system is at transferring only the texture data that is needed, Sony did not.
 
I don't know, man. You say it's not secret sauce and not to view it that way. You say there is no comparison happening here, yet all you seem to talk about is possibilities for 'Xbox games' and 'Xbox users' and Microsoft making a huge technical breakthrough. Not only that, but this is a quote from you in this thread from yesterday:



Granite has existed for 3+ years. They have demos of reducing VRAM usage from 4GB down to 1GB. A 4x multiplier. It's an implementation of PRT+. Some games use it, but not a lot.

What does a quote like this from their site tell you?:

"For an HD screen, Granite requires roughly 650MB of VRAM at any given time, no matter how much texture content you use."

Think about it constructively.

You're really on your warrior thing today. According to Microsoft, Sampler Feedback Streaming cuts streaming and memory demands by at least 2.5x compared to what most titles seem to be using. So what's wrong with me pointing that fact out? And the other line you bolded is a response to constant comparison with the PS5 when this isn't about PS5. People keep telling me how it will never be faster than the PS5, and I keep saying "don't care, I care about it being the improvement over previous Xbox games development that Microsoft says it will be."

It's about how I think it can and will benefit Xbox Series X games. People seem to feel everything that is positive or hopeful about Xbox is a threat or a shot at PlayStation 5. Trust me, it isn't that serious.
 

Panajev2001a

GAF's Pleasant Genius
I don't really see it that way,

Sony and Microsoft put out realistic numbers of what their SSD can transfer both raw and if compressed.
Microsoft outlined how efficient their system is at transferring only the texture data that is needed, Sony did not.

Sony did not really need to, and it's possibly not the only thing they improved upon and did not shout about (DualSense reduced input latency, but they did not really make a big fuss over it). Their devs know, and even if they just assumed basic old PRT support, their solution starts from a very, very high point (the best surprise was seeing how Oodle Texture helps improve Kraken compression ratios).
 
I think how it compares to PS5 is important, or at least interesting, since they are both extremely similar systems. Like, I'm less interested in how SFS works without an SSD; if Sony has something comparable then it means both consoles have a brighter future than expected. Comparing also lets us understand the benefits of Microsoft's specific implementation. It's not like we are forced to only talk about Xbox and totally ignore everything else.

That's fair. I'm only referencing people thinking I'm talking about this or interested in it solely to diminish what the PS5 is doing. Not the case. On its face the PS5 should have the basic hardware functionality to do their own version of sampler feedback streaming on top of already having a faster SSD, so I've said to people already that PS5 will always be faster if that's the case. This isn't to take away from what the PS5 is doing. I care about what it means for the big xbox titles I'm anticipating if they really embrace Sampler Feedback Streaming. Especially as a big RPG gamer.
 

betrayal

Banned
I wonder if "BCPack Texture Compression Technique" will be more revolutionary than the previous Xbox games powered by the Cloud.
 

Boglin

Member
I care about it being the improvement over previous Xbox games development that Microsoft says it will be.

It's about how I think it can and will benefit Xbox Series X games. People seem to feel everything that is positive or hopeful about Xbox is a threat or a shot at PlayStation 5. Trust me, it isn't that serious.

It's sometimes hard to get out of the console war mentality but I agree with you 100% here.
If we pretend the PS5 doesn't exist for a minute then it's easier to contrast the XSX to the last generation, and you can see just how massive of a leap its I/O system is by itself. A person can look at Velocity Architecture and dismiss it as a PR buzzword, but I prefer to look at it as a package of improvements over the previous gen, and each of its components is a big deal in its own right. It's not JUST marketing.

So with that said, I hope when I make comparisons between Xbox and Playstation I don't sound too much like I'm just being a fanboy. My goal isn't to show one console is better than the other but to understand the reasoning, compromises and benefits behind each console design. I've said it many times in the past that both of these consoles have incredibly intelligent people behind their designs and behind every decision they made was a highly informed thought process.

I'd also like to point out that a lot of people like to put down the words of the system architects as purely PR which really takes away from the individuals. Perhaps I'm being naive, but when I watched the road to PS5, or the various presentations for XSX I don't get the impression that the presenters are trying to swindle me. I see people who are enthusiastic and proud of their work trying to show us why we should be as excited as they are. These engineers are some of the biggest nerds on the planet and I'm sure their passion isn't to make Microsoft and Sony money.
 
Last edited:
Basically Sony is showing a realistic number while Microsoft is showing an ideal number.

Microsoft from the beginning has only ever stressed sustainable speed over peak speed. Sampler Feedback Streaming has nothing to do with a best case scenario. The rough 2.5x efficiency is their worst case scenario, with the understanding that a game isn't only streaming textures, and Sampler Feedback Streaming is only about textures.

PRT by itself gives you over 2x memory efficiency over a normal streaming texture system.

SFS will almost surely have the efficiencies Microsoft suggests since apparently the large majority of big AAA Xbox games were utilizing the traditional method that isn't quite as efficient.

Based on Microsoft's messaging and API targets, they are focused on trying to copy roughly 2.5x less texture data into system memory while achieving the same visual result on screen as if they had never achieved such efficiency. That is the foundation of what Microsoft is guaranteeing will hold true for Sampler Feedback Streaming in Xbox titles.

These are not their best case scenarios; this has been the average Microsoft has said their system achieves from day one if Sampler Feedback Streaming is utilized. Go back to March last year when they said it on the Xbox Wire site, to Digital Foundry, when they said it at Hot Chips, and when they said it in podcasts and showed real-time demos, one running on Series S, the other on Series X.
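
Worth noting how the "only textures benefit" caveat above plays out in practice: if SFS gives a 2.5x saving on texture traffic only, the whole-stream multiplier depends on what share of the streamed data is textures. A quick sketch (the texture shares are arbitrary examples, not figures from Microsoft):

```python
def effective_multiplier(texture_share, texture_gain=2.5):
    """Overall I/O multiplier when only the texture portion of the stream gets the gain."""
    return 1.0 / (texture_share / texture_gain + (1.0 - texture_share))

# Microsoft has said textures typically dominate a game's package size; the exact share varies.
for share in (0.5, 0.7, 0.9):
    print(f"textures = {share:.0%} of streamed data -> ~{effective_multiplier(share):.2f}x overall")
```

The closer a game's streaming is to pure texture data, the closer the whole-system benefit gets to the quoted texture multiplier.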
 

Ev1L AuRoN

Member
The PS5 I/O architecture is a chain of technologies to maximize data throughput. I don't think Microsoft has a magic bullet to counter that with software. But we need to see how this advantage will translate into games. Xbox has advantages in GPU grunt and memory bandwidth. In the end both consoles excel in different aspects and there will be engines that like one over the other; to me, they are very similar. What Microsoft needs is to bring competition to Sony Worldwide Studios. It would be a shame if the Xbox advantages translate only into slightly better resolution or a little better performance that we need Digital Foundry to look at closely to tell us the difference. It's Microsoft's studios' job to fully utilize the console's potential.
 

M1chl

Currently Gif and Meme Champion
Why is this thread still running? Either MS does not have anything or they are late again. This is all baseless speculation on the Xbox side of things. On PS5, we know that they did it right.
 
It's sometimes hard to get out of the console war mentality but I agree with you 100% here.
If we pretend the PS5 doesn't exist for a minute then it's easier to contrast the XSX to the last generation, and you can see just how massive of a leap its I/O system is by itself. A person can look at Velocity Architecture and dismiss it as a PR buzzword, but I prefer to look at it as a package of improvements over the previous gen, and each of its components is a big deal in its own right. It's not JUST marketing.

So with that said, I hope when I make comparisons between Xbox and Playstation I don't sound too much like I'm just being a fanboy. My goal isn't to show one console is better than the other but to understand the reasoning, compromises and benefits behind each console design. I've said it many times in the past that both of these consoles have incredibly intelligent people behind their designs and behind every decision they made was a highly informed thought process.

I'd also like to point out that a lot of people like to put down the words of the system architects as purely PR which really takes away from the individuals. Perhaps I'm being naive, but when I watched the road to PS5, or the various presentations for XSX, I don't get the impression that the presenters are trying to swindle me. I see people who are enthusiastic and proud of their work trying to show us why we should be as excited as they are. These engineers are some of the biggest nerds on the planet and I'm sure their passion isn't to make Microsoft and Sony money.

Agreed 110%. This is how I see it also. Now we do know when even the engineers are trying to be "cute" such as when Microsoft's engineers wanted to avoid acknowledging the fact that Xbox One was clearly weaker than PS4 and wouldn't be equal. We knew which parts were an attempt to soften the blow and which parts were still relevant enough to take seriously and take under advisement.

Take Halo Infinite, for example: I'm judging the finished product visually based only on past Halo games and what I think Halo going for a "classic trilogy" look should look like, not, for example, what Battlefield 6, Star Wars Battlefront II, Red Dead Redemption 2 or the latest Call of Duty look like. When I do bring up other games in reference to Halo Infinite, I do so mostly to make an extreme point of how different the games are, and to remind people that what Xbox Series X is technically aiming to do is outdo or leapfrog its last incarnation, the same as each Xbox developer's last project. Same mentality here. I've seen what I feel is representative of the best work on Xbox. Now I want devs to use this kind of stuff to take it to another level.

When we see Sampler Feedback Streaming's apparent efficiency, it's being compared against a large number of games developed on Xbox One/S/X that do things in a much less efficient manner, but the multiplier has always been in reference to Xbox Series X.


The picture below shows SFS having an advantage over Xbox One X of 5.7x, but only a 2.6x advantage over Series X without SFS.

[chart image: SFS streaming comparison]
 
Why is this thread still running? Either MS does not have anything or they are late again. This is all baseless speculation on the Xbox side of things. On PS5, we know that they did it right.

Hardly baseless. We have a demo of it in action in a scene with over 10GB of highly detailed textures, and a number to go with it. Now we just need to get some amazing games using this tech.
 