
Xbox Series X’s BCPack Texture Compression Technique 'might be' better than the PS5’s Kraken

I'll tell you otherwise. You have the same game but the memory of the graphics card has a speed of 250 or 500 or 750 giga / sec. What will it do with what we see on the screen? It will definitely not be perfectly fine for both. Exactly the same is the 200-500% difference between SSD IOP on PS5 and XSX.

I repeat. It will be fine for both. People are underestimating XVA bigtime.

Using an insane 22GB texture request, Sampler Feedback Streaming's proven efficiency slashes that down to 8.8GB due to the 2.5x or higher rough efficiency. Then do 8.8GB / 2.4GB/s / 2 (the 2.4GB/s raw speed roughly doubled by compression to 4.8GB/s) and you accomplish the visual equivalent of 22GB worth of textures in just 1.83 seconds. That's about 12GB/s effective (22GB / 1.83s) in terms of what the I/O on Series X with its 2.4GB/s raw SSD would be accomplishing.
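
For anyone who wants to check the arithmetic, here is a rough sketch of the same back-of-the-envelope estimate. The function name and parameters are made up for illustration, and it assumes the 2.5x SFS multiplier and the roughly 2x compression ratio both hold in full, which is the optimistic case. Under those assumptions the effective rate works out to 2.4 x 2 x 2.5 = 12GB/s no matter how big the request is.

```python
def sfs_load_estimate(request_gb, sfs_multiplier=2.5, raw_gbps=2.4, compression_ratio=2.0):
    """Hypothetical helper: estimate load time for a texture request.

    request_gb        -- size of the textures the scene "asks for"
    sfs_multiplier    -- assumed SFS saving (only 1/2.5 of the data is actually read)
    raw_gbps          -- raw SSD throughput in GB/s
    compression_ratio -- assumed gain from hardware decompression
    """
    gb_read = request_gb / sfs_multiplier
    seconds = gb_read / (raw_gbps * compression_ratio)
    effective_gbps = request_gb / seconds
    return gb_read, seconds, effective_gbps


gb_read, seconds, effective = sfs_load_estimate(22.0)
print(f"22GB request -> {gb_read:.1f}GB read in {seconds:.2f}s, {effective:.1f}GB/s effective")
```

Drop either multiplier to 1.0 to see how much of the result depends on each assumption.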

I used a super extreme example to showcase to you that if what Microsoft says about Sampler Feedback Streaming's efficiency is true, the advantage of the PS5 SSD will make no difference with these kinds of insane memory multiplying results.

They're either feeding us PR, or it's true. We have a real demo of it running on both the Series S and on Series X, now we just need to see the games. I think it's true. Microsoft even said during the SFS demo a few weeks ago that the multiplier effect will hold true even in more visually complex videogames than what their demo represents, meaning they've done the tests.


This innovation results in approximately 2.5x the effective I/O throughput and memory usage above and beyond the raw hardware capabilities on average. SFS provides an effective multiplier on available system memory and I/O bandwidth, resulting in significantly more memory and I/O throughput available to make your game richer and more immersive.

This hypothetical would either be a crap ton of highly varied textures, or a whole lot of highly, highly detailed textures, or they can simply not use so much variety or such high quality textures and instead use the memory savings they got from SFS towards improving other parts of the game.

People are underestimating Series X with SFS fully utilized, and my strong belief is that games are coming that will prove these people wrong. SFS turns the effective I/O performance of Xbox Series X, and even S, on its head entirely if it really works the way Microsoft says it does.

The man who worked on SFS seems to think it's the real deal.

 

Utherellus

Member
I'll tell you otherwise. You have the same game but the memory of the graphics card has a speed of 250 or 500 or 750 giga / sec. What will it do with what we see on the screen? It will definitely not be perfectly fine for both. Exactly the same is the 200-500% difference between SSD IOP on PS5 and XSX.
PS5 will load it at 2x 5GB/s speed


On Xbox side, if SFS is implemented in that game, it will load at 2-3x 4.8GB/s speed
 
PRT+ and SFS both do this. i.e. they load only what is visible.

SFS isn't just about only loading what's visible, but operating much faster and helping the streaming system much more accurately determine what should have never been loaded in the first place. Sampler Feedback's functionality combined with PRT is what allows the kinds of insane 2.5x-3x effective multipliers on system memory and I/O bandwidth Microsoft are talking about because whereas PRT simply streams out high-res texture mips, SFS streams in/out parts of a mip in a much more fine-grained way, delivering only a specific corner or middle section of a whole texture if need be without having the rest of the parts not seen occupying memory. PRT is swapping out whole texture mips, not tiny parts of them.

Though often called the same thing or conflated to describe the functionality, even in Microsoft's own official DirectX documentation, the capabilities of SFS and basic PRT are not entirely the same. These various tweets by the guy working on it at Microsoft probably give a better picture than I could.

















ZK3RdbC.jpg
 
People look at PS5 SSD, bashing Xbox, and don't really realize how big of a deal SFS really is.

And I blame Microsoft for that. They are soooooooo silent about it and delaying its launch in games sooooo much, it's painful.

Well, they do need time to work on the kinds of games that will take advantage of this stuff, but with them having a much larger pool of first party development talent, things like this which will bring out the best in Xbox have a greater likelihood of being taken advantage of. I can only imagine what Bethesda Game Studios could do with this kind of tech once they start utilizing it, which I suspect at some point down the road they certainly will. Alex from DF recently suggested Halo Infinite could end up using it, which I really and truly hope it does. Even with the work they still needed to finish regarding their engine and the art and graphics side of things, I've suspected Halo Infinite was using it and they just didn't have the hang of it yet. Hopefully it proves true.
 

Gurney

Neo Member
Though often called the same thing or conflated to describe the functionality, even in Microsoft's own official DirectX documentation, the capabilities of SFS and basic PRT are not entirely the same.

PRT is the foundation of SFS, but PRT by itself is definitely not SFS.

PRT is the foundation of PRT+...

PRT by itself is definitely not PRT+...
 

skit_data

Member
In terms of optimization I get it, but data is data, 1’s and 0’s, shouldn’t matter if its a Super Famicom game
Yeah, well I guess you know that those 1s and 0s represent instructions to specific parts and functions of the hardware. There is quite a lot of hardware in both these consoles not accounted for in instructions written for last gen consoles.

Edit: On the topic of install sizes, I guess Sony could make their games smaller because they have licensed the RAD Game Tools stuff, but the instructions would still tell the game to decompress via the CPU. At best that would result in awfully long loading times, and at worst in a game that cannot fetch data from the HDD in time during gameplay = complete lag fest
 
In terms of optimization I get it, but data is data, 1’s and 0’s, shouldn’t matter if its a Super Famicom game

Nah, it definitely matters because it isn't just about what the data and raw speeds are like. Without utilizing the proper APIs, which make proper use of the hardware to break through any I/O bottlenecks or limitations, you won't see the PS5 doing what it can truly do in a game that isn't properly built for it. Xbox's APIs clearly make getting something extra from the new hardware easier without necessarily having to build it ground up, which is why I often don't really see those things as anything to brag about when I see it. If I'm bragging about it, I'm just having some fun on that day lol.
 

coffinbirth

Member
SFS isn't just about only loading what's visible, but operating much faster and helping the streaming system much more accurately determine what should have never been loaded in the first place. Sampler Feedback's functionality combined with PRT is what allows the kinds of insane 2.5x-3x effective multipliers on system memory and I/O bandwidth Microsoft are talking about because whereas PRT simply streams out high-res texture mips, SFS streams in/out parts of a mip in a much more fine-grained way, delivering only a specific corner or middle section of a whole texture if need be without having the rest of the parts not seen occupying memory. PRT is swapping out whole texture mips, not tiny parts of them.

Though often called the same thing or conflated to describe the functionality, even in Microsoft's own official DirectX documentation, the capabilities of SFS and basic PRT are not entirely the same. These various tweets by the guy working on it at Microsoft probably give a better picture than I could.

















ZK3RdbC.jpg

Talked about this a couple of weeks ago, and got lols...smh.
 

muteZX

Banned
PS5 will load it at 2x 5GB/s speed


On Xbox side, if SFS is implemented in that game, it will load at 2-3x 4.8GB/s speed

Both of these claims are completely incorrect. Virtualized texturing or geometry is an ancient technology, just like PRT, PRT+ or later SFS. If it is supported at the HW level on the XSX GPU, it is also supported on the PS5 GPU. The API layer is software and code. It is not tied only to DX12 Ultimate. So I'm going back to the question of what benefits it will mean for the same game if the PS5 SSD IOP delivers 2-5 times more data to the frame?
 

Three

Member
SFS isn't just about only loading what's visible, but operating much faster and helping the streaming system much more accurately determine what should have never been loaded in the first place. Sampler Feedback's functionality combined with PRT is what allows the kinds of insane 2.5x-3x effective multipliers on system memory and I/O bandwidth Microsoft are talking about because whereas PRT simply streams out high-res texture mips, SFS streams in/out parts of a mip in a much more fine-grained way, delivering only a specific corner or middle section of a whole texture if need be without having the rest of the parts not seen occupying memory. PRT is swapping out whole texture mips, not tiny parts of them.

Though often called the same thing or conflated to describe the functionality, even in Microsoft's own official DirectX documentation, the capabilities of SFS and basic PRT are not entirely the same. These various tweets by the guy working on it at Microsoft probably give a better picture than I could.

















ZK3RdbC.jpg

If you read what he has written it only confirms it. Notice he says SFS is PRT + SF.
The purpose of PRT is to only have the required parts of the texture in VRAM.

This is where most of your memory saving is. Determining residency and streaming can even be software based; as for hardware support, any Turing card supports it (way before XVA). SFS is essentially identical to PRT+. They are in fact interchangeable terms.

The 2-3x effective multipliers that MS have given are compared to Xbox One games. This is because not many of them actually use PRT+/SFS to stream textures as needed.
 

Panajev2001a

GAF's Pleasant Genius
Sampler Feedback's functionality combined with PRT is what allows the kinds of insane 2.5x-3x effective multipliers on system memory and I/O bandwidth Microsoft are talking about because whereas PRT simply streams out high-res texture mips
No, that multiplier does not come on top of PRT usage, and Stanard is the one disagreeing with you in the tweets you just quoted. Saying PRT + SF = SFS, for example, clarifies what SF does in this case, and he does not say anything suggesting that PRT loads entire high res texture mips rather than only part of the texture (which nobody stated).
5fOR8F1.jpg


This is not underselling it, SF is still saving you time by calculating this data/tracking this data for you and handles edge cases when filtering streaming data. It also makes it simpler to use virtual texturing and texture streaming than before.
 

muteZX

Banned
Ancient. The term "partially resident textures" has been around for a very long time: AMD's GCN 1.0 /read PS4 level/ brought hardware support starting at the end of 2011.

The DX API exposes the HW features of a particular GPU to the outside. The DX API is a software layer and you can do something different for yourself, better, worse, slower, faster. RT, VRS, SFS, mesh shaders are HW extensions of the RDNA2 architecture, on which the PS5 and XSX GPUs are based. SFS is nothing unique or exclusive to the XSX GPU.
 

Boglin

Member
Sampler Feedback's functionality combined with PRT is what allows the kinds of insane 2.5x-3x effective multipliers on system memory and I/O bandwidth Microsoft are talking about because whereas PRT simply streams out high-res texture mips, SFS streams in/out parts of a mip in a much more fine-grained way, delivering only a specific corner or middle section of a whole texture if need be without having the rest of the parts not seen occupying memory. PRT is swapping out whole texture mips, not tiny parts of them.

I have to disagree with you again.
You are contradicting Stanard with the bolded. What you're describing is traditional mip streaming.

In Stanard's own words:
"PRT is using virtual memory to keep only part of a texture loaded in physical memory."

"The rough 2.5x efficiency comes from not reading whole mips but only the texture regions of interest."


SFS is being compared to traditional texture streaming, i.e. Whole mips, to get its 2.5x efficiency. He's not comparing it against PRT.

Here is an official AMD slide showing PRT as only loading partial mips.

7yTAcFk.jpg


Here is an amazing writeup I found that explains what PRT, PRT+, and SFS are. Full link is in the spoiler at the bottom.

    • Classic
    • PRT
    • PRT+SF
    • SFS
Classic Texture Streaming

Let’s start with classic texture streaming, which is the most basic and simple one. As we’ve talked about with “mipmapping”, developers have now gained a new set of assets, each at least half the size of the original Mip0.

So, to save precious memory space, developers started to find ways to use the higher mip levels (mip8, just for example). Before classic texture streaming, everything in a game level was loaded at mip0. With classic streaming, developers can now use different mip levels for different objects, at different ranges or sizes.

Partial Resident Texture or Virtual Texture

PRT is the term used by Unreal Engine, and Virtual Texture is the term used by idTech. But generally they’re the same thing.

As time moves forward, mip0 gets larger and larger. We’re seeing 4K and 8K textures now, which can be a huge burden on memory when loaded as a whole.

So, what about just loading parts of them?

PRT uses the same idea as Virtual Memory. We don’t have to load every part of a texture into memory. We can divide the large texture into small tiles.

image



By dividing the large texture into a tile array, now we can have more fine grained control over the tiles.

For different parts of the texture, some of them can be a part from Mip0, and some of them can be a part from Mip 3 or so.

The MinMip map above shows an 8x8 area requesting different mip levels.

In this particular example, every tile has the same memory size. A single tile in Mip1 covers (2^1)^2=4 times the area of a Mip0 tile. Thus it’s 4 times less detailed, but it still covers that whole area. Likewise, a Mip2 tile covers (2^2)^2=4^2=16 times the area, 16 times less detailed. And a Mip3 tile can cover the whole 64-tile area single-handedly. Awesome, right? But it’s extremely poor quality, so we can only use it on the most insignificant parts.

Before PRT, we needed 64 units of tile memory to cover that 8x8 area. With PRT, we can now use 1+3+3+1=8 units of tile memory to cover that area. Assuming the mipmap is efficient, that’s a huge saving, isn’t it?
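
The tile arithmetic above can be sanity-checked with a few lines. This is only a sketch of the numbers quoted in the writeup; the function name is made up, and the 1+3+3+1 tile counts are taken from the text as-is rather than derived from the missing MinMip image.

```python
def tiles_covered(mip_level):
    """Area covered by one tile at `mip_level`, measured in Mip0 tiles.

    Each mip level halves the resolution on both axes, so a fixed-size
    tile covers (2^level)^2 times the area of a Mip0 tile.
    """
    return (2 ** mip_level) ** 2


coverage = {level: tiles_covered(level) for level in range(4)}

full_res_tiles = 8 * 8        # every tile of the 8x8 area loaded at Mip0
prt_tiles = 1 + 3 + 3 + 1     # tile counts quoted in the writeup
saving = full_res_tiles / prt_tiles

print(coverage)
print(f"{full_res_tiles} tiles without PRT vs {prt_tiles} with PRT: {saving:.0f}x less memory")
```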

Well, that’s where the things get tricky: How to make sure the mipmap is efficient?


Before Sampler Feedback, developers lacked the ability to optimize things to the absolute last drop. They could only make some guesses about visibility, importance and so on, but they lacked direct control. It’s like riding a bike without your hands on the handlebars: yes, you can still control the weight balance and speed using your muscles, but isn’t that shaky?

PRT+(Sampler Feedback)

Time to save the day! With DirectX 12 Ultimate, developers can now get reports from the sampler, and use those reports to minimize artifacts, lag spikes and memory waste! We can finally put our hands back on the bike’s handlebars now.

Traditional PRT solutions were based on guesses.

PRT+ (PRT with sampler feedback) is based on hard facts. Because samplers are the real end consumers of texture assets, they know what they need (unlike some poor market in other areas of gaming, just kidding LOL). With SF, the streaming engine only ever streams needed assets, no waste.

However, you do need hardware support for PRT+: a modern GPU and an SSD at least. And even PRT+ can be refined and optimised. Here we finally get to the almighty

Sampler Feedback Streaming

SFS is based on PRT+, and PRT+ is based on PRT & Sampler Feedback. SFS is a complete solution for texture streaming, containing both hardware and software optimizations.

Firstly, Microsoft built caches for the Residency Map and Request Map, and records the asset requests on the fly. The difference between this method and traditional PRT methods is kinda like this: previously you had to check the map, but now you have a GPS.

Secondly, you need a fast SSD to use PRT+ and squeeze everything available out of the RAM. You won’t want to use an HDD with PRT+, because when an asset request emerges, it has to be answered fast (within milliseconds!). The SSD on Xbox is now prioritized for game asset streaming, to minimize latency to the last bit.

Thirdly, Microsoft implemented a new method for texture filtering and sharpening in hardware. This is used to smooth the loading transition from mip8 to mip4 or mip0, etc. It’s not magic, but it works like magic:

image



As we have stated, the Sampler knows what it needs. The developer can answer the request of Mip 0 by giving Mip 0.8 on frame 1, Mip 0.4 on frame 2, and eventually Mip 0 on frame 3.

The fraction part is used on texture filtering, so that the filter can work as intended and present the smoothest transition between LOD changes.

It also allows the storage system to have more time to load assets without showing artifacts.
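
To make the fractional mip idea a bit more concrete, here is a minimal sketch of how a fractional level of detail is normally interpreted by trilinear filtering: as a blend between the two neighbouring mips. The function names and the 0.8 / 0.4 / 0.0 schedule are only an illustration of the example in the writeup, not how the Xbox hardware filter is actually implemented.

```python
import math


def clamp_lod(requested_lod, min_resident_lod):
    """The sampler cannot use finer detail than what is resident,
    so it samples at the coarser (larger) of the two values."""
    return max(requested_lod, min_resident_lod)


def blend_weights(lod):
    """Trilinear-style weights: a fractional LOD blends the finer mip
    (floor) with the next coarser one."""
    finer = math.floor(lod)
    frac = lod - finer
    return {finer: 1.0 - frac, finer + 1: frac}


requested_lod = 0.0                  # the sampler really wants Mip 0
per_frame_clamp = [0.8, 0.4, 0.0]    # clamp relaxed as tiles finish loading

for frame, clamp in enumerate(per_frame_clamp, start=1):
    lod = clamp_lod(requested_lod, clamp)
    print(f"frame {frame}: sample at mip {lod:.1f}, weights {blend_weights(lod)}")
```

The point is that the picture sharpens gradually over a few frames instead of popping from the blurry mip to the sharp one.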

 

Utherellus

Member
Both of these claims are completely incorrect. Virtualized texturing or geometry is an ancient technology, just like PRT, PRT+ or later SFS. If it is supported at the HW level on the XSX GPU, it is also supported on the PS5 GPU. The API layer is software and code. It is not tied only to DX12 Ultimate. So I'm going back to the question of what benefits it will mean for the same game if the PS5 SSD IOP delivers 2-5 times more data to the frame?
SFS is proprietary technology to Microsoft and it uses a custom piece of hardware. PS5 does not have an analogue, neither HW nor SW supported. If they had, they would have talked about it. Sure, they can develop a SW accelerated solution, of course. But their focus was the SSD and I/O performance itself, not optimizing the size of the assets that need to be loaded in the first place.

Microsoft even talked about bringing it to PC. But they added that it will be software accelerated. Otherwise they would have said: "you will get HW accelerated SFS on RDNA2 gpus". But no. They custom designed it on Xbox, and it is not available on other RDNA2 configurations.

PRT was VERY limited in its adoption rate and overall performance last gen, because you simply did not have storage fast enough to handle it. Not my words. I am quoting an MS developer from the Game Stack event.

Timestamp included.

 

muteZX

Banned
SFS is proprietary technology to Microsoft and it uses a custom piece of hardware. PS5 does not have an analogue, neither HW nor SW supported. If they had, they would have talked about it. Sure, they can develop a SW accelerated solution, of course. But their focus was the SSD and I/O performance itself, not optimizing the size of the assets that need to be loaded in the first place.

Microsoft even talked about bringing it to PC. But they added that it will be software accelerated. Otherwise they would have said: "you will get HW accelerated SFS on RDNA2 gpus". But no. They custom designed it on Xbox, and it is not available on other RDNA2 configurations.

PRT was VERY limited in its adoption rate and overall performance last gen, because you simply did not have storage fast enough to handle it. Not my words. I am quoting an MS developer from the Game Stack event.

Timestamp included.




SFS is part of the DX12 Ultimate .. API for many graphics cards. PRT has been here with us for over 10 years. PS4 supports PRT. It's naive to think that SONY left it as it was 10 years ago.
 

sinnergy

Member
SFS is part of the DX12 Ultimate .. API for many graphics cards. PRT has been here with us for over 10 years. PS4 supports PRT. It's naive to think that SONY left it as it was 10 years ago.
Small adjustment: SF is, not SFS. SFS is until now exclusive to Xbox; they made console-specific adjustments to SF on Series consoles.
 
Both of these claims are completely incorrect. Virtualized texturing or geometry is an ancient technology, just like PRT, PRT+ or later SFS. If it is supported at the HW level on the XSX GPU, it is also supported on the PS5 GPU. The API layer is software and code. It is not tied only to DX12 Ultimate. So I'm going back to the question of what benefits it will mean for the same game if the PS5 SSD IOP delivers 2-5 times more data to the frame?

We don't know if the PS5 GPU supports it. It's certainly not a guarantee that it does, but let's assume that it does even if I personally doubt it. In such a scenario the PS5 is even more insanely fast compared to Series X, but with what the Series X will already be capable of, at what point are we at "fast enough"?

As I said in my 22GB texture request hypothetical.

22GB becomes 8.8GB due to SFS

8.8GB / 2.4GB/s / 2 = 1.83 seconds. An insane effective I/O rate of about 12GB/s (22GB / 1.83s).

or

14GB texture request hypothetical.

14GB becomes 5.6GB due to SFS

5.6GB / 2.4GB/s / 2 = 1.17 seconds. An awesome effective I/O rate of about 12GB/s (14GB / 1.17s).

When would such a crazy thing NOT be enough? These are 2 on-screen equivalent results that without SFS would easily exceed the RAM capacity of Series X, but are possible thanks to SFS. There is no way this won't be enough for this generation. If both consoles had way more than just 16GB of RAM, and more than 13.5GB-14GB of usable RAM, then perhaps the 2.4GB/s raw speed of the Series X SSD could become an issue, but that will never be the case for this generation of games.
 

Utherellus

Member
SFS is part of the DX12 Ultimate .. API for many graphics cards. PRT has been here with us for over 10 years. PS4 supports PRT. It's naive to think that SONY left it as it was 10 years ago.
I know that PRT has been here with us for over 10 years.

But you fail to get the point. Listen to my timestamped video. The MS developer explains that PRT was severely limited in use and performance last gen because a slow HDD just could not load out-of-memory textures for the other side of a character fast enough if you turned the camera very quickly.

What is the point of PRT if your game looks like a nightmare because the HDD can't load textures properly/fast enough?

So it was used in very limited ways.

P.S. as sinnergy said, SF is part of DX12, not SFS. It is custom to Xbox because it's hardware accelerated.
 

Lysandros

Member
This fighting over specs is a bit pathetic

both will be fast enough and produce similar results - better/worse depending on game/dev/engine.. just like the gpu and cpu difference

the main difference this gen will be exclusive games and services

- gamepass, quick resume, enhanced bc with fps boost for xbox..
- game help feature and game stream sharing on ps5
In terms of I/O throughput? No, not even close. XSX won’t magically close the very significant hardware/processing power/speed gap with fantasy theories. As to GPU/CPU performance, you are right.
 

sinnergy

Member
We don't know if the PS5 GPU supports it. It's certainly not a guarantee that it does, but let's assume that it does even if I personally doubt it. In such a scenario the PS5 is even more insanely fast compared to Series X, but with what the Series X will already be capable of, at what point are we at "fast enough"?

As I said in my 22GB texture request hypothetical.

22GB becomes 8.8GB due to SFS

8.8GB / 2.4GB/s / 2 = 1.83 seconds. An insane effective I/O rate of about 12GB/s (22GB / 1.83s).

or

14GB texture request hypothetical.

14GB becomes 5.6GB due to SFS

5.6GB / 2.4GB/s / 2 = 1.17 seconds. An awesome effective I/O rate of about 12GB/s (14GB / 1.17s).

When would such a crazy thing NOT be enough? These are 2 on-screen equivalent results that without SFS would easily exceed the RAM capacity of Series X, but are possible thanks to SFS. There is no way this won't be enough for this generation. If both consoles had way more than just 16GB of RAM, and more than 13.5GB-14GB of usable RAM, then perhaps the 2.4GB/s raw speed of the Series X SSD could become an issue, but that will never be the case for this generation of games.
On a side note, we don’t know how automatic this all is, and if a dev needs to do more work, it seems Sony’s hardware approach is a little easier. We also don’t know how this would translate to real world use and results.
 

JackMcGunns

Member
Nah, it definitely matters because it isn't just about what the data and raw speeds are like. Without utilizing the proper APIs, which make proper use of the hardware to break through any I/O bottlenecks or limitations, you won't see the PS5 doing what it can truly do in a game that isn't properly built for it. Xbox's APIs clearly make getting something extra from the new hardware easier without necessarily having to build it ground up, which is why I often don't really see those things as anything to brag about when I see it. If I'm bragging about it, I'm just having some fun on that day lol.

Again, you’re talking about processor and performance which has nothing to do with static data on a disc.

The disc stores the compressed data, the system decompresses and reads said data. The amount of compression may vary depending on the amount of video or texture work used, but we’re talking about the same content, there’s equal amount of FMV and textures for both.
 
On a side note, we don’t know how automatic this all is, and if a dev needs to do more work, it seems Sony’s hardware approach is a little easier. We also don’t know how this would translate to real world use and results.

100% correct. It does require more work to implement this, and Sony's approach is a whole lot easier than doing this. This is true, and that's a major advantage for Sony. Microsoft has to drive adoption for more devs to pick this up bigtime. The massive boost in their number of studios makes more sense in the context of a need to have stuff like this supported.

Microsoft still has to showcase games supporting this. I have a strange feeling that Halo Infinite will be using it, and was perhaps using it this whole time, but due to things not being completed engine/graphics/art wise in time for the gameplay demo it didn't quite show like how they expected. But I've always been amazed at the level of detail on many far out or seemingly insignificant things in not only Halo Infinite's older trailers, but even in some of the criticized gameplay footage and campaign trailer there are things that look pretty damn good.
 
Again, you’re talking about processor and performance which has nothing to do with static data on a disc.

The disc stores the compressed data, the system decompresses and reads said data. The amount of compression may vary depending on the amount of video or texture work used, but we’re talking about the same content, there’s equal amount of FMV and textures for both.

If the right APIs to use the SSD and I/O aren't in use then none of it would matter really. You could be getting heavy under-utilization.
 

Heisenberg007

Gold Journalism
What’s the latest on install size? Are PS5 games still smaller? What’s Mass Effect’s install size difference?
I heard that Mass Effect is smaller but it's still BC (there is no PS5 or XSX version) so I wouldn't count that either way, even if it was.

Control definitely wasn't a one-off. We saw Avengers, Resident Evil Village, Crash, MLB The Show, and a few other games that are smaller on PS5 -- ranging from ~10% to ~35%.

The latest one is Subnautica's next-gen version. If I am correct, then Subnautica is 124% bigger on XSX than it is on PS5 (~3.7 GB on PS5 vs. ~8.3 GB on XSX). For reference, the PS4 version was 14GB.
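
For reference, the percentage above can be reproduced from the quoted install sizes. A quick sketch (the helper name is made up, and the sizes are the approximate figures from the post):

```python
def percent_larger(bigger_gb, smaller_gb):
    """How much larger `bigger_gb` is than `smaller_gb`, in percent."""
    return (bigger_gb - smaller_gb) / smaller_gb * 100.0


ps5_gb, xsx_gb, ps4_gb = 3.7, 8.3, 14.0   # approximate sizes quoted above

xsx_vs_ps5 = percent_larger(xsx_gb, ps5_gb)
ps5_saving_vs_ps4 = (1.0 - ps5_gb / ps4_gb) * 100.0

print(f"XSX install is {xsx_vs_ps5:.0f}% bigger than PS5")
print(f"PS5 install is {ps5_saving_vs_ps4:.0f}% smaller than the PS4 version")
```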
 

Panajev2001a

GAF's Pleasant Genius
SFS is proprietary technology to Microsoft and it uses a custom piece of hardware.
Yes, MS added some additional instructions for non-blocking I/O and texture filters and documented the additions they made (see prior posts in the thread). They make it easier to use efficiently, but you are not leaving competing consoles in the dust; what people have an issue with is it being oversold for some reason.
PS5 does not have an analogue, neither HW nor SW supported. If they had, they would have talked about it. Sure, they can develop a SW accelerated solution, of course.
So they can and do have a SW enhanced analogue for the SF/SFS parts, as devs need to track visibility, prefetch the right data, and handle transitions between levels of detail (they do, that is how people had been doing virtual texturing before… see Rage on console or Doom 2016 as two examples out of likely a lot more). As for the "if they had, they would have talked about it" bit? It would not be the first nor the last thing they gloss over while MS screams about it (look at DualSense's improved input latency over DS4, which they never mentioned when talking about the controller update, while MS shouted about it).
But their focus was the SSD and I/O performance itself, not optimizing the size of the assets that need to be loaded in the first place.
I do not quite find proof of this statement; you are implying a brawn vs brains approach. I do not see evidence they have failed to prioritise reducing the size of the data that needs to be loaded. PRT is the core mechanism for both solutions, and we do not know if SF is part of PS5 under another name, but you do have the HW mechanism to ensure the GPU does not store more data than it strictly needs, only partial texture regions. What you are doing yourself as a developer with PRT [SFS should be easier to use] is determining what you need to load next and telling the GPU to prefetch it… and that is assuming PS5’s GPU does not include any improvements to its PRT support for texture streaming.

I have to disagree with you again.
You are contradicting Stanard with the bolded. What you're describing is traditional mip streaming.

In Stanard's own words:
"PRT is using virtual memory to keep only part of a texture loaded in physical memory."

"The rough 2.5x efficiency comes from not reading whole mips but only the texture regions of interest."


SFS is being compared to traditional texture streaming, i.e. Whole mips, to get its 2.5x efficiency. He's not comparing it against PRT.

Here is an official AMD slide showing PRT as only loading partial mips.

7yTAcFk.jpg


Here is an amazing writeup I found that explains what PRT, PRT+, and SFS are. Full link is in the spoiler at the bottom.

    • Classic
    • PRT
    • PRT+SF
    • SFS
Classic Texture Streaming

Let’s start with classic texture streaming, which is the most basic and simple one. As we’ve talked about “mipmapping”, developers now have gained a new set of assets that is at least a half smaller than the original Mip0.

So, for saving the precious memory space, developers start to find out ways to use the high level mip8s (mip8 just for example). Before classic texture streaming, everything in a game level is loaded with mip0. With classic streaming, developers can now use different mip level for different objects, with different ranges or sizes.

Partial Resident Texture or Virtual Texture

PRT is the term used by Unreal Engine, and Virtual Texture is the term used by idTech. But generally they’re the same thing.

As the time moving forward, the mip0 is now larger and larger. We’re seeing 4K and 8K textures now, that can be a huge burden for the memory when loaded in a whole.

So, what about just loading parts of them?

PRT uses the same idea as virtual memory. We don’t have to load every part of a texture into memory. We can divide the large texture into small tiles.

image



By dividing the large texture into a tile array, now we can have more fine grained control over the tiles.
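
To make the tiling concrete, here is a minimal sketch that counts the tiles in one mip level. It assumes the 64 KiB tile size used by Direct3D tiled resources and a 1-byte-per-texel format, which works out to square 256x256-texel tiles; the texture size is hypothetical, and formats whose tiles are not square are not handled.

```python
import math

TILE_BYTES = 64 * 1024  # assumed: 64 KiB tiles, as in Direct3D tiled resources


def tile_grid(width, height, bytes_per_texel=1):
    """Return (tiles_x, tiles_y) for one mip level split into square tiles."""
    texels_per_tile = TILE_BYTES // bytes_per_texel
    tile_side = math.isqrt(texels_per_tile)  # 256 texels for 1 byte per texel
    return math.ceil(width / tile_side), math.ceil(height / tile_side)


tiles_x, tiles_y = tile_grid(4096, 4096)  # hypothetical 4K texture, mip0
print(tiles_x, tiles_y, tiles_x * tiles_y)
```

Each of those tiles can then be loaded or evicted on its own, which is what gives the finer-grained control described above.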

Different parts of the texture can come from different mips: some from Mip0, and some from Mip3 or so.

The MinMip map above shows an 8x8 area requesting different mip levels.

In this particular example, every tile has the same memory size. A single Mip1 tile covers the area of (2^1)^2=4 Mip0 tiles, so it is 4 times less detailed while costing the same memory as one Mip0 tile. Likewise, a Mip2 tile covers (2^2)^2=4^2=16 Mip0 tiles and is 16 times less detailed. And a single Mip3 tile can cover the whole area of 64 all by itself. Awesome, right? But the quality is extremely poor, so we can only use it on the most insignificant parts.

Before PRT, we needed 64 units of tile memory to cover that 8x8 area. With PRT, we can now use 1+3+3+1=8 units to cover that area. Assuming the mipmap is efficient, that’s a huge saving, isn’t it?
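
The tile arithmetic above can be sketched in a few lines. The MinMip image is not available here, so the way "1+3+3+1" is split across mip levels below is an assumed reading, not something the write-up states.

```python
def mip0_tiles_covered(mip_level):
    """Number of Mip0-sized tiles spanned by one tile of the given mip level."""
    return (2 ** mip_level) ** 2


# Assumed reading of "1+3+3+1": tiles kept resident per mip level.
resident_tiles = {0: 1, 1: 3, 2: 3, 3: 1}

full_res_tiles = 8 * 8  # every cell of the 8x8 area held at Mip0
mixed_tiles = sum(resident_tiles.values())

for level in resident_tiles:
    print(f"one Mip{level} tile spans {mip0_tiles_covered(level)} Mip0 tiles")
print(f"{full_res_tiles} tiles -> {mixed_tiles} tiles, "
      f"{full_res_tiles / mixed_tiles:.0f}x less memory")
```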

Well, that’s where the things get tricky: How to make sure the mipmap is efficient?


Before Sampler Feedback, developers lacked the ability to optimize things to the very last drop. They could only make guesses about visibility, importance and so on, without direct control. It’s like riding a bike without your hands on the handlebars: yes, you can still control the balance and speed with your body, but isn’t that shaky?

PRT+(Sampler Feedback)

Time to save the day! With DirectX 12 Ultimate, developers can now get reports from the sampler, and use those reports to minimize artifacts, lag spikes and memory waste! We can finally put our hands back on the bike’s handlebars now.

Traditional PRT solutions were based on guesses; PRT+ (PRT with sampler feedback) is based on hard facts. Because samplers are the real end consumers of texture assets, they know what they need (unlike some poor market in other areas of gaming, just kidding LOL). With SF, the streaming engine only ever streams the assets that are needed, with no waste.
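
A conceptual sketch of that idea: compare what the sampler reported against what is currently resident, and stream only the difference. This is plain Python for illustration, not the Direct3D 12 API; the function name and both maps are hypothetical.

```python
def tiles_to_stream(residency_map, request_map):
    """Return {(x, y): wanted_mip} for tiles where the sampler asked for a
    finer mip (smaller number) than the one currently resident."""
    requests = {}
    for y, row in enumerate(request_map):
        for x, wanted in enumerate(row):
            if wanted < residency_map[y][x]:
                requests[(x, y)] = wanted
    return requests


# Hypothetical 2x2 tile region: finest mip currently resident per tile...
residency = [[3, 3],
             [3, 1]]
# ...and the finest mip the sampler actually touched last frame.
feedback = [[3, 0],
            [2, 1]]

print(tiles_to_stream(residency, feedback))
```

Only two of the four tiles generate a load request here; the other two already hold what the sampler asked for.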

However, you do need hardware support for PRT+: a modern GPU and an SSD at least. And even PRT+ can be refined and optimised. Here we finally get to the almighty

Sampler Feedback Streaming

SFS is based on PRT+, and PRT+ is based on PRT & Sampler Feedback. SFS is a complete solution for texture streaming, containing both hardware and software optimizations.

Firstly, Microsoft built caches for the Residency Map and Request Map, which record the asset requests on the fly. The difference between this method and traditional PRT methods is kinda like this: previously you had to check the map, but now you have a GPS.

Secondly, you need a fast SSD to use PRT+ and squeeze everything available out of the RAM. You won’t want to use an HDD with PRT+, because when an asset request comes in, it has to be answered fast (within milliseconds!). The SSD on Xbox is now prioritized for game asset streaming, to minimize latency to the last bit.

Thirdly, Microsoft implemented a new method for texture filtering and sharpening on hardware. This is used to smooth the loading transition from mip8 to mip4 or mip0…etc. It’s not magic, but it works like magic:

image



As we have stated, the Sampler knows what it needs. The developer can answer the request of Mip 0 by giving Mip 0.8 on frame 1, Mip 0.4 on frame 2, and eventually Mip 0 on frame 3.

The fractional part is used in texture filtering, so that the filter can work as intended and present the smoothest transition between LOD changes.

It also allows the storage system to have more time to load assets without showing artifacts.
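
One way to picture the fractional mip values is as a blend weight between two neighbouring mip levels, much like trilinear filtering. The sketch below assumes exactly that and uses single numbers as stand-ins for texels; it is an illustration of the concept, not how the Xbox hardware filter is implemented.

```python
def blend_mips(fine_texel, coarse_texel, fraction):
    """Linearly blend a texel from the finer mip with one from the coarser mip.
    fraction=0 gives the fine mip only, fraction=1 the coarse mip only."""
    return (1.0 - fraction) * fine_texel + fraction * coarse_texel


# Mip values handed out over three frames, as in the example above.
clamp_per_frame = [0.8, 0.4, 0.0]

fine, coarse = 1.0, 0.0  # hypothetical texel values in Mip0 and Mip1
for frame, clamp in enumerate(clamp_per_frame, start=1):
    print(f"frame {frame}: mip {clamp:.1f} -> texel {blend_mips(fine, coarse, clamp):.1f}")
```

The texel moves toward the Mip0 value in steps instead of snapping to it in a single frame, which is the pop-in the filter is meant to hide.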


Thanks for this recap, not sure how many times it has to be explained again where the 2.5-3x multiplier comes from (the PRT part of SFS)… :).
 

Boglin

Member
We don't know if the PS5 GPU supports it. It's certainly not a guarantee, but let's assume that it does, even if I personally doubt it. In that scenario the PS5 is even more insanely faster than Series X, but with what the Series X will already be capable of, at what point do we reach "fast enough"?

As I said in my 22GB texture request hypothetical.

22GB becomes 8.8GB due to SFS

8.8 / 2.4GB/s / 2 = 1.83 seconds. An insane effective I/O rate of 22GB/sec.

or

14GB texture request hypothetical.

14GB becomes 5.6GB due to SFS

5.6GB / 2.4GB/s / 2 = 1.16 seconds. An awesome effective I/O rate of 14GB/s

When would such a crazy thing NOT be enough? These are two on-screen results that, without SFS, would easily exceed the RAM capacity of Series X, but are possible thanks to SFS. There is no way this will not be enough for this generation. If both consoles had way more than 16GB of RAM, and more than 13.5GB-14GB of usable RAM, then perhaps the 2.4GB/s raw speed of the Series X SSD could become an issue, but that will never be the case for this generation of games.
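
The arithmetic in the two hypotheticals can be checked with a short sketch using only the post's own inputs (2.4GB/s raw, 2x compression, 2.5x SFS). Under those assumptions the effective rate comes out at about 12GB/s for any request size (2.4 x 2 x 2.5); the 22GB and 14GB figures are the amounts of texture data represented, delivered in roughly 1.83 and 1.17 seconds.

```python
RAW_GBPS = 2.4          # Series X raw SSD rate quoted in the thread
COMPRESSION_RATIO = 2   # the "/ 2" in the post (2.4 -> 4.8 GB/s)
SFS_MULTIPLIER = 2.5    # Microsoft's rough SFS efficiency figure


def load_estimate(request_gb):
    """Return (GB actually read, seconds, effective GB/s) for a texture request."""
    read_gb = request_gb / SFS_MULTIPLIER
    seconds = read_gb / RAW_GBPS / COMPRESSION_RATIO
    return read_gb, seconds, request_gb / seconds


for request in (22, 14):
    read_gb, seconds, effective = load_estimate(request)
    print(f"{request}GB request: read {read_gb:.1f}GB in {seconds:.2f}s, "
          f"{effective:.1f}GB/s effective")
```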
Judging from everything I've read, I don't think the PS5 has any sort of hardware support for Sampler feedback which should make PRT easier to implement on the Series X/S. I'm also very confident the PS5 does not contain hardware filters for smoothing the transition from one mip level to another so you might see some pop in on playstation despite the faster I/O.

In my opinion, even when SFS for XSX and PRT for PS5 start being used to increase their I/O throughput even further, the difference between the consoles will show up just as it did in RE8 in third party games on average. Better initial loading times for one and better resolution/FPS for the other.

Outside of the loading extravaganza that is R&C, I think if you want to see where the fast I/O will be most beneficial for far more typically designed games then you should start looking at the bandwidth in milliseconds rather than in seconds. Small repeating assets can really benefit from these huge numbers. The theoretical 12GB/s for Xbox by utilizing compression and SFS is nearly 200MB per frame at 60FPS. That's nuts!
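
To put the per-frame view into numbers, here is a minimal sketch that converts a throughput figure into a per-frame budget. It uses the 12GB/s theoretical figure from the post and decimal units (1 GB = 1000 MB); the frame rates other than 60 are just extra illustration.

```python
def per_frame_mb(gb_per_second, fps, mb_per_gb=1000):
    """Data that can arrive during one frame, in MB."""
    return gb_per_second * mb_per_gb / fps


def frame_time_ms(fps):
    """Length of one frame in milliseconds."""
    return 1000 / fps


for fps in (30, 60, 120):
    print(f"{fps} FPS: {frame_time_ms(fps):.1f} ms per frame, "
          f"{per_frame_mb(12, fps):.0f} MB per frame at 12GB/s")
```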

For instance, in Halo you carry two weapons and you don't necessarily have to keep them both in memory anymore because you can stream their assets in so fast in real time.
 

MonarchJT

Banned
Nonsense, he says the exact opposite. He said the drives are too slow to have any kind of diminishing return.
You should read the whole convo










and honestly what he says is really simple to understand as a concept. Anyway latency is at least as important as speed but I don't think the two consoles differ that much
 

Panajev2001a

GAF's Pleasant Genius
You should read the whole convo










and honestly what he says is really simple to understand as a concept. Anyway latency is at least as important as speed but I don't think the two consoles differ that much


He is an enthusiast (by his own admission) not sure why he is a Carmack level source now, but anyways he is wrong and/or is not supporting some of his statements you are using as evidence.

For example:
“The SSD speed can't help you when you are limited by file size. For example, Spider-Man was file size limited for streaming in assets and was not HDD speed limited.”
This is something Sony addressed both in a PS5 Spider-Man tech demo AND in their Spider-Man GDC talks (not to mention their game was a lot bigger than needed due to asset duplication)… and disk speed (as well as available RAM, of course) being a limiting factor in open world streaming engines (dictating the max movement speed of players) is something nobody has ever treated as debatable (it is common sense too).
 

MonarchJT

Banned
He is an enthusiast (by his own admission) not sure why he is a Carmack level source now, but anyways he is wrong and/or is not supporting some of his statements you are using as evidence.

For example:

This is something Sony addressed both in a PS5 Spider-Man tech demo AND in their Spider-Man GDC talks (not to mention their game was a lot bigger than needed due to asset duplication)… and disk speed (as well as available RAM, of course) being a limiting factor in open world streaming engines (dictating the max movement speed of players) is something nobody has ever treated as debatable (it is common sense too).
i know where you going...Pana and i gave you at least 3 or 4 practical example (that you ignored) of how file size is the limit and not the SSD / io speed .......you can't have games long 10 hours with a size of 47gb and stream 22gb/s....cmon now ...lol the stream will be even lower 500 mb/s
 

muteZX

Banned
Small adjustment : SF is, not SFS, SFS is until now exclusive for Xbox, they made console specific adjustments for SF on Series consoles.

SFS is SFS, "nothing" special about it ..

DirectX-Specs

Sampler Feedback

About this document

This document describes a Direct3D 12 runtime feature.

Overview

Sampler Feedback is a Direct3D feature for capturing and recording texture sampling information and locations. Without sampler feedback, these details would be opaque to the developer.

Motivation

Sampler feedback is one feature with two distinct usage scenarios: streaming and texture-space shading.
 

MonarchJT

Banned
SFS is SFS, "nothing" special about it ..


You're talking about SF; he was talking about SFS... and yes, compared to SF it is a big, big thing.
 

Panajev2001a

GAF's Pleasant Genius
Judging from everything I've read, I don't think the PS5 has any sort of hardware support for Sampler feedback which should make PRT easier to implement on the Series X/S. I'm also very confident the PS5 does not contain hardware filters for smoothing the transition from one mip level to another so you might see some pop in on playstation despite the faster I/O.

In my opinion, even when SFS for XSX and PRT for PS5 start being used to increase their I/O throughput even further, the difference between the consoles will show up just as it did in RE8 in third party games on average. Better initial loading times for one and better resolution/FPS for the other.

Outside of the loading extravaganza that is R&C, I think if you want to see where the fast I/O will be most beneficial for far more typically designed games then you should start looking at the bandwidth in milliseconds rather than in seconds. Small repeating assets can really benefit from these huge numbers.
Both consoles have a lot to give for sure and no, we do not have proof PS5 has anything beyond basic PRT although it is a bit unlikely (SFS should make it easier to implement efficient texture streaming without an even small shader cost handling the texture pop-in case or handling texture data for prefetching calculations… interesting that people are saying SFS has yet to be used in a production title more than 6 months after launch then).

The theoretical 12GB/s for Xbox by utilizing compression and SFS is nearly 200MB per frame at 60FPS. That's nuts!

For instance, in Halo you carry two weapons and you don't necessarily have to keep them both in memory anymore because you can stream their assets in so fast in real time.

Both have an insane theoretical per frame bandwidth (200-307.2+ MB per frame on XSX [307.2 MB assumes you are saturating the BCPack decoder and a 3x multiplier for SFS instead of 2.5x]… on PS5 we are talking about 341-600 MB if you take into account Oodle Texture + Kraken and inch a bit closer to maxing the Kraken unit decompression rate) indeed.
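
The 307.2 MB figure appears to count 1024 MB per GB. A sketch of one combination that reproduces it, assuming a 6 GB/s peak output for the BCPack decoder (a figure not stated in this thread) together with the 3x SFS multiplier from the post:

```python
def per_frame_mib(gb_per_second, fps=60):
    """Per-frame data budget in MB, counting 1024 MB per GB."""
    return gb_per_second * 1024 / fps


BCPACK_PEAK_GBPS = 6.0  # assumption: decoder peak output, not given in the thread
SFS_MULTIPLIER = 3.0    # the 3x case from the post

effective_gbps = BCPACK_PEAK_GBPS * SFS_MULTIPLIER
print(f"{effective_gbps:.0f} GB/s effective -> "
      f"{per_frame_mib(effective_gbps):.1f} MB per frame")
```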

As you and others pointed out before, the key is games being written and data authored to allow these SSDs to flex their muscles and load lots of small assets very quickly and with a very low latency too… and with as low CPU overhead as possible.
This is another area where I think the silicon they spent on PS5’s custom SSD controller and custom I/O complex will come in handy (considering some of the cuts they made to the Zen 2 cores and the lower peak frequency: 3.5 GHz with SMT vs 3.6 GHz with SMT and 3.8 GHz without)… and it had better, considering it took transistors away from many other components ;).
 

Heisenberg007

Gold Journalism
i know where you going...Pana and i gave you at least 3 or 4 practical example (that you ignored) of how file size is the limit and not the SSD / io speed .......you can't have games long 10 hours with a size of 47gb and stream 22gb/s....cmon now ...lol the stream will be even lower 500 mb/s
Demon's Souls is a 24-hour long game, has a 66 GB file, and streamed data at 4 GB/s.
 

Three

Member
You should read the whole convo

and honestly what he says is really simple to understand as a concept. Anyway latency is at least as important as speed but I don't think the two consoles differ that much
These are two different things and I'm not sure you've understood what was being said.

One is talking about saturating the SSD bandwidth completely, whereby this game would not be possible on a slower drive.

This means that few if any games will be streaming in 22GB of uncompressed data in a second. If a game is 32GB, you're talking about streaming in 70% of the game in a second. No game is like this; you will never experience 70% of the entire game in a second. Games are size limited.

However, this doesn't mean that the PS5 drive isn't twice as fast at streaming in whatever is required. For example, say you had to stream in 500MB as you traverse the world on a PS5. On an Xbox SSD the game can either limit the max speed of the player or lower the asset quality while moving to maintain the same speed. If you were doing 500MB for a given speed on PS5, your asset budget would only be 250MB (i.e. lower resolution) on an Xbox SSD.
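
The trade-off in that example can be sketched as simple scaling by the bandwidth ratio. The function names and the 2:1 rates below are illustrative assumptions chosen to match the "twice as fast" in the post, not measured console figures.

```python
def asset_budget_mb(reference_budget_mb, reference_gbps, target_gbps):
    """Scale a streaming budget by the bandwidth ratio, keeping player speed fixed."""
    return reference_budget_mb * target_gbps / reference_gbps


def max_speed_scale(reference_gbps, target_gbps):
    """Alternative: keep asset quality and scale the maximum traversal speed."""
    return target_gbps / reference_gbps


# Assumed rates: a drive half as fast as the reference, matching the 2x in the post.
print(asset_budget_mb(500, reference_gbps=2.0, target_gbps=1.0))  # 250.0
print(max_speed_scale(reference_gbps=2.0, target_gbps=1.0))       # 0.5
```

Either the data per stretch of world drops by half or the player moves at half the speed; techniques that cut how much data is needed in the first place change the inputs rather than this relationship.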

That is what is being referred to here



 

Heisenberg007

Gold Journalism
These are two different things and I'm not sure you've understood what was being said.

One is talking about saturating the SSD bandwidth completely whereby this game would not be possible in a slower drive.

This means that little to no games will be streaming in 22GB of uncompressed data in a second. If a game is 32GB you're talking about a game where you've streamed in 70% of the game in a second. No game is like this, you will never experience 70% of the entire game in a second. Games are size limited.

However this doesn't mean that the speed of the PS5 drive isn't twice as fast for streaming in whatever is required. For example if you had to stream in say 500mb as you traverse the world on a PS5. The game can either limit the max speed of the player or lower the asset quality while moving to maintain the same speed on an Xbox SSD. If you were doing say 500mb for a given speed your asset quality would only be 250mb (ie lower resolution) on an xbox SSD.
This is literally all there is to it.

Everyone is just complicating stuff for the sake of complicating it. MS has given us the numbers: 2.4 GB/s raw and 4.8 GB/s compressed after using Velocity Architecture (which includes SFS). People here are unnecessarily complicating everything by taking the 4.8 GB/s compressed speed and then applying the SFS and other multipliers on top of it.
 

Heisenberg007

Gold Journalism
The TL;DR of all this is that the 3rd party games, where you can actually make 1:1 comparisons between consoles, will be limited by whatever PC does anyway, so all of this kinda doesn't matter?

Yeah, sort of. The lowest common denominator, whatever it may be (PS5, XSX, XSS, PC, last gen console), will be a factor.

My hope is that games built on a mature UE5 will be able to bypass some of the restrictions with virtualized geometry and by scaling asset quality and LODs up and down, based on available I/O.
 

Panajev2001a

GAF's Pleasant Genius
i know where you going...Pana and i gave you at least 3 or 4 practical example (that you ignored) of how file size is the limit and not the SSD / io speed .......you can't have games long 10 hours with a size of 47gb and stream 22gb/s....cmon now ...lol the stream will be even lower 500 mb/s
First of all, his example was about Spider-Man and general open world asset streaming (if you cannot tie the speed of movement to the speed at which data is fetched from the disk, well 🤷‍♂️). Second of all, you made several straw man arguments, and I already said why I disagreed with them and especially with how you framed them.
Should we start again from 22 GB/s being an edge case or how we are not talking about streaming completely unique data all of the time vs reuse/dedicating more RAM to what is on screen, reducing buffers depth, and thus streaming the same blocks of data over and over or how it is not how many GB you move per second but how many ms you need to move a specific piece of data that is important?

You are repeating them over and over regardless of what people reply back and using the fact you are repeating it over and over as evidence you were right to begin with…. If people stop replying == they admitted defeat eh ;)?
 
This is literally all there is to it.

Everyone is just complicating stuff for the sake of complicating it. MS has given us the numbers: 2.4 GB/s raw and 4.8 GB/s compressed after using Velocity Architecture (which includes SFS). People here are unnecessarily complicating everything by taking the 4.8 GB/s compressed speed and then applying the SFS and other multipliers on top of it.
"Includes SFS" seems to oversimplify it. The entire point of SFS is to reduce the amount of assets you have to stream in the first place, so it has nothing to do with bandwidth throughput. Or to stick with Three's example, the PS5 is pushing 500 MB, the XSX is pushing 250 MB, but ideally both have the same asset quality because of SFS.
 

Panajev2001a

GAF's Pleasant Genius
The TL;DR of all this is that the 3rd party games, where you can actually make 1:1 comparisons between consoles, will be limited by whatever PC does anyway, so all of this kinda doesn't matter?


DirectStorage is coming to PCs, and modern GPUs need to spend those extra TFLOPS somewhere, so why not on decompressing data, right ;)?

Gaming PCs with 16-24 GB or more of main RAM and 6-8 GB or more of VRAM already have more RAM than they need, or will use it to bear a much longer initial load and then handle the rest of the game as if they had a similarly crazy fast SSD.

As next generation consoles sell and take a bigger and bigger slice of the revenue pie, PC games will see their minimum and recommended requirements going up, as they tend to do with each console generation.
 

Panajev2001a

GAF's Pleasant Genius
"Includes SFS" seems to oversimplify it. The entire point of SFS is to reduce the amount of assets you have to stream in the first place, so it has nothing to do with bandwidth throughput. Or to stick with Three's example, the PS5 is pushing 500 MB, the XSX is pushing 250 MB, but ideally both have the same asset quality because of SFS.
PRT reduces the amount of assets (portions vs entire texture) you have to stream, SFS is about the HW doing work for you to help you understand what you need to load next and get it loaded there. So no, SFS does not buy you 2-3x bandwidth over PS5.
pogkJrL.jpg
 

Boglin

Member
Both consoles have a lot to give for sure and no, we do not have proof PS5 has anything beyond basic PRT although it is a bit unlikely (SFS should make it easier to implement efficient texture streaming without an even small shader cost handling the texture pop-in case or handling texture data for prefetching calculations… interesting that people are saying SFS has yet to be used in a production title more than 6 months after launch then).
I attribute cross-gen stuff to be holding things back a bit but you're right, I shouldn't be asserting it isn't being used because I really don't know. I'm curious if R&C is using PRT.

Both have an insane theoretical per frame bandwidth (200-307.2+ MB per frame on XSX [307.2 MB assumes you are saturating the BCPack decoder and a 3x multiplier for SFS instead of 2.5x]… on PS5 we are talking about 341-600 MB if you take into account Oodle Texture + Kraken and inch a bit closer to maxing the Kraken unit decompression rate) indeed.

As you and others pointed out before, the key is games being written and data authored to allow these SSDs to flex their muscles and load lots of small assets very quickly and with a very low latency too… and with as low CPU overhead as possible.
This is another area where I think the silicon they spent on PS5’s custom SSD controller and custom I/O complex will come in handy (considering some of the cuts they made to the Zen 2 cores and the lower peak frequency: 3.5 GHz with SMT vs 3.6 GHz with SMT and 3.8 GHz without)… and it had better, considering it took transistors away from many other components ;).
I'm really happy the companies diverged a bit because I love seeing unique hardware. Prior to launch I was trying to advocate that Sony wouldn't spend money on r&d and give up valuable die space for nothing. They also wouldn't have developed an SSD that offers less storage than an off the shelf counterpart unless the speed was worth it.

My argument back then was "Sony would not actively hurt themselves by compromising graphics with giving up CUs for I/O and nearly 200GB of storage space for a custom SSD if the only benefit was 1 or 2 seconds of reduced load times." The response I got at the time gave me my only experience dealing with a 100% unreasonable person on this site, imo. Shout out to MrFunSocks.
 