
Next-Gen PS5 & XSX |OT| Console tEch threaD


Mr Moose

Member
Has there been any game with 3D audio on Xbox though? It seems devs are always mentioning PS5's 3D audio but nothing on the other side (usually still using the old 7.1-channel setup).

I think it'll come to PS Plus within 1-2 months, so I'll ignore the game as I'm not really enthusiastic about it.
I think Cold War supports 3D audio, and erm... There was another that I remembered for a second but my mind has gone blank.
Edit: Resident Evil Village and Outriders, maybe?
 
Last edited:

Bo_Hazem

Banned
I think Cold War supports 3D audio, and erm... There was another that I remembered for a second but my mind has gone blank.

Actually, even with games that have no native support, the PS5 does a great job of remixing the 7.1-channel output into 3D audio, like Genshin (the PS4 BC version, not sure about the new one). If you turn 3D audio off, the 7.1-channel limitation is pretty obvious to hear, and it's a really steep downgrade once you get used to smooth, dynamic 3D audio.
 
Last edited:

Godfavor

Member
I guess regardless of what they may call it, Xbox Series X was, according to Microsoft, strongly designed around it as one of the premier features of the console. The GPU even has customizations unique to Series X with the feature in mind according to a Microsoft graphics engineer.












[attached image: dwho6Lh.jpg]

As I understand it, while using SFS and the SSD as extended RAM, the I/O will load assets in real time according to the player's actions. SFS will do the guesswork of which mips to load next, as it will predict how the player will move (DirectML?).

If the player does something out of the ordinary and the SSD fails to load the streamed asset in time, the algorithm will use a temporary mip in its place (guessing from the nearest-neighbour pixels?) until the higher-quality mip loads, making the transition smoother (rather than showing ugly lower-quality textures waiting to be replaced).
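Roughly, the flow being described might look like this toy sketch (my own illustration in Python, not the actual Xbox/DirectX API; names and timings are made up): keep sampling the best mip already in memory, kick off an async load for what the sampler actually asked for, and swap it in when it arrives.

```python
import asyncio

class StreamedTexture:
    """Toy model: only some mip levels are resident; the rest live on the SSD."""

    def __init__(self, mip_count):
        self.mip_count = mip_count
        self.resident = {mip_count - 1}   # only the coarsest mip is always in memory
        self.in_flight = set()
        self.tasks = []                   # keep references to pending loads

    def best_resident_mip(self, wanted_mip):
        # Fallback: the finest mip we already have that is no finer than requested.
        return min(m for m in self.resident if m >= wanted_mip)

    async def load_mip(self, mip, ssd_latency_s=0.005):
        await asyncio.sleep(ssd_latency_s)    # stand-in for the SSD read
        self.in_flight.discard(mip)
        self.resident.add(mip)

    def sample(self, wanted_mip):
        # The sampler asked for `wanted_mip`; request it if it isn't already coming.
        if wanted_mip not in self.resident and wanted_mip not in self.in_flight:
            self.in_flight.add(wanted_mip)
            self.tasks.append(asyncio.create_task(self.load_mip(wanted_mip)))
        # Render this frame with whatever is resident; later frames get the real mip.
        return self.best_resident_mip(wanted_mip)

async def demo():
    tex = StreamedTexture(mip_count=10)
    for frame in range(4):
        used = tex.sample(wanted_mip=2)          # feedback says the view needs mip 2
        print(f"frame {frame}: wanted mip 2, sampled mip {used}")
        await asyncio.sleep(1 / 60)              # one 60 fps frame

asyncio.run(demo())
```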

So much for the claim that MS has nothing custom to mitigate streaming issues. At least they explain how it works.
 
Last edited:

Mr Moose

Member
Actually, even with games that have no native support, the PS5 does a great job of remixing the 7.1-channel output into 3D audio, like Genshin (the PS4 BC version, not sure about the new one). If you turn 3D audio off, the 7.1-channel limitation is pretty obvious to hear, and it's a really steep downgrade once you get used to smooth, dynamic 3D audio.
I haven't tried the PS5 version with headphones yet; I'll give it a try tomorrow if I remember when I'm doing my daily missions.
 

Krizalidx11

Banned
When will consoles have the same IQ as PC? I'm not talking about the console sending a 4K signal to the monitor; PC games on monitors look so sharp and clear, while consoles always look a bit blurry in comparison.
 

skit_data

Member
Probably doing his daily "googling my own name to bask in the glory" that he naturally should be doing.
He seems like such a quiet and nice guy; sometimes I feel really bad for him. I imagine him scrolling through memes of himself doing the Cerny stare while doing all kinds of hideous things to Xbox representatives. And vice versa, of course. Having that stuff plastered all over every extreme fanboy's Twitter must take its toll on such a soft-spoken man as Cerny.

Or maybe not. It might be a cover. We saw a glimpse of his true persona in that PS4 reveal.
[attached image: ugJ1Q6x.jpg]
 
Last edited:

IntentionalPun

Ask me about my wife's perfect butthole
I wonder why MS uses the term SFS and not PRT+, as they are the same thing. Even in their own explanation page, the terminology for SFS can be named PRT+. PRT+ was known for years as partial texture rendering.

I don't think "PRT+" is some thing that's been around for years; PRT has, but not "PRT+".

At least as far as I can tell from googling: literally the only references to it are in the MS documentation and people discussing it on forums.
 

Jacir

Member
Sounds more like foveated rendering to me. From a comment I read elsewhere:

For those not hip on VR tech, foveated rendering is an inevitable holy grail that will make it so that VR is an order of magnitude easier to run than games on traditional displays. The human eye can only resolve an area about the size of your thumbnail at arm's length in full clarity. You can try just looking around you while paying attention to how much of your vision is actually in focus, or you can use this https://www.shadertoy.com/view/4dsXzM for a pretty instant example.

What this means is that it is unnecessary to render 95% of the image in full resolution because your eyes cannot resolve those details anyway. So now you have an image that is perfectly sharp in the 5% you are rendering at full res, but as soon as you move your pupils off center then it's going to be very noticeable. To combat this in HMDs like the Quest while also getting some perf benefit they essentially match the full res area with the sharpest center part of the lens and then gradually reduce res towards the edges where you're getting some optical blur/distortion anyway. This static style is called "fixed foveated rendering" or "lens matched shading" and it does improve performance but can still be noticeable and leaves a boatload of potential rendering savings on the table.

Now enter eye tracking that is fast and accurate enough to keep up with eye movements and you can render only the part of the display you are looking at in a specific moment in full resolution while going as far as filling in the rest with an ML inferred reconstruction based off of a sparse cloud. Once this gets fully worked out and solved, you would get an absolutely gamechanging rendering cost reduction (20x reduction according to FB's Michael Abrash and that's not out of line with other experts) while being indistinguishable from rendering the whole damn thing in full res at once. Gabe Newell a couple years back talked about how he believes we will hit a point where VR HMDs leapfrog traditional displays and anyone that wants to see the highest graphical fidelity will need to put one on their face, this is what he was talking about and it is going to happen eventually.
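To put rough numbers on that claim (my own quick arithmetic, reusing the comment's "5% at full res" figure; the periphery densities are just guesses):

```python
# Shading cost relative to rendering the whole frame at full resolution.
FULL_RES_FRACTION = 0.05                      # the ~5% of the display the fovea covers

for periphery_density in (1/8, 1/16, 1/64):   # 1/64 ~ sparse samples + ML fill-in
    cost = FULL_RES_FRACTION + (1 - FULL_RES_FRACTION) * periphery_density
    print(f"periphery at 1/{round(1/periphery_density)} density: "
          f"{cost:.1%} of full-res work (~{1/cost:.0f}x saving)")
```

Getting anywhere near that ~20x figure clearly depends on how aggressively (and how invisibly) the periphery can be reconstructed, which is exactly why the eye tracking has to be so fast and accurate.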
I never had perfect eyesight, but I'm so glad I'm near-sighted when it comes to this situation.
 
Last edited:

Boglin

Member
No need to be unclear, just watch the latest Microsoft Game Stack demonstration of it in action versus a highly optimized (even unrealistically so) gen-9 console texture streaming system with a fast SSD but without Sampler Feedback. Pay attention to when they make clear that the advantages do not change with more visually complex games. The percentage improvement over a texture streaming system without Xbox's Sampler Feedback Streaming remains the same.
I did watch that video and watched it again to see if I missed something but the ambiguity is still there about how much SFS is an improvement over PRT.

The demo shows an HDD using traditional texture streaming as an example and describes traditional streaming as loading whole mips based on what objects are visible and what size they are on screen, i.e. not PRT. The narrator then says "We can estimate what an optimized generation 9 traditional texture streaming engine would need to have in memory to render what you're seeing right now", which is again describing a non-PRT solution in conjunction with SSD. You can verify this because along with the voice over, it's captioned at the bottom of the screen during this portion of the video that traditional texture streaming speeds up when using a solid state drive.

Please correct me if I'm wrong, but at no point in the video was SFS compared to other versions of PRT. I am still left unclear as to how much of a multiplier improvement SFS is compared to PRT.
 

IntentionalPun

Ask me about my wife's perfect butthole
I did watch that video and watched it again to see if I missed something but the ambiguity is still there about how much SFS is an improvement over PRT.

The demo shows an HDD using traditional texture streaming as an example and describes traditional streaming as loading whole mips based on what objects are visible and what size they are on screen, i.e. not PRT. The narrator then says "We can estimate what an optimized generation 9 traditional texture streaming engine would need to have in memory to render what you're seeing right now", which is again describing a non-PRT solution in conjunction with SSD. You can verify this because along with the voice over, it's captioned at the bottom of the screen during this portion of the video that traditional texture streaming speeds up when using a solid state drive.

Please correct me if I'm wrong, but at no point in the video was SFS compared to other versions of PRT. I am still left unclear as to how much of a multiplier improvement SFS is compared to PRT.

In looking at the video.. it sounds like they are describing PRT?

All texture streaming involves loading partial textures.. traditional streaming guesses what will be visible and then over-streams to compensate for lack of ability to quickly resolve incorrect calculations... as you said " based on what objects are visible", that IS PRT from what I can tell.

All texture streaming involves partial textures.. "traditional" ones just aren't as fine-tuned and have to load far more than what is actually visible.

I don't think they ever say it's the "whole mips"? The whole mip would be the entire texture w/o any guess at what wouldn't be visible.

edit: to be clear, my question marks are there because I don't really know lol
 
Last edited:

Allandor

Member


I'm sorry you had to see this, SlimySnake.

As expected. >150m user base (PS4 + XBO together) vs <<20m (currently). As long as the new consoles haven't sold in the region of 50m (together), EA, Ubi, ... will definitely make multiplatform games for the last gen, plus updated versions for the new gen. Maybe next year this might change.
The current gen is just too young to get next-gen-only titles from the big publishers.
 

Boglin

Member
In looking at the video.. it sounds like they are describing PRT?

All texture streaming involves loading partial textures.. traditional streaming guesses what will be visible and then over-streams to compensate for lack of ability to quickly resolve incorrect calculations... as you said " based on what objects are visible", that IS PRT from what I can tell.

All texture streaming involves partial textures.. "traditional" ones just aren't as fine-tuned and have to load far more than what is actually visible.

I don't think they ever say it's the "whole mips"? The whole mip would be the entire texture w/o any guess at what wouldn't be visible.
It says in the captions that "traditional texture streaming loads entire texture detail levels at once." Perhaps I'm entirely misunderstanding what partial resident textures are because I understood PRT as streaming relevant tiles from a portion of a texture, rather than the whole texture itself.
 

IntentionalPun

Ask me about my wife's perfect butthole
It says in the captions that "traditional texture streaming loads entire texture detail levels at once." Perhaps I'm entirely misunderstanding what partial resident textures are because I understood PRT as streaming relevant tiles from a portion of a texture, rather than the whole texture itself.

I think you are right.

If you go back a minute or so from the start of the SFS demo; MS basically claims that PRT is barely used today though.. so I dunno, maybe we are all missing something lol

Logically I can't imagine how a 5400RPM drive would do what PRT claims to do. On PC isn't this done by loading a ton in main RAM, and then pulling things into GPU memory using PRT?

How is a console going to pull anything dynamically from a 5400RPM drive as someone turns their head for instance?

I dunno.. FARRR from an expert.. just an idiot doing reading lol

But I always thought console data streaming was about loading things way off screen ahead of time.. not loading and unloading small parts of textures on the fly.

I know both consoles technically had support for their form of PRT... MS is basically saying it was barely used (tiled resources) right in this presentation.
 
Last edited:

IntentionalPun

Ask me about my wife's perfect butthole
For instance, this article is basically saying PRT was too far ahead of its time due to I/O constraints, and was barely used as well:


But this all begs the question.. if PRT wasn't really used last gen.. and SFS is just a better way to do PRT...

This likely isn't a multiplier on what PS5 would be capable of, as it would also be "more capable" of PRT / megatextures than "gen 9."

So MS is comparing to typical "gen 9" texture streaming, which WASN'T PRT/megatextures outside of a few examples.

So it's both true, but also misleading.. as the fast I/O on the PS5 would be able to do some good PRT whether or not they had specific sampler feedback tech built into their I/O pipeline.. I doubt Sony totally ignored PRT in their design either... as whether it was used or not, it was in the PS4, and is part of AMD GPUs (it just couldn't really be used last gen due to I/O being so slow).
 
Last edited:

Boglin

Member
I think you are right.

If you go back a minute or so from the start of the SFS demo; MS basically claims that PRT is barely used today though.. so I dunno, maybe we are all missing something lol

Logically I can't imagine how a 5400RPM drive would do what PRT claims to do. On PC isn't this done by loading a ton in main RAM, and then pulling things into GPU memory using PRT?

How is a console going to pull anything dynamically from a 5400RPM drive as someone turns their head for instance?

I dunno.. FARRR from an expert.. just an idiot doing reading lol

But I always thought console data streaming was about loading things way off screen ahead of time.. not loading and unloading small parts of textures on the fly.

I know both consoles technically had support for their form of PRT... MS is basically saying it was barely used (tiled resources) right in this presentation.
I think we are on the same page now. I agree that there is no way a 5400RPM drive could do PRT in a practical manner, so it was most definitely underutilized. Even if the bandwidth were high enough, the seek times would completely bottleneck it. The PS5 hardware I/O and SFS are both bandaids for a lack of memory, though, and PC was always able to solve the problem by throwing more hardware at it.

Still, the presentation did a wonderful job of finally showing a useful demonstration of the tech and it shows how much more efficient hardware will be in the future. It also shows why only using an SSD without using better streaming techniques leaves a lot of performance on the table.

Thank you for listening to my TED talk on how a random guy who's never done graphics programming thinks he knows what MS was talking about now.

Outside a select few, we are all dumb-dumbs here who are trying to wrap our heads around stuff we aren't qualified for 🤣
 

Fafalada

Fafracer forever
How is a console going to pull anything dynamically from a 5400RPM drive as someone turns their head for instance?
It's been done using a 2x BDRom / 8x DVD. And of course that ran on mechanical drives too. Those approaches also virtualized assets (textures and geometry alike) to maximize temporal coherency of disc access, and the buffers had to be larger to accommodate the latency, etc. Nevertheless - it's been done.
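For a sense of the budgets involved (nominal transfer rates, my own arithmetic; real seek-limited throughput was lower):

```python
rates_mb_s = {"2x BD-ROM": 2 * 4.5, "8x DVD": 8 * 1.385}   # nominal MB/s
fps = 30

for drive, rate in rates_mb_s.items():
    print(f"{drive}: ~{rate:.0f} MB/s -> ~{rate / fps * 1024:.0f} KB per 30 fps frame")
```

Tiny per-frame budgets, which is exactly why those engines leaned so hard on temporal coherency of disc access and bigger buffers.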

The whole mip would be the entire texture w/o any guess at what wouldn't be visible.
What is referenced here is streaming only the 'necessary' mip levels. I.e. you still have to compute what is actually in view at any given time (not unlike SFS, just at coarser granularity, as all you care about is which mip level the hardware wants to fetch). Compared to loading the entire mip chain for each visible texture - that is itself about a 3x multiplier in 'effective' bandwidth, and given it's relatively simple to implement (we've been doing it for a good 20 years now), it's still a rather common way of doing things.
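A rough back-of-the-envelope of that multiplier (all numbers made up for illustration; the real factor depends entirely on the scene):

```python
def mip_chain_bytes(top_texels, levels, bytes_per_texel=1):
    """Bytes for a chain of `levels` mips, each 1/4 the texels of the previous."""
    return sum(top_texels // 4**i for i in range(levels)) * bytes_per_texel

TEXELS_4K = 4096 * 4096      # one 4K texture at ~1 byte/texel (block compressed)
LEVELS = 12

whole_chain = mip_chain_bytes(TEXELS_4K, LEVELS)

# Hypothetical mix of visible textures by the finest mip level the view needs:
scene_mix = {0: 0.25, 1: 0.35, 2: 0.25, 3: 0.10, 4: 0.05}

needed = sum(frac * mip_chain_bytes(TEXELS_4K // 4**m, LEVELS - m)
             for m, frac in scene_mix.items())

print(f"whole chain per texture: {whole_chain / 2**20:.1f} MiB")
print(f"only needed mips (avg) : {needed / 2**20:.1f} MiB")
print(f"effective multiplier   : {whole_chain / needed:.1f}x")
```

With this made-up mix it lands right around the ~3x ballpark; a scene full of close-ups would shrink the factor, a scene of mostly distant detail would grow it.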

I know both consoles technically had support for their form of PRT... MS is basically saying it was barely used (tiled resources) right in this presentation.
Yea - I/O limitations aside, implementations on older hw still require jumping through some hoops. The whole point with 'Series' modifications is to make things more/actually practical to use, targeted heuristics aren't really new.
 
Last edited:

IntentionalPun

Ask me about my wife's perfect butthole
It's been done using a 2x BDRom / 8x DVD. And of course that ran on mechanical drives too.

Streaming data literally while someone turns their head?

I'd like to see that lol

But then again we had a fraction of the RAM when games were running off of disks. The fidelity going way up since the Xbox 360/PS3 generation while the I/O capabilities didn't was always a real bummer technologically.
 
Last edited:

IntentionalPun

Ask me about my wife's perfect butthole
Yea - I/O limitations aside, implementations on older hw still require jumping through some hoops. The whole point with 'Series' modifications is to make things more/actually practical to use, targeted heuristics aren't really new.

Is it the bespoke HW that is the difference?

I don't want to bash what MS has done.. it does sound great, just trying to gauge how much different it would be from PRT on PS5. Sony had it in their SDK last gen.. and I can't imagine it's not there this gen.. but would it be using CPU power or something? Maybe not quite as efficient? (perhaps it would run later down the line in a pipeline?)

It's also kinda funny ID moved away from that technique lol
 
Last edited:
Hello, everyone. This is not gaming related, but I've been on this thread for a LONG time...way back at least before post #900. Look...just had an extended family member pass from cancer this morning. Won't hit the details, but just for all of the people who get a little too invested in threads like this and go after each other.... Once you've seen a person go from normal and healthy to looking like a WWII death camp survivor in the matter of two months and pass away....puts this shit in perspective. We all have our hobbies that we love and enjoy, but this shit isn't worth getting so fired up about. It could all end for any of us, any day. Try to make sure that what you're putting out into the world is positive as much as possible, because these things matter pretty much not at all in the grand scheme of life on this planet.
Be well, dude. Hope you can cope with it.
 
I did watch that video and watched it again to see if I missed something but the ambiguity is still there about how much SFS is an improvement over PRT.

The demo shows an HDD using traditional texture streaming as an example and describes traditional streaming as loading whole mips based on what objects are visible and what size they are on screen, i.e. not PRT. The narrator then says "We can estimate what an optimized generation 9 traditional texture streaming engine would need to have in memory to render what you're seeing right now", which is again describing a non-PRT solution in conjunction with SSD. You can verify this because along with the voice over, it's captioned at the bottom of the screen during this portion of the video that traditional texture streaming speeds up when using a solid state drive.

Please correct me if I'm wrong, but at no point in the video was SFS compared to other versions of PRT. I am still left unclear as to how much of a multiplier improvement SFS is compared to PRT.

Actually, Microsoft has most likely always been comparing Sampler Feedback Streaming to PRT solutions, as evidenced by comments made by a Microsoft graphics engineer. He also confirms here that they are indeed different capabilities.







Games have been streaming virtual memory pages for a while; it's called partially resident textures. As pointed out, Sampler Feedback is a different capability on top that greatly improves the texture streaming process. The improved video memory efficiency and SSD streaming bandwidth efficiency all come from Sampler Feedback being added into the equation, combined with a really fast SSD. The optimized, and unrealistically efficient according to Microsoft, gen 9 streaming system shown in the Sampler Feedback Streaming demo is almost certainly utilizing PRT. Sampler Feedback Streaming is just night and day superior to the methods that have been utilized in modern games up to this point.

And he makes it clear that most GPUs can do PRT now, but isn't as willing to say the same regarding Sampler Feedback. Even the Metro studio's technical director confirms that he has been asking for a feature like Sampler Feedback for a very long time. If it was just something that was always available, why would one of the most technically gifted studios out there never take advantage of it?


Out of the features available in DirectX 12 Ultimate, which one do you believe will be most useful in terms of performance? Do you plan to utilize them all in the next 4A Games project?

Currently, we use DXR 1.1 inline raytracing and VRS. I like sampler-feedback - I've asked hardware vendors about this for years and it will be utilized for our future projects. Not sure if we'd go for mesh shaders in the future as we are not that dependent on traditional vertex/primitive/raster processing anymore on recent architectures. Our current frames are only about 10% raster and 90% compute on PlayStation 5 and Xbox Series X. And raster pairs well with async compute.

Thank you for your time.


DF discussed Sampler Feedback Streaming on DF Direct Weekly. Alex explains what it is and why it's such a big deal in pretty good detail. The only drawback is that teams must specifically build for it and take advantage of it; it won't just work on its own. I expect Microsoft teams to take advantage of it. Wanting their features to be taken advantage of could also be a big reason why Microsoft went so far as to buy Bethesda. They need big-ticket releases showing what their hardware can do. Bethesda will be able to provide those and then some, and I assume other Xbox internal studios will be looking to take advantage of these also. Alex seems to suggest he thinks Halo Infinite will do it.

 
Last edited:

Boglin

Member
Games have been streaming virtual memory pages for a while; it's called partially resident textures. As pointed out, Sampler Feedback is a different capability on top that greatly improves the texture streaming process. The improved video memory efficiency and SSD streaming bandwidth efficiency all come from Sampler Feedback being added into the equation, combined with a really fast SSD. The optimized, and unrealistically efficient according to Microsoft, gen 9 streaming system shown in the Sampler Feedback Streaming demo is almost certainly utilizing PRT. Sampler Feedback Streaming is just night and day superior to the methods that have been utilized in modern games up to this point.
Can you please explain further why you believe the bolded part is true? In the video, traditional texture streaming is defined as "loading entire texture detail levels at once" which is not PRT and is contrary to what you are saying. This slide prior to the demonstration also indicates they do not consider PRT as part of a traditional texture streaming engine.

[attached image: 9YshaWN.jpg]


I'm assuming the rest of your post is directed at someone else because I have said earlier that I think SFS is superior to PRT. I'm just trying to find out to what extent.
 
Last edited:
Can you please explain further why you believe the bolded part is true? In the video, traditional texture streaming is defined as "loading entire texture detail levels at once" which is not PRT and is contrary to what you are saying.

I'm assuming the rest of your post is directed at someone else because I have said earlier that I think SFS is superior to PRT. I'm just trying to find out to what extent.


Because it's what modern texture streaming systems are using in games, and PRT was directly compared to SF by a Microsoft Graphics engineer that worked on Sampler Feedback Streaming.

Another reason I'm confident is that SFS would demonstrate a much better than 2.5-3x memory efficiency advantage over a system that doesn't utilize virtual memory pages for parts of textures. The advantage with SFS comes from your texture streaming system no longer having to do the same guesswork on which parts of a texture aren't needed for the current scene, hence the superior memory efficiency. SFS against a non-virtual-memory texture streaming system would be significantly more efficient than an average 2.5x-3x effective boost.
 

Boglin

Member
Because it's what modern texture streaming systems are using in games, and PRT was directly compared to SF by a Microsoft Graphics engineer that worked on Sampler Feedback Streaming.


But it is clearly stated in that demo that "traditional texture streaming loads entire texture detail levels at once". That is not partially resident textures. Please address this specific point because it's the part I am hung up on. The voice over, the captions, and the graphs of the demo all indicate that SFS is being compared to "traditional texture streaming", which they themselves defined. Can you see why I'm unclear? I swear I'm not trying to be dense.

All other statements you have shown are saying that SFS is an evolution of and is superior to PRT, which I have no doubt is true, but those statements aren't quantifying the benefit.
 

onesvenus

Member
"traditional texture streaming loads entire texture detail levels at once". That is not partially resident textures.
I might be wrong and I need to refresh my knowledge on this but as far as I remember:
1) Traditionally you load the whole texture and all mip levels to memory
2) Using PRT you only load some mip levels but you load each mip level completely (i.e. a subset of 1)
3) SF is a way to only partially load what's visible for each needed mip level (i.e. a subset of 2)
4) SFS adds some hardware blending capabilities for requests for missing mip level parts

So yes, "traditional texture streaming loads entire texture detail levels at once" IS PRT. Notice that it's saying "entire texture detail levels" as in an entire mip level, not the whole mip chain as it was done before but the whole mip nonetheless in contrast to parts of it as in SF
 
Last edited:

Boglin

Member
I might be wrong and I need to refresh my knowledge on this but as far as I remember:
1) Traditionally you load the whole texture and all mip levels to memory
2) Using PRT you only load some mip levels but you load each mip level completely (i.e. a subset of 1)
3) SF is a way to only partially load what's visible for each needed mip level (i.e. a subset of 2)
4) SFS adds some hardware blending capabilities for requests for missing mip level parts

So yes, "traditional texture streaming loads entire texture detail levels at once" IS PRT. Notice that it's saying "entire texture detail levels" as in an entire mip level, not the whole mip chain as it was done before but the whole mip nonetheless in contrast to parts of it as in SF

You know this stuff isn't the most intuitive when so many people have different ideas as to what is going on haha.

I do believe you're incorrect, however. Every article I can find discussing PRT in detail shows it tiling texture resources within each mip level.

Here is a slide from an old AMD presentation giving a visual representation.

[attached image: QLFpkKq.jpg]



Here is a good writeup I found on an Xbox forum that seems pretty accurate from the things I have read.
  • History of texture streaming: Classic, PRT, PRT+ (SFS)
    • Classic
    • PRT
    • PRT+SF
    • SFS
Classic Texture Streaming

Let's start with classic texture streaming, which is the most basic and simplest one. As we've talked about with "mipmapping", developers now have a set of additional assets that are at least half the size of the original Mip0.

So, to save precious memory space, developers started to find ways to use the higher-level mips such as mip8 (mip8 just as an example). Before classic texture streaming, everything in a game level was loaded at mip0. With classic streaming, developers can use a different mip level for different objects, with different ranges or sizes.

Partial Resident Texture or Virtual Texture

PRT is the term used by Unreal Engine, and Virtual Texture is the term used by idTech. But generally they’re the same thing.

As time moves forward, mip0 gets larger and larger. We're seeing 4K and 8K textures now, which can be a huge burden on memory when loaded whole.

So, what about just loading parts of them?

PRT uses the same idea as virtual memory: we don't have to load every part of a texture into memory. We can divide the large texture into small tiles.

[attached image: MinMip map example]


By dividing the large texture into a tile array, we now have more fine-grained control over the tiles.

For different parts of the texture, some tiles can come from Mip0 and some can come from Mip3 or so.

The MinMip map above shows an 8x8 area requesting different levels of mips.

In this particular example, every tile has the same memory size. A single tile in Mip1 covers (2^1)^2 = 4 times the area of a Mip0 tile, so it's 4 times less detailed while still covering the same area. Likewise, a Mip2 tile covers (2^2)^2 = 16 times the area, 16 times less detailed and smaller, but still covering the same area. And a Mip3 tile can cover the whole 64-tile area single-handedly. Awesome, right? But it's extremely poor quality, so we can only use it on the most insignificant parts.

Before PRT, we needed 64 units of tile memory to cover that 8x8 area. With PRT, we can now use 1+3+3+1 = 8 tiles' worth of memory to cover that area. Assuming the MinMip map is efficient, that's a huge saving, isn't it?
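(Quick sanity check of that arithmetic, assuming the usual 64 KiB tile size that hardware tiled resources use:)

```python
TILE_BYTES = 64 * 1024
resident = {0: 1, 1: 3, 2: 3, 3: 1}          # mip level -> tiles kept in memory

without_prt = 64 * TILE_BYTES                # the full 8x8 grid at mip0
with_prt = sum(resident.values()) * TILE_BYTES

print(f"without PRT: {without_prt // 1024} KiB, with PRT: {with_prt // 1024} KiB "
      f"({without_prt // with_prt}x less memory for the same area)")
```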

Well, that's where things get tricky: how do we make sure the MinMip map is efficient?


Before Sampler Feedback, developers lacked the ability to optimize things down to the very last drop. They could only make guesses about visibility, importance and so on, but they lacked direct control. It's like riding a bike without your hands on the handlebars: yes, you can still control the balance and speed with your body, but isn't that shaky?

PRT+(Sampler Feedback)

Time to save the day! With DirectX 12 Ultimate, developers can now get reports from the sampler and use those reports to minimize artifacts, lag spikes and memory waste! We can finally put our hands back on the bike's handlebars now.

Traditional PRT solutions were based on guesses;

PRT+ (PRT with Sampler Feedback) is based on hard facts. Because samplers are the real end consumers of texture assets, they know exactly what they need (unlike some poor markets in other areas of gaming, just kidding LOL). With SF, the streaming engine only ever streams the assets that are needed, with no waste.

However, you do need hardware support for PRT+: a modern GPU and an SSD at the very least. And even PRT+ can be refined and optimized further. Which finally brings us to the almighty

Sampler Feedback Streaming

SFS is based on PRT+, and PRT+ is based on PRT & Sampler Feedback. SFS is a complete solution for texture streaming, containing both hardware and software optimizations.

Firstly, Microsoft built caches for the Residency Map and Request Map, and records the asset requests on the fly. The difference between this method and traditional PRT methods is kind of like the difference between having to check a map and having a GPS.

Secondly, you need a fast SSD to use PRT+ and squeeze everything you can out of the RAM. You won't want to use an HDD with PRT+, because when an asset request comes in, it has to be answered fast (within milliseconds!). The SSD on Xbox is prioritized for game asset streaming, to minimize latency down to the last bit.

Thirdly, Microsoft implemented a new method for texture filtering and sharpening in hardware. This is used to smooth the loading transition from mip8 to mip4 or mip0, etc. It's not magic, but it works like magic:

[attached image, 1118×648]



As we have stated, the sampler knows what it needs. The developer can answer a request for Mip 0 by giving Mip 0.8 on frame 1, Mip 0.4 on frame 2, and eventually Mip 0 on frame 3.

The fractional part is used in texture filtering, so that the filter can work as intended and present the smoothest possible transition between LOD changes.

It also allows the storage system to have more time to load assets without showing artifacts.

SF seems like it will make developers' lives a lot easier when implementing PRT by having the hardware record which samples were fetched from mapped textures. Doing so can minimize wasted bandwidth and pop-in by not over- and under-sampling textures through guesswork.

As always, if any of this is wrong or outdated, please correct me. I don't care about being right, I just want to understand the technology.
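To make the loop concrete for myself, here's a toy sketch in Python of how I read it (purely illustrative, not the D3D12/Xbox API): the sampler writes out a MinMip map saying the finest mip it wanted per tile, we diff that against what's resident, queue loads for only the missing tiles, and ease the sampling clamp toward the new mip over a few frames so the filter hides the transition.

```python
RESIDENT = {}   # (tile_x, tile_y) -> finest mip currently in memory (9 = coarsest only)
CLAMP = {}      # (tile_x, tile_y) -> fractional mip the shader is allowed to sample

def process_feedback(minmip_map, mips_per_frame=0.5):
    """minmip_map: tile -> finest mip the sampler reported needing this frame."""
    to_stream = []
    for tile, wanted in minmip_map.items():
        have = RESIDENT.get(tile, 9)
        if wanted < have:
            to_stream.append((tile, wanted))             # load just this tile's mips
        # Never sample finer than what's resident, and approach the target gradually
        # (e.g. 2.0 -> 1.5 -> 1.0) instead of snapping, so LOD changes stay smooth.
        target = max(wanted, have)
        CLAMP[tile] = max(target, CLAMP.get(tile, 9.0) - mips_per_frame)
    return to_stream                                     # hand these to the I/O queue

# One simulated frame: the sampler wanted mip 1 on a tile we only hold at mip 9.
requests = process_feedback({(3, 5): 1})
print("tiles to stream:", requests, "| mip clamp this frame:", CLAMP[(3, 5)])
```

(Again, just my mental model of it — happy to be corrected.)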
 
Last edited:

onesvenus

Member
You know this stuff isn't the most intuitive when so many people have different ideas as to what is going on haha.

I do believe you're incorrect, however. Every article I can find discussing PRT in detail shows it tiling texture resources within each mip level.

[full writeup snipped - quoted in the post directly above]
Yup I was wrong!
That post you are quoting is great. Can you provide a link to it?
Thanks!
 

Boglin

Member
Yup I was wrong!
That post you are quoting is great. Can you provide with a link to it?
Thanks!
Sure thing!

I left out some good parts too so it's worth the read. The person who wrote it seems genuine and enthusiastic.
 
Last edited:
It's barely taking away any CPU power at all. It uses just one tenth of a single CPU core to handle all of this. Microsoft has confirmed this multiple times now. A PC without DirectStorage would require 13 Zen 2 cores to match what Series X does. It's far from a software solution. There's a good bit of hardware involved. That's how it comes down to one tenth of a single core.

[attached image: uYmHOag.jpg]


What also puzzles me about people calling the Series X solution software-based is that it has a fully dedicated hardware decompression unit built in that supports zlib as well as Microsoft's custom texture compression, BCPack. The hardware decompression unit can deliver over 6GB/s.

[attached image: CuV394H.jpg]


Then there's the custom-designed SSD itself, where Microsoft confirms a guaranteed minimum of 2GB/s at all times, no matter what. It'll be 2.4GB/s when the hardware or OS isn't doing any other type of work, such as maintenance. The SSD was designed around sustained performance as opposed to peak performance.

So, taking Microsoft's guaranteed minimum, the numbers change with Sampler Feedback Streaming to 5GB/s without using the hardware decompression and 10GB/s with hardware decompression. So we are no longer talking about "theoreticals"; these are Microsoft's guaranteed minimum SSD streaming figures. The reason Microsoft can guarantee those minimums is the custom work done on their SSD.
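For what it's worth, the arithmetic behind those figures is just the stated multipliers stacked on the guaranteed minimum (the ~2x compression gain is a rough average, and the ~2.5x matches the 2.5-3x SFS figure discussed earlier in the thread):

```python
guaranteed_raw = 2.0    # GB/s, Microsoft's stated guaranteed minimum
decompression = 2.0     # rough average gain from BCPack/zlib hardware decompression
sfs_gain = 2.5          # average effective multiplier attributed to SFS

print(f"SFS only            : {guaranteed_raw * sfs_gain:.0f} GB/s effective")
print(f"SFS + decompression : {guaranteed_raw * decompression * sfs_gain:.0f} GB/s effective")
```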

And before people go screaming "omg an SSD downgrade"

Here is Microsoft explaining the spec has not changed, but they explain in more detail what they mean.

[attached images: gDcn8s8.jpg, pvTQ14j.jpg]


They also touch on it some in a blog post last year.

[attached image: GnBHtSY.jpg]

You know that during the Eurogamer interview, Xbox architect Andrew Goossen talked about SFS, BCPack and whatnot regarding the SSD. If Andrew Goossen said over 6, surely he was careful with his words. Over 6 can mean 6.1, 6.2, 6.3... If it were 6.8 or 6.9, surely he would say closer to 7. If it were 7, he would say 7, and so on. Therefore, Digital Foundry, immediately after the interview, said in their video about the XSX that the highest number for the XSX SSD is 6 GB/s. No need to spin otherwise.
 
Last edited:

Fafalada

Fafracer forever
Streaming data literally while someone turns their head?
I'd like to see that lol


But then again we had a fraction of the RAM when games were running off of disks.
That's largely irrelevant to this discussion; the parameters that matter are display resolution and target fidelity (number of unique data samples per frame). That gives you a hard cap on what you end up sampling in any given frame, and even at 4K, we're looking at under 100MB or so (which is still huge by the standards of mechanical drives, but SSDs can get there... just about).
At 720p, and baking down some of the layers (like megatexture did), you could get away with a few MB/s for textures (and that's still all wholly unique data, unlike what a traditional game would display), and you could do pixel~polygon (sort of) detail meshes at similar bandwidth too (that was also done the same gen).
The latter is something that gets left out of this conversation a lot - but as demonstrated with things like Nanite, the next fidelity jump doesn't rely on texture density alone, and SFS isn't really helping with that directly.
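Roughly where that "under 100MB" ceiling comes from (my own simplification; layer count and texel size are guesses):

```python
width, height = 3840, 2160
layers = 3              # e.g. albedo + normal + roughness
bytes_per_texel = 4     # uncompressed RGBA8-ish upper bound

unique = width * height * layers * bytes_per_texel
print(f"~{unique / 2**20:.0f} MiB of unique texel data visible in one 4K frame")
```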

Is it the bespoke HW that is the difference?
Pretty much. Though I'd call it extensions - there's not really much in the way of additional hw.

but would it be using CPU power or something?
Yea - the likes of Rage used the CPU to determine sample coverage (it's basically just a variant of occlusion queries). Even if you do it on a per-mipmap basis, you do something similar (a simpler query though).
That said - IMO the tech-reason that kept PRT usage down last gen is the filtering overhead, as you have to handle certain things through shader-code instead of relying on texture-hw. On paper that's what the filters MS added are meant to address.
 

MrLove

Banned
I do know that one "in theory" is better than the other. And out of both systems the PS5 has impressed me the most with its I/O applications.

But you're right that anyone saying the XSX doesn't have any custom I/O hardware is just trolling.
MS only has a decompression block that is not even half as fast as the one in the PS5; meanwhile, Sony put about 6 custom chips in the APU to remove all possible bottlenecks.

Matt is a reputable source



Matt said:
There is no comparison.

There is absolutely nothing wrong with the SX’s IO or the speed you can get on a PC, and moving to any SSD based solution as a baseline is an incredible upgrade over the past that all games and gamers will benefit from.

But the PS5’s IO is on another level. It does basically everything significantly faster than any competition in the consumer space. It is easily and by far the largest difference between the two next gen consoles.

 
Last edited:

thewire

Member
MS only has a decompression block that is not even half as fast as the one in the PS5; meanwhile, Sony put about 6 custom chips in the APU to remove all possible bottlenecks.

Matt is a reputable source





This should be the end of the discussion. We definitely haven't seen any games on Xbox demonstrate anything to the level of Ratchet & Clank, which is still an early-gen example, but here we have Microsoft trying to rally the troops to overcome the biggest deficit between the two consoles.

It's not even just the sheer speed of the raw bandwidth: it has a lot more channels and significantly higher clock speeds to process data faster; the coherency engine informs the GPU of overwritten address ranges while the cache scrubbers evict data without stalling the GPU; it uses better decompression in Kraken (from RAD); and it has more decompression-specific hardware, such as the data decompression block (equivalent to 9 Zen 2 cores), a dedicated DMA controller that moves the data to where it needs to be, and another two co-processors that handle I/O and memory mapping (equivalent to 2 Zen 2 cores), while the coherency engine keeps all of this housekeeping together. Lots of devs also talk about how the PS5 is designed around reducing latency, which I think the coherency engine helps with, but someone with more knowledge can chime in here. All of this leads to an I/O subsystem capable of over 22 GB/s of bandwidth at its peak (which obviously only occurs when certain conditions are met), still way ahead of the Xbox Series I/O architecture, and we will see Sony first-party titles showcase this consistently throughout the generation. Ratchet is already doing it this early; God of War Ragnarök will probably further showcase it at PlayStation's next event.
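The "over 22 GB/s" figure is just the raw spec times a best-case compression ratio (rough numbers, my own arithmetic):

```python
raw = 5.5           # GB/s, PS5 SSD raw read bandwidth (Sony's spec)
best_ratio = 4.0    # rough best-case Kraken + Oodle Texture compression ratio
print(f"peak effective: ~{raw * best_ratio:.0f} GB/s")
```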
 