
NVIDIA announces RTX I/O - PCIe 4.0, on-GPU storage decompression, collaborating with Microsoft on DirectStorage

thelastword

Banned
Well thanks to Cerny, we are going to see a true evolution in the gaming world, his vision is truly the tide that lifts all boats in this industry.

One thing that's missing from Ampere though will be the high clocks of Navi 2x.....The higher clock speeds will be important for the bandwidth and throughput we want to push with textures and physics and even raytracing for simul operations....That hardware I/O controller that Cerny designed is the real winner, yes the SSD is great, but the PS5's 12 channel controller is the winner here, the driver that makes all the magic happen....
 
Well thanks to Cerny, we are going to see a true evolution in the gaming world, his vision is truly the tide that lifts all boats in this industry.

One thing that's missing from Ampere though will be the high clocks of Navi 2x.....The higher clock speeds will be important for the bandwidth and throughput we want to push with textures and physics and even raytracing for simul operations....That hardware I/O controller that Cerny designed is the real winner, yes the SSD is great, but the PS5's 12 channel controller is the winner here, the driver that makes all the magic happen....

Yeah Ampere has a lot to fear from Navi.

Your post will age well. :messenger_winking:
 

thelastword

Banned
Yeah Ampere has a lot to fear from Navi.

Your post will age well. :messenger_winking:
I never said that, but I'm waiting to see Navi 2X's unveil, more so its new suite of software features......I'm going all in with AMD on my new build.....For some reason I think I may have caught a glimpse of one or two of your posts in the other NV threads. Aren't you so sure that NV absolutely crushes Navi 2X even though we have not seen the AMD cards....What I said about the clock speeds is a known fact based on the GPU inside the PS5; the higher clock speeds help speed up lots of graphics operations, and it's partly because it's on a better node than Ampere and also because of architectural improvements or a full-fledged transition to RDNA with no legacy GCN constraints remaining...

Yes, NV showed its stuff. I watched, but their lower-than-usual price, their obfuscation of power draw results, and their reluctance to publish results at the purported 4K and 8K native resolutions with RT and without RT, sans DLSS, has me a bit miffed....I'm not in a hurry to build my kit this minute. I will purchase the PS5 first, and when AMD reveals its stuff in October or one day before NV's launch....I can see how the GPU battle stacks up....You know that I don't pussyfoot around my stance; I'm more excited for what AMD has to offer based on some of the features/custom work we've already seen on the PS5.....
 
It's an AMD GPU launch. I expect to be disappointed.

I expect it will be less powerful and more expensive than rumors led us to believe. And I'd be extremely surprised if they can compete in ray tracing performance or have anything at all to answer DLSS 2.0.

Prove me wrong AMD. I love you for CPUs but you're still pretty shit in the GPU area until proven otherwise.
 

jimbojim

Banned
So now I'm a PC fanboy? xD FOH, stop getting caught so up in your feelings you lose sight of the bigger picture at discussion here.

Gameseeker (whoever that is; they are just another poster to me, same as anyone else on any other forum) might have an account here, but they aren't posting here. If they want to discuss their points, they can post here whenever they want. If they want to debate anyone here (such as myself), they can freely drop by and do so. Point is, they're most likely a grown adult and can speak for themselves if they wish, they don't need you doing it for them (poorly).

I've responded like that because it was a reaction to the previous post. It's easy to label everyone as fanboys.

GameSeeker GameSeeker posts here regularly on this forum.

CPU overhead and latency aren't weaknesses of MS's approach tho, like some users have been adamant on trying to imply (their "weakness" is more in the fact that the scalability and stackability mean lower-end SSDs can also benefit, but they will always offer lesser performance than higher-end SSDs supporting the same thing). That's not me implying Sony's solution is weak in those areas, either, but a "weakness" of it could be lack of scalability with other hardware-based solutions, especially if AMD's approach takes more after MS's.

Latency is always a weakness, not the other way around. The less latency, the better. Also, less CPU overhead is always better, even better if it is completely removed.
 
Last edited:

PaintTinJr

Member
thicc_girls_are_teh_best thicc_girls_are_teh_best

I don't quite understand the latter part of your last comment, because you didn't quote the specific parts you were responding to from something I actually wrote.

The first part - concerning the PS5 refreshing memory and the RTX 30 series refreshing memory, so VRAM size isn't such an issue - well, the difference is that Sony specifically explained how they were empowering a new paradigm (REYES) and 100x less latency than the PS4 Pro HDD solution, and then finished off by letting Epic showcase the UE5 demo on PS5 devkit hardware.

By comparison, PCI Express has major latency issues - compared to a 256-line-wide dedicated DMA controller in the I/O complex interfacing with the unified memory in the PS5 - because PCIe is a set of serial link channels that provide an external interface. Below is a link to a very informative whitepaper (by a PCI Express product company) with worked examples of how the PCIe protocol layers work and how realistic latency and bandwidth efficiency can be calculated for real applications. Sadly the last read example massively under-estimates bandwidth efficiency because it doesn't account for pipelining packets - as they explain - but even just looking at the ~13,000 ns latency on a 200post 25KB payload for writing indicates that channel throughput around the 75% mark wouldn't be a million miles away.


Then if we consider the PCIe 4.0 requirement, you effectively have two PCIe 3.0 setups: one for all the PC's other PCIe devices to work without interfering with the RTX IO NIC to RTX 30 series GPU link, and the other for the two Nvidia cards to work in tandem. A PCIe 3.0 interface gives 8GB/s in the direction from NIC SSD to GPU, and if that is only 75% efficient, then even the 7GB/s RAW figure isn't looking comparative to the PS5 SSD solution IMHO. Obviously, I might have got the wrong end of the stick from the whitepaper, so feel free to correct anything you think I've got wrong. But AFAIK the RTX IO is a band-aid solution for not having an APU, unified memory and a dedicated decompression block integrated inside the APU, and is probably why I think Big Navi will have an entire system-card solution as one of its options.
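If anyone wants to play with the whitepaper-style numbers themselves, here's a rough sketch (my own simplified model, not the whitepaper's exact example - the ~24 bytes of per-TLP overhead and the 256-byte max payload are assumptions, and flow-control/ACK traffic is ignored, so treat the result as an upper bound):

```python
GT_PER_S = 8e9            # PCIe 3.0 line rate: 8 GT/s per lane
ENCODING = 128 / 130      # 128b/130b line encoding used by PCIe 3.0 and 4.0
LANES = 16

def effective_gbps(max_payload=256, tlp_overhead=24):
    """Estimate usable GB/s after line encoding and per-TLP packet overhead.

    max_payload  -- data bytes per TLP (256 B is a common max-payload setting)
    tlp_overhead -- assumed bytes of header + framing + LCRC per TLP
    """
    raw_bytes_per_s = GT_PER_S * LANES * ENCODING / 8   # bits -> bytes after encoding
    packet_efficiency = max_payload / (max_payload + tlp_overhead)
    return raw_bytes_per_s * packet_efficiency / 1e9

print(f"PCIe 3.0 x16, 256 B payloads: ~{effective_gbps(256):.1f} GB/s")
print(f"PCIe 3.0 x16, 128 B payloads: ~{effective_gbps(128):.1f} GB/s")
```

Smaller payloads and unpipelined reads push the efficiency down further, which is where figures in the 75% region come from.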
 
PaintTinJr PaintTinJr PCIe 4.0 has 128b/130b encoding; PCIe 3.0 used 8b/10b. You absolutely have to factor that into the discussion; the former has much less overhead and much-improved latency figures too. A far greater share of the connection bandwidth on PCIe 4.0 will be usable compared to PCIe 3.0.

You are buying into some Sony marketing speak if I'm being perfectly honest; REYES isn't anything that hasn't been done before and Sony are hardly the only company that has looked into significantly reducing latency. MS's Flashmap papers are actually more detailed on that note and get into very technical details for how their approaches reduce latency. Comparing latency reduction for NAND-based solution over platter disc solution isn't saying too much because even entry-level older SSDs dramatically reduce latency compared to upper-level platter drives and, again, "100x" is essentially a marketing term not too different from some of MS's ones. I think if we're going to be critical of things like "3x effective multiplier" and "instantly available SSD memory" (both being things I can at least rationalize somewhat and have done in the past; others have as well), we should treat some of Sony's claims in a similar light and not blindly buy into them until there is data (either in real-world testing terms or detailed research data) that can hint towards those claims being mostly possible.

PCIexpress whitepapers are always valuable, but you shouldn't simply just look at the whitepapers. Various companies always innovate on the implementation of the technological standards and often push them to achieve results the "generic" whitepapers do not account for. In terms of virtually every given metric of measure, this happens regularly. We also don't know what Sony's latency figures look like; given they are still working with NAND modules at the end of the day (and relatively slower ones compared to what MS are using as an example), they are still going to have to deal with latency on their own end. That is probably one of the main reasons they chose SRAM for the cache rather than DRAM; it could be indicative of a means on their end to cut down on the latency the NAND modules themselves have within their design.

Even considering the latency implications you bring up, NV's solution still implements much more VRAM bandwidth (both on the 3080 and certainly the 3090; most likely the 3070 as well) with faster VRAM modules, and that ultimately is also going to factor very quickly into the picture, not to mention that having more physical VRAM on the board means more data coming off storage can be kept resident if needed, cutting down on storage access cycles. It seems more than a bit cynical and disingenuous to implicate NV's solution as a "band-aid", considering they've worked closely with MS on implementing their GPUDirectStorage solution and this has been technology the companies have been working on and investing in for quite a few years by now. As I've said before and will continue to say going forward, what MS have here (and NV have with their implementation) are excellent solutions that account for future-proofing, scalability and stackability, and offer a very valid alternative to what Sony's accomplished with their I/O setup. Which, by all accounts, is a more insular implementation and potentially not as future-proofed (not without iterative physical hardware redesigns and implementations in new setups), but has its obvious strengths. I don't see any of these as being "band-aids"; that comes off as snarky dismissal of the R&D these companies put into their products IMHO.

I've responded like that because it was a reaction to the previous post. It's easy to label everyone as fanboys.

GameSeeker GameSeeker posts here regularly on this forum.



Latency is always a weakness, not the other way around. The less latency, the better. Also, less CPU overhead is always better, even better if it is completely removed.

Sorry, but you have a tendency to exaggerate, and not in the best way.

I won't need to go into the rest of your post because I think I covered that well responding to PaintTinJr, but I know I've also clarified that Sony's solution doesn't "completely" remove the CPU out of the process. Since you guys seem to keep forgetting, the CPU still needs to query the processor core in the I/O block on instructions to carry out its duties, similar to how it does this with the GPU. So there is still some CPU work being factored into this.

The thing is that the bulk of the grunt work is then offloaded from the CPU and handled through the I/O block. That said, these distinctions need to be regularly emphasized.
 

jimbojim

Banned
I won't need to go into the rest of your post because I think I covered that well responding to PaintTinJr, but I know I've also clarified that Sony's solution doesn't "completely" remove the CPU out of the process. Since you guys seem to keep forgetting, the CPU still needs to query the processor core in the I/O block on instructions to carry out its duties, similar to how it does this with the GPU. So there is still some CPU work being factored into this.

The thing is that the bulk of the grunt work is then offloaded from the CPU and handled through the I/O block. That said, these distinctions need to be regularly emphasized.

Really? I really need proof that the CPU is tasked with SSD operations; otherwise Cerny would have marked the CPU for something, but he didn't.

[Slide: Understanding the PS5's SSD - deep dive into next-gen storage tech]


IMO you have a tendency to somewhat downplay the PS5 I/O solution while propping up the XSX I/O, to prove that the SSD in the PS5 does have CPU overhead.
 

PaintTinJr

Member
thicc_girls_are_teh_best thicc_girls_are_teh_best

If you are going to say that listening to, comprehending and believing a GDC presentation by Cerny is "buying into some Sony marketing speak" with regards to reducing latency by 100x and REYES, then with all due respect I might as well save myself the trouble and not bother reading the rest of your comment/s if you are going to stand by that point.

I find it hard to believe that even with the proof that followed the GDC talk - zero pop-in with 4 polys per pixel, etc, etc in the UE5 demo, while Nvidia's own blurb only claims reduced pop-in - you would suggest the Nvidia RTX IO and PS5 IO complex solutions will effectively be about the same.

If we can't agree that improving latency from PCIe 3.0 -> 4.0 is tinkering at the edges and insignificant compared to the tightly coupled, high-cohesion IO complex solution to latency, then it seems we have a different view of the facts. Especially when the RTX IO will need to write and the GPU will need to read from the PCIe lanes - further increasing latency.

The internal memory speed of a GPU using a lower clock doesn't tell us anything in comparison to the console solutions, because the consoles have integrated decompression blocks.

I made the "band-aid" comment in a fair way, just like the 3dfx Voodoo was a band-aid solution until graphics cards could handle both 2D and 3D in a unified solution.
 
It's an AMD GPU launch. I expect to be disappointed.

I expect it will be less powerful and more expensive than rumors led us to believe. And I'd be extremely surprised if they can compete in ray tracing performance or have anything at all to answer DLSS 2.0.

Prove me wrong AMD. I love you for CPUs but you're still pretty shit in the GPU area until proven otherwise.
Their CPUs are great, but their GPUs on the other hand....





 

Papacheeks

Banned
I never said that, but I'm waiting to see Navi 2X's unveil, more so its new suite of software features......I'm going all in with AMD on my new build.....For some reason I think I may have caught a glimpse of one or two of your posts in the other NV threads. Aren't you so sure that NV absolutely crushes Navi 2X even though we have not seen the AMD cards....What I said about the clock speeds is a known fact based on the GPU inside the PS5; the higher clock speeds help speed up lots of graphics operations, and it's partly because it's on a better node than Ampere and also because of architectural improvements or a full-fledged transition to RDNA with no legacy GCN constraints remaining...

Yes, NV showed its stuff. I watched, but their lower-than-usual price, their obfuscation of power draw results, and their reluctance to publish results at the purported 4K and 8K native resolutions with RT and without RT, sans DLSS, has me a bit miffed....I'm not in a hurry to build my kit this minute. I will purchase the PS5 first, and when AMD reveals its stuff in October or one day before NV's launch....I can see how the GPU battle stacks up....You know that I don't pussyfoot around my stance; I'm more excited for what AMD has to offer based on some of the features/custom work we've already seen on the PS5.....

With RDNA 3 taped out for late next year into 2022, a Radeon chiplet design will be a thing and AMD will have it out on a 5nm process. His posts won't age well.

Really? I really need proof that the CPU is tasked with SSD operations; otherwise Cerny would have marked the CPU for something, but he didn't.

[Slide: Understanding the PS5's SSD - deep dive into next-gen storage tech]


IMO you have a tendency to somewhat downplay the PS5 I/O solution while propping up the XSX I/O, to prove that the SSD in the PS5 does have CPU overhead.

Just because it's not on the slide doesn't mean the CPU is standing there doing nothing when game data is being installed from a disc or online packets. I can assure you it is helping the process. But the customized I/O with the accelerated cache scrubbing is doing the brunt of the work. The CPU will facilitate some of the tasks for sure when it's put into memory.
 
Last edited:

jimbojim

Banned
Just because it's not on the slide doesn't mean the CPU is standing there doing nothing when game data is being installed from a disc or online packets. I can assure you it is helping the process. But the customized I/O with the accelerated cache scrubbing is doing the brunt of the work. The CPU will facilitate some of the tasks for sure when it's put into memory.

I see. thanks. I apologize to thicc_girls_are_teh_best thicc_girls_are_teh_best
 

Papacheeks

Banned
Agreed...
Rumors are saying ~50% over 2080Ti @ $699.

Grain of salt, obviously.

I think RADEON is going to reveal one stack now with GDDR6..........and another card down the road next year with HBM2E. SK hynix's HBM2E went into full production in July.

I think HBM2E could be what they use, which will be able to give them even more bandwidth than GDDR6X at lower speeds. So you have to wonder why all the change within a year from NVIDIA to move their flagship cards to GDDR6X? :pie_thinking:

UNLESS they knew they had to have something with large bandwidth for their GPU, and to compete with AMD.
 
thicc_girls_are_teh_best thicc_girls_are_teh_best

If you are going to say that listening to, comprehending and believing a GDC presentation by Cerny is "buying into some Sony marketing speak" with regards to reducing latency by 100x and REYES, then with all due respect I might as well save myself the trouble and not bother reading the rest of your comment/s if you are going to stand by that point.

I find it hard to believe that even with the proof that followed the GDC talk - zero pop-in with 4 polys per pixel, etc, etc in the UE5 demo, while Nvidia's own blurb only claims reduced pop-in - you would suggest the Nvidia RTX IO and PS5 IO complex solutions will effectively be about the same.

If we can't agree that improving latency from PCIe 3.0 -> 4.0 is tinkering at the edges and insignificant compared to the tightly coupled, high-cohesion IO complex solution to latency, then it seems we have a different view of the facts. Especially when the RTX IO will need to write and the GPU will need to read from the PCIe lanes - further increasing latency.

The internal memory speed of a GPU using a lower clock doesn't tell us anything in comparison to the console solutions, because the consoles have integrated decompression blocks.

I made the "band-aid" comment in a fair way, just like the 3dfx Voodoo was a band-aid solution until graphics cards could handle both 2D and 3D in a unified solution.

If that's something you feel you have to do then I won't be able to suggest otherwise, though it is a shame if your takeaway from my comments were that negative. However even in this response there's things I can point out you're doing which are maybe clouding your sense of judgement.

See, you're still making a critical mistake of comparing these approaches as if they are apples-to-apples. They are not. They both see all of the same problems in terms of current I/O, but have taken different, VALID approaches to solving it that suit the range of hardware implementations expected to support them. Sony's is more hardware-dependent, MS's and NV's are more hardware-agnostic.

The problem is that yourself and several other posters then take this and try objectively comparing one to the other to then make an absolute statement that one (usually Sony's) is objectively better, but then a few take it to such lengths that they essentially treat other approaches as being invalid. That is contrary to how the tech world has operated, wherein there have always been multiple valid approaches and more than a few that are within a hair's breadth of each other in terms of overall effectiveness for the implementations they go into.

I feel, personally, that this level of judgement on solutions that we've yet to see in real-time gameplay (yes that includes the UE5 demo too; as impressive as it was, there were little to no mechanics at work typical of a real game, simplified physics models, no NPC A.I or other game logic really occurring etc.) is ultimately unnecessary.

About the Road to PS5 stuff; I am not saying it was 100% marketing talk. Never have said that. However, if you think Mark Cerny wasn't embellishing small aspects of some of the features he spoke on, or doing subtle downplaying of certain design features that they're aware aren't necessarily playing to the PS5's strengths, you have to be fooling yourself. The GDC presentation may've been a call for developers to get their briefing, but it also served at least in some capacity as a pseudo-advert for the PS5. After all, it was the system's first time being discussed in public in any serious official capacity at great length, and the fact they messaged the presentation to followers on Twitter further supports this.

There was some element of PR involved in that presentation, but Sony aren't the only company that does this. Microsoft's done it, Nvidia did a bit of it, AMD did a bit of it, etc. So I don't see anything to be affronted by in me stating the obvious. Encoding and latency improvements from PCIe 3.0 to 4.0 are nothing to sneeze at; you are merely (softly) disregarding them for whatever reason, though I don't see why to do so in the first place. It's a net improvement overall, and believe it or not there are methods available to companies like NV to mitigate PCIe lane latency issues should they prove to be an issue here or there. It may not be "tightly coupled" like it is in the consoles, but that doesn't mean NV haven't put a lot of R&D into this area and designed a setup which is as well integrated as they can make it.

Regarding the "band-aid" stuff, AFAIK there is almost never an instance in the tech world where someone refers to a product as a "band-aid" with positive connotations. Every reference from what I have seen has generally been with negative connotations, i.e the SEGA 32X. I also feel referring to the work Nvidia are doing here with I/O as a band-aid is, again, dismissive of the effort they've put into this, and once again it turns looking at these various solutions from a sum of great options into a needless "my preferred solution is better than these other ones!", even if those aren't the direct words used. In truth, ALL of these solutions will have their benefits and drawbacks, so it's generally best to wait until real games come out for all these various platforms utilizing the solutions to therefore see how they truly check against one another.

And even in that case, over the majority of titles I'm expecting relatively equal performance metrics. Some might have slight or more noticeable edges in very specific areas, but no one solution is going to "reign king among them all", as it were. None of these companies are sleeping at the wheel on pushing forward I/O solutions. Absolutely not a one of them.

jimbojim jimbojim It's okay man; Papacheeks basically said what I was trying to communicate.
 

PaintTinJr

Member
Just because it's not on the slide doesn't mean the CPU is standing there doing nothing when game data is being installed from a disc or online packets. I can assure you it is helping the process. But the customized I/O with the accelerated cache scrubbing is doing the brunt of the work. The CPU will facilitate some of the tasks for sure when it's put into memory.
What you have said is perfectly true, but in the context in which the original comment was made - basically implying some tit-for-tat CPU percentage burden for the IO complex solution - your comment is being misrepresented IMO.

Clearly, everything runs under the control of the primary CPU core thread - in all systems without satellite processor cores - but in a co-processor solution the burden on the CPU is trivial (peripheral-device trivial) compared to a task with notable CPU overhead on another core.

The trivial burden of the decompressor using a CPU core is already being factored out, so in context the co-processor solution has 0% CPU burden on another core, versus the CPU decompressor solution running on one of the other cores.
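To put a rough number on the trivial-versus-notable distinction, a crude timing sketch - zlib standing in for the console codecs (which are considerably faster), so it only illustrates the shape of the trade-off, not the real figures:

```python
# Time software decompression on one core and turn it into an implied GB/s figure.
import os, time, zlib

raw = os.urandom(4 * 1024 * 1024) + b"\x00" * (60 * 1024 * 1024)  # 64 MiB, partly compressible
blob = zlib.compress(raw, level=6)

t0 = time.perf_counter()
out = zlib.decompress(blob)
dt = time.perf_counter() - t0

assert out == raw
print(f"compressed {len(raw) / 1e6:.0f} MB -> {len(blob) / 1e6:.0f} MB")
print(f"inflate on one core: {len(raw) / 1e9 / dt:.2f} GB/s of output")
print("feeding a 5-9 GB/s SSD this way would tie up several cores doing nothing else;")
print("that is the work a dedicated decompression block (or co-processor) takes off the CPU")
```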
 

Papacheeks

Banned
What you have said is perfectly true, but in the context in which the original comment was made - basically implying some tit-for-tat CPU percentage burden for the IO complex solution - your comment is being misrepresented IMO.

Clearly, everything runs under the control of the primary CPU core thread - in all systems without satellite processor cores - but in a co-processor solution the burden on the CPU is trivial (peripheral-device trivial) compared to a task with notable CPU overhead on another core.

The trivial burden of the decompressor using a CPU core is already being factored out, so in context the co-processor solution has 0% CPU burden on another core, versus the CPU decompressor solution running on one of the other cores.

50/50.

A console OS handles instructions to the CPU totally differently than something like Windows does.

That's where the disconnect in this thread is coming from.

On PC, unless NVMe SSDs or GPUs start adding extra co-processors and their own I/O (cough... chiplet design), the CPU is going to have to handle a lot of the instructions, which is why Microsoft is doing a huge rewrite of DirectX so it will be able to support what NVIDIA just announced. Microsoft has been doing this for data centers on the back end, but it's not ready because the new NVMe controllers and Intel are behind.
 

PaintTinJr

Member
If that's something you feel you have to do then I won't be able to suggest otherwise, though it is a shame if your takeaway from my comments were that negative. However even in this response there's things I can point out you're doing which are maybe clouding your sense of judgement.

See, you're still making a critical mistake of comparing these approaches as if they are apples-to-apples. They are not. They both see all of the same problems in terms of current I/O, but have taken different, VALID approaches to solving it that suit the range of hardware implementations expected to support them. Sony's is more hardware-dependent, MS's and NV's are more hardware-agnostic.

The problem is that yourself and several other posters then take this and try objectively comparing one to the other to then make an absolute statement that one (usually Sony's) is objectively better, but then a few take it to such lengths that they essentially treat other approaches as being invalid. That is contrary to how the tech world has operated, wherein there have always been multiple valid approaches and more than a few that are within a hair's breadth of each other in terms of overall effectiveness for the implementations they go into.

I feel, personally, that this level of judgement on solutions that we've yet to see in real-time gameplay (yes that includes the UE5 demo too; as impressive as it was, there were little to no mechanics at work typical of a real game, simplified physics models, no NPC A.I or other game logic really occurring etc.) is ultimately unnecessary.

About the Road to PS5 stuff; I am not saying it was 100% marketing talk. Never have said that. However, if you think Mark Cerny wasn't embellishing small aspects of some of the features he spoke on, or doing subtle downplaying of certain design features that they're aware aren't necessarily playing to the PS5's strengths, you have to be fooling yourself. The GDC presentation may've been a call for developers to get their briefing, but it also served at least in some capacity as a pseudo-advert for the PS5. After all, it was the system's first time being discussed in public in any serious official capacity at great length, and the fact they messaged the presentation to followers on Twitter further supports this.

There was some element of PR involved in that presentation, but Sony aren't the only company that does this. Microsoft's done it, Nvidia did a bit of it, AMD did a bit of it, etc. So I don't see anything to be affronted by in me stating the obvious. Encoding and latency improvements from PCIe 3.0 to 4.0 are nothing to sneeze at; you are merely (softly) disregarding them for whatever reason, though I don't see why to do so in the first place. It's a net improvement overall, and believe it or not there are methods available to companies like NV to mitigate PCIe lane latency issues should they prove to be an issue here or there. It may not be "tightly coupled" like it is in the consoles, but that doesn't mean NV haven't put a lot of R&D into this area and designed a setup which is as well integrated as they can make it.

Regarding the "band-aid" stuff, AFAIK there is almost never an instance in the tech world where someone refers to a product as a "band-aid" with positive connotations. Every reference from what I have seen has generally been with negative connotations, i.e the SEGA 32X. I also feel referring to the work Nvidia are doing here with I/O as a band-aid is, again, dismissive of the effort they've put into this, and once again it turns looking at these various solutions from a sum of great options into a needless "my preferred solution is better than these other ones!", even if those aren't the direct words used. In truth, ALL of these solutions will have their benefits and drawbacks, so it's generally best to wait until real games come out for all these various platforms utilizing the solutions to therefore see how they truly check against one another.

And even in that case, over the majority of titles I'm expecting relatively equal performance metrics. Some might have slight or more noticeable edges in very specific areas, but no one solution is going to "reign king among them all", as it were. None of these companies are sleeping at the wheel on pushing forward I/O solutions. Absolutely not a one of them.

jimbojim jimbojim It's okay man; Papacheeks basically said what I was trying to communicate.
The other approaches probably are invalid, in a pure computer science and data-networking-protocols type of way, and it really does show how well Carmack's comments about PlayStation and Xbox making the right choice for PS4 and XB1 using APUs and unified memory (around this time last gen) have aged.

The problem here is that the current hardware investment by the PC install base stands in the way of the PC's progress - and Sweeney has tweeted in some way about the PC fixing this deep rooted latency issue, and hoped all parties would work together IIRC.

No one running a PC rig for gaming wants PCs to go the APU route, but that is the future at some point - Carmack has already predicted and championed it. Yet we are beginning to see such ridiculous workarounds as what Nvidia are proposing with RTX IO because they can't build an APU solution: they don't own an x86 license, and doing ARM or PPC instead would be economic suicide. Intel are off the pace with graphics - though clearly investing in graphics again and probably looking to push APUs across the market in the next ten years - and AMD don't have the GPU performance crown or the market share in the CPU or GPU markets to aggressively change the market.

If this was a telecoms industry problem all the solutions would converge and the conversations would be moot.
 

PaintTinJr

Member
50/50.

A console OS handles instructions to the CPU totally differently than something like Windows does.

That's where the disconnect in this thread is coming from.

On PC, unless NVMe SSDs or GPUs start adding extra co-processors and their own I/O (cough... chiplet design), the CPU is going to have to handle a lot of the instructions, which is why Microsoft is doing a huge rewrite of DirectX so it will be able to support what NVIDIA just announced. Microsoft has been doing this for data centers on the back end, but it's not ready because the new NVMe controllers and Intel are behind.
But that doesn't change the point I made: the Y-intercept, however big or small, is only being highlighted in the solution that doesn't use a secondary core for decompression, so with all things equal the Y-intercept doesn't exist in the comparison - so the IO complex is zero cost to the system CPU.

But then again, if the IO complex is a reworked SPU - which probably has a 50/50 chance (maybe more like 90/10 if the burden you suggest is true) - then even that burden doesn't exist on the primary CPU core thread of the PS5, because SPUs are capable of running fully independently once set up to run.
 
Last edited:
The problem here is that the current hardware investment by the PC install base stands in the way of the PC's progress - and Sweeney has tweeted in some way about the PC fixing this deep rooted latency issue, and hoped all parties would work together IIRC.

All parties already work together, though. That's what open-standard consortium bodies are for, and have been for decades. The idea that PC vendors and partners are not working together simply isn't true whatsoever; many of them in fact converge together regularly to iterate on standards and design new ones that all partners can either license out or (more usually) freely adapt into products for the open market.

That in fact is the strength of PCs over virtually every single platform in existence, including game consoles. Only arcades (before they started more or less using PCs) compare to PCs in this regard, for even microcomputers lacked some of this. PC is an open platform, so as long as partners design to standards that consortium has established, anybody is welcome to implement it and, more importantly, innovate it.

How do you think NV's new cards are featuring GDDR6X memory? That was them and Micron collaborating together and developing an iteration on a current technology that, theoretically, anyone is allowed to make, but for various reasons it's generally 3 key memory manufacturers: Samsung, SK Hynix and Micron.

There are already solutions to the things Sweeney (and as much of a legacy he has, he's just one person in an industry filled with many excellent people past and present) might've mentioned regarding latency: multiple approaches exist to the same problem and many of them are valid within a splitting hair's distance of one another.

Which kind of highlights something we need to actually acknowledge: I think if you're only looking at the PC from a game consumer's POV, you are severely limiting your viewpoint of it. Again, PC is an open platform built on standards and there are MANY different designs always made to suit the specific needs of a market. A data center-focused PC may want something besides PCIe 4.0? Cool, they can use RapidIO 3.2, or if they want something with enforced cache coherence they can implement designs with CCIX at the heart, switch fabrics, the works. There's no other platform besides, again, arguably arcades themselves (up until they started relying heavily on PC hardware) with this level of openness to provide valid solutions for virtually any design challenge a company needs to answer.

And let's not forget, at the end of the day, both PS5 and Series X are steeped strongly in that same "PC" environment of sorts; they are both using x86-64 architectures ;)
 
Let's not forget that MS needs to get the DirectStorage API out there first..
They plan to hand out a "preview" version to developers in 2021.
For all we know that could mean late '21.
And that release will be a demo...
I think games that make use of the DirectStorage API won't hit the market until 2024..
Then there are still inefficiencies on PC even with DirectStorage..
They would first also have to patch Windows to accept such a change..

And that API needs to be targeted every time by the game developer.
There's code that needs to be written.

Code can be buggy.

Quick switch to PS5:

Here everything just functions from day one. Up to 22GB/s is unlocked from the start (rough arithmetic below).

The best thing is:
Devs don't need to know anything about the PS5's I/O.
It is all abstracted from the game code which they write anyway.
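For context, that 22GB/s figure is just the raw read speed multiplied by a compression ratio - a quick sketch with assumed ratios (the real ratio depends entirely on the data):

```python
# Effective throughput = raw SSD read speed * compression ratio.
raw_read = 5.5   # GB/s, PS5 raw SSD read speed
for label, ratio in [("typical (Kraken)", 1.55), ("best case (very compressible data)", 4.0)]:
    print(f"{label}: ~{raw_read * ratio:.1f} GB/s effective at {ratio}:1")
```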

Switch back to the PC market:

Since the momentum of RTX I/O (and AMD's solution, for that matter)
is heavily reliant on MS DirectStorage, we will see a vacuum in the PC sector for quite some time..
So fast streaming will indeed be a console thing for now.
So I hope that the PS5's capabilities are shown in one of its exclusives while PC games still use NVMe drives the old way.
Not to mock PC players (I am one also, actually) but to give them a good taste of what lies ahead.

Oh, and I am hoping that the UE5 demo gets a public release BEFORE DirectStorage is available on PC.

Just for the people who still think that some notebook just runs that demo better than PS5..
Note - better means same polycount along with same or higher fps..
 
Last edited:

geordiemp

Member


Coming to Turing as well, so it looks like it's just using GPU cores instead of CPU cores for decompression.

So AMD will also do this....

So Sweeney is still correct....

The faster loading is ALL in MS's hands......
 
Last edited:

PaintTinJr

Member
All parties already work together, though. That's what open-standard consortium bodies are for, and have been for decades. The idea that PC vendors and partners are not working together simply isn't true whatsoever; many of them in fact converge together regularly to iterate on standards and design new ones that all partners can either license out or (more usually) freely adapt into products for the open market.

That in fact is the strength of PCs over virtually every single platform in existence, including game consoles. Only arcades (before they started more or less using PCs) compare to PCs in this regard, for even microcomputers lacked some of this. PC is an open platform, so as long as partners design to standards that consortium has established, anybody is welcome to implement it and, more importantly, innovate it.

How do you think NV's new cards are featuring GDDR6X memory? That was them and Micron collaborating together and developing an iteration on a current technology that, theoretically, anyone is allowed to make, but for various reasons it's generally 3 key memory manufacturers: Samsung, SK Hynix and Micron.

There are already solutions to the things Sweeney (and as much of a legacy he has, he's just one person in an industry filled with many excellent people past and present) might've mentioned regarding latency: multiple approaches exist to the same problem and many of them are valid within a splitting hair's distance of one another.

Which kind of highlights something we need to actually acknowledge: I think if you're only looking at the PC from a game consumer's POV, you are severely limiting your viewpoint of it. Again, PC is an open platform built on standards and there are MANY different designs always made to suit the specific needs of a market. A data center-focused PC may want something besides PCIe 4.0? Cool, they can use RapidIO 3.2, or if they want something with enforced cache coherence they can implement designs with CCIX at the heart, switch fabrics, the works. There's no other platform besides, again, arguably arcades themselves (up until they started relying heavily on PC hardware) with this level of openness to provide valid solutions for virtually any design challenge a company needs to answer.

And let's not forget, at the end of the day, both PS5 and Series X are steeped strongly in that same "PC" environment of sorts; they are both using x86-64 architectures ;)

Your comment isn't based in reality and it moves the goal posts by re-framing it in terms of "open standards" which is a complete strawman.

Of course they all work together to make money, but they also are in competition and retain their own things for their best interests.

Case in point, Nvidia want an x86 license, and AMD or Intel could let them into that exclusive club, but they know Nvidia is going to eventually die on the vine if both AMD and Intel can wait it out long enough.

Nvidia have now confirmed that RTX IO decompression isn't done by a dedicated block, and what already had at least 3x more latency has now added two more hops to the traffic flow, meaning it is way behind the PS5 IO complex solution - probably an entire generation.
 
Last edited:

pawel86ck

Banned


Coming to Turing as well, so it looks like it's just using GPU cores instead of CPU cores for decompression.

So AMD will also do this....

So Sweeney is still correct....

The faster loading is ALL in MS's hands......

I think Nvidia would build a dedicated decompression block into their GPUs if it were worth it. CUDA cores in Ampere GPUs can do a lot without the additional complication of building a separate block dedicated just to HW decompression.
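The reason shader cores can stand in for a fixed-function block at all is that these formats are split into independently compressed chunks, so decompression is embarrassingly parallel. A toy sketch of that idea, using chunked zlib and a process pool - not the actual GDeflate/RTX IO path, just the chunk-parallel principle:

```python
import os, zlib
from concurrent.futures import ProcessPoolExecutor

CHUNK = 256 * 1024  # 256 KiB chunks, an arbitrary size for the example

def compress_chunked(data):
    # Each chunk is compressed on its own, so no chunk depends on another.
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def decompress_chunked(chunks, workers=8):
    # Hand every chunk to a separate worker; on a GPU this would be thousands
    # of thread groups instead of a handful of processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

if __name__ == "__main__":
    data = os.urandom(1 * 1024 * 1024) + b"\x00" * (31 * 1024 * 1024)
    chunks = compress_chunked(data)
    assert decompress_chunked(chunks) == data
    print(f"{len(chunks)} chunks inflated independently - no serial dependency between them")
```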
 

geordiemp

Member
I think Nvidia would build a dedicated decompression block into their GPUs if it were worth it. CUDA cores in Ampere GPUs can do a lot without the additional complication of building a separate block dedicated just to HW decompression.

I don't think it really matters; all will be relatively equally good at decompression, and I don't think that is where the bottleneck lies - it's with Microsoft's file I/O system and security systems.

When Linus Tech Tips tested that 27 GB/s SSD, it was faster in Linux than in Windows....
 
Last edited:

PaintTinJr

Member
I don't think it really matters; all will be relatively equally good at decompression, and I don't think that is where the bottleneck lies - it's with Microsoft's file I/O system and security systems.

When Linus Tech Tips tested that 27 GB/s SSD, it was faster in Linux than in Windows....
And more importantly, the latency difference between an external 16-lane, 3-way (CPU, GPU, RTX IO) communication - before decompression - and a 256-lane internal 2-way (CPU, IO complex) communication.

It looks like Nvidia were thrown a bone by Xbox after being caught out by the PS5/UE5 demo, and their RTX 40 series cards will be like AMD's onboard-SSD Pro cards.
 

Looks like I was wrong indeed; got 3.0 mixed up with 2.0. So basically raw overhead is the same between 3.0 and 4.0.
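For reference, the raw line-encoding overheads, computed from the published transfer rates (encoding only - packet/TLP overhead comes on top of this):

```python
# Line-encoding overhead per PCIe generation (quick sanity check).
gens = {
    "PCIe 2.0 (5 GT/s, 8b/10b)":     (5e9,  8 / 10),
    "PCIe 3.0 (8 GT/s, 128b/130b)":  (8e9,  128 / 130),
    "PCIe 4.0 (16 GT/s, 128b/130b)": (16e9, 128 / 130),
}
for name, (rate, eff) in gens.items():
    per_lane = rate * eff / 8 / 1e9   # GB/s per lane after encoding only
    print(f"{name}: {(1 - eff) * 100:.1f}% encoding overhead, "
          f"~{per_lane:.2f} GB/s/lane, ~{per_lane * 16:.1f} GB/s x16")
```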

Your comment isn't based in reality and it moves the goal posts by re-framing it in terms of "open standards" which is a complete strawman.

Of course they all work together to make money, but they also are in competition and retain their own things for their best interests.

Case in point, Nvidia want an x86 license, and AMD or Intel could let them into that exclusive club, but they know Nvidia is going to eventually die on the vine if both AMD and Intel can wait it out long enough.

Nvidia have now confirmed that RTX IO decompression isn't done by a dedicated block, and what already had at least 3x more latency has now added two more hops to the traffic flow, meaning it is way behind the PS5 IO complex solution - probably an entire generation.

I don't need to shift goalposts; my post was illustrating that the market is a lot wider in terms of what can be considered an open standard than what your narrow POV was getting at. Open standards aren't simply developed to "make money"; they're designed to move the industry forward and enable more options. That doesn't remove the capitalist drive, though, and it'd be foolish of you to assume that's something I didn't realize when writing the post.

The funny thing about x86 licenses is that even if Intel or AMD tried preventing a company like Nvidia from getting one, eventually the courts would get involved. Which is exactly how AMD were able to get an x86 license in the first place from Intel, who tried such a thing on AMD. Eventually, some level of checks-and-balances comes into the picture and if the courts decide intervention is needed to allow a market to remain open, competitive but also fair, then they will do so and there's not much corporations can do about it.

Your end-point seems to be ignoring some technological realities if I'm being perfectly honest, and just shows what I suspected you were doing from a while back; everything for you seems to keep routing back to making a statement in favor of Sony and PS5's SSD I/O. It's difficult to engage in a genuine discussion with you if every point of contention raised is simply a way for you to roundabout back to a defense for a preferred system I/O. It's a bit ridiculous IMHO. You're making these conclusions based on assumptions without taking into consideration the fact that if raw hardware throughput/capability gains are sufficient enough, not to mention actual implementation of API tools and support, algorithm use etc., then some of the things you are arguing in favor of simply fall off the side of the cliff.

But I get a sense at this point these things are being lost on you because you're still stuck in an apples-to-apples, "one to rule them all" mindset here and that's disappointing considering the insight you tend to provide otherwise. But it is what it is 🤷‍♂️
 
Last edited:

M1chl

Currently Gif and Meme Champion
Well thanks to Cerny, we are going to see a true evolution in the gaming world, his vision is truly the tide that lifts all boats in this industry.

One thing that's missing from Ampere though will be the high clocks of Navi 2x.....The higher clock speeds will be important for the bandwidth and throughput we want to push with textures and physics and even raytracing for simul operations....That hardware I/O controller that Cerny designed is the real winner, yes the SSD is great, but the PS5's 12 channel controller is the winner here, the driver that makes all the magic happen....
Cerny's vision, you mean like this, which Nvidia already had in store?
 

Panajev2001a

GAF's Pleasant Genius
Cerny's vision, you mean like this, which Nvidia already had in store?


You gotta credit them for executing on this, which means starting even earlier. The RTX 30x0 cards will not be using DirectStorage on PC until possibly late 2021 or later, as the API is not out, and we will need to see the latency of the final solution compared to the consoles before judging it... rumours around PS5 talk about a very low-latency solution, not just a fast one.
 

M1chl

Currently Gif and Meme Champion
You gotta credit them for executing on this, which means starting even earlier. The RTX 30x0 cards will not be using DirectStorage on PC until possibly late 2021 or later, as the API is not out, and we will need to see the latency of the final solution compared to the consoles before judging it... rumours around PS5 talk about a very low-latency solution, not just a fast one.
Sure, they made it first, I am not denying that; however, these solutions are already in working order in the enterprise space - there is some NASA Mars landing model demo where it is shown.

That's a batch-oriented, high-bandwidth solution. It has nothing to do with Cerny's low-latency solution, apart from some of the words used.
Hmm, then again, did they clarify that it's also going to be lower latency through RTX I/O? Because on PC you can utilise a shitload of memory, which you don't have on a console...
 

psorcerer

Banned
Hmm, then again, did they clarify that it's also going to be lower latency through RTX I/O? Because on PC you can utilise a shitload of memory, which you don't have on a console...

Dunno. We know pretty much nothing about DirectStorage and RTX IO.
I'm not sure that pumping stuff into RAM and then into VRAM can pass as a low-latency solution.
But obviously you can do much better than the current PC latency when doing SSD reads.
 
Last edited:
And more importantly, the latency difference between an external 16-lane, 3-way (CPU, GPU, RTX IO) communication - before decompression - and a 256-lane internal 2-way (CPU, IO complex) communication.

It looks like Nvidia were thrown a bone by Xbox after being caught out by the PS5/UE5 demo, and their RTX 40 series cards will be like AMD's onboard-SSD Pro cards.

"Thrown a bone"? So we're back to the dismissive language I addressed a few days earlier and that you tried claiming wasn't actually dismissive when the connotative use indicates the opposite? Okay.

These companies research and plan this stuff out years in advance, over iterative, continuous R&D projects. Companies organized to the level of NV, MS, Sony, Nintendo etc. rarely "shit the can" and just hastily cobble a solution together. While there may be aspects of NV's current RTX I/O implementation that could be further improved with hardware/firmware revisions or driver and/or API improvements, the state of it right now in no way shows them having been "thrown a bone" and, as insinuated by such, throwing together a hastily stitched-together solution.

If you mean to convey a different takeaway you should try using a descriptive tone that gives that impression, because this isn't the first time you've done this (and again, it's seemingly being done with a particular angle that I didn't want to publicly state at first, but at this point it's kind of hard not to suggest it :S).
 

PaintTinJr

Member
Cerny's vision, you mean like this, which Nvidia already had in store?

Going by the TPC-H slide at just past 10 mins, it looks like this doesn't really bear much comparison to the console solutions, because the latency improvements are on bulk workloads where the axis (for latency) is in increments of 250,000 msec - the first workload being about ~80,000 msec, reduced with GPUDirectStorage to ~16,000 msec (16 secs?). And the second workload is reduced to 40,000 msec, but obviously doesn't do the exact same work as the CPU run - because clean-up is removed.

So whether the 5x latency improvement or the 20-30x improvement would translate for REYES and be indicative of smaller render datasets that are only alive for 16 msec or 32 msec (1000x lower latency) is hard to say, but I suspect that if amortising workload setup costs over shorter workloads showed equal or better latency then they would have shown those instead.
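A toy model of that amortisation concern - treat measured time as a fixed per-dispatch setup cost plus bytes over bandwidth; every number below is hypothetical, purely to show why a speedup on a bulk benchmark may or may not say much about a 16 msec frame budget:

```python
# measured time = fixed per-dispatch setup cost + bytes / bandwidth
def job_ms(total_bytes, bandwidth_gbs, setup_ms):
    return setup_ms + total_bytes / (bandwidth_gbs * 1e9) * 1e3

SETUP_MS = 5.0      # assumed per-dispatch overhead (API calls, queueing, kernel launch)
BANDWIDTH = 12.0    # assumed GB/s for the accelerated path

for label, size in [("bulk TPC-H style scan, 100 GB", 100e9),
                    ("one frame's worth of streaming, 50 MB", 50e6)]:
    total = job_ms(size, BANDWIDTH, SETUP_MS)
    print(f"{label}: {total:,.1f} ms total, setup cost is {SETUP_MS / total * 100:.1f}% of it")
```

On the bulk job the setup cost disappears into the noise; inside a frame budget it can dominate, which is exactly why the bulk results are hard to read across.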
 

PaintTinJr

Member
Looks like I was wrong indeed; got 3.0 mixed up with 2.0. So basically raw overhead is the same between 3.0 and 4.0.



I don't need to shift goalposts; my post was illustrating that the market is a lot wider in terms of what can be considered an open standard than what your narrow POV was getting at. Open standards aren't simply developed to "make money"; they're designed to move the industry forward and enable more options. That doesn't remove the capitalist drive, though, and it'd be foolish of you to assume that's something I didn't realize when writing the post.

The funny thing about x86 licenses is that even if Intel or AMD tried preventing a company like Nvidia from getting one, eventually the courts would get involved. Which is exactly how AMD were able to get an x86 license in the first place from Intel, who tried such a thing on AMD. Eventually, some level of checks-and-balances comes into the picture and if the courts decide intervention is needed to allow a market to remain open, competitive but also fair, then they will do so and there's not much corporations can do about it.

Your end-point seems to be ignoring some technological realities if I'm being perfectly honest, and just shows what I suspected you were doing from a while back; everything for you seems to keep routing back to making a statement in favor of Sony and PS5's SSD I/O. It's difficult to engage in a genuine discussion with you if every point of contention raised is simply a way for you to roundabout back to a defense for a preferred system I/O. It's a bit ridiculous IMHO. You're making these conclusions based on assumptions without taking into consideration the fact that if raw hardware throughput/capability gains are sufficient enough, not to mention actual implementation of API tools and support, algorithm use etc., then some of the things you are arguing in favor of simply fall off the side of the cliff.

But I get a sense at this point these things are being lost on you because you're still stuck in an apples-to-apples, "one to rule them all" mindset here and that's disappointing considering the insight you tend to provide otherwise. But it is what it is 🤷‍♂️
I've read what you wrote, but the gist of it again is that Cerny, and now Sweeney and the UE5 demo development team, are being dishonest about the reality of the solution and its real-world throughput, because of reasons.

As for the licensing issue of x86, well it turns out there are two licenses - one for the historical Intel x86 patents and a newer one for AMD's x64 - licenses that Intel and AMD cross-license following AMD's win against Intel. So Nvidia would have to find a way to get both licenses. And after their settled legal dispute with Intel - which included them not building solutions that could emulate x86, and didn't grant them an x86 license, in exchange for a big cash settlement and access to other Intel patents that would stop them getting sued by Intel while developing their GPGPU-ARM solution (like what's in the Switch), a deal they were happy with - they might have a task convincing a judge that it wasn't just poor business judgement that led them to this situation; or at least until they sought to buy the Intel x86 license that VIA Technologies apparently has, which could be acquired.
 
Last edited:

PaintTinJr

Member
"Thrown a bone"? So we're back to the dismissive language I addressed a few days earlier and that you tried claiming wasn't actually dismissive when the connotative use indicates the opposite? Okay.

These companies research and plan this stuff out years in advance, over iterative, continuous R&D projects. Companies organized to the level of NV, MS, Sony, Nintendo etc. rarely "shit the can" and just hastily cobble a solution together. While there may be aspects of NV's current RTX I/O implementation that could be further improved with hardware/firmware revisions or driver and/or API improvements, the state of it right now in no way shows them having been "thrown a bone" and, as insinuated by such, throwing together a hastily stitched-together solution.

If you mean to convey a different takeaway you should try using a descriptive tone that gives that impression, because this isn't the first time you've done this (and again, it's seemingly being done with a particular angle that I didn't want to publicly state at first, but at this point it's kind of hard not to suggest it :S).

I stand by that comment, and I think the fact that PlayStation and Xbox are ready to release in less than a hundred days, while Nvidia is still only at the developer-preview stage and has failed to price the RTX IO NIC part of the solution, is evidence of that.

Equally, the RTX 3070 price is inconsistent and odd. IMHO it is either because the RTX IO NIC is going to cost almost the same, or because the 30 series solution is considered a working prototype and they expect to burn those early buyers with a replacement 40 series product - a transition that will be as harsh and as rapid as the one to OpenGL 2.1/DX10 GPUs was for customers who got in early on OpenGL 2/DX9b at a high cost.
 
I've read what you wrote, but the gist of it again is that Cerny, and now Sweeney and the UE5 demo development team, are being dishonest about the reality of the solution and its real-world throughput, because of reasons.

Okay, let's stop for a moment and examine this, because I NEVER said this and NEVER insinuated this to anywhere near the degree you are thinking I did. I merely said that, at the Road to PS5 presentation, Cerny may've embellished on a few aspects of their design and downplayed parts of the design course that didn't necessarily play to their strengths, because even though it was a technical dissertation, there was also SOME element of marketing involved, both to appeal to devs and also to appeal to regular press and gamers who would be watching the presentation.

That's all I said. I even said other companies like Microsoft and Nvidia have done similar and will continue to do so in the future, same with Sony. But you've taken this as a stronger affront than anticipated and I have no idea why. As for Sweeney, I just said he is one mind of many in the industry, and just because he or even Carmack feels a certain way about a technology doesn't mean they are the sole authoritative voice. This is 100% true and I will stand by it. Just as one of many examples, Carmack, if he had his way, would've probably said back in the day that the Saturn was incapable of handling DOOM, because he prevented his team from using a custom engine purpose-built for the Saturn to do the port...for whatever the hell reason. Yet that assertion, should he have ever believed it, would be false, because Lobotomy did arguably some of the best ports of PC shooters for any system that gen, which just happened to be on the Saturn (Exhumed/Powerslave springs to mind immediately, but their port of Quake for the system was also amazing).

I could go in on Sweeney's company having a $250 million investment from Sony as a means of indicating how that could also have flavored some of his statements (never mind the fact Epic has a history of demonstrating tech demos on Sony systems going back to the PS2), but I'll refrain, because IMHO it's not a major factor in his statements at all and I'd rather not give the impression he was doing the equivalent of a paid testimonial. He might just genuinely like Sony's I/O solution, and I don't blame him at all, because it's very impressive work. I just think it's not the ONLY impressive answer to resolving I/O bottlenecks, and I don't take the words of guys like Sweeney as authoritative gospel reigning over all others.

As for the licensing issue of x86: it turns out there are two licenses involved, one covering the historical Intel x86 patents and a newer one covering AMD's x86-64 extensions, and Intel and AMD cross-license both following AMD's win against Intel. So Nvidia would have to find a way to get both licenses. Their settled legal dispute with Intel barred them from building solutions that could emulate x86 and did not grant them an x86 license; in exchange they got a big cash settlement and access to other Intel patents, which kept Intel from suing them while they developed their GPGPU-ARM solution (like what's in the Switch), and they were happy with that. Given all of that, they might have a hard task convincing a judge that it wasn't just poor business judgement that led them to this situation, or at least they would until they sought to buy an Intel x86 license like the one VIA Technologies apparently holds and could be acquired for.

Welp, it's a good thing x86 and x86-64 aren't the only viable solutions out there. Nvidia has been implementing ARM and FPGA-based cores in their GPUs for a while now. Open platforms like ARM and RISC-V are only getting better and better, and that's ultimately to the benefit of companies like Nvidia.

I stand by that comment, and I think the fact that PlayStation and Xbox are ready to release in less than a hundred days while Nvidia is still only at the developer-preview stage, and still hasn't priced the RTX IO NIC part of the solution, is evidence of that.

Equally, the RTX 3070 price is inconsistent and odd. IMHO it is either because the RTX IO NIC is going to cost almost as much, or because the 30-series solution is considered a working prototype and they expect to burn those early buyers with a replacement 40-series product, a transition as harsh and as rapid as the move to OpenGL 2.1/DX10 GPUs was for customers who got in early on OpenGL 2/DX9b at a high cost.

This is a very pessimistic and not exactly neutral takeaway, and I don't think it's based on a balanced look at the state of things. RTX I/O is dependent on the progress of DirectStorage, since the underlying GPUDirect Storage technology literally has DirectStorage in the name. You're basically calling Nvidia incompetent because one part of their GPU feature set is still in developer preview, even though the cards release in three weeks? That is a bad take on your part IMHO.

To the second part, I guess we'll just have to wait and see. NV could be setting pricing as they see fit for a variety of reasons; all I've seen from various people on the 3070 pricing is that it's lower than expected and seemingly a good price. Many, many channels, posters, websites etc. have been of this sentiment. Virtually none have framed the pricing as "shady" or negative the way you're looking at it, so your viewpoint on that front is definitely in the minority.
 

RayHell

Member
Dunno. We know pretty much nothing about DirectStorage and RTXIO.
I'm not sure that pumping stuff into RAM and then into VRAM can pass as a low-latency solution.
But obviously you can do much better than current PC latency when doing SSD reads.
Exactly.
And I would add that DirectStorage / RTXIO still uses CPU processing to talk to a huge variety of drive controllers, with different channel counts and priorities.

RTXIO is great for fast loading, but it's not going to fundamentally change game engines the way the PS5 storage solution does.
No game dev will base streaming asset quality on RAM and storage speed; there's too much variation in those components on PC to rely on such a concept.
So instead of reducing the VRAM buffer, they will use the same "next 30 seconds of gameplay" buffer loaded into RAM and VRAM.
On a closed system like the PS5, devs can reduce the buffer to the minimum because the I/O throughput is always the same, freeing VRAM and using the SSD as slower virtual RAM.
Of course I'm talking about PS5 1st-party games only. XSX could do the same, but their commitment to PC gaming makes me believe they won't build their engines around this concept.
 
Exactly.
And I would add that DirectStorage / RTXIO still uses CPU processing to talk to a huge variety of drive controllers, with different channel counts and priorities.

RTXIO is great for fast loading, but it's not going to fundamentally change game engines the way the PS5 storage solution does.
No game dev will base streaming asset quality on RAM and storage speed; there's too much variation in those components on PC to rely on such a concept.
So instead of reducing the VRAM buffer, they will use the same "next 30 seconds of gameplay" buffer loaded into RAM and VRAM.
On a closed system like the PS5, devs can reduce the buffer to the minimum because the I/O throughput is always the same, freeing VRAM and using the SSD as slower virtual RAM.
Of course I'm talking about PS5 1st-party games only. XSX could do the same, but their commitment to PC gaming makes me believe they won't build their engines around this concept.

How exactly does this come into play knowing Sony is committing to bringing more PS5 1st-party titles to PC with "shorter cash conversion cycles" (i.e. shorter gaps between the PS5 and PC releases)?

It's almost like devs can scale their engine designs across ranges of hardware configurations even if they happen to also be developing for a system like the PS5, and on PC they can choose which hardware configurations to include in the performance targets their software supports. A decent number of PC games do that already.

In other words: devs can design scalable buffer windows or texture-prefetch schemes that target performance on different hardware configurations. They've already been doing this with graphics settings for decades; pretty sure they have the experience in this department.

I fail to see anything you're mentioning being a massive differentiating factor for consoles compared to PC. The main advantage the consoles will have is ease of use, in that it's just one spec to configure for. But it's not necessarily a Herculean effort to configure for multiple specs either, even for modest-sized teams. It's nothing new.
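
To put rough numbers on that "scalable buffer window" idea, here's a minimal sketch, assuming an engine that benchmarks the player's drive once and sizes its streaming look-ahead accordingly. The tier cut-offs, window lengths, and the 200 MB/s consumption figure are invented purely for illustration, not taken from any real engine:

```python
# Hypothetical illustration: pick a streaming look-ahead window per drive tier,
# then derive how much RAM/VRAM the prefetch buffer needs. All numbers made up.

def pick_lookahead_window(drive_read_mb_s: float) -> float:
    """Seconds of upcoming gameplay to keep resident, by drive speed tier."""
    if drive_read_mb_s >= 3000:      # fast NVMe: stream almost on demand
        return 2.0
    if drive_read_mb_s >= 400:       # SATA-class SSD: modest look-ahead
        return 10.0
    return 30.0                      # HDD: the classic "next 30 seconds" buffer

def prefetch_buffer_mb(drive_read_mb_s: float,
                       consumption_mb_s: float = 200.0) -> float:
    """RAM/VRAM (in MB) needed to hold that window of streamed assets."""
    return pick_lookahead_window(drive_read_mb_s) * consumption_mb_s

if __name__ == "__main__":
    for drive in (100, 500, 7000):   # HDD, SATA SSD, PCIe 4.0 NVMe (MB/s)
        print(f"{drive:>5} MB/s drive -> "
              f"{prefetch_buffer_mb(drive):,.0f} MB prefetch buffer")
```

Same engine, same assets; only the amount of memory set aside for the look-ahead changes with the drive, which is the same kind of per-configuration scaling devs already do for graphics settings.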
 

RayHell

Member
How exactly does this come into play knowing Sony is committing to bringing more PS5 1st-party titles to PC with "shorter cash conversion cycles" (i.e. shorter gaps between the PS5 and PC releases)?

Below is the full list of PS5 games that will be released on PC:
  • Bugsnax
  • Deathloop
  • Ghostwire: Tokyo
  • GodFall
  • Goodbye Volcano High
  • Hitman III
  • JETT: The Far Shore
  • Kena: Bridge of Spirits
  • Little Devil Inside
  • Oddworld: Soulstorm
  • Pragmata
  • Project Athia
  • Resident Evil VIII
  • Solar Ash
  • Stray
No first-party titles.

It's almost like devs can scale their engine designs across ranges of hardware configurations even if they happen to also be developing for a system like the PS5, and on PC they can choose which hardware configurations to include in the performance targets their software supports. A decent number of PC games do that already.

Of course everything is possible with the right planning. Let's say the whole level is 100 gigs but I'm streaming 7 GB/s of data continuously. I don't need to preload anything into VRAM other than a few seconds of buffer. It doesn't matter if I change rooms or teleport.

On PC you have to deal with those who have a 500 MB/s SSD or even a 100 MB/s HDD.
So either you preload the whole level into RAM and get the same level of quality (and ask for 128 GB of RAM).
Or you lower the quality to cut down on loading time and RAM (I can really see the frustration of a 3700 owner whose asset quality got cut down because of his SSD).
Or you create a version of the game with "elevator sequences" for background loading that weren't needed in the PS5 version, but I don't see that happening.
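
A quick back-of-the-envelope check on that trade-off (a sketch only; the 7 GB/s target and the 30-second burst are assumptions chosen to match the example above):

```python
# If the content is authored around a 7 GB/s stream, how much of it must
# already sit in RAM when the drive can't keep up? Illustrative numbers only.

TARGET_STREAM_GB_S = 7.0   # data rate the assets were authored for
BURST_SECONDS = 30.0       # how long the player keeps moving through new areas

def preload_needed_gb(drive_gb_s: float,
                      target_gb_s: float = TARGET_STREAM_GB_S,
                      burst_s: float = BURST_SECONDS) -> float:
    """GB that must be resident up front to cover the shortfall for one burst."""
    shortfall = max(target_gb_s - drive_gb_s, 0.0)
    return shortfall * burst_s

if __name__ == "__main__":
    for name, speed in (("PCIe 4.0 NVMe", 7.0), ("SATA SSD", 0.5), ("HDD", 0.1)):
        print(f"{name:>13}: preload ~{preload_needed_gb(speed):,.0f} GB "
              f"for a {BURST_SECONDS:.0f} s burst of new areas")
```

On the slower drives the shortfall quickly exceeds any realistic RAM budget, which is exactly why the PC version would have to lower asset quality, preload far more, or hide the loading some other way.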

My opinion is that 1st party will try to make the most of PS5 data speed so they can claim it's only possible on PS5.

In other words: devs can design scalable buffer windows or texture-prefetch schemes that target performance on different hardware configurations. They've already been doing this with graphics settings for decades; pretty sure they have the experience in this department.

Scaling down graphics quality based on RAM and drive speed might not be the best idea if you're targeting the mass market.
 

Rikkori

Member
Transcript (yes, it's a mess, deal with it :p ):

RTX IO really solves, or is meant to solve, two problems in the PC gaming experience. The first is that it makes the I/O operations themselves much more efficient. There's some new low-level software and some parallelized APIs that we've worked on, and that we've worked on with Microsoft through DirectStorage, that fundamentally make reading, particularly reading from NVMe SSDs, much more efficient, meaning you can get a lot more I/O operations per CPU. So what that means is you can get a lot more out of your storage with a lot less CPU overhead.

And that's something we actually already talked about, right, CPU-limitedness, so we want to get the CPU out of the way as much as possible.
The second thing that RTX IO enables is GPU-based decompression. Game developers have been using lossless compression (not lossy, lossless) for quite some time, and that's to reduce the file size on disk. In particular for PCs, or game platforms, that have relatively slow storage, say a hard drive, lossless compression allows you to keep the data stored on the hard drive compressed; it's read off the hard drive and then, historically, decompressed by the CPU for use, and you get roughly a two-to-one amplification of your bandwidth. So if you were getting 100 MB/s off a hard drive you now get 200 MB/s worth of I/O, and your install size might have been half as big.

When it's a hard drive and you have lots of CPU cores that's maybe okay, but when you have a 7 GB/s SSD there's just no practical way a reasonable CPU can keep up; you might keep 20 CPU cores fully saturated decompressing off a 7 GB/s NVMe drive. So what we've done is implement the decompression on the GPU side, along with those efficient APIs, so you can read in a super efficient manner from the SSD, get the data to the GPU in a compressed format, which means you're moving it around the system in its most efficient form, have the GPU decompress it, and you get all the benefits of that compression: the 2:1 amplification of bandwidth and the 2:1 reduction in install size, without burdening the CPU.
So what that'll mean is potentially much, much faster load times, and in the streaming case you could have large open worlds with less stuttering and more detail, that kind of stuff, because it's offloading from the CPU in CPU-bound circumstances.
So developers will have to implement DirectStorage or RTX IO, so it requires app-side integration, but for the developers that do, yeah, they would see a CPU offload, or a reduction in CPU utilization, in some cases quite dramatic.

I think it can benefit all gamers. In fact, a lot of the games that have the longest load times aren't even the competitive games; they can be open-world RPGs where they're loading gigabytes and gigabytes of data just to start the game, and then you're navigating through these large open worlds and again moving potentially gigabytes of data around as you shift between biomes and stuff like that.
So personally I think all gamers can benefit, and it can benefit current games because they can just generally load faster. The way most games work today is they'll do what I call a bulk load: they'll load for the level or the world that you're in, and then when you go to the next level or world there'll be this pause where it does some loading, or they'll have some kind of tricks they do, basically trigger points that start loading the next scenes. Destiny, for example: when you go between planets, that time when you're flying in your spaceship isn't just there to give you a pretty trip in your spaceship, they're doing loading, they're doing I/O in those cases. So you can cut those kinds of things down pretty dramatically.

RTX IO is able to saturate way beyond a Gen 4 SSD in terms of bandwidth. So even if you have a x4 Gen 4 M.2 drive from the latest generation, like the 980 Pro that Samsung just announced, right, 7 GB/s sustained read, we can sink that, meaning we can absorb a compressed stream far beyond that. In fact, if you look at the bus interface to an Ampere GPU, it's x16 Gen 4, and we should be able to sink that, which would be multiple Gen 4 SSDs. So you could build a big RAID array of Gen 4 SSDs and we could sink that, and we'd amplify the bandwidth of all of it.
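
The arithmetic in that transcript can be sanity-checked with a short sketch. The 2:1 ratio comes straight from the quote; the ~350 MB/s per-core decompression rate is only inferred from the "20 cores at 7 GB/s" remark, and ~32 GB/s is the commonly quoted figure for a PCIe 4.0 x16 link, so treat all three as assumptions:

```python
# Sanity check of the transcript's numbers: compression amplification,
# CPU cores needed for software decompression, and PCIe 4.0 x16 headroom.

COMPRESSION_RATIO = 2.0    # lossless 2:1, per the transcript
PCIE4_X16_GB_S = 32.0      # approximate bandwidth of a x16 Gen 4 link

def effective_bandwidth_gb_s(raw_read_gb_s: float,
                             ratio: float = COMPRESSION_RATIO) -> float:
    """Delivered (decompressed) bandwidth when the stored data is compressed."""
    return raw_read_gb_s * ratio

def cpu_cores_needed(raw_read_gb_s: float,
                     per_core_gb_s: float = 7.0 / 20) -> float:
    """Cores needed to decompress the stream on the CPU instead of the GPU."""
    return raw_read_gb_s / per_core_gb_s

if __name__ == "__main__":
    hdd, nvme = 0.1, 7.0   # raw sequential reads in GB/s
    print(f"HDD : {effective_bandwidth_gb_s(hdd) * 1000:.0f} MB/s effective")
    print(f"NVMe: {effective_bandwidth_gb_s(nvme):.0f} GB/s effective, "
          f"~{cpu_cores_needed(nvme):.0f} cores if decompressed on the CPU")
    print(f"PCIe 4.0 x16 headroom: ~{PCIE4_X16_GB_S / nvme:.1f}x one Gen 4 drive")
```

Which lines up with the claims above: ~200 MB/s effective off a hard drive, roughly 20 cores' worth of software decompression at 7 GB/s, and enough PCIe headroom to sink several Gen 4 drives at once.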
 

T-Cake

Member
This sounds to me like it's still several years away. Games have to be coded for the DirectStorage solution to take advantage of it. How long has DLSS/RTX been available, and how many games are using it now? I think you can count them on one hand. So maybe it will be 2022/23 before we even see anything using DirectStorage.
 