
Microsoft Xbox Series X's AMD Architecture Deep Dive at Hot Chips 2020

Panajev2001a

GAF's Pleasant Genius
we're talking a pittance overall in terms of raw power between Series X's audio solution and Tempest, barely 10 GFLOPs if not even less.
Just like PS3 should be faster with its 2 TFLOPS than Xbox One S’s 1.3 TFLOPS. You are comparing fixed function “equivalent FLOPS” numbers with a single unit (not the entirety of the audio processing logic PS5 has).

Not like people oversimplified things before, eh?... Like the common belief that sound processing on Xbox One and PS4 is all software based, running on the CPU...

If MS provided a rough comparison of their audio performance as greater than One X's CPU, given this was at a technical presentation and not an E3-style presser, then it should be assumed that is the level of performance even if it is not "fully programmable" in the way Tempest might be.

Well it could be assumed... shows you that sometimes assuming leads to the wrong conclusion. Hot Chips is sometimes used, depending on the company, to boast, make yourself look great, and hire people too (look at Intel at Hot Chips this year, it was boasting season). Some companies embellish details or patch together numbers to seem more impressive.

Notice how much of the murkiness around these numbers and the maths/methods used to produce them on the Xbox side is around items where the narrative is that PS5 has strengths: SSD and I/O and Sound Processing.
The pattern is clear: either the advantage is dismissed as useless, or a word soup follows to show how, if you look at it in a certain way, XSX is just as good at those things too.
 

Panajev2001a

GAF's Pleasant Genius
People kept saying that about GDDR5 vs DDR3 in the PS4 vs Xbox One, and the memory specification sheets showed the latency was the same on the memory chip itself. We also have Geekbench tests of PS4 vs Jaguar-based laptops, and memory latency was about the same in both (120 ns). It would be interesting if we could find any tests of that Subor Zen-based console that had GDDR5 memory. DF had some hands-on time with it but didn't do any memory latency testing, so boo on them.

We have no empirical evidence via memory latency tests that GDDR6 has higher inherent latency than DDR4 when paired with the same CPU so it cannot be categorically stated that GDDR6 has higher latency because it is GDDR6. I expect it may be higher than 3600 C16 because the consoles will probably stick to JEDEC standards but I would not be surprised if the console memory latency was the same or lower than JEDEC 3200 C20.
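For reference, the timing arithmetic here is simple (first-word CAS latency only; real access latency adds tRCD/tRP plus memory-controller and fabric overheads, so treat this as back-of-envelope):

```python
# First-word CAS latency, back-of-envelope; real memory latency adds
# tRCD/tRP and controller/fabric overheads on top of this.
def cas_ns(transfer_rate_mt_s: float, cas_cycles: int) -> float:
    clock_mhz = transfer_rate_mt_s / 2      # DDR signalling: two transfers per clock
    return cas_cycles / clock_mhz * 1e3     # cycles at MHz -> nanoseconds

print(cas_ns(3600, 16))  # DDR4-3600 C16 -> ~8.9 ns
print(cas_ns(3200, 20))  # DDR4-3200 C20 -> 12.5 ns
```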

That never stopped people from using Intel's Rambus implementation to judge the Direct RDRAM solution, and XDR later on, as if it bore on how they worked in PS2, PS3, or Alpha EV7/8, albeit EV8 never came out (and being wrong; about the memory, not the EV8... of course :D).

Solutions get adapted, and especially so in closed systems, not open PCs: they are optimised holistically and tuned to perform at their best.
 
People kept saying that about GDDR5 vs DDR3 in the PS4 vs Xbox One, and the memory specification sheets showed the latency was the same on the memory chip itself. We also have Geekbench tests of PS4 vs Jaguar-based laptops, and memory latency was about the same in both (120 ns). It would be interesting if we could find any tests of that Subor Zen-based console that had GDDR5 memory. DF had some hands-on time with it but didn't do any memory latency testing, so boo on them.

We have no empirical evidence via memory latency tests that GDDR6 has higher inherent latency than DDR4 when paired with the same CPU so it cannot be categorically stated that GDDR6 has higher latency because it is GDDR6. I expect it may be higher than 3600 C16 because the consoles will probably stick to JEDEC standards but I would not be surprised if the console memory latency was the same or lower than JEDEC 3200 C20.
I'd like to add these to the discussion:
[attached images]


Can we please end this myth? :) No one in their right mind believes OG XB1 (DDR3) is a better console compared to XB1X (GDDR5), even in latency-sensitive tasks. Even die hard XBOX fanatics have abandoned this narrative.
 

Fafalada

Fafracer forever
If you break down the transfer speed of the Quick Resume feature for Forza Horizon 3, 18 GB over the course of 6.3 seconds, that actually breaks the per second raw bandwidth for the drive down to 2.857 GB/s.
There's no scenario where memory-states will be 100% of available memory. Games are lucky to approach 90% occupancy in best cases (and many will be less aggressive) and compression will cut that down significantly - realistic numbers of what's on-disk will be closer to half of that. Even less for many 1X enhanced games that don't really have anything useful to fill the extra memory with.
Anyway - point is you're not looking at raw transfers either way - the compression hardware is very much in use, putting that 2.8GB/s on the lower-mid-end of MS's numbers - not exactly conservative.
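To put numbers on that (occupancy and compression ratio below are my guesses, not published figures):

```python
# Forza Horizon 3 Quick Resume, back-of-envelope.
memory_gb = 18.0            # headline figure: full memory image restored
resume_s  = 6.3

naive_gbps  = memory_gb / resume_s      # the "effective" rate quoted above
occupancy   = 0.9                       # games rarely fill all available RAM (guess)
compression = 0.5                       # ~2:1 on a suspend image (guess)
on_disk_gb  = memory_gb * occupancy * compression

print(round(naive_gbps, 2))             # 2.86 GB/s effective
print(round(on_disk_gb / resume_s, 2))  # ~1.29 GB/s actually read raw from the SSD
```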
 

pawel86ck

Banned
you're not looking at raw transfers either way - the compression hardware is very much in use, putting that 2.8GB/s on the lower-mid-end of MS's numbers - not exactly conservative.
What compression hardware? XSX has only BCpack HW decompression, there's no HW compression.
 
Slowly this became a PS5 thread. Dat @#$% panic concerns, nervousness in the air are so adorable.

What's really weird that every MS thread with legit, backed up by facts news these goddamn Lunatics go nuts to defend their gummy plastic of choice..relax 40yo's, yall have NOT seen nothing yet!. Wait for DF analysis, holiday line-up and actually see games running on XsX..that shit will make you humble, facts!!

Tired of this bs, but at the end of the day, we're at SonyGAF, afterall .
Funny thing is you and other posters with similar off-topic posts and baiting tactics are what drive these threads in this direction (along with fanboys on both sides). If you would just ignore them they would leave, and this thread wouldn't turn into another console war battlefield, but hey, thanks for your contribution.
 

Fafalada

Fafracer forever
What compression hardware?
Sorry - the 'Velocity Architecture'.
Anyway yes if you want to be pedantic - 'decompression hw' - I don't think there's a dedicated compressor in there but those 8 Zen cores are plenty to realtime compress a binary stream on suspend when nothing else is happening.
 

pawel86ck

Banned
Sorry - the 'Velocity Architecture'.
Anyway yes if you want to be pedantic - 'decompression hw' - I don't think there's a dedicated compressor in there but those 8 Zen cores are plenty to realtime compress a binary stream on suspend when nothing else is happening.
BCpack is a texture compression format. XSX has no HW decompression for other formats, so are you suggesting MS can compress a whole 9 GB RAM dump using BCpack?
 

Panajev2001a

GAF's Pleasant Genius
Really? Lol. Is this like a Me, Myself, and Irene thing? Do you turn into Hank when you come into Xbox threads, defend and propagate, then just not remember?

I do not take seriously what is not to be taken seriously :). If you stopped thinking about this as your little echo chamber, one you own and where no disagreement can be had, then we could discuss. It's not like either of you is making a point.
 

Redlight

Member
You are cute 😊. Wrong, mostly projecting, but cute nonetheless (next time read the whole thing and comment in context please).
I am cute, at least we agree on that. It's odd that you accuse me of projecting when, let's face it, the 'Panajev 2001a' actually does sound like a projector model. :)

Though not a 4K projector of course.
 
Just like PS3 should be faster with its 2 TFLOPS than Xbox One S’s 1.3 TFLOPS. You are comparing fixed function “equivalent FLOPS” numbers with a single unit (not the entirety of the audio processing logic PS5 has).

Not like people oversimplified things before, eh?... Like the common belief that sound processing on Xbox One and PS4 is all software based, running on the CPU...



Well it could be assumed... shows you that sometimes assuming leads to the wrong conclusion. Hot Chips is sometimes used, depending on the company, to boast, make yourself look great, and hire people too (look at Intel at Hot Chips this year, it was boasting season). Some companies embellish details or patch together numbers to seem more impressive.

Notice how much of the murkiness around these numbers and the maths/methods used to produce them on the Xbox side is around items where the narrative is that PS5 has strengths: SSD and I/O and Sound Processing.
The pattern is clear: either the advantage is dismissed as useless, or a word soup follows to show how, if you look at it in a certain way, XSX is just as good at those things too.

You obviously either have bad reading skills or selectively chose to ignore things I've mentioned throughout the thread; in several posts I've already said PS5 has a more general ("generic" if you want to call it that) audio DSP. However, it's nothing much worth really speculating about since Cerny himself barely talked about it; that audio core is present for traditional audio tasks like in virtually any other modern electronics device.

Keep getting upset with the fact I'm simply using Sony and Microsoft's own rough comparisons as reference points? Then take your issues up with them, not me. You guys keep thinking I'm speaking to the audio processors as LITERALLY being PS4 or One X CPU audio cores in capability; no, I have just used them as rough comparisons, the same way Sony and MS did. Believe it or not, many audio DSP manufacturers list GFLOP capability of MAC operations on their chips, there were even posters ITT who thought FLOPs were only used for GPUs! Go figure...

I find it incredibly funny you're insinuating Hot Chips can be used as pseudo-PR, yet you are ignorant of the reality that Cerny did a good bit of his own PR in Road to PS5. Touches of PR are a regular thing even at tech-focused presentations, since all companies want to present their products in the best light. But it's nowhere near the level you'd get in marketing pressers or the like. We already know you have certain feelings IRT Sony and MS as you've stated your bias multiple times in the past, and I'm going to be honest and say you're letting that pepper your assessment of my comments a little too strongly. You're also letting it pepper your assessment of the Hot Chips presentation in a rather ludicrous fashion; it fits in with some of the double-standards stuff I mentioned in the past when it comes to how some posters treat dissertations and divulging of info from Sony and Microsoft, often based on past events that are not even necessarily relevant anymore.

That's all I really have to say to you in regards to this.

I do not take seriously what is not to be taken seriously :). If you stopped thinking about this as your little echo chamber, one you own and where no disagreement can be had, then we could discuss. It's not like either of you is making a point.

You do realize your posts have now devolved into little more than using any info others are discussing from the Series X Hot Chips presentation... to, for whatever reason, defend the PS5, right? It's actually kind of funny because I know at least I was never attacking the PS5. Not immediately mentioning PS5's more generic/general audio DSP is not "attacking" PS5. Stating that bandwidth contention on the RAM won't produce effective bandwidth drops like some posters such as Lady Gaia and NXGamer have implied in the past (and pointing out they weren't even considering technologies MS has present to either eliminate or cut down the issues they raised) is not "attacking" PS5. Saying MS's raw SSD bandwidth figures were conservative, when simple calculations of demonstrations show them hitting bandwidths quite a bit higher than the paper specs, is not "attacking" PS5. Saying Series X's audio solution is remarkably close to/even with Sony's is not "attacking" PS5, especially when we have official statements and numbers/info to prove this.

It seems like for some people, any conclusion WRT the Series X that sits outside a stupidly accepted narrative of strengths or weaknesses is viewed as an "attack" against the PS5, and honestly it's getting absurd. What happened to the wishes months earlier for the systems to be close to each other in overall capability? Series X being a lot more competitive in areas like SSD I/O and audio than some people were comfortable entertaining does not make the PS5 any less impressive a design, and it certainly isn't an "attack" on PS5, either. It's ridiculous for anyone who might prefer PS5 to feel threatened enough to take these things as affronts, full stop. However, given the way some people with that preference have been engaging in this thread the past few pages, it would seem they are almost offended at the idea of some of the stuff I've mentioned either being true or having a strong probability of being true, often with at least some data to back it up.

There's nothing wrong with discussing the systems relative to one another ITT but there's always the fact that, at some point, it turns into one side trying in earnest to discredit or downplay any speculation, analysis, or outright confirmed info that plays outside the common accepted (tolerated) narrative in relation to the next-gen systems, and there's always a stronger propensity to sharpen this proverbial dagger against MS compared to Sony. That's the (unfortunate) reality, and I think the fact gaming discussion has become so polarized among so many on both sides is part of the problem. However, again, when you have a lot more Sony/PS5 people in general, given the virulence of that behavior runs similar in both camps, you simply end up with a lot more of #TeamBlue™ doing it than #TeamGreen™. It makes healthy discussion of even MS systems in isolation almost impossible to sustain for a given measure of time similar to what we see with Sony systems on the boards, never mind once people start throwing both into the mix to compare and contrast their features. By that point, it almost always feels like threads become the gaming equivalent of a CNN vs. Fox News panel debate.

There's no scenario where memory-states will be 100% of available memory. Games are lucky to approach 90% occupancy in best cases (and many will be less aggressive) and compression will cut that down significantly - realistic numbers of what's on-disk will be closer to half of that. Even less for many 1X enhanced games that don't really have anything useful to fill the extra memory with.
Anyway - point is you're not looking at raw transfers either way - the compression hardware is very much in use, putting that 2.8GB/s on the lower-mid-end of MS's numbers - not exactly conservative.

Why are you bringing up disk space for the saved state of Quick Resume? I never brought that up; it wasn't what was being touched on. In any case, the 9 GB of RAM the game occupies (talking One X games here) still has to be read for compressing onto the SSD, and then decompressed when populated back into the memory pool. The FH3 Quick Resume demo did this within 6.3 seconds, including any other backend tasks that had to be handled.

The numbers still stand.
 

Redlight

Member
Oh ad hominem... cute.
Wow. I fear that you may be completely lacking in self awareness. My response was to this...
You are cute 😊. Wrong, mostly projecting, but cute nonetheless (next time read the whole thing and comment in context please).
If that's not ad hominem, then what is? Let me know if you need a definition of 'hypocrisy'.
 

GODbody

Member
Notice how much of the murkiness around these numbers and the maths/methods used to produce them on the Xbox side is around items where the narrative is that PS5 has strengths: SSD and I/O and Sound Processing.
The pattern is clear: either the advantage is dismissed as useless, or a word soup follows to show how, if you look at it in a certain way, XSX is just as good at those things too.
The narrative that PS5 has strengths in those areas becomes more and more eroded as we gain more information about the Series X. PS5 has high raw SSD speed and I/O bandwidth. Series X answers this with 40% of the raw bandwidth plus a claimed 60% bandwidth and memory saving using SFS.
PS5 has high-end custom audio tech that is capable of powerful audio. Series X answers this with high-end custom audio tech that appears to be just as capable, or a negligible amount less. You only see word soup in here because there are characters that enjoy trying to downplay any of the Series X's tech because it doesn't fit the narrative that was created by Sony for the PS5.

It becomes more clear as we gain more details about the Series X that the reason Sony leaned so heavily into the 'strengths' of their I/O Bandwidth and Audio customizations in Road to PS5 was because their system is not as impressive as the Series X in almost any other area. Then you learn more information about the Series X and those 'strengths' seem to vanish altogether.
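Taking the quoted figures at face value (vendor claims, not benchmarks), the arithmetic works out like this:

```python
xsx_raw, ps5_raw = 2.4, 5.5             # GB/s, raw SSD bandwidth
print(round(xsx_raw / ps5_raw, 2))      # 0.44 -> roughly the "40%" figure

sfs_savings = 0.60                      # claimed SFS bandwidth/memory saving
multiplier  = 1 / (1 - sfs_savings)     # 2.5x, and only for texture streaming
print(xsx_raw * multiplier)             # 6.0 GB/s "SFS-effective", textures only
```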
 

Fafalada

Fafracer forever
BCpack is a texture compression format. XSX has no HW decompression for other formats

https://www.techradar.com/uk/news/velocity-architecture-is-the-soul-of-xbox-series-x-heres-what-it-does said:
Hardware Accelerated Decompression
The Xbox Series X utilizes an industry standard LZ decompressor, while also making use ...
 

Fafalada

Fafracer forever
Why are you bringing up disk space for the saved state of Quick Resume?
You were using it to compute 'GB/sec'. In terms of SSD speeds - the only GBs that matter are those actually written out/read from physical storage.
To put it another way - we're just discussing the 2-6 GB/s MS estimate that includes compression acceleration, making that - well, the expected result.
 

Panajev2001a

GAF's Pleasant Genius
You obviously either have bad reading skills or selectively chose to ignore things I've mentioned throughout the thread; in several posts I've already said PS5 has a more general ("generic" if you want to call it that) audio DSP. However, it's nothing much worth really speculating about since Cerny himself barely talked about it; that audio core is present for traditional audio tasks like in virtually any other modern electronics device.

Keep getting upset with the fact I'm simply using Sony and Microsoft's own rough comparisons as reference points? Then take your issues up with them, not me. You guys keep thinking I'm speaking to the audio processors as LITERALLY being PS4 or One X CPU audio cores in capability; no, I have just used them as rough comparisons, the same way Sony and MS did. Believe it or not, many audio DSP manufacturers list GFLOP capability of MAC operations on their chips, there were even posters ITT who thought FLOPs were only used for GPUs! Go figure...

I find it incredibly funny you're insinuating Hot Chips can be used as pseudo-PR, yet you are ignorant of the reality that Cerny did a good bit of his own PR in Road to PS5. Touches of PR are a regular thing even at tech-focused presentations, since all companies want to present their products in the best light. But it's nowhere near the level you'd get in marketing pressers or the like. We already know you have certain feelings IRT Sony and MS as you've stated your bias multiple times in the past, and I'm going to be honest and say you're letting that pepper your assessment of my comments a little too strongly. You're also letting it pepper your assessment of the Hot Chips presentation in a rather ludicrous fashion; it fits in with some of the double-standards stuff I mentioned in the past when it comes to how some posters treat dissertations and divulging of info from Sony and Microsoft, often based on past events that are not even necessarily relevant anymore.

That's all I really have to say to you in regards to this.



You do realize your posts have now devolved into little more than using any info others are discussing from the Series X Hot Chips presentation... to, for whatever reason, defend the PS5, right? It's actually kind of funny because I know at least I was never attacking the PS5. Not immediately mentioning PS5's more generic/general audio DSP is not "attacking" PS5. Stating that bandwidth contention on the RAM won't produce effective bandwidth drops like some posters such as Lady Gaia and NXGamer have implied in the past (and pointing out they weren't even considering technologies MS has present to either eliminate or cut down the issues they raised) is not "attacking" PS5. Saying MS's raw SSD bandwidth figures were conservative, when simple calculations of demonstrations show them hitting bandwidths quite a bit higher than the paper specs, is not "attacking" PS5. Saying Series X's audio solution is remarkably close to/even with Sony's is not "attacking" PS5, especially when we have official statements and numbers/info to prove this.

It seems like for some people, any conclusion WRT the Series X that sits outside a stupidly accepted narrative of strengths or weaknesses is viewed as an "attack" against the PS5, and honestly it's getting absurd. What happened to the wishes months earlier for the systems to be close to each other in overall capability? Series X being a lot more competitive in areas like SSD I/O and audio than some people were comfortable entertaining does not make the PS5 any less impressive a design, and it certainly isn't an "attack" on PS5, either. It's ridiculous for anyone who might prefer PS5 to feel threatened enough to take these things as affronts, full stop. However, given the way some people with that preference have been engaging in this thread the past few pages, it would seem they are almost offended at the idea of some of the stuff I've mentioned either being true or having a strong probability of being true, often with at least some data to back it up.

There's nothing wrong with discussing the systems relative to one another ITT but there's always the fact that, at some point, it turns into one side trying in earnest to discredit or downplay any speculation, analysis, or outright confirmed info that plays outside the common accepted (tolerated) narrative in relation to the next-gen systems, and there's always a stronger propensity to sharpen this proverbial dagger against MS compared to Sony. That's the (unfortunate) reality, and I think the fact gaming discussion has become so polarized among so many on both sides is part of the problem. However, again, when you have a lot more Sony/PS5 people in general, given the virulence of that behavior runs similar in both camps, you simply end up with a lot more of #TeamBlue™ doing it than #TeamGreen™. It makes healthy discussion of even MS systems in isolation almost impossible to sustain for a given measure of time similar to what we see with Sony systems on the boards, never mind once people start throwing both into the mix to compare and contrast their features. By that point, it almost always feels like threads become the gaming equivalent of a CNN vs. Fox News panel debate.

You keep inserting these bits about me stating my bias and it supposedly peppering my opinion on this, not sure why, but well played as you have a good way with words, kudos. I made it very simple: I did not say their Hot Chips presentation was a marketing stunt, but you know what I was saying and why companies go there to begin with.

One set of numbers talks about a single unit, the other sums up all possible audio-related DSPs no matter what: like PS3 being 2 TFLOPS (seen it on a slide).
Was it their intention to boast, as a reply to Tempest, or are people just using it as such? It does not matter, really.

No need to get aggressive or defensive. You disagree, fine: I find it funny that we dance between “<5.5 GB/s SSD or Tempest> is overkill and XSX does not need it because Y” and “XSX actually matches that thanks to trump card Z”.
The systems are closer to each other because of the various pros and cons: the XSX has higher GPU shader performance, period, but trying to make that the only difference, with a word soup claiming to be above all the partisan crap in good faith, is either misguided or disingenuous.

Believe it or not, many audio DSP manufacturers list GFLOP capability of MAC operations on their chips,
I believe it and that was my point. I just find it technically correct yet out of place in this talk comparing the two audio solutions.

MS did not state that their solution was comparable to Tempest, it is forums fans that did that either rushing to conclusions or being disingenuous to make the XSX look better as if it needed their help.

MS technically just described the overall FLOPS count of their DSPs, and their number is above 100 GFLOPS; it still looks like an impressive audio solution (and a bit of an evolution over Xbox One's SHAPE, http://www.redgamingtech.com/playstation-4-audio-dsp-based-on-amds-trueaudio-technology/ and https://www.neogaf.com/threads/xbox...y-examples-of-good-utilization-so-far.988271/ where MS drew a better distinction between programmable and fixed function, and PS4's TrueAudio solution, http://www.redgamingtech.com/playstation-4-audio-dsp-based-on-amds-trueaudio-technology/ ).
 

Aladin

Member
You keep inserting these bits about me stating my bias and it supposedly peppering my opinion on this, not sure why, but well played as you have a good way with words, kudos. I made it very simple: I did not say their Hot Chips presentation was a marketing stunt, but you know what I was saying and why companies go there to begin with.

One set of numbers talks about a single unit, the other sums up all possible audio-related DSPs no matter what: like PS3 being 2 TFLOPS (seen it on a slide).
Was it their intention to boast, as a reply to Tempest, or are people just using it as such? It does not matter, really.

No need to get aggressive or defensive. You disagree, fine: I find it funny that we dance between “<5.5 GB/s SSD or Tempest> is overkill and XSX does not need it because Y” and “XSX actually matches that thanks to trump card Z”.
Should you not reserve your judgement on which audio unit is better till you have sufficient information from yours truly Cerny? Why are you being passive aggressive? You cannot counter somebody's opinion, arrived at from known facts, by claiming to have missed some secret sauce.
 

Panajev2001a

GAF's Pleasant Genius
The narrative that PS5 has strengths in those areas becomes more and more eroded as we gain more information about the Series X. PS5 has high raw SSD speed and I/O bandwidth. Series X answers this with 40% of the raw bandwidth plus a claimed 60% bandwidth and memory saving using SFS.
PS5 has high-end custom audio tech that is capable of powerful audio. Series X answers this with high-end custom audio tech that appears to be just as capable, or a negligible amount less. You only see word soup in here because there are characters that enjoy trying to downplay any of the Series X's tech because it doesn't fit the narrative that was created by Sony for the PS5.

It becomes more clear as we gain more details about the Series X that the reason Sony leaned so heavily into the 'strengths' of their I/O Bandwidth and Audio customizations in Road to PS5 was because their system is not as impressive as the Series X in almost any other area. Then you learn more information about the Series X and those 'strengths' seem to vanish altogether.

This is what I was saying: people dance between “strengths are not strengths because they are a waste of time (‘lol 2s to 1s loading times lolololol’)” and magic trump cards, taking anything that could be stretched to mean those advantages are not really there at all (we already talked about SFS not being a magic bullet, and nobody ever claimed it was such a massive jump over what developers could achieve with PRT-based virtual texturing schemes as to close the gap people seem to be worried about...).
 

Panajev2001a

GAF's Pleasant Genius
Should you not reserve your judgement on which audio unit is better till you have sufficient information from yours truly Cerny? Why are you being passive aggressive? You cannot counter somebody's opinion, arrived at from known facts, by claiming to have missed some secret sauce.

My truly Cerny... and I am passive aggressive? We started from numbers stretched to say one solution was like 30+ GFLOPS faster, which did not make sense based on public data; nothing that offended me... just weird that, from something that did not matter, some numbers are suddenly taken to claim it trumps it.
 

PaintTinJr

Member
Sounds about right as to what might be the case, however I'm not exactly sure the GPU is using CUs for that work. I keep going back to the explicit mention of ARM cores in the APU design from the Indian AMD engineer a good few months back. It was a pretty deliberate mention and rather interesting, since it begs the question of where they're being used.

My guess is it's being utilized in some way for extended executeIndirect functionality in the GPU; it was actually one of the things I hoped MS would specify info on but no dice. Back to the BCn stuff, I do recall the tweet about them working on optimizing the algorithm, so you're right it's at least partially software-based, just like SFS. However, just like SFS, there's also some dedicated hardware involved; SFS in particular with the mip-blending hardware in the GPU which has been mentioned several times, but not readily present (or even mentioned) at the Hot Chips presentation or in any of the slides.

It's possible their strategy is both things you mention: zlib decompression through the CPU, random access zlib decompression through SFS on the GPU. Just keep in mind they still have a dedicated decompressor block, which is what I would say is actually doing the bulk of the decompression. The CPU and GPU probably just take turns controlling it (the GPU has a DMA block in the diagram; maybe that is for the decompressor?).

Going by how SFS works, it will (presumably) need to add frames of latency - at the start of rendering new scene assets - where the assets are textured with lower resolution texture mips on the first frame(s) - 1024x1024 and below would be my guesstimate - before resolving which 64KB zlib BCn blocks in the 2K, 4K and 8K mips are to be randomly accessed and streamed in, determined by LOD. So, although that latency will provide a bit of processing time, I don't think SFS - which I assume is just PRT enhanced by BCpack - can work without the texture data being right inside the GPU or served from the 10GB, because of how tightly coupled textures need to be for GPU compute, shading and fixed functions like the mip blending you mentioned or the BCn accelerated texture sampling.

Giving the BCpack/SFS decompressor (XVA) a little more thought, I'm inclined to believe the hardware customization for SFS and BCpack is the same (I already mentioned why, above): fixed path enhancements for 64KB random access of zlib compressed texture data, with the customization really being about the minimum 64KB map page access size - which I'm speculating is for random access of zlib BCpack.

64KB seems like it could be an L0 or L1 cache size for CUs. If BCpack does use CUs, which I'm heavily leaning towards now - because it would explain the flexibility for ratios to be improved via programming, and would also explain why that info would be NDA'ed, as it is subtractive to overall TFLOP/s specs in the same way the Tempest Engine, when used for audio, doesn't add to GPU performance specs.

If that is how the BCpack/SFS hardware is provided, it also sort of explains why XVA is such a vague technology: if it is AVX2 on the CPU plus customization of the CU hardware so that zlib block compression random access can be accelerated - but in a generic way to allow software changes - then these improvements are scattered elements, and that's what makes the XVA technology hard to describe.

When considering why Xbox hasn't seemed keen to license other zlib-like algorithms (Oodle/Kraken), I don't think it is because BCpack/SFS couldn't accommodate such changes if they were prepared to pay for such licenses, which they probably aren't. I think it is just that SFS's latency makes the added decompression time delta of no benefit to the XsX GPU, and I suspect the improvements from random access zlib decompression would be negligible when working with fine grain 64KB map workloads.

 

PaintTinJr

Member
64KB is probably just an ssd page.
Sorry, I've not added enough context to my comment there. The relevance of the 64KB and a GPU cache size - in relation to fixed path customization of a CU to (presumably) HW accelerate random access into zlib compressed BCn textures - is that 64KB was tweeted by the Xbox engineer (a while back) as the smallest SFS mip page size, and that just so happens to be the minimum block size you'd need to decompress when using zlib compression in a random access method.
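The idea behind random access into compressed texture data is easy to show: compress each 64KB page as an independent stream, keep an offset table, and any page can be pulled out alone. A minimal sketch (plain zlib here; BCpack's actual container format isn't public, so this is only the concept):

```python
import zlib

CHUNK = 64 * 1024   # 64KB pages, per the tweet referenced above

def pack(data: bytes):
    """Compress each 64KB page as its own zlib stream so any page
    can be decompressed without touching its neighbours."""
    blob, index, pos = b"", [], 0
    for i in range(0, len(data), CHUNK):
        c = zlib.compress(data[i:i + CHUNK])
        index.append((pos, len(c)))     # (offset, compressed size) per page
        blob += c
        pos += len(c)
    return blob, index

def read_page(blob: bytes, index, page: int) -> bytes:
    off, size = index[page]
    return zlib.decompress(blob[off:off + size])

texture = bytes(range(256)) * 4096      # 1 MiB of sample data -> 16 pages
blob, index = pack(texture)
assert read_page(blob, index, 7) == texture[7 * CHUNK:8 * CHUNK]
```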
 

Panajev2001a

GAF's Pleasant Genius
Going by how SFS works, it will (presumably) need to add frames of latency - at the start of rendering new scene assets - where the assets are textured with lower resolution texture mips on the first frame(s) - 1024x1024 and below would be my guesstimate - before resolving which 64KB zlib BCn blocks in the 2K, 4K and 8K mips are to be randomly accessed and streamed in, determined by LOD. So, although that latency will provide a bit of processing time, I don't think SFS - which I assume is just PRT enhanced by BCpack - can work without the texture data being right inside the GPU or served from the 10GB, because of how tightly coupled textures need to be for GPU compute, shading and fixed functions like the mip blending you mentioned or the BCn accelerated texture sampling.

Giving the BCpack/SFS decompressor (XVA) a little more thought, I'm inclined to believe the hardware customization for SFS and BCpack is the same (I already mentioned why, above): fixed path enhancements for 64KB random access of zlib compressed texture data, with the customization really being about the minimum 64KB map page access size - which I'm speculating is for random access of zlib BCpack.

64KB seems like it could be an L0 or L1 cache size for CUs. If BCpack does use CUs, which I'm heavily leaning towards now - because it would explain the flexibility for ratios to be improved via programming, and would also explain why that info would be NDA'ed, as it is subtractive to overall TFLOP/s specs in the same way the Tempest Engine, when used for audio, doesn't add to GPU performance specs.

If that is how the BCpack/SFS hardware is provided, it also sort of explains why XVA is such a vague technology: if it is AVX2 on the CPU plus customization of the CU hardware so that zlib block compression random access can be accelerated - but in a generic way to allow software changes - then these improvements are scattered elements, and that's what makes the XVA technology hard to describe.

When considering why Xbox hasn't seemed keen to license other zlib-like algorithms (Oodle/Kraken), I don't think it is because BCpack/SFS couldn't accommodate such changes if they were prepared to pay for such licenses, which they probably aren't. I think it is just that SFS's latency makes the added decompression time delta of no benefit to the XsX GPU, and I suspect the improvements from random access zlib decompression would be negligible when working with fine grain 64KB map workloads.

Nice post thanks :)!

The way I see BCPack, it is the equivalent of Oodle's BC7Prep + Oodle Texture RDO pre-processing + zlib compression, all in HW.
PS5, in HW, supports “only” the Oodle Texture RDO pre-processing + Kraken compression part and if you want to add BC7Prep to it you will need to do an extra decoding step using GPU async shaders or the CPU (Oodle has PS5 optimised code for it).

In addition to what Fafalada said, I wanted to add a couple of points. SFS (SFS being an extension of SF, which is an extension of PRT) seems to allow the GPU to, as you were saying, compute which block should be loaded, update the local residency map, and automatically stream new data (they have instructions that are fully asynchronous and return right away, causing the GPU to compute which texture will be needed, trigger a page fault, and load it in/prefetch hints). They also have TMU enhancements to blend the streamed data in with the low resolution copy so as not to cause gaps, seams, or other artefacts. Some of those tasks need to be performed by compute shaders on modern GPUs with "only" PRT support.
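To make that loop concrete, here is a toy sketch (Python as pseudocode; in the real thing this is D3D12 sampler feedback plus the TMU blending hardware, so every name below is illustrative, and I use "higher mip number = finer detail" purely for readability):

```python
from collections import defaultdict

resident = defaultdict(int)   # tile id -> finest mip level currently resident
requests = []                 # "sampler feedback" recorded during rendering

def sample(tile: int, wanted_mip: int) -> int:
    if resident[tile] < wanted_mip:            # miss: use the coarser copy...
        requests.append((tile, wanted_mip))    # ...and record what we wanted
    return min(resident[tile], wanted_mip)

def stream_pending():
    for tile, mip in requests:   # asynchronous in hardware; blocking here
        resident[tile] = mip     # load the 64KB pages; the TMU blends them in
    requests.clear()

print(sample(3, 4))   # frame N: renders with the coarse mip (0)
stream_pending()      # pages arrive between frames
print(sample(3, 4))   # frame N+1: full-detail mip (4)
```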
 

T-Cake

Member
T-75 days til launch (November 6th?). I can't take much more! Need to see games running on the box and comparisons between PC-XSX-PS5 to see how all this tech talk translates into game performance and visuals.
 

pawel86ck

Banned
Fafalada
Ok, so it looks like the XSX decompressor can decompress general data as well, besides textures (BCpack). That's very good, but I'm still not sure if XSX is really compressing "quick resume" memory dumps. You are assuming XSX does it, but only MS knows that for sure. With HW decompression speed (4.8-6 GB/s) it would only take 1.5-2 seconds to decompress 9 GB, but maybe compression takes longer.
 

Fafalada

Fafracer forever
You are assuming XSX does it but only MS knows that for sure.
True - but anything else would be crazy - SSD wear pretty much mandates minimizing writes, so even if compression only saves 10% on average it would be something you'd want to use on an operation that will write 'a lot' to the SSD.
As for speed - as others have pointed out, physical writes are inevitably expected to be slower, and compression itself probably doesn't reach 6 GB/s even with all 8 Zen cores, so yes, that likely has an impact. Although we also don't know what compression ratios are being achieved.
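Back-of-envelope on the suspend path (every figure below is my own guess, not a spec):

```python
ram_gb         = 9.0    # One X game footprint discussed above
comp_ratio     = 0.6    # suspend image compresses to ~60% of raw (guess)
comp_gbps_core = 0.5    # per-core LZ-class compression throughput (guess)
cores          = 8
write_gbps     = 2.0    # sustained SSD write speed (guess; below the read spec)

compress_s = ram_gb / (comp_gbps_core * cores)   # ~2.25 s of CPU work
write_s    = ram_gb * comp_ratio / write_gbps    # ~2.7 s of SSD writes
print(max(compress_s, write_s))   # stages can overlap; the slower one dominates
```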
 
Going by how SFS works, it will (presumably) need to add frames of latency - at the start of rendering new scene assets - where the assets are textured with lower resolution texture mips on the first frame(s) - 1024x1024 and below would be my guesstimate - before resolving which 64KB zlib BCn blocks in the 2K, 4K and 8K mips are to be randomly accessed and streamed in, determined by LOD. So, although that latency will provide a bit of processing time, I don't think SFS - which I assume is just PRT enhanced by BCpack - can work without the texture data being right inside the GPU or served from the 10GB, because of how tightly coupled textures need to be for GPU compute, shading and fixed functions like the mip blending you mentioned or the BCn accelerated texture sampling.

Giving the BCpack/SFS decompressor (XVA) a little more thought, I'm inclined to believe the hardware customization for SFS and BCpack is the same (I already mentioned why, above): fixed path enhancements for 64KB random access of zlib compressed texture data, with the customization really being about the minimum 64KB map page access size - which I'm speculating is for random access of zlib BCpack.

64KB seems like it could be an L0 or L1 cache size for CUs. If BCpack does use CUs, which I'm heavily leaning towards now - because it would explain the flexibility for ratios to be improved via programming, and would also explain why that info would be NDA'ed, as it is subtractive to overall TFLOP/s specs in the same way the Tempest Engine, when used for audio, doesn't add to GPU performance specs.

If that is how the BCpack/SFS hardware is provided, it also sort of explains why XVA is such a vague technology: if it is AVX2 on the CPU plus customization of the CU hardware so that zlib block compression random access can be accelerated - but in a generic way to allow software changes - then these improvements are scattered elements, and that's what makes the XVA technology hard to describe.

When considering why Xbox hasn't seemed keen to license other zlib-like algorithms (Oodle/Kraken), I don't think it is because BCpack/SFS couldn't accommodate such changes if they were prepared to pay for such licenses, which they probably aren't. I think it is just that SFS's latency makes the added decompression time delta of no benefit to the XsX GPU, and I suspect the improvements from random access zlib decompression would be negligible when working with fine grain 64KB map workloads.


No, it's not "just" PRT enhanced with BCPack; Xbox engineers on Twitter (in rather involved tweet chains) have outright stated this. PRT is just the foundation for SFS; even in the slides for the Hot Chips presentation they highlight all of the deficiencies with PRT and illustrate how SFS fixes them. Those level of fixes would not be possible if it was simply PRT with BCPack support, I don't think you give them enough credit on this. As just another example the GPU has the custom mip-blending hardware to facilitate SFS capabilities, again this isn't something you're just going to find anywhere to the point you can slap in PRT with BCPack support and call it a day.

IIRC Jason Ronald has been quoted on Twitter illustrating in discussions with other game devs that the minimum page size for SFS is actually 4KB, not 64KB. I'll go look back over those sometime but I specifically remember 4KB figures being referenced. So with that in mind if 64KB is the L0/L1 (probably L0) for the CUs, if the minimum page size is 4KB like I'm recalling it's been said it is by Jason, then the cache would be able to hold multiple pages concurrently.

I don't quite see where you assume the CUs would be involved with this? A flexible compression ratio (if it is a thing) doesn't mean the CUs are doing it. That could easily be something done through the decompression block itself. We still don't have a full detailing of the decompression block, FWIW. Additionally, if the CUs were being used for this task (presumably taking away cycles otherwise intended for game code to execute), then that would make MS's marketing dishonest and disingenuous. Something that plays a bit on a prevailing narrative that seems to tick up regularly among some folks, for rather outlandish reasons, at least WRT their hardware. I wouldn't hang that air of suspicion around their heads, and the CUs being regularly used for this type of purpose would kind of "lock" them away from direct developer access, so there'd be no need to include them in a GPU performance specification that is generally aimed at letting devs know what they can regularly aim for in terms of theoretical maximum saturation of the hardware.

Tempest is not subtractive to PS5's TFs because it is its own processor component, although with you saying this, perhaps it would be better directed towards people who feel Tempest will act as some GPU booster through flexibility. Some do seem to think it will be a "significant" sort of "secret sauce" booster by the way they speak of it, but you seem to be more in the camp that it won't be the case. I would be more in the middle on that, personally: Tempest can't provide any substantial/significant boost to GPU performance, but it could be used for some very selective, neat offloading of non-audio taskwork, sparingly. I brought up Shining Force III on the SEGA Saturn as an example of this being done in the past, which also suggests that a similar type of offloading could be done with Series X's audio (the programmable portion) if developers wanted to be similarly creative in squeezing performance from the hardware later in the generation.

Speaking of SFS's latency is odd because that runs contrary to everything devs have said when discussing SFS piecemeal within various interviews, what lead Xbox engineers have said, and what research papers, patents etc. on XvA and analogous systems that have been R&D'd (such as FlashMap) speak on. All of them focus very keenly on cutting down latency; that has been one of the chief design goals of XvA. So focusing on an implementation that suggests it's creating more latency frankly doesn't mesh with what any of the other sources I listed speak about. For example, one of the stated goals with SFS is to cut down the prefetch window size, so there are fewer frames ahead of time a texture needs to be resident in memory for feeding the GPU. How exactly it goes about doing that isn't clarified officially at this time, but I doubt it would involve any additional latency, at least nothing significant.

There's a lot more I could go into on this but I'd need to cite a few posts from other places and some of the MS research/project published papers. But essentially, whenever I'm able to post again WRT XvA and SFS I'll try indicating more definitively why the latency factor you're bringing up either doesn't exist or is not much of a process cost you're suggesting it is, if I haven't done so already. But really there's just a lot of other aspects to XvA touched on from disparate sources that I don't think are being considered here, and it's actually quite a lot to go into in this post, let alone organize the way I'd like.

Fafalada

Ok, so it looks like the XSX decompressor can decompress general data as well, besides textures (BCpack). That's very good, but I'm still not sure if XSX is really compressing "quick resume" memory dumps. You are assuming XSX does it, but only MS knows that for sure. With HW decompression speed (4.8-6 GB/s) it would only take 1.5-2 seconds to decompress 9 GB, but maybe compression takes longer.

Generally compression takes longer than the decompression itself. After all, it's harder to work out how to tear something down than to build it back up from the plan ;)
 

PaintTinJr

Member
Saying it's not GPU friendly isn't exactly paraphrasing. Where's the link to that?

Now that I'm leaning towards BCpack probably using randomly accessible zlib (in 64KB chunks), and now expecting random block staging to be hardware accelerated on the GPU - which is a far more deterministic task - that comment wouldn't be reflective of my current opinion about BCpack.

However, at the time, I was still of the opinion that the major difference between BCn and BCpack was just regular zlib guided by RDO - with BC7Prep data rotation, as Fafalada mentioned, probably being used as well. And that other configuration of data wouldn't be GPU friendly, as I said - in the way something like BCn textures are - because you would need an unpacking stage to access the inner BCn texture(s), and that process is non-deterministic in compute/time and size required; PC GPUs would do it, but IMHO it would be an extravagant use of a closed console's GPU resources.

Looking at Oodle Textures, we can see they aren't likely to be unpacked by the PS5 GPU; instead that workload will be passed to the CPU's versatile co-processors - the I/O complex.
 

PaintTinJr

Member
No, it's not "just" PRT enhanced with BCPack; Xbox engineers on Twitter (in rather involved tweet chains) have outright stated this. PRT is just the foundation for SFS; even in the slides for the Hot Chips presentation they highlight all of the deficiencies with PRT and illustrate how SFS fixes them. Those level of fixes would not be possible if it was simply PRT with BCPack support, I don't think you give them enough credit on this. As just another example the GPU has the custom mip-blending hardware to facilitate SFS capabilities, again this isn't something you're just going to find anywhere to the point you can slap in PRT with BCPack support and call it a day.

IIRC Jason Ronald has been quoted on Twitter illustrating in discussions with other game devs that the minimum page size for SFS is actually 4KB, not 64KB. I'll go look back over those sometime but I specifically remember 4KB figures being referenced. So with that in mind if 64KB is the L0/L1 (probably L0) for the CUs, if the minimum page size is 4KB like I'm recalling it's been said it is by Jason, then the cache would be able to hold multiple pages concurrently.

I don't quite see where you assume the CUs would be involved with this? A flexible compression ratio (if it is a thing) doesn't mean the CUs are doing it. That could easily be something done through the decompression block itself. We still don't have a full detailing of the decompression block, FWIW. Additionally, if the CUs were being used for this task (presumably taking away cycles otherwise intended for game code to execute), then that would make MS's marketing dishonest and disingenuous. Something that plays a bit on a prevailing narrative that seems to tick up regularly among some folks, for rather outlandish reasons, at least WRT their hardware. I wouldn't hang that air of suspicion around their heads, and the CUs being regularly used for this type of purpose would kind of "lock" them away from direct developer access, so there'd be no need to include them in a GPU performance specification that is generally aimed at letting devs know what they can regularly aim for in terms of theoretical maximum saturation of the hardware.

Tempest is not subtractive to PS5's TFs because it is its own processor component, although with you saying this, perhaps it would be better directed towards people who feel Tempest will act as some GPU booster through flexibility. Some do seem to think it will be a "significant" sort of "secret sauce" booster by the way they speak of it, but you seem to be more in the camp that it won't be the case. I would be more in the middle on that, personally: Tempest can't provide any substantial/significant boost to GPU performance, but it could be used for some very selective, neat offloading of non-audio taskwork, sparingly. I brought up Shining Force III on the SEGA Saturn as an example of this being done in the past, which also suggests that a similar type of offloading could be done with Series X's audio (the programmable portion) if developers wanted to be similarly creative in squeezing performance from the hardware later in the generation.

Speaking of SFS's latency is odd because that runs contrary to everything devs have said when discussing SFS piecemeal within various interviews, what lead Xbox engineers have said, and what research papers, patents etc. on XvA and analogous systems that have been R&D'd (such as FlashMap) speak on. All of them focus very keenly on cutting down latency; that has been one of the chief design goals of XvA. So focusing on an implementation that suggests it's creating more latency frankly doesn't mesh with what any of the other sources I listed speak about. For example, one of the stated goals with SFS is to cut down the prefetch window size, so there are fewer frames ahead of time a texture needs to be resident in memory for feeding the GPU. How exactly it goes about doing that isn't clarified officially at this time, but I doubt it would involve any additional latency, at least nothing significant.

There's a lot more I could go into on this but I'd need to cite a few posts from other places and some of the MS research/project published papers. But essentially, whenever I'm able to post again WRT XvA and SFS I'll try indicating more definitively why the latency factor you're bringing up either doesn't exist or is not much of a process cost you're suggesting it is, if I haven't done so already. But really there's just a lot of other aspects to XvA touched on from disparate sources that I don't think are being considered here, and it's actually quite a lot to go into in this post, let alone organize the way I'd like.



Generally compression takes longer than the decompression itself. After all, it's harder to work out how to tear something down than to build it back up from the plan ;)

It feels like you took that the wrong way. What I was trying to convey is that I think BCpack is the secret sauce enhancement that makes SFS a hardware accelerated improvement over PRT, and that a GPU without BCpack's customisations wouldn't have the same ASIC-level performance for SFS that the XsX will - even if they emulate it in GPU compute.

My point about latency is about data latency, not latency lowering frame rates. Surely for SFS to work (AFAIK), it needs to render a lower fidelity frame (with the lower quality data it had to transfer) when new non-obscured assets enter the frustum, to be able to provide sampler feedback and determine the higher order data it needs to transfer, so that each subsequent frame for those assets is accessing either portions of the 8K textures or the highest order portion of texture suited to the asset - based on its depth and coverage in the frame - no?

As for CUs in the XsX being used for BCpack, if they are - as I still believe they are, purely because (IMHO) zlib seems inherently difficult and risky to commit to an ASIC - I wouldn't consider that false advertising, because on balance it is a bandwidth/VRAM/GPU cache/compute win using SFS - in the best case scenario - compared to running without.

As for the Tempest Engine, I believe it is effectively like a specialised GPU core for double precision FLOPS work, and there is no comparable technology with that capability in the XsX. However, Sony's research into HRTF means the resource is audio only, unless HRTF research finds a cheap workaround that lets Sony run their HRTF solution from the PS5 CPU and re-purpose the resource. So in that respect it is a moot point, because it will either go unused in games or be used as intended.
 

Panajev2001a

GAF's Pleasant Genius
Looking at Oodle Textures, we can see they aren't likely to be unpacked by the PS5 GPU; instead that workload will be passed to the CPU's versatile co-processors - the I/O complex.

Oodle Textures do not need to be unpacked by the CPU/GPU/etc...; what they do is make the texture a lot easier to compress (a proprietary RDO mechanism) with zlib or Kraken. It is Oodle's BC7Prep that requires an additional decompression/decoding step (usually run on the GPU via async compute shaders).
 

PaintTinJr

Member
Oodle Textures do not need to be unpacked by the CPU/GPU/etc...; what they do is make the texture a lot easier to compress (a proprietary RDO mechanism) with zlib or Kraken. It is Oodle's BC7Prep that requires an additional decompression/decoding step (usually run on the GPU via async compute shaders).
Are you sure?

AFAIK, we have zlib-esque compression/decompression (zlib, Oodle, Kraken, etc.) for any type of data - that doesn't exclude textures or even block compressed textures.

We then have specialised texture methods like BCpack and Oodle Texture for optimally packing lossy compressed textures (BCn/DXTn) - where the Oodle and pack compression is lossless, but uses rate distortion on the inner lossy format (BCn/DXT) to improve the rate of the outer lossless compression. The proprietary RDO you mention is (presumably) just the algorithm to quickly "distort" the inner compressed texture at the compressing stage; and then, specifically for packing the inner BC7 textures - which both BCpack and Oodle Texture use - they've added a "prep" stage that is a rotation translation of the block, to further enhance the ability to "pack", and that step needs to be reversed on async compute prior to the BC7 texture being used by the GPU.

I've verbosely mentioned how I think it all hangs together, so if my info/comprehension is wrong, it should be easy for you to point me to info that corrects the specific misunderstandings.
 

Panajev2001a

GAF's Pleasant Genius
Are you sure?

AFAIK, we have zlib-esque compression/decompression (zlib, Oodle, Kraken, etc.) for any type of data - that doesn't exclude textures or even block compressed textures.

We then have specialised texture methods like BCpack and Oodle Texture for optimally packing lossy compressed textures (BCn/DXTn) - where the Oodle and pack compression is lossless, but uses rate distortion on the inner lossy format (BCn/DXT) to improve the rate of the outer lossless compression. The proprietary RDO you mention is (presumably) just the algorithm to quickly "distort" the inner compressed texture at the compressing stage; and then, specifically for packing the inner BC7 textures - which both BCpack and Oodle Texture use - they've added a "prep" stage that is a rotation translation of the block, to further enhance the ability to "pack", and that step needs to be reversed on async compute prior to the BC7 texture being used by the GPU.

I've verbosely described how I think it all hangs together, so if my info/comprehension is wrong, it should be easy for you to point me to info that corrects the specific misunderstandings.

I think we are saying the same thing - you gave a great detailed summary, thanks for that :). Please feel free to pick the following apart; this is quite an enjoyable conversation.

Oodle Texture includes Rate Distortion Optimisation as well as block layout optimisation to improve the Kraken lossless compression step (enhancing the compression rate), while BC7Prep (and, I was thinking, BCPack too) adds a further layout-reordering optimisation that needs to be undone - in software on PS5, whereas the BCPack decoder can undo its version of the same optimisation in the HW decompression block, requiring no further decoding work.

BC7Prep: http://cbloomrants.blogspot.com/2020/06/oodle-texture-bc7prep-data-flow.html -> requires an extra decoding step in SW

Oodle Texture: http://cbloomrants.blogspot.com/2020/06/oodle-texture-slashes-game-sizes.html (this includes the encoding and decoding pipeline steps too, quite handy) -> includes RDO, improves the Kraken compression rate, and requires no additional SW decoding step
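
The "prep" idea is easy to demo with a trivial stand-in: reorder data so similar fields sit together (helping the lossless stage), with the decoder undoing the reorder before use - which is exactly the extra decode step being discussed. A conceptual sketch only; real BC7Prep rearranges BC7 block fields, not fake 4-byte records like these:

```python
import random
import zlib

random.seed(0)

def prep(data, stride):
    # De-interleave: gather field 0 of every record, then field 1, etc.,
    # so the lossless compressor sees long runs of similar bytes.
    return bytes(data[i] for f in range(stride) for i in range(f, len(data), stride))

def unprep(data, stride):
    # Inverse transform: re-interleave the field streams into record order.
    # This is the extra decode step the consumer must run before use.
    n = len(data) // stride
    return bytes(data[f * n + r] for r in range(n) for f in range(stride))

# Fake 4-byte "blocks": two noisy fields and two constant fields per record.
records = []
for _ in range(256):
    records += [random.randrange(256), 200, random.randrange(256), 40]
blocks = bytes(records)

packed = prep(blocks, 4)
print(len(zlib.compress(blocks)), "compressed bytes without prep")
print(len(zlib.compress(packed)), "compressed bytes with prep")
assert unprep(packed, 4) == blocks  # the decode step restores the original
```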


[images: Oodle Texture encoding and decoding pipeline diagrams]



My only point is that, based on the equivalent compressed I/O bandwidth advertised by both MS and Sony (2.4 GB/s raw to ~4.8 GB/s for XSX vs 5.5 GB/s raw to 8-9 GB/s on PS5 - the latter may be a bit conservative based on historical data, but let's assume it is not), BCPack averages higher compression rates for textures, and there is no mention of a required GPU- or CPU-based decoding step. I do think Sony was already factoring in RDO pre-processing before Kraken compression, but not BC7Prep, as they would need a giant * next to it (*requires GPU decoding step, might lower performance ;)).
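
For reference, the implied average compression ratios from those advertised figures (straightforward arithmetic, taking both vendors' numbers at face value):

```python
# Implied average compression ratios from the advertised figures
# (taking both vendors' marketing numbers at face value).
xsx_raw, xsx_eff = 2.4, 4.8            # GB/s, raw vs typical compressed
ps5_raw, ps5_lo, ps5_hi = 5.5, 8.0, 9.0

print(f"XSX: {xsx_eff / xsx_raw:.2f}x")                            # 2.00x
print(f"PS5: {ps5_lo / ps5_raw:.2f}x-{ps5_hi / ps5_raw:.2f}x")     # 1.45x-1.64x
```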
 
Last edited:

Elog

Member
Speaking of SFS's latency is odd, because that runs contrary to everything devs have said when discussing SFS piecemeal in various interviews, what lead Xbox engineers have said, and what research papers, patents etc. on XvA and analogous systems that have been R&D'd (such as FlashMap) say. All of them focus very keenly on cutting down latency; that has been one of the chief design goals of XvA. So focusing on an implementation that suggests it's creating more latency frankly doesn't mesh with any of the other sources I listed. For example, one of the stated goals of SFS is to cut down the prefetch window, so there are fewer frames ahead of time that a texture needs to be resident in memory to feed the GPU. How exactly it goes about doing that isn't officially clarified at this time, but I doubt it would involve any additional latency, at least nothing significant.

As always, we should wait for real data. However, I reacted somewhat to this paragraph. The reason they have focused so much on latency is that it is SFS's Achilles heel, and they are working very hard to address it.

In reality, a software-driven approach (i.e. one that runs on the CPU) such as the DirectStorage API set with SFS suffers from latency problems.

While the raw throughput comparison between PS5 and XSX gives the advantage to PS5, I am willing to bet that the difference is significantly larger (to the benefit of PS5) on the latency front.

My point is not to downplay the benefits of the DirectStorage API and SFS for the XSX; however, based on what we know, the fairest assumption is that latency is the largest difference between the PS5 and XSX in terms of I/O.
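
To put rough numbers on why latency rather than raw bandwidth can dominate small streaming reads (illustrative only - the 64 KB tile size and the 100 us overhead are assumptions, not measured console figures):

```python
# Illustrative only: for small reads, fixed per-request overhead can
# dwarf the raw-bandwidth difference. The tile size and the overhead
# value below are assumptions, not measured console figures.
TILE = 64 * 1024  # bytes - a typical tiled-resource (PRT) tile size

for name, gbps in [("XSX raw", 2.4), ("PS5 raw", 5.5)]:
    transfer_us = TILE / (gbps * 1e9) * 1e6
    print(f"{name}: {transfer_us:5.1f} us to move one 64 KB tile")

# If the software stack added, say, 100 us of per-request overhead,
# that overhead - not the 2.4-vs-5.5 GB/s gap - would dominate.
```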
 
Last edited:

Raekwon26

Member
*More Bullshit from this guy*

Like I said dude, get real. You're full of shit.

There is a reason there was so much hype and buzz leading up to that Halo reveal. It was supposed to be the first showing of Series X gameplay, a demonstration of the power of the Series X and a true leap in next-gen gaming... and then we found out it was running on PC.

You want to sit up here and have the nerve to be so dumbfucked as to try and act like Minecraft, Gears 5 and State of Decay 2 are Series X gameplay? It's fucking embarrassing.

No standards. No accountability. No quality.

Imagine... Sony coming out here and showing us Uncharted 2 running on a PS5 and claiming it's PS5 gameplay. Must be crazy.
 

TBiddy

Member
Raekwon26

Jesus, you're an aggressive one.

He literally wrote: "we haven't seen any Series X gameplay yet."

Considering we've seen multiple games running on the XSX and fast-resume demoed on an actual XSX, this is obviously wrong. Not sure why you get all riled up about that. It wasn't even your argument to begin with.
 

Deleted member 775630

Unconfirmed Member
There is a reason there was so much hype and buzz leading up to that Halo reveal. It was supposed to be the first showing of Series X gameplay, a demonstration of the power of the Series X and a true leap in next-gen gaming... and then we found out it was running on PC.
No, that's because it's Halo.
 
As always, we should wait for real data. However, I reacted somewhat to this paragraph. The reason they have focused so much on latency is that it is SFS's Achilles heel, and they are working very hard to address it.

In reality, a software-driven approach (i.e. one that runs on the CPU) such as the DirectStorage API set with SFS suffers from latency problems.

While the raw throughput comparison between PS5 and XSX gives the advantage to PS5, I am willing to bet that the difference is significantly larger (to the benefit of PS5) on the latency front.

My point is not to downplay the benefits of the DirectStorage API and SFS for the XSX; however, based on what we know, the fairest assumption is that latency is the largest difference between the PS5 and XSX in terms of I/O.

You cannot decouple the software from the hardware! Beyond that, you are misinformed if you think XvA is only a software solution. You are isolating the parts of it that are more software-driven while ignoring that all of these things work in concert; they do not operate in isolation, and the software only runs as well as the hardware it sits upon will let it. This is true of all APIs. MS has built the Series X hardware to perfectly suit the hardware/software-driven solutions of XvA; XvA is being specifically tailored for the Series X, and certain aspects of it (such as the mip-blending hardware in the GPU) are exclusive to it (and the Series S).

You are "willing to bet" on a belief and not much more, because you are not actually factoring how XvA works on the software and hardware front. The days where software variants of hardware solutions were always magnitudes slow died off a long time ago, this isn't the '90s anymore. What do you think algorithms are? They are software, but they run at their best when the hardware is present is support them at their peak. This is true in fact of virtually any electronics system ever made, and yes that also includes the PS5. I.e if the SSD I/O APIs are not sufficient (as an example), then it doesn't matter what the hardware is capable of, you will never have developers utilize that hardware efficiently enough to tap into it.

I think you and several others are looking at this stuff completely wrong; if anything, MS's optimizations with SFS should show you they understand the very philosophy I'm speaking of: the hardware means nothing without the software. MS wants to make aspects of XvA scalable and future-proofed for the ever-advancing hardware configurations to come, and the biggest benefit of putting certain functions in software rather than fixed-function hardware is that the former can be updated and improved far more readily as time goes on. The latter requires a new hardware design, which costs more for end users who want to take advantage of it, and is also harder to distribute in the field, because it does not benefit from the distribution methods a software solution enjoys.

You say we should base this on things that we know, but I have actually read through the various MS patents relating to their flash storage technologies (such as FlashMap) and looked into conversations engineers from the team have had on Twitter, and they contradict your assumptions quite absolutely. I base my conclusions on reasoned speculation and logic, and that tells me that when the aspects of XvA are looked at as the congruent mechanism they should be seen as (rather than parts in isolation), they produce functionality that should actually be in the ballpark of what Sony is set to offer with PS5's SSD I/O. For me this has never been a discussion of whether XvA suddenly "closes the gap" or surpasses Sony's solution; the two approaches are not that similar in reality, and they rely on different methods that, on a fundamental conceptual level, even one another out. Whether hardware or software, they are hands that complement one another. XvA, like Sony's approach, is worth more than the sum of its parts, but you need to actually understand how those parts work with one another to understand this.

EDIT: An aside, but I'd also like to point out an odd idiosyncrasy that tends to pop up here. The normally accepted narrative is that Sony is a "hardware company" and MS is a "software company", yet when it's convenient, MS being a "software company" suddenly confers no benefit, despite the resources available to them that could be leveraged in exactly that way.

The truth is this narrative is completely idiotic; you can't get hardware to work without software (APIs, algorithms, OSes, kernels etc.). Almost any company in the tech field dabbles in both hardware and software out of necessity. It's a pretty generalized idea, and it sells short the work both players do.

It feels like you took that the wrong way. What I was trying to convey was that I think BCPack is the secret-sauce enhancement that makes SFS a hardware-accelerated improvement over PRT, and that a GPU without BCPack's customisations wouldn't have the same ASIC-level SFS performance that the XsX will - even if it emulates it in GPU compute.

My point about latency is about data latency, not latency lowering frame rates. Surely for SFS to work (AFAIK), it needs to render a lower-fidelity frame (with the lower-quality data it had time to transfer) when new non-obscured assets enter the frustum, in order to provide sampler feedback and determine the higher-order data it needs to transfer, so that each subsequent frame for those assets accesses either portions of the 8K textures or the highest-order portion of texture suited to the asset - based on its depth and coverage in the frame - no?

As for CUs in the XsX being used for BCpack, if they are - as I still believe they are, purely because (IMHO) zlib seems inherently difficult and risky to commit to an ASIC - I wouldn't consider that false advertising, because on balance SFS is a bandwidth/VRAM/GPU-cache/compute win - in the best-case scenario - compared to running without it.

As for the Tempest Engine, I believe it is effectively a specialised GPU core for double-precision FLOPS work, and there is no comparable technology in the XsX. However, Sony's research into HRTF means the resource is audio-only - unless that research finds a cheap workaround that lets Sony run their HRTF solution on the PS5 CPU and re-purpose the resource. So in that respect it is a moot point: it will either go unused in games or be used as intended.

Okay, I think I see where you are coming from. About the latency thing though: what specific type of data are you referring to? If it's texture data, then going by comments like those from the DiRT 5 developer, it can't be anything severe. According to them, they can take texture data, fetch it, use it, discard it and replace it mid-frame. Granted, that is a cross-gen game, but it's one of the few examples of next-gen gameplay we've seen, and one of the more impressive ones IMHO. So I'm just curious what specific type of data you are referring to here.
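
For reference, the render/feedback/stream loop you describe boils down to something like this toy model (all the names and the residency dict are my own simplifications - real SFS records tile/mip requests in hardware, not in Python):

```python
# Toy sketch of a sampler-feedback streaming loop. Made-up names and a
# deliberately simplified residency model; real SFS records which
# tiles/mips the GPU actually sampled, then streams those in.

resident = {}        # texture -> highest-quality mip currently in memory
LOWEST_MIP = 5       # always-resident low-fidelity fallback

def desired_mip(tex):
    # Stand-in for screen-coverage/derivative-based mip selection.
    return 0 if tex == "hero_rock" else 2

def render_frame(visible_textures):
    feedback = []
    for tex in visible_textures:
        mip = resident.get(tex, LOWEST_MIP)   # sample whatever is resident
        wanted = desired_mip(tex)
        if wanted < mip:                      # lower mip index = more detail
            feedback.append((tex, wanted))    # record what the GPU wanted
    return feedback

def stream(feedback):
    for tex, wanted in feedback:
        resident[tex] = wanted                # async I/O fills the gap

for frame in range(3):
    fb = render_frame(["hero_rock", "distant_cliff"])
    print(f"frame {frame}: requested {fb}")
    stream(fb)   # frame 0 renders low fidelity; later frames catch up
```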

WRT zlib being difficult/risky for an ASIC, I'm curious about that as well. PS5 is also able to use zlib, though it has dedicated decompression for Kraken. In this scenario, if PS5 games were to also use zlib, would they not also have to dedicate some CU resources in the way you describe? And what amount of CU resources would have to be utilized for this?

EDIT: I did a little looking and the Eurogamer Series X article states that the decompression block is what is running the zlib decompression algorithm.

"Our second component is a high-speed hardware decompression block that can deliver over 6GB/s," reveals Andrew Goossen. "This is a dedicated silicon block that offloads decompression work from the CPU and is matched to the SSD so that decompression is never a bottleneck. The decompression hardware supports Zlib for general data and a new compression [system] called BCPack that is tailored to the GPU textures that typically comprise the vast majority of a game's package size."

I think this more or less supports the conclusion that the decompression block handles zlib. Granted, it doesn't say anything about offloading it from the GPU, but then again, neither did Sony when describing their decompression hardware. And realistically, it doesn't make much sense that zlib would be too risky for a dedicated ASIC but Kraken wouldn't be; they are both compression/decompression algorithms at the end of the day. They function differently in ways, sure, but still...
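
For a trivial software reference point of what that silicon block implements, Python's built-in zlib binding does the same round trip (at nothing like dedicated hardware throughput, of course):

```python
import zlib

# Software zlib round trip - the same algorithm the XSX decompression
# block implements in dedicated silicon (and a CPU would as a fallback).
payload = b"texture-ish data " * 1024

compressed = zlib.compress(payload, level=9)
restored = zlib.decompress(compressed)

assert restored == payload
print(f"{len(payload)} -> {len(compressed)} bytes "
      f"({len(payload) / len(compressed):.1f}x)")
```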

Yes, you're right about Tempest: it's a repurposed CU (singular; AMD groups them as Dual Compute Units, so they come in pairs, and Sony took one core of a pair and repurposed it for the Tempest Engine to simulate an SPU, basically). You're also right that Series X's audio block only specifies SPFP, not DPFP, but then it isn't repurposing a GPU compute core either, so that figures to be the expected outcome here. FWIW, though, HRTF is perfectly possible on Series X; in fact the One X also supported HRTF. HRTF is actually new for Sony systems with PS5, but it's been a feature on MS systems since at least the One X.
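
For a heavily simplified picture of what per-voice HRTF processing amounts to - convolving the source with a pair of head-related impulse responses - here's a sketch (the two "HRIRs" are made-up placeholder filters, not measured data):

```python
import numpy as np

# Minimal HRTF-style binaural render for one voice: convolve the mono
# source with left/right head-related impulse responses. Real HRIRs are
# measured per direction; these short placeholder filters just encode a
# crude interaural delay and level difference for a source on the left.
rate = 48000
t = np.arange(rate // 10) / rate
mono = np.sin(2 * np.pi * 440 * t)            # 100 ms test tone

hrir_left = np.array([1.0, 0.2])              # near ear: louder, earlier
hrir_right = np.array([0.0, 0.0, 0.0, 0.5])   # far ear: delayed, quieter

left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)
stereo = np.stack([left[:len(mono)], right[:len(mono)]], axis=1)
print(stereo.shape)   # one such convolution pair per voice, per frame
```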

Hopefully what you are describing at the end there isn't a prelude to completely shutting down the idea that Tempest could be used for some non-audio tasks; from the sounds of it that could be the case, but I guess if devs wanted to get really specific with the chip they could push it in that direction. The cost may not be worth it, though. I don't think using the audio chip in such a way (and FWIW, something roughly similar could theoretically be done with Series X's audio processor, generally speaking) would be any kind of "secret sauce" whatsoever, but it would make for cool examples of tech being used in creative ways. I still can't think of any game that used the sound processor for graphics/logic purposes outside of Shining Force III on the SEGA Saturn, and that was decades ago.
 
Last edited:

Jokerevo

Banned
I mean, all this shit looks great on paper, but the reality is that two things need to happen if we are ever to see the XsX reach its ceiling:

1) MS acquires studios talented enough to make games for it

2) AND THOSE GAMES MUST BE EXCLUSIVE to the XsX

Because there's an elephant in the room here called Lockhart. And if games (thanks to Game Pass) have to scale down to Lockhart and even non-SSD PC configurations, then we can't optimise, and if we can't optimise we end up with compromised garbage like HI.

I've been saying it over and over: HI should be an XsX exclusive that shows off all 12 teraflops of power, and I would buy an XsX the day HI releases.

MS's insistence on accommodating Game Pass dilutes/neutralises all this power. We have frequently been shown on the PS4 side how engines built for a fixed config can be optimised. That's 1.84 teraflops of base hardware - heavily outdated hardware - being harnessed to produce stuff like GoW etc.

I do think we are going to see (thanks to the SSD) a genuine leap forward in fidelity this gen, but it will cost. If MS refuses to get on the exclusives bandwagon now, the XsX will never build up the install base to justify investment from studios, and instead the AAA studios will gravitate towards Sony, because nobody wants their titles to be rented out.
 