
Microsoft Xbox Series X's AMD Architecture Deep Dive at Hot Chips 2020

Nikana

Go Go Neo Rangers!
No one can be sure unless you're a programmer working on the consoles. But the Hole in My Head blog probably has the best next-gen tech articles, I think, including the one on how he thinks the Xbox RAM will work with its wide-and-narrow bus setup. Here is his part 11:
http://hole-in-my-head.blogspot.com/2020/08/analyse-this-next-gen-consoles-part-11.html

Totally agree with the opinion that the supposed digital-only consoles (XSS and PS5 DE) are backwards. They should be the high-end models, not the entry points.
 
You have to be very careful. People with a committed point of view tend to cherry-pick from the info they think supports their case, leaving out what doesn't. The fact that they spend time doing so doesn't say anything about how true the information they present is.
I appreciate the warning. Much like you, I've been on these forums since well before the split, so I am well aware of the biases that GAF holds. What some forum users don't account for is the biases that developers hold as well. If a Sucker Punch developer says that a PlayStation product is the best, you have to weigh the obvious implied PR, the same as if it came from Ninja Theory about the Xbox. You either look at them all with skepticism or take them all at their word.
 
I appreciate the warning. Much like you, I've been on these forums since well before the split, so I am well aware of the biases that GAF holds. What some forum users don't account for is the biases that developers hold as well. If a Sucker Punch developer says that a PlayStation product is the best, you have to weigh the obvious implied PR, the same as if it came from Ninja Theory about the Xbox. You either look at them all with skepticism or take them all at their word.

So it's not a blind read. A Crytek engineer, for example, said the Series X could push more pixels on screen because of its extra CUs. He cited other Series X advantages too, like hyperthreading on the CPU. Of course he could get things wrong, but he's clearly not a lawyer for Sony.

I read this "researcher users" posts, and there are many interesting points. But many of these words are not from pratical experiences.. we could theorize many things, but only software developers could really talk about the real point. I read Matt Hargett, Jason Ronald, Mark Cerny, James Stanard and many other tweets and interviews to find some point.

It's about discovering the truth, not trusting whoever says what you want to hear.
 

Journey

Banned
Well, this is what he wrote......

Seems pretty clear to me what he's saying.......

So, yeah.... you misspoke.


No... just no.


This is what he REPLIED to, get it right:

Wait for DF analysis, holiday line-up and actually see games running on XsX..that shit will make you humble, facts!!

Tired of this BS, but at the end of the day we're at SonyGAF, after all.
We are waiting. Have been for a whole year. Nothing so far. Not a good look. LMAO.

Read the whole book, not just the cover. He replied directly to the comment about DF multiplatform comparisons coming this holiday, when both XSX and PS5 are on store shelves and both have retail games running that can be compared.


The time machine comment is funny as hell and totally appropriate.


Not sure if serious lmao

 
The main reason I would assume AVX2 is the decompressor on the XSX is that (AFAIK) you get one unit per core, and if they are using a tenth of a CPU core, it is the most capable feature for the job (IMO).

Your GPUDirectStorage point is interesting, but how does that work in regard to the very strong indication that the XSX is really using lots of decompression tricks with BCPack - which uses zlib with RDO? The CPU would either have to decompress on a CPU core, as we were told (IIRC), and write the decompressed BCPack data back to a temporary disk store prior to the GPU requesting it uncompressed directly - which costs bandwidth on each read/write - or they'd need to have CUs in the GPU do the zlib decompression to make the BC files ready to use.

I know it was suggested that BCPack may include a random-access ability to partially decompress BCPack to BC blocks - and after researching, I worked out that current random access for zlib has an overhead of 64KB per access, which is expensive for a potentially 6:1-compressed 4x4-pixel BC1 block that is 8 bytes compressed (48 bytes uncompressed) per access.
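To make the random-access idea concrete, here's a rough sketch of chunked zlib with seek points - my own illustration, not MS's actual scheme: a Z_FULL_FLUSH resets the dictionary, so each chunk can be inflated independently if you keep an offset index. The 64KB chunk size mirrors the access granularity mentioned above; everything else is an assumption.

```python
import zlib

CHUNK = 64 * 1024  # e.g. one 64KB mip page

def compress_chunked(data):
    """Compress data as independently inflatable 64KB chunks plus an offset index."""
    comp = zlib.compressobj(wbits=-15)        # raw deflate stream
    offsets, out = [], b""
    for i in range(0, len(data), CHUNK):
        offsets.append(len(out))
        out += comp.compress(data[i:i + CHUNK])
        out += comp.flush(zlib.Z_FULL_FLUSH)  # byte-aligns and resets the dictionary
    return out, offsets

def read_chunk(blob, offsets, n):
    """Inflate only chunk n; a fresh decompressor can start at any full-flush boundary."""
    end = offsets[n + 1] if n + 1 < len(offsets) else len(blob)
    return zlib.decompressobj(wbits=-15).decompress(blob[offsets[n]:end])
```

The trade-off is exactly the one described above: you can't start inflating mid-chunk, so pulling a single 8-byte BC1 block still costs a whole chunk's worth of decompression.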

AFAIK, zlib and RDO are used alongside BCPack (or can be); it's not that BCPack necessarily is those things. It's similar to PS5 utilizing Oodle Texture alongside RDO. BCPack is being designed specifically for GPU-bound textures, so I think it would be logical to assume DirectStorage/GPUDirectStorage takes this into consideration with the points you raise. FWIW, MS have only said that 1/10th of a CPU core is used for XvA; this is probably 1/10th of the OS core, and they never said decompression work is done on this core. After all, there is decompression hardware in the APU itself where the decompression actually takes place, similar to PS5.

If the GPU is pulling in the texture files in a GPU-friendly format (BCPack) directly from storage, then the GPU is likely also handling the decompression for those files - that is, if it even needs to decompress them for use. We don't know for certain where the decompression block is in the Series X, but we can assume it's within the I/O block, and if you look here:

[Image: Series X SoC block diagram from the Hot Chips presentation]


You can see that the SDF has a link to the I/O Hub, and the SDF of course has a direct feed to the GPU. Also you can look here:

[Image: Series X physical die view from the Hot Chips presentation]


In the physical die view, the I/O hub blocks are located closest to the GPU. We can assume the decompression hardware is at least one of these blocks, and with the understanding of the first image, the probability that the GPU is doing what you're assuming the CPU would be doing is very high.

The CPU is likely only giving an initial instruction, and the GPU is handling use of the decompression block from there while accessing the textures off storage. Don't forget about the hardware built in for ExecuteIndirect features; MS have expanded on this a lot for the Series systems, and I personally think some custom ARM cores are implemented for this (an old LinkedIn listing from an Indian AMD engineer on the Series X team specifically mentioned ARM cores in the APU design).
 
So it's not a blind read. A Crytek engineer, for example, said the Series X could push more pixels on screen because of its extra CUs. He cited other Series X advantages too, like hyperthreading on the CPU. Of course he could get things wrong, but he's clearly not a lawyer for Sony.

I read this "researcher users" posts, and there are many interesting points. But many of these words are not from pratical experiences.. we could theorize many things, but only software developers could really talk about the real point. I read Matt Hargett, Jason Ronald, Mark Cerny, James Stanard and many other tweets and interviews to find some point.

It's about discovering the truth, not trusting whoever says what you want to hear.
Right now you have either theory or PR. The people who can speak with authority - meaning they have dev kits - are all under NDAs. So believe what you will. Anyway... I'm moving on...
 

ZywyPL

Banned
Well thats just stupid.

Games that weren't designed with the Series X as the primary focus.

If I play PS1 Ridge Racer on PS4, I guess thats PS4 gameplay........ you got to be kidding me.

Yes, desperately trying to drive the narrative to fit your personal agenda is stupid. The Series X-only titles are still in the making; they will eventually show up, same as most 3rd-party multiplatform and Sony's 1st-party titles. Unless you were living under a rock for the past decade, let me remind you that games nowadays can take 5-7 years to develop; they won't appear all of a sudden because some random people on the internet demand to see them. That's nowhere near how it works. So far MS has shown fully path-traced Minecraft and Gears 5 at 4K at over 100FPS at higher-than-Ultra settings, so to anyone with half a brain that's more than enough evidence of how potent the console is. It just doesn't have any new advanced AAA games as of now, but like I said, they will eventually show up and you'll get to see them; you just need a bit more patience.
 

pyrocro

Member
Yeah, I just watched a good portion of the video; very interesting stuff. However, the method he uses to achieve the culling is entirely different from how the PS5 is handling it. In the demo he achieves the culling through use of "meshlets" or "mesh shaders", and this is done at a software level, not hardware, and it's impressive. However, there are costs to this method, which he details in the video. I'm not going to begin typing them out, since it's stuff I don't know too much about and there's way too much technical jargon lol

The PS5's method of handling the different types of culling is done by allowing full control of the "traditional" graphics pipeline, and this is achieved at a hardware level through the GE. The engineer in the technical video states that it's "tough" beating hardware-accelerated methods (which the PS5 has, btw) and that you should expect the hardware (Series X hardware he is referring to, I think) to handle back-face, zero-area and frustum culling.

So what's the difference? Judging from the information in the video, achieving culling through meshlets can be tough depending on what type of culling you want to implement. The hardware-based method is easier, with fewer costs. So the question is how effective the Series X hardware is at things like culling, and whether it allows developers full control of the rendering pipeline. I don't know, as Microsoft hasn't revealed anything about it yet. Thic mentioned that the Series X GPU does have a multi-core command processor unit, so maybe that will allow it. As of now, the PS5's method seems more straightforward and easier, but I could be wrong. Maybe someone with more knowledge like @geordiemp could elaborate?

Btw you can watch the video here.
This is incorrect; mesh shaders run on the video card ("software" is a reference to the programmability of the mesh shaders). Furthermore, this culling happens at the beginning of the pipeline, making it more efficient, since the rest of the graphics pipeline never has to spend time processing culled geometry.

Sony has not indicated that the PS5 is using mesh shaders. If culling is done in the primitive shaders, it's less efficient than mesh shaders, as it sits further down the graphics pipeline and would take up more processing time along the pipeline.
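As a concrete illustration of what "culling" means here - a toy back-face test in the spirit of what fixed-function hardware does early in the pipeline (a sketch of the general idea, not either console's actual implementation):

```python
# Toy back-face culling: a triangle whose normal points away from the viewer
# can be discarded before any further pipeline work is spent on it.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_back_facing(v0, v1, v2, view_dir):
    edge1 = tuple(b - a for a, b in zip(v0, v1))
    edge2 = tuple(b - a for a, b in zip(v0, v2))
    normal = cross(edge1, edge2)       # winding order determines which way it faces
    return dot(normal, view_dir) >= 0  # facing away (or edge-on): cull it
```

The earlier a test like this runs - whether in a mesh/primitive shader or in fixed-function hardware - the less work the rest of the pipeline wastes on invisible geometry, which is the whole argument above.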
 

Raekwon26

Member
Yes, desperately trying to drive the narrative to fit your personal agenda is stupid. The Series X-only titles are still in the making.

That's all that needed to be said, the rest was fluff.

So you guys are arguing about tech you haven't seen in action and trying to tell people they will look amazing?

Then when people ask you to provide proof of this amazing tech outclassing everything, you say they should wait for DF?

Yeah sure..... Didn't know Phil Spencer had so many clones.
 

Raekwon26

Member
No... just no.


This is what he REPLIED to, get it right:




Read the whole book, not just the cover. He replied directly to the comment about DF multiplatform comparisons coming this holiday, when both XSX and PS5 are on store shelves and both have retail games running that can be compared.


The time machine comment is funny as hell and totally appropriate.


Not sure if serious lmao


Tells me to read the whole book........ proceeds to post only the first page.


Uh huh.....
 

ZywyPL

Banned
That's all that needed to be said, the rest was fluff.

So you guys are arguing about tech you haven't seen in action and trying to tell people they will look amazing?

Then when people ask you to provide proof of this amazing tech outclassing everything, you say they should wait for DF?

Yeah sure..... Didn't know Phil Spencer had so many clones.

Newsflash: consoles are now closed-box PCs built from PC parts; the days of exotic, unknown alien tech are long gone, and it's not the first time this has happened. We've had no fewer than 4 consoles in the past 7 years designed that way, where we could easily compare all 4 against each other as well as against various PC configs. Not only that, we've already seen quite a bunch of upcoming next-gen titles, which are basically PC High-Ultra settings plus RT effects here and there, and those titles are even starting to post their requirements, which again clearly indicates what to expect.
 

Three Jackdaws

Unconfirmed Member
This is incorrect; mesh shaders run on the video card ("software" is a reference to the programmability of the mesh shaders). Furthermore, this culling happens at the beginning of the pipeline, making it more efficient, since the rest of the graphics pipeline never has to spend time processing culled geometry.

Sony has not indicated that the PS5 is using mesh shaders. If culling is done in the primitive shaders, it's less efficient than mesh shaders, as it sits further down the graphics pipeline and would take up more processing time along the pipeline.
Yes, the culling happens early in the pipeline, but mesh shaders do have trade-offs; he literally talks about them in the video, one example being high attribute-shading cost. He also goes on to state that it's hard to beat the fixed-function hardware blocks that handle vertices/triangles (which are present in both the PS5 and Series X).

As for the primitive shaders on the PS5, this is still up in the air as far as I am aware. The PS5's Geometry Engine was created for the sole purpose of taking full control over the graphics rendering pipeline, so it would be silly of Sony to adopt mesh-shading support that didn't allow that kind of control. I know there were some patents recently about AMD's primitive shaders showing how they've developed. I think it's likely we'll see a new iteration of them towards the end of the year, when we get closer to the release of the PS5 and RDNA 2 in general.
 

Raekwon26

Member
Newsflash: consoles are now closed-box PCs built from PC parts; the days of exotic, unknown alien tech are long gone, and it's not the first time this has happened. We've had no fewer than 4 consoles in the past 7 years designed that way, where we could easily compare all 4 against each other as well as against various PC configs. Not only that, we've already seen quite a bunch of upcoming next-gen titles, which are basically PC High-Ultra settings plus RT effects here and there, and those titles are even starting to post their requirements, which again clearly indicates what to expect.

No no no no no.

It's very simple.

You lot are speaking as if you worked on the consoles and are there designing each and every game.

Enough of that. You guys don't really know jack about how these games will look and perform until someone tells you what you're actually looking at and what to look for. Instead of running around this forum (and this goes for BOTH sides) telling everybody how unreal and amazing everything is going to look, and what this console can do and what that console can't, how about you just sit and wait?

It was barely a month ago that this forum was being told every hour how amazing Halo would look... and then we got that.

You don't know. You keep telling everybody, but you don't know and have no proof.

Just wait.
 

ZywyPL

Banned
No no no no no.

It's very simple.

You lot are speaking as if you worked on the consoles and are there designing each and every game.

Enough of that. You guys don't really know jack about how these games will look and perform until someone tells you what you're actually looking at and what to look for. Instead of running around this forum (and this goes for BOTH sides) telling everybody how unreal and amazing everything is going to look, and what this console can do and what that console can't, how about you just sit and wait?

It was barely a month ago that this forum was being told every hour how amazing Halo would look... and then we got that.

You don't know. You keep telling everybody, but you don't know and have no proof.

Just wait.

Wait for what? We already saw many, many games during all the shows over the past few months; what more do you need? An eye check?
 
They're closer to the CPU portion of the new 4700/4800 laptop APUs than to the desktop 3700/3800 CPUs.

Thanks for the reply. So what percentage of a 3700X's performance? Like 80%?

I'm bemused at the likes of AC Valhalla and WD Legion being 30fps with those CPUs... like, wtf are they doing with all that 4x CPU compute performance gain? I can understand 30fps for next-gen-only games, but I really expected the cross-gen games to be 60fps.
 
IMO NXGamer is heavily invested in PlayStation, so I would take his statements with a grain of salt.

NXGamer definitely knows his stuff (I think his content is a bit too in-depth, which is why his channel has struggled; DF is much more mass-market in its terminology and presentation, etc.). I definitely appreciate him though.

He definitely has a preference for Sony, especially their first-party games, and he fucking hates Nintendo lol (some of the lengths he goes to in one video to downplay their role in helping save the industry in the '80s were laughable). He was definitely a SEGA kid 😂

It does him well to praise Sony, as they are the market leader, provide him with free code, and now and again their developers shout him out on Twitter. Their games are definitely impressive too, so he's not telling any lies, but he does go a bit overboard sometimes while not giving the same credit or plaudits to the big Xbox and even Nintendo games. Why would you upset that kind of relationship?
 
No no no no no.

It's very simple.

You lot are speaking as if you worked on the consoles and are there designing each and every game.

Enough of that. You guys don't really know jack about how these games will look and perform until someone tells you what you're actually looking at and what to look for. Instead of running around this forum (and this goes for BOTH sides) telling everybody how unreal and amazing everything is going to look, and what this console can do and what that console can't, how about you just sit and wait?

It was barely a month ago that this forum was being told every hour how amazing Halo would look... and then we got that.

You don't know. You keep telling everybody, but you don't know and have no proof.

Just wait.
I don't know... seems pretty much like common sense that a faster, more capable machine is going to facilitate making better-looking, better-running games. Companies like Naughty Dog are going to have some amazing-looking games. I know this because they already have amazing-looking games on significantly lesser hardware.
 

Journey

Banned
NXGamer definitely knows his stuff (I think his content is a bit too in-depth, which is why his channel has struggled; DF is much more mass-market in its terminology and presentation, etc.). I definitely appreciate him though.

He definitely has a preference for Sony, especially their first-party games, and he fucking hates Nintendo lol (some of the lengths he goes to in one video to downplay their role in helping save the industry in the '80s were laughable). He was definitely a SEGA kid 😂

It does him well to praise Sony, as they are the market leader, provide him with free code, and now and again their developers shout him out on Twitter. Their games are definitely impressive too, so he's not telling any lies, but he does go a bit overboard sometimes while not giving the same credit or plaudits to the big Xbox and even Nintendo games. Why would you upset that kind of relationship?



 
Some insight courtesy of DF on the audio components found in both systems.


If you break down the transfer speed of the Quick Resume feature for Forza Horizon 3 - 18 GB over the course of 6.3 seconds - that works out to a raw per-second bandwidth for the drive of about 2.857 GB/s.
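A quick sanity check on that arithmetic, using the figures as quoted in the video:

```python
# Quick Resume transfer as quoted: 18 GB moved in 6.3 seconds.
data_gb, seconds = 18, 6.3
print(f"{data_gb / seconds:.3f} GB/s")  # -> 2.857 GB/s, above the 2.4 GB/s raw spec
```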

So it does look like MS's numbers were conservative on the raw and compressed speeds of the drive after all. But as was also said in the video, games have to be developed with XvA in mind (so probably a slight learning curve there, probably more so than with Sony's solution, but I'd say not by a huge amount).

NXGamer definitely knows his stuff (I think his content is a bit too in-depth, which is why his channel has struggled; DF is much more mass-market in its terminology and presentation, etc.). I definitely appreciate him though.

He definitely has a preference for Sony, especially their first-party games, and he fucking hates Nintendo lol (some of the lengths he goes to in one video to downplay their role in helping save the industry in the '80s were laughable). He was definitely a SEGA kid 😂

It does him well to praise Sony, as they are the market leader, provide him with free code, and now and again their developers shout him out on Twitter. Their games are definitely impressive too, so he's not telling any lies, but he does go a bit overboard sometimes while not giving the same credit or plaudits to the big Xbox and even Nintendo games. Why would you upset that kind of relationship?


TBF, some people actually do overstate Nintendo's role in "saving" gaming during the '80s (just like they exaggerate the crash of the early '80s, which was primarily an American thing; Japan was hardly affected, and consoles weren't even getting traction in Europe until the Master System came out, as microcomputers were king there), but that's a discussion for another day...

I think it's worth touching on your point that these companies, in their own way, kind of encourage the type of slanted praise we see from some personalities on various channels, with the perks like free codes etc. that you mention (I don't think NXGamer is necessarily guilty of any overbearing bias, tbh; there are other tech-analysis types around who are. Same with various gaming-centric podcasts, but again, that's delving into another topic...). It still behooves these individuals to try their best to keep their preferences from overtaking what should be fair, neutral technical analysis and insight, however.
 

Bergoglio

Member
The same goes for 3D audio, which has made me think a lot about marketing and presentation. Sony made a remarkable pitch for a revolution in 3D audio with PlayStation 5 with its Tempest Engine, talking about hundreds of audio sources accurately positioned in 3D space - yet Microsoft has essentially made the same pitch with its own hardware, which also has the HRTF support that the Tempest Engine has. Microsoft hasn't made any specific promises about mapping 3D audio to the individual's specific HRTF, but then again, Sony hasn't really told us how it plans to get that data for each player.

 
The same goes for 3D audio, which has made me think a lot about marketing and presentation. Sony made a remarkable pitch for a revolution in 3D audio with PlayStation 5 with its Tempest Engine, talking about hundreds of audio sources accurately positioned in 3D space - yet Microsoft has essentially made the same pitch with its own hardware, which also has the HRTF support that the Tempest Engine has. Microsoft hasn't made any specific promises about mapping 3D audio to the individual's specific HRTF, but then again, Sony hasn't really told us how it plans to get that data for each player.


Seems the Series X audio system is comparable in power to the PlayStation 5's.
 
If you break down the transfer speed of the Quick Resume feature for Forza Horizon 3 - 18 GB over the course of 6.3 seconds - that works out to a raw per-second bandwidth for the drive of about 2.857 GB/s.

So it does look like MS's numbers were conservative on the raw and compressed speeds of the drive after all. But as was also said in the video, games have to be developed with XvA in mind (so probably a slight learning curve there, probably more so than with Sony's solution, but I'd say not by a huge amount).
Keep in mind that that includes an equal amount of reading and writing, and I am willing to bet that the SSDs in both consoles will be of the type that is significantly faster at reading than at writing.
 
Seems the Series X audio system is comparable in power to the PlayStation 5's.

Comparing the Series X audio to Tempest itself, it's actually the more "powerful" of the two in terms of raw performance, and it supports virtually all of the same features Tempest does. However, Sony does have a (traditional) DSP audio core separate from Tempest in the PS5, which I'd figure is a lot less powerful but is present for normal audio tasks, and is less bandwidth- (and power-) demanding, too.

So Tempest + regular DSP might still give Sony's setup a slight edge in audio WRT raw capability, but that setup would also involve two audio cores sharing access to the memory bus, which throws in another variable for bus contention. That's assuming the regular DSP even accesses the memory bus, mind you; I'm assuming it does but there's a small chance it might not.

Overall audio capabilities between these systems are going to be remarkably similar, and that's a net benefit for gamers and developers.

Keep in mind that that includes an equal amount of reading and writing, and I am willing to bet that the SSDs in both consoles will be of the type that is significantly faster at reading than at writing.

That's a good point; SSDs almost always have far faster read speeds than write speeds (NOR flash is the opposite). I wish there were a flash technology (maybe some kind of MCP) that combined NOR and NAND in a single package, especially if it had come to market a few years earlier.

Now I just think stuff like XPoint and Optane is going to take that spot and bring much bigger benefits, let alone stuff like ReRAM, FRAM, NRAM, MRAM etc.
 

PaintTinJr

Member
AFAIK, zlib and RDO are used alongside BCPack (or can be); it's not that BCPack necessarily is those things. It's similar to PS5 utilizing Oodle Texture alongside RDO. BCPack is being designed specifically for GPU-bound textures, so I think it would be logical to assume DirectStorage/GPUDirectStorage takes this into consideration with the points you raise. FWIW, MS have only said that 1/10th of a CPU core is used for XvA; this is probably 1/10th of the OS core, and they never said decompression work is done on this core. After all, there is decompression hardware in the APU itself where the decompression actually takes place, similar to PS5.

If the GPU is pulling in the texture files in a GPU-friendly format (BCPack) directly from storage, then the GPU is likely also handling the decompression for those files - that is, if it even needs to decompress them for use. We don't know for certain where the decompression block is in the Series X, but we can assume it's within the I/O block, and if you look here:

.....
BCPack isn't a GPU-friendly format, because, as the jokey Xbox engineer tweet put it (paraphrasing, of course), it is block compression (zlib, in some flavour, guided by RDO stats) of block-compressed BCn textures - "but we didn't want to call it that".
 

Panajev2001a

GAF's Pleasant Genius
Comparing the Series X audio to Tempest itself, it's actually the more "powerful" of the two in terms of raw performance, and it supports virtually all of the same features Tempest does.

I think you are comparing apples and oranges here, and by that same token PS3 was 2 TFLOPS, with nVIDIA counting fixed-function HW.

The numbers are there: 20-24 FP ops/cycle on the programmable DSP the XSX has, and 64 FP ops/cycle on Tempest running at GPU clocks (fully programmable, plus very likely PS4-like, less flexible sound DSPs).

I am not sure we would be keen on people twisting PS5 numbers to somehow make its GPU appear to have a higher FLOPS rating than it does, so I'm not sure why, as with the SSDs and XVA+SFS, the numbers somehow magically tell a suddenly different and inverted story.

Also, bus-contention issues would be similar across designs... not sure why it is such a hot topic, as if we did not have essentially multi-channel ring buses or a more mesh-like interconnect fabric allowing multiple peripherals to share bandwidth without having to own and release the bus, and local storage/buffers cover a lot of that latency too (and even then you had various ways to trade off speed against impact on a shared single bus).
 

Marlenus

Member
Thanks for the reply. So what percentage of a 3700X's performance? Like 80%?

I'm bemused at the likes of AC Valhalla and WD Legion being 30fps with those CPUs... like, wtf are they doing with all that 4x CPU compute performance gain? I can understand 30fps for next-gen-only games, but I really expected the cross-gen games to be 60fps.

On PC, application performance between Renoir (the 4000 APUs) and Matisse (the 3000 CPUs) is about the same unless it is a cache-heavy app.

In gaming, Renoir sits between the 2000 series and the 3000 series when all are running the same memory config.

Low-latency memory really helps both do better, so performance in the consoles will depend on what speed the Infinity Fabric is running at and on the memory timings.

If Sony were to tune the memory timings and MS didn't, it would more than offset the 100 MHz difference in CPU clock speed.
On the
 
I think you are comparing apples and oranges here, and by that same token PS3 was 2 TFLOPS, with nVIDIA counting fixed-function HW.

The numbers are there: 20-24 FP ops/cycle on the programmable DSP the XSX has, and 64 FP ops/cycle on Tempest running at GPU clocks (fully programmable, plus very likely PS4-like, less flexible sound DSPs).

I am not sure we would be keen on people twisting PS5 numbers to somehow make its GPU appear to have a higher FLOPS rating than it does, so I'm not sure why, as with the SSDs and XVA+SFS, the numbers somehow magically tell a suddenly different and inverted story.

Also, bus-contention issues would be similar across designs... not sure why it is such a hot topic, as if we did not have essentially multi-channel ring buses or a more mesh-like interconnect fabric allowing multiple peripherals to share bandwidth without having to own and release the bus, and local storage/buffers cover a lot of that latency too (and even then you had various ways to trade off speed against impact on a shared single bus).

I'm going by the raw performance comparisons both Sony and MS have provided: Sony compared Tempest's raw performance to a PS4 CPU, MS compared theirs to the One X CPU. The PS4 CPU is around 102.4 GFLOPs; the One X's CPU is around 137.2 GFLOPs. These are the chips Sony and MS decided to compare their audio solutions to, so that is what I'm going with.
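For reference, the PS4 figure quoted there can be reproduced from cores x clock x FLOPs per cycle - a minimal sketch, assuming Jaguar's usual 8 single-precision FLOPs per cycle:

```python
# PS4 CPU peak (assumption: 8 SP FLOPs/cycle per Jaguar core at 1.6 GHz).
cores, clock_ghz, flops_per_cycle = 8, 1.6, 8
print(cores * clock_ghz * flops_per_cycle)  # -> 102.4 GFLOPS
```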

Also, why are you making a verbal distinction between "programmable" in one case and "fully programmable" in the other? It's either programmable or it isn't. Neither company has been transparent enough to state the absolute degree of programmability of their audio solutions, so this sounds more like you making an inference to suit your own tastes.

There are no numbers being invented with the SSD performance; a use-case scenario was described with Quick Resume, and numbers were provided that you can easily do some quick math on to get the figure I posted. They clearly specify the conditions, and it all makes logical sense. How is saying the actual performance of the drive in that Quick Resume case is 2.857 GB/s "inverting" the narrative of the SSD I/O solutions? That sounds more like certain people wanting certain perceived gaps to stay as they are even if reality doesn't shake out that way. It happens on both sides; you're possibly showing your own flavor of it here.

Bus contention is an issue in both systems, agreed. However, in any system, more processor components needing to access a shared memory pool is another variable in the general bus setup among those components. Everything else you're describing in this regard is present in both systems, so I don't see how it really nullifies my point, which wasn't even a serious point in the first place - just a cursory observation and acknowledgement that "more hands on the pie" means, figuratively, less of the pie for each hand. It's just a general statement, nothing more than that.

BCPack isn't a GPU-friendly format, because, as the jokey Xbox engineer tweet put it (paraphrasing, of course), it is block compression (zlib, in some flavour, guided by RDO stats) of block-compressed BCn textures - "but we didn't want to call it that".

Is there a link to this?
 
On PC, application performance between Renoir (the 4000 APUs) and Matisse (the 3000 CPUs) is about the same unless it is a cache-heavy app.

In gaming, Renoir sits between the 2000 series and the 3000 series when all are running the same memory config.

Low-latency memory really helps both do better, so performance in the consoles will depend on what speed the Infinity Fabric is running at and on the memory timings.

If Sony were to tune the memory timings and MS didn't, it would more than offset the 100 MHz difference in CPU clock speed.
On the

So what percentage of a 3700X do you think?
 

Panajev2001a

GAF's Pleasant Genius
I'm going by the raw performance comparisons both Sony and MS have provided: Sony compared Tempest's raw performance to a PS4 CPU, MS compared theirs to the One X CPU. The PS4 CPU is around 102.4 GFLOPs; the One X's CPU is around 137.2 GFLOPs. These are the chips Sony and MS decided to compare their audio solutions to, so that is with

The DSP they quoted as programmable is the one I quoted the FP ops/cycle for, and the one that compares to Tempest (in the sense that PS2's GS was flexible but not freely/fully programmable in any modern sense of the term; or think about Flipper's TEV, and how the NV2A shaders and their predecessors, the register combiners, were "programmable").

You can take that slide the way you are doing, and I can state PS3 was a 2 TFLOPS machine because that is what Sony and nVIDIA stated in a slide too... no circling around that point.

You are also under-quoting Tempest's number relative to the simplified lower-bound estimate Cerny gave: 64 FP ops/cycle * 2.23 GHz = 142.72 GFLOPS. And again, you are comparing fixed-function HW in the mix vs. the Tempest compute unit, which MS did deliberately to get a figure that compared more favourably to Tempest. That is not unlike how they were selling, or trying to sell, ESRAM bandwidth + DDR bandwidth as something that added up to dwarf PS4's bandwidth. I am sure there are possibly even slides with that written on them.
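Spelling out the arithmetic in that first sentence:

```python
# Tempest lower-bound estimate as Cerny stated it: 64 FP ops/cycle at the GPU clock.
ops_per_cycle, gpu_clock_ghz = 64, 2.23
print(ops_per_cycle * gpu_clock_ghz)  # -> 142.72 GFLOPS
```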
 
The DSP they quoted as programmable is the one I quoted the FP ops/cycle for, and the one that compares to Tempest (in the sense that PS2's GS was flexible but not freely/fully programmable in any modern sense of the term; or think about Flipper's TEV, and how the NV2A shaders and their predecessors, the register combiners, were "programmable").

You can take that slide the way you are doing, and I can state PS3 was a 2 TFLOPS machine because that is what Sony and nVIDIA stated in a slide too... no circling around that point.

You are also under-quoting Tempest's number relative to the simplified lower-bound estimate Cerny gave: 64 FP ops/cycle * 2.23 GHz = 142.72 GFLOPS. And again, you are comparing fixed-function HW in the mix vs. the Tempest compute unit, which MS did deliberately to get a figure that compared more favourably to Tempest. That is not unlike how they were selling, or trying to sell, ESRAM bandwidth + DDR bandwidth as something that added up to dwarf PS4's bandwidth. I am sure there are possibly even slides with that written on them.

Your Tempest number assumes the engine is running at the same clock as the GPU, but Sony have already said it's its own separate chip and not part of the GPU. Therefore there's no guarantee it runs at the GPU's clock speed, and they have actually never clarified that.

Therefore I'm only comparing their audio capabilities based on what both companies have actually stated officially, and in both cases they compared them to previous-gen CPUs. If Tempest's numbers were as you say, then Cerny would've used the PS4 Pro as a reference point, not the PS4.
 

Panajev2001a

GAF's Pleasant Genius
Your Tempest number assumes the engine is running at the same clock as the GPU, but Sony have already said it's its own separate chip and not part of the GPU. Therefore there's no guarantee it runs at the GPU's clock speed, and they have actually never clarified that.

From DF’s satellite interview with Cerny:
The Tempest engine itself is, as Cerny explained in his presentation, a revamped AMD compute unit, which runs at the GPU's frequency and delivers 64 flops per cycle.

This was a few minutes' search; it's not like I scoured the web for it, by the way. I think, and have always stated, that Cerny never boasts even when he could twist the numbers to make them seem even more impressive than they are, but that is beside the point.
 

PaintTinJr

Member
....

Is there a link to this?

The tweets about the info they disclosed that isn't under NDA are out there - it seemed almost a reaction to the rate-distortion optimization (RDO) info RAD Game Tools were releasing about how Oodle Texture compression works, which is in effect the same thing, AFAIK, going by the tweets.

Anyway, the fact that you've asked for a link suggests you haven't had time to gen up on this, so I'll try to give some simple background (for you and anyone else interested), and hopefully someone who uses Twitter can track down and link the Xbox engineer tweets pertinent to the released BCPack info.

So, lossy block compression for textures (HW-accelerated) has been around since about the launch of Quake 3, IIRC. It went by its original name, S3TC, and also has the newer names DXT1 and BC1 - if we are strictly talking about the bog-standard format.

An RGB888-format texture (and its mipmaps) - in this case - is divided into blocks of 4x4 pixels (16 pixels x 3 bytes per pixel), occupying 48 bytes per block. Two uncompressed RGB565 pixel values are chosen - that's 2 bytes each, 4 bytes in total - and are then used with two known blending equations to generate two more RGB565 values, giving the block 4 unique colours with just 4 bytes of storage needed.

Whether those 4 colours are the best choice is a matter of exhaustive computational search and signal-to-noise analysis, so these values will differ between GPU vendors, where speed of encoding is more of a priority, and even standalone software encoders will differ, because the "best algorithm" is likely still highly contested.

However, with the 4 colour values for the block now selected, each original pixel in the block can be substituted with whichever of those 4 colours best matches its value. This is done by storing 2 bits (00, 01, 10, 11) per original pixel to select a colour. 16 pixels at 2 bits each, plus the two stored RGB565 colours, gives just 8 bytes, and 6:1 compression.

Now, you can take a zlib algorithm and compress the S3TC texture like any other file. But the point of using RDO is that, at the S3TC compression stage, different selections - ones the algorithm in use would deem less correct - produce different zlib compression ratios. So (AFAIK) RDO is the act of balancing noise introduced at the S3TC compression stage against the measured zlib compression that results, choosing an acceptable increase in noise in exchange for improved zlib compression ratios.
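To make the block layout concrete, here's a minimal sketch of decoding one standard BC1 block as described above (my own illustration; it ignores BC1's 1-bit-alpha mode, which kicks in when the first stored colour is not greater than the second):

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a packed 16-bit RGB565 value to an (r, g, b) byte triple."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    # Replicate high bits into the low bits to fill the 8-bit range.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block8):
    """Decode one 8-byte BC1 block into 16 (r, g, b) pixels (4x4, row-major)."""
    c0_raw, c1_raw, indices = struct.unpack("<HHI", block8)
    c0, c1 = rgb565_to_rgb888(c0_raw), rgb565_to_rgb888(c1_raw)
    # The two stored endpoint colours plus two blended ones (1/3 and 2/3 mixes).
    palette = [
        c0,
        c1,
        tuple((2 * a + b) // 3 for a, b in zip(c0, c1)),
        tuple((a + 2 * b) // 3 for a, b in zip(c0, c1)),
    ]
    # 16 pixels x 2-bit palette index = the remaining 32 bits of the block.
    return [palette[(indices >> (2 * i)) & 0b11] for i in range(16)]
```

Eight bytes in, sixteen RGB888 pixels (48 bytes) out - the 6:1 ratio described above.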

Here's the original tweet about BCPack that I was paraphrasing.

 
From DF’s satellite interview with Cerny:


This was a few minutes' search; it's not like I scoured the web for it, by the way. I think, and have always stated, that Cerny never boasts even when he could twist the numbers to make them seem even more impressive than they are, but that is beside the point.

Appreciate it; I had to get ready for a workout, so I didn't have time to do a Google search 🤷‍♂️. In any case, we're talking a pittance overall in terms of raw power between the Series X's audio solution and Tempest - barely 10 GFLOPs, if even that.

If MS compared their audio performance to something greater than the One X's CPU - given this was a technical presentation and not an E3-style presser - then it should be assumed that that is the level of performance, even if it is not "fully programmable" in the way Tempest might be.

At the end of the day, it just cements the reality that the audio solutions are remarkably close to each other in raw performance and overall capability, which has been my point from the very moment I started speaking about it in this thread.

PaintTinJr - appreciate the briefing. Mind parlaying this roundabout into your general point about the GPU/CPU decompression and BCPack you were talking about earlier?
 

PaintTinJr

Member
....

PaintTinJr - appreciate the briefing. Mind parlaying this roundabout into your general point about the GPU/CPU decompression and BCPack you were talking about earlier?
Yeah, the point is that for BCPack to be anything more than just BCn (in comparison to a competitor's solution), the decompression needs to happen in a place where the GPU can use the uncompressed BCn data instantly.

I did read/skim through some of the guy's other tweets, and the 64KB block size is mentioned in regard to mipmap pages (for SFS), so it does look like they are intending to use random-access zlib decompression - which removes the need to unpack BCn files fully. That sounded like a hardware solution in one of the tweets; however, he said in another that he's continuing to work on the DXTC, and there were tweets I remember from the next-gen thread where he claimed they could get more compression out of BCPack than the stated figures as they improve the algorithm, which instantly says it is software-based, though probably on the GPU side. He also mentioned multiple code paths - with AVX2 getting a mention, but I'm not sure if that's more about DX on PC.

To be honest, I can't really tell anymore if their strategy is zlib decompression into the 6GB through the CPU, or random-access zlib decompression from the 10GB using SFS on the GPU. There was a tweet saying they can't publicly disclose more but are sharing the info with Xbox developers. My hunch now is that they are using CUs for the BCPack decompression, and using AVX2 for standard zlib decompression into the 6GB for non-texture data.
 

Allandor

Member
Seems the Series X audio system is comparable in power to the PlayStation 5's.
We've had audio hardware acceleration in every console for years, and it has never really been used the way it was promised. E.g. SHAPE had truly remarkable power for its time; something similar was in the PS4, and now it's just used for some virtual-3D effects on stereo headphones.
My prediction: it will be the same next gen, simply because the level of sound quality is already so high that most users wouldn't even notice if it got better. And then there is the problem that every sound setup is different.
 

Revenge

Member
Low-latency memory really helps both do better, so performance in the consoles will depend on what speed the Infinity Fabric is running at and on the memory timings.

The PS5 and XSX both use GDDR6, which has a lot of bandwidth but more latency compared to DDR4 on PC.
That's why L3 cache would be very important in consoles. But it also takes up a lot of space in the SoC, so I understand Microsoft choosing to have only 8MB of it instead of 32MB like the 3700X.

The 3700X benefits from the low latency of DDR4 and from having a lot more L3 cache. The new CPUs in the consoles are still a lot stronger than the old Jaguar.
 
Yeah, the point is that for BCPack to be anything more than just BCn (in comparison to a competitor's solution), the decompression needs to happen in a place where the GPU can use the uncompressed BCn data instantly.

I did read/skim through some of the guy's other tweets, and the 64KB block size is mentioned in regard to mipmap pages (for SFS), so it does look like they are intending to use random-access zlib decompression - which removes the need to unpack BCn files fully. That sounded like a hardware solution in one of the tweets; however, he said in another that he's continuing to work on the DXTC, and there were tweets I remember from the next-gen thread where he claimed they could get more compression out of BCPack than the stated figures as they improve the algorithm, which instantly says it is software-based, though probably on the GPU side. He also mentioned multiple code paths - with AVX2 getting a mention, but I'm not sure if that's more about DX on PC.

To be honest, I can't really tell anymore if their strategy is zlib decompression into the 6GB through the CPU, or random-access zlib decompression from the 10GB using SFS on the GPU. There was a tweet saying they can't publicly disclose more but are sharing the info with Xbox developers. My hunch now is that they are using CUs for the BCPack decompression, and using AVX2 for standard zlib decompression into the 6GB for non-texture data.

Sounds about right as to what might be the case; however, I'm not exactly sure the GPU is using CUs for that work. I keep going back to the explicit mention of ARM cores in the APU design from the Indian AMD engineer a good few months back. It was a pretty deliberate mention and rather interesting, since it begs the question of where they're being used.

My guess is they're being utilized in some way for the extended ExecuteIndirect functionality in the GPU; it was actually one of the things I hoped MS would give specifics on, but no dice. Back to the BCn stuff: I do recall the tweet about them working on optimizing the algorithm, so you're right that it's at least partially software-based, just like SFS. However, just like SFS, there's also some dedicated hardware involved - SFS in particular with the mip-blending hardware in the GPU, which has been mentioned several times but was not readily present (or even mentioned) in the Hot Chips presentation or any of the slides.

It's possible their strategy is both things you mention: zlib decompression through the CPU, and random-access zlib decompression through SFS on the GPU. Just keep in mind they still have a dedicated decompressor block, which I would say is actually doing the bulk of the decompression. The CPU and GPU probably just take turns controlling it (the GPU has a DMA block in the diagram; maybe that is for the decompressor?).
 

Redlight

Member
What? :messenger_tears_of_joy:

They're doing the exact same job and getting paid exactly the same. One can only do it with a little less strength.
So we're here now? We can't bring ourselves to say that the Series X is more powerful, so we must redefine our terms so that the PS5 is both 'exactly the same' and simultaneously 'a little less' powerful?

I don't think I want any of that, but thanks. :)
 

DForce

NaughtyDog Defense Force
So we're here now? We can't bring ourselves to say that the Series X is more powerful, so we must redefine our terms so that the PS5 is both 'exactly the same' and simultaneously 'a little less' powerful?

I don't think I want any of that, but thanks. :)

I literally said one can only do it with a little less strength.

Try keeping up.
 

Marlenus

Member
The PS5 and XSX both use GDDR6, which has a lot of bandwidth but more latency compared to DDR4 on PC.

People kept saying that about GDDR5 vs DDR3 in the PS4-vs-Xbox One days, and the memory specification sheets showed the latency was the same on the memory chip itself. We also have Geekbench tests of the PS4 vs Jaguar-based laptops, and memory latency was about the same in both (~120ns). It would be interesting if we could find any tests of that Subor (sp?) Zen-based console that had GDDR5 memory. DF had some hands-on time with it but didn't do any memory latency testing, so boo on them.

We have no empirical evidence via memory-latency tests that GDDR6 has higher inherent latency than DDR4 when paired with the same CPU, so it cannot be categorically stated that GDDR6 has higher latency just because it is GDDR6. I expect it may be higher than 3600 C16, because the consoles will probably stick to JEDEC standards, but I would not be surprised if console memory latency were the same as or lower than JEDEC 3200 C20.
 
From DF’s satellite interview with Cerny:


This was a few minutes' search; it's not like I scoured the web for it, by the way. I think, and have always stated, that Cerny never boasts even when he could twist the numbers to make them seem even more impressive than they are, but that is beside the point.

Unf that Cerny is one sexy mofo 😂😍
 