
UL releases 3DMark Mesh Shaders Feature test, first results of NVIDIA Ampere and AMD RDNA2 GPUs

"Intersection engine" is basically "Ray Accelerator" or "RA". AMD/Sony engineers (maybe even MS engineers) were probably using the term "intersection engine" internally during the design process. Hence why Mark Cerny referred to it as "intersection engine" in his talk back in March 2020.

[Image: AMD "Ray Accelerator" slide]


In late October 2020 it became "Ray Accelerator", which is quite simply AMD's marketing term for the "intersection engine" inside an RDNA 2 CU. It's similar to their marketing term "GameCache" (remember that?), which referred to what was basically just a large L3 cache when the Zen 2 architecture was introduced.
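
For anyone unfamiliar with the term, an "intersection engine" / Ray Accelerator conceptually performs ray-box and ray-triangle tests while walking the BVH. Here is a minimal software sketch of a ray-vs-box slab test, purely for illustration; this is not AMD's actual hardware logic:

```python
# Illustrative ray-vs-AABB "slab" intersection test, the class of check a
# BVH traversal performs millions of times per frame and that RDNA 2's
# Ray Accelerator ("intersection engine") handles in fixed-function hardware.
def ray_hits_aabb(origin, inv_dir, box_min, box_max):
    """Return True if the ray (origin, 1/direction per axis) crosses the box."""
    t_near, t_far = 0.0, float("inf")
    for o, inv_d, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t1, t2 = (lo - o) * inv_d, (hi - o) * inv_d
        if t1 > t2:
            t1, t2 = t2, t1          # order the slab entry/exit distances
        t_near = max(t_near, t1)     # latest entry across the three slabs
        t_far = min(t_far, t2)       # earliest exit across the three slabs
    return t_near <= t_far           # overlapping interval means a hit

# Ray along +X from the origin vs a box spanning x = 5..6.
# 1e9 stands in for "1 / (direction component that is ~0)".
print(ray_hits_aabb((0.0, 0.0, 0.0), (1.0, 1e9, 1e9),
                    (5.0, -0.5, -0.5), (6.0, 0.5, 0.5)))   # True
```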

And then Cerny said this.

Another major new feature of our custom RDNA2 based GPU is Ray-Tracing.



Using the same strategy as AMD's upcoming PC GPUs.

Why are we still arguing about whether the PS5 has RT hardware?
 

FireFly

Member
I don't remember that... Infinity Cache is an external cache; it is not inside the CU.
IPC is related to all GPU parts... AMD and MS were talking about CU performance per clock cycle.

But if you have a source for that, please share it with us.

BTW, if that is true, then RDNA 2 CUs are just RDNA 1 CUs at the architectural level without any improvement... that makes sense too.
I remember there being a general slide about IPC, but I can't find it now. However, here is one that attributes improved performance per clock to the Infinity Cache:


Looking at the PC benchmarks, I don't see any evidence of a big boost in IPC, like the one from GCN to RDNA.
 

SlimySnake

Flashless at the Golden Globes
Please provide a link where an AMD rep states Infinity Cache is the most important feature in RDNA 2.

Secondly, show me an RDNA 2 card with a 320-bit bus like the Xbox Series X has.
Infinity Cache is there to mitigate the lack of bandwidth on a 256-bit bus...
“Introduced with the RDNA 2 architecture, Infinity Cache is a new cache system that operates alongside the GDDR6 memory interface within both the RX 6800 XT and RX 6800. It's a pretty big deal for AMD, too, with its engineers telling us it is the key to unlocking gaming performance from 1080p to 4K where it would otherwise have been saddled with a massive and power-hungry alternative.”


Ask yourself a simple question: why else would AMD spend 6 billion of the 26 billion transistors on the chip on the Infinity Cache instead of on more CUs? That's roughly a quarter of the GPU silicon.
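
Quick back-of-the-envelope check on that ratio, using the same 6 billion / 26 billion figures quoted above (rough, unofficial numbers):

```python
# Share of the die budget spent on Infinity Cache, using the 6B / 26B
# transistor figures from the post above (rough, unofficial estimates).
total_transistors = 26e9
infinity_cache_transistors = 6e9

share = infinity_cache_transistors / total_transistors
print(f"Infinity Cache share of the transistor budget: {share:.0%}")   # ~23%
```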

You can find more quotes by simply googling.
 

Genx3

Member
That's the secret on a 256-bit bus, not on a 320-bit bus.

Anandtech

"On-chip caches for GPU usage are not a new idea, especially for AMD. The company included a 32MB eSRAM cache for the Xbox One (and Xbox One S) SoC, and even before that the Xbox 360 had an on-package eDRAM as well. But this is the first time we’ve seen a large cache on a PC GPU.

Navi 21 will have a 128MB Infinity Cache. Meanwhile AMD isn't speaking about other GPUs, but those will presumably include smaller caches as fast caches eat up a lot of die space."

XSX has a 320-bit bus.
Infinity Cache is basically doing what the 32MB of eSRAM was doing in the original Xbox One.

It's been done before, twice on Xbox no less, but now it's supposedly some new secret performance upgrade.
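
For reference, raw GDDR6 bandwidth is just bus width times per-pin data rate, which is the heart of the 256-bit vs 320-bit argument. The per-pin data rates below are the commonly quoted ones for these parts and are assumptions on my part:

```python
# Raw GDDR6 bandwidth = (bus width / 8 bits per byte) * per-pin data rate.
# Per-pin rates are the commonly quoted figures, used here as assumptions.
def gddr6_bandwidth(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin   # GB/s

print("RX 5700 XT, 256-bit @ 14 Gbps:", gddr6_bandwidth(256, 14), "GB/s")          # 448
print("RX 6800 XT, 256-bit @ 16 Gbps:", gddr6_bandwidth(256, 16), "GB/s")          # 512
print("Series X fast pool, 320-bit @ 14 Gbps:", gddr6_bandwidth(320, 14), "GB/s")  # 560
```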
 

SlimySnake

Flashless at the Golden Globes
No.



Infinity Cache is the main reason why they are able to hit such high performance gains. They tried adding more CUs before with Vega and it blew up in their face. Even Nvidia got severely bottlenecked: the 3080 has roughly 3x the shader cores of the 2080 yet offers only about 80% more performance. They couldn't even get 2x more performance. They literally have 135 CUs' worth of shader cores in the 3080; the RTX 2080 had only 46 CUs' worth of shader processors. A 30 TFLOPS card like the 3080 is performing like a 20 TFLOPS card from AMD because of its lack of Infinity Cache.
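
Rough check of those figures, using Nvidia's published core counts and boost clocks (taken here as assumptions):

```python
# FP32 TFLOPS = 2 (FMA) * shader cores * clock (GHz) / 1000. "CU-equivalents"
# = cores / 64, mirroring the comparison against AMD CUs in the post above.
# Core counts and boost clocks are Nvidia's published specs, assumed here.
def tflops(cores, boost_ghz):
    return 2 * cores * boost_ghz / 1000

for name, cores, boost_ghz in (("RTX 2080", 2944, 1.71), ("RTX 3080", 8704, 1.71)):
    print(f"{name}: {cores // 64} CU-equivalents, {tflops(cores, boost_ghz):.1f} TFLOPS")
# ~46 vs ~136 CU-equivalents and ~10 vs ~30 TFLOPS on paper, yet the
# real-world gap is nowhere near 3x, which is the scaling point being argued.
```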

Infinity Cache is the key going forward; that's why the AMD guys call it the key. And the fact of the matter is that this fancy new feature, which got its own section during the reveal, is not in the XSX. There is a reason why the XSX is struggling to keep up with the PS5 in some games despite its 20% extra TFLOPS. That should never happen on the same architecture of cards. A 5700 could never outperform a 5700 XT, a 6800 could never outperform a 6800 XT, a 3080 can never be outperformed by a 3070. And yet here we are: a so-called RDNA 1.5 card outperforming an RDNA 2.0 card. Maybe, just maybe, the reason for the XSX underperforming is that they needed this Infinity Cache to get the most out of their GPU and skimping on it screwed them.
 

SlimySnake

Flashless at the Golden Globes
there are no tensor cores, but support for INT4, INT8 and FP8
Precisely. It's the most basic level of support. Like you said yourself, Sony had something similar in the Pro with their rapid packed math implementation.

The fact remains that in the second Wired article they mentioned the GPU supporting machine learning. We can see from the XSX implementation how basic GPU features can be used to do ML. I can promise you the PS5 will be using something similar. Maybe the XSX will have better performance because of its additions, but that remains to be seen.

P.S. The PS5 is fully BC with the PS4 Pro, so it must have support for rapid packed math in case games used it, and we know from Horizon and GoW that it was definitely utilized.
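
To make the "basic GPU features used for ML" idea concrete, here is a tiny, generic sketch of INT8 inference math in NumPy. This is illustrative only, not Sony's or Microsoft's actual implementation:

```python
# Minimal sketch of INT8 quantized inference: quantize weights/activations,
# do a cheap integer dot product (the kind of packed INT8/INT4 op the
# consoles' CUs expose), then rescale back to float. Generic illustration.
import numpy as np

def quantize(x, scale):
    """Map float values onto signed 8-bit integers with a per-tensor scale."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
weights = rng.standard_normal(256).astype(np.float32)
activations = rng.standard_normal(256).astype(np.float32)

# Per-tensor scales so the int8 range covers the largest magnitude value.
w_scale = np.abs(weights).max() / 127
a_scale = np.abs(activations).max() / 127
w_q, a_q = quantize(weights, w_scale), quantize(activations, a_scale)

# Integer dot product accumulated in int32, then rescaled to float.
int8_result = int(np.dot(w_q.astype(np.int32), a_q.astype(np.int32))) * w_scale * a_scale
fp32_result = float(np.dot(weights, activations))
print(f"fp32: {fp32_result:.3f}   int8 approx: {int8_result:.3f}")
```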
 

Genx3

Member
It's not a new feature; it's literally eSRAM/eDRAM 2.0.
 

SlimySnake

Flashless at the Golden Globes
It's not a new feature; it's literally eSRAM/eDRAM 2.0.
This is sad. You keep moving the goalposts instead of admitting you were wrong about AMD saying that Infinity Cache is the most important feature of their RDNA 2 cards.

There is no shame in saying "thanks, I missed that" and moving on. No one is going to respect you less for admitting you didn't know all the details.
 

Not in the slightest. It's a big reason for extra sustained performance, because having such a large pool of cache to fall back on before going out to VRAM is a powerful thing, but it is by no means the biggest reason for the performance gains in those chips. The biggest reason is the architectural enhancements of RDNA 2 with many more Compute Units, plus greatly improved clock speeds over what they used to get with GCN. With GCN, clock speed and architectural inferiority were the two biggest reasons for lackluster performance compared to Nvidia.

You could remove that Infinity Cache, toss in some faster RAM, or go with a larger memory bus, and it would still perform every bit as well. What AMD did with Infinity Cache was find a cheaper route than going with more expensive, faster RAM like Nvidia did, or adding a larger, more expensive memory bus, which would also increase the amount of RAM on the GPU. Infinity Cache was about smart cost savings, with the additional uplift of getting some nice extra performance from the bandwidth savings. It isn't the reason for RDNA 2's big performance gains.
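
The bandwidth-savings point boils down to a hit-rate-weighted blend of cache and DRAM bandwidth. A minimal sketch with made-up hit rates and cache bandwidth (AMD's real figures may differ):

```python
# Effective bandwidth as a hit-rate-weighted blend of on-die cache bandwidth
# and GDDR6 bandwidth. Hit rate and cache bandwidth are illustrative
# placeholders, not AMD's published numbers.
def effective_bandwidth(hit_rate, cache_gbs, dram_gbs):
    return hit_rate * cache_gbs + (1 - hit_rate) * dram_gbs

dram = 512.0     # 256-bit GDDR6 @ 16 Gbps
cache = 1600.0   # hypothetical on-die Infinity Cache bandwidth
for hit_rate in (0.0, 0.4, 0.6):
    print(f"hit rate {hit_rate:.0%}: ~{effective_bandwidth(hit_rate, cache, dram):.0f} GB/s effective")
```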

If you believe that about Infinity Cache, then you might as well say: give GCN an Infinity Cache and, just like that, you have a performance match for Ampere. Infinity Cache was a clever way to avoid increasing costs, but the architectural enhancements, which directly enable more work to get done faster, on top of increased clock speeds and a large number of CUs under the new architecture, are the prime reason for RDNA 2's gains.

The idea actually comes from their Ryzen CPUs, and before that there is some influence from Xbox consoles like the 360.
 
It's not a new feature; it's literally eSRAM/eDRAM 2.0.
Yes it is. The only difference is that the eSRAM in the Xbox One didn't work the way the eDRAM did in the Xbox 360. The 360 model is more in line with what AMD did with their Ryzen CPUs, and now with what they're doing in RDNA 2.
 

FireFly

Member
The 3080 is "bottlenecked" because Nvidia only doubled FP32 throughput per SM, not the texture rate, fill rate or integer rate. That's not the same as scaling by increasing the number of SMs/CUs. The XSX has roughly 25% more compute than the 5700 XT, but also 25% more bandwidth, so it shouldn't be bandwidth constrained.
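
Working that ratio through with the commonly cited specs (assumed here, not taken from this thread):

```python
# Bytes of bandwidth available per FLOP of compute; if both scale by the
# same ~25%, the ratio is unchanged. Specs are the commonly cited figures.
gpus = {
    "RX 5700 XT": {"tflops": 9.75, "bandwidth_gbs": 448},
    "Series X":   {"tflops": 12.15, "bandwidth_gbs": 560},
}
for name, g in gpus.items():
    bytes_per_flop = g["bandwidth_gbs"] * 1e9 / (g["tflops"] * 1e12)
    print(f"{name}: {bytes_per_flop:.3f} bytes/FLOP")

print("compute scaling:  ", round(12.15 / 9.75, 2), "x")   # ~1.25x
print("bandwidth scaling:", round(560 / 448, 2), "x")      # 1.25x
```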

On the PC, the Infinity Cache makes sense because AMD is bound by power consumption before die area, and they can afford to increase the die size since they are not as heavily cost-constrained as on a console. So, by improving performance per watt, it lets them build a faster 300W card. That is a big deal, but it doesn't mean either console is somehow terribly unbalanced without it.
 
A 25% boost in performance per CU on Series X compared to the past generation, according to their panels; around 50% on PS5, according to the Road to PS5 video.

I think people are confusing IPC gains and performance-per-watt gains. But that's not the only thing being confused. A performance gain is being automatically assumed to be the same as a perf-per-clock or IPC gain. They can be the same thing, but depending on the reason for the performance gain they can be entirely different.

There are two metrics relating to performance when considering RDNA in any capacity: IPC (instructions per clock) and performance per watt.

RDNA represented a 25% IPC gain over GCN, i.e. a 1.25x gain, meaning that on pure architectural capability alone the RDNA architecture gets that much more work done per clock, before any other factor is introduced.


RDNA also represented a 50% perf-per-watt gain over GCN, meaning that for more or less the same power draw you get 50% better performance out of the GPU compared to the last-gen equivalent. This improvement can be spent on boosting clock speed if desired (power doesn't scale linearly with clock speed increases, so your mileage will vary), or you can achieve equal or better performance than the previous generation at much lower power consumption.
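
To keep the two metrics apart, here is the relationship spelled out with purely hypothetical numbers:

```python
# Per-CU throughput ~ IPC * clock; perf-per-watt divides that by power.
# GPU "B" below is hypothetical: 25% better IPC, 10% higher clock, 5% less
# power than GPU "A". Purely illustrative numbers, not any real product.
ipc_gain, clock_gain, power_gain = 1.25, 1.10, 0.95

perf_gain = ipc_gain * clock_gain
perf_per_watt_gain = perf_gain / power_gain
print(f"perf per CU: {perf_gain:.2f}x   perf per watt: {perf_per_watt_gain:.2f}x")
```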

Based on the numbers we've seen for power usage on Series X during gameplay with a title like Gears 5, it's safe to say that RDNA 2's perf-per-watt advantages over both GCN and RDNA 1 are largely in effect. Otherwise I don't see power draw being that good at 1825MHz.

The only performance (NOT IPC) gains from RDNA 1 to RDNA 2 (another 25%) came entirely from boosting clock speeds an additional 30%. In other words, the core IPC of a compute unit did not change in any meaningful way going from RDNA 1 to RDNA 2.

The only true perf-per-clock or IPC improvement, according to AMD, came from the inclusion of the Infinity Cache, but it isn't as big a contributor to absolute performance over RDNA 1 as the 30% clock speed boost.

Infinity Cache played a key role in how AMD beat their 50% perf-per-watt target for RDNA 2, reaching 54%. That 54% was achieved through a combination of power-efficiency optimizations, Infinity Cache, and the previously mentioned 30% boost in clock speed over RDNA 1.
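
Since those contributions compound multiplicatively, here is a toy decomposition; the individual factors are invented purely for illustration, as AMD only quotes the combined figure:

```python
# Perf-per-watt contributions compound multiplicatively. These per-factor
# numbers are invented for illustration; AMD only quotes the combined ~1.54x.
power_optimizations = 1.10
infinity_cache_effect = 1.08
clock_and_design = 1.30

combined = power_optimizations * infinity_cache_effect * clock_and_design
print(f"combined perf-per-watt gain: ~{combined:.2f}x")   # ~1.54x
```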

Here is the slide showing the big reason for the additional 1.25x gain from RDNA 1 to RDNA 2: it comes from the 30% frequency increase, not from any change to IPC.

[Slide: RDNA 2 performance uplift driven by the ~30% frequency increase]




Anandtech believes the majority of AMD's 54% perf per watt improvement over RDNA 1 came from that big 30% clock speed improvement.


The AnandTech article below and the YouTube video from AMD more or less cover these details.




Microsoft was referring to the IPC gain from Vega (GCN) to RDNA 1, because there were no traditional IPC gains from RDNA 1 to RDNA 2 outside of the small bump from Infinity Cache, and neither console has Infinity Cache.

So from a pure architectural standpoint, Microsoft is correct that the Series X GPU's CUs have 25% better performance per clock on average graphics workloads relative to the GCN generation. Video below.




The only perf-per-clock improvement in RDNA 2 came entirely from Infinity Cache. What makes RDNA 2 better than RDNA 1 is the roughly 30% faster clock speed (which isn't the same as better IPC per CU), improved perf per watt, and the big new features like VRS, hardware-accelerated ray tracing, Sampler Feedback and mesh shaders.


TL;DR VERSION: In absolute performance terms, the Series X GPU gets much better pure performance per Compute Unit than any GCN GPU, and a crap ton more pure performance per Compute Unit than any Xbox One era console, including the X. BUT, leaving out all other factors, such as the notably higher clock speeds and what the new features can do to help, a Series X GPU CU is, on a pure instructions-per-clock basis, 25% better on average graphics workloads compared to GCN (Vega). It isn't the whole picture, but it's a fair measurement of the architectural improvement in the CU. There was no such improvement in raw Compute Unit IPC from RDNA 1 to RDNA 2; the only IPC improvement to speak of comes directly from Infinity Cache, and Infinity Cache isn't responsible for the largest performance uplift in RDNA 2, that's the 30% clock speed improvement.
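
As a worked example of that TL;DR, using the published console GPU clocks and the 1.25x IPC figure discussed above (both treated as assumptions):

```python
# Per-CU throughput ~ IPC * clock. Clocks are the published console GPU
# clocks; the 1.25x factor is the RDNA-vs-GCN IPC figure discussed above.
one_x_clock_mhz = 1172       # Xbox One X GPU
series_x_clock_mhz = 1825    # Xbox Series X GPU
rdna_ipc_vs_gcn = 1.25

clock_only = series_x_clock_mhz / one_x_clock_mhz
per_cu_total = rdna_ipc_vs_gcn * clock_only
print(f"clock alone: ~{clock_only:.2f}x per CU")           # ~1.56x
print(f"with the IPC gain: ~{per_cu_total:.2f}x per CU")   # ~1.95x
```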
 

mitchman

Gold Member
Absolutely yes. The GPU is very likely based on RDNA 1, to which they have added a whole series of customizations to make it similar and very close to RDNA 2. As I have already written, there is too much evidence to prove it. It will be interesting, once the devs exploit all the capabilities of the hardware (2022?), to see the performance differences between the two machines.
Yes, just like the XSX APU: also more like RDNA 1 with 2 features bolted on. See
 
Oh, you are right, that one with the dragon in the room or whatever it was...

Another cool fact about Xbox Series X's mesh shader support is that it is confirmed to go beyond the RDNA 2 spec, beyond even the RX 6000 series GPUs on PC. And the higher Series X spec produces better results.

[Image: Xbox Series X mesh shader spec slide]


And this slide from AMD's own YouTube video on mesh shaders shows that the max on their PC cards is 128. Higher means better vertex sharing.

[Image: AMD mesh shader slide showing a max of 128 on RX 6000]
 

Shmunter

Member
Perhaps that explains why Sony rolled out the programmable Geometry Engine thingy; they were not happy with stock RDNA 2.
 