
Programmer guy: There is significant headroom to improve ray tracing performance on RDNA 2

LordOfChaos

Member
"The TL;DR first: There is significant headroom in RDNA2 Raytracing with efficient coding. I was able to increase the performance of my 6800XT in https://github.com/GPSnoopy/RayTracingInVulkan… by 19% with some small code changes (PR47). These can be summed up as switching to wave32 and reducing VGPRs."
[attached image: benchmark chart]


"With these improvements my (overclocked) RX 6800 is now nearly as fast as the 6900XT, without any changes to the amount of traced rays or the scene."

[attached images: benchmark charts]


"It now outperforms the 2080TI and even the 3080 in scenes 1 and 2, it performs on par with the 2080Ti in scene 3 and is slower in scenes 4 and 5. Optimization matters."

GitHub: https://github.com/GPSnoopy/RayTracingInVulkan
Even more tl;dr: with a bit of hand tuning for better use of RDNA 2 hardware, an improvement akin to stepping up a GPU tier can be had, even trading blows with Nvidia hardware in specific scenes (Nvidia is still faster if you take every difference and average them, but it's a nice boost for the same GPU you already had), and it's really a rather small amount of coding.
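For anyone wondering what "switching to wave32" actually means at the API level: Vulkan exposes the VK_EXT_subgroup_size_control extension for requesting an explicit subgroup (wave) size per shader stage. The sketch below is illustrative only, not the actual PR47 change; the helper name is made up, and drivers only honor the request for stages listed in their requiredSubgroupSizeStages property.

```cpp
#include <vulkan/vulkan.h>

// Hypothetical sketch: request a 32-lane (wave32) subgroup size for a shader
// stage via VK_EXT_subgroup_size_control. The extension and its feature must
// be enabled at device creation; error handling is omitted.

// Kept static so the pNext chain stays valid while the pipeline is created.
static VkPipelineShaderStageRequiredSubgroupSizeCreateInfoEXT g_wave32 = {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO_EXT,
    nullptr,
    32  // requiredSubgroupSize: RDNA 2 supports 32 or 64 lanes per wave
};

VkPipelineShaderStageCreateInfo makeWave32Stage(VkShaderModule module)
{
    VkPipelineShaderStageCreateInfo stage = {};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stage.pNext  = &g_wave32;  // chain the wave32 request onto this stage
    stage.stage  = VK_SHADER_STAGE_RAYGEN_BIT_KHR;
    stage.module = module;
    stage.pName  = "main";
    return stage;
}
```

The VGPR side of the change is shader-level (keeping fewer values live across the trace call) and doesn't have a one-liner API equivalent like this.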
 
Last edited:

VFXVeteran

Banned
"The TL;DR first: There is significant headroom in RDNA2 Raytracing with efficient coding. I was able to increase the performance of my 6800XT in https://github.com/GPSnoopy/RayTracingInVulkan… by 19% with some small code changes (PR47). These can be summed up as switching to wave32 and reducing VGPRs."


"With these improvements my (overclocked) RX 6800 is now nearly as fast as the 6900XT, without any changes to the amount of traced rays or the scene."



"It now outperforms the 2080TI and even the 3080 in scenes 1 and 2, it performs on par with the 2080Ti in scene 3 and is slower in scenes 4 and 5. Optimization matters."

GitHub: https://github.com/GPSnoopy/RayTracingInVulkan

Even more tl;dr: with a bit of hand tuning for better use of RDNA 2 hardware, an improvement akin to stepping up a GPU tier can be had, even matching Nvidia hardware in specific scenes, and it's really a rather small amount of coding.

These scenes... are they entire games? The difference between the 6800XT and the 6900XT wasn't something to write home about. If you can change that kind of performance with just software changes, then my claim is true: the RT is done in software, where the texture units are basically "general purpose" TMUs that can be used for things other than JUST RT, unlike Nvidia's dedicated RT cores.
 
Last edited:
Even more tl;dr: with a bit of hand tuning for better use of RDNA 2 hardware, an improvement akin to stepping up a GPU tier can be had, even trading blows with Nvidia hardware in specific scenes, and it's really a rather small amount of coding.

New hardware performs better after developers have some time to experiment.
This is just normal and expected, especially in this case where the hardware is more flexible and not a black box (compared with the competition).
 

ethomaz

Banned
Shocking news.

AMD hardware works better with a different code path than Nvidia hardware.

That is why coding to specific hardware matters.
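At its bluntest, that vendor-specific dispatch can literally be a branch on the GPU's PCI vendor ID at startup. A hypothetical sketch (the enum and helper are invented names, not from the benchmark's codebase):

```cpp
#include <vulkan/vulkan.h>

// Illustrative only: select a vendor-tuned ray tracing code path by querying
// the physical device. 0x1002 and 0x10DE are the well-known PCI vendor IDs
// for AMD and NVIDIA respectively.
enum class RtCodePath { Generic, AmdTuned, NvidiaTuned };

RtCodePath chooseCodePath(VkPhysicalDevice gpu)
{
    VkPhysicalDeviceProperties props = {};
    vkGetPhysicalDeviceProperties(gpu, &props);

    switch (props.vendorID) {
        case 0x1002: return RtCodePath::AmdTuned;    // e.g. wave32, fewer VGPRs
        case 0x10DE: return RtCodePath::NvidiaTuned; // e.g. lean on RT cores
        default:     return RtCodePath::Generic;
    }
}
```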
 
Eh. The person who made the original said it's not optimized at all, and stated that the scenes closest to an actual video game are the ones the 6900XT is shittiest at.
Obviously gains are good, but did this person try optimizing the code for Nvidia as well? Or just run this optimized code on Nvidia and see what's up?

Taking unoptimized code, optimizing it a bit, then testing it on only one card only proves the code wasn't optimized to begin with.
 
Last edited:

JimboJones

Member
Is it really ever a good thing to make it this difficult to eke out performance from the hardware? It feels like AMD should have had this figured out two years ago.
 

LordOfChaos

Member
Eh. The person who made the original said it's not optimized at all, and stated that the scenes closest to an actual video game are the ones the 6900XT is shittiest at.
Obviously gains are good, but did this person try optimizing the code for Nvidia as well? Or just run this optimized code on Nvidia and see what's up?

Taking unoptimized code, optimizing it a bit, then testing it on only one card only proves the code wasn't optimized to begin with.

I think the other post above is correct: RTX is a ready-made black box, with the good and bad that comes with that, and it's already optimized for Nvidia hardware. AMD's solution is more general and software reliant, so games that just carry over the market leader's assumptions aren't doing as well on it. It's still not going to be faster than RTX even with optimization, but it can be made faster.
 
I think the other post above is correct: RTX is a ready-made black box, with the good and bad that comes with that, and it's already optimized for Nvidia hardware. AMD's solution is more general and software reliant, so games that just carry over the market leader's assumptions aren't doing as well on it. It's still not going to be faster than RTX even with optimization, but it can be made faster.
From original GitHub

When looking at the benchmark results of an RTX 2070 and an RTX 2080 Ti, the performance differences are mostly in line with the number of CUDA cores and RT cores rather than being influenced by other metrics. Although I do not know at this point whether the CUDA cores or the RT cores are the main bottleneck.

UPDATE 2021-01-07: the RTX 30xx results seem to imply that performance is mostly dictated by the number of RT cores. Compared to Turing, Ampere achieves 2x RT performance only when using ray-triangle intersection (as expected per the NVIDIA Ampere whitepaper); otherwise performance per RT core is the same. This leads to situations such as an RTX 2080 Ti being faster than an RTX 3080 when using procedural geometry.

UPDATE 2021-01-31: the 6900 XT results show the RDNA 2 architecture performing surprisingly well in procedural geometry scenes. Is it because the RDNA 2 BVH-ray intersections are done using the generic compute units (and there are plenty of those), whereas Ampere is bottlenecked by its small number of RT cores in these simple scenes? Or is RDNA 2's Infinity Cache really shining here? The triangle-based geometry scenes highlight how efficient Ampere's RT cores are at handling triangle-ray intersections; unsurprisingly, as these scenes are more representative of what video games do in practice.
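For readers unfamiliar with the procedural vs. triangle split that README quote keeps returning to: in Vulkan's KHR ray tracing API, a bottom-level acceleration structure is built from either triangle geometry (which Ampere's RT cores intersect in fixed-function hardware) or procedural AABBs, whose hits are resolved by an intersection shader. A rough sketch, with buffer addresses as placeholders and fields like maxVertex omitted for brevity:

```cpp
#include <vulkan/vulkan.h>

// Triangle geometry: the path Ampere's fixed-function RT cores accelerate.
VkAccelerationStructureGeometryKHR makeTriangleGeometry(VkDeviceAddress vtx,
                                                        VkDeviceAddress idx)
{
    VkAccelerationStructureGeometryKHR geom = {};
    geom.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR;
    geom.geometryType = VK_GEOMETRY_TYPE_TRIANGLES_KHR;
    geom.geometry.triangles.sType =
        VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_TRIANGLES_DATA_KHR;
    geom.geometry.triangles.vertexFormat = VK_FORMAT_R32G32B32_SFLOAT;
    geom.geometry.triangles.vertexData.deviceAddress = vtx;
    geom.geometry.triangles.vertexStride = 3 * sizeof(float);
    geom.geometry.triangles.indexType = VK_INDEX_TYPE_UINT32;
    geom.geometry.triangles.indexData.deviceAddress = idx;
    return geom;
}

// Procedural geometry: only AABBs go into the BVH; the actual hit is computed
// by an intersection shader, which is where generic compute throughput helps.
VkAccelerationStructureGeometryKHR makeProceduralGeometry(VkDeviceAddress aabbs)
{
    VkAccelerationStructureGeometryKHR geom = {};
    geom.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR;
    geom.geometryType = VK_GEOMETRY_TYPE_AABBS_KHR;
    geom.geometry.aabbs.sType =
        VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_AABBS_DATA_KHR;
    geom.geometry.aabbs.data.deviceAddress = aabbs;
    geom.geometry.aabbs.stride = sizeof(VkAabbPositionsKHR);
    return geom;
}
```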

Edit
This isn't really a benchmark. It's a guy following along with an RT tutorial. The code is not optimized to begin with.
I do not think the OP is correct in the tl;dr.
 
Last edited:

M1chl

Currently Gif and Meme Champion
Man, graphics programming is hard when you've been away from it. I've been studying the code for the past hour and I'm struggling to understand what they changed (like, what does it do?).
 
Man, graphics programming is hard when you've been away from it. I've been studying the code for the past hour and I'm struggling to understand what they changed (like, what does it do?).
All I know is that wave 32 and 64 can be audio formats, because I've had a long-time interest in audio formats, and VGPRs are Vector General-Purpose Registers, according to Google.

Did that help haha

Edit
VFXVeteran knows their shit.
What they said.
 
Last edited:

M1chl

Currently Gif and Meme Champion
All I know is that wave 32 and 64 can be audio formats, because I've had a long-time interest in audio formats, and VGPRs are Vector General-Purpose Registers, according to Google.

Did that help haha

Edit
VFXVeteran knows their shit.
What they said.
Well, I'm somewhat of a music producer, so I've come across wave 32 and 64; however, I never really looked at the structure of that "variable"... I helped with KC:D, though I wasn't a dedicated coder there. I saw some shit, wrote some shit, and so on. But this is too much for right now, especially when I suddenly became a year older overnight :messenger_pensive:
 

thelastword

Banned
Well, it was clear as day, seeing how RDNA 2 performs with ray-traced shadows... Nvidia has been at it for two generations of cards, and they've been GameWorks-ing the living proprietary daylights out of Control especially. But as more RT games emerge, developers will begin to take advantage of AMD hardware instead of just porting Nvidia GameWorks RTX over to RDNA 2...
 

VFXVeteran

Banned
Well, I'm somewhat of a music producer, so I've come across wave 32 and 64; however, I never really looked at the structure of that "variable"... I helped with KC:D, though I wasn't a dedicated coder there. I saw some shit, wrote some shit, and so on. But this is too much for right now, especially when I suddenly became a year older overnight :messenger_pensive:

Dude, don't feel bad. I had YEARS to learn this stuff. Looking at that code is boring for me because I was doing it back when RT wasn't even on the horizon for GPUs to have hardware-accelerated cores. Film has always been the leader in this stuff. Peter Shirley is one of the wizards who made cool shit back in the day. Ken Perlin, Darwin Peachey, etc. All of them were doing this offline for years and years. A typical C++ coder is not going to understand this stuff unless they've worked on a lot of math and algorithms before. Cheer up! And it's a software change. I'm sure game company software engineers have already done more optimizations than this to their code.
 