
DLSS and Path Tracing could force console makers to go with NVIDIA in the future

Could sufficient advances in DLSS and Path Tracing support bring Sony to NVIDIA?

  • No, the competition eventually catches up and offers similar value.

  • No, the console market just doesn't care enough to pay the price.

  • Yes, they corner the market by subsidising the chip and outvaluing the competition.

  • Yes, the difference will become even larger and consumers will pay for it in the end.



Three

Member
Nvidia tries to gouge manufacturers as much as it tries to gouge its customers on GPU prices. Console manufacturers have been burnt one too many times to go there again.
 

SmokedMeat

Gamer™
Guess you can.
The discussion was about "RT" and "RT performance not mattering" according to you with reference to Lumen...which ironically relies on hardware RT.

Looks like it doesn’t, Schmendrick. Look at the benchmarks we have so far, Schmendrick. Neck and neck, with a 7900 XTX actually pulling slightly ahead of the 4080 that’s a good $300 more.

I’m not going to continue when you’ve got no Lumen receipts, Schmendrick. I’m about real world results, not speculation and hot air.
 
Nintendo is already with Nvidia. Sony and Microsoft care more about getting the cheapest price on hardware components, and AMD can consistently undercut Nvidia there.
 
Last edited:

Schmendrick

Member
Looks like it doesn’t, Schmendrick. Look at the benchmarks we have so far, Schmendrick. Neck and neck, with a 7900 XTX actually pulling slightly ahead of the 4080 that’s a good $300 more.

I’m not going to continue when you’ve got no Lumen receipts, Schmendrick. I’m about real world results, not speculation and hot air.
"Real world results" translating to "only those benchmarks you like" because obviously the last few hundred benchmarks with the heaviest possible RT load available today on consumer`s private rigs don't count, and ofc the 10% difference in light RT loads even according to your personally selected benchmarks count as "neck and neck".

You'd stand under a blue sky and still claim it's red.

It doesn’t rely on RT hardware, you hardhead! Lol 😂 you can use Lumen perfectly fine on a GPU with no hardware RT.
If you're fine with abstracted geometry fields and inaccurate lighting while it still slams your performance, yes.
Dynamic and halfway accurate => hardware RT, no way around it.
 
Last edited:

nemiroff

Gold Member
Console makers be like: Hey AMD, what's going on with RT, this isn't looking too good for us right now. We have to use inline RT in cutscenes and pretend we can do RT.
AMD: Hey don't worry, we'll catch up next generation. And we're cheap!
Console makers: Ah, ok. You kinda told us that last time.. But ok, makes sense..
Next generation: No catchupped.
AMD: We're so awesome!
Console makers: Well, we already have a few reflections.., maybe we can add a bit of shadows this time and call it Path Tracing..?
 
Last edited:

JCK75

Member
I prefer FSR simply because it works across so many more platforms.
No DLSS on my Nvidia GTX 1080 Ti, but FSR works great on it.
 

lyan

Member
Console makers be like: Hey AMD, what's going on with RT, this isn't looking too good for us right now. We have to use inline RT in cutscenes and pretend we can do RT.
AMD: Hey don't worry, we'll catch up next generation. And we're cheap!
Console makers: Ah, ok. You kinda told us that last time.. But ok, makes sense..
Next generation: No catchupped.
AMD: We're so awesome!
Console makers: Well, we already have a few reflections.., maybe we can add a bit of shadows this time and call it Path Tracing..?
To be fair, a console with Nvidia would use an xx60-class card equivalent, which I wouldn't call RT-ready.
 

Buggy Loop

Member
Console makers be like: Hey AMD, what's going on with RT, this isn't looking too good for us right now. We have to use inline RT in cutscenes and pretend we can do RT.
AMD: Hey don't worry, we'll catch up next generation. And we're cheap!
Console makers: Ah, ok. You kinda told us that last time.. But ok, makes sense..
Next generation: No catchupped.
AMD: We're so awesome!
Console makers: Well, we already have a few reflections.., maybe we can add a bit of shadows this time and call it Path Tracing..?

We shall name it… r/AyyMD
 

mrcroket

Member
"Real world results" translating to "only those benchmarks you like" because obviously the last few hundred benchmarks with the heaviest possible RT load available today on consumer`s private rigs don't count, and ofc the 10% difference in light RT loads even according to your personally selected benchmarks count as "neck and neck".

You'd stand under a blue sky and still claim it's red.


If you're fine with abstracted geometry fields and inaccurate lighting while it still slams your performance, yes.
Dynamic and halfway accurate => hardware RT, no way around it.
Cyberpunk 2077 Overdrive and Portal RTX are two Nvidia-sponsored games that have made an effort to look and perform well on Nvidia by leveraging its strengths. But the reality is that in games that use Unreal Engine 5 with hardware Lumen (which is about 90% of the games to be released in the coming years), the difference in performance is much smaller.

And it is not the first time this has happened; the same thing happened with hardware PhysX, which only performed well on Nvidia graphics cards, in games sponsored by Nvidia.
 
In five years' time, when the next consoles are due out, the tech that AMD has might be very different.
I think that DLSS-type ML upscaling might well be a thing on future AMD products.
There was the rumour of a now-cancelled Xbox project with AMD where they had a big ML block on the APU, so maybe it's something down the pipeline.
 

Schmendrick

Member
Cyberpunk 2077 Overdrive and Portal RTX are two Nvidia-sponsored games that have made an effort to look and perform well on Nvidia by leveraging its strengths. But the reality is that in games that use Unreal Engine 5 with hardware Lumen (which is about 90% of the games to be released in the coming years), the difference in performance is much smaller.

And it is not the first time this has happened; the same thing happened with hardware PhysX, which only performed well on Nvidia graphics cards, in games sponsored by Nvidia.
We have ONE game that uses Lumen with the RT sliders nearly at zero... and we have a few dozen available games where the gap between AMD and Nvidia widens almost linearly with every RT feature you activate, documented in hundreds of review benchmarks right at launch, if you wish to ignore the Nvidia-sponsored CP2077 and Portal. The 7900 XTX's RT competence is about RTX 3090 level, or in other terms, a good gen behind. And that is before factoring in tech like ReSTIR, the quality difference between DLSS and FSR, etc.
AMD's saving grace is the big equalizer, namely the weak budget hardware in the consoles, which is the baseline for development and basically damns RT to stay in its little niche until at least next gen, except for sponsored outliers.

And comparing a general tech like RT to a highly proprietary solution like PhysX is absolutely ridiculous.
 
Last edited:

Buggy Loop

Member
We have ONE game that uses Lumen with the RT sliders nearly at zero... and we have a few dozen available games where the gap between AMD and Nvidia widens almost linearly with every RT feature you activate, documented in hundreds of review benchmarks right at launch, if you wish to ignore the Nvidia-sponsored CP2077 and Portal. The 7900 XTX's RT competence is about RTX 3090 level, or in other terms, a good gen behind. And that is before factoring in tech like ReSTIR, the quality difference between DLSS and FSR, etc.
AMD's saving grace is the big equalizer, namely the weak budget hardware in the consoles, which is the baseline for development and basically damns RT to stay in its little niche until at least next gen, except for sponsored outliers.

And comparing a general tech like RT to a highly proprietary solution like PhysX is absolutely ridiculous.

It's not even fair to call it a gen behind, because it's worse than that. It brute-forces its way with more cores, and the higher baseline rasterization is really what's keeping it even "a gen behind". On an RT performance-per-core basis, I don't think they've even surpassed Turing, not when you enable RT beyond just shadows.

The hybrid RT pipeline won't survive path tracing advances in the coming years. It was made specifically for inline ray tracing, and that lasted a whopping one game (not even AMD-sponsored). As soon as you have too many dynamic shaders you choke it, and it becomes a worse performer than DXR 1.0.

Ray tracing hardware sharing resources with the texture units and compute is a dead end. It's probably why Microsoft's intensive ML research can't even be implemented on Xbox so far, because you're probably slowing down the texture units too much for what the ML would bring.

For AMD, ideally they continue the MCM route and maybe add dedicated RT/ML modules alongside the GCD, with 3D-stacked memory on top for quick transfers.

Their hybrid RT patent was meant to simplify the silicon and save area for more rasterization... but Nvidia basically managed to keep up in rasterization AND has ~25% of its silicon dedicated to RT & ML. Their solution seemed sound initially, but it fizzled out. The advantage is clearly not there.
 

poppabk

Cheeks Spread for Digital Only Future
The performance of cards that require several hundred watts and are about the size of a console just for the GPU isn't really that indicative of where Nvidia is on a console-compatible system. The DLSS/FSR difference exists, but at the expense of extra hardware, and I just don't see the console manufacturers switching the entire architecture for slightly less flickering and artifacts that 90% of people won't even notice.
 
People obviously have no clue what they are talking about.

If they chose to go with Nvidia, they would also have to find someone for the CPU.

That would mean a separate GPU and CPU. This will never happen because it would raise the price of the console drastically.
 
Last edited:

Buggy Loop

Member
People obviously have no clue what they are talking about.

If they chose to go with Nvidia, they would also have to find someone for the CPU.

That would mean a separate GPU and CPU. This will never happen because it would raise the price of the console drastically.

Nvidia can handle the CPU with ARM. Check any Tegra, or the Grace superchip.

I don't think Nvidia entering the console market next gen is happening (outside Nintendo), but making an APU isn't a huge hurdle for them if the R&D budget for one were approved for a next-gen console.
 

poppabk

Cheeks Spread for Digital Only Future
Nvidia can handle the CPU with ARM. Check any Tegra, or the Grace superchip.

I don't think Nvidia entering the console market next gen is happening (outside Nintendo), but making an APU isn't a huge hurdle for them if the R&D budget for one were approved for a next-gen console.
But then they are the ones generations behind - and is it even worth it for anyone, console manufacturers or Nvidia, at that point?
 

Loxus

Member
It's not even fair to call it a gen behind, because it's worse than that. It brute-forces its way with more cores, and the higher baseline rasterization is really what's keeping it even "a gen behind". On an RT performance-per-core basis, I don't think they've even surpassed Turing, not when you enable RT beyond just shadows.

The hybrid RT pipeline won't survive path tracing advances in the coming years. It was made specifically for inline ray tracing, and that lasted a whopping one game (not even AMD-sponsored). As soon as you have too many dynamic shaders you choke it, and it becomes a worse performer than DXR 1.0.

Ray tracing hardware sharing resources with the texture units and compute is a dead end. It's probably why Microsoft's intensive ML research can't even be implemented on Xbox so far, because you're probably slowing down the texture units too much for what the ML would bring.

For AMD, ideally they continue the MCM route and maybe add dedicated RT/ML modules alongside the GCD, with 3D-stacked memory on top for quick transfers.

Their hybrid RT patent was meant to simplify the silicon and save area for more rasterization... but Nvidia basically managed to keep up in rasterization AND has ~25% of its silicon dedicated to RT & ML. Their solution seemed sound initially, but it fizzled out. The advantage is clearly not there.
AMD RDNA3 ML & RT is just as dedicated as Nvidia's and laid out the same way.


With a closer look, you can see the layout of an Nvidia 3000 series SM vs AMD RDNA 2 CU.


And yes, it's one gen behind, as AMD's current best matches Nvidia's last-gen best.


Quit spreading false information.
 

winjer

Gold Member
AMD RDNA3 ML & RT is just as dedicated as Nvidia's and laid out the same way.

They are not the same.
AMD's ML acceleration is just an instruction called WMMA that performs GEMMs on a normal vector unit on the GPU. It's not a dedicated unit, unlike Nvidia's Tensor Cores.


In the case of ray tracing, AMD is still doing RT in the TMUs, and the BVH traversal is still done in shaders.
There have been improvements in RDNA3, like LDS instructions to improve BVH traversal, bigger caches with lower latency, and increased vector register file capacity that allows more rays to be kept in flight.
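For context on what "BVH traversal done in shaders" means in practice: the ray walks the acceleration structure with ordinary ALU code and a small explicit stack, rather than a fixed-function traversal unit. A minimal CUDA-style sketch of that loop (hypothetical BVHNode/Ray layouts, box tests only, triangle intersection omitted; an illustration of the technique, not AMD's actual implementation):

```cuda
#include <cuda_runtime.h>

// Hypothetical, simplified node layout: internal nodes store child indices,
// leaves store a triangle index. Real implementations pack 4-wide boxes per node.
struct BVHNode {
    float3 bmin, bmax;   // axis-aligned bounding box
    int    left, right;  // child node indices, or -1 if this node is a leaf
    int    triIndex;     // valid only for leaves
};

struct Ray { float3 origin, invDir; float tMax; };  // invDir = 1/direction, precomputed

__device__ bool slabTest(const BVHNode& n, const Ray& r) {
    // Standard slab test of the ray against the node's AABB (a "box test").
    float tx1 = (n.bmin.x - r.origin.x) * r.invDir.x, tx2 = (n.bmax.x - r.origin.x) * r.invDir.x;
    float ty1 = (n.bmin.y - r.origin.y) * r.invDir.y, ty2 = (n.bmax.y - r.origin.y) * r.invDir.y;
    float tz1 = (n.bmin.z - r.origin.z) * r.invDir.z, tz2 = (n.bmax.z - r.origin.z) * r.invDir.z;
    float tmin = fmaxf(fmaxf(fminf(tx1, tx2), fminf(ty1, ty2)), fminf(tz1, tz2));
    float tmax = fminf(fminf(fmaxf(tx1, tx2), fmaxf(ty1, ty2)), fmaxf(tz1, tz2));
    return tmax >= fmaxf(tmin, 0.0f) && tmin < r.tMax;
}

// Iterative traversal with an explicit stack. On RDNA that stack lives in
// LDS/scratch, which is why LDS latency shows up in RT performance.
__device__ int traverse(const BVHNode* nodes, Ray ray) {
    int stack[32];                                 // traversal stack (fixed size for the sketch)
    int sp = 0, hitTri = -1;
    stack[sp++] = 0;                               // push the root node
    while (sp > 0) {
        const BVHNode node = nodes[stack[--sp]];   // pop
        if (!slabTest(node, ray)) continue;        // missed the box: prune this subtree
        if (node.left < 0) hitTri = node.triIndex; // leaf: triangle test would go here
        else { stack[sp++] = node.left; stack[sp++] = node.right; }  // internal: push children
    }
    return hitTri;
}

__global__ void trace(const BVHNode* nodes, const Ray* rays, int* hits, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) hits[i] = traverse(nodes, rays[i]);
}
```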
 

Otre

Banned
When FSR 2 exists? When Nvidia sell everything at an inflated price? When console gamers dont notice nor care for most quality drops like framerate or resolution? Nah. They will go for the cheapest option.
 

Loxus

Member
They are not the same.
AMD's ML acceleration is just an instruction called WMMA that performs GEMMs on a normal vector unit on the GPU. It's not a dedicated unit, unlike Nvidia's Tensor Cores.


In the case of ray tracing, AMD is still doing RT in the TMUs, and the BVH traversal is still done in shaders.
There have been improvements in RDNA3, like LDS instructions to improve BVH traversal, bigger caches with lower latency, and increased vector register file capacity that allows more rays to be kept in flight.

These are separate from what you are talking about.
Here you can see the AI Matrix Accelerators are different from the ones you are referring to (Matrix SIMD32).


Not to mention, from my understanding, the Tensor Cores share the same schedulers, registers, and other resources, similar to AMD.


It's obvious AMD's RT implementation is different from Nvidia's, and that's one of the reasons Nvidia performs better: devs optimize for Nvidia.
It's not that AMD's RT implementation is bad.

My point is that people claim Nvidia has separate blocks for ML and RT compared to AMD, but as you can see, that's not the case.
 
Last edited:

winjer

Gold Member

You posted a diagram where the AI Matrix Accelerator is inside the vector unit, which in turn is part of the Compute Unit.
It's just a set of instructions to accelerate GEMMs. It's not a full Tensor Core, like Nvidia or Intel have.
 

Loxus

Member
You posted a diagram where the AI Matrix Accelerator is inside the vector unit, which in turn is part of the Compute Unit.
It's just a set of instructions to accelerate GEMMs. It's not a full Tensor Core, like Nvidia or Intel have.
Did you read this?
 

Crayon

Member
My guess is that Nvidia has too much leverage for MS and Sony to want to deal with them. Not to mention these feature differences are not especially pronounced on screen. DLSS looks reliably, but only nominally, better than other upscalers. With most games, I'm hardly able to pick out RTX on/off in side-by-sides, and I know what I'm supposed to be looking for. Normies wouldn't know in a million years, with reflections being the possible exception.

They are staying AMD. Those MCM chips are going to get real cheap when they figure out how to stitch two graphics dies together.
 
Last edited:

MikeM

Member
AMD’s APUs are getting better and better. No reason to back out from them now.

By the time RT becomes standard, everything will be able to run it well.
 

flying_sq

Member
TBH, I'd like to see the Nintendo and Nvidia partnership keep going. It'd be cool to see what Nvidia could do if they really focused on Tegra-like chips. I think there is a market for them in TV processors, as a MediaTek competitor, and as long as there are no legal issues, put the same chip as the next Switch in TVs and play third-party games that are on Switch level natively on your TV; just get a controller or mouse and keyboard. Or, if TV manufacturers were worried about price, another Shield Pro, but with major publisher support.
 

Puscifer

Gold Member
It doesn’t rely on RT hardware, you hardhead! Lol 😂 you can use Lumen perfectly fine on a GPU with no hardware RT.
I hate when people laugh-react anything that isn't positive for Nvidia. The 7900 XT and XTX are within such a margin of the 4080 and 4090 in everything but RT that Nvidia should be damn embarrassed. Without RT, anyone who cares about it like me would feel like an idiot buying this card. Seriously, if you aren't running RT to get increased performance out of your 4080/4090, then you played yourself and should've gotten a 7900 XT/XTX.
 
Last edited:

OCASM

Banned
I hate when people laugh-react anything that isn't positive for Nvidia. The 7900 XT and XTX are within such a margin of the 4080 and 4090 in everything but RT that Nvidia should be damn embarrassed. Without RT, anyone who cares about it like me would feel like an idiot buying this card. Seriously, if you aren't running RT to get increased performance out of your 4080/4090, then you played yourself and should've gotten a 7900 XT/XTX.
Nah, image reconstruction is also inferior on AMD cards.
 

Del_X

Member
If AMD can’t close the gap, then I could see it. I think these SoCs will get close enough. I can also easily see next-gen consoles starting at $699 or $799 and actually dropping a couple hundred in the first few years. Unlike this gen, the APU will probably be much larger and benefit from die shrinks or other advancements.


It’s not just going to be ray tracing and image reconstruction, there will be a baseline level of ML cores to assist with physics, AI, and animation.

Another thing - wages are up about 20% in many regions since 2017. They’ll likely be about double 2013 levels in real terms by 2028 (even if inflation comes down a bit). $699 will be $399 from 2013 in real terms.
 

Anchovie123

Member
No. Whatever advantages Nvidia has over AMD won't be worth trading Zen CPU cores for. Just have to pray that AMD can get their GPU architecture as close to Nvidia's as possible, and by the time next-gen consoles launch (2028?) they should have competent solutions for path tracing and AI. FSR should also see big improvements.

There is the question of ARM + Nvidia, but I don't think it would be worth the insane hassle to switch to a new architecture; it would also probably break backwards compatibility.
 

winjer

Gold Member
Dissecting Tensor Cores via Microbenchmarks

Dual Compute Unit (DCU) = Streaming Multiprocessors (SM).

As you can see, the CUDA cores (FP/INT) and the Tensor Cores share the same scheduler, register file and shared memory in the same manner as AMD SIMD and AI Matrix Accelerator.

It can't get any more clear than this man, like come on.

Yet, Tensor Cores are still a separate unit inside the SM, unlike AMD, which is using the vector units and WMMA instructions.
Not only are Nvidia's Tensor Cores more complete in capabilities and features, they can also be used while the shader units are busy.
On RDNA3, the vector unit is either calculating graphics operations or calculating matrices.
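To make the contrast concrete, this is roughly what driving the matrix unit looks like on the Nvidia side: CUDA's warp-level wmma API has one warp cooperatively issue a 16x16x16 multiply-accumulate to the Tensor Core. A minimal single-tile sketch (FP16 inputs, FP32 accumulator; device buffer setup omitted). RDNA 3's WMMA instructions are exposed at a similar wave-wide granularity via compiler intrinsics; the point being debated above is what hardware actually executes them.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes C = A * B + C for a single 16x16x16 tile.
// Requires a Tensor Core capable GPU (sm_70+), e.g. nvcc -arch=sm_70.
__global__ void wmma_tile(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

    wmma::fill_fragment(cFrag, 0.0f);            // start the accumulator at zero
    wmma::load_matrix_sync(aFrag, a, 16);        // the whole warp loads the A tile...
    wmma::load_matrix_sync(bFrag, b, 16);        // ...and the B tile (leading dimension 16)
    wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);  // one warp-wide matrix multiply-accumulate
    wmma::store_matrix_sync(c, cFrag, 16, wmma::mem_row_major);
}
// Launch with exactly one warp: wmma_tile<<<1, 32>>>(dA, dB, dC);
```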
 
Last edited:

Loxus

Member
Yet, Tensor Cores are still a separate unit inside the SM, unlike AMD, which is using the vector units and WMMA instructions.
Not only are Nvidia's Tensor Cores more complete in capabilities and features, they can also be used while the shader units are busy.
On RDNA3, the vector unit is either calculating graphics operations or calculating matrices.
I have no idea why you ignore even AMD.

There are two different Matrix blocks within RDNA3.
One is dual-purpose with the SIMD:
4x 32 FP32/INT32/Matrix stream processors.

The other is dedicated:
4x AI Matrix Accelerators.
Even here, you can see there are two Matrix blocks.

Matrix SIMD32, which can either be Float, INT or Matrix.
AI Matrix Accelerator, which is a dedicated unit and separate from the Dual Issue Stream Processors.



AMD plans to use the AI Matrix Accelerators for more than just image processing while playing a game.
How do you expect the game to work if all the units are utilized for Matrix operations?

AMD won't have users 'paying for features they never use' when it comes to AI in GPUs
However, the new approach with RDNA 3 has been to implement AI, with the new AI Matrix Accelerator block inside the Navi 31 GPU, but only where it's really needed.

"We are focused on including the specs that users want and need to give them enjoyment in consumer GPUs. Otherwise, users are paying for features they never use."

"Even if AI is used for image processing, AI should be in charge of more advanced processing," says Wang. The plan is to ensure any AI tech AMD brings to the table isn't limited to image processing."

The use of AI to empower game NPCs is something we hear a lot right now, and admittedly does sound like a good use for AI acceleration beyond just enhancing game visuals.



AMD themselves said the AI Matrix Accelerators are dedicated. If you don't even believe AMD, it doesn't make sense to explain anything to you anymore.
 

Mahavastu

Member
You should take into account that a 10 TF machine shouldn't be as huge as the PS5 is, but AMD is not really efficient considering their better node.

I guess that after the noise complaints about the PS4, Sony went overboard with the cooling, hence the size of the PS5. There were variants with much smaller cooling, which would already have allowed the PS5's size to be reduced with the original chips.

I guess the 4090 might actually be close to what we can expect for the PS6.
I assume the PS6 will have a somewhat faster GPU, maybe 20-50% better than a 4090. I also expect AMD to have better ray tracing by then compared to Nvidia today; after all, we are talking about 2-3 GPU architecture generations until then, one of which is probably major. I also expect AMD to have a fast hardware upscaling solution comparable to DLSS 3, which is more than "good enough".

Anyone thinking that Chinese-engineered console parts could come with next gen?
No, I guess it is too early.
Things in this space do not happen that fast, and this is absolute high end. TSMC is currently the only company on that level, with Samsung a distant number two. Even huge companies like Intel and GlobalFoundries have problems competing with TSMC.
Even if China were able to do all the chip manufacturing, it would probably not be as advanced, good, and mature as TSMC's by then, and thus a few shrinks behind, with lower yields and whatnot. Not good if you want to make a huge die and ship over 100 million consoles.

Anyway, by the PS7 this might have changed. With the new sanctions against the Chinese chip industry, the Chinese are pretty much forced to invest in and develop their own technology. And once the Chinese are there, they will improve fast and will beat the competition in innovation, price, and maybe even quality.
A few weeks ago I read that Huawei had some major results with (still outdated) 14nm technology, about two years after losing access to TSMC as their manufacturer. They are fast!
 

winjer

Gold Member
I have no idea why you ignore even AMD.

There are two different Matrix blocks within RDNA3.
One is dual-purpose with the SIMD:
4x 32 FP32/INT32/Matrix stream processors.

The other is dedicated:
4x AI Matrix Accelerators.
Even here, you can see there are two Matrix blocks.

Matrix SIMD32, which can either be Float, INT or Matrix.
AI Matrix Accelerator, which is a dedicated unit and separate from the Dual Issue Stream Processors.



AMD plans to use the AI Matrix Accelerators for more than just image processing while playing a game.
How do you expect the game to work if all the units are utilized for Matrix operations?

AMD won't have users 'paying for features they never use' when it comes to AI in GPUs
However, the new approach with RDNA 3 has been to implement AI, with the new AI Matrix Accelerator block inside the Navi 31 GPU, but only where it's really needed.

"We are focused on including the specs that users want and need to give them enjoyment in consumer GPUs. Otherwise, users are paying for features they never use."

"Even if AI is used for image processing, AI should be in charge of more advanced processing," says Wang. The plan is to ensure any AI tech AMD brings to the table isn't limited to image processing."

The use of AI to empower game NPCs is something we hear a lot right now, and admittedly does sound like a good use for AI acceleration beyond just enhancing game visuals.



AMD themselves said the AI Matrix Accelerators are dedicated. If you don't even believe AMD, it doesn't make sense to explain anything to you anymore.

You are literally posting an image from AMD that shows the AI Matrix Accelerator inside the vector units.
What part don't you understand? It is not a dedicated unit, just a set of instructions to accelerate matrices, executed on the shader units.
It's called WMMA and it's well documented by AMD.

 

StereoVsn

Member
This thread is crazy. The chance of MS and Sony dropping a straightforward x86 SoC solution with great CPU performance and good-enough GPU performance for an ARM/Nvidia option is basically zero.

Between price, performance, backward compatibility options, and results being "good enough", they would be crazy to jump to Nvidia, especially after Nvidia's previous shenanigans.

🤦‍♀️🤦‍♂️🤡
 

Buggy Loop

Member
Holy crap, STOP spamming the same goddamn images over and over..

The asynchronous barrier works on a subset of CUDA threads within the block, while previous architectures were working at a whole-warp/block level. RDNA 2's synchronization barrier prevents the GPU from using async work to keep execution units busy while the BVH is in the compute queue. This has very poor occupancy when it happens and is latency-bound. L2 latency is likely to be a limiting factor when building the BVH, because when occupancy is poor, L0/L1 cache hit rates are also poor. This has been micro-benchmarked.

You can basically overlap with other execution in the SM. Nvidia has a different pathway to do any sort of INT8 computing concurrently with the FP32+FP32 or INT32.

There's a single memory path dedicated to textures and another for memory loads. EACH of the warp schedulers per SM is capable of dispatching two warp instructions per clock cycle, and the warp scheduler issues the Tensor Core instruction in one cycle.

In the meantime, while RDNA 3 introduced Volta-era matrix cores, consoles are left with barely anything worthwhile.

And it's one thing to have a feature in a block diagram, but without the pathways, or given the fundamental impacts of those decisions (such as RT choking the pipeline when things aren't all lined up in a nice little queue in the right order), you'll be left wondering why consoles have to have quarter-resolution reflections and no GI.

An example:

Shaders alone



ML alone



So far so good! What a beast!




Combine everything…





AMD's RT solution is already super heavy on cache, latency, and queue ordering. Sprinkling ML on top of that plus the shader pipeline is a big no-no as of now.
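A side note on the "asynchronous barrier" mentioned above: since Ampere, CUDA exposes a split arrive/wait barrier, so a thread can signal that its share of the data is ready, keep doing unrelated math, and only block when it actually needs everyone else's results. A rough sketch of the pattern (the produce_tile/independent_math helpers are placeholders, not from any real engine):

```cuda
#include <cuda/barrier>
#include <utility>

__device__ void  produce_tile(float* tile, int i) { tile[i] = i * 0.5f; }  // placeholder work
__device__ float independent_math(int i)          { return i * 1.25f; }    // placeholder work

__global__ void overlap_demo(float* out) {
    __shared__ float tile[256];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    if (threadIdx.x == 0) init(&bar, blockDim.x);   // one thread initializes the barrier
    __syncthreads();

    produce_tile(tile, threadIdx.x);                // write my slice of the shared tile
    auto token = bar.arrive();                      // signal "my data is ready"...

    float x = independent_math(threadIdx.x);        // ...and keep the ALUs busy in the meantime

    bar.wait(std::move(token));                     // block only when everyone's data is needed
    out[threadIdx.x] = x + tile[(threadIdx.x + 1) % blockDim.x];
}
// Launch: overlap_demo<<<1, 256>>>(dOut);  (needs compute capability 7.0+)
```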
 
I would love to see the next-gen consoles switch to NVIDIA GPUs, but I feel that this would make backward compatibility problematic or force Microsoft and Sony to abandon it altogether. Yes, Microsoft went the extra mile to make a large number of original Xbox and Xbox 360 games run on newer hardware, but many titles are missing (e.g. RalliSport Challenge 2, PGR 1-4, etc). Not sure if that is due to issues with emulation or licensing, but the point is that there are a notable number of classic Xbox and 360 games that do not run on the new systems.

AMD have had a good run with consoles but, in my experience, they aren't really innovators in the same way as NVIDIA are. NVIDIA lead where AMD follow in my honest opinion. AMD tech and software is usually not as good as NVIDIA's, e.g. they are behind with ray-tracing by a generation and their FSR2 upscaling tech is inferior to DLSS. DLSS would be a great addition to the next-gen systems, especially with DLSS3 frame generation which would be very refined by that point.

I would be happier though if AMD actually matched NVIDIA with hardware AI-assisted upscaling and RT on a par with their rivals. That way the new consoles could continue to use AMD GPUs which in turn would ensure backwards compatibility.
 

Loxus

Member
Holy crap, STOP spamming the same goddamn images over and over..

The asynchronous barrier works on a subset of CUDA threads within the block, while previous architectures were working at a whole-warp/block level. RDNA 2's synchronization barrier prevents the GPU from using async work to keep execution units busy while the BVH is in the compute queue. This has very poor occupancy when it happens and is latency-bound. L2 latency is likely to be a limiting factor when building the BVH, because when occupancy is poor, L0/L1 cache hit rates are also poor. This has been micro-benchmarked.

You can basically overlap with other execution in the SM. Nvidia has a different pathway to do any sort of INT8 computing concurrently with the FP32+FP32 or INT32.

There's a single memory path dedicated to textures and another for memory loads. EACH of the warp schedulers per SM is capable of dispatching two warp instructions per clock cycle, and the warp scheduler issues the Tensor Core instruction in one cycle.

In the meantime, while RDNA 3 introduced Volta-era matrix cores, consoles are left with barely anything worthwhile.

And it's one thing to have a feature in a block diagram, but without the pathways, or given the fundamental impacts of those decisions (such as RT choking the pipeline when things aren't all lined up in a nice little queue in the right order), you'll be left wondering why consoles have to have quarter-resolution reflections and no GI.

An example:

Shaders alone



ML alone



So far so good! What a beast!




Combine everything…





AMD's RT solution is already super heavy on cache, latency, and queue ordering. Sprinkling ML on top of that plus the shader pipeline is a big no-no as of now.
It was never about performance.
As far as I know, the AI Matrix Accelerator is not utilized yet.

It was about you claiming Nvidia has a dedicated block while AMD does not.
I told you AMD's ML and RT are just as dedicated as Nvidia's.

I have proven you wrong.
You can literally see, labeled below, that the processing block contains the FP/INT/Tensor units sharing the same resources.


You are an Nvidia fan; of course you are set in your ways.

Edit:
Latency is not an issue for RDNA 3.
Microbenchmarking AMD’s RDNA 3 Graphics Architecture

RDNA 3 makes a massive improvement in LDS latency, thanks to a combination of architectural improvements and higher clock speeds. Nvidia enjoyed a slight local memory latency lead over AMD’s architectures, but RDNA 3 changes that. Low LDS latency could be very helpful when RDNA 3 is dealing with raytracing, because the LDS is used to store the BVH traversal stack.
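For context on how numbers like those get measured: latency microbenchmarks are usually a dependent pointer chase, where each load's address comes from the previous load so nothing can overlap, timed with the GPU's cycle counter. A rough CUDA sketch of the idea against shared memory (the rough analogue of AMD's LDS); treat the absolute numbers with caution, since clock overhead and compiler scheduling muddy small measurements:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread chases a dependent chain through shared memory and reports
// the average cycles per load, approximating load-to-use latency.
__global__ void smem_latency(long long* cycles, int* sink) {
    __shared__ int chain[1024];
    for (int i = 0; i < 1024; ++i)
        chain[i] = (i + 1) & 1023;          // simple ring: entry i points to i+1
    __syncthreads();

    int idx = 0;
    long long start = clock64();
    #pragma unroll 1
    for (int i = 0; i < 100000; ++i)
        idx = chain[idx];                   // each load depends on the previous one
    long long stop = clock64();

    *sink = idx;                            // keep the chase from being optimized away
    *cycles = (stop - start) / 100000;      // ~cycles per dependent load
}

int main() {
    long long* dCycles; int* dSink;
    cudaMalloc(&dCycles, sizeof(long long));
    cudaMalloc(&dSink, sizeof(int));
    smem_latency<<<1, 1>>>(dCycles, dSink);
    long long cycles = 0;
    cudaMemcpy(&cycles, dCycles, sizeof(cycles), cudaMemcpyDeviceToHost);
    printf("~%lld cycles per shared-memory load\n", cycles);
    cudaFree(dCycles); cudaFree(dSink);
    return 0;
}
```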
 
Last edited:

Loxus

Member
You are literally posting an image from AMD that shows the AI Matrix Accelerator inside the vector units.
What part don't you understand? It is not a dedicated unit, just a set of instructions to accelerate matrices, executed on the shader units.
It's called WMMA and it's well documented by AMD.

I think you are mixing up some terms.

In this document a SIMD refers to the Vector ALU unit that processes instructions for a single wave.

The vector ALU maintains vector GPRs that are unique to each work-item and executes arithmetic operations uniquely on each work-item.

Look again, the AI Matrix Accelerator is not a SIMD.


I get that you think it's supposed to work and operate exactly like the Tensor Cores, but you have to remember these implementations are patented.

Different ways of doing things to achieve the same goal.
 

winjer

Gold Member
Holy crap, STOP spamming the same goddamn images over and over..

The asynchronous barrier works on a subset of CUDA threads within the block, while previous architectures were working at a whole-warp/block level. RDNA 2's synchronization barrier prevents the GPU from using async work to keep execution units busy while the BVH is in the compute queue. This has very poor occupancy when it happens and is latency-bound. L2 latency is likely to be a limiting factor when building the BVH, because when occupancy is poor, L0/L1 cache hit rates are also poor. This has been micro-benchmarked.

Just quoting you to say you are very correct about the BVH being limited by the caches on RDNA2, causing low occupancy on the ray-accelerators.



Our testing also shows that the RX 6900 XT’s L2 cache has a load-to-use latency of just above 80 ns. While that’s good for a multi-megabyte GPU cache, it’s close to memory latency for CPUs. In the end, the 6900 XT was averaging 28.7 billion box tests and 3.21 billion triangle tests per second during that call, despite being underclocked to 1800 MHz. AMD says each CU can perform four box tests or one triangle test per cycle, so RT core utilization could be anywhere from 7.2% to 29% depending on whether the counters increment for every intersection test, or every node.
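A quick back-of-the-envelope check of those utilization figures, assuming the RX 6900 XT's 80 CUs at the quoted 1800 MHz and AMD's stated 4-box-or-1-triangle-tests-per-CU-per-cycle throughput; the ~29% upper bound falls out if each counted node is read as four primitives' worth of work rather than a single test:

```cuda
#include <cstdio>

int main() {
    const double cus = 80, clock_hz = 1.8e9;    // RX 6900 XT, underclocked as in the quote
    const double box_rate = 4, tri_rate = 1;    // tests per CU per cycle (AMD's figure)
    const double boxes = 28.7e9, tris = 3.21e9; // measured rates during that dispatch

    const double cycle_budget = cus * clock_hz; // CU-cycles available per second

    // If the counters count individual intersection tests:
    const double cycles_tests = boxes / box_rate + tris / tri_rate;
    // If they count nodes instead (a box node = 4 boxes = 1 cycle; assume a leaf = 4 triangles = 4 cycles):
    const double cycles_nodes = boxes + tris * 4;

    printf("per-test interpretation: %.1f%%\n", 100 * cycles_tests / cycle_budget); // ~7.2%
    printf("per-node interpretation: %.1f%%\n", 100 * cycles_nodes / cycle_budget); // ~28.8%, i.e. ~29%
    return 0;
}
```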
 