It doesn’t rely on RT hardware, you hardhead! Lol, you can use Lumen perfectly fine on a GPU with no hardware RT. Can you act any dumber?
Guess you can.
The discussion was about "RT" and "RT performance not mattering" according to you with reference to Lumen...which ironically relies on hardware RT.
"Real world results" translating to "only those benchmarks you like" because obviously the last few hundred benchmarks with the heaviest possible RT load available today on consumer`s private rigs don't count, and ofc the 10% difference in light RT loads even according to your personally selected benchmarks count as "neck and neck".Looks like it doesn’t Schmendrick. Look at the benchmarks we have so far Schmendrick. Neck and neck, with a 7900XTX actually pulling slightly ahead of the 4080 that’s a good $300 more.
I’m not going to continue when you’ve got no Lumen receipts Schmendrick. I’m about real world results, not speculation and hot air.
If you're fine with abstracted geometry fields and inaccurate lighting while still slamming your performance, yes.
To be fair, a console with Nvidia would use an xx60-class card equivalent, which I wouldn't call RT-ready.
Console makers be like: Hey AMD, what's going on with RT, this isn't looking too good for us right now. We have to use inline RT in cutscenes and pretend we can do RT.
AMD: Hey don't worry, we'll catch up next generation. And we're cheap!
Console makers: Ah, ok. You kinda told us that last time.. But ok, makes sense..
Next generation: No catchupped.
AMD: We're so awesome!
Console makers: Well, we already have a few reflections.., maybe we can add a bit of shadows this time and call it Path Tracing..?
Cyberpunk Overdrive and Portal are two Nvidia-sponsored games that have made an effort to look and perform well on Nvidia by leveraging their strengths. But the reality is that in games that use Unreal Engine 5 with hardware Lumen (which is about 90% of the games to be released in the coming years) the difference in performance is much lower.
You'd stand under a blue sky and still claim it's red.
Dynamic and halfway accurate => hardware RT, no way around it.
We have ONE game that uses Lumen with the RT sliders nearly at zero... and we have a few dozen available games where the gap between AMD and Nvidia basically widens linearly with every RT feature you activate, documented in hundreds of review benchmarks right at launch, if you so wish to ignore the Nvidia-sponsored CP2077 and Portal. The 7900 XTX's RT competence is about RTX 3090 level, or in other terms, a good gen behind. And that is before factoring in tech like ReSTIR, the quality difference between DLSS and FSR, etc.
And it is not the first time this has happened; the same thing happened with hardware PhysX, which only performed well on Nvidia graphics cards, in games sponsored by Nvidia.
AMD's saving grace is the big equalizer, namely the weak budget hardware in the consoles, which is the baseline for development and basically damns RT to stay in its little niche until at least next gen, except for sponsored outliers.
And comparing a general tech like RT to a highly proprietary solution like PhysX is absolutely ridiculous.
People obviously have no clue what they are talking about.
If they chose to go with Nvidia, they would also have to find someone for the CPU.
That would mean a separate GPU and CPU. This will never happen because it would raise the price of the console drastically.
But then they are the ones generations behind - and is it even worth it for anyone, console manufacturers or Nvidia, at that point?
Nvidia can handle the CPU with ARM. Check anything Tegra? Grace superchip?
I don't think Nvidia entering the console market next gen is happening (outside Nintendo), but Nvidia making an APU isn't a huge hurdle for them if the R&D budget for one were approved for a next-gen console.
It's not even fair to call it a gen behind, because it's worse than that. It brute-forces its way by having more cores, and the higher baseline rasterization is really what's keeping it even "a gen behind". On an RT performance-per-core basis, I don't think they've even surpassed Turing, not when you enable RT beyond just shadows.
The hybrid RT pipeline won't survive the path tracing advances of the coming years. It was made specifically for inline ray tracing, and that lasted a whopping one game (not even AMD-sponsored). As soon as you have too many dynamic shaders you choke it, and it becomes a worse performer than DXR 1.0.
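To make the "inline vs. dynamic shaders" distinction concrete, here is a rough sketch in plain C++ rather than actual DXR/HLSL; every name and type in it is made up for illustration. The point is that inline ray tracing keeps traversal and shading in one instruction stream, while the DXR 1.0-style pipeline dispatches into separately supplied hit shaders chosen per hit, which is where heavy divergence starts to hurt.

```cpp
// Conceptual sketch only: invented types and names, not the DXR API.
#include <cstdio>
#include <functional>
#include <vector>

struct Hit { int material_id; };

// "Inline" style (DXR 1.1 RayQuery-like): one shader owns the loop,
// traversal and shading stay in the same instruction stream.
float shade_inline(const std::vector<Hit>& hits) {
    float radiance = 0.0f;
    for (const Hit& h : hits)                                   // stand-in for BVH traversal
        radiance += 0.1f * static_cast<float>(h.material_id);  // shade in place
    return radiance;
}

// "Dynamic shader" style (DXR 1.0-like): traversal dispatches into a table of
// separately provided hit shaders, picked per hit at runtime.
using HitShader = std::function<float(const Hit&)>;

float shade_dynamic(const std::vector<Hit>& hits,
                    const std::vector<HitShader>& shader_table) {
    float radiance = 0.0f;
    for (const Hit& h : hits)
        radiance += shader_table[h.material_id](h);  // indirect call per hit
    return radiance;
}

int main() {
    std::vector<Hit> hits = {{0}, {1}, {1}};
    std::vector<HitShader> table = {
        [](const Hit&) { return 0.1f; },  // material 0
        [](const Hit&) { return 0.2f; },  // material 1
    };
    std::printf("inline: %.2f  dynamic: %.2f\n",
                shade_inline(hits), shade_dynamic(hits, table));
}
```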
Ray tracing hardware sharing resources with the texture unit and compute is a dead end. It's probably why Microsoft's intensive ML research hasn't been implemented on Xbox so far: you'd be slowing down the texture unit too much for what the ML brings.
For AMD, ideally they continue the MCM route and maybe have dedicated RT/ML modules on the side of the GCD with a 3D memory stacked on top for quick transfers.
Their hybrid RT patent was to simplify the silicon and save area for more rasterization... but Nvidia basically managed to keep up in rasterization AND has ~25% of silicon dedicated to RT & ML. Their solution seemed sound initially, but it fizzled out. The advantage is clearly not there.
AMD RDNA3 ML & RT is just as dedicated as Nvidia's and laid out the same way.
They are not the same.
AMD's ML acceleration is just an instruction called WMMA that performs GEMMs on a normal vector unit on the GPU. It's not a dedicated unit, unlike Nvidia's Tensor Cores.
In the case of ray tracing, AMD is still doing RT in the TMUs, and the BVH traversal is still done in shaders.
There have been improvements in RDNA3, like LDS instructions to speed up BVH traversal, bigger caches with lower latency, and an increased vector register file capacity that allows more rays to be kept in flight.
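For anyone unfamiliar with what WMMA actually does: per AMD's documentation it is a wave-level matrix multiply-accumulate over 16x16x16 tiles. Below is just the math of one such tile written out in plain C++ (not the real intrinsic, and the data types are simplified); on RDNA3 this work is spread across the lanes of a normal wave on the vector ALUs rather than handed to a separate unit.

```cpp
// Illustrative only: this is the math one WMMA (Wave Matrix Multiply-Accumulate)
// instruction covers, not the intrinsic itself.
#include <array>
#include <cstdio>

constexpr int M = 16, N = 16, K = 16;  // WMMA tile shape: 16x16x16

using Tile = std::array<std::array<float, 16>, 16>;

// D = A * B + C over one tile (A and B would be FP16 on the real hardware).
Tile wmma_tile(const Tile& A, const Tile& B, const Tile& C) {
    Tile D{};
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = C[m][n];
            for (int k = 0; k < K; ++k)
                acc += A[m][k] * B[k][n];
            D[m][n] = acc;
        }
    return D;
}

int main() {
    Tile A{}, B{}, C{};
    A[0][0] = 1.0f; B[0][0] = 2.0f; C[0][0] = 3.0f;
    Tile D = wmma_tile(A, B, C);
    std::printf("D[0][0] = %.1f\n", D[0][0]);  // 1*2 + 3 = 5
}
```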
You posted a diagram where the AI Matrix Accelerator is inside the Vector unit, which in turn is part of the Compute Unit.
It's just a set of instructions to accelerate GEMMs. It's not a full Tensor Core like Nvidia or Intel have.
Did you read this?
Dissecting Tensor Cores via Microbenchmarks
Link to that paper.
I hate it when people laugh-react at anything that isn't positive for Nvidia. The 7900 XT and XTX are within such a margin of the 4080 and 4090 in everything but RT that Nvidia should be damn embarrassed. Without RT, for people who care about it like me, I'd feel like an idiot buying this card. Seriously, if you aren't running RT to get increased performance out of your 4080/4090, then you played yourself and should've gotten a 7900 XT/XTX.
Nah, image reconstruction is also inferior on AMD cards.
The question you should ask yourself is why anyone spending 1200/1600 dollars on a GPU has to use them at all.
Because it gives the games a boost they would otherwise not get? Is that supposed to be a bad thing?
Dissecting Tensor Cores via Microbenchmarks
Dual Compute Unit (DCU) = Streaming Multiprocessor (SM).
As you can see, the CUDA cores (FP/INT) and the Tensor Cores share the same scheduler, register file and shared memory in the same manner as AMD SIMD and AI Matrix Accelerator.
It can't get any more clear than this man, like come on.
Yet, Tensor Cores are still a separate unit inside the SM.
Unlike AMD, which is using the vector units and WMMA instructions.
Not only are Nvidia's Tensor Cores more complete in capabilities and features, they can also be used while the shader units are in use.
On RDNA3, the vector unit is either calculating graphics operations or calculating matrices, not both at once.
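A toy way to see why that shared-issue point matters; the cycle counts below are invented purely for illustration, not measured on any GPU. If matrix work has to take issue slots on the same vector unit as the shader math, the two costs add; if it runs on a separate unit, the longer of the two dominates.

```cpp
// Toy issue-slot model, not a simulation of either architecture.
#include <algorithm>
#include <cstdio>

int main() {
    const double shader_cycles = 1000.0;  // hypothetical shader math per frame chunk
    const double matrix_cycles = 400.0;   // hypothetical matrix (upscaling/ML) work

    // Shared path: the vector unit does either graphics or matrices in a cycle.
    const double shared = shader_cycles + matrix_cycles;

    // Separate-unit path: matrix work overlaps with shader work.
    const double overlapped = std::max(shader_cycles, matrix_cycles);

    std::printf("shared issue:   %.0f cycles\n", shared);      // 1400
    std::printf("separate units: %.0f cycles\n", overlapped);  // 1000
}
```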
You should take into account that a 10 TF machine should not be as huge as the PS5 is, but AMD is not really efficient considering their better node.
I assume the PS6 will have a somewhat faster GPU, like 20-50% better than a 4090. I also expect AMD to have better ray tracing by then than Nvidia has today; after all, we are talking about 2-3 graphics core generations until then, one of which is probably major. I also expect AMD to have a fast hardware upscaling solution comparable to DLSS 3, which is more than "good enough".
I guess the 4090 might actually be close to what we can expect for the PS6.
No, I guess it is too early.
Anyone thinking that Chinese-engineered console parts could come with the next gen?
I have no idea why you ignore even AMD.
There are two different matrix blocks within RDNA3.
One is dual purpose with the SIMD.
4x 32 FP32/INT32/Matrix Stream Processor.
The other is dedicated.
4x AI Matrix Accelerators.
Even here, you can see there are two Matrix blocks.
Matrix SIMD32, which can either be Float, INT or Matrix.
AI Matrix Accelerator, which is a dedicated unit and separate from the Dual Issue Stream Processors.
AMD plans to use the AI Matrix Accelerators for more than just image processing while playing a game.
How do you expect the game to work if all the units are utilized for Matrix operations?
AMD won't have users 'paying for features they never use' when it comes to AI in GPUs
However, the new approach with RDNA 3 has been to implement AI, with the new AI Matrix Accelerator block inside the Navi 31 GPU, but only where it's really needed.
"We are focused on including the specs that users want and need to give them enjoyment in consumer GPUs. Otherwise, users are paying for features they never use."
"Even if AI is used for image processing, AI should be in charge of more advanced processing," says Wang. The plan is to ensure any AI tech AMD brings to the table isn't limited to image processing."
The use of AI to empower game NPCs is something we hear a lot right now, and admittedly does sound like a good use for AI acceleration beyond just enhancing game visuals.
AMD themselves said the AI Matrix Accelerators are dedicated. If you won't even believe AMD, it doesn't make sense to explain anything to you anymore.
It was never about performance.
Holy crap, STOP spamming the same goddamn images over and over.
The asynchronous barrier works on a subset of CUDA threads within the block, while previous architectures worked at whole-warp/block level. RDNA 2's synchronization barrier prevents the GPU from using async work to keep the execution units busy while the BVH is in the compute queue. This has very poor occupancy when it happens and is latency-bound. L2 latency is likely to be a limiting factor when building the BVH, because when occupancy is poor, L0/L1 cache hit rates are also poor. This has been micro-benchmarked.
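To make the "latency bound" part concrete, here is a bare-bones traversal loop in plain C++ (the node layout is invented and much simpler than AMD's real BVH, which also needs a stack): every iteration has to finish loading a node before it knows which child to fetch next, so when there aren't enough other wavefronts in flight to hide that load, the unit simply stalls on L2 or memory.

```cpp
// Illustrative pointer-chasing BVH traversal: every iteration depends on the
// load issued in the previous one, which is what makes it latency-sensitive
// when occupancy is too low to hide the cache/memory round trip.
#include <cstdio>
#include <vector>

struct Node {
    float bounds[6];    // AABB for the box test
    int   left, right;  // child indices, -1 means "stop"
};

bool box_hit(const Node& n) { return n.bounds[0] <= n.bounds[3]; }  // stand-in test

int traverse(const std::vector<Node>& bvh, int root) {
    int tests = 0;
    int current = root;
    while (current != -1) {
        const Node& n = bvh[current];  // dependent load: nothing to do until it returns
        ++tests;
        current = box_hit(n) ? n.left : n.right;  // next address depends on this node
    }
    return tests;
}

int main() {
    std::vector<Node> bvh = {
        {{0, 0, 0, 1, 1, 1}, 1, -1},
        {{0, 0, 0, -1, -1, -1}, -1, -1},  // box test fails, traversal ends
    };
    std::printf("box tests: %d\n", traverse(bvh, 0));
}
```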
You can basically overlap it with other execution in the SM. Nvidia has a different pathway to do any sort of INT8 computing concurrently with the FP32+FP32 or INT32.
There's a single memory path dedicated to textures and another for memory loads. EACH of the warp schedulers per SM is capable of dispatching two warp instructions per clock cycle, and the warp scheduler issues the tensor core instruction in one cycle.
In the meantime, while RDNA 3 introduced Volta era matrix cores, consoles are left with barely anything worthwhile.
And it’s one thing to have a feature in a block diagram, but without the pathways, and with the fundamental impacts of those decisions, such as RT choking the pipeline when things aren’t all lined up in a nice little queue in the right order, you’ll be left wondering why consoles have to have quarter-resolution reflections and no GI.
An example (benchmark images omitted): shaders alone, then ML alone; so far so good, what a beast! Then combine everything…
AMD’s RT solution is already super heavy on cache, latency, and queue ordering. Sprinkling ML on top of that plus the shader pipeline is a big no-no as of now.
I think you are mixing up some terms.
You are literally posting an image from AMD that shows the AI Matrix Accelerator inside the Vector units.
What part don't you understand? It is not a dedicated unit, just a set of instructions to accelerate matrices, executed in the shader units.
It's called WMMA and it's well documented by AMD.
Our testing also shows that the RX 6900 XT’s L2 cache has a load-to-use latency of just above 80 ns. While that’s good for a multi-megabyte GPU cache, it’s close to memory latency for CPUs. In the end, the 6900 XT was averaging 28.7 billion box tests and 3.21 billion triangle tests per second during that call, despite being underclocked to 1800 MHz. AMD says each CU can perform four box tests or one triangle test per cycle, so RT core utilization could be anywhere from 7.2% to 29% depending on whether the counters increment for every intersection test, or every node.
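For what it's worth, that 7.2% to 29% range can be reproduced from the quoted figures. The only things added here are the RX 6900 XT's 80 CUs and the reading that the "per node" case means each node carries up to four box or triangle slots; everything else comes straight from the quote.

```cpp
// Back-of-the-envelope check of the quoted utilization range.
#include <cstdio>

int main() {
    const double cus      = 80.0;    // RX 6900 XT compute units (my addition)
    const double clock_hz = 1.8e9;   // underclocked to 1800 MHz, per the quote
    const double box_rate = 28.7e9;  // measured box tests per second
    const double tri_rate = 3.21e9;  // measured triangle tests per second

    // AMD: each CU can do 4 box tests OR 1 triangle test per cycle.
    const double box_peak = cus * 4.0 * clock_hz;  // 576e9 box tests/s
    const double tri_peak = cus * 1.0 * clock_hz;  // 144e9 triangle tests/s

    // Lower bound: counters tick once per intersection test.
    const double util_tests = box_rate / box_peak + tri_rate / tri_peak;  // ~0.072

    // Upper bound: counters tick once per node, each node holding up to 4 tests
    // (my assumption about what "every node" means).
    const double util_nodes = 4.0 * util_tests;                           // ~0.29

    std::printf("RT core utilization: %.1f%% to %.1f%%\n",
                util_tests * 100.0, util_nodes * 100.0);
}
```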