
Sony files new patent for ‘Accelerated Ray Tracing’ - speculation that it could be more performant

assurdum

Banned
Me revisiting this thread hoping to get some more info on the RT patent:

Surprised Fire GIF
It's a ... patent. People argued it's about new hardware, but initially I personally thought it was more about saving performance through smart optimization. It's not that clear at all in that specific part.
 
Please don't take away my 3D audio...
The thread dedicated to 3D audio is already reserved by the system; they won't take that away. But there is another Tempest thread available for devs to do whatever they want with. They could in theory use it to assist some specific task, even graphical tasks. But I doubt that's what the Cerny patent is about, which seems to cover some RT customizations for PS5 Pro.
 

winjer

Gold Member
Navi tech is practically designed by Cerny, smartass. Go watch Road to the PS5.
There was an agreement between AMD and Cerny. He designed the Navi tech; in exchange he gave the patent to AMD for free, and AMD provided him with all the resources necessary to design it at no cost. But obviously you didn't know lol.

RDNA2 was made by AMD, to match nVidia's Turing. And even some features from Pascal, like support for DP4A.
DX12U is designed around what Turing brought forward. Like Ray-tracing, mesh shaders, variable rate shading tier 2, sampler feedback.
Cerny and Sony definitely gave some input for RDNA2. But most of it was just AMD trying to catch up to the market leader in graphics: nVidia.
 
Remember this is the same guy that was so adamant the PS5 didn't have hardware Ray Tracing.

avuqpPw.jpg

Also, there are multiple things within a Compute Unit that have "unit" in their name but don't have a separate dedicated block.
- Scalar Units
- Texture Filter Units
- Texture Mapping Units

In a patent you have to word your document in a way that covers all bases against someone copying it and wording it differently.

If there is going to be a PS5 Pro, it would use the same Zen 2 CPU and RDNA 2 GPU (probably with an additional Shader Engine) similarly to how the PS4 Pro uses the same CPU and GPU with additional Compute Units.

Which means RT will be the same.
byrwW7c.jpg


Also, the PS6 isn't releasing until 2027 at least with these chip shortages.
The hardware from AMD that will be ready around that time probably isn't near being finalized and that's if development even started.

So how can this patent not be related to PS5?
Battaglia knows Jack shit
For anything about PS5 information, particularly about RT, do not listen to this guy. He has been constantly spreading disinformation or FUD about PS5 since before the release of those consoles.
He's a well known PS5 Hater. Don't ever listen to this clown.
 

Elios83

Member
I really hope that Mark can do some magic at the API level so that more developers can implement ray traced reflections at good frame rates like Insomniac did.
 

Fafalada

Fafracer forever
RDNA 2 does BVH calculations on the general compute units.
Minor quibble regarding terminology here.
Once you take out box->ray, and triangle->ray intersections (which RDNA2 accelerates), 99% of remaining work in BVH traversal is not 'calculations' but conditional flow and data-movement. Which is something that GPU shaders are poor at doing, and why alternatives to running it on general-compute units are being explored.
Ie. it's taking something that's not calculation heavy away from units designed for calculation heavy workloads.
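To make that split concrete, here is a minimal sketch of a stack-based BVH traversal in Python. It is purely illustrative and not a description of how RDNA2, Nvidia hardware, or the patent implements it. The `Node` layout and the `hit_box` / `hit_tri` callables are hypothetical names. The two intersection calls stand in for the part RDNA2 accelerates; everything else in the loop is branching, stack handling, and pointer chasing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical BVH node: inner nodes have children, leaves hold triangle ids."""
    box: object                                   # opaque bounding-volume payload
    children: List["Node"] = field(default_factory=list)
    triangles: List[int] = field(default_factory=list)


def traverse(root: Node, ray, hit_box: Callable, hit_tri: Callable) -> Optional[int]:
    """Return the id of the closest triangle hit by `ray`, or None if nothing is hit."""
    stack = [root]
    best_t, best_id = float("inf"), None
    while stack:                                  # control flow
        node = stack.pop()                        # stack management / data movement
        if not hit_box(ray, node.box):            # ray/box test (the accelerated part)
            continue
        if node.triangles:                        # leaf: test the triangles it holds
            for tri in node.triangles:
                t = hit_tri(ray, tri)             # ray/triangle test (the accelerated part)
                if t is not None and t < best_t:
                    best_t, best_id = t, tri
        else:                                     # inner node: descend into children
            stack.extend(node.children)
    return best_id
```

Note how little arithmetic lives outside the two callables; that is the work the quoted sentence calls "calculations".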

But it's not that clear at all in that specific part.
The embodiments described are all in hardware - so it seems pretty clear unless I've misread some of it.
 
Solid speculation, and all good points.

I’m curious how much use Tempest is actually getting right now outside of 1st party. Possibly going cold most of the time.

No way to really know without developers breaking down the production of their titles. But few of them do that AFAIK. You would probably need to secure a developer's kit and run some build of game code on there and check the debugger and profiling software, to see what each component is actually doing (and its load of activity) in a given frame.

The problem is tempest has no direct access to CU caches (L0, L1 or even L2), as it can only access main ram (GDDR6).

Right, that's another issue. Means there's no way for it to snoop the CPU caches either. All of this would contribute to added latency and we don't even know if the cache sizes for Tempest have been increased or decreased compared to a typical CU in PS5's GPU. It may not even have a L2$ since that is shared among the Shader Arrays in the GPU.
 
No way to really know without developers breaking down the production of their titles. But few of them do that AFAIK. You would probably need to secure a developer's kit and run some build of game code on there and check the debugger and profiling software, to see what each component is actually doing (and its load of activity) in a given frame.



Right, that's another issue. Means there's no way for it to snoop the CPU caches either. All of this would contribute to added latency and we don't even know if the cache sizes for Tempest have been increased or decreased compared to a typical CU in PS5's GPU. It may not even have a L2$ since that is shared among the Shader Arrays in the GPU.
Tempest has no cache at all. It's DMA, Direct Memory Access, memory being GDDR6 ram. It has advantages though, like being able to use ~100% of SIMD and getting direct access to all main ram (albeit with high GDDR6 latency). It's really a co-processor probably sitting outside the GPU part and is actually labelled as such internally in the PC database.
13eb Ariel HD Audio Coprocessor
 

Panajev2001a

GAF's Pleasant Genius
No way to really know without developers breaking down the production of their titles. But few of them do that AFAIK. You would probably need to secure a developer's kit and run some build of game code on there and check the debugger and profiling software, to see what each component is actually doing (and its load of activity) in a given frame.



Right, that's another issue. Means there's no way for it to snoop the CPU caches either. All of this would contribute to added latency and we don't even know if the cache sizes for Tempest have been increased or decreased compared to a typical CU in PS5's GPU. It may not even have a L2$ since that is shared among the Shader Arrays in the GPU.

I think it would have local memory (SPU-like does not mean doing loads straight from external RAM, but manually handling data fetching into local working memory).
 

Loxus

Member
This is for PS5pro/PS6.

Mark Cerny said PS5 wasn’t designed with RT as a focus; the next bit of hardware will be.
After reading this from the patent,
12. A graphic processing unit (GPU) comprising:
at least one processor core adapted to execute a software-implemented shader; and
at least one hardware-implemented ray tracing unit (RTU) separate from the processor core...


One would assume the RTU is some kind of new unit not present in the PS5.

But first, you have to ask yourself what is a processor core.
The only thing a processor core would be on AMD GPU is the ALU.

So understanding this, technically the RT on the PS5 is separate from the ALU.

On PS5 within a CU, you have 4 ALUs (SP) and 2 Intersection Engines (RA).
DLQ34pH.png


In the patent, you can see 4 Processor Cores (ALU) and 2 Intersection Engines (RA).
CRsC50k.png


From Road to PS5
"The CUs contain a new specialized unit called the Intersection Engine which can calculate the intersection of rays with boxes and triangles."

That new specialized unit is obviously the Ray Tracing Unit (RTU), which is also called the Intersection Engine.



I think we misuse dedicated when it comes to AMD and Nvidia.

Ampere GA102
XiSl0Ug.png

Each Texture Processing Cluster (TPC) contains 2 Streaming Multiprocessors (SM). Each SM contains (ALU, TMU, RT, Tensor).

RDNA 2 GFX1030
9Cp5Wu6.png

Each Shader Array (SA) contains 5 Workgroup Processors (WGP). Each WGP contains (ALU, TMU, RT). The only thing missing is the Tensor Cores.

AMD RT is just as dedicated as Nvidia's RT; Nvidia's is just more performant.
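As a rough sanity check on the layout described above, the unit counts for the full GFX1030 (Navi 21) die can be multiplied out. The per-level counts below are assumptions taken from AMD's public RX 6900 XT specifications (80 CUs, 80 Ray Accelerators), not from the patent and not from the PS5; they put two Ray Accelerators in each WGP, one per CU.

```python
# Assumed Navi 21 (GFX1030) topology, per AMD's public RX 6900 XT specs.
shader_engines = 4
shader_arrays_per_engine = 2
wgps_per_array = 5            # matches "5 WGP per Shader Array" above
cus_per_wgp = 2               # a WGP is a dual compute unit
ray_accelerators_per_cu = 1   # i.e. 2 per WGP

wgps = shader_engines * shader_arrays_per_engine * wgps_per_array
compute_units = wgps * cus_per_wgp
ray_accelerators = compute_units * ray_accelerators_per_cu

print(wgps, compute_units, ray_accelerators)  # 40 80 80
```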
 

ChiefDada

Gold Member
If only amd had better raytracing tech instead of just doing shadows like on cyberdrunk 2077.

I don't expect PS5 ray tracing performance to compete with the likes of Nvidia, but I do believe we will see the PS5 achieve ray tracing performance that we otherwise wouldn't expect from similarly spec'd PC GPUs (6700xt). Mainly because of the way Cerny has discussed PS5 ray tracing.

1. In "Road to PS5", Cerny warns developers to keep memory costs in mind when implementing RT. But what he said afterwards is interesting:

I'm thinking it'll take less than a million rays a second to have a big impact on audio; that should be enough for audio Occlusion and some Reverb calculations.

With a bit more of the GPU invested in Ray-Tracing, it should be possible to do some very nice Global Illumination.

Having said that, adding Ray-Traced shadows and reflections to a traditional graphics engine could easily take hundreds of millions of rays a second, and full Ray-Tracing could take billions.

How far can we go? I'm starting to get quite bullish. I've already seen a PlayStation 5 title that's successfully using Ray-Tracing based Reflections in complex animated scenes with only modest costs.
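For a sense of scale on those ray counts, here is a back-of-the-envelope calculation. The resolutions, frame rates, and rays-per-pixel values are illustrative assumptions, not figures from Cerny's talk or from any shipped game.

```python
def rays_per_second(width: int, height: int, fps: int, rays_per_pixel: int) -> int:
    """Rays needed per second if every pixel casts `rays_per_pixel` rays each frame."""
    return width * height * fps * rays_per_pixel


# One ray per pixel at 1080p60 already lands in the "hundreds of millions" range.
print(rays_per_second(1920, 1080, 60, 1))  # 124416000
# Two rays per pixel at 4K30 is close to half a billion.
print(rays_per_second(3840, 2160, 30, 2))  # 497664000
```

Against that, the "less than a million rays a second" quoted for audio is a rounding error, which fits the ordering Cerny gives.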

I don't think Sony 1st party will be using a "traditional" engine in comparison to 3rd party developers. Instead, their engines will be highly optimized to leverage the I/O, thus eliminating those memory constraints more than any other console or PC could ever hope to.

His comment on being bullish on PS5 RT capabilities was also insightful. I'm assuming he's referring to Ratchet when discussing the RT reflections in complex animated scenes; he says the costs were modest (i.e. minimal). Keep in mind, Cerny is notorious for underselling PS5 abilities. If he is expecting even better RT applications, then we should either expect full ray tracing (doubtful) or a combination of different RT effects such as global illumination, shadows, and reflections, which would be both amazing and WAY above my expectations.

oHDPsu1.jpg
 

GAF machine

Member
The patent application is new to us, but it isn't "new". It has a 20th of August 2020 priority date (seen at "(30) Priority: 20.08.2020"). It was published this month because the 18 month confidentiality period has run its course (i.e. from 20.08.2020 to 23.02.2022).
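The date arithmetic can be checked quickly. This is only a sketch using the two dates quoted above; `add_months` is a hypothetical helper, and it assumes the day of the month exists in the target month (true here).

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, keeping the same day of the month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


priority = date(2020, 8, 20)    # "(30) Priority: 20.08.2020"
published = date(2022, 2, 23)   # publication date quoted above

earliest_publication = add_months(priority, 18)
print(earliest_publication)                      # 2022-02-20
print((published - earliest_publication).days)   # 3
```

So the publication came three days after the 18 months elapsed, which is consistent with it appearing as soon as the confidentiality period ended.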

The patent application was filed to protect PS5-related methods of using RTUs (i.e. 'Ray Accelerators', which Cerny called "Intersection Engines") to calculate ray intersections, while compute shaders asynchronously calculate the transparency or opacity of objects those rays intersect.

If we line up what Cerny explained in (57) of the patent application with what he explained during his 'Road to PS5' keynote, it's essentially a match:

(57) A graphics processing unit (GPU) includes one or more processor cores adapted to execute a software-implemented shader program, and one or more hardware-implemented ray tracing units (RTU) adapted to traverse an acceleration structure to calculate intersections of rays with bounding volumes and graphics primitives asynchronously with shader operation. The RTU implements traversal logic to traverse the acceleration structure including transformation of rays as needed to account for variations in coordinate space between levels, stack management, and other tasks to relieve burden on the shader, communicating intersections to the shader which then calculates whether the intersection hit a transparent or opaque portion of the object intersected. Thus, one or more processing cores within the GPU perform accelerated ray tracing by offloading aspects of processing to the RTU, which traverses the acceleration structure within which the 3D environment is represented. -- Mark Cerny

"Another major new feature of our custom RDNA2 based GPU is Ray-Tracing, using the same strategy as AMD's upcoming PC GPUs. The CUs contain a new specialized unit called the Intersection Engine which can calculate the intersection of rays with boxes and triangles. To use the Intersection Engine, first you build what is called an acceleration structure. It's data in RAM that contains all of your geometry. There's a specific set of formats you can use; they're variations on the same BVH concept. Then in your shader program you use a new instruction that asks the Intersection Engine to check a ray against the BVH. While the Intersection Engine is processing the requested ray-triangle or ray-box intersections, the shaders are free to do other work." -- Mark Cerny
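For anyone wondering what a ray/box intersection check involves, the textbook approach is the "slab" test. Below is a minimal Python sketch, assuming axis-aligned boxes given as min/max corner tuples. It is a generic illustration and makes no claim about what the Intersection Engine does internally; `ray_hits_box` is a made-up name.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max) -> bool:
    """Slab test: True if origin + t * direction (t >= 0) touches the axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray runs parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t where the ray is inside every slab so far.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


print(ray_hits_box((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1)))   # True
print(ray_hits_box((0, 0, -5), (0, 0, -1), (-1, -1, -1), (1, 1, 1)))  # False
```

A BVH traversal runs a test like this at every node it visits, which is why doing it in fixed-function hardware pays off.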
 

Ezekiel_

Banned
The patent application is new to us, but it isn't new. It has a 20th of August 2020 priority date (seen at "(30) Priority: 20.08.2020", this is the only date that matters). It was published this month because the 18 month confidentiality period has run its course (i.e. from 20.08.2020 to 23.02.2022).

The patent application was filed to protect PS5-specific methods of using RTUs (aka "Ray Accelerators", which Cerny called "Intersection Engines") to calculate ray intersections, while asynchronous shaders (timestamped) calculate the transparency or opaqueness of surfaces those rays intersect.

If we line up what Cerny explained in (57) of the patent with what he explained during his keynote, it's the same thing more or less:
So it is for PS5 then.
 

M1chl

Currently Gif and Meme Champion
If you feel like the results are advantageous then surely you'd want to keep for yourself
Nah fuck that, make it open source obviously, like what Guerrilla is doing now. Open source is the way to go; it helps you in the long run.

However, patenting because of patent abuse makes perfect sense.
 

Hobbygaming

has been asked to post in 'Grounded' mode.
Nah fuck that, make it open source obviously, like what Guerrilla is doing now. Open source is the way to go; it helps you in the long run.

However, patenting because of patent abuse makes perfect sense.
There would be less need to patent a physics system like Jolt. It was used because of the issues Guerrilla had with streaming assets and CPU utilization.

Jolt is the reason they were able to do flying mounts this time. I'm glad they started it.
 

Lysandros

Member
The PS5 matching the competition while it has a 20% smaller APU, less memory bandwidth, and has more outdated hardware features like its RDNA1 RB units vs the XSX's RDNA2 RB+ is a tiny bit impressive, imo.
Doing more with less isn't always easy.
I think the "less memory bandwidth" part is more an assumption than an established fact. As to RBEs, PS5 is actually doing more with 'more' hardware since it has 256 depth ROPs compared to XSX, which only has 128. This is one of the main advantages of the machine over its counterpart in terms of pixel/fillrate throughput. In this context 'outdated' is a strong word, I think; I see it more as a design choice.
 

Tripolygon

Banned
After reading this from the patent,
12. A graphic processing unit (GPU) comprising:
at least one processor core adapted to execute a software-implemented shader; and
at least one hardware-implemented ray tracing unit (RTU) separate from the processor core...


One would assume the RTU is some kind of new unit not present in the PS5.

But first, you have to ask yourself what is a processor core.
The only thing a processor core would be on AMD GPU is the ALU.

So understanding this, technically the RT on the PS5 is separate from the ALU.

On PS5 within a CU, you have 4 ALUs (SP) and 2 Intersection Engines (RA).


In the patent, you can see 4 Processor Cores (ALU) and 2 Intersection Engines (RA).


From Road to PS5
"The CUs contain a new specialized unit called the Intersection Engine which can calculate the intersection of rays with boxes and triangles."

That new specialized unit is obviously the Ray Tracing Unit (RTU), which is also called the Intersection Engine.



I think we misuse dedicated when it comes to AMD and Nvidia.

Ampere GA102

Each Texture Processing Cluster (TPC) contains 2 Streaming Multiprocessors (SM). Each SM contains (ALU, TMU, RT, Tensor).

RDNA 2 GFX1030

Each Shader Array (SA) contains 5 Workgroup Processors (WGP). Each WGP contains (ALU, TMU, RT). The only thing missing is the Tensor Cores.

AMD RT is just as dedicated as Nvidia's RT; Nvidia's is just more performant.
You knocked this out the fucking park mate. Well done. I had a whole write-up yesterday but just couldn't find those pics to show as a visual representation so I didn't post it.

In a broad sense, an AMD CU is like an Nvidia SM: they contain fixed-function "processors" or "hardware units" that do specific tasks, including specific hardware for accelerating ray intersection and traversal. AMD calls it Intersection Engine and Nvidia calls it RT Core.
 

Boglin

Member
I think the "less memory bandwidth" part is more an assumption than an established fact. As to RBEs, PS5 is actually doing more with 'more' hardware since it has 256 depth ROPs compared to XSX, which only has 128. This is one of the main advantages of the machine over its counterpart in terms of pixel/fillrate throughput. In this context 'outdated' is a strong word, I think; I see it more as a design choice.
I agree with you and have made the same arguments in the past. People tend to like to view their "rivals" in worst light possible though so I was just leaning into that mentality.
When a person argues that the PS5 hardware is weak then it only seems more impressive when it outperforms its competition.
 

ChiefDada

Gold Member
Smart tech guys…
So is it more performant or nah?

Lol, I'm still trying to figure out if this is hardware or software, and for PS5 or PS5 Pro/PS6. People seem to be split right down the middle. Are there really no other experts on GAF other than that one guy from the military?

Nvm
 

Shmunter

Member
Tempest has no cache at all. It's DMA, Direct Memory Access, memory being GDDR6 ram. It has advantages though, like being able to use ~100% of SIMD and getting direct access to all main ram (albeit with high GDDR6 latency). It's really a co-processor probably sitting outside the GPU part and is actually labelled as such internally in the PC database.

If we look at examples like Metro Exodus, they build up the rays over time. There's a vid, maybe a gif, that I don't have handy, where entering a room shows the global illumination lagging behind and literally generating in front of you. Offloading this sort of process seems feasible.
 
After reading this from the patent,
12. A graphic processing unit (GPU) comprising:
at least one processor core adapted to execute a software-implemented shader; and
at least one hardware-implemented ray tracing unit (RTU) separate from the processor core...


One would assume the RTU is some kind of new unit not present in the PS5.

But first, you have to ask yourself what is a processor core.
The only thing a processor core would be on AMD GPU is the ALU.

So understanding this, technically the RT on the PS5 is separate from the ALU.

On PS5 within a CU, you have 4 ALUs (SP) and 2 Intersection Engines (RA).
DLQ34pH.png


In the patent, you can see 4 Processor Cores (ALU) and 2 Intersection Engines (RA).
CRsC50k.png


From Road to PS5
"The CUs contain a new specialized unit called the Intersection Engine which can calculate the intersection of rays with boxes and triangles."

That new specialized unit is obviously the Ray Tracing Unit (RTU), which is also called the Intersection Engine.



I think we misuse dedicated when it comes to AMD and Nvidia.

Ampere GA102
XiSl0Ug.png

Each Texture Processing Cluster (TPC) contains 2 Streaming Multiprocessors (SM). Each SM contains (ALU, TMU, RT, Tensor).

RDNA 2 GFX1030
9Cp5Wu6.png

Each Shader Array (SA) contains 5 Workgroup Processors (WGP). Each WGP contains (ALU, TMU, RT). The only thing missing is the Tensor Cores.

AMD RT is just as dedicated as Nvidia's RT; Nvidia's is just more performant.

yeah, even on a geometric / floor plan level - meaning where the RT hardware is located relative to the rest of the CU/SM components - the integrations are kinda similar.
the image that people got from Nvidia's marketing material, that the RT core is a kind of separate island doing its own thing, is totally skewed.

crazy to think that we would not be having this whole conversation if AMD's marketing had just called those things RT Cores as well (not sure if the term was trademarked though)
 

Panajev2001a

GAF's Pleasant Genius
You knocked this out the fucking park mate. Well done. I had a whole write-up yesterday but just couldn't find those pics to show as a visual representation so I didn't post it.

In a broad sense, an AMD CU is like an Nvidia SM: they contain fixed-function "processors" or "hardware units" that do specific tasks, including specific hardware for accelerating ray intersection and traversal. AMD calls it Intersection Engine and Nvidia calls it RT Core.

As F Fafalada and others were pointing out, the BVH traversal in standard RDNA2 is happening in compute shaders (unlike for modern nVIDIA cards) and not in the RTU / IE or whatever we currently have inside a DCU on XSX|S and PS5… so either our understanding there is incomplete, the patent is partially describing a future console / theoretical unit, or the patent language is being a bit funny about what we think is HW accelerated and how we think the traversal is accelerated.
 

Petopia

Banned
I don't expect PS5 ray tracing performance to compete with the likes of Nvidia, but I do believe we will see the PS5 achieve ray tracing performance that we otherwise wouldn't expect from similarly spec'd PC GPUs (6700xt). Mainly because of the way Cerny has discussed PS5 ray tracing.

1. In "Road to PS5", Cerny warns developers to keep memory costs in mind when implementing RT. But what he said afterwards is interesting:



I don't think Sony 1st party will be using a "traditional" engine in comparison to 3rd party developers. Instead, their engines will be highly optimized to leverage the I/O, thus eliminating those memory constraints more than any other console or PC could ever hope to.

His comment on being bullish on PS5 RT capabilities was also insightful. I'm assuming he's referring to Ratchet when discussing the RT reflections in complex animated scenes; he says the costs were modest (i.e. minimal). Keep in mind, Cerny is notorious for underselling PS5 abilities. If he is expecting even better RT applications, then we should either expect full ray tracing (doubtful) or a combination of different RT effects such as global illumination, shadows, and reflections, which would be both amazing and WAY above my expectations.

oHDPsu1.jpg
Then I don't know why have ray tracing at all. Do you know how bad it was during the 20 series of Nvidia cards?
 

Loxus

Member
As F Fafalada and others were pointing out, the BVH traversal in standard RDNA2 is happening in compute shaders (unlike for modern nVIDIA cards) and not in the RTU / IE or whatever we currently have inside a DCU on XSX|S and PS5… so either our understanding there is incomplete, the patent is partially describing a future console / theoretical unit, or the patent language is being a bit funny about what we think is HW accelerated and how we think the traversal is accelerated.
Nope, it's like I said.
JNVRQrv.png

RT is within the SM on Nvidia, similarly to how RT is within the WGP on AMD.

This die shot from the 3080 shows roughly where the RT is located. It's supposed to be within the SM next to the TMU, which is why Nemez labeled it with a question mark.
W29SKlZ.jpg


Here, you can see where it's located on the 6900XT.
YZlxl9W.jpg
 

winjer

Gold Member
From AMD:
As you can see, RDNA2 only accelerates Ray/box and ray/tri-testers in hardware.
nVidia's Turing and Ampere also do Bounding Volume Hierarchy (BVH) Processing in Hardware
And it does BVH Processing with Coherency Sorting in Software, supposedly with CUDA.

 

Panajev2001a

GAF's Pleasant Genius
From AMD:
As you can see, RDNA2 only accelerates Ray/box and ray/tri-testers in hardware.
nVidia's Turing and Ampere also do Bounding Volume Hierarchy (BVH) Processing in Hardware
And it does BVH Processing with Coherency Sorting in Software, supposedly with CUDA.



This is what I was talking about Loxus Loxus , not where the units are located.
 

Loxus

Member
This is what I was talking about Loxus Loxus , not where the units are located.
Many people believe this patent isn't related to the PS5 because of the word RTU.
They think it's some kind of new unit outside of the CU similarly to Nvidia.

But it turns out Nvidia RT is also within the SM, which is similar to a CU.

What you're trying to explain has nothing to do with the topic at hand.
 

winjer

Gold Member
Many people believe this patent isn't related to the PS5 because of the word RTU.
They think it's some kind of new unit outside of the CU similarly to Nvidia.

But it turns out Nvidia RT is also within the SM, which is similar to a CU.

What you're trying to explain has nothing to do with the topic at hand.

It doesn't matter where the unit IS.
What matters is what it does. And in that sense, RDNA2 does less RT than Turing and Ampere.
 
From what I understood from comments in other places, what is being investigated is a way to improve efficiency in two ways.
1 is that, I think, when the GPU is doing the RT calculations it isn't doing much else. The way the architecture works, it can be made to do other calculations in parallel with the RT calculations, making the rendering faster. But coordinating all this work is hard; that's why investigations like this are needed, not to find out whether it can be done, but how it can be done in the best manner.
2 is trying to decrease the number of RT calculations, by finding when the RT doesn't need to be calculated further around some objects because the resulting information would be useless.


This is more about how to better utilize the RDNA2 architecture, so I expect Microsoft is doing the same investigations, and whoever finds a way first will show the way to the other.


EDIT: oh, the SeX has a bit more L2 on the GPU, right? I wonder if this would help it perform RT much faster than the PS5. I ask because the new RDNA2 iGPU on the Rembrandt APU has a bit more L2 on the GPU, and this made the 12CU iGPU faster than the 16CU 6500XT dGPU. It's one reason why Nvidia GPUs are faster at RT: a lot more L2.

0oOft0k.jpg
 

Panajev2001a

GAF's Pleasant Genius
Many people believe this patent isn't related to the PS5 because of the word RTU.
They think it's some kind of new unit outside of the CU similarly to Nvidia.

But it turns out Nvidia RT is also within the SM, which is similar to a CU.

What you're trying to explain has nothing to do with the topic at hand.

It does. I was saying that it might apply to something beyond PS5 (too), not because of the name of the unit but because of the capabilities associated with the unit. Namely, accelerating BVH traversal versus requiring compute shaders to do so.
 
Why do people keep saying this? He took basically off-the-shelf AMD components and a fast SSD, and tweaked it a bit to keep the (PS5) machine cheap. We have seen almost no difference in performance between both next-gen machines, aside from 1 game with fast loading levels that didn't really matter.
Whopdee doo.

That said, if he's come up with something that significantly reduces the overhead of rt, that might be something that affects performance in a larger way.
The man made Marble Madness at age 18. What the fuck were you doing at that age?
 

ChiefDada

Gold Member
Then I don't know why have ray tracing at all. Do you know how bad it was during the 20 series of Nvidia cards?

Because you can still get great rt results with the hardware. 1st party optimization is so underrated. We will see amazing things we didn't think possible. And I was referring to 30 series.
 