
[Analysis] [Hardware] XSX Machine Learning. The real deal.

Xplainin

Banned
Apr 30, 2020
After looking into the XSX machine learning abilities, this is really going to be a big deal for the Series X this next gen, on par with Nvidia's DLSS.

Nvidia included Tensor Cores into their GPUs. Tensor Cores added new INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization and don’t require FP16 precision.
Microsoft also added specific hardware to their GPU to allow it to do INT8 and INT4 calculations. The additions that MS made are the equal of the Tensor cores of Nvidia.
The result is that Series X offers 12 TFLOPS of 32-bit precision, 24 TFLOPS of 16-bit precision, 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations.
The Nvidia 2080 Ti has 14.2 TFLOPS of 32-bit precision, 28.4 TFLOPS of 16-bit precision, 56.8 TOPS for 8-bit and 113.8 TOPS in 4-bit (113.8 Tensor TFLOPS in Nvidia speak).
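The Series X figures follow a fixed doubling pattern: each halving of precision doubles the packed-math rate (2× for FP16, 4× for INT8, 8× for INT4). A minimal sketch of that arithmetic, assuming the publicly stated 52 CUs at 1.825 GHz with 64 FP32 lanes per CU and counting a fused multiply-add as two operations (these specs are assumptions, not given in this post):

```python
# Peak-throughput arithmetic for the Xbox Series X GPU.
# Assumed public specs (not from this post): 52 CUs, 64 FP32 lanes per CU, 1.825 GHz.
CUS = 52
LANES_PER_CU = 64
CLOCK_GHZ = 1.825
OPS_PER_FMA = 2  # a fused multiply-add counts as two operations


def peak_tera_ops(rate_multiplier: int) -> float:
    """Peak tera-operations per second for a packed-math rate multiplier."""
    lanes = CUS * LANES_PER_CU
    giga_ops = lanes * OPS_PER_FMA * CLOCK_GHZ * rate_multiplier
    return giga_ops / 1000.0


rates = {"FP32": 1, "FP16": 2, "INT8": 4, "INT4": 8}
for name, multiplier in rates.items():
    print(f"{name}: {peak_tera_ops(multiplier):.1f}")
```

This gives roughly 12.1, 24.3, 48.6 and 97.2, which round to the 12/24/49/97 figures Microsoft quoted. It only reproduces peak shader-ALU rates; it says nothing about dedicated units such as tensor cores.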

The XSX has the same ML abilities as Nvidia DLSS cards do.

Now there is more to DLSS than the ability to do calculations.
"The DLSS team first extracts many aliased frames from the target game, and then for each one we generate a matching “perfect frame” using either super-sampling or accumulation rendering. These paired frames are fed to NVIDIA’s supercomputer. The supercomputer trains the DLSS model to recognize aliased inputs and generate high quality anti-aliased images that match the “perfect frame” as closely as possible. We then repeat the process, but this time we train the model to generate additional pixels rather than applying AA. This has the effect of increasing the resolution of the input. Combining both techniques enables the GPU to render the full monitor resolution at higher frame rates."

Microsoft also has access to even more powerful supercomputers and AI than Nvidia does, and as such will have no problem matching or exceeding what Nvidia can do with DLSS.
Microsoft has also developed its own API for the console end of it, namely DirectML.

The addition of ML in the XSX is being overlooked as the game changer for Microsoft that it is.

We have seen the results that Nvidia have got with DLSS, and the same can be expected on the XSX.
On top of that, MS have already said that ML is going to allow them to add HDR to back compat games that never shipped with it.
I cannot wait to see what else MS can do with this tech.
 

T-Cake

Member
May 28, 2019
I don't get why they've been so quiet about this though. If it will help with 4K rendering then I don't know why they are not shouting about it. They could in theory use it to generate new textures for older games on the fly, as Sony mentioned previously.
 

Xplainin

Banned
Apr 30, 2020
T-Cake said:
I don't get why they've been so quiet about this though. If it will help with 4K rendering then I don't know why they are not shouting about it. They could in theory use it to generate new textures for older games on the fly, as Sony mentioned previously.
I guess they have their messaging primed at this point on a certain number of things, and don't want to water it down over a heap of different things. At the moment it seems to be Velocity Architecture orientated.
 

Thirty7ven

Sony make cringe trainers.
Apr 13, 2020
Could you start naming your thread titles correctly, along with the classifiers, so that I can avoid them instead of coming in thinking this is anything other than personal opinion?

No, XSX doesn't have tensor-core-like hardware. It's the same as when Sony said FP16 gave the PS4 Pro 8 TFLOPS. DirectML isn't an Xbox thing, it's a Microsoft thing.
 

billyxci

13 year old console warrior. Put me on ignore.
Aug 3, 2014
Machine learning. :messenger_neutral:

Machine Learning! :messenger_smirking:

Machine LEARNING!! :messenger_smiling_with_eyes:

MACHINE LEARNING!!!:messenger_grinning_smiling:

MaChInE LeArNiNg!!11!!!! :messenger_beaming::messenger_tears_of_joy::messenger_sunglasses::messenger_poop::messenger_ok::messenger_clapping:
 

Dampf

Member
Jun 28, 2020
Your INT figures for the RTX series are dead wrong, because you only count raw performance without the tensor cores, and DLSS uses these tensor cores to a great extent. In reality, Series X has half the INT performance of a 2060.

2080 Ti has over 500 TOPS at INT4 and 250 at INT8. An RTX 2060 has 200 TOPS at INT4 and 100 TOPS at INT8.

RDNA2 also does these integer operations in shader hardware, while Nvidia has dedicated tensor cores for that task.
 

Xplainin

Banned
Apr 30, 2020
Dampf said:
Your INT figures for the RTX series are dead wrong, because you only count raw performance without the tensor cores, and DLSS uses these tensor cores to a great extent. In reality, Series X has half the INT performance of a 2060.

2080 Ti has over 500 TOPS at INT4 and 250 at INT8.

RDNA2 also does these integer operations in shader hardware, while Nvidia has dedicated tensor cores for that task.

The GeForce RTX 2080 Ti Founders Edition GPU delivers the following exceptional computational performance:

▶ 14.2 TFLOPS of peak single precision (FP32) performance

▶ 28.5 TFLOPS of peak half precision (FP16) performance

▶ 14.2 TIPS concurrent with FP, through independent integer execution units

▶ 113.8 Tensor TFLOPS
 

Memorabilia

Member
Oct 25, 2013
T-Cake said:
I don't get why they've been so quiet about this though. If it will help with 4K rendering then I don't know why they are not shouting about it. They could in theory use it to generate new textures for older games on the fly, as Sony mentioned previously.

For the same reason it's difficult to market 1440p + DLSS looking as good as or better than native 4K. Even trying muddies the waters and risks people writing it off as incapable of true 4K or equating it to checkerboarding, etc. Better to stick with "4K, period" in the marketing material to keep it simple, and trickle in alternative rendering techniques over time under the radar.
 

Dampf

Member
Jun 28, 2020

Xplainin said:
The GeForce RTX 2080 Ti Founders Edition GPU delivers the following exceptional computational performance:

▶ 14.2 TFLOPS of peak single precision (FP32) performance

▶ 28.5 TFLOPS of peak half precision (FP16) performance

▶ 14.2 TIPS concurrent with FP, through independent integer execution units

▶ 113.8 Tensor TFLOPS
TFLOPS, not TOPS. Microsoft used integer operations to calculate the machine learning performance of the Series X, so you should use the same metric.

When we are talking about Integer-Performance, it's calculated in TOPS. Floating-Point is FLOPs.


Nvidia claims that TU102’s Tensor cores deliver up to 114 TFLOPS for FP16 operations, 228 TOPS of INT8, and 455 TOPS INT4. The FP16 multiply with FP32 accumulation operations used for deep learning training are supported as well, but at half-speed compared to FP16 accumulate.
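Putting the thread's own numbers side by side shows the size of the gap being argued about. A small sketch using only figures quoted in this thread (Microsoft's 49/97 TOPS, the TU102 tensor-core figures above, and Dampf's RTX 2060 estimates); all of them are peak rates, not measured throughput:

```python
# Peak integer throughput in TOPS, as quoted in this thread.
XSX = {"INT8": 49, "INT4": 97}                # Microsoft, via Digital Foundry
TU102_TENSOR = {"INT8": 228, "INT4": 455}     # Nvidia's TU102 tensor-core figures
RTX_2060_TENSOR = {"INT8": 100, "INT4": 200}  # Dampf's RTX 2060 figures


def speedup(baseline: dict, other: dict, precision: str) -> float:
    """How many times higher `other`'s peak is than `baseline`'s at one precision."""
    return other[precision] / baseline[precision]


for precision in ("INT8", "INT4"):
    print(
        precision,
        f"TU102 vs XSX: {speedup(XSX, TU102_TENSOR, precision):.2f}x",
        f"RTX 2060 vs XSX: {speedup(XSX, RTX_2060_TENSOR, precision):.2f}x",
    )
```

Both precisions come out at roughly 4.7× for TU102 and roughly 2× for the RTX 2060, which is where "half the INT performance of a 2060" comes from. Peak rates ignore clocks under load, memory bandwidth and how well a given network maps onto the hardware.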
 

Dampf

Member
Jun 28, 2020
It doesn't even come close...
And this is why we won't see a DLSS-type reconstruction method on the next-gen consoles.

Sure, they are capable of running the neural network, but it would cost too much performance from the shader cores to be worthwhile. The performance boost would be much lower, or worse, it could even demand more performance than just running at native resolution.
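Whether running the network on the shader cores is worthwhile comes down to frame-time arithmetic. A rough sketch, where the per-frame workload (100 giga-operations) and the 50% efficiency are purely hypothetical placeholders, not measurements of DLSS or any real network:

```python
FRAME_MS_60FPS = 1000.0 / 60.0  # about 16.7 ms per frame at 60 fps


def inference_cost_ms(network_gops: float, peak_tops: float, efficiency: float) -> float:
    """GPU milliseconds for one inference pass.

    network_gops: work per frame in giga-operations (hypothetical input).
    peak_tops:    peak throughput in tera-operations per second.
    efficiency:   assumed fraction of peak actually achieved, in (0, 1].
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    ops_per_ms = peak_tops * 1e12 * efficiency / 1000.0
    return network_gops * 1e9 / ops_per_ms


HYPOTHETICAL_GOPS = 100.0  # made-up network size, for illustration only
xsx_ms = inference_cost_ms(HYPOTHETICAL_GOPS, peak_tops=49, efficiency=0.5)
tu102_ms = inference_cost_ms(HYPOTHETICAL_GOPS, peak_tops=228, efficiency=0.5)
print(f"XSX INT8: {xsx_ms:.2f} ms of a {FRAME_MS_60FPS:.1f} ms frame")
print(f"TU102 tensor INT8: {tu102_ms:.2f} ms of a {FRAME_MS_60FPS:.1f} ms frame")
```

With these made-up inputs the pass costs about 4.1 ms on XSX versus about 0.9 ms on TU102's tensor cores, and on XSX those milliseconds come out of the same shader budget used for rendering. Reconstruction only pays off if rendering at the lower resolution saves more time than the pass costs.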

Still, it's great MS added INT capabilities. We will see machine learning in games a lot more in the future, can be used for all kinds of stuff not just reconstruction. Examples: physics, AI, behaviour of NPCs including language processing, texture upscaling, simulations and more.
 

Hendrick's

Member
Jan 7, 2014
I can't wait to see games start to take advantage of this. It will be the only way we will see any sort of RT global illumination.
 

Bernkastel

Member
Mar 8, 2018
Another big game changer is DirectML.
DirectML – Xbox Series X supports Machine Learning for games with DirectML, a component of DirectX. DirectML leverages unprecedented hardware performance in a console, benefiting from over 24 TFLOPS of 16-bit float performance and over 97 TOPS (trillion operations per second) of 4-bit integer performance on Xbox Series X. Machine Learning can improve a wide range of areas, such as making NPCs much smarter, providing vastly more lifelike animation, and greatly improving visual quality.
DirectML was part of Nvidia's SIGGRAPH 2018 tech talk: Forza Horizon 3 demo at 16:06, further segments at 19:06 and 19:28, performance comparison at 22:38.


We couldn’t write a graphics blog without calling out how DNNs (deep neural networks) can help improve the visual quality and performance of games. Take a close look at what happens when NVIDIA uses ML to up-sample this photo of a car by 4x. At first the images will look quite similar, but when you zoom in close, you’ll notice that the car on the right has some jagged edges, or aliasing, and the one using ML on the left is crisper. Models can learn to determine the best color for each pixel to benefit small images that are upscaled, or images that are zoomed in on. You may have had the experience when playing a game where objects look great from afar, but when you move close to a wall or hide behind a crate, things start to look a bit blocky or fuzzy – with ML we may see the end of those types of experiences.

PS: DirectML is the part of WindowsML meant for gaming.


Machine learning is a feature we've discussed in the past, most notably with Nvidia's Turing architecture and the firm's DLSS AI upscaling. The RDNA 2 architecture used in Series X does not have tensor core equivalents, but Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores. With over 12 teraflops of FP32 compute, RDNA 2 also allows for double that with FP16 (yes, rapid-packed math is back). However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.
"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."
...
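To illustrate Goossen's point about 8-bit and 4-bit integer weights: float weights are mapped to small integers plus a scale factor, the multiply-accumulates run in integer math, and the result is rescaled at the end. A minimal sketch using symmetric per-tensor quantization, which is one common scheme and an assumption here, not a description of what Microsoft's hardware or DirectML actually does:

```python
def quantize_symmetric(values: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Map floats to signed integers sharing one scale (symmetric, per-tensor)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(q_a: list[int], q_b: list[int]) -> int:
    """Integer multiply-accumulate; the sum is kept in a wide accumulator."""
    return sum(a * b for a, b in zip(q_a, q_b))


weights = [0.42, -1.3, 0.07, 0.9]
inputs = [1.0, 0.5, -2.0, 0.25]

qw, sw = quantize_symmetric(weights, bits=8)
qx, sx = quantize_symmetric(inputs, bits=8)
approx = int_dot(qw, qx) * sw * sx  # rescale the integer result back to float
exact = sum(w * x for w, x in zip(weights, inputs))
print(f"exact {exact:.4f}, INT8 approx {approx:.4f}")
```

The integer result lands close to the float one; with `bits=4` (integers in -7..7) the error typically grows, which is why INT4 only suits networks that tolerate the coarser steps.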
However, the big innovation is clearly the addition of hardware accelerated ray tracing. This is hugely exciting and at Digital Foundry, we've been tracking the evolution of this new technology via the DXR and Vulkan-powered games we've seen running on Nvidia's RTX cards and the console implementation of RT is more ambitious than we believed possible.
RDNA 2 fully supports the latest DXR Tier 1.1 standard, and similar to the Turing RT core, it accelerates the creation of the so-called BVH structures required to accurately map ray traversal and intersections, tested against geometry. In short, in the same way that light 'bounces' in the real world, the hardware acceleration for ray tracing maps traversal and intersection of light at a rate of up to 380 billion intersections per second.
"Without hardware acceleration, this work could have been done in the shaders, but would have consumed over 13 TFLOPs alone," says Andrew Goossen. "For the Series X, this work is offloaded onto dedicated hardware and the shader can continue to run in parallel with full performance. In other words, Series X can effectively tap the equivalent of well over 25 TFLOPs of performance while ray tracing."
It is important to put this into context, however. While workloads can operate at the same time, calculating the BVH structure is only one component of the ray tracing procedure. The standard shaders in the GPU also need to pull their weight, so elements like the lighting calculations are still run on the standard shaders, with the DXR API adding new stages to the GPU pipeline to carry out this task efficiently. So yes, RT is typically associated with a drop in performance and that carries across to the console implementation, but with the benefits of a fixed console design, we should expect to see developers optimise more aggressively and also to innovate. The good news is that Microsoft allows low-level access to the RT acceleration hardware.
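The 380 billion intersections per second quoted above is easier to judge per pixel. A back-of-the-envelope sketch, assuming the peak rate is sustained for the whole frame (it will not be in practice, since shading and the rest of the frame also take time):

```python
INTERSECTIONS_PER_SECOND = 380e9  # peak figure quoted by Digital Foundry


def intersections_per_pixel(fps: float, width: int, height: int) -> float:
    """Peak intersection tests available per pixel per frame."""
    per_frame = INTERSECTIONS_PER_SECOND / fps
    return per_frame / (width * height)


for label, (w, h) in {"4K": (3840, 2160), "1440p": (2560, 1440)}.items():
    print(label, round(intersections_per_pixel(60, w, h)))
```

That is roughly 760 tests per pixel at 4K60. Since a single ray typically needs many box and triangle tests while traversing the BVH, the number of rays per pixel is far lower than this.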
"[Series X] goes even further than the PC standard in offering more power and flexibility to developers," reveals Goossen. "In grand console tradition, we also support direct to the metal programming including support for offline BVH construction and optimisation. With these building blocks, we expect ray tracing to be an area of incredible visuals and great innovation by developers over the course of the console's lifetime."
The proof of the pudding is in the tasting, of course. During our time at the Redmond campus, Microsoft demonstrated how fully featured the console's RT features are by rolling out a very early Xbox Series X Minecraft DXR tech demo, which is based on the Minecraft RTX code we saw back at Gamescom last year and looks very similar, despite running on a very different GPU. This suggests an irony of sorts: base Nvidia code adapted and running on AMD-sourced ray tracing hardware within Series X. What's impressive about this is that it's fully path-traced. Aside from the skybox and the moon in the demo we saw, there are no rasterised elements whatsoever. The entire presentation is ray traced, demonstrating that despite the constraints of having to deliver RT in a console with a limited power and silicon budget, Xbox Series X is capable of delivering the most ambitious, most striking implementation of ray tracing - and it does so in real time.
Minecraft DXR is an ambitious statement - total ray tracing, if you like - but we should expect to see the technology used in very different ways. "We're super excited for DXR and the hardware ray tracing support," says Mike Rayner, technical director of the Coalition and Gears 5. "We have some compute-based ray tracing in Gears 5, we have ray traced shadows and the [new] screen-space global illumination is a form of ray traced screen-based GI and so, we're interested in how the ray tracing hardware can be used to take techniques like this and then move them out to utilising the DXR cores.
"I think, for us, the way that we've been thinking about it is as we look forward, we think hybrid rendering between traditional rendering techniques and then using DXR - whether for shadows or GI or adding reflections - are things that can really augment the scene and [we can] use all of that chip to get the best final visual quality."
...
Microsoft ATG principal software engineer Claude Marais showed us how a machine learning algorithm using Gears 5's state-of-the-art HDR implementation is able to infer a full HDR implementation from SDR content on any back-compat title. It's not fake HDR either, Marais rolled out a heatmap mode showing peak brightness for every on-screen element, clearly demonstrating that highlights were well beyond the SDR range.
Journalist: How hard is game development going to get for the next generation? For PlayStation 5 and Xbox Series X? The big problem in the past was when you had to switch to a new chip, like the Cell. It was a disaster. PlayStation 3 development was painful and slow. It took years and drove up costs. But since you’re on x86, it shouldn’t happen, right? A lot of those painful things go away because it’s just another faster PC. But what’s going to be hard? What’s the next bar that everybody is going to shoot for that’s going to give them a lot of pain, because they’re trying to shoot too high?
Gwertzman:
You were talking about machine learning and content generation. I think that’s going to be interesting. One of the studios inside Microsoft has been experimenting with using ML models for asset generation. It’s working scarily well. To the point where we’re looking at shipping really low-res textures and having ML models uprez the textures in real time. You can’t tell the difference between the hand-authored high-res texture and the machine-scaled-up low-res texture, to the point that you may as well ship the low-res texture and let the machine do it.
Journalist: Can you do that on the hardware without install time?
Gwertzman:
Not even install time. Run time.
Journalist: To clarify, you’re talking about real time, moving around the 3D space, level of detail style?
Gwertzman:
Like literally not having to ship massive 2K by 2K textures. You can ship tiny textures.
Journalist: Are you saying they’re generated on the fly as you move around the scene, or they’re generated ahead of time?
Gwertzman:
The textures are being uprezzed in real time.
Journalist: So you can fit on one blu-ray.
Gwertzman:
The download is way smaller, but there’s no appreciable difference in game quality. Think of it more like a magical compression technology. That’s really magical. It takes a huge R&D budget. I look at things like that and say — either this is the next hard thing to compete on, hiring data scientists for a game studio, or it’s a product opportunity. We could be providing technologies like this to everyone to level the playing field again.
Journalist: Where does the source data set for that come from? Do you take every texture from every game that ships under Microsoft Game Studios?
Gwertzman:
In this case, it only works by training the models on very specific sets. One genre of game. There’s no universal texture map. That would be kind of magical. It’s more like, if you train it on specific textures and then you — it works with those, but it wouldn’t work with a whole different set.
Journalist: So you still need an artist to create the original set.
Journalist: Are there any legal considerations around what you feed into the model?
Gwertzman:
It’s especially good for photorealism, because that adds tons of data. It may not work so well for a fantasy art style. But my point is that I think the fact that that’s a technology now — game development has always been hard in terms of the sheer number of disciplines you have to master. Art, physics, geography, UI, psychology, operant conditioning. All these things we have to master. Then we add backend services and latency and multiplayer, and that’s hard enough. Then we added microtransactions and economy management and running your own retail store inside your game. Now we’re adding data science and machine learning. The barrier seems to be getting higher and higher.
That’s where I come in. At heart, Microsoft is a productivity company. Our employee badge says on the back, the company mission is to help people achieve more. How do we help developers achieve more? That’s what we’re trying to figure out.
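Gwertzman's "ship tiny textures" point is largely storage arithmetic. A sketch assuming BC7 block compression (16 bytes per 4×4 block, i.e. 1 byte per texel) with a full mip chain, comparing the 2K by 2K texture he mentions with a hypothetical 512×512 texture that would be upscaled at runtime; it ignores the padding of the smallest mips to a full 4×4 block:

```python
def texture_bytes(size: int, bytes_per_texel: float, with_mips: bool = True) -> int:
    """Storage for a square power-of-two texture, optionally with its full mip chain."""
    total = 0.0
    while True:
        total += size * size * bytes_per_texel
        if not with_mips or size == 1:
            break
        size //= 2  # next mip level is half the width and height
    return int(total)


BC7_BYTES_PER_TEXEL = 1.0
full = texture_bytes(2048, BC7_BYTES_PER_TEXEL)   # the 2K by 2K case
small = texture_bytes(512, BC7_BYTES_PER_TEXEL)   # hypothetical shipped size
print(full, small, round(full / small, 1))
```

Shipping the quarter-resolution version cuts that texture's footprint by about 16×; the cost moves to runtime inference and to the per-genre training Gwertzman describes.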
Use of Machine Learning in texture compression
Patents US20200105030A1 and US20190304138A1 describe the use of machine learning for texture compression or upscaling, and for reducing the search space in real-time texture compression.
Video games are experiencing problems with textures taking up too much storage. A relatively large storage footprint affects the speed with which games can load textures. The block compression that games use at runtime to save memory, bandwidth and cache pressure has a fixed compression ratio. Other schemes offer far better compression ratios but are not in a format directly usable by the GPU. One method uses a machine learning model to convert graphics-hardware-incompatible compressed textures (e.g., machine learning image compression, JPEG compression, wavelet compression, etc.) into hardware-compatible compressed textures usable by the GPU at runtime of the application. Another method relates to a computer-readable medium storing instructions executable by a computing device, causing the computing device to access, at runtime of an application, graphics-hardware-incompatible compressed textures in a format the GPU cannot use, and to convert them into hardware-compatible compressed textures at runtime. This helps reduce input/output bandwidth and the actual size of game data.
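The fixed compression ratio of block compression follows from every 4×4 block occupying the same number of bytes regardless of its content. A small sketch using the standard block sizes for BC1, BC3 and BC7, compared with uncompressed RGBA8 (4 bytes per texel); the variable-rate formats the patents mention can instead spend fewer bytes on simple regions:

```python
import math

BLOCK_BYTES = {"BC1": 8, "BC3": 16, "BC7": 16}  # bytes per 4x4 block
RGBA8_BLOCK_BYTES = 4 * 4 * 4                   # 16 texels x 4 bytes


def bc_ratio(fmt: str) -> float:
    """Compression ratio versus uncompressed RGBA8; the same for every block."""
    return RGBA8_BLOCK_BYTES / BLOCK_BYTES[fmt]


def bc_size(width: int, height: int, fmt: str) -> int:
    """Bytes for one mip level; partial blocks at the edges still cost a full block."""
    return math.ceil(width / 4) * math.ceil(height / 4) * BLOCK_BYTES[fmt]


for fmt in BLOCK_BYTES:
    print(fmt, f"{bc_ratio(fmt):.0f}:1", bc_size(2048, 2048, fmt))
```

A flat-colour block and a noisy one cost exactly the same, which is the inefficiency the patents try to avoid by storing a denser format on disk and converting at runtime.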
 

M1chl

Currently Gif and Meme Champion
Dec 25, 2019
Prague, Czech Republic
T-Cake said:
I don't get why they've been so quiet about this though. If it will help with 4K rendering then I don't know why they are not shouting about it. They could in theory use it to generate new textures for older games on the fly, as Sony mentioned previously.
Because, to push back on that Secret Sauce Drive, nobody would believe what ML can do, including me... until I ran Control with DLSS 2.0. That shit ain't supposed to work like that, it's too good.
 

Panajev2001a

GAF's Pleasant Genius
Jun 7, 2004
Xplainin said:
After looking into the XSX machine learning abilities, this is really going to be a big deal for the Series X this next gen, on par with Nvidia's DLSS.
[...]
Microsoft also added specific hardware to their GPU to allow it to do INT8 and INT4 calculations. The additions that MS made are the equal of the Tensor cores of Nvidia.
[...]

It would be great if they had added dedicated tensor cores, but the likely answer (based on the Xbox engineers' interview with DF a while back) is that they extended the CUs' vector ALUs so that, on top of the usual 4×32-bit elements (a 128-bit vector with four lanes) and the 8×16-bit elements of FP16 RPM, they can also process 16×8-bit or 32×4-bit elements in parallel.
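A rough illustration of the packed-integer idea: four signed 8-bit values share one 32-bit register lane, and a single operation multiplies all four pairs and adds them into a wider accumulator, which is where a 4× INT8 rate (and, with eight 4-bit values, 8× INT4) comes from. This Python emulation is a sketch of the concept only; the per-32-bit-lane layout is an assumption, and it does not reproduce any actual AMD instruction or the XSX implementation:

```python
def pack_int8x4(values: list[int]) -> int:
    """Pack four signed 8-bit integers into one 32-bit word (element 0 in the low byte)."""
    if len(values) != 4 or not all(-128 <= v <= 127 for v in values):
        raise ValueError("need exactly four values in -128..127")
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)  # two's-complement byte for each element
    return word


def unpack_int8x4(word: int) -> list[int]:
    """Recover the four signed 8-bit elements from a packed 32-bit word."""
    out = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        out.append(byte - 256 if byte >= 128 else byte)
    return out


def dot4_int8(a_word: int, b_word: int, acc: int = 0) -> int:
    """Emulated packed dot product: four INT8 multiplies summed into a wider accumulator."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    return acc + sum(x * y for x, y in zip(a, b))


a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
print(hex(a), dot4_int8(a, b))
```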

This allows some computations with 4-bit and 8-bit elements to run a lot faster inside the existing CUs without requiring much extra HW (definitely a lot less HW than adding tensor core equivalents). It also means that if you dedicate a CU to INT8/FP16 operations for ML, that CU will not be used for rendering at the same time: I do not think you can add the ML TOPS numbers to the 12 TFLOPS peak; if anything, you should subtract them.
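The point that ML TOPS come out of the same shader budget can be written as a simple time split. A sketch assuming peak rates and an idealized linear split of shader time between rendering and ML (real workloads hit neither peak):

```python
PEAK_FP32_TFLOPS = 12.15  # Series X shader peak, approximate
PEAK_INT4_TOPS = 97.0     # the same ALUs running packed INT4


def split_budget(ml_fraction: float) -> tuple[float, float]:
    """Return (FP32 TFLOPS left for rendering, INT4 TOPS available for ML)
    when `ml_fraction` of shader time is given to ML work."""
    if not 0.0 <= ml_fraction <= 1.0:
        raise ValueError("ml_fraction must be between 0 and 1")
    return PEAK_FP32_TFLOPS * (1.0 - ml_fraction), PEAK_INT4_TOPS * ml_fraction


for share in (0.0, 0.1, 0.25):
    render, ml = split_budget(share)
    print(f"{share:.0%} to ML: {render:.2f} TFLOPS rendering, {ml:.1f} INT4 TOPS")
```

Every TOPS spent on ML is shader time not spent on rendering, so the two figures trade off rather than add up.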

So, just like the FP16 rapid packed math (RPM) that AMD added in Vega and PS4 Pro, and that XSX and PS5 have picked up, MS announced that XSX extends the RPM feature further to process a higher number of smaller elements in parallel.

Edit: to provide context, Xplainin, here is the DF interview:

The RDNA 2 architecture used in Series X does not have tensor core equivalents, but Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores. With over 12 teraflops of FP32 compute, RDNA 2 also allows for double that with FP16 (yes, rapid-packed math is back). However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.

[...]

we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations.
 

bohrdom

Banned
Mar 19, 2020
ML wasn't a key aspect of RDNA 2. Microsoft added their own custom changes to get INT8 and INT4 ability.

If Sony made those same changes then they have yet to announce it.

It's a forward-looking feature, but INT8 and INT4 ML models are still an active area of research. Most deployed models use FP16/32. The tensor cores on RTX cards are FP16 accelerators.
 

Panajev2001a

GAF's Pleasant Genius
Jun 7, 2004
bohrdom said:
It's a forward-looking feature, but INT8 and INT4 ML models are still an active area of research. Most deployed models use FP16/32. The tensor cores on RTX cards are FP16 accelerators.

Yup, to the point of them inventing their own FP formats too:

* Google’s BFLOAT16 format for their new TPU: https://cloud.google.com/tpu/docs/bfloat16
* nVIDIA Ampere Tensor Cores and their new TensorFloat32 format: https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/
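Both formats keep FP32's 8-bit exponent (so they cover the same range) but drop mantissa bits: bfloat16 keeps 7 and TF32 keeps 10 (the same as FP16, which however has only a 5-bit exponent). A minimal sketch that derives them from an FP32 bit pattern by truncation; real hardware usually rounds to nearest even, so this is a simplification:

```python
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x after rounding it to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bfloat16(x: float) -> float:
    """Keep sign, 8 exponent bits and the top 7 mantissa bits (truncation)."""
    return bits_to_float32(float32_bits(x) & 0xFFFF0000)


def to_tf32(x: float) -> float:
    """Keep sign, 8 exponent bits and the top 10 mantissa bits (truncation)."""
    return bits_to_float32(float32_bits(x) & 0xFFFFE000)


x = 1 / 3
print(x, to_tf32(x), to_bfloat16(x))
```

Because the exponent is untouched, a value like 1e38 survives in both formats, whereas it would overflow FP16 (maximum 65504).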
 

geordiemp

Member
Sep 5, 2013
UK
bohrdom said:
It's a forward-looking feature, but INT8 and INT4 ML models are still an active area of research. Most deployed models use FP16/32. The tensor cores on RTX cards are FP16 accelerators.

I read somewhere that the XSX board is split so they can repurpose half the board for servers, and the INT4/INT8 support is for that purpose.

Can't remember where, though, but it seems logical as it spreads the development costs.
 

Mistershine.

Banned
Jan 20, 2018
Excellent topic.

Machine learning is the future, it can be applied in many ways.

Transparent wall fluid simulation is my jam. I won't pretend to know anything about it on a technical level, but aesthetically it's like a little nerdgasm every time I see it.
 
Sep 13, 2009
Outer Heaven
At the moment it's only a good bullet point on the spec sheet - it won't be the "real deal" until MS's DLSS-like feature actually shows up in games.

And also keep in mind, it wasn't until DLSS 2.0 (the second iteration) that people became convinced of how useful the feature could be.
 

Alder

Neo Member
Jul 5, 2020
Mistershine. said:
Transparent wall fluid simulation is my jam. I won't pretend to know anything about it on a technical level, but aesthetically it's like a little nerdgasm every time I see it.

The idea is for AI to learn how to present the same simulation results in real time.

Your diversion will become more common :messenger_horns:
 

Mistershine.

Banned
Jan 20, 2018
Alder said:
The idea is for AI to learn how to present the same simulation results in real time.

Your diversion will become more common :messenger_horns:
I understand the video posted with regard to the ML getting better and better at it; it's the actual CFD itself that baffles me.
 

ZywyPL

Member
Nov 27, 2018
Unfortunately it doesn't work that way. The 2080 Ti has dedicated RT cores, which means that even at full load the 15-17 TF from the CUDA cores are still fully at your disposal, whereas XSX has to sacrifice CUs for AI computations, which even at full 52-CU utilization offer 97 TOPS, compared to the 2080 Ti's 430 TOPS. And that full 52-CU utilization will obviously never happen, because there wouldn't be any computing power left to draw polygons, let alone more advanced effects. So I'm curious to see if MS will actually be able to make any use of the ML implementation in Series X, or whether it will end up completely unused, just like Rapid Packed Math was in PS4 Pro before.
 

mtcn77

Neo Member
Jan 30, 2017
ZywyPL said:
And that full 52-CU utilization will obviously never happen, because there wouldn't be any computing power left to draw polygons, let alone more advanced effects.
Hi, I think you are missing out on mesh shaders in that instance. Previously, you had to have a fixed number of pixels per triangle to make efficient use of the hardware. Since a CU was four 16-wide SIMDs, you had to bundle each triangle to 16 pixels; otherwise you wasted a whole wavefront on a single triangle. It's a significant latency cost whichever way we put it: it takes 4 cycles on GCN.
RDNA, however, supports mesh shaders, which removes that fixed association cost and thus ought to enable better utilization when used alongside asynchronous shaders.
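The wave-size point can be put in numbers. On GCN a 64-thread wavefront is issued over 4 cycles on a 16-wide SIMD, while RDNA's 32-wide SIMDs issue a 32-thread wave in one cycle; in either case, if a wave carries only a few active threads, the rest of its lanes do no useful work. A simple sketch of those two quantities (it does not model mesh shaders, 2×2 quad shading or how work from different triangles is packed into waves):

```python
import math


def issue_cycles(wave_size: int, simd_width: int) -> int:
    """Cycles one SIMD needs to issue a single instruction for a whole wave."""
    return math.ceil(wave_size / simd_width)


def lane_utilization(active_threads: int, wave_size: int) -> float:
    """Fraction of a wave's lanes doing useful work."""
    return min(active_threads, wave_size) / wave_size


print("GCN wave64 on SIMD16:", issue_cycles(64, 16), "cycles")
print("RDNA wave32 on SIMD32:", issue_cycles(32, 32), "cycle")
print("16 active threads:", lane_utilization(16, 64), "of wave64,",
      lane_utilization(16, 32), "of wave32")
```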
 

rnlval

Member
Jun 26, 2017
Sector 001
gpucuriosity.wordpress.com
Thirty7ven said:
Could you start naming your thread titles correctly, along with the classifiers, so that I can avoid them instead of coming in thinking this is anything other than personal opinion?

No, XSX doesn't have tensor-core-like hardware. It's the same as when Sony said FP16 gave the PS4 Pro 8 TFLOPS. DirectML isn't an Xbox thing, it's a Microsoft thing.
On RTX GPUs, both Tensor and CUDA cores are limited by the external video memory and texture cache bandwidth.

RX 6800's DirectML and normal shader workloads have access to the very fast 128 MB Infinity Cache (an L3-level cache) in addition to the texture cache.
 

rnlval

Member
Jun 26, 2017
Sector 001
gpucuriosity.wordpress.com
PS5 can also do machine learning btw, its a key aspect of RDNA2. I expect some form of MLAA from Sony in a few years
RDNA's 8X rate INT4 and 4X rate INT8 are optional features. XSX has been confirmed to have the 8X rate INT4 and 4X rate INT8 optional features.

Machine learning workloads can be run on double rate FP16/INT16 RDNA hardware.
 

rnlval

Member
Jun 26, 2017
Sector 001
gpucuriosity.wordpress.com
ZywyPL said:
Unfortunately it doesn't work that way. The 2080 Ti has dedicated RT cores, which means that even at full load the 15-17 TF from the CUDA cores are still fully at your disposal, whereas XSX has to sacrifice CUs for AI computations, which even at full 52-CU utilization offer 97 TOPS, compared to the 2080 Ti's 430 TOPS. And that full 52-CU utilization will obviously never happen, because there wouldn't be any computing power left to draw polygons, let alone more advanced effects. So I'm curious to see if MS will actually be able to make any use of the ML implementation in Series X, or whether it will end up completely unused, just like Rapid Packed Math was in PS4 Pro before.
Unfortunately, RDNA 2 also has dedicated RT cores, which is why MS can claim ~25 TFLOPS for XSX.

While RTX GPUs have dedicated tensor and RT cores, their bottlenecks are external video memory and texture cache bandwidth.

The XSX GPU's dedicated RT cores are placed next to the TMUs, which also consume texture cache and external video memory bandwidth.
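The bandwidth argument is essentially a roofline model: a workload only reaches peak throughput if it performs enough operations per byte fetched from memory. A sketch using peak figures; the 560 GB/s (Series X's faster 10 GB pool) and 616 GB/s (RTX 2080 Ti) bandwidths are public specs not stated in this thread, the 228 TOPS is the TU102 figure quoted earlier, and the model ignores caches, which raise effective bandwidth:

```python
def attainable_tops(peak_tops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Roofline: throughput is capped by compute or by memory traffic, whichever is lower."""
    memory_bound_tops = bandwidth_gbs * ops_per_byte / 1000.0  # (GB/s * ops/byte) / 1000 = TOPS
    return min(peak_tops, memory_bound_tops)


for intensity in (10, 100, 1000):
    xsx = attainable_tops(49, 560, intensity)     # XSX INT8 peak
    tu102 = attainable_tops(228, 616, intensity)  # TU102 tensor INT8 peak, 2080 Ti bandwidth
    print(f"{intensity:>4} ops/byte: XSX {xsx:.1f} TOPS, 2080 Ti {tu102:.1f} TOPS")
```

At low arithmetic intensity both GPUs are limited by memory rather than by their INT8 units, which is the scenario described above; the dedicated tensor cores only pull far ahead once a network reuses its data heavily.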