
XSX Machine Learning. The real deal.

Andodalf

Banned
On RTX GPUs, both Tensor and CUDA cores are limited by the external video memory and texture cache bandwidth.

RX 6800's DirectML and normal shader workloads have access to the very fast 128 MB Infinity Cache (an L3 cache) in addition to the texture cache.

It's worth noting that RTX 3000 is using higher-bandwidth GDDR6X memory, which AMD doesn't have access to. In many ways, Infinity Cache seems to be a response to this.
 

rnlval

Member
It's worth noting that RTX 3000 is using higher-bandwidth GDDR6X memory, which AMD doesn't have access to. In many ways, Infinity Cache seems to be a response to this.
Atm, only GA102 SKUs have GDDR6X. The RTX 3070 gets a mainstream 448 GB/s from a 256-bit GDDR6-14000 board.

AMD has spec'd the RX 6800 with 256-bit GDDR6-16000 (512 GB/s) plus the very fast 128 MB Infinity Cache (L3 cache).
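The bandwidth figures quoted above follow directly from bus width times per-pin data rate; a quick sketch of the arithmetic (function name is just for illustration):

```python
# Peak GDDR6 bandwidth = (bus width in bits / 8) * per-pin data rate in Gbps.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

print(peak_bandwidth_gbs(256, 14.0))  # RTX 3070 (GDDR6-14000): 448.0 GB/s
print(peak_bandwidth_gbs(256, 16.0))  # RX 6800  (GDDR6-16000): 512.0 GB/s
```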

NVIDIA needs to revise RTX 3070 into RTX 3070 Super.
 
After looking into the XSX's machine learning abilities, this is really going to be a big deal for the Series X this next gen, on par with Nvidia's DLSS.

Nvidia included Tensor Cores in their GPUs. Tensor Cores added new INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization and don't require FP16 precision.
Microsoft also added specific hardware to their GPU to allow it to do INT8 and INT4 calculations. The additions MS made are the equal of Nvidia's Tensor Cores.
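For anyone wondering what "workloads that can tolerate quantization" means in practice, here's a minimal sketch of symmetric per-tensor INT8 quantization (one common scheme, assumed here for illustration): weights are mapped to the [-127, 127] integer range with a single scale factor, and the small round-trip error is the trade-off for the extra throughput.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    # Symmetric per-tensor scheme (an assumption for this sketch):
    # one scale factor maps the largest magnitude onto 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate FP32 values from the INT8 codes.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
q, s = quantize_int8(w)
# Worst-case round-trip error is at most half a quantization step.
print(np.max(np.abs(w - dequantize(q, s))))
```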

The XSX has the same ML abilities as Nvidia DLSS cards do.

The bolded is pure unsweetened, creamy, smelly bullshit.

Sorry OP. Try again.
 

The GeForce RTX 2080 Ti Founders Edition GPU delivers the following exceptional computational performance:

▶ 14.2 TFLOPS of peak single precision (FP32) performance

▶ 28.5 TFLOPS of peak half precision (FP16) performance

▶ 14.2 TIPS concurrent with FP, through independent integer execution units

▶ 113.8 Tensor TFLOPS

The first three figures are for the ops done on the CUDA cores, not the Tensor Cores. The Tensor Cores are nearly an order of magnitude faster (113.8 Tensor TFLOPS vs. 14.2 FP32 TFLOPS).
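The gap is easy to check from the throughput numbers quoted above; dividing the Tensor figure by the CUDA-core figures:

```python
# Rough throughput ratios from the RTX 2080 Ti numbers quoted above.
tensor_tflops = 113.8
fp32_tflops = 14.2
fp16_tflops = 28.5

print(round(tensor_tflops / fp32_tflops, 1))  # ~8.0x over FP32
print(round(tensor_tflops / fp16_tflops, 1))  # ~4.0x over FP16
```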

You have no idea what you're talking about.
 