
XSX Machine Learning. The real deal.

Andodalf

Banned
On RTX GPUs, both Tensor and CUDA cores are limited by the external video memory and texture cache bandwidth.

RX 6800's DirectML and normal shader workloads have access to the very fast 128 MB Infinity Cache (an L3 cache) in addition to the texture cache.

It's worth noting that RTX 3000 is using higher-bandwidth GDDR6X memory, which AMD doesn't have access to. In many ways, Infinity Cache seems to be a response to this.
 

rnlval

Member
It's worth noting that RTX 3000 is using higher-bandwidth GDDR6X memory, which AMD doesn't have access to. In many ways, Infinity Cache seems to be a response to this.
Atm, only GA102 SKUs have GDDR6X. The RTX 3070 gets a mainstream 448 GB/s from a 256-bit GDDR6-14000 board.

AMD has spec'd the RX 6800 with 256-bit GDDR6-16000 (512 GB/s) plus the very fast 128 MB Infinity Cache (L3 cache).
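The bandwidth figures quoted above follow directly from bus width times per-pin data rate; a quick sketch of the arithmetic (function name is just for illustration):

```python
# Peak GDDR6 bandwidth = (bus width in bits / 8) * per-pin data rate in Gbps.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

print(peak_bandwidth_gbs(256, 14.0))  # RTX 3070 (GDDR6-14000): 448.0 GB/s
print(peak_bandwidth_gbs(256, 16.0))  # RX 6800  (GDDR6-16000): 512.0 GB/s
```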

NVIDIA needs to revise RTX 3070 into RTX 3070 Super.
 
After looking into the XSX's machine learning abilities, this is really going to be a big deal for the Series X this next gen, on par with Nvidia's DLSS.

Nvidia included Tensor Cores in their GPUs. Tensor Cores added new INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization and don't require FP16 precision.
Microsoft also added specific hardware to their GPU to allow it to do INT8 and INT4 calculations. The additions MS made are the equal of Nvidia's Tensor Cores.
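For anyone wondering what "workloads that can tolerate quantization" means in practice, here's a minimal sketch of symmetric per-tensor INT8 quantization (one common scheme, assumed here for illustration): weights are mapped to the [-127, 127] integer range with a single scale factor, and the small round-trip error is the trade-off for the extra throughput.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    # Symmetric per-tensor scheme (an assumption for this sketch):
    # one scale factor maps the largest magnitude onto 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate FP32 values from the INT8 codes.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
q, s = quantize_int8(w)
# Worst-case round-trip error is at most half a quantization step.
print(np.max(np.abs(w - dequantize(q, s))))
```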

The XSX has the same ML abilities as Nvidia DLSS cards do.

The bolded is pure unsweetened, creamy, smelly bullshit.

Sorry OP. Try again.
 

The GeForce RTX 2080 Ti Founders Edition GPU delivers the following exceptional computational performance:

▶ 14.2 TFLOPS of peak single precision (FP32) performance

▶ 28.5 TFLOPS of peak half precision (FP16) performance

▶ 14.2 TIPS concurrent with FP, through independent integer execution units

▶ 113.8 Tensor TFLOPS

The first three figures are for the ops done on the CUDA cores, not the Tensor Cores. The Tensor Cores are nearly an order of magnitude faster (113.8 Tensor TFLOPS vs. 14.2 FP32 TFLOPS).
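The gap is easy to check from the throughput numbers quoted above; dividing the Tensor figure by the CUDA-core figures:

```python
# Rough throughput ratios from the RTX 2080 Ti numbers quoted above.
tensor_tflops = 113.8
fp32_tflops = 14.2
fp16_tflops = 28.5

print(round(tensor_tflops / fp32_tflops, 1))  # ~8.0x over FP32
print(round(tensor_tflops / fp16_tflops, 1))  # ~4.0x over FP16
```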

You have no idea what you're talking about.
 