
Nvidia Ampere teraflops and how you cannot compare them to Turing

psorcerer

Banned
TL;DR: 1 Ampere TF = 0.72 Turing TF, or 30TF (Ampere) = 21.6TF (Turing)

Reddit Q&A

To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.

A reminder from the Turing whitepaper:
First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. In previous generations, executing these instructions would have blocked floating-point instructions from issuing.

So, a Turing SM can execute 64 INT32 + 64 FP32 ops per clock.
An Ampere SM can execute either 64 INT32 + 64 FP32 or 128 FP32 ops per clock.

Which means that if a game executes 0 (zero) INT32 instructions, then Ampere = 2x Turing.
And if a game executes a 50/50 mix of INT32 and FP32, then Ampere = Turing exactly.

So how many INT32 are there on average?
According to Nvidia:

we typically see about 36 additional integer pipe instructions for every 100 floating point instructions

Some math: 36 / (100 + 36) ≈ 26%, i.e. in an average game's instruction stream about 26% of instructions are INT32.

So we can now calculate what happens to both Ampere and Turing on an instruction stream of 26% INT32 + 74% FP32.
I have written a small program to do that. But you can compute an analytical upper bound easily: Turing's time is set by its FP32 pipe, which handles the 74% FP32 share through only half of the SM's issue slots, while Ampere can fill all of its slots, so 74% / 50% = 1.48, or +48%.
My program shows a slightly smaller number, +44% (that's because of edge cases where the last INT32 ops in a batch cannot be distributed evenly, as only one pipeline per block of 16 cores can issue INT32).
So the theoretical absolute max is +48%; in practice the achievable max is +44%.
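Not the actual program I used, but a minimal sketch of the slot model behind that bound (assuming only the per-clock issue rates quoted above):

```python
def ampere_speedup(int_frac):
    """Speedup of an Ampere SM over a Turing SM on a stream where
    int_frac of all instructions are INT32 (simplified slot model)."""
    fp_frac = 1.0 - int_frac
    # Turing: 64 FP32 slots + 64 INT32 slots per clock; the busier pipe wins
    turing_cycles = max(fp_frac, int_frac) / 64
    # Ampere: 128 slots per clock in total, but at most 64 of them INT32
    ampere_cycles = max((fp_frac + int_frac) / 128, int_frac / 64)
    return turing_cycles / ampere_cycles

print(ampere_speedup(0.00))  # 2.00 -> pure FP32: Ampere = 2x Turing
print(ampere_speedup(0.26))  # 1.48 -> the analytical upper bound
print(ampere_speedup(0.50))  # 1.00 -> 50/50 mix: Ampere = Turing
```

The +44% comes from layering the 16-wide issue granularity on top of this; the slot model above deliberately ignores it.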

Thus every 2TF of Ampere are worth only 1.44TF of Turing performance (i.e. 1 Ampere TF = 0.72 Turing TF).

Let's check the actual data Nvidia gave us:
3080 = 30TF (Ampere) = 21.6TF (Turing) = 2.14x 2080 (10.07TF Turing)
Nvidia is even more conservative than that and gives us: 3080 = 2x 2080
3070 = 20.4TF (Ampere) = 14.7TF (Turing) = 1.86x 2070 (7.88TF Turing)
Nvidia is massively more conservative here, giving us: 3070 = 1.6x 2070
Actually, if we average the two max numbers that Nvidia gives us (they explicitly say "up to"), we get an even lower effective ratio of 1 Ampere TF = 0.65 Turing TF.
Which suggests that maybe these new FP32/INT32 mixed pipelines cannot execute FP32 at full speed (or cannot execute all the instructions).
We do know that Turing had reduced register file access on the INT32 pipe (64 vs 256 for FP32); if it's the same here (and everything suggests that Ampere is just a Turing facelift), then obviously not all FP32 instruction sequences can run on these mixed pipelines.
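For what it's worth, that 0.65 can be reproduced from Nvidia's two "up to" claims; a back-of-the-envelope sketch:

```python
# (ampere_tf, turing_baseline_tf, nvidia_claimed_speedup)
claims = [
    (30.0, 10.07, 2.0),  # "3080 = up to 2x 2080"
    (20.4,  7.88, 1.6),  # "3070 = up to 1.6x 2070"
]
# Turing TF per Ampere TF implied by each claim, then averaged
ratios = [speedup * base / amp for amp, base, speedup in claims]
print(sum(ratios) / len(ratios))  # ~0.645, i.e. 1 Ampere TF ~ 0.65 Turing TF
```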

Anyway, a TF table:

Card            | Ampere TF                | Turing TF (me) | Turing TF (NV)
3080 (Ampere)   | 30                       | 21.6           | 19.5
3070 (Ampere)   | 20.4                     | 14.7           | 13.3
2080Ti (Turing) | 18.75 (me) or 20.7 (NV)  | 13.5           | 13.5
2080 (Turing)   | 14 (me) or 15.5 (NV)     | 10.1           | 10.1
2070 (Turing)   | 10.4 (me) or 11.5 (NV)   | 7.5            | 7.5
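(A quick sketch to rebuild the conversions from the two ratios above; rounding may differ by 0.1 from the table:)

```python
MINE, NV = 0.72, 0.65  # Turing TF per Ampere TF: my ratio vs NV's implied one

for card, tf in [("3080", 30.0), ("3070", 20.4)]:                   # Ampere cards
    print(f"{card}: {tf * MINE:.1f} (me) / {tf * NV:.1f} (NV) Turing TF")
for card, tf in [("2080Ti", 13.5), ("2080", 10.1), ("2070", 7.5)]:  # Turing cards
    print(f"{card}: {tf / MINE:.2f} (me) / {tf / NV:.1f} (NV) Ampere TF")
```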

Bonus round: RDNA1 TF
RDNA1 has no separate INT32 pipeline; all INT32 instructions are handled in the main stream. That makes it essentially the same as Ampere, except there is no skew on the last instructions, so the full +48% theoretical max applies here (about +2.3% over Ampere).

Card           | RDNA1 TF | Turing TF (me) | Turing TF (NV)
5700XT (RDNA1) | 10.0     | 7.2            | ?

Amusingly enough, the 5700XT's actual performance is pretty similar to the 2070's, and these adjusted TF numbers show exactly that (10TF vs 10-11TF).

Update: why Ampere is just a Turing facelift.
 
 
Yes I was gonna post about this but the tugging off merry-go-round (circle jerk) was so overwhelming the other day I let it slide.

These '30 tflops' are Nvidia BS marketing. The 1.9x perf-per-watt improvement was marketing too. Yeah, only to reach 60fps at certain settings in a certain title, comparing certain cards!
 

SF Kosmo

Al Jazeera Special Reporter
This is all kind of academic. I think what Digital Foundry showed, with 3080 giving a roughly 75% uplift over 2080Ti and a 90%+ uplift on RT intensive stuff is a good indicator of what we're getting. That's real world stuff that is still at least somewhat CPU bound.
 

diffusionx

Gold Member
Now translate that to AMD FLOPS and you see how all the PCMR flexing is meaningless.

I honestly don't give a single shit about teraflops; that seems like console posturing, and it became that because both console platforms were architecturally similar and could be directly compared.... Honestly, the only thing I care about is benchmarks. I think most who game on PC feel the same way.
 
So basically, in a nutshell, the Xbox Series X is still more powerful than a 3070, considering the X's closed architecture. Perhaps the consoles aren't so "weak" after all.
LoL nah, well maybe by the end of the gen, when the 3070 is long forgotten and not supported anymore.
 

SF Kosmo

Al Jazeera Special Reporter
Are you sure it's 2080Ti and not the vanilla 2080?
Ah, correct.

But I find it interesting that even nVidia seems to correct for this. The chart at 30:50 in their presentation shows the 3070 (20 TFlops) as only very narrowly ahead of the 2080Ti (13 TFlops?). And they claim the 3080 to be about 2x the vanilla 2080.
 

Entroyp

Member
Interesting to see efficiency per teraflop going backwards... this might mean Big Navi won't suck that much (at least on shader performance).
 

psorcerer

Banned
But I find it interesting that even nVidia seems to correct for this. The chart at 30:50 in their presentation shows the 3070 (20 TFlops) as only very narrowly ahead of the 2080Ti (13 TFlops?). And they claim the 3080 to be about 2x the vanilla 2080.

Yup, that's why it's pretty fishy.
Anyway, these 2x CUDA cores and 2x TF for Ampere are inflated as hell compared to Turing.
 
Comparing TFLOPS across architectures has always been tricky business.

But you're trying to argue that Ampere is less powerful than it seems while I would argue that it's actually MORE powerful.

It's really getting to the point where trying to use TFLOPS as a comparison is nearly useless, especially vs Nvidia, because no one else has any kind of answer to DLSS.

How do you compare the relative performance of ANY other GPU to a GPU that can take a 1440p or even 1080p input and pump out a 4K output that is nearly indistinguishable (and in some cases superior) to native 4K? How do you do that? You could argue that it's cheating, but if your eyes can't tell the difference and you're getting nearly double the framerate, then what does it matter?

Taking DLSS 2.0 (and 2.1) into account, I could argue that Ampere can perform more like another GPU that has 50 TFLOPS, but no DLSS.
 

SF Kosmo

Al Jazeera Special Reporter
So basically, in a nutshell, the Xbox Series X is still more powerful than a 3070, considering the X's closed architecture. Perhaps the consoles aren't so "weak" after all.
No, probably not. They're in the same ballpark as far as shader compute, but the 3070 has better RT performance and the tensor cores for AI stuff.

Consoles do generally have the advantage of being able to optimize for a specific target rather than make compromises to scale, but MS seems to be fucking that part up.
 
This is all kind of academic. I think what Digital Foundry showed, with 3080 giving a roughly 75% uplift over 2080Ti and a 90%+ uplift on RT intensive stuff is a good indicator of what we're getting. That's real world stuff that is still at least somewhat CPU bound.

See, this 75% figure is pulled out of a green arse. It's wrong; please stop repeating massively inflated numbers.

EDIT: So you got the cards mixed up, 75% over 2080 sounds more realistic.
 

psorcerer

Banned
But you're trying to argue that Ampere is less powerful than it seems while I would argue that it's actually MORE powerful.

Not really. I'm actually saying two things:
1. The 3080 is a 2080Ti in disguise. Same arch (sans small improvements) on a smaller node, and thus faster clocks.
2. Current NV "marketing TF" are not directly comparable to Turing.

How do you compare the relative performance of ANY other GPU to a GPU that can take a 1440p or even 1080p input and pump out a 4K output that is nearly indistinguishable (and in some cases superior) to native 4K?

I think everybody (including Intel) will have DL upscaling a year from now.
 

nochance

Banned
Not quite. It is quite easy to calculate the performance by looking at the number of processing units. The way it handles instructions is a benefit on top of sheer processing power.
 
Not really. I'm actually saying two things:
1. The 3080 is a 2080Ti in disguise. Same arch (sans small improvements) on a smaller node, and thus faster clocks.
2. Current NV "marketing TF" are not directly comparable to Turing.

I think everybody (including Intel) will have DL upscaling a year from now.

You're really trying to argue that a 3080 is a 2080ti in disguise?

Good luck with that considering ....



You can see right here that a 3080 is between 70 and 100% more performant than a 2080ti.

You think consoles are going to have an equivalent answer to DLSS 2.0 in a year? I doubt they ever will. They can come up with all the checkerboarding methods they want, but they will never be equivalent to DLSS 2.0+
 

nochance

Banned
Not really. I'm actually saying two things:
1. The 3080 is a 2080Ti in disguise. Same arch (sans small improvements) on a smaller node, and thus faster clocks.
2. Current NV "marketing TF" are not directly comparable to Turing.

I think everybody (including Intel) will have DL upscaling a year from now.
This is factually false. The 3080 has 8,704 cores vs 4,352 on the 2080 Ti.
 

diffusionx

Gold Member
Not really. I'm actually saying two things:
1. The 3080 is a 2080Ti in disguise. Same arch (sans small improvements) on a smaller node, and thus faster clocks.

You can read all about the architecture here:


You are accusing Nvidia of lying and claiming that the 3080 is something it is not. That is a pretty heavy accusation.
 
So is the 3080 twice as fast as the 2080 or not? I mean this is bordering on false marketing at this point.

No, not even close, at least once we see averages from more than a few games.

You're really trying to argue that a 3080 is a 2080ti in disguise?

Good luck with that considering ....



You can see right here that a 3080 is between 70 and 100% more performant than a 2080ti.


What in the wo.... You seriously think that in games where a 2080 Ti gets around 100fps, the 3080 will clock around 200fps? Or are you talking about RT performance? That's astounding.
 

psorcerer

Banned
You are accusing Nvidia of lying and claiming that the 3080 is something it is not. That is a pretty heavy accusation.

It's not lying, it's exaggerating.
Purely theoretically, Ampere has 2x the FP32 cores, but in reality half of them are shared with INT32 (which is what NV says in their Reddit Q&A).
 
Numbers please.
According to my numbers in the OP: 3080 = 2080Ti + 60% (theoretical max).

You can just watch the video. It's actual gameplay with an FPS counter.

If you were right (you're not), then the 3080's framerate while playing DOOM would be the same as the 2080 Ti's framerate. But it isn't.

So now are you going to admit you were wrong, or move the goalposts?
 

psorcerer

Banned
CUDA cores are a known and defined element.

Yup.
In Turing, a CUDA core has 1x FP32 + 1x INT32 ALU = 2 ALUs, but only one of them is FP-capable.
In Ampere, a CUDA core has 1x FP32 + 1x INT32/FP32 ALU = 2 ALUs, and both are FP-capable.
NV counts the Turing pair as 1 core and the Ampere pair as 2 cores, although the die size of the two is roughly the same.
Purely semantically inflating the number of cores that were already present in Turing!
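To see how that counting plays out in the headline numbers, here is a sketch using the published core counts and reference boost clocks (an FMA counts as 2 ops per clock; both cards happen to have 68 SMs):

```python
# "Marketing TF" = cores x 2 ops/clock (FMA) x boost clock (GHz) / 1000
cards = {
    "3080 (Ampere)":   (8704, 1.710),  # 68 SMs x 128: both ALUs of each pair counted
    "2080Ti (Turing)": (4352, 1.545),  # 68 SMs x 64: only the FP32 ALU counted
}
for name, (cores, ghz) in cards.items():
    print(f"{name}: {cores * 2 * ghz / 1000:.1f} TF")
# 3080: ~29.8 TF, 2080Ti: ~13.4 TF -- the same ALU-pair layout, counted differently
```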
 