
GeForce GTX 970s seem to have an issue using all 4GB of VRAM, Nvidia looking into it


Rizzi

Member
Make sure it's the latest version and look here.

cs9k0t.jpg

Thanks.
Looks like I have Hynix memory. :/
 

Darklord

Banned
It'd be of more help if people could do this:



...and report back with a screenshot of their stats window, along with the make/model of their GPU and the brand of memory it uses (use GPU-Z for the latter).

Edit: It's telling me that the last ~400MB of my 670 has a bandwidth of 4GBps, which I find rather odd...

This is mine. Samsung memory, MSI GTX 970 Gaming 4G (4GB). Does that drop near the end mean I'm affected too?
EZXSWIG.jpg
 

NJDEN

Member
Checked Nvidia Inspector immediately after reading OP:


My Gigabyte 970 has Samsung memory and it reports 4096 MB of available memory.
 
I posted this elsewhere, but I think it actually seems more likely that the issue is hardware-related and cannot be fixed. Here's an illustration of GM204 (the chip inside the 970 and the 980)

gtx980-17b.jpg


Three of those sixteen SMMs are cut/disabled to make a 970, whereas the 980 gets all sixteen fully enabled. It seems that each of the four 64-bit memory controllers corresponds to one of the four raster engines. The 970's effective pixel fillrate has been demonstrated to be lower than the 980's even though SMM cutting leaves the ROPs fully intact (http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980), so the same situation may apply to memory bandwidth with Maxwell, causing the 970s to have this VRAM issue while the 980s don't. However, the issue may be completely independent of which SMMs are cut and may simply relate to how many.

GM206's block diagram demonstrates the same raster engine to memory controller ratio/physical proximity:

GM206-Block-Diagram.jpg


I expect a cut-down GM206 part, and even a GM200 part, will exhibit the same issue as a result; it might be intrinsically tied to how Maxwell operates as an architecture. Cutting down SMMs effectively messes up ROP and memory controller behavior as well as shaders and TMUs. I also don't think there's a chance in hell Nvidia were unaware of this, but I could be wrong.

This seems to be the most likely reason to me. Just hardware related =/. I doubt it's a matter of memory manufacturer, and aside from that one memory benchmark there isn't a great way to test.

Fake edit: And those results coming in off Samsung memory now... yeeeeep.
 

JaseC

gave away the keys to the kingdom.
So, assuming the benchmark is accurate, it would seem the issue manifests as a gradual drop in memory bandwidth across the last handful of chunks and is not isolated to a particular brand of VRAM. It'd be great if some 980 folks could find their way to this thread and post their findings.
 

Ac30

Member
I posted this elsewhere, but I think it actually seems more likely that the issue is hardware-related and cannot be fixed. Here's an illustration of GM204 (the chip inside the 970 and the 980)


Three of those sixteen SMMs are cut/disabled to make a 970, whereas the 980 gets all sixteen fully enabled. It seems that each of the four 64-bit memory controllers corresponds to one of the four raster engines. The 970's effective pixel fillrate has been demonstrated to be lower than the 980's even though SMM cutting leaves the ROPs fully intact (http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980), so the same situation may apply to memory bandwidth with Maxwell, causing the 970s to have this VRAM issue while the 980s don't. However, the issue may be completely independent of which SMMs are cut and may simply relate to how many.

GM206's block diagram demonstrates the same raster engine to memory controller ratio/physical proximity:

I expect a cut-down GM206 part, and even a GM200 part, will exhibit the same issue as a result; it might be intrinsically tied to how Maxwell operates as an architecture. Cutting down SMMs effectively messes up ROP and memory controller behavior as well as shaders and TMUs. I also don't think there's a chance in hell Nvidia were unaware of this, but I could be wrong.

Didn't the 660 or 660Ti exhibit similar behavior? Like it couldn't fill the last 500MB at a sufficient rate.

Seems it was the 660Ti that could only really use 1.5GB of its 2GB, so it wouldn't be the first time Nvidia pulled this, if it is deliberate HW gimping in this case:

http://www.anandtech.com/show/6159/the-geforce-gtx-660-ti-review/2
 
So, assuming the benchmark is accurate, it would seem the issue manifests as a gradual drop in memory bandwidth across the last handful of chunks and is not isolated to a particular brand of VRAM. It'd be great if some 980 folks could find their way to this thread and post their findings.

980 folks don't hang out with us plebs.
 
Too bad my CPU is packed away in storage while my apartment is under renovation, so I can't test, but I'm not holding out much hope of my cards performing any better than this.

/*I knew the price to performance was too good to be true.*/
 
My MSI GTX 970 package came in the mail yesterday and is sitting in my desk unopened. The 30 day clock has started. Should I hold on and hope I get a free upgrade to 980 or return the 970? I sure as hell don't want to spend that kind of money for a 980.

When can we expect a response from nvidia?
 

GHG

Member
I posted this elsewhere, but I think it actually seems more likely that the issue is hardware-related and cannot be fixed. Here's an illustration of GM204 (the chip inside the 970 and the 980)

gtx980-17b.jpg


Three of those sixteen SMMs are cut/disabled to make a 970, whereas the 980 gets all sixteen fully enabled. It seems that each of the four 64-bit memory controllers corresponds to one of the four raster engines. The 970's effective pixel fillrate has been demonstrated to be lower than the 980's even though SMM cutting leaves the ROPs fully intact (http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980), so the same situation may apply to memory bandwidth with Maxwell, causing the 970s to have this VRAM issue while the 980s don't. However, the issue may be completely independent of which SMMs are cut and may simply relate to how many.

GM206's block diagram demonstrates the same raster engine to memory controller ratio/physical proximity:

GM206-Block-Diagram.jpg


I expect a cut-down GM206 part, and even a GM200 part, will exhibit the same issue as a result; it might be intrinsically tied to how Maxwell operates as an architecture. Cutting down SMMs effectively messes up ROP and memory controller behavior as well as shaders and TMUs. I also don't think there's a chance in hell Nvidia were unaware of this, but I could be wrong.

Didn't the 660 or 660Ti exhibit similar behavior? Like it couldn't fill the last 500MB at a sufficient rate.

Seems it was the 660Ti that could only really use 1.5GB of its 2GB, so it wouldn't be the first time Nvidia pulled this, if it is deliberate HW gimping in this case:

http://www.anandtech.com/show/6159/the-geforce-gtx-660-ti-review/2

After doing a bit of research on the issue just now, I was going to post exactly the same thing. It looks like Nvidia have done something similar to what they did with the 2GB variants of the 660 and 660 Ti.

My previous posts documenting the issues with the 2gb versions of the 660:

http://www.neogaf.com/forum/showpost.php?p=47788241&postcount=3798

Nvidia got away with it then because the 660 cards were not flagships, nor were they marketed as high end, so most buyers of those cards would not be sensitive to these issues or savvy to what was actually going on.

I have a feeling they will not be so lucky this time.

They have disabled 3 of the 16 SMM units. That leaves 13 of 16, or roughly 81%, still available.

13/16 of 4096 MB is 3328 MB, and if you look at the results from the test being posted above (let's take Darklord's as an example, since he is on Samsung memory), this is roughly where the significant bandwidth drop-off starts:

EZXSWIG.jpg


Yep, this is a hardware issue.

The results being posted actually point towards a problem whereby the whole of the 4th GPC engine's bandwidth is compromised by the removal of these 3 SMM units.

3 out of the 4 GPC engines are fully intact. To do the math again: that means 75% of the hardware works as intended at full capacity, and 75% of 4096 MB is 3072 MB. As shown above, 3072 MB is exactly the point where the bandwidth issues begin to creep in.
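
Just to double-check that arithmetic, here's a quick throwaway snippet (purely illustrative, nothing to do with the benchmark itself) that works out where the 13/16 and 3/4 cut-offs land on a 4096 MB card:

Code:
// Hypothetical back-of-the-envelope check of the SMM/GPC cut-off math above (not from any benchmark).
#include <cstdio>

int main() {
    const double total_mb    = 4096.0;        // advertised VRAM on a GTX 970
    const double smm_enabled = 13.0 / 16.0;   // 13 of 16 SMMs left enabled
    const double gpc_intact  = 3.0 / 4.0;     // 3 of 4 GPCs fully intact

    printf("13/16 of %.0f MB = %.0f MB\n", total_mb, total_mb * smm_enabled); // 3328 MB
    printf(" 3/4  of %.0f MB = %.0f MB\n", total_mb, total_mb * gpc_intact);  // 3072 MB
    return 0;
}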

Nvidia saying they are "looking into" this is just an attempt to save face. If the above is true, there is nothing they can do to solve this other than give all 970 owners a 980.
 

def sim

Member
So, assuming the benchmark is accurate, it would seem the issue manifests as a gradual drop in memory bandwidth across the last handful of chunks and is not isolated to a particular brand of VRAM. It'd be great if some 980 folks could find their way to this thread and post their findings.

980, Samsung memory

wapitatakgj.png
 

GHG

Member
There you go. Hope this helps. :)

7rN2Lrc.jpg

This is... Interesting. Why on earth is the 980 bandwidth starting to choke at roughly 3.5GB? The 970 issues we've seen actually make sense, but this doesn't make any sense unless nvidia is flat out lying about their hardware.
 
The drop-off at the end is supposedly affected by Windows using some VRAM (according to oc.net posts), so 980s will likely see it too, like the ones posted above (unless running from integrated graphics/disabling DWM?), just not as drastically as the 970s, which are probably actually gimped by the cut SMMs.

While there does seem to be an issue with 970s, this benchmark is probably not the most reliable way to collect data about it =/
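
If anyone wants to see how much of their VRAM Windows/DWM is already holding before any test even allocates anything, a trivial CUDA query does it. This is just a rough illustrative sketch, not part of the benchmark:

Code:
// Rough sketch: report how much VRAM the OS/DWM/etc. already holds before any test runs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    printf("Total VRAM: %zu MB\n", total_bytes >> 20);
    printf("Free VRAM : %zu MB\n", free_bytes >> 20);
    printf("Already in use (OS/DWM/etc.): %zu MB\n", (total_bytes - free_bytes) >> 20);
    return 0;
}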
 

GHG

Member
The drop-off at the end is supposedly affected by Windows using some VRAM (according to oc.net posts), so 980s will likely see it too, like the ones posted above (unless running from integrated graphics/disabling DWM?), just not as drastically as the 970s, which are probably actually gimped by the cut SMMs.

While there does seem to be an issue with 970s, this benchmark is probably not the most reliable way to collect data about it =/

Makes sense.

So we need someone who is able to do similar tests in a scenario where there is no OS overhead on the GPU. That just isn't really going to happen.
 

Rafterman

Banned
That "test" is garbage. Not only does it crash my driver every time I run it, but if you look at the tests people are posting it shows massive slowdown well below the 3.5g mark that is the supposed cutoff. If that test were accurate people would be having problems with games at the 3g mark or so and not at the 3.5g mark. I do know, though, that MSI Kombuster uses up nearly all of my ram (3912) during it's 3g stress test, without stuttering or frame drops. The only game I have that I can max out vram with is Far Cry 4, and the only reason it runs like crap when using all my ram is because a single 970 isn't pushing 5120x2160 at decent frames anyway, but it does use all of it.

P.S. That 980 test shows the same thing at 3.5g and I've never heard any 980 user complain about memory issues.
 
The drop-off at the end is supposedly affected by Windows using some VRAM (according to oc.net posts), so 980s will likely see it too, like the ones posted above (unless running from integrated graphics/disabling DWM?), just not as drastically as the 970s, which are probably actually gimped by the cut SMMs.

While there does seem to be an issue with 970s, this benchmark is probably not the most reliable way to collect data about it =/

I wouldn't say there's a guaranteed issue with 970s. AC: Unity does just fine using almost 4 GB of VRAM. Shadows of Mordor does too with the caveat that Ambient Occlusion be turned off.

If it was the OS overhead killing 980s in that test, we'd be hearing reports of bad stuttering from the people who play Shadows of Mordor in borderless windowed mode.
 
I wouldn't say there's a guaranteed issue with 970s. AC: Unity does just fine using almost 4 GB of VRAM. Shadows of Mordor does too with the caveat that Ambient Occlusion be turned off.

Yeah, I've only been able to test AC reasonably (as in, not pushing 4K+AA to eat the VRAM, where performance won't be great regardless), and at 1080p with 8xMSAA my 970 was using ~3950 MB. It just had to be "forced" into it and doesn't seem to want to cross that 3.5GB threshold normally. It also didn't seem to tank performance for using the extra memory, but the resistance to going over 3.5GB is still curious.
 

JaseC

gave away the keys to the kingdom.
That benchmark is not useful at all. It's using code ripped from nvidia's cuda concurrent bandwidth test.

The latter doesn't necessarily confirm the former. Part of the bandwidth test is testing device-to-device bandwidth (i.e. moving data within the GPU itself as the test only supports one GPU) -- that's exactly what people should be testing for and presumably what this custom benchmark is doing. I'd say the program is fine and 980 folks (and those with other cards, like myself and my two 670s) are seeing stark drop-offs at the very end because of OS overhead, whereas the 970s seem to be invariably affected a few chunks earlier.
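
For what it's worth, a bare-bones version of that kind of chunked device-to-device test would look something like the sketch below. This is just my own illustration of the idea; the 128 MB chunk size and the half-chunk copy pattern are arbitrary assumptions, not necessarily how the actual benchmark does it:

Code:
// Hedged sketch of a chunked device-to-device VRAM bandwidth test (illustrative only):
// grab VRAM in fixed-size chunks until allocation fails, then time a D2D copy in each one.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t chunk_bytes = 128ull << 20;   // 128 MB per chunk (arbitrary choice)
    std::vector<void*> chunks;

    // Allocate as many chunks as the card will give us.
    for (;;) {
        void* p = nullptr;
        if (cudaMalloc(&p, chunk_bytes) != cudaSuccess) break;
        chunks.push_back(p);
    }
    cudaGetLastError();  // clear the expected out-of-memory error from the last cudaMalloc
    printf("Allocated %zu chunks (%zu MB)\n", chunks.size(),
           chunks.size() * (chunk_bytes >> 20));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a device-to-device copy within each chunk (first half -> second half).
    for (size_t i = 0; i < chunks.size(); ++i) {
        char* base = static_cast<char*>(chunks[i]);
        cudaEventRecord(start);
        cudaMemcpy(base + chunk_bytes / 2, base, chunk_bytes / 2,
                   cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // The copy reads and writes half a chunk, so total traffic is one full chunk.
        double gbps = static_cast<double>(chunk_bytes) / (ms / 1000.0) / (1 << 30);
        printf("Chunk %3zu: %.1f GB/s\n", i, gbps);
    }

    for (void* p : chunks) cudaFree(p);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}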
 
MSI GeForce GTX 970 Gaming 4G with Hynix memory
Just got it a few weeks ago :(

ramtest1_by_realghostvids-d8f16e0.gif


I haven't played anything that has sent the VRAM up that high yet, so I haven't noticed anything peculiar.
 

Bendoruu

Neo Member
Gainward Phantom with Hynix memory too. Bought it after Christmas.

From what I can see in your benchmarks, all the GTX 9xx cards have limitations between 3400 and 3840 MB?

Or am I missing something?

EDIT: 980 not affected, confirmed. OK.
 
Are the 960s out yet? If so, I'd be interested to see someone with one post a benchmark... just to see if it cuts out at 1.5GB?
 
The latter doesn't necessarily confirm the former. Part of the bandwidth test is testing device-to-device bandwidth (i.e. moving data within the GPU itself as the test only supports one GPU) -- that's exactly what people should be testing for and presumably what this custom benchmark is doing. I'd say the program is fine and 980 folks (and those with other cards, like myself and my two 670s) are seeing stark drop-offs at the very end because of OS overhead, whereas the 970s seem to be invariably affected a few chunks earlier.

This is from the official bandwidthTest included in the CUDA SDK.

Code:
bandwidthTest.exe --dtod --start=1 --end=4294967296 --increment=3000000 --mode=shmoo --csv

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 970
 Shmoo Mode

---snip---
bandwidthTest-D2D, Bandwidth = 142328.4 MB/s, Time = 0.00021 s, Size = 31535104 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 142681.9 MB/s, Time = 0.00022 s, Size = 33632256 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 135286.8 MB/s, Time = 0.00027 s, Size = 37826560 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 142764.5 MB/s, Time = 0.00028 s, Size = 42020864 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 142635.0 MB/s, Time = 0.00031 s, Size = 46215168 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 143035.5 MB/s, Time = 0.00034 s, Size = 50409472 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 142108.2 MB/s, Time = 0.00037 s, Size = 54603776 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 142393.2 MB/s, Time = 0.00039 s, Size = 58798080 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 141567.1 MB/s, Time = 0.00042 s, Size = 62992384 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 142641.7 MB/s, Time = 0.00045 s, Size = 67186688 bytes, NumDevsUsed = 1
Result = PASS
 