NVIDIA GTX 970 Specifications Corrected, Memory Pools Explained

Posted by samzenpus on Monday January 26, 2015 @08:57AM from the under-the-hood dept.

Vigile writes: Over the weekend NVIDIA sent out its first official response to the claims of hampered performance on the GTX 970 and a potential lack of access to 1/8th of the on-board memory. Today NVIDIA has clarified the situation again, this time with some important changes to the specifications of the GPU. First, the ROP count and L2 cache capacity of the GTX 970 were incorrectly reported at launch (last September). The GTX 970 has 56 ROPs and 1792 KB of L2 cache, compared to the GTX 980, which has 64 ROPs and 2048 KB of L2 cache; previously both GPUs were listed with identical specs. Because of this change, one of the 32-bit memory channels is accessed differently, forcing NVIDIA to create 3.5GB and 0.5GB pools of memory to improve overall performance for the majority of use cases. The smaller, 500MB pool operates at 1/7th the speed of the 3.5GB pool and thus will lower total graphics system performance by 4-6% when added into the memory system. That only occurs when games request more than 3.5GB of memory, though, which happens only in extreme combinations of resolution and anti-aliasing. Still, the jury is out on whether NVIDIA has answered enough questions to temper the fire from consumers.

They are partnering with SOE on a fix by mandark1967 · 2015-01-26 09:03 · Score: 2, Insightful

You pay for an airdrop containing the extra ROPS and Cache. It's contested, though, so you may or may not get it.

--
Sig Follows: "Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." -- Mark Twain
Consumers? No just whiny fanboys by Sycraft-fu · 2015-01-26 09:10 · Score: 3, Insightful

Consumers are fine. The only benchmark that matters to a normal consumer is "How fast does it run my games?" and the answer for the 970 is "Extremely damn fast." It offers performance quite near the 980, for most games so fast that your monitor's refresh rate is the limit, and does so at half the cost. It is an extremely good buy, and I say this as someone who bought a 980 (because I always want the highest end toy).
Some people on forums are trying to make hay about this because they like to whine, but if you STFU and load up a game the thing is just great. While I agree companies need to keep their specs correct, the idea that this is some massive consumer issue is silly. The spec heads on forums are outraged because they like to be; regular consumers are playing their games happily, amazed at how much power $340 gets you these days.
1. Re:Consumers? No just whiny fanboys by alvinrod · 2015-01-26 09:18 · Score: 3, Insightful
  
  While that's a reasonable argument (and true) there are some people who do have cause to complain if they would have changed their purchasing decision based on having the correct information at the time of their purchase.
  
  Honestly, even something like that 970 is overkill for me. I've still got an 8800 in my old machine that runs plenty of games just fine, especially many of the older ones that I'm finally getting around to playing.
Better article by gman003 · 2015-01-26 09:15 · Score: 5, Informative

As usual, AnandTech's article is generally the best technical reporting on the matter.
Key takeaways (aka tl;dr version):
* Nvidia's initial announcement of the specs was wrong, but only because the technical marketing team wasn't notified that you could partially disable a ROP unit with the new architecture. They overstated the number of ROPs by 8 (was 64, actually 56) and the amount of L2 cache by 256KB (was 2MB, actually 1.75MB). This was quite unlikely to be a deliberate deception, and was most likely an honest mistake.
* The card effectively has two performance cliffs for exceeding memory usage. Go over 3.5GB, and it drops from 196GB/s to 28GB/s; go over 4GB and it drops from 28GB/s to 16GB/s as it goes out to main memory. This makes it act more like a 3.5GB card in many ways, but the performance penalty isn't quite as steep, and it intelligently prioritizes which data to put in the slower segment.
* The segmented memory is not new; Nvidia previously used it with the 660 and 660 Ti, although for a different reason.
* Because, even with the reduced bandwidth, the card is bottlenecked elsewhere, this is unlikely to cause actual performance issues in real-world cases. The only things that currently show it are artificial benchmarks that specifically test memory bandwidth, and most of those were written specifically to test this card.
* As always, the only numbers that matter for buying a video card are benchmarks and prices. I'm a bigger specs nerd than most, but even I recognize that the thing that matters is application performance, not theoretical. And the application performance is good enough for the price that I'd still buy one, if I were in the market for a high-end but not top-end card.
Not a shill or fanboy for Nvidia - I use and recommend both companies' cards, depending on the situation.
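
The 196GB/s and 28GB/s figures above fall out of simple per-channel arithmetic. A rough sketch, assuming a 7 Gbps effective GDDR5 data rate per pin (that number is not stated in this thread) and using made-up function names:

```python
# Assumption (not from the thread): 7 Gbps effective data rate per pin.
DATA_RATE_GBPS = 7
CHANNEL_WIDTH_BITS = 32  # each memory channel is 32 bits wide


def segment_bandwidth_gbs(channels: int) -> float:
    """Peak bandwidth in GB/s for a segment striped across `channels` channels."""
    return channels * CHANNEL_WIDTH_BITS * DATA_RATE_GBPS / 8


fast = segment_bandwidth_gbs(7)  # 3.5GB segment, striped across 7 channels
slow = segment_bandwidth_gbs(1)  # 0.5GB segment, on the remaining channel

print(f"3.5GB segment: {fast:.0f} GB/s")
print(f"0.5GB segment: {slow:.0f} GB/s")
print(f"ratio: 1/{fast / slow:.0f}")
```

Under that assumption the slow segment comes out at one seventh of the fast one, which matches the "1/7th the speed" line in the summary.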
1. Re:Better article by gman003 · 2015-01-26 10:42 · Score: 2
  
  This wasn't "marketing material", it was "technical marketing material", the stuff given to review sites, not the general public. And it was a relatively obscure portion that was incorrect, not something that most consumers would even understand, let alone care about. The technical marketing staff (a distinct group from the consumer marketing department) made the assumption that every enabled ROP/MC functional unit has two 8px/clock ROPs, two L2 cache units of 256KB, two links into the memory crossbar, and two 32-bit memory controllers.
  This assumption was true for previous architectures (Tesla, Fermi, Kepler). It was true for earlier releases in this architecture (the 750 Ti and 980 were full-die releases, no disabled units; the 750 only disabled full units). This is the first architecture where disabling parts of the ROP/MC functional unit, while keeping other parts active, was possible. The marketing department was informed that there were still 4 ROP/MC units, and that there was still a 256-bit memory bus. They were not informed that one ROP/MC unit was partially disabled, with only one ROP and one L2 cache unit, and only one port into the memory crossbar, but still two MCs.
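
The per-unit figures can be checked against the totals quoted in this thread. A rough sketch, assuming four ROP/MC units (the count that two 32-bit controllers per unit and a 256-bit bus imply); the names below are made up for illustration:

```python
ROPS_PER_BLOCK = 8     # each ROP block handles 8 px/clock
L2_KB_PER_BLOCK = 256  # each L2 cache unit is 256KB
MC_WIDTH_BITS = 32     # each memory controller is 32 bits wide

ASSUMED_UNITS = 4      # assumption: 256-bit bus / (2 controllers x 32 bits)


def totals(rop_l2_blocks: int, memory_controllers: int) -> tuple:
    """Return (ROPs, L2 cache in KB, bus width in bits) for the enabled blocks."""
    return (
        rop_l2_blocks * ROPS_PER_BLOCK,
        rop_l2_blocks * L2_KB_PER_BLOCK,
        memory_controllers * MC_WIDTH_BITS,
    )


# Full die: every unit has two ROP/L2 blocks and two memory controllers.
full_die = totals(ASSUMED_UNITS * 2, ASSUMED_UNITS * 2)

# 970: one unit loses one ROP/L2 block but keeps both memory controllers.
gtx_970 = totals(ASSUMED_UNITS * 2 - 1, ASSUMED_UNITS * 2)

print("full die:", full_die)
print("GTX 970: ", gtx_970)
```

With those inputs the full die works out to 64 ROPs and 2048KB of L2, and the partially disabled part to 56 ROPs and 1792KB, both on a 256-bit bus.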
  The point AT made is this: this information would have been figured out eventually. If Nvidia had been up-front with it, it would have been a minor footnote on the universally-positive launch reviews, not dedicated articles just for this issue. It only hurts them to have it not be known information from the get-go.
  As much as it's hip to hate on big corporations for being evil, they are not evil for no purpose. They do evil only when it is more profitable. In this case, the supposed lie was less profitable than the truth. Therefore it was incompetence, either "they honestly didn't know this was how it worked when they sent the info to reviewers", or "they thought they could get away with something that absolutely would have gotten out, and would not help them sell cards anyway". The former incompetence seems far, far more likely than the latter.
Re:Option? by Anonymous Coward · 2015-01-26 09:16 · Score: 2, Informative

There's really no point to doing that. If you disable the memory and run at high resolutions with ultra textures and AA that would cause you to break that 3.5 GB barrier, your performance would just tank because you are exchanging with main memory. In other words, the performance of the card is at least what you would get from a 3.5 GB card. That extra 500 MB isn't hurting anything.
Re:has not answered the important question by JustNiz · 2015-01-26 09:36 · Score: 2, Informative

>> causing them to crash out
This is a blatantly misleading thing to say. The cards don't crash at all. The only thing that happens as a result of this is a properly handled decrease in real world performance compared to the 980.
Are you seriously trying to claim that the 970 _should_ have the same performance as the 980?
Re:1/7th the speed? by Orestesx · 2015-01-26 09:38 · Score: 4, Insightful

This coward has just clearly demonstrated that there is in fact such a thing as a dumb question.
Re:has not answered the important question by JustNiz · 2015-01-26 09:52 · Score: 2

>> where it can and does cause repeated and repeatable crashes
I call bullshit. Please post credible references to people actually experiencing crashes while gaming as a result of this.
Re:Just bought two of these cards by JustNiz · 2015-01-26 10:23 · Score: 5, Informative

>> I run three 30" 2560x1600s
That's pretty much irrelevant. GPU RAM isn't used that way at all. It's used to hold the 3D geometry, bitmaps, bump maps, etc. of assets and other processing data, which is largely if not completely independent of screen resolution or number of screens.
Do the math:
2560 x 1600 x 4 (4 bytes per pixel for 32-bit color) = 15.625 MB per screen; x 3 monitors = 46.875 MB for one screen buffer across all 3 screens.
Even with triple buffering, your total screen buffer requirement for all 3 monitors is less than 150 MB.
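
For anyone who wants to plug in their own setup, here is the same arithmetic as a small sketch. The function name is made up, and it only counts plain color buffers, not depth buffers or anti-aliasing samples:

```python
def framebuffer_mb(width: int, height: int, bytes_per_pixel: int = 4,
                   screens: int = 1, buffers: int = 1) -> float:
    """Size of the color buffers in MB (1 MB = 2**20 bytes)."""
    return width * height * bytes_per_pixel * screens * buffers / 2**20


one_screen = framebuffer_mb(2560, 1600)
three_screens = framebuffer_mb(2560, 1600, screens=3)
triple_buffered = framebuffer_mb(2560, 1600, screens=3, buffers=3)

print(f"one screen:      {one_screen} MB")
print(f"three screens:   {three_screens} MB")
print(f"triple buffered: {triple_buffered} MB")
```

Triple buffering three 2560x1600 screens comes to about 140 MB, a small slice of a 4GB card.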
Re:Just bought two of these cards by Yunzil · 2015-01-26 11:10 · Score: 2

Yeah, you return those cards and give nvidia more money for the more expensive cards. That'll show 'em!
Re:Car Analogy by gman003 · 2015-01-26 17:39 · Score: 2

Both of you suck at car analogies.
Let's say Nissan makes an engine. V6, 3.8L. They advertise it as being 250HP, promote it mainly by putting it in racecars and winning races, and a whole lot of other technical specs get handed out to reviewers to gush over, but nobody really reads them except nerds.
They then make a variant engine. Same V6, but they cut the stroke down so it's only 3.0L. They advertise it as being 200HP, promote it with some more racecars that don't win the overall race but are best in their class, and again they hand out a small book worth of technical specs, this one with a minor error in the air flow rates on page 394. Somebody forgot to edit the numbers from the 3.8L engine, so even though the actual airflow is more than enough for the smaller engine, the numbers originally given look bigger. Nobody from marketing was told about the airflow change, because it was a weird side-effect of something they got rid of related to turbocharger compatibility, and nobody thought to ask the engineers to double-check all of their numbers since only like 200 people would see it worldwide anyways.
Once actual customers get their hands on the new engine, most of them are pretty happy. The 3.8L is better, but it costs like twice as much as the 3.0L, so whatever. One customer is driving on this godawful, decrepit highway that hasn't been repaved since the Eisenhower administration built it, and obviously has some issues. Rather than blame the shitty conditions, he takes a look at the engine, and finds that if you take an air compressor and blow air through the intakes, not as much gets to the engine as in the 3.8L. He then bitches about it online, and other people find the same thing. Motorheads being just as collectively retarded as any group, they build a standardized test set that completely ignores realistic driving conditions and pretty much only identifies this particular oddity in this particular engine, and take to the streets waving torches and pitchforks when they find the air flow value on page 394 isn't the airflow they're getting.
Someone at Nissan hears the noise outside, checks with their internal books and finds the typo. They start explaining as quickly and loudly as they can, but the mob's angry and nobody's going to stop it with logic at this point.
Meanwhile, the smart motorheads are sitting back, waiting for Nissan to drop the price on the "tainted" engine so they can pick one up for cheap themselves, since it's actually a perfectly fine engine, already a pretty good one for the price, and way more fuel-efficient than Audi's equivalent.