Slashdot Mirror


NVidia Accused of Inflating Benchmarks

Junky191 writes "With the NVidia GeForce FX 5900 recently released, this new high-end card seems to beat out ATI's 9800 pro, yet things are not as they appear. NVidia seems to be cheating on their drivers, inflating benchmark scores by cutting corners and causing scenes to be rendered improperly. Check out the ExtremeTech test results (especially their screenshots of garbled frames)."

23 of 404 comments

  1. What's the big news? by binaryDigit · · Score: 5, Insightful

    Isn't this SOP for the entire video card industry? Every few years someone gets caught targeting some aspect of performance to the prevailing benchmarks. I guess that's what happens when people wax on about "my video card does 45300 fps in quake and yours only does 45292, your card sucks, my experience is soooo much better". For a while now it's been the ultimate hype driven market wrt hardware.

    1. Re:What's the big news? by Anonymous Coward · · Score: 5, Interesting

      Posting anonymously because I used to work for a graphics card company.

      I've seen a video card driver where about half the performance-related source code was put in specifically for benchmarks (WinBench, Quake3, and some CAD-related benchmarks), and the code was ONLY used when the user is running said benchmark. This is one of the MAJOR consumer cards, people.

      So many programming hours put into marketing's request to optimize the drivers for a particular benchmark. It makes me sick to think that we could have been improving the driver's OVERALL performance and adding more features! One of the reasons I left......

    2. Re:What's the big news? by newsdee · · Score: 4, Insightful

      Now, cards are tweaked towards improved performance within a particular benchmark

      This is always the case with any chosen performance measurement. Look at managers asked to bring quarterly profits. They tend to be extremely shortsighted...

      Moral of the story: be very wary of how you measure and always add a qualitative side to your review (e.g. in this case, "driver readiness/completeness").

    3. Re:What's the big news? by medscaper · · Score: 4, Funny
      my video card does 45300 fps in quake and yours only does 45292, your card sucks

      Uhhh, can I have the sucky card?

      Please?

      --
      Any sufficiently well-organized Government is indistinguishable from bullshit.
  2. The reason by S.I.O. · · Score: 5, Funny

    They just hired some ATI engineers.

  3. As the mighty start to fall... by mahdi13 · · Score: 4, Interesting

    nVidia has been one of the more customer friendly video card makers...ever. They have full support for all platforms from Windows to Macs to Linux, this makes them, to me, one of the best companies around.
    So now they are falling into the power trap of "we need to be better and faster than the others" which is only going to have them end up like 3DFX in the end. Cutting corners is NOT the way to gain consumer support.

    As I look at it, it doesn't matter if you're the fastest or not...it's the wide variety of platform support that has made them the best. ATi does make better hardware but their software (drivers) is terrible and not very well supported. If ATi would give the support that nVidia has been giving for the last few years, I would start using ATi hands down...It's the platform support that I require, not speed.

    --
    "Some things have to be believed to be seen." - Ralph Hodgson
  4. Re:Hmmmm by drzhivago · · Score: 5, Informative

    Do you remember how a year or so ago ATI released a driver set that reduced image quality in Quake 3 to increase frame rate?

    Here is a link about it in case you forgot or didn't know.

    It just goes to show that both companies play that game, and neither to good results.

  5. Re:Does this even improve your experience? by Hellkitty · · Score: 5, Funny

    You make an excellent point. I am tired of spending way too much money trying to reach that holy grail of gaming. The slight improvement in hardware isn't going to change the fact that I'm only a mediocre gamer. The best gamers are going to kick my ass regardless of what hardware they use. I don't need to spend $400 every six months to be reminded of that.

  6. Very old practice. by shippo · · Score: 4, Interesting

    I recall about 10 years ago that one of the video adaptor manufacturers optimised their Windows 3.1 accelerated video drivers to give the best performance possible with the benchmark program Ziff-Davis used for their reviews.

    One test involved writing a text string in a particular font continuously to the screen. This text string was encoded directly in the driver for speed. Similarly one of the polygon drawing routines was optimised for the particular polygons used in this benchmark.

  7. Another reason to open-source drivers by BenjyD · · Score: 4, Insightful

    The problem is that people are buying cards based on these silly synthetic benchmarks. When performance in one arbitrary set of tests is so important to sales, naturally you're going to see drivers tailored to improving performance in those tests.

    Of course, if Nvidia's drivers were released under the GPL, none of the mud from this would stick as they could just point to the source code and say "look, no tricks". As it is, we just get a nasty combination of the murky world of benchmarks and the murky world of modern 3D graphics.

    1. Re:Another reason to open-source drivers by Obiwan+Kenobi · · Score: 4, Interesting

      5) Liability. Though it doesn't Make Sense (tm), if someone downloaded an "optimized driver" from superoptimizedrivers.com that in turn melted their chip or corrupted their vid card RAM in some way there would be repercussions.

      Realize, in a society in which people sue others over dogs barking too loud, NVidia would definitely hear from a very small but very vocal group about it.

      6) Nvidia's Programmers Don't Want This. Why? Let's say they GPL'd just the Linux reference driver. And in less than two weeks, a new optimized version came out that was TWICE as fast as the one before. This makes the programmers look foolish. I know this is pure ego, but it is a concern I'm sure, for a programmer w/ a wife and kids.

      I know this all sounds goofy, and trivial. But politics and Common Sense do not mesh. Again, I think your intentions are great and in a perfect world there would be thousands working on making the best, most optimized driver out there.

      But if such a community were to exist (and you know it would), why bother paying a league of great programmers and not just send out a few test boards to those most active in that new community, more than willing to do work for Free (as in beer?)

      Just something to think about.

  8. Re:whatever by Pulzar · · Score: 5, Informative

    Instead of only looking at the pictures, read the whole article before making decisions on whether it's a driver "fuckup" or an intentional optimization.

    The short of it is that nVidia added hard-coded clipping of the scenes for everything that the benchmark doesn't show in its normal run, and which gets exposed as soon as you move the camera away from its regular path.

    It's a step in the direction of recording an mpeg on what the benchmark is supposed to show and then playing it back at 200 fps.

    --
    Never underestimate the bandwidth of a 747 filled with CD-ROMs.
  9. Not a big deal. by grub · · Score: 4, Informative


    One has to take all benchmarks with a grain of salt if they come from a party with financial interests in the product. Win 2K server outperforms Linux, a Mac is 2x the speed of the fastest Wintel box, my daddy can beat up your daddy..

    It's not surprising but it is somewhat disappointing.

    --
    Trolling is a art,
  10. Re:whatever by GarfBond · · Score: 5, Interesting

    Because these rendering errors only occur when you go off the timedemo camera track. If you were on the normal track (like you would be if you were just running the standard demo) you would not notice it. Go off the track and the card ceases to render properly. It's an optimization that is too specific and too coincidental for the excuse "driver bug" to work. It's not the first time nvidia has been seen to 'optimize' for 3dmark either (there was a driver set, a 42.xx or 43.xx, can't remember, where it didn't even render things like explosions and smoke in game test 1 for 3DM03)

  11. Problem is the benchmarks themselves by Ed+Avis · · Score: 4, Interesting

    Why is it that people are assessing the performance of cards based on running the same narrow set of benchmarks each time? Of _course_ if you do that then performance optimization will be narrowly focused towards those benchmarks. Not just on the level of blatant cheating (recording a particular hardcoded text string or clipping plane) but more subtle things like only optimizing one particular code path because that's the only one the benchmark exercises.

    More importantly why is any benchmark rendering the exact same scene each time? Nobody would test an FPU based on how many times per second it could take the square root of seven. You need to generate thousands, millions of different scenes and render them all. Optionally, the benchmark could generate the scenes at random, saving the random seed so the results are reproducible and results can be compared.
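    The seeded-random idea above can be sketched roughly as follows. This is only an illustration under assumptions: the scene layout (one camera position plus a few objects with triangle counts) and the names `generate_scene` and `generate_scenes` are hypothetical, not taken from any real benchmark.

```python
import random


def generate_scene(rng, num_objects=5):
    """Build one hypothetical scene: a camera position plus a few objects."""
    camera = tuple(rng.uniform(-100.0, 100.0) for _ in range(3))
    objects = [
        {
            "position": tuple(rng.uniform(-50.0, 50.0) for _ in range(3)),
            "triangles": rng.randint(100, 10_000),
        }
        for _ in range(num_objects)
    ]
    return {"camera": camera, "objects": objects}


def generate_scenes(seed, count):
    """Return `count` scenes derived deterministically from `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [generate_scene(rng) for _ in range(count)]


# The same seed reproduces the same workload; a different seed gives a new one.
run_a = generate_scenes(seed=12345, count=1000)
run_b = generate_scenes(seed=12345, count=1000)
run_c = generate_scenes(seed=54321, count=1000)
```

    Publishing the seed alongside the score would let anyone regenerate the same workload, while a fresh seed per review makes it harder to tune a driver for one fixed camera path.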

    --
    -- Ed Avis ed@membled.com
    1. Re:Problem is the benchmarks themselves by satch89450 · · Score: 4, Insightful
      Nobody would test an FPU based on how many times per second it could take the square root of seven.

      Really? Do you write benchmarks?

      I used to write benchmarks. It was very common to include worst-case patterns in benchmark tests to try to find corner cases -- the same sort of things that QA people do to try to find errors. For example, given your example of a floating-point unit: I would include basic operations that would have 1-bits sprinkled throughout the computation. If Intel's QA people had done this with the Pentium, they would have discovered the un-programmed quadrant of the divide look-up table long before the chip was committed to production.

      Why do we benchmark people do this? Because we are amazed (and amused) at what we catch. Hard disk benchmarks that catch disk drives that can't handle certain data patterns well at all, even to the point of completely being unable to read back what we just wrote. My personal favorite: how about modems from big-name companies that drop data when stressed to their fullest?

      The SPECmark group recognizes that the wrong answer is always bad, so they insist that in their benchmarks the unit under test get the right answer before they even talk of timing. This is from canned data, of course, not "generating random scenes." The problem with using random data is that you don't know if the results are right with random data -- or at least that you get the results you've gotten on other testbeds.

      Besides, how is the software supposed to know how the scene was rendered? Read back the graphics planes and try to interpret the image for "correctness"? First, is this possible with today's graphics cards, and, second, is it feasible to try? Picture analysis is an art unto itself, and I suspect that being able to check rendering adds a whole 'nuther dimension to the problem. I won't say it can't be done, but I will say that it would be expensive.

      For FPUs, it's easy: have a test vector with lots of test cases. Make sure you include as many corner cases as you can conceive. When you make a test run, mix up the test cases so that you don't execute them in the same order every pass. (This will catch problems in vector FPU implementations.) Check those results!
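      A minimal sketch of that recipe, assuming a tiny hand-picked vector table: the cases, the tolerance, and the names `TEST_VECTORS` and `run_pass` are illustrative choices, not any real test suite. The last case is the well-known division that exposed the Pentium flaw.

```python
import math
import random

# (operation, operands, expected result). Most values are exactly
# representable in binary floating point; the last is the classic FDIV case.
TEST_VECTORS = [
    ("add", (1.5, 2.25), 3.75),
    ("sub", (10.0, 0.5), 9.5),
    ("mul", (3.0, 0.125), 0.375),
    ("div", (1.0, 8.0), 0.125),
    ("sqrt", (2.25,), 1.5),
    ("div", (4195835.0, 3145727.0), 1.333820449136241),
]

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
    "sqrt": math.sqrt,
}


def run_pass(vectors, rng):
    """Run every case once in a shuffled order and return any failures."""
    order = list(vectors)
    rng.shuffle(order)  # a different execution order on every pass
    failures = []
    for name, operands, expected in order:
        actual = OPS[name](*operands)
        if not math.isclose(actual, expected, rel_tol=1e-12):
            failures.append((name, operands, expected, actual))
    return failures


rng = random.Random(2003)  # fixed seed so a failing order can be replayed
all_failures = [f for _ in range(100) for f in run_pass(TEST_VECTORS, rng)]
```

      An empty `all_failures` list means every case produced the expected value on all 100 shuffled passes.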

      Now, if you will tell me how to extend that philosophy to graphic cards, we will have something.

  12. Re:NVIDIA == Thieves and Liars if et is correct by Surak · · Score: 4, Insightful

    Yeah, but they all do it, and it isn't strictly video board manufacturers either. That '80 GB' hard drive you just bought isn't 80 GB, it's (depending on the manufacturer) either an 80,000,000,000 byte hard drive or an 80,000 MB hard drive...either way it isn't by any stretch of the imagination 80 GB. That Ultra DMA 133 hard drive, BTW, can't really do a sustained 133 MB/s transfer rate either, that's the burst speed and you'll probably NEVER actually achieve that transfer rate in actual use. That 20" CRT you just bought isn't 20", it's 19.2" of viewable area. A 333 MHz FSB isn't 333 MHz, it's 332-point-something MHz, and even then it isn't really 333 MHz because it's really like 166 MHz and doubled because DDR memory allows you to read and write on the high and low side of the clock. That 2400 DPI scanner you just bought is only 2400 DPI with software interpolation. Your 56K modem can really only do 53K due to FCC regulations requiring them to disable the 56K transfer rate. The list goes on.
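    The hard-drive arithmetic can be worked through in a few lines, assuming "GB" here means the binary gigabyte (2^30 bytes) and "MB" the binary megabyte (2^20 bytes); the variable names are made up for the example.

```python
GIB = 2**30  # bytes in a binary gigabyte
MIB = 2**20  # bytes in a binary megabyte

# Case 1: the label means 80,000,000,000 bytes.
decimal_drive_bytes = 80_000_000_000
decimal_drive_gib = decimal_drive_bytes / GIB  # roughly 74.5

# Case 2: the label means 80,000 binary megabytes.
mixed_drive_bytes = 80_000 * MIB
mixed_drive_gib = mixed_drive_bytes / GIB  # exactly 78.125

print(f"80,000,000,000 bytes = {decimal_drive_gib:.2f} binary GB")
print(f"80,000 MB            = {mixed_drive_gib:.3f} binary GB")
```

    Under those assumptions both labelings fall short of 80 binary gigabytes, by about 5.5 and about 1.9 respectively.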

  13. Re:Giveing them self a bad name by satch89450 · · Score: 4, Insightful
    [Nvidia] used to be great.. but now i have my doubts

    Oh, c'mon. Benchmark fudging has been an on-going tradition in the computer field. When I was doing computer testing for InfoWorld, I found some people in a vendor's organization would try to overclock computers so they would do better in the automated benchmarks. ZD Labs found some people who "played" the BAPco graphics benchmarks to earn better scores by detecting a benchmark was running and cutting corners.

    <Obligatory-Microsoft-bash>

    One of the early players was Microsoft, with its C compiler. I have it from a source in Microsoft that when the Byte C-compiler benchmark figures were published in the early 1980s Microsoft didn't like being back of the pack. "It would take six months to fix the optimizer right." It would take two weeks, though, to put in recognizers for the common benchmarks of the time and insert hand-optimized "canned code" to better their score.

    </Obligatory-Microsoft-bash>

    Microsoft wasn't the only one. How about a certain three-letter company who fudged their software? You have multiple right answers to this one. :)

    When the SPECmark people first formed their benchmark committee, they knew of these practices and so they made the decision that SPECmarks were to be based on real programs, with known input and output, and the output was checked for correct answers before the execution times would be used.
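    The check-the-answer-before-timing rule described above might look something like this in miniature. It is a sketch only: `checked_benchmark`, `honest_sum`, and `cheating_sum` are hypothetical names, and this is not how the SPEC suites are actually implemented.

```python
import time


def checked_benchmark(func, args, expected, repeats=5):
    """Return the best wall-clock time, or raise if the output is wrong."""
    result = func(*args)
    if result != expected:
        # A wrong answer disqualifies the run before any timing is reported.
        raise ValueError(f"wrong answer: got {result!r}, expected {expected!r}")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def honest_sum(n):
    """Does the real work: adds up 0 .. n-1."""
    return sum(range(n))


def cheating_sum(n):
    """'Fast' only because it skips the work."""
    return 0


best = checked_benchmark(honest_sum, (100_000,), expected=4_999_950_000)
print(f"honest_sum best time: {best:.6f} s")
```

    Passing `cheating_sum` to the same harness raises `ValueError` instead of producing an impressive time.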

    And now you know why reputable testing organizations who use artificial workloads check their work with real applications: to catch the cheaters.

    Let me reiterate an earlier comment by Alan Partridge: it's idiots who think that a less-than-one-percent difference in performance is significant. (Whether the shoe fits you is something you have to decide for yourself.) What benchmark articles don't tell you is the spread of results they obtain through multiple testing cycles. When I was doing benchmark testing at InfoWorld, it was common for me to see trial-to-trial spreads of three percent in CPU benchmarks, and broader spreads than that with hard-disk benchmarks. Editors were unwilling to admit to readers that results were collected that formed a "cloud" -- they wanted a SINGLE number to put in print. ("Don't confuse the reader with facts, I want to make the point and move on.") I see that in the years since I was doing this full-time that editors are still insisting on "keep it simple" even when it's wrong.
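    To illustrate the "cloud" point, here is a rough sketch of reporting a spread instead of a single number. The scores are made up for the example, and `summarize` and `clearly_different` are hypothetical helpers; the range-overlap check is a deliberately crude stand-in for a proper statistical test.

```python
import statistics


def summarize(scores):
    """Report the mean, sample standard deviation, and min-to-max spread."""
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)
    spread_pct = 100.0 * (max(scores) - min(scores)) / mean
    return {"mean": mean, "stdev": stdev, "spread_pct": spread_pct}


def clearly_different(a, b):
    """Crude check: do the two min..max ranges fail to overlap at all?"""
    return max(a) < min(b) or max(b) < min(a)


# Made-up trial results for two cards, five runs each.
card_a = [45300, 45120, 45510, 44980, 45390]
card_b = [45292, 45400, 45050, 45480, 45210]

for name, scores in (("card A", card_a), ("card B", card_b)):
    s = summarize(scores)
    print(f"{name}: mean {s['mean']:.0f}, stdev {s['stdev']:.0f}, "
          f"spread {s['spread_pct']:.2f}%")

print("clearly different:", clearly_different(card_a, card_b))
```

    With these invented numbers the two ranges overlap, so ranking one card above the other on a single run would be reading noise.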

    Another observation: when I would trace back hardware and software that was played with, the response from upper management was universally astonishment. They would fall over backwards to ensure we got a production piece of equipment. To some extent, I believed their protestations, especially when bearded during their visits to our Labs. One computer company (name withheld to protect the long-dead guilty) was amazed when we took them into the lab and opened up their box. We pointed out that someone had poured White-Out over the crystal can, and that when we carefully removed the layer of gunk the crystal was 20% faster than usual. Talk about over-clocking!

    So when someone says "Nvidia is guilty of lying" I say "prove it", further saying that you have to show with positive proof that the benchmark fudging was authorized by top management. I can't tell from the article, but I suspect someone pulled a fast one, and soon will be joining the very long high-technology bread line.

    Pray the benchmarkers will always check their work.

    And remember, the best benchmark is YOUR application.

  14. NVidia not cheating by linux_warp · · Score: 4, Informative

    hardocp.com on the front page has a great writeup on this.

    But basically, ExtremeTech is just a little bit mad because they were excluded from the doom3 benchmarks. Since nvidia refused to pay the 10s of thousands of dollars to be a member of the 3dmark03 board, they have absolutely no access to the software used to uncover this bug.

    Here is the full excerpt from hardocp.com:

    3DMark Invalid?
    Two days after Extremetech was not given the opportunity to benchmark DOOM3, they come out swinging heavy charges of NVIDIA intentionally inflating benchmark scores in 3DMark03. What is interesting here is that Extremetech uses tools not at NVIDIA's disposal to uncover the reason behind the score inflations. These tools are not "given" to NVIDIA anymore as they will not pay the tens of thousands of dollars required to be on the "beta program" for 3DMark "membership".

    nVidia believes that the GeForceFX 5900 Ultra is trying to do intelligent culling and clipping to reduce its rendering workload, but that the code may be performing some incorrect operations. Because nVidia is not currently a member of FutureMark's beta program, it does not have access to the developer version of 3DMark2003 that we used to uncover these issues.

    I am pretty sure you will see many uninformed sites jumping on the news reporting bandwagon today with "NVIDIA Cheating" headlines. Give me a moment to hit this from a different angle.

    First off it is heavily rumored that Extremetech is very upset with NVIDIA at the moment as they were excluded from the DOOM3 benchmarks on Monday and that a bit of angst might have precipitated the article at ET, as I was told about their research a while ago. They have made this statement:

    We believe nVidia may be unfairly reducing the benchmark workload to increase its score on 3DMark2003. nVidia, as we've stated above, is attributing what we found to a bug in their driver.

    Finding a driver bug is one thing, but concluding motive is another.

    Conversely, our own Brent Justice found a NVIDIA driver bug last week using our UT2K3 benchmark that slanted the scores heavily towards ATI. Are we to conclude that NVIDIA was unfairly increasing the workload to decrease its UT2K3 score? I have a feeling that Et has some motives of their own that might make a good story.

    Please don't misunderstand me. Et has done some good work here. I am not in a position to conclude motive in their actions, but one thing is for sure.

    3DMark03 scores generated by the game demos are far from valid in our opinion. Our reviewers have now been instructed to not use any of the 3DMark03 game demos in card evaluations, as those are the section of the test that would be focused on for optimizations. I think this just goes a bit further showing how worthless the 3DMark bulk score really is.

    The first thing that came to mind when I heard about this, was to wonder if NVIDIA was not doing it on purpose to invalidate the 3DMark03 scores by showing how easily they could be manipulated.

    Thanks for reading our thoughts; I wanted to share with you a bit different angle than all those guys that will be sharing with you their in-depth "NVIDIA CHEATING" posts. While our thoughts on this will surely upset some of you, especially the fanATIics, I hope that it will at least let you possibly look at a clouded issue from a different perspective.

    Further on the topics of benchmarks, we addressed them earlier this year, which you might find to be an interesting read.

    We have also shared the following documentation with ATI and NVIDIA while working with both of them to hopefully start getting better and more in-game benchmarking tools. Please feel free to take the documentation below and use it as you see fit. If you need a Word document, please drop me a mail and let me know what you are trying to do.

    Benchmarking Benefiting Gamers

    Objective: To gain reliable benchmarking and image quality tools

  15. Everyone seems to mess with benchmarks. by Maul · · Score: 4, Interesting

    Companies always tweak their code, insist on tests optimized for their hardware, etc. in order to get an edge up on benchmarks. This is probably especially true in cases where the competition is so neck-and-neck, as it seems to be with the video card industry. It seems that these companies will do anything to show they can get even two or three more FPS than the competition. It is hard to treat any benchmark seriously because of this.

    At the same time, I'm debating what my next video card should be. Even though ATI's hardware might be slightly better this round, the differences will probably be negligible to all but the most extreme gamers. At the same time NVidia has proven to me that they have a history of writing good drivers, and they still provide significantly better support to the Linux community than ATI does.

    For this reason I'm still siding with the GeForce family of video cards.

    --

    "You spoony bard!" -Tellah

  16. Re:Giveing them self a bad name by mmol_6453 · · Score: 4, Insightful

    One of the first courses in all college business curriculums I've seen is "Business Statistics" (BA154 here at GRCC.).

    The course focuses on making decisions based on statistics. In the second week of class, we learned what a standard deviation was, and we never stopped using it throughout the semester.

    But perhaps ignorance would explain business tactics of the 90's.

    --
    What's this Submit thingy do?
  17. Re:NVIDIA == Thieves and Liars if et is correct by Polo · · Score: 4, Funny

    I believe my 19.2" viewable-area monitor is a twenty-ONE inch monitor, thank-you-very-much!

  18. Re:STFU - who cares? by Oswald · · Score: 4, Insightful
    One of us doesn't understand the article. The way I read it, the "optimization" the card is performing would only work on the benchmark game--the performance increase it yields will never be manifested in any real game, so is useless.

    I gather you read it differently?