Examining Benchmarking
VL writes "Benchmarks exist to determine how a particular piece of hardware performs in relation to itself, and to others. Question is, are readers getting the information they really need?"
Benchmarks are inherently flawed for the reasons stated in the posts. Comparing hardware to itself and similar hardware means there's no external reference point. Comparing one thing to another is okay, but you can't get absolute numbers in a closed Platonic system.
Goedel's Incompleteness Theorem states that you can't define a system entirely in its own terms, and that any system needs to be defined by terms outside of it.
So, how can you accurately rate hardware based on similar hardware? To meet the GIT (Goedel's Incompleteness Theorem), you would need to compare the hardware with something outside of the system, so you have an external reference point. For example, if you're benchmarking graphics cards, you need to also compare them to something outside of that area of hardware.. so.. say, a graphics tablet, or an iPod.
So, say that the first graphics card is 0.7% compared to the iPod, we now have an external reference to use with the other graphics cards.. so a better card might be 10% compared with the iPod, or a few percent compared to the graphics tablet, which proves that the second card is better than the first, due to the respective ratings compared to the external objects.
This is just regular math. I have to say, it's pretty amazing what you can apply regular math to.. yes, even benchmarks!
It all depends on the range of exercisable aspects of the hardware that a particular benchmarking suite actually exercises. That's why you prefer a suite rather than a stand-alone benchmark. For instance, Top500.org ranks HPC machines according to LINPACK, on which the ES (Earth Simulator) of course does well due to its vectorization capabilities.
So, if you want to know about your hardware, you'd better run more than one benchmark and, more importantly, your 'problem code'. Yes, you want hardware that performs well for your problem. Something that is good in general is rather rare.
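A rough sketch of what "run more than one benchmark, plus your problem code" could look like in practice. Everything named here (time_workload, run_suite, and the three toy workloads) is made up for illustration and is not part of any real benchmark suite; my_problem_code is just a placeholder for whatever you actually run day to day.

```python
import statistics
import time


def time_workload(fn, repeats=5):
    """Return the median wall-clock seconds over several runs of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-ins for a synthetic suite; not real benchmark kernels.
def integer_loop():
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def float_loop():
    x = 0.0
    for i in range(1, 200_000):
        x += 1.0 / i
    return x


def my_problem_code():
    # Placeholder for the workload you actually care about.
    words = [str(i % 997) for i in range(100_000)]
    return sorted(words)


def run_suite(workloads, repeats=5):
    """Time every workload and return {name: median_seconds}."""
    return {name: time_workload(fn, repeats) for name, fn in workloads.items()}


if __name__ == "__main__":
    results = run_suite({
        "integer_loop": integer_loop,
        "float_loop": float_loop,
        "my_problem_code": my_problem_code,
    })
    for name, seconds in results.items():
        print(f"{name:>16}: {seconds * 1000:.2f} ms")
```

The median is used instead of the mean so that one noisy run doesn't skew the number. The point is the last entry in the dictionary: the synthetic loops only tell you how the machine handles synthetic loops.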
My favorite computers haven't been the fastest. In fact, I've been the most productive on systems that were objectively less impressive.
My favorite Operating Systems haven't been the ones with the best selection of software.
My favorite games haven't been the ones with the best graphics.
The reviews I find most valuable don't have the most complete set of numbers of why something's the best or worst.
It's interesting that the goal of benchmarks is to be as objective as possible, when it's the subjective that makes me want to buy or not buy something. But meanwhile, the more the objectivity of the benchmark tests is in doubt, the less important the tests become. So I guess that means benchmarks don't mean anything to me one way or the other, huh?
Alex.
The problem is that as a benchmark becomes widespread and respected, the incentive to cheat the mark increases at a much greater rate.
For less widely used benchmarks, it's possible to do one-offs in the lab and include the false results in the marketing material. The primary examples of this are SPEC, Dhrystone, and Whetstone. For a while Intel's compilers had recognition routines just for these benchmarks. Apple has always done tuned versions of the benchmarks.
Once a benchmark gets into the wild and is in a form that anyone with a website can load on a machine without too much trouble, you get manufacturers actively moving to cheat the benchmark. The best examples are Nvidia's and ATI's optimizations that are specific to 3DMark and Quake III.
I don't know of anyone who would buy a piece of hardware solely on a benchmark. However, salesmen who can't sell are without peer at inventing excuses and shifting blame. So as long as you have sales goals that are unrealistic and salespeople who are good at inventing excuses, you will have engineering departments forced to cheat the benchmarks.
It's true: money changes everything.
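One way to check for that kind of recognition routine, sketched under assumptions: run the same workload once under the benchmark's real name and once under a disguised name, then see whether the score drops by more than run-to-run noise. The function name, the 5% tolerance, and the idea of disguising the name are my own illustration, not a description of how any vendor or reviewer actually tests.

```python
import statistics


def looks_benchmark_specific(original_scores, disguised_scores, tolerance=0.05):
    """Flag a suspicious gap between runs under the benchmark's real name
    and runs of the same workload under a disguised name.

    tolerance is an assumed relative noise threshold, not a standard value.
    Returns (suspicious, relative_drop).
    """
    original = statistics.median(original_scores)
    disguised = statistics.median(disguised_scores)
    if original <= 0:
        raise ValueError("scores must be positive")
    # Fraction of the score that disappears once the name is disguised.
    relative_drop = (original - disguised) / original
    return relative_drop > tolerance, relative_drop
```

Called with scores you measured yourself, something like looks_benchmark_specific(runs_with_real_name, runs_with_disguised_name) gives back a flag plus the size of the drop. A drop inside the tolerance proves nothing either way; a large one is a reason to look closer.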
The bottom line is that you really can't put much trust in benchmarks. Well... that's not exactly true, but think: of those games and apps that you always see the same people run over and over again, how many do you use on a daily basis? Personally, I've read so many reviews that I don't even have to think about what a pixel shader is anymore, so it will probably come as no surprise that I skip the mumbo jumbo they tell you about the card and go straight to the benchies. And it's always the same ones.
That's all well and good, and I guess it gives you a VERY generic view of how those particular things work, but how about real-life performance? How about a screenshot in the HL mod Natural Selection when there are 15 turrets firing at bile-bombing aliens with show_fps set to 1? Can we get something like that? I guess that would fit in there with fill rate, and before you tell me that's an arcane game, let me direct you to the little X on the top right of your browser. I don't care.
You can get a very good idea about the speed of a card, but you have no idea what the card will have trouble with until you load up your copy of Star Wars: Pod Racer just to be greeted by a big white screen when the race starts. That's one thing I really miss about 3dfx. Their cards worked. Always. Well, at least they did at the time.