Making a Fair Gfx Benchmarking Utility?
Moggie68 asks: "Whenever the big two release new GPUs and graphics cards that reach astounding heights with their benchmark scores, the same heated debate about unfair benchmarking utilities rises again. But what about the flipside of the coin? Would it really be that easy to construct a fair benchmarking utility for GPUs and graphics cards? What facts need to be considered? What problems need to be solved?"
Just stick to using popular games. Seriously.
Here's the problem: ATI and NVidia have diverged a bit. They get performance upgrades from different optimizations/workflows. For this reason, performance is more a question of which card the game developer favors than it is about which card is better. Granted, what I'm saying isn't quite as black and white as that, but it's worth considering that if the benchmark uses an optimization that the game doesn't, then the benchmark is misleading.
I don't find video card benchmarks interesting, but I do enjoy CPU benchmarks. I'm a 3D artist, so render speed is very important to me. I recently had to go through the "Do I want a P4 or Athlon?" debate. Lightwave comes with benchmark scenes. You're supposed to load the scene, hit the render button, and write the number down. Some decent sites actually do the benchmark that way. That is a selling point for me, not the rest of those idiotic benchmarks that they throw in there. Yeah, like I care about how fast Office is.
I hope my point got across. Real world numbers are gold, theoretical numbers are pyrite.
"Derp de derp."
This is the problem: the graphics drivers check the process/executable to see what program is making the graphics calls. If it matches a known target profile (a benchmark, quake3, etc.), the graphics are tuned for it.
The problem here is that the Windows driver model allows the driver to check what program is making calls into it. This is not a bad thing by itself, so I wouldn't advocate getting rid of it.
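To make the mechanism concrete, here is a rough sketch of name-based profile matching. It's an illustration only, not actual driver code: the `KNOWN_PROFILES` table, the `select_profile` function, and the settings in it are all made up for the example.

```python
import ntpath  # parses Windows-style paths on any platform

# Hypothetical profile table. The executable names and settings are
# invented for illustration and are not taken from any real driver.
KNOWN_PROFILES = {
    "quake3.exe": {"texture_filtering": "reduced"},
    "benchmark.exe": {"texture_filtering": "reduced"},
}
DEFAULT_PROFILE = {"texture_filtering": "full"}


def select_profile(executable_path):
    """Pick rendering settings based only on the calling program's file name."""
    name = ntpath.basename(executable_path).lower()
    return KNOWN_PROFILES.get(name, DEFAULT_PROFILE)


if __name__ == "__main__":
    print(select_profile(r"C:\Games\Quake3\quake3.exe"))
    print(select_profile(r"C:\Games\Quake3\renamed.exe"))
```

In this sketch the lookup keys on nothing but the file name, so the same program under a different name falls through to the default settings.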
So, let's say you make a new benchmarking program and you don't leak any copies out to the graphics people. What happens when you release it? It might work and be fair on the current batch of drivers, but as soon as the graphics people get their hands on it, there's nothing you can do to prevent them from "optimizing" (tuning down rendering) for your benchmark.
So maybe you can make a fair benchmark today. But as soon as you give it to anyone, don't bet on it being fair on the next driver revision.
-molo
Using your sig line to advertise for friends is lame.
There is no such thing as a fair benchmark. Each person's needs differ, and therefore a different product suits those needs best. The best thing to do is grab demos of the things you like to do with your video card, then head down to your local computer store and see how they run.
...repeat it
Does anyone still care about MIPS, MFLOPS, Dhrystone, Whetstone, or SPEC? Why do we want to rehash history with GPUs?
If you want a synthetic benchmark, the companies will make their product work well with the benchmark, and little else. When the inevitable happens (as it has with both major players), you should neither get upset nor demand a better benchmark. Instead, laugh when someone fronts a synthetic benchmark score.
So you want to know if a card you are going to buy will work well for a game that is going to come out in 6 months to a year? We'd all like to know the future, too. I'd prefer a crystal ball.
"I don't know that atheists should be considered citizens, nor should they be considered patriots." George HW Bush
One possibility is to have each vendor create two test suites -- a suite that the vendor thinks highlights the best performance features of their own system and a suite that highlights the worst performance features of the competitor's system. For two vendors, this results in a total of four test suites (vendor 1's favorites, vendor 1's killer for vendor 2, vendor 2's favorites, vendor 2's killer for vendor 1).
Then run all four suites on both systems and take normalized averages. The best system can win only by being robust and of overall high performance. With four suites in all, the vendor's own "best foot forward" suite can't overweight the result. And with the other vendor looking for any weaknesses, the downsides of each vendor's system become quite evident.
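Here's one way the "normalized averages" step could be worked out, as a minimal sketch. I'm assuming higher scores are better and that each suite is normalized against the best result in that suite; other normalizations would work too. The function name `normalized_averages`, the suite names, and all the numbers are made up for illustration.

```python
from statistics import mean


def normalized_averages(scores):
    """scores maps suite name -> {system name: raw score}, higher is better.

    Each suite is scaled so its best system scores 1.0, then every
    system's scaled results are averaged across all suites.
    """
    systems = sorted({s for per_suite in scores.values() for s in per_suite})
    result = {}
    for system in systems:
        ratios = []
        for per_suite in scores.values():
            best = max(per_suite.values())
            ratios.append(per_suite[system] / best)
        result[system] = mean(ratios)
    return result


# Invented frame rates, purely for illustration.
example_scores = {
    "vendor1_favorites": {"vendor1": 100.0, "vendor2": 80.0},
    "vendor1_killer_for_vendor2": {"vendor1": 90.0, "vendor2": 45.0},
    "vendor2_favorites": {"vendor1": 60.0, "vendor2": 120.0},
    "vendor2_killer_for_vendor1": {"vendor1": 30.0, "vendor2": 60.0},
}

if __name__ == "__main__":
    for system, score in normalized_averages(example_scores).items():
        print(f"{system}: {score:.3f}")
```

With these invented numbers, each vendor wins its own two suites outright, so the overall ranking comes down to how well each card holds up on the other vendor's suites.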
Such testing may not produce over-optimized one-application super-stars, but it should lead to well-rounded graphics boards for high performance on a range of graphical display tasks.
I bet that ATI and NVidia will never go for this approach because it would lead to real head-to-head fair competition as opposed to carefully staged, optimized, marketing-controlled demos.
Two wrongs don't make a right, but three lefts do.