Making a Fair Gfx Benchmarking Utility?

Stick to the games by NanoGator · 2003-09-19 12:19 · Score: 3, Insightful

Just stick to using popular games. Seriously.

Here's the problem: ATI and NVidia have diverged a bit. They get performance upgrades from different optimizations/workflows. For this reason, performance is more a question of which card the game developer favors than it is about which card is better. Granted, what I'm saying isn't quite as black and white as that, but it's worth considering that if the benchmark uses an optimization that the game doesn't, then the benchmark is misleading.

I don't find video card benchmarks interesting, but I do enjoy CPU benchmarks. I'm a 3D artist, so render speed is very important to me. I recently had to go through the "Do I want a P4 or Athlon?" debate. Lightwave comes with benchmark scenes. You're supposed to load the scene, hit the render button, and write the number down. Some decent sites actually do the benchmark that way. That is a selling point for me, not the rest of those idiotic benchmarks that they throw in there. Yeah, like I care about how fast Office is.

I hope my point got across. Real world numbers are gold, theoretical numbers are pyrite.

--
"Derp de derp."

Can't be done if driver authors want to skew it.. by molo · 2003-09-19 12:25 · Score: 3, Informative

This is the problem: the graphics drivers check the process/executable to see what program is making the graphics calls. If it matches a known target profile (benchmarking, quake3, etc), the graphics are tuned.

The problem here is that the Windows driver model allows the driver to check what program is making calls into it. This is not a bad thing by itself, so I wouldn't advocate getting rid of it.

So.. let's say you make a new benchmarking program and you don't leak any copies out to the graphics people. What happens when you release it? It might work and be fair on the current batch of drivers.. but as soon as the graphics people get their hands on it, there's nothing you can do to prevent them from "optimizing" (tuning down rendering) for your benchmark.

So maybe you can make a fair benchmark today. But as soon as you give it to anyone, don't bet on it being fair on the next driver revision.

-molo

--
Using your sig line to advertise for friends is lame.

Fair Benchmarks by m0rph3us0 · 2003-09-19 12:33 · Score: 3, Insightful

There is no such thing as a fair benchmark. Each person's needs differ, and therefore a different product suits those needs best. The best thing to do is grab demos of the things you like to do with your video card, then head down to your local computer store and see how it works.

Re:Fair Benchmarks by grolschie · 2003-09-22 21:09 · Score: 1

I agree, there is always some way to screw with benchmarks. Especially when there are so many settings in the display adapter properties that can screw your performance or increase it.

However, benchmarks are a good ballpark guide to whether I should buy an ATI or Nvidia. I am so glad I read the benchmarks when I made the ATI vs Nvidia decision recently. This time ATI won me over for price and performance. Previously it was NVidia. Who knows who will win next round. It seems that just as soon as you have shelled out a fortune, they release the next chipset on you and your current card is obsoleted.

Re:Can't be done if driver authors want to skew it by BusterB · 2003-09-19 12:35 · Score: 2, Interesting

Benchmarkers can always just rename their benchmark programs to something else when testing. Isn't this how a lot of recent driver optimizations were discovered in the first place? How about a benchmark installer that installs a differently-named executable every time?

Re:Can't be done if driver authors want to skew it by Lshmael · 2003-09-19 12:49 · Score: 1

But what if the graphics drivers then use the memory space of the process to see if it is a benchmarking program running? You get into an "arms race" of sorts, like that between malicious code writers and antivirus companies, or crackers (as in people who crack programs) and shareware programmers.

Those who do not learn from history are doomed to by Mad+Quacker · 2003-09-19 12:50 · Score: 3, Interesting

...repeat it

Does anyone still care about MIPS, MFLOPS, Dhrystone, Whetstone, or SPEC? Why do we want to rehash history with GPUs?

If you want a synthetic benchmark, the companies will make their product work well with the benchmark, and little else. When the inevitable happens (as it has with both major players), you should neither get upset nor demand a better benchmark; instead, laugh when someone fronts a synthetic benchmark score.

So you want to know if a card you are going to buy will work well for a game that is going to come out in 6 months to a year. We'd all like to know the future as well, I'd prefer a crystal ball.

--
"I don't know that atheists should be considered citizens, nor should they be considered patriots." George HW Bush

Mutual generation of fair tests by G4from128k · 2003-09-19 12:54 · Score: 3, Interesting

One possibility is to have each vendor create two test suites -- a suite that the vendor thinks highlights the best performance features of their own system and a suite that highlights the worst performance features of the competitor's system. For two vendors, this results in a total of four test suites (vendor 1's favorites, vendor 1's killer for vendor 2, vendor 2's favorites, vendor 2's killer for vendor 1).

Then run all four suites on both systems and take normalized averages. The best system can win only by being robust and of overall high performance. With four tests in all, the vendor's own "best foot forward" suite can't overweight the result. And with the other vendor looking for any weaknesses, the downsides of each vendor's system become quite evident.
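A minimal sketch of what "take normalized averages" could mean here, assuming each suite reports one frames-per-second figure per card. The suite names, card names, and numbers are made up for illustration, and dividing by the best score in each suite is only one possible normalization; the post does not specify one.

```python
from statistics import mean


def normalized_averages(results):
    """Average each card's per-suite score after scaling by the suite's best score.

    results maps suite name -> {card name -> frames per second}.
    Every suite is assumed to contain a score for every card.
    """
    cards = sorted({card for scores in results.values() for card in scores})
    per_card = {card: [] for card in cards}
    for scores in results.values():
        best = max(scores.values())
        for card in cards:
            # 1.0 means "fastest card in this suite"; lower is proportionally slower.
            per_card[card].append(scores[card] / best)
    return {card: mean(values) for card, values in per_card.items()}


# Hypothetical results for the four suites described in the post.
results = {
    "vendor1_favorites": {"card_a": 120.0, "card_b": 90.0},
    "vendor1_killer_for_vendor2": {"card_a": 80.0, "card_b": 40.0},
    "vendor2_favorites": {"card_a": 70.0, "card_b": 100.0},
    "vendor2_killer_for_vendor1": {"card_a": 30.0, "card_b": 60.0},
}

scores = normalized_averages(results)
for card, score in scores.items():
    print(card, round(score, 4))
```

Because every suite is scaled to its own best score before averaging, a suite that happens to produce large raw frame rates does not carry more weight than the others.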

Such testing may not produce over-optimized one-application super-stars, but it should lead to well-rounded graphics boards for high performance on a range of graphical display tasks.

I bet that ATI and NVidia will never go for this approach because it would lead to real head-to-head fair competition as opposed to carefully staged, optimized, marketing-controlled demos.

--
Two wrongs don't make a right, but three lefts do.

Re:Mutual generation of fair tests by Snowmit · 2003-09-19 13:46 · Score: 1

Or they'd spend all of their time writing suites that crash the other guy's system.

--
I have a lot of opinions about Cyborgs and Architects

Re:Mutual generation of fair tests by i_am_nitrogen · 2003-09-19 14:21 · Score: 1

This is not acceptable. The benchmarks cannot be developed by anyone influenced by the hardware manufacturers. Otherwise you'll have manufacturer A putting sleep calls in their anti-manufacturer-B benchmark and vice versa. Then you'll just have a test of how quickly the computer finishes 2000 calls to sleep for 100ms, rather than 2000 calls to draw the screen and swap buffers.

Then you'll have driver manufacturers figuring out a way to disable the sleep system call....

--
A solution to the problem with music today

Benchmarks and Subjectivity by Henry+V+.009 · 2003-09-19 12:56 · Score: 2, Insightful

The problems with benchmarking graphics cards have traditionally been:

How do you benchmark image quality?
How do you compare different performance advantages in different areas?
How do you stop the card manufacturers from cheating on the tests?

The only way to test the first is with the human eye. You need to look at two images and make a subjective decision on which is better. And the programs that generally have the right amount of graphical frills are popular games.

The performance question is harder. But again, popular games level the playing field. When you benchmark using a game you know that programmers are actually using the features you are testing.

And finally, there is the matter of cheating. If a manufacturer is noticeably decreasing image quality for frame rate, he is usually "cheating." When image quality is maintained, it is an optimization. So again, it becomes a matter of subjective judgments of the human eye.

Subjective judgments are not so bad, of course. A five star restaurant is only subjectively better than a two star restaurant. But usually that will mean a lot to the customer. So we can tolerate the errors that come from benchmarking cards with games pretty well. When manufacturers pull their tricks, you can bet that the review sites will be there to catch them.

Re:Can't be done if driver authors want to skew it by BusterB · 2003-09-19 13:14 · Score: 1

OK, then how about benchmarking in Linux or FreeBSD. They both support Direct Rendering Manager. I'm sure that a vendor arms race would be a welcome sight in the free operating system arena.

Re:Can't be done if driver authors want to skew it by molo · 2003-09-19 13:43 · Score: 2, Interesting

Then the drivers will check an md5sum of the executable.. or they'll search for certain signatures within the file.. plenty of options.. it would be an arms race of sorts. There's no way to guarantee it.
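To illustrate why renaming alone would not help against a content check, here is a small sketch showing that a file's MD5 digest does not change when the file is renamed. It says nothing about what any real driver does; the file names and contents are placeholders invented for the example.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical file names; the bytes are a stand-in for a real executable.
    original = os.path.join(workdir, "benchmark.exe")
    renamed = os.path.join(workdir, "not_a_benchmark.exe")
    with open(original, "wb") as handle:
        handle.write(b"stand-in bytes for a benchmark binary")

    before = md5_of_file(original)
    os.rename(original, renamed)
    after = md5_of_file(renamed)

# The digest depends only on the bytes, so the rename makes no difference.
same_after_rename = before == after
print(same_after_rename)
```

Defeating a check like this would mean changing the bytes themselves on every install, which is the next step in the arms race described above.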

-molo

--
Using your sig line to advertise for friends is lame.

Re:Can't be done if driver authors want to skew it by borgboy · 2003-09-19 14:18 · Score: 1

The problem here is that the Windows driver model allows the driver to check what program is making calls into it. This is not a bad thing by itself, so I wouldn't advocate getting rid of it.

Hey, this ain't MSDN. Get your priorities straight!

--
meh.

Re:Can't be done if driver authors want to skew it by billcopc · 2003-09-19 16:07 · Score: 1

My proposition: randomize the program name (as reported to the OS/scheduler).

--
-Billco, Fnarg.com

Cheating 101 by billcopc · 2003-09-19 16:21 · Score: 1

In graphics, everything is redundant because you really can't see that lone pixel among the other 1920x1440. So the solution is to render one out of every four polygons... tada, 4x performance.

--
-Billco, Fnarg.com

Re:Cheating 101 by Anonymous Coward · 2003-09-20 13:54 · Score: 1, Interesting

Yes that'll work... right up until the drivers decide to drop that huge polygon that was supposed to be part of a mountainside.

One thing most benchmark folk miss by TheLink · 2003-09-19 18:05 · Score: 2, Interesting

Those typical office/desktop benchmarks aren't real world.

Why? Coz they don't have antivirus software running in the background. AV software running in the background could change results significantly.

In most offices, the desktop PCs have AV software installed. If they don't have AV software installed, they usually have worms and viruses and those tend to take up more CPU.

That's real world.

Which AV software to use in the benchmark is one question that they may not want to deal with ;).

But, hey, doesn't anyone want to know whether AV+apps works better with or without Hyperthreading enabled etc? Whether it works better with Athlons or P4s?

Oh well..

OK, So here's what we do: by Rick+the+Red · 2003-09-19 18:40 · Score: 2, Interesting

OK, So here's what we do:

We take a bunch of gamers and group them by what video card they own. We give each of them the test board. After one month we take away the test board and give them their old one back. The benchmark is: How many out of 10 owners of board X would buy the test board? Because that's what you really want to know, right? And who better to tell you this than people who own the same board you do?

--
If all this should have a reason, we would be the last to know.

I fail to see the problem by 0x0d0a · 2003-09-19 20:52 · Score: 2, Informative

So...what exactly is wrong with this?

I can't see why you'd care whether a vendor is "cheating" or not. Let's say that you're a Tribes 2 fan. You run out and look at Tribes 2 benchmarks in reviews. The reviewer says something about image quality, and includes bits of screenshots (I vaguely remember this happening with the Riva128 and G200 the last time I purchased a 3d card for gaming). End of story.

Now, there are a couple of possibilities. First, both you and the reviewer can't see the image quality degradation that's taking place, and you do notice the speed increase. That's not cheating! The card vendor has just figured out a way to provide you with more resources that you care about at the cost of something that you don't even notice. We do this all the time with lossy compression in JPEG and MP3 -- you don't care about 90% of the data, but you do care about the size savings. People didn't care when lossy texture compression became the standard on video cards because the only thing that lossless compression gives them is a psychological "this is a flawless image".

Another possibility is that the reviewer or you notice image quality degradation. If this is the case, the card gets a lower image quality score. Big deal!

Finally, you may be worried about game-specific tweaking in that the game won't provide a representative sample of how the card will do on other games. This is *always* the case! Cards could perform quite differently on any set of games just due to the fact that designs differ, and different things form a bottleneck on different cards in different games.

Just let some reviewer sit down and try the stupid card out, and if they're enjoying the card...hey, who cares what hacks are included in the driver?

--
May we never see th

Re:I fail to see the problem by gl4ss · 2003-09-19 23:00 · Score: 1

well.. the problems escalate when the drivers are tweaked only for those default benchmarking runs, having precalculated data for them.

that is, the game itself will NOT run as the benchmark portrays, the tweaks being useless for normal gaming.

if there is any sanity in how the drivers act, from a programmer's point of view it should be that the program is tweaked for the drivers, not the other way around (as the driver should just do what the spec says, and do it exactly. i fail to see the point in whoring the drivers for synthetic benches and getting caught at it).

--
world was created 5 seconds before this post as it is.

Re:I fail to see the problem by GoofyBoy · 2003-09-20 08:12 · Score: 1

Also, it's going to get to the point where the cheating could be that it detects when a screenshot is taken and then boosts up the quality for the current frame.

I haven't heard of it happening, but that's what it's going to get to.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.

Re:I fail to see the problem by WhiteWolf666 · 2003-09-21 09:53 · Score: 1

In fact, this HAS already happened.

Someone figured out that two or three releases ago, the Nvidia Detonators did exactly that: detecting screenshots and boosting up the quality for that frame.

Unfortunately, it is difficult to determine if the drivers are still exhibiting that behavior, because Nvidia now supplies drivers where the code is encrypted, and decrypted in a 'just-in-time' fashion.

Sketchy. Very very sketchy. ATI for me.

--
WhiteWolf666 an exBush supporter. All you new-school,compassionate,save the children Republicans can rot in hell

Re:I fail to see the problem by Anonymous Coward · 2003-09-23 07:07 · Score: 0

I don't know if it's more cheating or false advertising. I'm one who prefers better quality to faster frame rates, and when ATI (I believe) would let you select better AA but then actually use a lower quality AA just so they could get 160fps instead of 120fps or whatever on Quake3, I was pissed.

Trying to crash the other vendor's system is OK by G4from128k · 2003-09-19 22:16 · Score: 1

This would drive both vendors to improve the robustness of their chips and drivers. Knowing that the competitor is going to try to crash your system would put pressure on the development team to avoid or fix bugs.

These would be true test suites as opposed to nice speed demo suites. As a graphic board customer, I do want speed. But I would probably say that robustness has a higher implicit priority. A graphics chip that crashes is the last thing I want, regardless of how fast it is on some more limited set of code.

--
Two wrongs don't make a right, but three lefts do.

Re:Trying to crash the other vendor's system is OK by Snowmit · 2003-09-20 00:21 · Score: 1

This would drive both vendors to improve the robustness of their chips and drivers. Knowing that the competitor is going to try to crash your system would put pressure on the development team to avoid or fix bugs.

Here's the thing - when you run a game that crashes the graphics chips, you don't patch the drivers, you patch the game. Writing drivers that will survive running malicious code takes time away from addressing other programming issues and the thing is that no one except for your competitor is writing that kind of code into their App.

The more fundamental problem is that all any kind of test can ever measure is your ability to do well at that test. We'd like to hope that there is a correlation between the test and real world performance but as we've seen the driver coders are quite happy to tweak test results at the expense of real performance improvements.

Creating a new, even more artificial, set of tests does not solve this problem. It makes it worse.

--
I have a lot of opinions about Cyborgs and Architects

Sleep calls are OK by G4from128k · 2003-09-19 22:42 · Score: 1

Perhaps I did not explain the idea well enough. Since manufacturer A has to also run the anti-manufacturer B test suite, any sleep calls will affect both of them. Because every card has to run ALL of the tests (both the "best-case" tests and "worst-case" tests of all cards), each manufacturer must make sure that their own card can handle whatever they are trying to throw at the competitor's card.

Sleep calls cannot bias the results unless the two cards have different definitions of "sleep." Bypassing sleep would not improve performance. I would assume that if one card ignored a sleep call, that would be scored as a failure by the card to execute a valid command.

--
Two wrongs don't make a right, but three lefts do.

Re:Sleep calls are OK by i_am_nitrogen · 2003-09-20 06:56 · Score: 1

Sleep calls don't go to the card. They tell the scheduler "don't run this program for the next X milliseconds." The scheduler will not schedule the test program at all. All the other manufacturer has to do is put more sleep calls in than the first manufacturer.
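A rough sketch of that point, assuming a made-up loop where `sum(range(1000))` stands in for "draw the screen and swap buffers". Wall-clock time is dominated by the sleeps, while the CPU time the process actually used stays small, because a sleeping process is simply not scheduled.

```python
import time


def timed_loop(iterations, sleep_s):
    """Run a padded loop and return (wall-clock seconds, CPU seconds used)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(iterations):
        sum(range(1000))     # placeholder for the real drawing work
        time.sleep(sleep_s)  # padding; handled by the OS scheduler, not the card
    wall_s = time.perf_counter() - wall_start
    cpu_s = time.process_time() - cpu_start
    return wall_s, cpu_s


wall_s, cpu_s = timed_loop(iterations=4, sleep_s=0.01)
print(f"wall: {wall_s:.4f} s, cpu: {cpu_s:.4f} s")
```

A suite scored on wall-clock time would mostly be reporting the padding, whichever card is installed.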

--
A solution to the problem with music today

DRM? by yerricde · 2003-09-20 00:19 · Score: 1

then how about benchmarking in Linux or FreeBSD. They both support Direct Rendering Manager

I thought Microsoft was using Linux's and FreeBSD's non-support of DRM as a selling point for Windows.

Oh, that DRM.

--
Will I retire or break 10K?

Worms (no, not the game) by yerricde · 2003-09-20 00:31 · Score: 2, Interesting

Writing drivers that will survive running malicious code takes time away from addressing other programming issues and the thing is that no one except for your competitor is writing that kind of code into their App.

What if somebody finds a way to break Windows through a video driver bug? What if somebody puts that exploit into the next Windows worm?

The more fundamental problem is that all any kind of test can ever measure is your ability to do well at that test.

And if that test measures a video card's ability to process OpenGL instructions without bringing down the computer, I'm all for it.

--
Will I retire or break 10K?

Re:One thing most benchmark folk miss by GoofyBoy · 2003-09-20 08:07 · Score: 1

Benchmarks are, or should be taken as, just guidelines.

In the real world there are a huge number of variables: old dll files from previous drivers, IM clients running in the background, stuff in boot config files which is old yet affects performance, stuff hanging around since the last clean reboot, physical environment, etc.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.

Bleh people have it all wrong. by Zenki · 2003-09-21 11:28 · Score: 1

You need to find the video card that results in the greatest number of winners. Scrounge up cash, run a LAN party, and get down info on the video cards that people are using.

The card that correlates to the most wins is obviously the superior video card.

Re:Can't be done if driver authors want to skew it by Anonymous Coward · 2003-09-21 12:27 · Score: 0

id10t.

Re:Can't be done if driver authors want to skew it by borgboy · 2003-09-21 14:13 · Score: 1

gutless coward

--
meh.

Objective benchmarking by Animats · 2003-09-21 16:09 · Score: 1

For OpenGL (but not for DirectX) there are benchmarks that check that the scene was rendered correctly, by reading back the rendered image. So there's an objective definition of correctness. Run them and check.

Once that's out of the way, the next step is to crank up scene complexity until the rendering rate drops. Crank up the polygon count, the texture count, the shader count, etc. until the card misses a frame refresh time. That's what matters when you're running 3D applications. It's also what matters to game developers - this tells you your resource budget.
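The "crank it up until a frame is missed" step can be sketched as a search for the largest scene that still fits in one refresh interval. Everything here is an assumption for illustration: `fake_render_time_ms` is an invented cost model standing in for a real timed render, and the search presumes render time never decreases as polygon count grows.

```python
def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_hz


def max_complexity(render_time_ms, budget_ms, start=1):
    """Largest polygon count whose render time fits within budget_ms.

    Assumes render_time_ms is non-decreasing and eventually exceeds the
    budget; returns 0 if even the starting scene misses it.
    """
    if render_time_ms(start) > budget_ms:
        return 0
    low, high = start, start * 2
    # Double the load until a frame is missed.
    while render_time_ms(high) <= budget_ms:
        low, high = high, high * 2
    # Binary search between the last fit and the first miss.
    while high - low > 1:
        mid = (low + high) // 2
        if render_time_ms(mid) <= budget_ms:
            low = mid
        else:
            high = mid
    return low


def fake_render_time_ms(polygons):
    # Made-up cost model: fixed overhead plus a per-polygon cost.
    return 2.0 + polygons * 1e-5


budget = frame_budget_ms(60)
polygon_budget = max_complexity(fake_render_time_ms, budget)
print(polygon_budget)
```

The same search could be repeated per resource (texture count, shader count) with a real measurement in place of the cost model, which gives the resource budget the post is talking about.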

Being able to re-render the same scene at higher than the refresh rate is meaningless. The main reason people have so much trouble with this is that the people who write game reviews don't program.

It's easy to prevent bad drivers. by Wolfier · 2003-09-24 02:12 · Score: 1

Randomize your benchmark. It'll take a few more runs to get an average performance figure, but then the benchmark is immune to cheating drivers.
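A minimal sketch of a randomized benchmark averaged over several runs. The scene fields and `stand_in_render` are hypothetical placeholders (plain CPU work, not real drawing), and whether randomization truly defeats a determined driver is the poster's claim rather than something this sketch shows.

```python
import random
import statistics
import time


def random_scene(rng):
    """Build a hypothetical scene description; the field names are illustrative."""
    return {
        "polygons": rng.randint(1_000, 5_000),
        "camera_angle_deg": rng.uniform(0.0, 360.0),
        "texture_order": rng.sample(range(16), 16),
    }


def stand_in_render(scene):
    # Placeholder for real rendering: CPU work proportional to polygon count.
    total = 0.0
    for i in range(scene["polygons"]):
        total += (i * scene["camera_angle_deg"]) % 7.0
    return total


def run_randomized_benchmark(runs, seed=None):
    """Return (mean, standard deviation) of seconds per polygon over random scenes."""
    rng = random.Random(seed)
    timings = []
    for _ in range(runs):
        scene = random_scene(rng)
        started = time.perf_counter()
        stand_in_render(scene)
        elapsed = time.perf_counter() - started
        # Normalize so scenes of different sizes are comparable.
        timings.append(elapsed / scene["polygons"])
    return statistics.mean(timings), statistics.stdev(timings)


mean_s, spread_s = run_randomized_benchmark(runs=10)
print(f"{mean_s:.3e} s/polygon (stdev {spread_s:.3e})")
```

Reporting the spread alongside the mean shows how many runs are needed before two cards can be told apart.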

Slashdot Mirror

40 comments