Benchmarking the Benchmarks
apoppin writes "HardOCP put video card benchmarking on trial and comes back with some pretty incredible verdicts. They show one video card returning benchmark scores much better than another compared to what you get when you actually play the game. Lies, damn lies, and benchmarks."
We used to benchmark a computer by *gasp* actually running things on it. If you wanted to find out how well it would perform running a game, you played the damn game and found out. 'Course, that's not good enough for these ubernoobs who think they are cool with their benchmark scores on their forum signatures...
If sharing a song makes you a pirate, what do I have to share to be a ninja?
aren't you being just a little bit... oh, I dunno... offtopic?
Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.
Onda Technology Institute
I have no idea what this means, but it certainly sounds like Crysis has left its mark somewhere or other.
I always mod up spelling trolls.
Is your benchmark of the benchmarks accurate? We might have to benchmark it.
I used to do this benchmark:
10 PRINT TIME$
20 FOR I=1 TO 9999
30 NEXT I
40 PRINT TIME$
I then improved it to be:
10 A$=TIME$
20 IF A$=TIME$ THEN GOTO 20 !breaks out when the seconds change
30 I=1:A$=TIME$
40 I=I+1:IF A$=TIME$ THEN GOTO 40
50 PRINT I
Ahhh...the good old days... (1970s, early 1980s)
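For anyone who wants to try the same trick today, here is a rough Python sketch of the second version: wait for the clock to roll over, then count loop iterations until it rolls over again. The function name iterations_per_tick and its tick and clock parameters are made up for illustration, and the count will not be comparable to the BASIC numbers.

```python
import time


def iterations_per_tick(tick=1.0, clock=time.monotonic):
    """Count loop iterations completed within one clock tick of `tick` seconds.

    `clock` is any zero-argument function returning seconds as a float.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    def current_tick():
        return int(clock() / tick)

    # Like line 20 of the BASIC version: spin until the tick changes,
    # so counting starts on a tick boundary.
    start = current_tick()
    while current_tick() == start:
        pass

    # Like lines 30-40: count iterations until the tick changes again.
    counting = current_tick()
    count = 1
    while current_tick() == counting:
        count += 1
    return count


if __name__ == "__main__":
    print(iterations_per_tick())
```

As with the original, the number only means something when compared against the same script run the same way on another machine.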
Benchmarking using actual games is, of course, important. But part of the reason a lot of us buy video cards and such isn't JUST about the performance on today's games, but for how they'll play the games coming out in the next few months. Synthetic benchmarks often implement advanced features not currently seen in today's games, but which will be implemented in just-over-the-horizon games. So while clearly one ought not judge a card purely on 3DMark or similar benchmarking suites, they do have their uses.
Apparently you were using the wrong benchmark. You just thought you were fast.
Layne
...And an international benchmarking committee.
To avoid concentrating all the data management in a single entity, we need a national benchmarking committee for each country and then international elections to get a chief of benchmarking interrelationships or CBI.
To avoid the possible corruption of the CBI, we would need an independent international supervision committee for the review of benchmarking standards.
The IISCRBS would review the actions of the CBI yearly and produce a thorough report.
That report (which would be called the IISCRBS-CBI report) would be the main reference to start any kind of productive debate about who has the leetest rack and who's a lame n00b.
Duh, a benchmark is a controlled test performed "on a bench" - meaning, in a controlled environment with specific, well-described procedures.
You must perform the same exact test on all video cards, disclose any variables, and you must not "pick a subset of completed tests to publish". You must not compare tests performed using different procedures, no matter how slight the deviation in procedure is.
One cannot draw conclusions about "real world" performance from a benchmark. The benchmark is merely an indicator. A "real world" test that uses the strong, formalized procedures of a benchmark IS a benchmark - and suddenly, the benchmark is not "real world" - because the "real world" doesn't have formal procedures for gameplay.
Haphazard "non-blind" gameplay on a random machine is NOT a benchmark, and it cannot provide useful, comparable numbers.
A good benchmark is one where (1) most experts agree that it has validity, and (2) the tester cannot change the rules of the game.
The numbers of a benchmark are meaningless, except in terms of being compared to one another using the same exact procedure.
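A minimal sketch of those rules in Python, assuming a hypothetical run_benchmark helper: every candidate gets the identical workload and the identical number of repeats, and every run is kept in the output rather than a hand-picked subset. The candidates here are plain Python containers standing in for whatever is actually under test.

```python
import statistics
import time


def run_benchmark(candidates, workload, repeats=5, timer=time.perf_counter):
    """Time the same workload against every candidate and keep every run.

    All names here are hypothetical. `candidates` maps a label to the
    thing under test; `workload` is called once per repeat with it.
    """
    results = {}
    for name, candidate in candidates.items():
        timings = []
        for _ in range(repeats):
            start = timer()
            workload(candidate)
            timings.append(timer() - start)
        # Report all runs plus a summary; nothing is dropped.
        results[name] = {
            "runs": timings,
            "median": statistics.median(timings),
        }
    return results


if __name__ == "__main__":
    data = {"list": list(range(100_000)), "tuple": tuple(range(100_000))}
    for name, result in run_benchmark(data, sum).items():
        print(name, result["median"], result["runs"])
```

The medians are only comparable with each other, since they came from the same procedure on the same machine.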
Without using the screen-recording functionality, the overhead should be statistically irrelevant.
upon the advice of my lawyer, i have no sig at this time
Are you one of those software pirates?
Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.
Here are a few questions that I had:
- is triple-buffering on or vsync off? This will make a huge difference to real-time versus sped-up timedemos
- is sound on when playing back both types of timedemos?
- how does FRAPS affect your benchmark scores?
Finally, in relation to the Crysis real world gameplay versus the AT benchmark score, I thought it was common knowledge that the game would be slower when actually playing it because you likely have physics, AI, logic, and sound calculations to do that you don't in timedemo mode. What is the big deal here?
Um, they come up with what is probably the most useful data of all:
The highest playable settings for given hardware.
They then change the video card and find the highest playable settings for that hardware.
I'd much rather compare the highest playable settings for two different cards than the timedemo benchmark numbers for two different cards.
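The "highest playable settings" approach can be sketched roughly like this in Python. Everything here is an assumption for illustration: the highest_playable function, the preset names, the 30 FPS cutoff, and the FPS tables, which are made-up numbers and not measurements of any real card.

```python
def highest_playable(presets, measure_fps, min_fps=30.0):
    """Return the most demanding preset whose measured FPS meets min_fps.

    `presets` is ordered from least to most demanding. `measure_fps` is a
    hypothetical callable that plays the game at a preset and returns FPS.
    Returns None if no preset is playable.
    """
    for preset in reversed(presets):
        if measure_fps(preset) >= min_fps:
            return preset
    return None


if __name__ == "__main__":
    presets = ["low", "medium", "high", "very_high"]
    # Made-up FPS tables standing in for real play-through measurements.
    card_a = {"low": 90.0, "medium": 61.0, "high": 38.0, "very_high": 22.0}
    card_b = {"low": 75.0, "medium": 48.0, "high": 27.0, "very_high": 15.0}
    print(highest_playable(presets, card_a.get))  # high
    print(highest_playable(presets, card_b.get))  # medium
```

What counts as "playable" is baked into min_fps, so the result is only as good as that cutoff.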
"As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050."
Not in Crysis, Call of Duty 4, UT3, etc.
When I go to plunk down $200 - $300 on a video card, and one of them performs comfortably at my LCD's native resolution and the other one doesn't, that matters. Saying all cards in a given price range are roughly equivalent is saying that you are completely, 100% blind to the reality of video cards today.
One thing that's bothering me is that HardOCP said "Anandtech benchmarked this card vs. an 8800GTS and said it came out faster, then we benchmarked it against an 8800GTX and it came out faster, then people complained that our results didn't match". Isn't that expected? The GTX is a faster card than the GTS last time I looked. Why is it such a shock that the ATI card came in between them in performance?
It is a bit of a shock that ATI's latest and greatest can't seem to consistently beat nVidia's over-a-year-old GTX cards, I guess.
I read the internet for the articles.
You know that's totally intractable, right?
For example: 1680x1050 with no AA may be considered unplayable (jaggies) for some, but for others it's perfectly fine...
Or, maybe you can turn on the AA, but deactivate shadows, changing your whole "playable" demographic again.
It's like asking someone to benchmark coffee at different restaurants to grade whether it is palatable or not.
~D
This sig has been enciphered with a one-time pad. It could say almost anything.
Translation: if you mod me down, I will become more insightful than you can possibly imagine.
Thank God for evolution.
Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.
Clearly you haven't been drinking enough of your Kool Aid. Please contact the FSF and request more immediately.