Benchmarking the Benchmarks
apoppin writes "HardOCP put video card benchmarking on trial and comes back with some pretty incredible verdicts. They show one video card returning benchmark scores much better than another compared to what you get when you actually play the game. Lies, damn lies, and benchmarks."
damn i hate benchmarks
It's no wonder that most modern benchmarks are inaccurate, given that they tend to benchmark proprietary, closed source software, running on proprietary, closed source operating systems. Were they to run benchmarking software on Open Source operating systems, such as Ubuntu, then their results would not only be more accurate, but fairer. The fact that Open Source software would also have much higher scores than proprietary, closed source software goes without saying.
We used to benchmark a computer by *gasp* actually running things on it. If you wanted to find out how well it would perform running a game, you played the damn game and found out. Course, that's not good enough for these ubernoobs who think they are cool with their benchmark scores on their forum signatures...
If sharing a song makes you a pirate, what do I have to share to be a ninja?
Correct me if I'm wrong, but doesn't FRAPS have some sort of overhead while running? I certainly don't disagree with their findings, but it seems to be a factor they didn't account for between the traditional timedemo benchmarks and their FRAPS-ified benchmarks.
I have no idea what this means, but it certainly sounds like Crysis has left its mark somewhere or other.
I always mod up spelling trolls.
Is your benchmark of the benchmarks accurate? We might have to benchmark it.
I used to do this benchmark:
10 PRINT TIME$
20 FOR I=1 TO 9999
30 NEXT I
40 PRINT TIME$
I then improved it to be:
10 A$=TIME$
20 IF A$=TIME$ THEN GOTO 20 !breaks out when the seconds change
30 I=1:A$=TIME$
40 I=I+1:IF A$=TIME$ THEN GOTO 40
50 PRINT I
Ahhh...the good old days... (1970s, early 1980s)
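For anyone who wants to try the second version today, here is a rough Python sketch of the same idea: wait for the clock to roll over, then count loop iterations until it rolls over again. The function name `loops_per_tick` and its `clock` and `tick` parameters are my own inventions for illustration, not anything from the BASIC above, and the count starts at 1 only to mirror line 30.

```python
import time


def loops_per_tick(clock=time.perf_counter, tick=1.0):
    """Count loop iterations that fit inside one full clock tick.

    clock and tick are hypothetical parameters added for this sketch:
    clock returns seconds as a float, tick is the tick length in seconds.
    """
    def current_tick():
        # Which tick-sized slot of time the clock is in right now.
        return int(clock() // tick)

    # Like line 20: spin until the tick changes, so counting starts
    # at the beginning of a fresh tick instead of partway through one.
    first = current_tick()
    while current_tick() == first:
        pass

    # Like lines 30-40: count iterations until the tick changes again.
    counting = current_tick()
    count = 1
    while current_tick() == counting:
        count += 1
    return count


if __name__ == "__main__":
    # Takes between one and two seconds with the default one-second tick.
    print(loops_per_tick())
```

As with the original, most of what this measures is how fast the machine can read the clock, so the number is only good for comparing against other runs of the exact same loop.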
Benchmarking using actual games is, of course, important. But part of the reason a lot of us buy video cards and such isn't JUST about the performance on today's games, but for how they'll play the games coming out in the next few months. Synthetic benchmarks often implement advanced features not currently seen in today's games, but which will be implemented in just-over-the-horizon games. So while clearly one ought not judge a card purely on 3DMark or similar benchmarking suites, they do have their uses.
Apparently you were using the wrong benchmark. You just thought you were fast.
Layne
Well, benchmarks are like reviewing hardware... where have I seen something about a game score that got the reviewer fired for being honest and not complying with the agreement? ...hum
...And an international benchmarking committee.
To avoid concentrating all the data management in a single entity, we need a national benchmarking committee for each country and then international elections to get a chief of benchmarking interrelationships or CBI.
To avoid the possible corruption of the CBI, we would need an independent international supervision committee for the review of benchmarking standards.
The IISCRBS would review the actions of the CBI yearly and produce a thorough report.
That report (which would be called the IISCRBS-CBI report) would be the main reference to start any kind of productive debate about who has the leetest rack and who's a lame n00b.
I have what was a "hot" card only eighteen months ago (7800) and now it is stuttering on some of the newer content when I'm raiding. The rest of the game is glass smooth. Suppose it could be the PC but it is a pretty good PC too.
Would love a site that showed "here is the game on the highest settings on these CPU/GFX combos".
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
Duh, a benchmark is a controlled test performed "on a bench" - meaning, in a controlled environment with specific, well-described procedures.
You must perform the same exact test on all video cards, disclose any variables, and you must not "pick a subset of completed tests to publish". You must not compare tests performed using different procedures, no matter how slight the deviation in procedure is.
One cannot draw conclusions about "real world" performance from a benchmark. The benchmark is merely an indicator. A "real world" test that uses the strong, formalized procedures of a benchmark IS a benchmark - and suddenly, the benchmark is not "real world" - because the "real world" doesn't have formal procedures for gameplay.
Haphazard "non-blind" gameplay on a random machine is NOT a benchmark, and it can not provide useful, comparable numbers.
A good benchmark is one where (1) most experts agree that it has validity, and (2) one where the tester cannot change the rules of the game.
The numbers of a benchmark are meaningless, except in terms of being compared to one another using the same exact procedure.
Okay, so benchmarks don't adequately reflect real applications. Not much of a surprise there...
But does this impact their usefulness in comparing hardware at all?
=Smidge=
They never use the same game configuration, so trying to figure out how much faster one thing is than another is impossible. Rather than have 1 variable (the hardware being benchmarked), they use 2 variables (the hardware, and the settings of the benchmarked software).
Are you one of those software pirates?
Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.
Here are a few questions that I had:
- is triple-buffering on or vsync off? This will make a huge difference to real time versus sped up timedemos
- is sound on when playing back both types of timedemos?
- how does FRAPS affect your benchmark scores?
Finally, in relation to the Crysis real world gameplay versus the AT benchmark score, I thought it was common knowledge that the game would be slower when actually playing it because you likely have physics, AI, logic, and sound calculations to do that you don't in timedemo mode. What is the big deal here?
Benchmarks give you an idea relative to other cards tested using the same benchmark. However, I have always found them misleading and somewhat gratuitous. Declaring a card superior over another just because it gives five more frames a second than another card is dumb. Especially when it is the difference between 110 and 115 frames per second.
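To put numbers on that 110 versus 115 example, here is a quick sketch of the arithmetic in Python. The helper names are made up for illustration.

```python
def frame_time_ms(fps):
    """Milliseconds spent on each frame at a given frame rate."""
    return 1000.0 / fps


def percent_faster(slow_fps, fast_fps):
    """How much faster fast_fps is than slow_fps, in percent."""
    return (fast_fps - slow_fps) / slow_fps * 100.0


if __name__ == "__main__":
    gain = percent_faster(110, 115)
    saved = frame_time_ms(110) - frame_time_ms(115)
    print(f"{gain:.1f}% faster, {saved:.2f} ms less per frame")
```

That works out to roughly 4.5 percent, or about 0.4 ms shaved off each frame, which nobody is going to notice while playing.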
As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050.
A lot of benchmarks imply you need to sell your child to get great frame rates. In the end, playing games etc. is the only way to determine real performance. Benchmarks are mainly a marketing tool. Kind of the equivalent of spam telling you how big you need to be to have a satisfying sex life.
Funny? No, not really.
We don't really need artificial benchmarks as they tend to mislead, even delude most people. We need real world applications, in this case that would be any modern game. Or lots of games.
"We even discussed not putting in any framerate data. Funny eh? The framerates are not used in determining the card's value or gaming ability, so why supply them?"
The simple inclusion of this line in their methodology should throw up red flags to anyone who knows anything. Yes, FPS matter when determining how video cards stack up against each other.
Also, most of their complaints about other sites' review methods come down to "time-demos and real-world play don't give exactly the same FPS readings"--if you actually bother to look at their numbers, yeah, ok, the real-world numbers were always lower than the time-demos. Gee, I wonder why this is? Maybe because they specifically noted that they went and tried to find THE most stressful part of the game for their real-world tests, while time-demos generally are not developed in order to crush your system. What they didn't bother to mention was there was no giant flip in comparative performance between time-demos and real-world tests for the cards. The ATi card trailed in time-demos and trailed in real-world performance, and the relative difference wasn't too large moving from time-demos to real-world.
So slower time-demo translates to slower real-world performance. Who would have thought?
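The check being described is simple enough to sketch: compare the ranking and the relative gap in both test modes rather than the raw numbers. The FPS figures below are placeholders I made up, not numbers from the HardOCP or AnandTech articles, and the function names are hypothetical.

```python
def relative_gap(fps_a, fps_b):
    """Card A's frame rate as a fraction of card B's."""
    return fps_a / fps_b


def same_winner(timedemo, realworld):
    """True if the faster card in the timedemo is also faster in real play.

    Each argument is a (card_a_fps, card_b_fps) pair.
    """
    return (timedemo[0] > timedemo[1]) == (realworld[0] > realworld[1])


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    timedemo = (40.0, 50.0)
    realworld = (24.0, 30.0)
    print(same_winner(timedemo, realworld))
    print(relative_gap(*timedemo), relative_gap(*realworld))
```

If the winner stays the same and the two gaps are close, the time-demo told you the same story as the real-world run, just with bigger numbers.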
So who's going to benchmark the benchmarks of the benchmarks?
That would be nice, especially retouching on older ones and also cheaper combos you'd find in generic desktops.
I'd also like to see a benchmark app you can run from usb or dvd/cdrom booting. Something that gives you a clean slate to compare against running it in your existing install so you can see how much all the various apps and drivers are bogging your performance down.
Pain lasts, kid. It's how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
Since when do we mod people insightful for not getting a joke (even a bad one)?
I noticed many reviewers use only 2GB of RAM, which is very unlikely in real life, since if you can afford a high-end video card, why not spend a bit more to get at least 4GB of RAM? However, 4GB kicks in PAE on win32/linux32, which slows things by what, 10%? That should bias 64/32 comparisons as well.
One thing that's bothering me is that HardOCP said "Anandtech benchmarked this card vs. an 8800GTS and said it came out faster, then we benchmarked it against an 8800GTX and it came out faster, then people complained that our results didn't match". Isn't that expected? The GTX is a faster card than the GTS last time I looked. Why is it such a shock that the ATI card came in between them in performance?
It is a bit of a shock that ATI's latest and greatest can't seem to consistently beat nVidia's over a year old GTX cards I guess.
I read the internet for the articles.
At least that is what I think he was trying to say. If ATI/NVIDIA knows that everyone will be benchmarking their respective cards using X benchmark, why not write drivers that excel in that benchmark? Even further, you can create hardware to much the same effect, though given the lead times for hardware design this will be harder.
I am not sure what the best method is for eliminating the advantage of those best able to code for a given benchmark, but it seems he tries.
FLASH NEWS: [H]ardOCP throws outdated concepts such as "controlled testing environment" and "repeatability" out the window and calls it revolutionary! Yay!
hehe.
Well, you probably know what I meant and were making a funny, but in case you didn't:
In EQ, on a raid, you get 54 people close to you (so they can't be clipped based on distance), and 40-70 server side creatures (player pets, monsters, the big "bad") and your machine is trying to keep up and report on and render all that in real time. My frame rate is >60 (>100?) in some content but in the new content on a raid, it can go to 10 to 20 fps unless I turn off a lot of features. Kinda sucks.
MacTech Labs (part of MacTech Magazine) has done a number of benchmarks that were very mainstream in the past year -- including most recently Parallels vs. Boot Camp vs. VMware Fusion, and Office 2008. In designing each of these, we went out of our way to figure out how to make them "real world". In other words, not only to test only the things that most users would do ... but also to measure them in a way that users perceive. One way that we do that is to do the testing with stopwatches. Because, if it's not long enough to see with a stopwatch, it's certainly not long enough for a user to perceive. This has worked well ... and avoids the issue of getting erroneous timings as mentioned in other posts here.
How about benchmarking frame rates on the real platform. Friends don't let friends play games on Vista. All of the serious gamers I know avoid it like the plague because of crappy frame rates and poor performance.
Atheism is a religion like not collecting stamps is a hobby.
EQ is in many ways a very very bad example, or in some ways I guess a good example.
Problem with EQ is that performance can vary greatly depending on the card, the drivers, and of course the settings.
There are non-graphical settings within EQ that can slow down your computer in a raid environment that won't mess with it much in a non-raiding environment. Basically anything that logs information to your hard drive will really mess you up in a raid.
But EQ has so many damn bugs in it that benchmarking would be useless. The West Bug being one that has been with the game for years now.
Don't know something? Look it up. Still don't know? Then ask.
I think your problem is SoE. I've also done raids on both EQ2 and SWG (back in the day). EQ's servers handle the load better than SWG's did back then. In SWG the lag got so bad around half of the people lost connection. So, in short, your end is not the problem.
"No fair! You changed the outcome by watching it!"
A fix for the west bug has been found.
It is posted somewhere on "therunes.net" boards. I linked it to my guild boards a couple months ago.
No, he attached several harddisks to his graphics card to create a RAID
Video drivers from both ATI and nVidia would look for specific binaries known to be games used for benchmarking. Example: Quake3. You could rename your quake3 binary to quack3 and it'd perform somewhat worse.
Apparently, it had something to do with trading correctness for speed.
Don't thank God, thank a doctor!
That is true if you start recording in FRAPS, and actually probably even less than half your framerate if your proc/mem/disk speeds suck. FRAPS will give you a decent FPS display without too much overhead. Usually though, most games have the ability to display their frame rates in game with even less overhead. And with most game publishers giving out demos... download the demo and try it out... see what your fps is. If it sucks, decide if you really want to see the game in all its FX glory and spend the $$$ to get your rig there. Obviously if every demo you download sucks for you... low fps... it's probably time to upgrade your rig, if you want to play the newer games, or just stick to Wolf3D or Doom.
Insert funny smart-ass comment here.
They've examined ONE SINGLE game and used this to (try to) invalidate the testing method for EVERY game. Sorry, doesn't work like that.
All they've proven is that there is something wrong with the timedemo system in Crysis.
Slashdot is pwned by Sourceforge Inc.
Vista at times brought my gaming rig to a crawl even when doing nothing (Vista has since been removed from my gaming rig, having proven to have no benefits over XP or Linux). Ubuntu 7.10 runs with Compiz set to full on my 2yr old laptop, which wasn't that good when I bought it (Cel 1.6 GHz, 1 GB RAM, Intel 915 Graphics module), and hardly ever shows signs of stress. Point in short: the freak-out over Aero was justified, but the freak-out over Crysis was blown out of proportion, especially seeing as Crytek themselves said that Crysis would require a fairly chunky rig from the word go.
Calling someone a "hater" only means you can not rationally rebut their argument.
The true reason Blizzard switched from 40-man to 25-man raids in the Burning Crusade.
The thing is that games are different each time you play them, so that isn't really a benchmark. The summary says that real games are slower than benchmarks.. I mean DUHHHH! Benchmarks are (or should be) on rails, with no user interaction to ensure that they're the same on each system. Over and above what the benchmarks do, games need to monitor user input and do AI for the enemies at the very least (probably some other obvious things that I'm missing out but those seem to be the main differences to me at the moment). Benchmarks can also get away with faking physics, whereas games usually have to calculate their physics in realtime. A benchmark isn't really meant to be an objective thing - just because your computer performs well in a benchmark doesn't mean it can do well in real terms, it's there for comparing aspects of your computer subjectively.
Using the classical /. method of car analogies, take the example of a drag racer that has been specifically set up to have a fast quarter mile time. When it comes to racing on a track or even everyday commuting, a drag racer is next to useless. Just because your vehicle/computer performs well in preset tests does not mean that it is a good general purpose machine. These benchmarks test graphical prowess in a few specific areas - not full gameplaying ability.
which is totally what she said