Slashdot Mirror


Benchmarking the Benchmarks

apoppin writes "HardOCP put video card benchmarking on trial and comes back with some pretty incredible verdicts. They show one video card returning benchmark scores much better than another compared to what you get when you actually play the game. Lies, damn lies, and benchmarks."

126 comments

  1. Erase Futuremark = instant win by majorme · · Score: 1, Insightful

    damn i hate benchmarks

  2. OSS by Anonymous Coward · · Score: 1, Funny

    It's no wonder that most modern benchmarks are inaccurate, given that they tend to benchmark proprietary, closed source software, running on proprietary, closed source operating systems. Were they to run benchmarking software on Open Source operating systems, such as Ubuntu, then their results would not only be more accurate, but fairer. The fact that Open Source software would also have much higher scores than proprietary, closed source software goes without saying.

    1. Re:OSS by joaommp · · Score: 2, Insightful

      aren't you being just a little bit... oh, I dunno... offtopic?

      Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.

    2. Re:OSS by Anonymous Coward · · Score: 0

      The idea behind the parent is NOT that the license affects the hardware, but that you could actually see what's happening behind the curtains, and whether there are any specific optimizations biasing the results toward this or that manufacturer/chip/model, thus making the process transparent for everybody. But this kind of 'synthetic benchmark' is almost useless in real world situations, even if our pointy-haired bosses treat it like gospel. My English is lousy, but I think you get the idea.

    3. Re:OSS by snoyberg · · Score: 5, Funny

      PS yes...release your rage and mod me down.... just makes my post more Insightful.

      Translation: if you mod me down, I will become more insightful than you can possibly imagine.

      --
      Thank God for evolution.
    4. Re:OSS by edwdig · · Score: 2, Funny

      Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.

      Clearly you haven't been drinking enough of your Kool Aid. Please contact the FSF and request more immediately.

    5. Re:OSS by joaommp · · Score: 1

      The fact that Open Source software would also have much higher scores than proprietary, closed source software goes without saying.

      Did you read this?

      And even if the operating system or platform is opensource, that doesn't mean that the benchmark will be. He didn't mention any benchmark but actually referred to a platform. So how could you have any idea of how biased the benchmark software is/isn't?

  3. back in my day... by Aranykai · · Score: 4, Funny

    We used to benchmark a computer by *gasp* actually running things on it. If you wanted to find out how well it would perform running a game, you played the damn game and found out. Course, that's not good enough for these ubernoobs who think they are cool with their benchmark scores on their forum signatures...

    --
    If sharing a song makes you a pirate, what do I have to share to be a ninja?
    1. Re:back in my day... by Anonymous Coward · · Score: 3, Interesting

      It's not the benchmark-scores that count. Sure, you need a specific minimum to enjoy the game, but it's the actual gameplay that makes the game fun, no matter the hardware.

      I'm pretty sure these benchmarks are invented by men.

    2. Re:back in my day... by SQLGuru · · Score: 5, Funny

      And, on top of that, they are on your lawn....

      Layne

    3. Re:back in my day... by Sancho · · Score: 4, Informative

      The problem is that it's hard to objectively score performance by "running things on it." Benchmarks are nice because they run the exact same tests every time. You can't just turn on FPS display and walk around in the game to measure performance--your actions may not be the same each time, and slight variations could cause drastically different results.

      Benchmarking provides potential customers with a metric to compare potential purchases.

    4. Re:back in my day... by MWoody · · Score: 1

      Admit it, you "benchmarked" with Windows Solitaire.

    5. Re:back in my day... by MobileTatsu-NJG · · Score: 2, Insightful

      It's not the benchmark-scores that count. Sure, you need a specific minimum to enjoy the game, but it's the actual gameplay that makes the game fun, no matter the hardware.

      I'm pretty sure these benchmarks are invented by men.

      These benchmark scores are important when trying to determine a balance of cost vs. performance. So yes, these benchmarks were invented by men. This is because the old standard of picking the one whose color matches their shoes also resulted in the invention of the credit card.
      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    6. Re:back in my day... by PReDiToR · · Score: 3, Informative

      Wolfenstein3D actually.
      That DX chip kicked the arse out of the SX models.

      Solitaire on "You just won. Watch the cards leap" was good for checking out the Windows performance, but Wolf told you how fast the PC was.

      --

      Do not meddle in the affairs of geeks for they are subtle and quick to anger
    7. Re:back in my day... by KPexEA · · Score: 1

      Since every game / program uses the hardware differently the ONLY way to compare hardware is to run the game/program or a subset of the game on the actual hardware. What would be really nice would be to have a slimmed down version of the game you want ( supplied by the game company, and preferably as small as possible so it can easily be put on a small USB drive ) that you can run on the machine in question and have it display the "score". That way, when my kid is looking for a new machine to run WoW on, I can look up the WowTest "score" for the particular machines he is thinking of, or download the "WowTest" onto a USB drive and take it to the store and run it on some machines.

    8. Re:back in my day... by donscarletti · · Score: 4, Insightful

      There is indeed a bare minimum hardware performance required to play, but sadly for many new games, especially Crysis, that bare minimum is scarily close to the market's maximum. Benchmarks are supposed to be a way to isolate this and objectively measure it so that a good purchasing decision can be made by the consumer and when the game is played hopefully the subjective experience of enjoyment will follow. A framerate above human perception is needed for fun (as jerky frames lead to nausea and frustration), high detail is needed for the beauty of a game which is probably just as important (it's been the basis for visual art, music and poetry for millennia).

      The reason we've got so far and now can have computers, electricity, aeroplanes, cars, etc. is because of the willingness of scientifically inclined individuals to isolate, experiment and measure. Technology is one of the things in life that can be measured and I think it is a good idea to continue to do it, provided we can do it right. Experimentation and science is what got us out of caves no?

      As for HardOCP, what have they proven? Apparently traditional time demos run a fairly linear amount faster than realtime demos, even though it has been acknowledged that realtime demos render more, including weapons, characters and effects that the canned demo does not. This would be interesting if the question was "how fast can Crysis run on different cards" but that's not what people want to know. What I'd want to know is which card I should buy to allow me to continue to play cutting edge games for as long as possible while enjoying their whole beauty but not getting a framerate low enough to make me uncomfortable. It just so happens that the card with the best timedemo benchmark has the best actual playthrough benchmark and by roughly the same factor. The only difference is that the traditional timedemo depends on only the graphics hardware whereas the playthrough benchmark depends on efficiency elsewhere in the engine (AI, physics), where the player spent the most time and, if reviewing subjectively, the reviewer's current mindset and biases.

      Somebody please think of the science!

      --
      When Argumentum ad Hominem falls short, try Argumentum ad Matrem
    9. Re:back in my day... by cHiphead · · Score: 3, Insightful

      Some of us make purchasing decisions based on the piece of shit game we are thinking of buying. Crysis is a joke with such high requirements for a playable experience. I base my game purchases on what will run on my old pos single core p4 2.8ghz box. Any game that can't impress with such insanely fast hardware as we have these days even on the 'budget' boxes is not a game worth investing in.

      I must be getting old, I haven't upgraded my box in almost 2 years.

      Cheers.

      --

      This is my sig. There are many like it, but this one is mine.
    10. Re:back in my day... by billcopc · · Score: 4, Interesting

      It's funny that you mention Crysis... people are freaking out over Crysis the same way they freaked out over Aero Glass a year ago. The reality is, Crysis runs fine on midrange gaming systems. It won't run in 1920x1200 with DX10 eyecandy on that crusty old Geforce 6200, but it certainly does not require a $2500 powerhouse to be enjoyable.

      In the end, benchmarks can be useful as long as you don't accept their results as the gospel truth. Some benchmarks favor ATI, some favor NVidia, and I'm sure there's gotta be one benchmark that favors Intel Extreme Graphics :P... the important thing is to find parallels that relate to your own needs and wants so you can put those numbers into perspective.

      --
      -Billco, Fnarg.com
    11. Re:back in my day... by billcopc · · Score: 1

      Actually, I think the FPS display is a great measure of actual performance. The benchmarks will give you abstract numbers, but the FPS display is what you're actually getting out of the game.

      It doesn't matter if you don't follow the same path each time, what counts is the actual feel... some games can get away with lower framerates in the flashy areas (e.g. Crysis), while others would be totally unacceptable.

      I believe it's HardOCP that plots graphs of the minimum, maximum and average FPS. That's a step in the right direction, IMHO.

      --
      -Billco, Fnarg.com
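
The minimum, maximum and average FPS figures mentioned above can be derived from a log of per-frame timestamps. Here is a minimal Python sketch, assuming a hypothetical list of timestamps in seconds, however they were captured. The function name `fps_summary` is made up for illustration, and min/max are taken from the single slowest and fastest frame, which is stricter than the per-second averages a review chart would typically show.

```python
def fps_summary(timestamps):
    """Return (min_fps, max_fps, avg_fps) from frame timestamps in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two frame timestamps")
    # Time between consecutive frames.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if min(deltas) <= 0:
        raise ValueError("timestamps must be strictly increasing")
    total = timestamps[-1] - timestamps[0]
    min_fps = 1.0 / max(deltas)   # slowest single frame
    max_fps = 1.0 / min(deltas)   # fastest single frame
    avg_fps = len(deltas) / total # frames rendered over the whole run
    return min_fps, max_fps, avg_fps
```

A run with a decent average can still hide a very low minimum, which is usually what a player notices as stutter.
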
    12. Re:back in my day... by i.of.the.storm · · Score: 1

      Yeah, I have to agree the whole thing with Crysis is overblown, the minimum requirements are actually really low and any (intended for gaming) card made in the last two years could probably run it.

      --
      All your base are belong to Wii.
    13. Re:back in my day... by IndustrialComplex · · Score: 2, Funny

      I do remember marveling at my friend's 486 and how fast those cards bounced off the screen.

      --
      Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
    14. Re:back in my day... by Sancho · · Score: 2, Insightful

      You're conflating benchmarking games vs. benchmarking graphics cards. If you're looking for raw power for an arbitrary amount of money, you'd want to get the graphics card which has the maximum frame rate at that price. If you're looking to play a specific game, you'd look for a graphics card which most people (quite subjectively, obviously) say plays the game well.

      The point is that you can't use a standard game (plus FPS meter) played by a human player to judge a graphics card's raw capabilities. To reduce subjectivity and error, you need a consistency in what is being rendered.

    15. Re:back in my day... by immcintosh · · Score: 1

      The point is that you can't use a standard game (plus FPS meter) played by a human player to judge a graphics card's raw capabilities. To reduce subjectivity and error, you need a consistency in what is being rendered.
      What you're saying makes sense when you write it down, but after having read the article the OP is talking about, as well as some of the related articles, I think it's fair to say that they are reliably doing just that. Decide on a specific run to do through a specific section, practice it until you can do it mechanically, then report on how well it played. They make a strong point of the fact that the FPS average and charts aren't even part of what informs their analysis, and I think that's fair.

      To put it another way, they play what is effectively the exact same run through of a real game level on both cards. Will the frame rate be reported accurately? No, but the point is, that doesn't matter. What they report on was how the game played--how it felt--and that is fairly easy to qualify in a reasonably objective manner in my experience. If it's unpleasantly choppy with certain settings/hardware/whatever, that's what I really care about. Framerate tells me nothing useful, and I don't care how many triangles per second ATI/nVidia's cards can pump out on paper if the games are still unplayable.
    16. Re:back in my day... by Loopy · · Score: 1

      Which, it has been repeatedly shown, can be and are "faked" by competent video card manufacturers. Having a preset benchmark means you can tweak the solution to perform in the static environment. You can't fake what you see when actually playing the game.

      I'll give you a great example. In Crysis, my 8800GT system at home can be set up to 1280x1024 HIGH settings and still get 25FPS+ through the whole timedemo. Take those same graphics settings and try to play the last series on the aircraft carrier and it is completely unplayable. This is what gets lost in the "graphs and charts" presentations by the "canned" benchmark folks; primarily because it takes so much time to do it the hard way. ;)

    17. Re:back in my day... by Anonymous Coward · · Score: 0
      It just so happens that the card with the best timedemo benchmark has the best actual playthrough benchmark and by roughly the same factor.

      Actually, if you bother to read the article, you will find that in some cases a card with a better "timedemo" score has worse "play through" scores. Why not try for accurate numbers?

    18. Re:back in my day... by NorQue · · Score: 1

      It's funny that you mention Crysis... people are freaking out over Crysis the same way they freaked out over Aero Glass a year ago.
      And it didn't start with Aero Glass either. The first game I remember people complaining about hardware requirements was Wing Commander, and there must be earlier examples. People are stupid.
    19. Re:back in my day... by WhoBeDaPlaya · · Score: 1

      I beg to differ. It really seems to need shader power as using hacked Very High settings results in slowdowns in the ice cave and final carrier levels on an overclocked 8800GTS 320MB (backed by a 3.6GHz C2D E6600 and 4GB RAM).

    20. Re:back in my day... by mwvdlee · · Score: 1

      Benchmarks would be nice if the hardware manufacturers didn't optimize specifically for those benchmarks. The problem isn't automated benchmarks, it's drivers cutting corners in the benchmarks at the expense of (or at least not beneficial to) performance in normal use.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    21. Re:back in my day... by YeeHaW_Jelte · · Score: 1

      Pfff, newfangled stuff.

      I use the dir command in DOS to benchmark my new computers, and have been doing so since the 8088.

      --

      ---
      "The chances of a demonic possession spreading are remote -- relax."
    22. Re:back in my day... by Glonoinha · · Score: 1

      Wing Commander? I had friends complaining about hardware requirements as far back as Choplifter and JumpMan.

      What do you mean I need to buy a 1541 single sided floppy drive for my C=64 - I just bought this tape drive six months ago, paid $100 for it and at the time it was the fastest secondary storage known to man - I could type LOAD "*",1,1 and by the time I was done eating lunch my game was ready to run, and now you tell me I have to buy a new piece of hardware just to play a game?

      --
      Glonoinha the MebiByte Slayer
    23. Re:back in my day... by billcopc · · Score: 1

      You don't beg to differ. You shouldn't be running it on "hacked Very High settings" in the first place. That's like whining that your Corolla loses maneuverability when doing 160kph on the oval track... 8800GTS 320mb, you should stick to Medium with maybe 3-4 items on High, to taste. Even SLI'd 8800 Ultras struggle with Very High settings unless you run it in 1024x768.

      Crysis is a game we'll all be able to revisit in a couple years and be wowed a second time. I'm looking forward to the next generation or two of GPUs, maybe they'll finally release something that can drive my big LCDs half-decently.

      --
      -Billco, Fnarg.com
  4. FRAPS Overhead? by roadkill_cr · · Score: 1

    Correct me if I'm wrong, but doesn't FRAPS have some sort of overhead while running? I certainly don't disagree with their findings, but it seems to be a factor they didn't account for between the traditional timedemo benchmarks and their FRAPS-ified benchmarks.

    1. Re:FRAPS Overhead? by compro01 · · Score: 2, Informative

      without using the screen-recording functionality, the overhead should be statistically irrelevant.

      --
      upon the advice of my lawyer, i have no sig at this time
  5. whatevermark by Yath · · Score: 2, Funny

    Crysis, UT3, and COD4 are the three primary games we are using currently, with Crysis performance certainly being the new watermark in the industry.


    I have no idea what this means, but it certainly sounds like Crysis has left its mark somewhere or other.
    --
    I always mod up spelling trolls.
    1. Re:whatevermark by peragrin · · Score: 1

      read it again Crysis left a watermark.

      don't ask why the water smells funny and is yellow in color.

      --
      i thought once I was found, but it was only a dream.
    2. Re:whatevermark by immcintosh · · Score: 1

      My best guess is he meant "high water mark."

  6. hmm by nomadic · · Score: 2, Funny

    Is your benchmark of the benchmarks accurate? We might have to benchmark it.

  7. My old benchmark by Anonymous Coward · · Score: 3, Funny

    I used to do this benchmark:
    10 PRINT TIME$
    20 FOR I=1 TO 9999
    30 NEXT I
    40 PRINT TIME$

    I then improved it to be:
    10 A$=TIME$
    20 IF A$=TIME$ THEN GOTO 20 !breaks out when the seconds change
    30 I=1:A$=TIME$
    40 I=I+1:IF A$=TIME$ THEN GOTO 40
    50 PRINT I

    Ahhh...the good old days... (1970s, early 1980s)
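
For anyone who wants to try the "improved" version today, here is a rough Python port. It is a sketch under assumptions: the function name `iterations_per_second` and the injectable `clock` parameter are made up for illustration, and whole seconds from `time.time()` stand in for `TIME$`.

```python
import time

def iterations_per_second(clock=time.time):
    """Count loop iterations that fit into one whole clock second."""
    start = int(clock())
    # Like line 20: spin until the seconds value changes, so that counting
    # begins right at a second boundary.
    while int(clock()) == start:
        pass
    second = int(clock())
    count = 1
    # Like line 40: increment first, then check whether the second is over.
    while True:
        count += 1
        if int(clock()) != second:
            break
    return count
```

Calling `print(iterations_per_second())` takes between one and two seconds of wall-clock time. The result says more about the interpreter and the cost of reading the clock than about the hardware, which is much the same caveat that applied to the BASIC original.
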

    1. Re:My old benchmark by CarpetShark · · Score: 1

      I used to do this benchmark:
      10 PRINT TIME$
      20 FOR I=1 TO 9999
      30 NEXT I


      I think I've spotted a bug. You'll need a much bigger upper limit on that loop, if you're busy-waiting for basic to be capable of something useful ;)
    2. Re:My old benchmark by sempernoctis · · Score: 4, Funny

      My favorite benchmark for finding the size of the memory heap:

      void doit(int i) { printf("%i\n", i); doit(i + 1); }

      worked really well until I tried it in an environment where the call stack could get paged...then it turned into a hard drive benchmark

    3. Re:My old benchmark by bored · · Score: 1
      finding the size of the memory heap:

      void doit(int i) { printf("%i\n", i); doit(i + 1); }

      Oh god! What has become of this site? Poor spelling and grammar I can understand. Confusing the stack and the heap is a sign of the times!

    4. Re:My old benchmark by Anonymous Coward · · Score: 0

      Had to fire up the ol' trusty...

      Hardware: CPU: MOS Technology 6510 @ 0.9852484 MHz (PAL). Mem: 64 kB RAM. OS: Basic V2 (38911 Basic Bytes Free) (Basic rev. 3)*

      With the improved benchmark, I get the following

      Result: 106.33 with std. deviation of 1.89 (from 6 consecutive runs)

      So, what do you guys have? Anyone tried how much overclocking it can take? :)

      * POKE 1024,1 produces light-blue "A", denoting revision 3.

    5. Re:My old benchmark by sempernoctis · · Score: 1

      The stack and the heap usually occupy different ends of the same block of memory (virtual or otherwise), so when one overflows, it runs into the other. I've seen it happen, and it can cause quite a spectacular crash. Unless the stack is limited to a single segment like in the olden days..... mmmmmm....... segmented memory........

  8. Synthetics not entirely useless by Anonymous Coward · · Score: 4, Informative

    Benchmarking using actual games is, of course, important. But part of the reason a lot of us buy video cards and such isn't JUST about the performance on today's games, but for how they'll play the games coming out in the next few months. Synthetic benchmarks often implement advanced features not currently seen in today's games, but which will be implemented in just-over-the-horizon games. So while clearly one ought not judge a card purely on 3DMark or similar benchmarking suites, they do have their uses.

    1. Re:Synthetics not entirely useless by dmsuperman · · Score: 1

      Trust me, as long as there are games like Crysis to do more than 200% of what my system can handle we'll be alright.

      --
      :(){ :|:& };: Go!
    2. Re:Synthetics not entirely useless by snarfies · · Score: 1

      I prefer the term "Artificial Hardware Test" myself.

      More cornbread?

    3. Re:Synthetics not entirely useless by immcintosh · · Score: 1

      Crysis sorta breaks your argument though. One thing everybody will agree on is that there currently is NO consumer hardware that will play it smoothly at its highest settings. I've heard that a two (three?) card SLI setup of nVidia's top of the line overclocked monsters can get it to pump out 30fps or so with its settings maxed out, but that's about it. The game of tomorrow--today!

  9. Re:1st Post by SQLGuru · · Score: 2, Funny

    Apparently you were using the wrong benchmark. You just thought you were fast.

    Layne

  10. Where have i seen this by Anonymous Coward · · Score: 0

    Well benchmarks are like reviewing hardware... where have i seen something about a score of game that got the reviewer fired for being honest and not complying to the agreement?..hum

  11. We need international benchmarking standards! by Thanshin · · Score: 3, Funny

    ...And an international benchmarking committee.

    To avoid concentrating all the data management in a single entity, we need a national benchmarking committee for each country and then international elections to get a chief of benchmarking interrelationships or CBI.

    To avoid the possible corruption of the CBI, we would need an independent international supervision committee for the review of benchmarking standards.

    The IISCRBS would review the actions of the CBI yearly and produce a thorough report.

    That report (which would be called the IISCRBS-CBI report) would be the main reference to start any kind of productive debate about who has the leetest rack and who's a lame n00b.

    1. Re:We need international benchmarking standards! by Anonymous Coward · · Score: 0

      Do you work for the European Parliament?

  12. Would like to see a real world comparison for EQ by Maxo-Texas · · Score: 1

    I have what was a "hot" card only eighteen months ago (7800) and now it is stuttering on some of the newer content when I'm raiding. The rest of the game is glass smooth. Suppose it could be the PC but it is a pretty good PC too.

    Would love a site that showed "here is the game on the highest settings on these CPU/GFX combos".

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  13. Benchmarks by Anonymous Coward · · Score: 5, Insightful

    Duh, a benchmark is a controlled test performed "on a bench" - meaning, in a controlled environment with specific, well-described procedures.

    You must perform the same exact test on all video cards, disclose any variables, and you must not "pick a subset of completed tests to publish". You must not compare tests performed using different procedures, no matter how slight the deviation of the procedures is.

    One cannot draw conclusions about "real world" performance from a benchmark. The benchmark is merely an indicator. A "real world" test that uses the strong, formalized procedures of a benchmark IS a benchmark - and suddenly, the benchmark is not "real world" - because the "real world" doesn't have formal procedures for gameplay.

    Haphazard "non-blind" gameplay on a random machine is NOT a benchmark, and it can not provide useful, comparable numbers.

    A good benchmark is one where (1) most experts agree that it has validity, and (2) one where the tester cannot change the rules of the game.

    The numbers of a benchmark are meaningless, except in terms of being compared to one another using the same exact procedure.
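
The rules above (same procedure every time, nothing left out of the published results) can be sketched as a tiny harness. This is only an illustration: `run_benchmark` and `fixed_workload` are hypothetical names, and the workload is a stand-in for whatever scripted test sequence is actually being measured.

```python
import statistics
import time

def run_benchmark(workload, runs=5):
    """Time the same workload `runs` times and report every result."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return {
        "timings": timings,  # every run is kept, none are cherry-picked
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings) if runs > 1 else 0.0,
    }

def fixed_workload():
    # Hypothetical stand-in for an identical, scripted test sequence.
    sum(i * i for i in range(100_000))

result = run_benchmark(fixed_workload, runs=3)
```

The resulting numbers only mean something when compared against other runs of the exact same `workload` under the same conditions.
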

    1. Re:Benchmarks by Xzzy · · Score: 1

      The accusation that HardOCP is making is that it is not possible to perform the exact same tests for all video cards, because software vendors sneak in shortcuts and cheats (sorry, optimizations) that screw with the numbers.

      So they threw benchmarking out, for the most part, and instead tried to make a system for measuring how well a given video card delivers a positive experience. It's not ideal.. but at least it's immune to interference from the video card makers. Now you just have to worry about bias from the reviewers. ;)

    2. Re:Benchmarks by immcintosh · · Score: 1

      Actually, I think it would be more accurate to say that their accusation is that, while you can perform those tests, your results will be totally useless and not even remotely indicative of real-world performance, as seems to be demonstrated by their Crysis benchmark. And by useless, they mean the benchmarks will lead you to believe one card is faster (ATI here), when in actuality the opposite is the case while actually playing the game.

  14. Benchmarks != Reality by Smidge204 · · Score: 1

    Okay, so benchmarks don't adequately reflect real applications. Not much of a surprise there...

    But does this impact their usefulness in comparing hardware at all?
    =Smidge=

    1. Re:Benchmarks != Reality by jonnythan · · Score: 1

      Yes.

      RTFA. It clearly shows how the canned timedemo benchmarks most sites use can be horribly misleading and give totally wrong impressions.

    2. Re:Benchmarks != Reality by Firehed · · Score: 1

      We've known this for years, which is why a lot of the better review sites moved away from timedemos a long while ago.

      However, they can still (sort of) be used to compare cards against each other. They don't do much to reflect playability of a game at given settings accurately, but in theory all of the numbers you get from a timedemo should be inflated by about the same percent.

      --
      How are sites slashdotted when nobody reads TFAs?
    3. Re:Benchmarks != Reality by jonnythan · · Score: 1

      The article attempts to show that the numbers you get from a timedemo *don't* correlate well to what you get in the real world. Some cards or drivers do better in the "timedemo -> real life" conversion than others.

      This difference is the entire point of the article.
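
One way to check that claim for any set of results is to compute, per card, the ratio of real-gameplay FPS to timedemo FPS and see whether both measurements rank the cards the same way. A small Python sketch follows. The card names and numbers are invented purely for illustration and are not taken from the article, and `compare_rankings` is a hypothetical helper.

```python
def compare_rankings(results):
    """results maps card name -> (timedemo_fps, gameplay_fps)."""
    # How much of the timedemo score survives in actual play, per card.
    ratios = {card: play / demo for card, (demo, play) in results.items()}
    by_demo = sorted(results, key=lambda c: results[c][0], reverse=True)
    by_play = sorted(results, key=lambda c: results[c][1], reverse=True)
    return ratios, by_demo == by_play

# Hypothetical numbers, for illustration only.
sample = {"card_a": (60.0, 30.0), "card_b": (50.0, 35.0)}
ratios, rankings_agree = compare_rankings(sample)
```

If every card had roughly the same ratio, timedemos would still be fine for relative comparisons. When the ratios differ enough to flip the ranking, as in this made-up sample, they are not.
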

  15. Obligatory Portal Reference by psychicsword · · Score: 0

    Lies, damn lies

    Just like the cake.
  16. HardOCP benchmarks suck ass by Clockwurk · · Score: 1

    They never use the same game configuration, so trying to figure out how much faster one thing is than another is impossible. Rather than have 1 variable (the hardware being benchmarked), they use 2 variables (the hardware, and the settings of the benchmarked software).

    1. Re:HardOCP benchmarks suck ass by jonnythan · · Score: 3, Insightful

      Um, they come up with what is probably the most useful data of all:

      The highest playable settings for given hardware.

      They then change the video card and find the highest playable settings for that hardware.

      I'd much rather compare the highest playable settings for two different cards than the timedemo benchmark numbers for two different cards.
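
The "highest playable settings" approach can be sketched roughly as follows. Everything here is an assumption made for illustration: the preset names, the 30 FPS threshold, the measured values, and the `highest_playable` helper itself. The real judgement of "playable" is made by a reviewer, not by a single number.

```python
def highest_playable(presets, measure_min_fps, threshold=30.0):
    """Return the highest-quality preset whose minimum FPS meets the threshold.

    `presets` must be ordered from lowest to highest quality.
    """
    best = None
    for preset in presets:
        if measure_min_fps(preset) >= threshold:
            best = preset
    return best

# Hypothetical measurements for one card, for illustration only.
measured = {"low": 60.0, "medium": 42.0, "high": 28.0, "very_high": 15.0}
choice = highest_playable(list(measured), measured.get)
```

Swapping in a different card means a different `measured` table and possibly a different answer, which is the comparison being described.
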

    2. Re:HardOCP benchmarks suck ass by Dracolytch · · Score: 2, Insightful

      You know that's totally intractable, right?

      For example: 1680x1050 with no AA may be considered unplayable (jaggies) for some, but others it's perfectly fine...

      Or, maybe you can turn on the AA, but deactivate shadows, changing your whole "playable" demographic again.

      It's like asking someone to benchmark coffee at different restaurants to grade whether it is palatable or not.

      ~D

      --
      This sig has been enciphered with a one-time pad. It could say almost anything.
    3. Re:HardOCP benchmarks suck ass by sholden · · Score: 1

      You mean precisely like people do?

      I've heard rumors that similar things are done for movies, books, games, tv shows, and even food.

      I believe the idea is to work out how closely you agree with the reviewer in question in order to determine if what they say is useful (and of course when you completely disagree they can be useful - if they love it you'll hate it sort of thing)...

      But, yes, if the point was meant to be that there is no one comparison function and hence each person's ordering may be different, then that's clear enough. Doesn't stop people reporting that X's is better than Y's.

  17. Re:Would like to see a real world comparison for E by Digital+Vomit · · Score: 4, Funny

    I have what was a "hot" card only eighteen months ago (7800) and now it is stuttering on some of the newer content when I'm raiding.

    Are you one of those software pirates?

    --
    Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.
  18. [H] raises more questions than it answers by tayhimself · · Score: 2, Informative

    Here are a few that I had :
    - is triple-buffering on or vsync off? This will make a huge difference to real time versus sped up timedemos
    - is sound on when playing back both types of timedemos?
    - how does FRAPS affect your benchmark scores?

    Finally, in relation to the Crysis real world gameplay versus the AT benchmark score, I thought it was common knowledge that the game would be slower when actually playing it because you likely have physics, AI, logic, sound calculations to do that you don't in timedemo mode. What is the big deal here?

    1. Re:[H] raises more questions than it answers by DeadChobi · · Score: 3, Informative

      It's misleading because video card manufacturers tweak their drivers to perform better in timedemos versus real world gameplay so that hardware review sites will do reviews touting the game as playable on such-and-such a card at maximum settings even though real world gameplay never comes close to what the time demo is doing to the game. Wow, that was one sentence. Oh, and how can you say that card A outperforms card B without ever comparing them in gameplay? That would be like me going into a hardware store and swinging two different hammers to compare them, then buying one based on that test only to find out that it's total crap at actually hammering.

      The root of the issue is that timedemos give the video card manufacturers something to tweak their drivers around besides gameplay. And there are also some arguments over how representative of your actual experience a timedemo will be. At least HardOCP gives a crap about their methodology, as opposed to other hardware sites which don't use any sort of statistical analysis.

      --
      SRSLY.
    2. Re:[H] raises more questions than it answers by Hoi+Polloi · · Score: 1

      Reminds me of how the EPA is changing how fuel efficiency is determined for cars. The old standard was not realistic compared to how most people actually drive. Now they are putting a lot more stop & go driving in their testing and getting lower, but more realistic, numbers.

      --
      It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
    3. Re:[H] raises more questions than it answers by jonnythan · · Score: 1

      It's misleading because sometimes one card will come out way in front of another during a canned benchmark due to tweaking, shortcuts, whatever.... but that same card will come out way behind the other card during actual, real-life gameplay.

      See the difference?

      HardOCP's testing is only concerned with real-life gameplay. Most of the time, their conclusions are pretty similar to other sites... card A is faster than card B, for instance. However, sometimes, their conclusions are opposite what other sites come up with.

    4. Re:[H] raises more questions than it answers by Warll · · Score: 1

      No kidding! In game I'm sure physics could really slow you down. Your computer now needs to keep track of the guy you just shot (rag doll), those few stray bullets which just hit that Jeep's gas tank (explosion, motion blur, the five other jeeps parked right next to it...), and all this while it's keeping track of your gunboat rolling in the waves.

    5. Re:[H] raises more questions than it answers by Mike+Rubits · · Score: 1

      One of the under-appreciated things about the Q3 and D3 engines is that demos are essentially a recording of the network stream. So running timedemo on a demo will be extremely accurate for real world performance.

    6. Re:[H] raises more questions than it answers by blueg3 · · Score: 1

      Ideally, the graphics card and all on-CPU calculations are running in parallel, so the influence of this extra work on graphics performance should be minimal. This is what they mean in TFA when they refer to situations that are not CPU-limited.

    7. Re:[H] raises more questions than it answers by Nebu · · Score: 1

      Finally, in relation to the Crysis real world gameplay versus the AT benchmark score, I thought it was common knowledge that the game would be slower when actually playing it because you likely have physics, AI, logic, and sound calculations to do that you don't in timedemo mode. What is the big deal here?
      There's no reason you couldn't write a benchmark/demo which actually performs the physics/AI/logic/sound calculations, as opposed to pre-calculating that ahead of time. Even if your AI or physics code contains calls to a pseudo-random number generator, you could always use a fixed seed to ensure that the benchmark will always perform the same set of calculations each time it's run (i.e. the AI always makes the same decisions, the physical reactions always have the exact same "random noise", etc.).
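
      A minimal Python sketch of the fixed-seed idea above. The function name, the toy "AI" choices, and the seed value are all hypothetical stand-ins for a real game's logic, not anything taken from Crysis or the article; the only point is that a privately seeded generator makes every run perform the same calculations.

```python
import random

def run_simulated_benchmark(seed, steps=1000):
    """Run a toy 'AI + physics' loop driven by a seeded PRNG.

    Hypothetical stand-in for a game's per-frame logic. Returns the
    list of decisions and the final position so runs can be compared.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    decisions = []
    position = 0.0
    for _ in range(steps):
        action = rng.choice(["advance", "retreat", "hold"])  # "AI" decision
        noise = rng.uniform(-0.1, 0.1)  # physics "random noise"
        if action == "advance":
            position += 1.0 + noise
        elif action == "retreat":
            position -= 1.0 + noise
        decisions.append(action)
    return decisions, position

FIXED_SEED = 12345  # arbitrary value; any constant works
first = run_simulated_benchmark(FIXED_SEED)
second = run_simulated_benchmark(FIXED_SEED)
print("runs identical:", first == second)
```

      Because both runs draw from a generator seeded with the same constant, the decisions and the accumulated "noise" match exactly, so the workload is repeatable even though it is computed live rather than pre-recorded.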
    8. Re:[H] raises more questions than it answers by Sebastopol · · Score: 1

      any sort of statistical analysis.

      HardOCP didn't really do any sort of statistical analysis. They gave min/avg/max on a few cards. Anandtech and Tom's Hardware have a sample population and a methodology that blows the doors off HardOCP statistically.

      HardOCP is just regurgitating age-old arguments that have been around since the dawn of benchmarks. I helped code 3DMark in 1996, we went through the same arguments then. Nothing has changed. Synthetic benchmarks serve a purpose: because playing the game and reporting how the card reacts on a random system to a random tester is far too subjective to be a real, usable scientific metric.

      The challenge for benchmark developers is to continually struggle to defeat any driver-based optimizations, which is why all of the major 3D benchmarking sites actually go out of their way to talk about driver versions and their impact on the scores. This rigorous attention to detail is what makes a statistically valid analysis, not some angry fanbois who think they discovered a new hotbutton issue.
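
      To illustrate the difference between reporting min/avg/max from one run and reporting the spread over repeated runs, here is a rough Python sketch. The frame times and per-run averages are made-up sample numbers and the function names are hypothetical; this is not the method used by any of the sites mentioned.

```python
import statistics

def summarize_run(frame_times_ms):
    """Return (min, avg, max) fps for one run, given per-frame times in ms."""
    fps_per_frame = [1000.0 / t for t in frame_times_ms]
    # Average fps is total frames over total time, not the mean of per-frame fps.
    avg_fps = len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)
    return min(fps_per_frame), avg_fps, max(fps_per_frame)

def summarize_runs(avg_fps_per_run):
    """Return (mean, sample standard deviation) across repeated runs."""
    return statistics.mean(avg_fps_per_run), statistics.stdev(avg_fps_per_run)

# Made-up frame times for a single run.
sample_frame_times_ms = [10.0, 20.0, 10.0, 20.0]
low, avg, high = summarize_run(sample_frame_times_ms)
print(f"min {low:.1f} / avg {avg:.1f} / max {high:.1f} fps")

# Made-up average fps from three repeated runs of the same test.
sample_run_averages = [58.0, 60.0, 62.0]
mean_fps, spread = summarize_runs(sample_run_averages)
print(f"mean {mean_fps:.1f} fps, stdev {spread:.1f} fps over {len(sample_run_averages)} runs")
```

      A single min/avg/max triple says nothing about run-to-run variation; the second function is the minimum needed to tell whether a gap between two cards is larger than the noise.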

      --
      https://www.accountkiller.com/removal-requested
  19. Benchmarks are a marketing tool only by Bullfish · · Score: 1

    Give you an idea relative to other cards tested using the same benchmark. However, I have always found them misleading and somewhat gratuitous. Declaring a card superior over another just because it gives five more frames a second than another card is dumb. Especially when it is the difference between 110 and 115 frames per second.
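
    The 110 vs. 115 fps point is easier to see in frame times. A small Python sketch; the 25 vs. 30 fps comparison is an extra illustrative case added here, not a figure from the article, and the function names are hypothetical.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def frame_time_saved_ms(slower_fps, faster_fps):
    """Per-frame time saved by moving from the slower to the faster rate."""
    return frame_time_ms(slower_fps) - frame_time_ms(faster_fps)

high_end = frame_time_saved_ms(110, 115)  # the case from the comment
low_end = frame_time_saved_ms(25, 30)     # same 5 fps gap at low frame rates

print(f"110 -> 115 fps saves {high_end:.2f} ms per frame")
print(f" 25 ->  30 fps saves {low_end:.2f} ms per frame")
```

    The same five-frame gap is worth well under half a millisecond per frame at the high end, but several milliseconds when the card is already struggling.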

    As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050.

    A lot of benchmarks imply you need to sell your child to get great frame rates. In the end, playing games etc. is the only way to determine real performance. Benchmarks are mainly a marketing tool. Kind of an equivalent of spam's how big you need to be to have a satisfying sex life.

    1. Re:Benchmarks are a marketing tool only by jonnythan · · Score: 2, Insightful

      "As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050."

      Not in Crysis, Call of Duty 4, UT3, etc.

      When I go to plunk down $200 - $300 on a video card, and one of them performs comfortably at my LCD's native resolution and the other one doesn't, that matters. Saying all cards in a given price range are roughly equivalent is saying that you are completely, 100% blind to the reality of video cards today.

    2. Re:Benchmarks are a marketing tool only by TheMeuge · · Score: 3, Informative

      As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050.
      Evidently, you've never actually PLAYED Crysis. On an AMD64 Dual Core at 2.4GHz, 2GB of RAM, and Nvidia 8800GTS 640MB (>>$200), I needed to reduce my resolution to 1280x1024 and set everything to Medium, to have the framerate not drop into single digits or low teens, and stay at 20-30fps.
    3. Re:Benchmarks are a marketing tool only by Anonymous Coward · · Score: 0

      That is a very good setup and you still can't play it at high res? Is Crysis REALLY that good anyway?

    4. Re:Benchmarks are a marketing tool only by PoderOmega · · Score: 1

      Seconded. I have an almost identical setup other than my 8800GTS is 320 megs and I had to play with everything set on medium to be playable.

    5. Re:Benchmarks are a marketing tool only by Bullfish · · Score: 1

      actually, I have played (play) crysis... a mix of high and medium settings at 1680 x 1050... I use a HIS Ice 3850, 4 gigs of ram (yeah only 3 are used) and an E8400... I will say that I never said you could use a $200 card to run a game at high settings with great rate (and crysis is a pig for resources), just that you could get great frame rates, and you can by playing with the settings. And the games still look really good.

      The other guy who has trouble playing call of duty 4, that I don't get, I found it has fairly modest hardware needs. I play it with all at max.

      Of course the best video card will not give you good results if you have other weaknesses in your system

    6. Re:Benchmarks are a marketing tool only by mugnyte · · Score: 1


      I play using the quake raytracing engine and my benchmarks are sec/frame, not frame/sec.

    7. Re:Benchmarks are a marketing tool only by i.of.the.storm · · Score: 1

      Radeon HD 3870 should have you covered for about $200, at least at 1680x1050.

      --
      All your base are belong to Wii.
    8. Re:Benchmarks are a marketing tool only by jonnythan · · Score: 1

      It can't do Crysis at that resolution, and it is 5-10 fps (a significant number) slower in the likes of COD4 and similar at the same settings than a similarly-priced 8800GT.

      I can *just barely* enable AA and AF with the 8800GT. I would not be able to do this with a 30% slower card like the 3870.

      This is why reviews matter.

    9. Re:Benchmarks are a marketing tool only by Anonymous Coward · · Score: 0

      It makes my QUAD core 2.4GHz, 2GB RAM and Nvidia 8800GTX cry too. Seriously, medium for most things, a few high, although it's 1900x1200. No AA though :(

      It's still not perhaps as smooth as I would like, might have to drop a res.

      For a rig that could max bioshock and anything else I've thrown at it, crysis brought it to its knees.

    10. Re:Benchmarks are a marketing tool only by i.of.the.storm · · Score: 1

      I didn't realize that the 8800GTs had dropped down to the MSRP by now, so they are around the same price as the 3870. That 30% slower is definitely wrong though, it's only a little slower in some games. And if my 2900 Pro, which is slower than the 3870, can do Crysis at 1680x1050, medium/high settings at ~30fps (and that was the demo, not the final game with the 1.1 patch which supposedly improves performance considerably) then I have no doubt that the 3870 can run it fine.

      --
      All your base are belong to Wii.
    11. Re:Benchmarks are a marketing tool only by witte · · Score: 1

      Well, my rig is much older and I run crysis on a nv6800, and a 1.8GHz cpu.
      So the gameplay sucks.
      The difference is that I spent a lot less money on hardware, ergo, I got a lot more sucky gameplay for my money.
      Or in other words, my suck per buck ratio is a lot higher.
      Yeah.

    12. Re:Benchmarks are a marketing tool only by WhoBeDaPlaya · · Score: 1

      If you'd like, I could sell you a S3 Virge DX PCI for $200. Don't mind the 3D deceleration ;)

    13. Re:Benchmarks are a marketing tool only by Anonymous Coward · · Score: 0

      Uhm, then something is wrong with your PC.
      I run it in 1440x900, everything on high, 25-40fps.
      That's with a Q6600 @ 2.8GHz (quad core, which means nothing on this game as it's not even optimised for dual core), 8800GT 512MB (which is worse than your graphics card) and 2GB RAM.

      Not that the CPU has anything to do with it anyway. Running in 1440x900 at high almost certainly GPU limits the game.

      Why do gamers on slashdot seem to lack a clue about anything game related?

  20. Re:1st Post by majorme · · Score: 0

    Funny? No, not really.

    We don't really need artificial benchmarks as they tend to mislead, even delude most people. We need real world applications, in this case that would be any modern game. Or lots of games.

  21. FPS say what!? by Anonymous Coward · · Score: 0

    "We even discussed not putting in any framerate data. Funny eh? The framerates are not used in determining the card's value or gaming ability, so why supply them?"

    The simple inclusion of this line in their methodology should throw up red flags to anyone who knows anything. Yes, FPS matter when determining how video cards stack up against each other.

    Also, most of their complaints about other sites' review methods come down to "time-demos and real-world play don't give exactly the same FPS readings"--if you actually bother to look at their numbers, yeah, ok, the real-world numbers were always lower than the time-demos. Gee, I wonder why this is? Maybe because they specifically noted that they went and tried to find THE most stressful part of the game for their real-world tests, while time-demos generally are not developed in order to crush your system. What they didn't bother to mention was there was no giant flip in comparative performance between time-demos and real-world tests for the cards. The ATi card trailed in time-demos and trailed in real-world performance, and the relative difference wasn't too large moving from time-demos to real-world.

    So slower time-demo translates to slower real-world performance. Who would have thought?

    1. Re:FPS say what!? by i.of.the.storm · · Score: 1

      If you really understand their methods, they're trying to give a more subjective rating because numbers aren't always that helpful and can be misleading/thrown off by various factors including video card company driver "cheating" to improve framerates at the cost of image quality.

      --
      All your base are belong to Wii.
  22. Benchmarking Benchmarks? by Scubafish · · Score: 1

    So who's going to benchmark the benchmarks of the benchmarks?

  23. Re:Would like to see a real world comparison for E by irc.goatse.cx+troll · · Score: 1

    That would be nice, especially retouching on older ones and also cheaper combos you'd find in generic desktops.

    I'd also like to see a benchmark app you can run from usb or dvd/cdrom booting. Something that gives you a clean slate to compare against running it in your existing install so you can see how much all the various apps and drivers are bogging your performance down.

    --
    Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
  24. not getting a joke = Insightful ?!? by Anonymous Coward · · Score: 0

    Since when do we mod people insightful for not getting a joke (even a bad one)?

    1. Re:not getting a joke = Insightful ?!? by joaommp · · Score: 1

      oh it seems like a joke to you? then whoever wrote it really must have a very refined sense of irony...

  25. 2GB vs 4GB and PAE slowdown by Anonymous Coward · · Score: 0

    I noticed many reviewers use only 2GB of RAM, which is very unlikely in real life, since if you can afford a high end video card, why not spend a bit more to get at least 4GB of RAM? However, 4GB kicks in PAE on win32/linux32, which slows things by what, 10%? That should bias 64/32 comparisons as well.

    1. Re:2GB vs 4GB and PAE slowdown by PitaBred · · Score: 1

      Windows won't use PAE very well in general, and will only turn it on if you tell it to with a kernel switch. With Linux, you have to compile a kernel that's aware of it (set HIGHMEM64G=yes or something like that), and it does lower the performance somewhat.

      But by default, Windows and Linux will boot and just ignore any extra memory they can't address. PAE shouldn't enter the picture for any serious gamers.

    2. Re:2GB vs 4GB and PAE slowdown by Gromius · · Score: 1

      Ah, but you can only address 4GB of RAM in 32-bit Windows. That includes your graphics card's memory; after all, it's got to be addressable. High end video cards have lots of RAM. So one 8800GTX knocks your addressable RAM to ~3.25GB and two in SLI knock it to 2.5GB. Suddenly you see why a lot of people have 2GB of RAM when running with high end graphics cards...
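
      The numbers above can be reproduced with simple subtraction. This Python sketch takes the comment's model at face value (all video memory is carved out of a 4GB address space) and assumes 768MB per 8800GTX; a real system reserves address space for other devices too and may map less than the full video memory, so treat it as an illustration of the arithmetic only. The function name is hypothetical.

```python
ADDRESS_SPACE_MB = 4096   # 32-bit address space: 4GB
GTX_8800_VRAM_MB = 768    # assumed video memory per 8800GTX

def addressable_ram_gb(num_cards, vram_mb=GTX_8800_VRAM_MB,
                       address_space_mb=ADDRESS_SPACE_MB):
    """RAM left in the address space after mapping each card's memory.

    Simplified model from the comment: every MB of video memory takes
    one MB of address space away from system RAM.
    """
    return (address_space_mb - num_cards * vram_mb) / 1024

print(addressable_ram_gb(1))  # one card
print(addressable_ram_gb(2))  # two cards in SLI
```

      Under that model one card leaves 3.25GB and two leave 2.5GB, matching the figures in the comment.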

    3. Re:2GB vs 4GB and PAE slowdown by ShadowsHawk · · Score: 1

      I didn't realize that the video card RAM counts against the 4GB. I just built a new dual core with 3GB and an 8600GTS 512MB, but still good to know. Thanks!

  26. Not the same card by jandrese · · Score: 2, Insightful

    One thing that's bothering me is that HardOCP said "Anandtech benchmarked this card vs. an 8800GTS and said it came out faster, then we benchmarked it against an 8800GTX and it came out faster, then people complained that our results didn't match". Isn't that expected? The GTX is a faster card than the GTS last time I looked. Why is it such a shock that the ATI card came in between them in performance?

    It is a bit of a shock that ATI's latest and greatest can't seem to consistently beat nVidia's over a year old GTX cards I guess.

    --

    I read the internet for the articles.
  27. It is about the "cheating" in benchmarks by Iberian · · Score: 1

    At least that is what I think he was trying to say. If ATI/NVIDIA knows that everyone will be benchmarking their respective cards using X benchmark why not write drivers that excel in that benchmark. Even further you can create hardware to much the same effect, though given the lead times for hardware design this will be harder.

    I am not sure what the best method is for eliminating the discrepancies introduced by those best able to code for a given benchmark, but it seems he tries.

  28. Suuure... by JohnnyBigodes · · Score: 1

    FLASH NEWS: [H]ardOCP throws such outdated concepts as "controlled testing environment" and "repeatability" out the window and calls it revolutionary! Yay!

  29. Re:Would like to see a real world comparison for E by Maxo-Texas · · Score: 1

    hehe.

    Well you probably know what I meant and were making a funny but in case you didn't.

    In EQ, on a raid, you get 54 people close to you (so they can't be clipped based on distance), and 40-70 server side creatures (player pets, monsters, the big "bad") and your machine is trying to keep up and report on and render all that in real time. My frame rate is >60 (>100?) in some content but in the new content on a raid, it can go to 10 to 20 fps unless I turn off a lot of features. Kinda sucks.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  30. We prefer stopwatches by neilticktin · · Score: 1

    MacTech Labs (part of MacTech Magazine) has done a number of benchmarks that were very mainstream in the past year -- including most recently Parallels vs. Boot Camp vs. VMware Fusion, and Office 2008. In designing each of these, we went out of our way to figure out how to make them "real world". In other words, not only to test just the things that most users would do ... but also to measure them in a way that users perceive. One way that we do that is to do the testing with stopwatches. Because, if it's not long enough to see with a stopwatch, it's certainly not long enough for a user to perceive. This has worked well ... and avoids the issue of getting erroneous timings as mentioned in other posts here.

    1. Re:We prefer stopwatches by TheCycoONE · · Score: 1

      While stopwatches may work well for load time and busy waiting scenarios, you'd have to be particularly quick to measure frame rates with one.

    2. Re:We prefer stopwatches by neilticktin · · Score: 1

      >> While stopwatches may work well for load time and busy waiting scenarios,
      >> you'd have to be particularly quick to measure frame rates with one.

      Agreed! We believe that the most important scenarios are the ones that are user perceivable ... and typically, that means that if it's not long enough to measure with a stopwatch, there's a good chance that it's hard for the user to perceive. There are DEFINITELY exceptions to this -- for example, video frame rates as you mention, may make the difference in how "smooth" something looks. In any event, if you look at the tests that we've done -- they are the types of user actions that are measured well by stopwatches ... in fact, better in our opinion, than some of the other timing methods.

  31. Why DX10? by InsaneProcessor · · Score: 1

    How about benchmarking frame rates on the real platform. Friends don't let friends play games on Vista. All of the serious gamers I know avoid it like the plague because of crappy frame rates and poor performance.

    --

    Athiesm is a religion like not collecting stamps is a hobby.
    1. Re:Why DX10? by Silver+Surfer+1 · · Score: 1

      No kidding,
      I personally don't know any gamers that use Vista other than what might have come on a new laptop, and most of those have even removed it off the laptop.

  32. Re:Would like to see a real world comparison for E by Jeng · · Score: 1

    EQ is in many ways a very very bad example, or in some ways I guess a good example.

    Problem with EQ is that performance can vary greatly depending on the card, the drivers, and of course the settings.

    There are non-graphical settings within EQ that can slow down your computer in a raid environment that won't mess with it much in a non-raiding environment. Basically anything that logs information to your hard drive will really mess you up in a raid.

    But EQ has so many damn bugs in it that benchmarking would be useless. The West Bug being one that has been with the game for years now.

    --
    Don't know something? Look it up. Still don't know? Then ask.
  33. Re:Would like to see a real world comparison for E by __aaqvdr516 · · Score: 1

    I think your problem is SoE. I've also done raids on both EQ2 and SWG (back in the day). EQ's servers handle the load better than SWG's did back then. In SWG the lag got so bad around half of the people lost connection. So, in short, your end is not the problem.

  34. Dr. Farnsworth said it right... by Anonymous Coward · · Score: 0

    "No fair! You changed the outcome by watching it!"

  35. Re:Would like to see a real world comparison for E by Maxo-Texas · · Score: 1

    A fix for the west bug has been found.
    It is posted somewhere on "therunes.net" boards. I linked it to my guild boards a couple months ago.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  36. Re:Would like to see a real world comparison for E by Anonymous Coward · · Score: 0

    No, he attached several harddisks to his graphics card to create a RAID

  37. Those were rigged, too. by SanityInAnarchy · · Score: 1

    Video drivers from both ATI and nVidia would look for specific binaries known to be games used for benchmarking. Example: Quake3. You could rename your quake3 binary to quack3 and it'd perform somewhat worse.

    Apparently, it had something to do with trading correctness for speed.

    --
    Don't thank God, thank a doctor!
  38. Re:FRAPS by joeytmann · · Score: 1

    That is true if you start recording in FRAPS, and actually probably even less than half your framerate if your proc/mem/disk speeds suck. FRAPS will give you a decent FPS display without too much overhead. Usually though, most games have the ability to display their frame rates in game with even less overhead. And with most game publishers giving out demos...download the demo and try it out....see what your fps is. If it sucks, decide if you really want to see the game in all its FX glory and spend the $$$ to get your rig there. Obviously if every demo you download sucks for you...low fps...it's probably time to upgrade your rig, if you want to play the newer games, or just stick to Wolf3D or Doom.

    --
    Insert funny smart-ass comment here.
  39. Insufficient sample size by Guspaz · · Score: 1

    They've examined ONE SINGLE game and used this to (try to) invalidate the testing method for EVERY game. Sorry, doesn't work like that.

    All they've proven is that there is something wrong with the timedemo system in Crysis.

  40. Once again by Anonymous Coward · · Score: 0

    Slashdot is pwned by Sourceforge Inc.

  41. Re:back in my day... we didnt make bad analogies by mjwx · · Score: 1

    people are freaking out over Crysis the same way they freaked out over Aero Glass a year ago
    The difference is that Aero Glass required far more system resources than its equivalent under Linux or available for XP (Stardock have something, I can't remember its name though). I have yet to see a game that can match the graphics on Crysis. Crysis runs at a reasonable frame rate on my Geforce 8800 GTS, I average about 20 FPS which halves around effects like waterfalls. This is on an Athlon 6000, 2 GB RAM, running at 1280x1024 with graphics settings set to high and 4x FSAA.

    Vista at times brought my gaming rig to a crawl even when doing nothing (Vista has since been removed from my gaming rig having proven to have no benefits over XP or Linux), Ubuntu 7.10 runs with Compiz set to full on my 2yr old laptop which wasn't that good when I bought it (Cel 1.6 GHz, 1 GB RAM, Intel 915 Graphics module) and hardly ever shows signs of stress. Point in short, the freak-out over Aero was justified but the freak-out over Crysis was blown out of proportion especially seeing as Crytek themselves said that Crysis would require a fairly chunky rig from the word go.
    --
    Calling someone a "hater" only means you can not rationally rebut their argument.
  42. Re:Would like to see a real world comparison for E by andi75 · · Score: 1

    The true reason Blizzard switched from 40-man to 25-man raids in the Burning Crusade.

  43. Re:1st Post by somersault · · Score: 1

    The thing is that games are different each time you play them, so that isn't really a benchmark. The summary says that real games are slower than benchmarks.. I mean DUHHHH! Benchmarks are (or should be) on rails, with no user interaction to ensure that they're the same on each system. Over and above what the benchmarks do, games need to monitor user input and do AI for the enemies at the very least (probably some other obvious things that I'm missing out but those seem to be the main differences to me at the moment). Benchmarks can also get away with faking physics, whereas games usually have to calculate their physics in realtime. A benchmark isn't really meant to be an objective thing - just because your computer performs well in a benchmark doesn't mean it can do well in real terms, it's there for comparing aspects of your computer subjectively.

    Using the classical /. method of car analogies, take the example of a drag racer that has been specifically set up to have a fast quarter mile time. When it comes to racing on a track or even everyday commuting, a drag racer is next to useless. Just because your vehicle/computer performs well in preset tests does not mean that it is a good general purpose machine. These benchmarks test graphical prowess in a few specific areas - not full gameplaying ability.

    --
    which is totally what she said