Slashdot Mirror


What Makes a Valid Benchmark?

An anonymous reader writes "Benchmarks can make a big difference if they are accurate in predicting performance. That's simple enough to describe; it's not nearly so simple to implement. Benchmarks can be an excellent tool for predicting performance and estimating requirements, but they can also be misleading, possibly catastrophically so. This article looks at benchmarks: the good, the bad and the ugly."

7 of 20 comments

  1. The Best Benchmarks Lead by Jherek Carnelian · Score: 3, Informative

    The best benchmarks are those that lead their respective industries.

    As manufacturers seek to maximize benchmark scores, they end up improving their products in ways that make the product more useful to the consumers.

    One example of a bad benchmark: for the longest time, CPU frequency was a sort of benchmark easily understood by the buying public. But it was a very poor one, leading Intel to maximize CPU frequency at the cost of almost everything else: actual computational performance fell behind, and power efficiency got even worse, with chips becoming mini-furnaces.
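    One standard way to see why clock frequency alone is a poor benchmark is the textbook processor performance equation: execution time depends on the instruction count N, the average cycles per instruction (CPI) and the clock frequency f together. The two CPUs below are hypothetical, and their CPI values are made up purely to illustrate the arithmetic; they are not measurements of any real chip.

```latex
\begin{aligned}
T   &= \frac{N \cdot \mathrm{CPI}}{f} \\
T_A &= \frac{10^9 \cdot 2.0}{3.8 \times 10^9\,\mathrm{Hz}} \approx 0.53\,\mathrm{s} \\
T_B &= \frac{10^9 \cdot 0.8}{2.0 \times 10^9\,\mathrm{Hz}} = 0.40\,\mathrm{s}
\end{aligned}
```

    Under these assumed numbers, CPU B runs the same billion instructions faster despite a clock almost half as fast, which is exactly the gap a frequency-only benchmark hides.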

  2. The best benchmarks by Clockwurk · Score: 4, Insightful

    are real-world apps that your audience will be using. Whenever I read a review of a new CPU or graphics card, I always skip the synthetic benchmarks (PCMark, 3DMark, etc.) and go straight to the real-world stuff like media encoding and gaming benchmarks. Synthetic benchmarks tend to be little more than dick-waving contests and have little bearing on the real world. If I see 4000 3DMarks, it's a meaningless number. If I see 58 fps in F.E.A.R. or 45 seconds in Photoshop, I immediately have a decent idea of how the computer is going to perform in real-world use.

    1. Re:The best benchmarks by Anonymous Coward · Score: 2, Interesting

      are real world apps that your audience will be using.

      Interesting, but metrics such as FPS and time to render a frame can be just as meaningless, for other reasons. For example, you can "cheat" a real-world benchmark by changing the way some routines are drawn. Heck, some graphics card makers have been known to optimize their code for a particular game or test routine. A while back, some tests measured the time it took to run particular Photoshop filters. This was also vulnerable to cheating, because you could optimize that particular routine or *specific test* to look good in comparison.

      Today we were testing an Oracle data mirror transfer from an older Sun system to a newer pSeries AIX box. The bottleneck first appeared to be the network, then possibly the older Sun's CPUs. After tweaking the system (by spinning off more dmirror processes), we realized that the bottleneck was in fact the newer AIX machine, since it was pegging both of its CPUs. This was the reverse of our original assumption that the old Sun box was hampering the transfer. Look only at a real-world benchmark and the Sun machine would appear slower, because that version of dmirror is single-threaded and didn't use the other CPUs; in an isolated throughput comparison on real data, the AIX machine came out ahead.

      What I'm getting at is that no benchmark, synthetic or real-world, is all that useful until you test the actual workload.

  3. Three things. by aquabat · Score: 2, Insightful

    Scope, repeatability and transparency.

    Scope means defining clearly and specifically what your benchmark measures and what it does not measure.

    Repeatability means being able to run the benchmark many times under the same conditions and getting statistically consistent results.

    Transparency means having access to the details of how the benchmark works, so that the results can be completely analyzed and understood.
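    To make the repeatability criterion concrete, here is a minimal Python sketch, assuming the workload can be wrapped in a zero-argument function. It discards a couple of warm-up runs, times several more with a high-resolution clock, and reports the mean, sample standard deviation and coefficient of variation (CV). The names `run_benchmark` and `is_repeatable`, the sample workload, and the 5% CV threshold are hypothetical illustrations, not part of any standard benchmark suite.

```python
import statistics
import time
from typing import Callable, Dict


def run_benchmark(workload: Callable[[], None], runs: int = 10, warmup: int = 2) -> Dict[str, float]:
    """Time `workload` `runs` times and summarize how much the timings vary."""
    if runs < 2:
        raise ValueError("need at least 2 runs to estimate spread")

    # Warm-up runs let caches and similar state settle; their timings are discarded.
    for _ in range(warmup):
        workload()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        workload()
        timings.append(time.perf_counter() - start)

    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)  # sample standard deviation
    return {
        "runs": runs,
        "mean_s": mean,
        "stdev_s": stdev,
        # Coefficient of variation: spread relative to the mean, so it is
        # comparable between fast and slow workloads.
        "cv": stdev / mean if mean > 0 else float("inf"),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def is_repeatable(summary: Dict[str, float], max_cv: float = 0.05) -> bool:
    """Hypothetical rule of thumb: call results repeatable if CV <= max_cv (5% by default)."""
    return summary["cv"] <= max_cv


if __name__ == "__main__":
    # Hypothetical stand-in workload; substitute the real application task here.
    def workload() -> None:
        sum(range(200_000))

    summary = run_benchmark(workload, runs=15)
    print(
        f"mean {summary['mean_s'] * 1e3:.2f} ms, "
        f"stdev {summary['stdev_s'] * 1e3:.2f} ms, "
        f"CV {summary['cv']:.1%}"
    )
    print("repeatable" if is_repeatable(summary) else "too noisy; check for background load")
```

    A low CV only shows that the numbers are consistent under the same conditions; it says nothing about scope, i.e. whether the timed workload resembles the one you actually care about.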

    --
    A republic cannot succeed till it contains a certain body of men imbued with the principles of justice and honour.

  4. Apples and Oranges by Jzor · Score: 3, Interesting

    This reminds me of a comparison I once saw at Circuit City... (Warning: I'm not going to talk about a computer hardware benchmark.) They were trying to sell the insanely expensive Monster video cables by comparing Monster cables to standard cables on identical TVs. The screen with the Monster cables looked hella better than the other one. The difference was so astounding that I just had to look at the back of the TV... The Monster TV was hooked up with an HDMI cable. The other? A FRIGGIN UNSHIELDED COMPOSITE VIDEO CABLE. Apples and oranges, apples and oranges...

  5. Re:doubling times are slow for Desktop Systems by NeilTheStupidHead · Score: 2, Funny

    Everyone knows that to get a real increase in performance, you need to paint the case red and put a big flame decal on the side. And fins, lots of fins to reduce drag.

    --
    Lose: misplace or fail || Loose: not bound together

  6. The Best Benchmark of All by TwilightSentry · Score: 2, Funny

    Why, everyone knows that the only valid benchmark is bogomips!!!

    --
    How to enable garbage collection on a system without protected memory: #define malloc() ((void *) rand())