Examining Benchmarking

science studies by crossconnects · 2003-08-17 06:32 · Score: 5, Insightful

Studies and benchmarks are so often biased. It's hard to find a study that isn't. Follow the money trail: who sponsored the study?

--
no big sig

Goedel says benchmarks are inherently flawed. by Peter Cooper · 2003-08-17 06:39 · Score: 5, Interesting

Benchmarks are inherently flawed for the reasons stated in the posts. Comparing hardware to itself and similar hardware means there's no external reference point. Comparing one thing to another is okay, but you can't get absolute numbers in a closed Platonic system.

Goedel's Incompleteness Theorem states that you can't define a system entirely in its own terms, and that any system needs to be defined by terms outside of it.

So, how can you accurately rate hardware based on similar hardware? To meet the GIT (Goedel's Incompleteness Theorem), you would need to compare the hardware with something outside of the system, so you have an external reference point. For example, if you're benchmarking graphics cards, you need to also compare them to something outside of that area of hardware.. so.. say, a graphics tablet, or an iPod.

So, say that the first graphics card is 0.7% compared to the iPod, we now have an external reference to use with the other graphics cards.. so a better card might be 10% compared with the iPod, or a few percent compared to the graphics tablet, which proves that the second card is better than the first, due to the respective ratings compared to the external objects.

This is just regular math. I have to say, it's pretty amazing what you can apply regular math to.. yes, even benchmarks!

Re:Goedel says benchmarks are inherently flawed. by digitalhermit · 2003-08-17 06:58 · Score: 5, Funny

Umm, yeah. Godel's Incompleteness Theorem of course applies to any system, regardless of whether "system" defines a set of axiomatic rules or a bunch of PC parts. Of course, we could also say that Heisenberg Uncertainty puts any benchmark into doubt, and if we assign a number to any attribute of the system we cannot then trust other numbers. I know I'm taking some liberty with the applicability of HUT, but hey, why not. Then there's the whole Hilbert Space objections to these arbitrary transforms; without any Kolmogorov-Smirnov test we cannot trust, in the mathematical sense, the reducibility of any Eigenfunction. The Smirnov test is perhaps not ideal; maybe Bacardi-Walker would be better, or at least produce more interesting (in a completely Lanis-Morton sense) results.

Prefer multiple benchmarks, or your own 'problem' by Anonymous Coward · 2003-08-17 06:40 · Score: 5, Interesting

It all depends on which aspects of the hardware a particular benchmarking suite actually exercises. That's why you prefer a suite rather than a stand-alone benchmark. For instance, Top500.org ranks HPC machines according to LINPACK, on which the ES (Earth Simulator) of course does well due to its vectorization capabilities.

So, if you want to know about your hardware, you had better run more than one benchmark and, more importantly, your own 'problem code'. Yes, you want hardware that performs well for your problem. Hardware that is good in general is rather rare.
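As a rough illustration of the "suite plus your own problem code" idea, here is a minimal Python sketch. The three workloads (integer_loop, float_loop, my_problem_code) are hypothetical stand-ins, not part of any real benchmark suite; swap in whatever you actually run.

```python
import time

def time_once(fn):
    """Return wall-clock seconds taken by one call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_suite(workloads, repeats=3):
    """Time every workload `repeats` times and keep the fastest run of each."""
    return {name: min(time_once(fn) for _ in range(repeats))
            for name, fn in workloads.items()}

# Hypothetical stand-in workloads (assumptions for illustration only).
def integer_loop():
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def float_loop():
    x = 0.0
    for i in range(1, 200_000):
        x += 1.0 / i
    return x

def my_problem_code():
    # Placeholder for the code you really care about.
    data = [(i * 7919) % 10007 for i in range(50_000)]
    return sorted(data)

workloads = {
    "integer_loop": integer_loop,
    "float_loop": float_loop,
    "my_problem_code": my_problem_code,
}

results = run_suite(workloads)
for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {secs * 1000:8.2f} ms")
```

The point is only that the number for my_problem_code is the one that matters to you; the other rows give context, not a verdict.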

Hard to care. by xanderwilson · 2003-08-17 06:41 · Score: 5, Interesting

My favorite computers haven't been the fastest. In fact, I've been the most productive on systems that were objectively less impressive.

My favorite Operating Systems haven't been the ones with the best selection of software.

My favorite games haven't been the ones with the best graphics.

The reviews I find most valuable don't have the most complete set of numbers of why something's the best or worst.

It's interesting that the goal of benchmarks is to be as objective as possible, when it's the subjective that makes me want to buy or not buy something. But meanwhile, the more the objectivity of the benchmark tests is in doubt, the less important the tests become. So I guess that means benchmarks don't mean anything to me one way or the other, huh?

Alex.

Hardware compared to itself? by Cancel · 2003-08-17 06:53 · Score: 5, Funny

Benchmarks exist to determine how a particular piece of hardware performs in relation to itself, and to others.

Well, yep. Turns out my current PC configuration is 100% as good as my current PC configuration! That's an increase of 0%! I'm sure glad I ran that benchmark, or else I'd never know how much of a boost I got with my latest purchase of, well, nothing.

Frames per second benchmark idiocy by Animats · 2003-08-17 06:58 · Score: 5, Insightful

The whole FPS benchmark thing is not only dumb, it's distorting graphics card design.

What matters is how much stuff you can draw per frame time, not how many times you can redraw it during a single frame time. 3D benchmarks should gradually increase the scene complexity until the frame rate drops. Often, there's a huge performance drop when the onboard memory of the graphics board fills up. Running old games at huge frame rates won't show that.
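The "increase scene complexity until the frame rate drops" test could be sketched roughly like this in Python. Note that simulated_frame_time is a made-up cost model, not a real renderer: the per-polygon cost, the spill penalty and the memory limit are all assumed figures, chosen only to show the cliff when onboard memory fills up.

```python
def simulated_frame_time(polys, mem_limit_polys=400_000):
    """Hypothetical cost model (an assumption, not a real GPU):
    linear cost per polygon, plus a steep penalty for every polygon
    that no longer fits in onboard memory."""
    seconds = polys * 2e-8                       # made-up per-polygon cost
    if polys > mem_limit_polys:
        seconds += (polys - mem_limit_polys) * 4e-7   # made-up spill penalty
    return seconds

def max_complexity(frame_time, budget=1 / 60, start=10_000,
                   step=10_000, limit=10_000_000):
    """Ramp the polygon count up in `step` increments and return the
    largest count whose frame time still fits within `budget` seconds."""
    best = 0
    polys = start
    while polys <= limit and frame_time(polys) <= budget:
        best = polys
        polys += step
    return best

with_cliff = max_complexity(simulated_frame_time)
no_cliff = max_complexity(
    lambda p: simulated_frame_time(p, mem_limit_polys=10**9))

print("max polys per 60 Hz frame, memory cliff at 400k:", with_cliff)
print("max polys per 60 Hz frame, no memory cliff:     ", no_cliff)
```

In a real test the cost model would be replaced by actually rendering and timing a frame; the reported figure is then "polygons per frame at a fixed frame rate" rather than "frames per second on an old scene".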

Scene complexity is the limiting factor for game developers. Artists are always saying "I need a bigger poly budget". If benchmarks focused on scene complexity, we'd have gigabyte graphics boards, and "wow, you can see every eyelash" scene complexity.

We also need more intensity depth in graphics boards, to clean up that murky look so typical of games. Rendering really should be done into at least 16 bits of intensity, then sent to the screen through a film-like gamma conversion. That's how it's done in offline renderers for film.
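To see why intensity depth matters before the gamma step, here is a small Python sketch. It assumes a plain power-law gamma of 2.2 as a stand-in for the "film-like" conversion (real film curves are more involved), and counts how many distinct 8-bit screen values survive for a dark gradient when the renderer stores linear intensity at 8 versus 16 bits.

```python
def gamma_encode(linear, gamma=2.2):
    """Map linear intensity in [0, 1] to a display value in [0, 1].
    A simple power law, assumed here in place of a film-like curve."""
    return linear ** (1.0 / gamma)

def quantize(value, bits):
    """Round a [0, 1] value to the nearest level storable in `bits` bits."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels

def displayed_levels(render_bits, samples=1000, dark_limit=0.01):
    """Count distinct 8-bit output codes for a dark gradient
    (linear intensity 0..dark_limit) when the renderer stores
    linear intensity with `render_bits` bits of precision."""
    codes = set()
    for i in range(samples):
        linear = dark_limit * i / (samples - 1)
        stored = quantize(linear, render_bits)
        codes.add(round(gamma_encode(stored) * 255))
    return len(codes)

print("distinct dark shades, 8-bit linear buffer: ", displayed_levels(8))
print("distinct dark shades, 16-bit linear buffer:", displayed_levels(16))
```

With the shallow buffer, the darkest part of the image collapses into a handful of steps before gamma conversion ever sees it, which is one plausible source of the murky look.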

Proper Method for Benchmarking by akedia · 2003-08-17 07:09 · Score: 5, Funny

1. Acquire your piece of test equipment (video card, motherboard, tower case)
2. Hold the equipment 3 to 5 feet above the bench surface
3. Release. Gravity will take care of the test
4. Measure the mark left in the bench by the equipment. Bigger mark = better equipment.

Benchmarking is an inexact science by adam872 · 2003-08-17 08:52 · Score: 5, Insightful

This might sound like I am stating the bloody obvious, but it's true. I think there are several facets to good benchmarking (based on my own experiences and reading other reports)....

1, Choose a test/workload that is representative of what *you* will be doing. There is no point in looking at SPECint2000 if you are going to be running an I/O intensive application like an RDBMS. Try and run or study tests that are relevant to the intended use of the system/component you are benchmarking.

2, Take note of things like compiler flags etc. These are important in tests like SPEC, as your results can vary wildly according to things like optimisation level. Some compilers produce faster code on certain CPU families and not on others. This is a reason why a lot of vendors will build their own compilers and test with them (e.g. SGI, SUN, DECPAQ).

3, Look at the full disclosure notice in the benchmarks. Take a look at the system configuration used. This is particularly important, IMHO, on tests like TPC-C. The score you see might be based on a really whacky config, like most of the figures at the top of the list. For example, look at the Proliant figure (709k) and look at the config: 32 x 8 way servers to run a single database. Then compare it to a 64-way SuperDome or 32-way p690. Which config makes more sense? For a database, I would likely go with the single system for simplicity's sake. On another application, maybe the cluster would make sense.

4, Compare apples to apples. This is the hardest part, as CPU's, OS's, I/O, Apps, Compilers etc etc all vary across platforms. I like to try and compare one variable at a time if possible. To take the TPC-C again, I try to compare DB against DB, Cluster against Cluster, SMP against SMP etc etc. There is nothing to be gained, IMHO, from comparing MS-SQL server in a cluster on Xeon with Win2k3 to Sybase on a SF15k running SPARC Solaris. How do you properly compare these two results? Maybe the solution would be to look at SQLServer on one system against another or Sybase vs Oracle on a similar Unix system.

5, YMMV. Benchmarks are only ever an indicator of performance, not a guarantee. I tell my customers this all the time. They represent a result with a particular system, data set, O/S, tuning settings etc etc at a point in time. Other people's results with a similar config might differ considerably.

I could go on forever, but the above are my 2c
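Point 4 above (compare one variable at a time) can be sketched in Python as a filter over published results. Everything in the sample data is a placeholder: the run names, the config values and the scores are invented for illustration and are not real TPC-C figures.

```python
from itertools import combinations

def differing_fields(a, b, fields):
    """Return the config fields on which results a and b disagree."""
    return [f for f in fields if a[f] != b[f]]

def comparable_pairs(results, fields):
    """Yield (a, b, field) for each pair of results whose configurations
    differ in exactly one field, i.e. an apples-to-apples comparison."""
    for a, b in combinations(results, 2):
        diff = differing_fields(a, b, fields)
        if len(diff) == 1:
            yield a, b, diff[0]

# Placeholder data: names, configs and scores are all made up.
FIELDS = ("database", "os", "topology")
results = [
    {"name": "run-A", "database": "db1", "os": "os1",
     "topology": "smp", "score": 100},
    {"name": "run-B", "database": "db2", "os": "os1",
     "topology": "smp", "score": 120},
    {"name": "run-C", "database": "db2", "os": "os2",
     "topology": "cluster", "score": 300},
]

for a, b, field in comparable_pairs(results, FIELDS):
    print(f"{a['name']} vs {b['name']}: only '{field}' differs "
          f"({a['score']} vs {b['score']})")
```

Here only run-A and run-B are paired up, since run-C differs from the others in more than one field; any gap between run-C and the rest can't be pinned on a single cause.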
