Examining Benchmarking

science studies by crossconnects · 2003-08-17 06:32 · Score: 5, Insightful

Studies and benchmarks are so often biased. It's hard to find a study that isn't. Follow the money trail: look at who sponsored the study.

--
no big sig

Re:science studies by arth1 · 2003-08-17 07:12 · Score: 4, Insightful

It always amuses me to take a look at the language being used in "studies" of this nature. You can usually tell more from that than the actual figures.
Sometimes you get the feeling that the whole thing is a cut/paste job from someone else's paper, with your own study objects and figures pasted in here and there.
Other times, you know it's made by a weathervane who will write up whatever is the most popular opinion at the time -- use of fad words like "vetting" and "triage" is a prime example of this, and enough that you should take the whole study with an ounce of salt.
Then there are the studies that you instantly see were made by a bureaucracy, where the documentation and amount of paperwork are much more important than whether the tests and figures actually make sense. Those are the studies where you read whole chapters on methodology, and still haven't figured out just WHAT they're testing, and WHY.
Then there are the tests that are overly consumer friendly, and try to produce one single big nice number that symbolises everything, while dumbing down the language so much that you have no idea what is really tested. Unfortunately, those seem to be the tests that people LIKE the most, although their value is moot.
Finally, there are the obviously paid (by money or by enthusiasm) studies that will skew the results one particular way. Abundant use of graphs, and especially non-linear or cut-off graphs, is a telltale sign here, as is the absence of any explanation for just why THAT particular test was emphasized, while other tests appear to be missing or downplayed. Use of deltas instead of hard numbers is also revealing -- you are told that going from video card A to B will give you a 300% bigger increase than going from A to C, but if you analyse the raw figures you'll find that the real difference is going from 100fps to 104fps versus going from 100fps to 101fps.
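
To make the delta trick concrete, here is a rough Python sketch of the arithmetic, using the 100/104/101 fps figures from the paragraph above. The function names are made up for illustration only.

```python
def relative_gain(baseline_fps, new_fps):
    """Percentage improvement over the baseline frame rate."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


def delta_ratio_percent(baseline_fps, fps_b, fps_c):
    """How much bigger B's gain is than C's gain, in percent.

    This is the "X% bigger increase" figure a marketing chart would quote.
    """
    gain_b = fps_b - baseline_fps
    gain_c = fps_c - baseline_fps
    return (gain_b - gain_c) / gain_c * 100.0


card_a, card_b, card_c = 100.0, 104.0, 101.0

# The honest numbers: each upgrade measured against what you have now.
print("A -> B gain (%):", relative_gain(card_a, card_b))
print("A -> C gain (%):", relative_gain(card_a, card_c))

# The headline number: one small delta compared with another small delta.
print("B's increase vs C's increase (%):", delta_ratio_percent(card_a, card_b, card_c))
```

The first two figures are single-digit percentages, while the third is the headline-friendly one, even though all three come from the same raw data.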

All in all, I can't say that I've seen many benchmarks and benchmark studies that aren't biased or skewed one way or another, or just plain irrelevant.

Regards,
--
*Art

Depends on the user... by acegik · 2003-08-17 06:33 · Score: 0, Redundant

Not everybody needs it, of course, but if the user wants to know whether what he bought or wants to buy is worth the money, or even to decide between two pieces of hardware, then sure. The better and more precise the test, the easier it is to make a decision.

--
Dont just mail it - Maileet

yes... by Anonymous Coward · 2003-08-17 06:33 · Score: 2, Funny

Yes... just ask Nvidia and they will provide any information you need :)

They are irrelevant. by SHEENmaster · 2003-08-17 06:36 · Score: 2, Funny

I want to know the difference in speed between a dual G5 and a Quad UltraII Sun U80 when compiling Linux targeted at x86.

I don't care if one can get 900fps in Unreal Tournament while the other can only get 880.

As for bias, did you know that my Timex Sinclair is the best computer that has ever been made or will ever be made? The salesman said so, so it must be true.

--
You can't judge a book by the way it wears its hair.

Re:They are irrelevant. by JanusFury · 2003-08-17 06:49 · Score: 3, Funny

A Timex Sinclair? Impossible. I have an authentic, original ENIAC. Your puny sinclair can't possibly match that.

The guy on eBay said it's scientifically proven as the fastest computer on earth*.

* In 1947, that is.

--
using namespace slashdot;
troll::post();
Re:They are irrelevant. by scalis · 2003-08-17 07:37 · Score: 1, Redundant

...In other words, MOST users DO get the information they are looking for.

--

True ravers don't need drugs

jeeez its a slow day at Slashdot by acegik · 2003-08-17 06:38 · Score: 0, Flamebait

Or isn't the topic interesting enough... I'm used to seeing 50 replies in a minute, but it's been 5 minutes now and there are only 6 replies, most of them rated -1. Maybe this topic should go elsewhere?

--
Dont just mail it - Maileet

Goedel says benchmarks are inherently flawed. by Peter+Cooper · 2003-08-17 06:39 · Score: 5, Interesting

Benchmarks are inherently flawed for the reasons stated in the posts. Comparing hardware to itself and similar hardware means there's no external reference point. Comparing one thing to another is okay, but you can't get absolute numbers in a closed Platonic system.

Goedel's Incompleteness Theorem states that you can't define a system entirely in its own terms, and that any system needs to be defined by terms outside of it.

So, how can you accurately rate hardware based on similar hardware? To meet the GIT (Goedel's Incompleteness Theorem), you would need to compare the hardware with something outside of the system, so you have an external reference point. For example, if you're benchmarking graphics cards, you need to also compare them to something outside of that area of hardware.. so.. say, a graphics tablet, or an iPod.

So, say that the first graphics card is 0.7% compared to the iPod, we now have an external reference to use with the other graphics cards.. so a better card might be 10% compared with the iPod, or a few percent compared to the graphics tablet, which proves that the second card is better than the first, due to the respective ratings compared to the external objects.

This is just regular math. I have to say, it's pretty amazing what you can apply regular math to.. yes, even benchmarks!

Re:Goedel says benchmarks are inherently flawed. by sixdotoh · 2003-08-17 06:47 · Score: 4, Insightful

not only is there no external reference point, but what always gets me is the environments things are tested in. i love chaos, and my mind sometimes runs wild thinking of all the factors that could affect the environment, and thus the performance of whatever is being tested.

especially when pcworld or whoever compares entire PC packages... so much can change depending on what background software is being run, system tweaks/factory settings that could be off... no one should really buy a pc package based on those comparisons alone.

--
This post was brought to you by the number 584811 and the characters / and .
Re:Goedel says benchmarks are inherently flawed. by mesmartyoudumb · 2003-08-17 06:49 · Score: 1, Interesting

Software is an external reference point, it's something outside of the system.

When you buy something to perform job X, how well it does at performing job X is one of the reasons for buying it.

iPods don't run Quake.. so the point is maximum silliness.

The dynamic application of intellect is what defines real intelligence.. not theories.. that's just memorization. :-D

--
"Comedy's a dead art form. Now tragedy, that's funny."
Re:Goedel says benchmarks are inherently flawed. by digitalhermit · 2003-08-17 06:58 · Score: 5, Funny

Umm, yeah. Godel's Incompleteness Theorem of course applies to any system, regardless of whether "system" defines a set of axiomatic rules or a bunch of PC parts. Of course, we could also say that Heisenberg Uncertainty puts any benchmark into doubt, and if we assign a number to any attribute of the system we cannot then trust other numbers. I know I'm taking some liberty with the applicability of HUT, but hey, why not. Then there's the whole Hilbert Space objections to these arbitrary transforms; without any Kolmogorov-Smirnov test we cannot trust, in the mathematical sense, the reducibility of any Eigenfunction. The Smirnov test is perhaps not ideal; maybe Bacardi-Walker would be better, or at least produce more interesting (in a completely Lanis-Morton sense) results.
Re:Goedel says benchmarks are inherently flawed. by Anonymous Coward · 2003-08-17 07:02 · Score: 0

Godel's Incompleteness Theorem of course applies to any system, regardless of whether "system" defines a set of axiomatic rules or a bunch of PC parts.

You've just restated what your parent poster was (incorrectly) implying. Bravo.
Re:Goedel says benchmarks are inherently flawed. by Jeremy+Erwin · 2003-08-17 07:04 · Score: 3, Informative

The application of Goedel's Incompleteness theorem to benchmarks borders on 100% organic bullshit. On the other hand, the statement

"Software is an external reference point, it's something outside of the system." is itself iffy.

We know that the video cards are designed, in part, to benchmark well. Some manufacturers have even gone so far as to write drivers that inflate framerate at the expense of accuracy, under certain benchmark-like conditions. (Quake.exe v. Quack.exe, anyone?). Apple inflated its SPEC results by using an unrealistic single-threaded malloc library. Intel's icc is rumoured to detect, and optimize for, SPEC.

"The dynamic application of intellect is what defines real intelligence.. not theories.. that's just memorization. :-D"

Theories? Theories are meant to be proven as an exercise for the student, not just memorized.
Re:Goedel says benchmarks are inherently flawed. by Chris_Jefferson · 2003-08-17 07:15 · Score: 4, Insightful

Seriously? Who modded this up?

Godel's incompleteness theorem is about complex mathematical systems and the essence of proof. You don't need an external object to compare to, anything will do. You just choose some graphics card as your fixed point and then compare everything to that card.

--
Combination - fun iPhone puzzling
Re:Goedel says benchmarks are inherently flawed. by digitalhermit · 2003-08-17 07:22 · Score: 1, Funny

Of course. But then "PC" may stand for (Twin) Prime Conjecture, versus the more obvious "Personal Computer". In this case, basic statistics (prime normal variation over n-space) dictate that an attribute of a system (here, the benchmark) is an indirect orthogonal vector across the set of L-space primes. GIT, in the limited case, applies to provability of functional equivalence of natural numbers and is directly applicable to this prime normal variation. For example, in The Millennium Problems, Keith Devlin writes how Godel "proved this result by showing how to translate questions about provability to equivalent questions about computability of certain functions" (108). I'm performing something similar here -- restating the original posit in an equivalent manner to highlight the cardinal aspect of GIT.
Re:Goedel says benchmarks are inherently flawed. by Anonymous Coward · 2003-08-17 13:25 · Score: 0

This is the stupidest thing I've ever heard. Please do not try to take the words of a mathematical theorem (which have exact, specific definitions) and use their fuzzy, general meaning in order to force the theorem to apply to a wholly unrelated situation. Thanks.

P.S.
Learn some math while you're at it. I bet you didn't even read Goedel's original paper (or its English translation).
Re:Goedel says benchmarks are inherently flawed. by ponos · 2003-08-18 04:22 · Score: 1

Benchmarking is not about *proving*, it is about
*measuring*. Note that Goedel's incompleteness
theorem does not necessarily imply that ALL
propositions are unprovable! It merely states that
there is at least one proposition that is true but
cannot be proven (Goedel constructed this
proposition in an ingenious manner[1]). It suffices
to build ONE such proposition to derive incompleteness.
MANY other propositions can be proven, including,
possibly the fact that card A is better than card B
(if you can call this thing "proof").

Therefore, proof that card A is better than card B
may be feasible. However, the idea of proving
something like this is ridiculous. What you need
is an ordering of the cards according to some
way of measurement. This means that for some
arbitrary benchmarking procedure P() for card
X you get a number P(X) and order all cards
according to that. IF P() is relevant to you (e.g.
quake 3) the ordering is sound (assuming many
other variables, like system, software etc are the
same). Nothing esoteric about that.
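
A minimal sketch of that ordering idea in Python, assuming P(X) is just one number per card (say, average fps in a single fixed test). The card names and scores below are invented placeholders, not measurements.

```python
def rank_cards(results):
    """Order cards from best to worst by their score P(X); higher is better."""
    return sorted(results, key=results.get, reverse=True)


# Hypothetical scores: P(X) = average fps in one fixed test run,
# with system, software and settings held constant across cards.
scores = {"card_a": 142.0, "card_b": 118.5, "card_c": 131.2}

ranking = rank_cards(scores)
print(ranking)
```

The ordering is only as meaningful as P() is relevant to you: swap in a different procedure and the same cards may well come out in a different order.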

(by the way, the profound theoretical significance of
Goedel's incompleteness theorem is completely opposite
to its practical impact, which is almost none in everyday
mathematics--most things have nice proofs)

P.

[1] The importance of the theorem lies in the fact
that even if you convert an unprovable TRUE proposition
into an axiom you will get a new system that will
AGAIN be incomplete. You can't get away by building
bloated systems.

alpha geeks by sixdotoh · 2003-08-17 06:40 · Score: 1

most of the people who truly care what piece of hardware goes into their system know where and how to find the right type of information.

that said, people should demand accurate/unbiased benchmarking because of all the budding nerdlings who end up with junk in their systems, helping some bloated crap company stay on top

the whole unbiased thing: i personally have a VERY hard time believing almost any scientific study (be it benchmarking or dieting) to be unbiased... whether or not there may be a large commercial company behind the bias or not.

--

This post was brought to you by the number 584811 and the characters / and .

The only way benchmarks could work by justsomebody · 2003-08-17 06:40 · Score: 2, Insightful

Do it in secret with home made tools.

As for benchmarking results posted by benchmarking companies and sites: well, they have to eat too.
The second problem is that known tools lead to driver tampering. OK, that's not related to food on the benchmarkers' table.

--
Signature Pro version 1.13.2-3 release 83.5 beta3try7 after-breakfast edition

Re:The only way benchmarks could work by arth1 · 2003-08-17 07:56 · Score: 2, Insightful

Do it in secret with home made tools.

Unfortunately, the "in secret" part applies far too often. Take a look at the tiny print of the license that came with your latest piece of hardware (and in some cases software), and chances are that you're agreeing not to publish any benchmark results.

Hardware manufacturers, of course, only want to see favourable benchmarks, and pay quite well to get them, and to suppress others. Fair comparisons are all well and fine, but they won't make their investors any money.

Regards,
--
*Art

Slanted and biased by Jacer · 2003-08-17 06:40 · Score: 2, Informative

Benchmarks are all too often slanted by the drivers they're done with or by the person performing the benchmark. I wouldn't go so far as to say they're completely unreliable, but people should be aware that they aren't infallible.

--
--fetch daddy's blue fright wig, i must be handsome when i release my rage

Re:Slanted and biased by Anonymous Coward · 2003-08-17 07:15 · Score: 0

The first graph's axis is from 100-160 fps, not 0-160 fps. The loss of scale makes the benchmark results look more significant than they are.
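
To put a number on how much a cut-off axis exaggerates things, here is a small Python sketch. The 120 and 150 fps values are made-up examples (they are not taken from the article's graph); only the axis starting at 100 fps comes from the comment above.

```python
def apparent_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of the drawn bar heights (B over A) when the axis starts at axis_min."""
    return (value_b - axis_min) / (value_a - axis_min)


# Hypothetical results for two cards, in fps.
slow_card, fast_card = 120.0, 150.0

# With an honest axis starting at zero, bar heights match the real ratio.
print("axis from 0:  ", apparent_ratio(slow_card, fast_card, axis_min=0.0))

# With the axis cut off at 100 fps, the same data looks far more dramatic.
print("axis from 100:", apparent_ratio(slow_card, fast_card, axis_min=100.0))
```

Same data in both cases; only the starting point of the axis changes how big the difference looks.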

Prefer multiple benchmarks, or your own 'problem' by Anonymous Coward · 2003-08-17 06:40 · Score: 5, Interesting

It all depends on the range of exercisable aspects of some hardware that a particular benchmarking suite exercises. That's why you prefer a suite rather than a stand-alone benchmark. For instance, Top500.org ranks HPC machines according to LINPACK, on which the ES (Earth Simulator) of course does well due to its vectorization capabilities.

So, if you want to know about your hardware, you had better run more than one benchmark, and more importantly, your 'problem code'. Yes, you want hardware that performs well for your problem. Something that is good in general is rather rare.

Not all of us by Microsofts+slave · 2003-08-17 06:41 · Score: 2, Informative

There is a comment in the article that we buy video cards to play games. I agree this is true for most, but they should at least mention those who do high-end 3D rendering and programming. These individuals use their cards to put out some of the most amazing images in computer history. For these people frame rates are not important; what matters is render times, which even on the best cards in the best systems can take hours for complex effects.

--

Tragek

Re:Not all of us by calyxa · 2003-08-17 06:59 · Score: 2, Interesting

most 3d rendering apps that I'm familiar with don't use the video card for anything other than previewing. that's changing - a lot more will be handled by graphics cards in the future in order to approach the goal of fully rendered 'virtual reality'.

Ivan Sutherland cites "the wheel of reincarnation" whereby the graphics co-processor becomes more and more powerful until it is a stand-alone general purpose computer which in turn gets its own graphics co-processor starting the cycle again.

we have a long way to go before we run out of need for more powerful graphics cards.

-calyxa

--
Decay! Decay! Decay! -Helium
Re:Not all of us by JamesP · 2003-08-17 07:21 · Score: 3, Informative

These are NOT rendered using video card power, but the processor...

Video card power is only used in DESIGNING and PREVIEWING the scenes...

Video card is good but not as good as raw math over some minutes (or even hours)

--
how long until /. fixes commenting on Chrome?

TAO weighs more than benchmarks by oakad · 2003-08-17 06:41 · Score: 0, Redundant

Benchmarks are for fools. Only TAO (the way) matters. My favorite computer is the SGI O2, terribly old. It takes 10 minutes for it to load Quake2. And still, it's much more lovable. I can even add that I always loved and used only ATI cards, despite the very bad support policy, buggy drivers and horrible X support (before XFree 4, I mean). Why? Because I love how they look.

Hard to care. by xanderwilson · 2003-08-17 06:41 · Score: 5, Interesting

My favorite computers haven't been the fastest. In fact, I've been the most productive on systems that were objectively less impressive.

My favorite Operating Systems haven't been the ones with the best selection of software.

My favorite games haven't been the ones with the best graphics.

The reviews I find most valuable don't have the most complete set of numbers of why something's the best or worst.

It's interesting that the goal of benchmarks is to be as objective as possible, when it's the subjective that makes me want to buy or not buy something. But meanwhile the more the objectivity of the benchmark tests is in doubt, the less important the tests become. So I guess that means benchmarks don't mean anything to me one way or the other, huh?

Alex.

Re:Hard to care. by sixdotoh · 2003-08-17 07:15 · Score: 1

once again proving that for many people hard "science" and numbers mean very little . . . people too often try to explain life in terms of equations.

--
This post was brought to you by the number 584811 and the characters / and .

Does it really matter? by Channard · 2003-08-17 06:42 · Score: 1, Insightful

Sure, there are people who love posting benchmarks of their systems, but surely the real test of a system is not how it handles one specific cycled demo. It's whether or not it handles the games you want to play and if you're happy with the performance of your system as you're blowing the crap out of whatever 3D menace is threatening the world, don't start worrying over a few frames per second.

Benchmarks and science by nemaispuke · 2003-08-17 06:44 · Score: 4, Insightful

When I performed photographic quality control, I had a reference platform and true statistically valid performance data to base any decisions on. Unfortunately hardware sites don't exactly do the same thing. They use different hardware (usually provided by the vendor or a reseller looking for a plug), and everything becomes a variable. What I was taught about analyzing anything was to eliminate variables. Most benchmarks will work as long as you create your own reference platform, specify everything used in excruciating detail (driver version, etc.), and also place a disclaimer that the test is only good based on your hardware and setup. When I read a benchmark, I use it only as a guide. I do not take the numbers literally since I cannot reproduce the test. And that is where the problem lies in hardware site benchmarking. Anyone should be able to get the specific hardware mentioned, assemble it, install the OS and run the benchmarks and get similar results. My money is they won't because of "tweaked" drivers, benchmark program versions, or hardware, software, or OS settings that do not make it into the documentation or the column for the site. The only benchmarks I pay any serious attention to are SPECint and SPECfp, because there has to be full disclosure of all options used before SPEC will approve of it.
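
For what it's worth, recording the setup alongside the numbers is cheap. Below is a minimal Python sketch, assuming you are happy with what the standard `platform` module reports. The `driver_version` and `benchmark_version` fields are hypothetical placeholders you would fill in by hand, since Python has no portable way to query them.

```python
import json
import platform


def describe_environment(notes=None):
    """Collect basic facts about the test machine, plus hand-entered notes."""
    env = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    # Anything the platform module cannot see (drivers, BIOS settings,
    # benchmark version) has to be supplied by the person running the test.
    env.update(notes or {})
    return env


report = describe_environment({
    "driver_version": "unknown",     # placeholder: fill in by hand
    "benchmark_version": "unknown",  # placeholder: fill in by hand
})
print(json.dumps(report, indent=2, sort_keys=True))
```

Publishing a block like this next to every result is what lets someone else try to reproduce the test, or at least see why they cannot.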

Its very hard to get a good benchmark by Crashmarik · 2003-08-17 06:46 · Score: 4, Interesting

The problem is as a benchmark becomes widespread and respected, the incentive to cheat the mark increases at a much greater rate.

For less widely used benchmarks, it's possible to do one-offs in the lab and include the false results in the marketing material. The primary examples of this are SPEC, Dhrystone, and Whetstone. For a while Intel's compilers had recognition routines just for these benchmarks. Apple has always done tuned versions of the benchmarks.

Once a benchmark gets into the wild and is in a form that anyone with a website can just load without too much trouble on a machine, you get manufacturers actively moving to cheat the benchmark. Best examples are Nvidia's and ATI's optimizations that are specific to 3DMark and Quake III.

I don't know of anyone who would buy a piece of hardware solely on a benchmark. However, salesmen who can't sell are without peer in inventing excuses and shifting blame. So as long as you have sales goals that are unrealistic and salespeople that are good at inventing excuses, you will have engineering departments forced to cheat the benchmarks.

It's true: money changes everything.

Re:Its very hard to get a good benchmark by jetlag11235 · 2003-08-17 07:32 · Score: 1

I think this emphasizes the conclusion of the article even more. Make benchmarks on current games and current applications. If manufacturers make changes to "cheat" against these benchmarks, then they are doing us a service. This is as opposed to when they "cheat" on benchmarks for older software ... 160 fps vs. 100 fps in Quake III isn't so useful.

Actually, this reminds me of the "teaching to the test" controversy. This should lead to good results when the test is carefully designed, but may lead to poor results otherwise.

-- jetlag --
Re:Its very hard to get a good benchmark by Anonymous Coward · 2003-08-17 09:14 · Score: 0

I don't play those games, it doesn't do me any service. I'd rather have raw data plus some real world benchmarks. For you, Quake fps. For me, OpenOffice/Mozilla compile time, r/w times for an encrypted disk partition, etc etc.

benchmarking gripes by segment · 2003-08-17 06:47 · Score: 3, Insightful

My issue with benchmarking is this... When people read benchmarks, aside from the bias occurring with someone using a favored product, people will often have to take benchmarking as nothing more than an indicator for the following reasons: People will not have access to all the equipment used in a benchmark trial, hardware/software, so they're often going to have to rely on someone else's OBSERVATION. Information can be tweaked easily, and someone who has say a favored product can often tweak it to perform better than the competition, or make the competition's product behave worse.

Also, as stated in a post above, ask who is sponsoring the benchmark testing, and why. Often you will see that 99.99999% of the companies sponsoring benchmarking tests come out with glowing reviews. Has anyone here seen an MS-sponsored test prove unfavorable to MS? It just doesn't happen. Independent studies should post all information concerning why they're doing benchmark tests, including any sponsors; this way those reading the published results can get an overall VIEW of the results and use them as nothing more than an indicator and not solid fact.

--
MoFscker

Re:benchmarking gripes by Anonymous Coward · 2003-08-17 09:38 · Score: 0

Take a look at http://www.spec.org. These are industry-standard, accepted benchmarks.

The best benchmark is the app you're using by questamor · 2003-08-17 06:47 · Score: 4, Insightful

And that's the only way to look at it. I use photoshop more often than anything else, and as long as a machine can run it well then it's passed my benchmarking tests just fine.

Where it becomes pointless is when a single simple benchmark is taken alone. If that were the case then a machine like a 1GHz G4 would own everything else looking at just RC5-72 benchmarks. 10 million keys/sec? No problem, quicker than any other machine like it on the market.

Look at that as just one benchmark among dozens and you form a better picture, that the G4 has a vector unit that performs exceptionally well, and you can get an idea how the rest of it performs.

Add up enough of those simple numeric benchmarks and all you get is one huge mess in mind with no REAL idea of how a machine will perform other than theoretically. Best combine them all together and go back to running the app(s) you're likely to use most.

Re:The best benchmark is the app you're using by rebeka+thomas · 2003-08-17 07:01 · Score: 1, Troll

Your argument makes no sense. If you're using a chip that can theoretically hit X GB/sec in bandwidth and comparing it to a chip that can theoretically hit X+1 GB/sec bandwidth, but photoshop (and you should be using The GIMP anyway) runs better on the X GB system, then you're obviously using a flawed analogy.

Wouldn't it be better to then analyse what is making it run worse on the machine which should be capable of X+1 GB/sec and then improve the software to suit?

Just going with the best working solution is really selling yourself short here.

--
RST
Re:The best benchmark is the app you're using by jellomizer · 2003-08-17 08:05 · Score: 1

Well, theories sometimes work well on paper but not in real life. Let's say a benchmark says that Processor X performs at Y speed. But an application is not a benchmark, because applications use a wide variety of features on the computer, and a combination of them could make X slower because of an unbenchmarked feature.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:The best benchmark is the app you're using by rebeka+thomas · 2003-08-17 08:58 · Score: 1

The gimp is WORTHLESS for prepress work.

What you mean to say is that you haven't tried it yourself and just wish to parrot what the world says about it.

A little more maturity and you would see it is an ideal tool.

--
RST
Re:The best benchmark is the app you're using by Anonymous Coward · 2003-08-17 09:17 · Score: 0

Yes, you're right, SORRY.
Re:The best benchmark is the app you're using by BlueArchon · 2003-08-17 09:38 · Score: 1

Does GIMP support the CMYK colour mode?
Last time I tried it, it didn't. And without CMYK it's completely useless for prepress.

The answer is no. by Anonymous Coward · 2003-08-17 06:47 · Score: 3, Interesting

At least not this reader.

Here's the problem: I don't really want to put together any more systems - at least, not from scratch. My time is worth more now, and the savings from DIY are worth less.

But neither do I want to buy (or recommend) a system that's a stiff: one that uses an unreliable motherboard, or an older chipset, or flakey power supplies.

The site I need would:
- take the systems sold by everybody from Wal-mart, Dell, HP, etc
- find out what components they contain
- then use the review data from places like Tom's Hardware
- Pass judgement and explain why

For example, they might say something like "The Dell Excavator uses an obsolete chipset. For $10 more, buy the E-Machine X321 - but beware the reliability."

I've owned so many video cards by Rooked_One · 2003-08-17 06:48 · Score: 4, Interesting

from the very first Voodoo1 that turned the tide of gaming to the OGL route, to Nvidia and ATI offerings.

The bottom line is that you really can't put much trust in benchmarks. Well... That's not exactly true, but think - of those games and apps that you always see the same people run over and over again, how many of those do you use on a daily basis? Personally, I've read so many reviews that I don't even have to think about what a pixel shader is anymore, so it probably will come as no surprise that I skip through the mumbo jumbo they tell you about the card and go straight to the benchies. And it's always the same ones.

That's all well and good, and I guess it gives you a VERY generic view of how those particular things work, but how about real life performance? How about a screenshot in the HL mod Natural Selection when there are 15 turrets firing at bile bombing aliens with show_fps set to 1? Can we get something like that? I guess that would fall in there with fill rate, and before you tell me that's an arcane game, let me direct you to the little X on the top right of your browser. I don't care.

You can get a very good idea about the speed of a card, but you have no idea what the card will have trouble with until you load up your copy of Star Wars : Pod Racer just to be greeted by a big white screen when the race starts. That's one thing I really miss about 3dfx. Their cards worked. Always. Well, at least they did at the time.

I used to do this... by macemoneta · 2003-08-17 06:51 · Score: 4, Insightful

... on mainframes in the old days. The idea of a benchmark is to determine how your workload will perform on a given platform. The key here is "your workload". Using synthetic benchmarks is a great way to determine relative performance, if your workload is running synthetic benchmarks. For most people this isn't the case.

The problem is that every workload will have a different I/O and instruction mix. Each instruction has a different execution time, and the performance of I/O devices is frequently a function of the access patterns to data.

As a result, a synthetic benchmark may be a poor indicator of the result from the actual execution of your individual workload. These benchmarks are intended to provide guidance, and potentially identify platform performance bottlenecks. That's all. Reading any more into them is the fault of those that use the results improperly.
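
In that spirit, timing your own workload takes only a few lines. This is a rough Python sketch using the standard `timeit` module; `my_workload` is a stand-in I made up, and you would replace its body with whatever job you actually run.

```python
import statistics
import timeit


def my_workload():
    """Stand-in for your real job; replace with the code you actually run."""
    return sum(i * i for i in range(10_000))


def time_workload(func, repeats=5):
    """Run func several times; return (best, median) wall-clock seconds per run."""
    samples = timeit.repeat(func, number=1, repeat=repeats)
    return min(samples), statistics.median(samples)


best, median = time_workload(my_workload)
print(f"best {best:.6f}s, median {median:.6f}s")
```

Reporting both the best and the median run gives a rough feel for how noisy the machine is; a large gap between them usually means something else was competing for the hardware.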

--

Can You Say Linux? I Knew That You Could.

Hardware compared to itself? by Cancel · 2003-08-17 06:53 · Score: 5, Funny

Benchmarks exist to determine how a particular piece of hardware performs in relation to itself, and to others.

Well, yep. Turns out my current PC configuration is 100% as good as my current PC configuration! That's an increase of 0%! I'm sure glad I ran that benchmark, or else I'd never know how much of a boost I got with my latest purchase of, well, nothing.

Re:Hardware compared to itself? by zr-rifle · 2003-08-17 07:33 · Score: 1

Actually, it is widely known that similar hardware performs differently. Two identically configured boxes might have a difference in performance (not counting stability, of course) of as much as 5%. So how well "a particular piece of hardware performs in relation to (its theoretical self)" might prove interesting. Would it annoy you to find out that your box performs badly in relation to other computers of the same series? I'd say yes.

No big deal, but since everybody is concerned with numbers, I just wanted to point this out.

--
Hack your mind out of its sandbox.
Re:Hardware compared to itself? by Cancel · 2003-08-17 07:44 · Score: 1

I'll admit it's a nitpicky point. Still, if that's what was intended, then the submitter should have phrased it like what you said, or maybe 'how well a particular piece of hardware performs compared to others of the same type' or some such. As is, that phrase made little sense.
Re:Hardware compared to itself? by Anonymous Coward · 2003-08-17 09:25 · Score: 0

The same piece of hardware will also perform differently in relation to itself in different environments. Perhaps the submitter enjoys benchmarking the same machine outside during winter and inside a tin-roofed shack during summer.
Then again, if you can get the same benchmarks when the machine is fully submerged in water as to when it is dry, you've got some impressive hardware on your hands.
Re:Hardware compared to itself? by Anonymous Coward · 2003-08-17 11:10 · Score: 0

I'm not disagreeing, but I just want to emphasize: it's not just the purchase price, it's the total cost of ownership - installation, configuration, and maintenance. Nothing costs more than you'd think!
Re:Hardware compared to itself? by mcgroarty · 2003-08-21 07:02 · Score: 1

Next time, please think before posting.
Seriously... this is not helpful at all.
Re:Hardware compared to itself? by Lars+T. · 2003-08-21 12:54 · Score: 1

Next time, please acquire humour before posting.

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:Hardware compared to itself? by mcgroarty · 2003-08-21 15:01 · Score: 1

Seriously -- work on this.
You're scaring the children.
Re:Hardware compared to itself? by Lars+T. · 2003-08-22 03:54 · Score: 1

So I scared you? Yeah, honey, that was mean, scaring a little girl like that.

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:Hardware compared to itself? by mcgroarty · 2003-08-22 04:01 · Score: 1

I'm all LOL and KEKEKEKE because you called me a girl!
You fascists are all alike.
Re:Hardware compared to itself? by Lars+T. · 2003-08-22 04:20 · Score: 1

Yeah, little girls do that. What else is new?

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:Hardware compared to itself? by mcgroarty · 2003-08-22 05:08 · Score: 1

Javol!

Frames per second benchmark idiocy by Animats · 2003-08-17 06:58 · Score: 5, Insightful

The whole FPS benchmark thing is not only dumb, it's distorting graphics card design.

What matters is how much stuff you can draw per frame time, not how many times you can redraw it during a single frame time. 3D benchmarks should gradually increase the scene complexity until the frame rate drops. Often, there's a huge performance drop when the onboard memory of the graphics board fills up. Running old games at huge frame rates won't show that.
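A rough sketch of that kind of test, in Python: keep raising the polygon count until the frame time no longer fits the target frame rate, and report the last count that did. The function name, the growth factor, and the toy cost model (including its "onboard memory" cliff at 400,000 polygons) are all made up for illustration; a real harness would time actual rendering instead.

```python
from typing import Callable


def max_complexity_at(frame_time_s: Callable[[int], float],
                      target_fps: float,
                      start: int = 1_000,
                      growth: float = 1.5,
                      limit: int = 100_000_000) -> int:
    """Return the largest tested polygon count that still meets target_fps."""
    budget = 1.0 / target_fps          # seconds available per frame
    best = 0
    polys = start
    while polys <= limit:
        if frame_time_s(polys) > budget:
            break                      # frame rate dropped below the target
        best = polys
        polys = int(polys * growth)    # ramp up the scene complexity
    return best


def toy_card(polys: int) -> float:
    """Hypothetical cost model: linear cost, 5x slower once memory is full."""
    seconds = polys * 2e-8
    if polys > 400_000:                # assumed point where onboard memory fills up
        seconds *= 5
    return seconds


print(max_complexity_at(toy_card, target_fps=60))
```

The number that comes out is "how much scene fits in a frame", which is a different question from how many frames per second an old, simple scene manages.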

Scene complexity is the limiting factor for game developers. Artists are always saying "I need a bigger poly budget". If benchmarks focused on scene complexity, we'd have gigabyte graphics boards, and "wow, you can see every eyelash" scene complexity.

We also need more intensity depth in graphics boards, to clean up that murky look so typical of games. Rendering really should be done into at least 16 bits of intensity, then sent to the screen through a film-like gamma conversion. That's how it's done in offline renderers for film.
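To make the gamma step concrete, here is a minimal sketch that maps a 16-bit linear intensity to an 8-bit display value. It assumes a plain power-law curve with gamma 2.2, which is a simplification; the curves used in film pipelines are more involved, and the function name is invented for this example.

```python
def linear16_to_display8(value: int, gamma: float = 2.2) -> int:
    """Map a 16-bit linear intensity (0..65535) to an 8-bit display code."""
    if not 0 <= value <= 65535:
        raise ValueError("value must be in 0..65535")
    normalized = value / 65535.0             # linear intensity in 0..1
    encoded = normalized ** (1.0 / gamma)    # assumed power-law gamma encoding
    return round(encoded * 255)


# Compare gamma-encoded output with a straight linear rescale for dark values.
for v in (0, 655, 6553, 32768, 65535):
    print(v, linear16_to_display8(v), round(v * 255 / 65535))
```

Dark linear values land on many more distinct display codes after the gamma step than after a straight rescale, which is where the extra intensity depth pays off.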

Re:Frames per second benchmark idiocy by afidel · 2003-08-17 08:50 · Score: 1

As a game player I don't care what the next generation engine wants in poly budget, I care what the framerate will be on the games I play today. Actually I rarely care too much about the average fps shown in benchmarks but rather the min fps, because it is the min fps that shows as stutter in the game. In the future if I need to I will buy a new card to play those more demanding games. Of course maybe I'm a bit weird, I buy the card that can achieve my minimum performance level for around $100, not the fastest most expensive card available.
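The gap between average and minimum fps is easy to show from a list of per-frame times. This is only a sketch; the function name and the sample numbers are invented.

```python
def fps_summary(frame_times_ms):
    """Return (average_fps, minimum_fps) for a list of frame times in ms."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    total_s = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_s
    minimum_fps = 1000.0 / max(frame_times_ms)   # slowest single frame
    return average_fps, minimum_fps


# 59 smooth frames at 10 ms each and one 100 ms hitch.
average, minimum = fps_summary([10.0] * 59 + [100.0])
print(f"average {average:.1f} fps, minimum {minimum:.1f} fps")
```

One long frame barely moves the average but drags the minimum all the way down, and that one frame is the stutter you actually notice.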

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

What interests me... by John+Seminal · 2003-08-17 06:59 · Score: 4, Insightful

I am not as interested in benchmarks as I am in whether my hardware will work well with my software. If I buy a video card, will it run my apps? Perhaps some of these review sites should take the 10 most popular applications, like games, a compiler, database, etc... and tell you if your hardware will run them without hangups or hiccups.

The other bad thing about benchmarks is you will probably not have the same motherboard/ram/cpu as the test system.

--

Rosco: "If brains were gunpowder, Enos couldn't blow his nose."

Identify the need of person looking at benchmarks by leoaugust · 2003-08-17 07:03 · Score: 3, Insightful

I think the writer made very relevant points. Generally the universe expands faster than our own speed, and so we never get to the edge of the universe. But in the world of technology, sometimes technology grows faster than the real world it is supposed to inhabit, and so the real world gets left behind ... The writer has identified just such a crossover, and hence his call to update benchmarks is very valid ...

But I would like to add another dimension and that is the eye of the beholder ... If all the person is looking at the benchmarks for, is to quickly sell it to his unsophisticated boss, or another unsophisticated boss who will get his employees to use it, then what he needs are simple and clear cut benchmarks - and more important, time tested benchmarks. Generally the powers-to-be with the moolah do not like the messiness that inherently comes with trying to "realify" the models .... From my experience this is not how it should be, but I have found this is how it is ....

I am not saying that the writer was in any way wrong .. just that he must also look at the consumer of his benchmarks ... is it someone who is going to use the technology him/her self or is it someone who is going to sell a technology to someone who will have someone else use the technology ....

--
To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...

Proper Method for Benchmarking by akedia · 2003-08-17 07:09 · Score: 5, Funny

1. Acquire your piece of test equipment (video card, motherboard, tower case)
2. Hold the equipment 3 to 5 feet above the bench surface
3. Release. Gravity will take care of the test
4. Measure the mark left in the bench by the equipment. Bigger mark = better equipment.

Re:Proper Method for Benchmarking by corkhead0 · 2003-08-17 07:36 · Score: 0

Cool! I bet I can tie more rocks to my ram than you can!

*Pulls out 3 ton boulder*
Crash!
Oh crap!
Re:Proper Method for Benchmarking by void+warranty() · 2003-08-17 14:48 · Score: 1

4. Measure the mark left in the bench by the equipment. Bigger mark = better equipment.

Given the amount of heat modern chips generate and the size of the heatsinks needed to dissipate the heat, this is not far from the truth.
Re:Proper Method for Benchmarking by BandwidthHog · 2003-08-18 03:52 · Score: 1

Actually, that's not far from the truth.

I'm trying to find a good home stereo amp, but I can't stand the modern stuff (usually identified by being all black with a giant volume knob). I like the vintage hardware, with actual transistors rather than integrated circuits. So the way I shop is (mostly) by weight. If it was manufactured prior to 1980 or thereabouts, I simply go for the heaviest thing I can find. All else being equal (which it never is, of course), the amp with the biggest, baddest set of heat sinks will be the highest quality.

So that form of benchmarking suits my needs just fine.

WTB: Vintage home stereo amp. Must score 3/16" or better on a 3' benchmark test.

--

Quantum materiae materietur marmota monax si marmota monax materiam possit materiari?

imho by zr-rifle · 2003-08-17 07:21 · Score: 3, Insightful

the best benchmarks are those provided by your favourite game. UT2003 is an example frequently cited (and used) since it comes with a series of benchmark tests (fly-by and botmatch) built in. That is the information I value the most, since after all I don't really need to play 3DMark-Wildlife as smoothly as possible, but the games I play. I hope software developers follow the trend.

--
Hack your mind out of its sandbox.

Identify the truths ... Re:Goedel says benchmarks by leoaugust · 2003-08-17 07:32 · Score: 1

One of the key things in Godel's conception is that there are certain truths, i.e. for example that the math on earth and the math on mars will have something similar to prime numbers, etc.. That is why the two maths created on earth and mars, where certain common symbols are used in both the earth-maths and mars-math, have some similarities. The similarity is not that they are using some of the same symbols, but that they are using math to describe the same reality ...

the equivalent of the benchmarks definition is to split it in two views .... one is that of the benchmarks writer, and the second of the benchmark consumer. If the "realities" of both of them are same, then the benchmark consumer will be able to extract the "correct" meaning from that what has been put in the benchmark by the writer ....

If the realities are not the same, the transmission of meaning from the writer to the consumer will not take place correctly ...

So, the point is not to compare the card with another external device for anchoring, but to compare the "report required by the consumer" to the "report prepared by the writer."

Because of so many different requirements and writers, the task of writing benchmarks is like "mass customization." The ultimate answer is to take some input from the benchmark consumer, and use that in creating a "number" for that benchmark .... or simply, as someone said ... to each his/her own ...

--
To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...

Video Card Benchmarks by Prien715 · 2003-08-17 07:38 · Score: 3, Interesting

The article misses a major part of the video card market. Most people don't buy video cards solely based on what games they can play. Otherwise, everyone's card would be out of date in less than a year. People buy video cards and other computer hardware based on not only what it can do in the present, but what it will be able to do in the future. Most people can't afford to buy a new video card every month. And for those people, looking at a benchmark will give them some idea of the advantages of different pieces of hardware in conjunction with software that hasn't been developed yet.

--
-- Political fascism requires a Fuhrer.

Re:Video Card Benchmarks by Moraelin · 2003-08-18 03:57 · Score: 1

Well, yes, except no one's ever managed to _know_ what will be required of those cards in the future.

I've seen a buttload of synthetic benchmarks (all the way from the 80's), and invariably they never predict anything useful. They're actually _less_ meaningful than just running your favourite game or app or whatever.

First of all, computers are too complex to put everything into a single number. Graphics cards too. The exact mix of instructions (e.g., which shaders are used), the exact data set (e.g., overdraw vs polygon count vs texture resolution), and so on, can make it all behave completely differently.

As a simple example, we already have at least two fundamentally different speeds for graphics cards: fill rate _and_ T&L speed. In a given game and in a given resolution, you may be limited by one, while the other still is barely stressed at all.

Now add on top of it other factors like pixel and vertex shaders, FSAA, anisotropic filtering, etc.

What you're left with is: every single game, in every single setting, will act very differently, speed-wise. Each game will produce a very different result when used as a benchmark.
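The fill rate versus T&L point can be put into a toy model: treat the frame time as whichever of the two stages takes longer. This ignores shaders, FSAA, memory bandwidth and everything else mentioned above, and the card and game figures are invented, so read it as a sketch rather than a predictor.

```python
def frame_rate(width, height, overdraw, vertices,
               fill_rate_px_s, tl_rate_vtx_s):
    """Return (fps, limiting stage) for a simple two-stage model of a card."""
    fill_time = width * height * overdraw / fill_rate_px_s   # pixel work
    tl_time = vertices / tl_rate_vtx_s                       # geometry work
    frame_time = max(fill_time, tl_time)   # the slower stage sets the pace
    limiter = "fill rate" if fill_time >= tl_time else "T&L"
    return 1.0 / frame_time, limiter


# Hypothetical card: 1 Gpixel/s fill rate, 20 Mvertices/s T&L.
card = dict(fill_rate_px_s=1e9, tl_rate_vtx_s=20e6)

# Hypothetical game A: high resolution, simple geometry.
print(frame_rate(1600, 1200, 3, 50_000, **card))
# Hypothetical game B: low resolution, heavy geometry.
print(frame_rate(640, 480, 2, 500_000, **card))
```

Same card, two games, two different bottlenecks. A single synthetic score cannot tell you which of the two your game will hit.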

And those purely synthetic benchmarks will not predict future games, they will just produce purely artificial results. Results that don't resemble the actual performance of any actual game. Past, present or future.

Which brings me to two wishes of mine, when it comes to benchmarks:

1. That people would stop using these crap synthetic benchmarks as some Supreme Truth (TM), and just face reality. As in, "yeah, but how does it behave in actual _games_"?

2. That people would use more than 2-3 games as benchmarks.

I mean, really, there's more to computer gaming than Quake 3, UT and Serious Sam. Tell me how it performs in a flight sim, for a change. Or in a racing game. Or how about an RPG? Yeah, tell me how well it does in Morrowind or DAoC or whatever. Or, hey, how about benchmarking it in ePSXe? (Yes, that PSX emulator.) Or in a strategy game? Etc.

Now those might help me more than the frame rates for some games I don't even play. (No, I'm not that much into FPS any more.)

--
A polar bear is a cartesian bear after a coordinate transform.

Sound Cards by Coneasfast · 2003-08-17 07:58 · Score: 2, Interesting

What about sound cards? These should be discussed more than anything since most benchmarks are useless... They usually compare latency, which is possibly the worst way to compare sound cards... and also midi features and surround features, which many people don't even use.. and 3d quality, which is required in a test, but not the main aspect

what they should compare the most is the sound quality, this i find very lacking in sound card benchmarks... such things as Signal/Noise ratio, frequency response, bass/treble controls, etc...

pcavtech does a good job of this, too bad more sites don't do something like this..

but the site did get the right point, speed is not everything, quality counts for much..

--
Marge, get me your address book, 4 beers, and my conversation hat.

The "study" seems to be fairly biased... by arthurh3535 · 2003-08-17 08:50 · Score: 1

...because, oh boy, doesn't it make a lot of fairly sweeping declarations without substantiation. It put in just enough graphs to look credible, and then never actually showed any proof of their premise.

They never actually showed how the Kyro II actually "out-performed" the GF2 in real life applications.

I could just say that I don't believe there's people being killed in Iraq, it's all a government conspiracy. If I had no proof, I *should* be laughed out of town.

Synthetic benchmarks are not *inherently* bad. As long as the benchmark is not being skewed somehow (cheats, ignorance, mistakes) there's no reason not to use them as a guideline for performance. If you can *also* do real life testing, that's even better, there's more information to infer performance from.

The benchmark "3D Mark 2001SE" is not even a current benchmark. One of the main premises of the 3D Mark series tests is the fact that it is testing graphical items that are not yet *in* games. IIRC, they program in the latest wizbangs that graphics cards are putting in their hardware. So it is an estimation of "future" performance.

Their estimate is better than running current games that don't even test those future features. Their accuracy is not as great as actually testing with games that will come out in two/three years (they can not and do not expect to replicate all programming tricks that the entire industry will come up with to kludge around some weird problem.)

In short, this article is slanted and bunk. They make sweeping conclusions, fail to back them up with hard data, and misinterpret "synthetic benchmarks".

If I was a suspicious person, I'd guess they were hired to debunk 3D FutureMark and their ilk as spurious testing methodology.

If they aren't, I'll apologize.

It's just bad reporting then.

Arthur Hansen

--
No! It's a *SIG*. Keep the Special Interest Groups away! (Con joke!)

Benchmarking is an inexact science by adam872 · 2003-08-17 08:52 · Score: 5, Insightful

This might sound like I am stating the bloody obvious, but it's true. I think there are several facets to good benchmarking (based on my own experiences and reading other reports)....

1, Choose a test/workload that is representative of what *you* will be doing. There is no point in looking at SPECint2000 if you are going to be running an I/O intensive application like a RDBMS. Try and run or study tests that are relevant to the intended use of the system/component you are benchmarking.

2, Take note of things like compiler flags etc. These are important in tests like SPEC, as your results can vary wildly according to things like optimisation level. Some compilers produce faster code on certain CPU families and not on others. This is a reason why a lot of vendors will build their own compilers and test with them (e.g. SGI, SUN, DECPAQ).

3, Look at the full disclosure notice in the benchmarks. Take a look at the system configuration used. This is particularly important, IMHO, on tests like TPC-C. The score you see might be based on a really whacky config, like most of the figures at the top of the list. For example, look at the Proliant figure (709k) and look at the config: 32 x 8 way servers to run a single database. Then compare it to a 64-way SuperDome or 32-way p690. Which config makes more sense? For a database, I would likely go with the single system for simplicity's sake. On another application, maybe the cluster would make sense.

4, Compare apples to apples. This is the hardest part, as CPU's, OS's, I/O, Apps, Compilers etc etc all vary across platforms. I like to try and compare one variable if possible. To take the TPC-C again, I try to compare DB against DB, Cluster against Cluster, SMP against SMP etc etc. There is nothing to be gained, IMHO, from comparing MS-SQL server in a cluster on Xeon with Win2k3 to Sybase on a SF15k running SPARC Solaris. How do you properly compare these two results? Maybe the solution would be to look at SQLServer on one system against another or Sybase vs Oracle on a similar Unix system.

5, YMMV. Benchmarks are only ever an indicator of performance, not a guarantee. I tell my customers this all the time. They represent a result with a particular system, data set, O/S, tuning settings etc etc at a point in time. Other people's results with a similar config might differ considerably.
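Points 1 and 5 can be sketched as a tiny harness: run your own workload several times and report the spread rather than a single number. The function name and the stand-in workload below are made up; swap in something representative of what you actually run.

```python
import statistics
import time


def run_benchmark(workload, repeats=5):
    """Time a callable several times and summarise the run-to-run variation."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    median = statistics.median(samples)
    spread = (max(samples) - min(samples)) / median * 100 if median else 0.0
    return {
        "median_s": median,
        "min_s": min(samples),
        "max_s": max(samples),
        "spread_pct": spread,   # how far apart the best and worst runs were
    }


if __name__ == "__main__":
    # Stand-in workload; replace with something close to your real job.
    print(run_benchmark(lambda: sorted(range(200_000), key=lambda x: -x)))
```

If the spread between runs on one box is already a few percent, a few percent difference between two published scores tells you very little.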

I could go on forever, but the above are my 2c

Well... by Awptimus+Prime · 2003-08-17 09:45 · Score: 4, Insightful

I don't know.. The higher the 2001SE score, the higher the FPS in a game, typically. That is, unless someone's drivers are cheating.

He didn't even mention 3DMark2003, which does a more comprehensive job testing modern GPU features and is included on any benchmarks of 'modern' (aka DX9 supported) cards.

Think about it, in 2000 when they were working on the 3dmark 2001, directx8.1 wasn't even done; to my knowledge, most of DX8 wasn't even used in 3dmark 2001se.. Since then, cards came out with tons of new feature sets (directx9, AGP 8X, etc) and there was simply a lag time between good benchmarking software.

Now, I do agree with charting performance over time. This would be much more handy when doing comparisons of AMD and Intel processors. I get the same over-all frame rate with my AMD 2400 as an Intel @ ~2.6gig. But, the Intel w/ a faster bus will likely not be getting those split second ticks where the AMD is 100% occupied or the FSB is flooded.

I'm not knocking AMD at all. I can just tell a difference in the overall smoothness of a CPU intensive game. When I bought mine, I spent about 3/4 of what I would have spent on an Intel rig and got around 3/4 of the performance.

It all works out once you stop paying attention to a marketing department. People always say you can't trust advertising, but act so surprised when a company is exposed for making a false claim of some minor sort.

you get what you pay for.

Also, instead of complaining about poor benchmarks in real-world situations, you should write the various game developers and request they add, or consider adding, a benchmark to their game engines. Having to 'devise' a way to test game performance probably isn't going to result in wide-spread adoption of that particular benchmark. ID Software's engines have always come with built-in benchmarks (timedemo), thus making them very easy to test. That's why you always see the games that use ID's engines in benchmarks.

That brings me to my final point, he mentions that StarWars game should be tested instead of Q3, yet it uses the same engine. Sorry, more copies of Q3 exist, and since any game using that engine doesn't bring anything new to the table, might as well stick with it. eh?

Benchmarketing by Anonymous Coward · 2003-08-17 11:32 · Score: 0

'nuff said.

Speedy by mvpll · 2003-08-17 12:27 · Score: 3, Insightful

Modification of the front side bus speed is now a fairly trivial affair, it can be done by software whilst the system is running. Increasing it by 2% is very unlikely to cause the system to fail, but will give you a few dollars more on your benchmark result.

Various BIOS settings are also able to be changed on the fly, checking all these values whilst the benchmark is running will alter the results of the benchmark, but the difference they can make requires any true benchmarker to monitor them...

Studying Benchmarking? by stock · 2003-08-17 12:40 · Score: 4, Funny

Come on!

There's Lies, there's damn Lies and finally there are benchmarks.

Robert

Better definitions by frozenray · 2003-08-17 12:40 · Score: 2, Funny

Benchmark v. trans. To subject (a system) to a series of tests in order to obtain prearranged results not available on competitive systems. -- Stan Kelly-Bootle, "The Computer Contradictionary"

Edelstein's First Law of Benchmarks: Every commercial product has its best performance on standard benchmarks.

Edelstein's Corollary: If the system you wanted to win didn't, the benchmark wasn't fair.

--
"There are already a million monkeys on a million typewriters, and Usenet is NOTHING like Shakespeare." - Blair Houghton

"in relation to itself"??? by Anonymous Coward · 2003-08-17 14:30 · Score: 2, Funny

You're benchmarking a piece of hardware against itself?

That should prove to be an interesting technical comparison.

"We were surprised when Hardware A managed to score a 975 on the TurboMaxQuad Doohickey test, but we were shocked when Hardware A blasted out of the gates and scored a whopping 975 on the TurboMaxQuad Doohickey test..."

I play games by xenocide2 · 2003-08-17 14:39 · Score: 1

So should I use Quake to benchmark? It's popular and many people already use it as their gauge of performance. But there was also mention of drivers that recognize the executable itself and tweak operations. These tweaks won't apply when I, say, play Half-Life, when I should expect some sort of correlation in performance between the two.

--
I Browse at +4 Flamebait

Open Source Sysadmin

Good Benchmarks by Lelon · 2003-08-17 16:13 · Score: 4, Insightful

It seems rather obvious that we need a paradigm shift in the way we benchmark our hardware. I like benchmarks of things I actually do with my computer. For example, the time it takes a setup to encode an mp3 or svcd file. Some people are using benchmarks like these, but there is no readily available program suite that benchmarks your system using these real life scenarios. Sure, I could do them myself, but I wouldn't know how my system performs relative to other systems if there isn't a standard benchmark.
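A do-it-yourself version is not much code. The sketch below times an arbitrary external command and expresses the result relative to a reference time from another machine. The helper names are invented, and the reference figure is a placeholder, since the missing piece is exactly an agreed reference number.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run an external command once and return the wall-clock seconds taken."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def relative_speed(reference_s, measured_s):
    """Above 1.0 means faster than the reference system, below 1.0 slower."""
    return reference_s / measured_s


if __name__ == "__main__":
    # Placeholder job so the sketch runs anywhere; substitute your real
    # encode command and input file here.
    elapsed = time_command([sys.executable, "-c", "sum(range(10**6))"])
    reference = 1.0   # hypothetical time measured on some reference box
    print(f"{elapsed:.3f} s, {relative_speed(reference, elapsed):.2f}x reference")
```

The comparison only means something if everyone encodes the same input file with the same encoder version and settings, which is what a standard suite would have to pin down.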

best scientific benchmark is Hint by Franosch · 2003-08-17 22:09 · Score: 1

Hint is designed to fulfill the following goals.

Scalability. Runs on a calculator, even on your brain. No need to upgrade benchmarks every year.
Compiler independent. No easy "cheating" possible.
Speed dependent on problem size is calculated. Different processors have different cache sizes and perform by a factor of 10 or more differently on different problem sizes.
You can check every result, Hint is Open Source and free.

As most other benchmarks fulfill none of the design goals above, almost all benchmarks are nearly useless.

What is HINT?

HINT or Hierarchical INTegration is a computer benchmarking tool developed at the Scalable Computing Laboratory (SCL) of Ames Laboratory, and is funded by the Office of Scientific Computing, U.S. Department of Energy (DOE). Unlike traditional benchmarks, HINT neither fixes the size of the problem nor the calculation time and instead uses a measure called QUIPS (QUality Improvement Per Second).

This enables HINT to display the speed for a given machine specification and problem size. Computers typically start up fast and slow down as they run out of fast memory and start using the main memory, or slow down even more if they have to access the disk. Such changes are easily visible with HINT generated data.
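The idea of reporting speed as a function of problem size, rather than as one number, can be illustrated with a few lines of Python. To be clear, this is not HINT and does not compute QUIPS; it only times a simple sum over growing arrays, and interpreter overhead will blur the cache effects that a compiled benchmark would show much more sharply. All names are invented for the sketch.

```python
import time
from array import array


def throughput_by_size(sizes, min_time_s=0.05):
    """Return (size, elements summed per second) for each problem size."""
    results = []
    for n in sizes:
        data = array("d", range(n))       # n doubles, 8 bytes each
        passes = 0
        start = time.perf_counter()
        while True:
            sum(data)                     # touch every element once
            passes += 1
            elapsed = time.perf_counter() - start
            if elapsed >= min_time_s:     # run long enough to be measurable
                break
        results.append((n, n * passes / elapsed))
    return results


if __name__ == "__main__":
    for size, rate in throughput_by_size([1_000, 10_000, 100_000, 1_000_000]):
        print(f"{size:>9} elements: {rate / 1e6:8.1f} M elements/s")
```

Plotting the rate against the size gives the kind of curve described above: fast while the data fits in cache, slower once it spills into main memory.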

HINT is scalable and easily portable for a variety of architectures. It can be run on anything from a programmable calculator to a supercomputer.

YHBT. by Anonymous Coward · 2003-08-17 22:20 · Score: 0

ah ah me maties ye have been trolled

Re:Prefer multiple benchmarks, or your own 'proble by DancingSword · 2003-08-17 22:25 · Score: 1

Here's a nice example of how benchmarking can give non-applicable "information":

Say one's got a system that acts as a file-server, NAS, or something, and one backs-it-up using DVD-RW's, and one's victi^H^H^H^H^H users complain about the system being intermittently hammered whenever you're doing the backup. Calling-up some program to tell you how many context-switches are happening would show you that when you've got the DVD-RW loopbacked, and are diffing ( niced to 19 ) the ISO with the DVD-RW, you're sustaining more than 5000 context-switches / second...

Benchmarks don't usually context-switch between multiple programs that way ( or at least I've never 'eard of such ), so one's OWN benchmarks HAVE to include all the strange things one's own tools do, AND one has to actually check what one's tools actually do, to know what to check for/with...
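For checking what one's own tools actually do, the context-switch rate is easy to sample on Linux, where /proc/stat carries a cumulative "ctxt" counter. A minimal sketch follows; the function names are made up, and it assumes a Linux box with /proc mounted.

```python
import os
import time


def parse_ctxt(stat_text):
    """Extract the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def context_switches_per_second(interval_s=1.0, path="/proc/stat"):
    """Sample the counter twice and return the system-wide rate."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return (after - before) / interval_s


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(f"{context_switches_per_second(0.5):.0f} context switches/s")
```

Run it once while the box is idle and once during the backup, and the difference is the number to put into one's own benchmark.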

( good rule, that: Check What Is, Rather Than What Is "Known". )

A dual-CPU'd be better for this case, obvaneously, but ..
.. as for why I said DVD-RW rather than DVD-R? organic-dyes die MUCH quicker than eutectic metal's crystallization-pattern, so DVD-R's I consider less long-term reliable ( organic-dye ) than the -RW +RW type discs, which record the information in the crystallization ( annealing is blanking, I gather ) of the metal-layer. More expensive, sure, but if it isn't going to rot on me data...

I'd dearly like to see a benchmark-suite that tested each "corner" condition it could, gave one a chart showing systematically the results, gave one a comparison-graph comparing the current system to the best/worst in comparable-systems for that test, and encouraged clear knowledge of what the corner-cases are, as well as the balances/interrelationships...
... instead of these damn "benchmarks" that show how well a system will perform .. synthetic-benchmark-99, which isn't particularly useful. iometer-runs, bonnie++, for streaming-media I suppose hdparm -t, diffing a pair of ISO images ( does the context-switching happen only when one is a loopbacked ISO and the other's on a SCSI device? hmm.. ), discovering what the actual hdparm and smartctl settings are on that drive ( just because one told a setting to be set doesn't mean it actually got set, eh? )... systematicness & rigour..
Happy Happy Joy Joy!!

--
Messages to/for me ( in me journal )

95 comments