Proposal For Open-Source Benchmarks

Posted by Hemos on Friday April 14, 2000 @06:28AM from the but-what-about-winmarks dept.

nd writes: "Van Smith from Tom's Hardware has written a proposal that calls for open source benchmarking. He talks about the need for increasing the objectivity of benchmarking. The proposal is basically to develop a suite of open-source benchmarking tools and new methodologies. It's a rather dramatic column, as he discusses Transmeta, bias towards Intel, among other things." Well, once you get through the initial umpteen pages of preamble, the generically named A Modest Proposal is the actual point. Interesting idea - but I shall weep for the passing of bogo-MIPs as the definitive measure of system performance. *grin*

"Open source" ideology by Anonymous Coward · 2000-04-14 00:54 · Score: 3

You know I'm getting somewhat sick of the whole open source thing. At first I thought it was a Good Thing, a way to allow people to collaborate on code and to keep it from being stolen. But gradually I am becoming more and more cynical about it - not so much the concept, but more the zealotry that surrounds it.
Just look at the title of the article linked in this story - "A Call to Arms - A Proposal for Open-Source Benchmarks". WTF? Why is this a call to arms? Isn't this just a bit rabid for what is, after all, just an article about benchmarks? Benchmarks may be important, but they're not worth getting worked up over.
And then the first page of the article is a rambling piece of tabloid "cyber"-journalism far worse than even Jon Katz has ever managed. Why is this diatribe necessary? Surely we all know what open-source is, and we all realise that the net has changed a lot of things. No, it's the same thing I see again and again - the zealotry of the open source proponent who feels the need for grand rhetoric and buzzword-filled arguments.
There is an ideology behind open source, and a good one, but it has been taken too far. Richard Stallman is not the best person to represent such a diverse group of people - his radical politics and hatred of commercialism make him quick with the denunciation of anything he disagrees with, like the name Linux - after all, he'd rather it was "GNU/Linux" or even worse "Lignux". This kind of ideological zeal is certainly putting me off the idea, and others I'm sure too, but there seems to be a never-ending parade of people willing to subscribe to his beliefs and zealotry.
Anyway, what I'd like to see is a return to what open source is about - writing good, free code for the use of all. There's no need for flaming attacks on closed-source software or whatever - that shouldn't be the point of open source, and is just a waste of time better spent coding. Unfortunately /. seems to provoke this kind of hysteria, but even with this I'll still read it :)
If you disagree, feel free to reply. Nicely :)
Funny... by Anonymous Coward · 2000-04-14 00:55 · Score: 4

Tom: "Open Source Babble Transmeta Crusoe Linux Ramble Internet Cyber-World Paradigm Revolution"
Slashdot Multitudes: Yay! (clapclapclapclap)

Jon Katz: "Open Source Babble Transmeta Crusoe Linux Ramble Internet Cyber-World Paradigm Revolution"
Slashdot Multitudes: Windbag! Parasite! Media Whore!(boooo, hissssss)
1. Re:Funny... by bgarcia · 2000-04-14 02:44 · Score: 4
  
  He has a point about... wait a sec... Jon, is that you?
  
  --
  I'm a leaf on the wind. Watch how I soar.
Read the HOWTO by pb · 2000-04-14 04:02 · Score: 3

Anyone remember the Benchmarking HOWTO?

There are *lots* of open-source benchmarks, and of course we can make new and better ones, and get a test suite together.

For starters, the LBT (Linux Benchmarking Toolkit):
Run the BYTEmarks (and the old UNIX ones too, they're funny), Whetstone, XBench... oh, and compile a stock kernel (and don't fiddle with the options, 2.0.0 was recommended then.)

Personally, I'd also suggest bonnie, it's a good benchmark for disk performance, but you'd have to have a range of options here. (testing disk performance and cache, so you'd really want a large number here too, just to be fair. 2*RAM?)

Also, when RedHat boots up, it has those RAID checksumming tests, those are good. They test different implementations of the same algorithm, so they say a lot about the individual chip. (whether it likes MMX, works well with different optimizations, and whatnot)
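
The approach described above, timing several implementations of the same routine and keeping the fastest, can be sketched roughly as follows. This is an illustration only, not the kernel's actual checksumming code: the two XOR routines, the `pick_fastest` helper, and the input size are all hypothetical stand-ins.

```python
import time
from functools import reduce
from operator import xor


def xor_loop(blocks):
    """XOR all values together with an explicit loop."""
    acc = 0
    for b in blocks:
        acc ^= b
    return acc


def xor_reduce(blocks):
    """Same result as xor_loop, computed with functools.reduce."""
    return reduce(xor, blocks, 0)


def pick_fastest(candidates, data, repeats=5):
    """Time each candidate on the same data; return (winner, timings).

    The best of several runs is kept for each candidate to reduce
    the effect of scheduling noise.
    """
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings


# Hypothetical input: 100,000 integers standing in for data blocks.
data = list(range(100_000))
candidates = {"loop": xor_loop, "reduce": xor_reduce}
winner, timings = pick_fastest(candidates, data)
print(winner, timings)
```

Which candidate wins depends on the machine and interpreter, which is exactly why such a test says something about the individual chip.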

--
pb Reply or e-mail; don't vaguely moderate.
Re:Uhhh....Yeah, but who will use it? by IntlHarvester · 2000-04-14 06:08 · Score: 3

Back in the old days, Cadillac shipped cars with 472 and 500 cubic inch engines (about 8 liters in modern terms). These things put out nearly 400 HP and buttloads of torque. With the exception of some muscle cars and the Corvette, Cadillacs were the fastest cars GM built.

But, nowhere in their advertising did they mention the size of the engine or the amount of power or anything about "performance". Back in those days everyone just knew Cadillacs had plenty of power. I suspect it's the same with IBM and their mainframes - just too much reputation to even advertise.
--
Business. Numbers. Money. People. Computer World.
Re: Tom's Hardware by BrianH · 2000-04-14 09:12 · Score: 3

If a benchmark could be written that would accurately simulate real world applications, then I'd say let them optimize their hardware/drivers for it. If the benchmark is good enough, then any optimizations made for the benchmark should also cause a performance increase in your genuine applications. Of course, therein lies the trick. Can you make a benchmark that realistic?

--

There is nothing so pathetic as seeing a beautiful young theory roughed up by a tough gang of facts.
Uhhh....Yeah, but who will use it? by Bowie+J.+Poag · 2000-04-14 00:39 · Score: 5

In an industry where hard disk capacities are still measured in 1,000,000 bytes per megabyte, and 19" monitors are still 17.9" viewable, what makes you think that any company would adopt a benchmarking standard that was actually impartial to their product? The whole point of benchmarking your own product is to give the marketing department something to crow about. So, logically, they gear their hardware (and choose their benchmarks) accordingly.
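
The size of the hard disk discrepancy is easy to work out. The sketch below assumes a hypothetical drive advertised at 20,000 MB (a made-up figure for illustration) and compares the decimal megabyte of 1,000,000 bytes with the binary one of 1,048,576 bytes.

```python
BYTES_PER_DECIMAL_MB = 1_000_000
BYTES_PER_BINARY_MB = 1024 * 1024  # 1,048,576


def advertised_vs_binary(advertised_mb):
    """Return (binary megabytes, percent shortfall) for an advertised size."""
    total_bytes = advertised_mb * BYTES_PER_DECIMAL_MB
    binary_mb = total_bytes / BYTES_PER_BINARY_MB
    shortfall_pct = (1 - binary_mb / advertised_mb) * 100
    return binary_mb, shortfall_pct


# Hypothetical drive advertised as 20,000 MB.
binary_mb, shortfall_pct = advertised_vs_binary(20_000)
print(f"{binary_mb:.1f} binary MB, {shortfall_pct:.2f}% less than advertised")
```

The shortfall is the same for any advertised size, a little over 4.6 percent.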

Sure, it's a great thing for the rest of us, because we don't have anything we're trying to sell. Just don't expect anyone on the outside to hop on the bandwagon.

Yours In Science,

Bowie J. Poag
Project Founder, PROPAGANDA For Linux (http://metalab.unc.edu/propaganda)

--
Bowie J. Poag
The Good, The Bad, and the Ugly... by Silverpike · 2000-04-14 02:43 · Score: 4

Ol' Tom has a good point. Sysmark really isn't the right solution for comparing processors. What he proposes is a realistic, achievable goal, but you have to define the playing field first.

The Good:

There already is a great benchmark for processors, and it's called SPEC. Yes, it's not open source, but it's really quite reliable for comparing CPUs of any architecture. As slashdot user "cweber" pointed out in his post, they have been doing this for 11 years, and they periodically revise their benchmark suite to stress CPUs more uniformly.

The open-source method. This is really good to ensure that there are no cheaters at the benchmark level.

Tom's interesting ideas on Crusoe. This stems from the fact that SPECmarks don't quite approximate real usage that Crusoe depends on to use its hotspot optimizations. However, we are interested in the raw sustained speed of the processor (in this case), not the speed of the OS or its task swap latency. Tough problems to solve.

Open-source means that the benchmark code will be able to take advantage of the best compiler available for the target CPU (see comment at end).

The Bad:

Anyone who has done benchmarks knows that even small variations in system config can have strange or harmful effects on the benchmark results. This open-source effort is going to have to have a database of hardware configs in order for this to be useful.
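
One way such a database of hardware configs could be keyed is sketched below. Everything here is an assumption for illustration: the field names, the `fingerprint` scheme, the `record` helper, and the sample values are hypothetical and not part of any existing proposal.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HardwareConfig:
    """Hypothetical set of fields that could affect a benchmark result."""
    cpu: str
    clock_mhz: int
    ram_mb: int
    motherboard: str
    os: str
    compiler: str
    compiler_flags: str

    def fingerprint(self):
        """Stable short ID: any change to any field yields a new ID."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


results = {}


def record(config, benchmark, score):
    """File a score under the exact configuration that produced it."""
    entry = results.setdefault(
        config.fingerprint(), {"config": asdict(config), "scores": {}}
    )
    entry["scores"][benchmark] = score


# Placeholder values, made up for illustration only.
baseline = HardwareConfig("ExampleCPU", 600, 128, "ExampleBoard",
                          "Linux 2.2", "gcc", "-O2")
tuned = HardwareConfig("ExampleCPU", 600, 128, "ExampleBoard",
                       "Linux 2.2", "gcc", "-O3")
record(baseline, "kernel_compile_s", 312.0)
record(tuned, "kernel_compile_s", 305.0)
```

Because the two sample configs differ only in compiler flags, they still land in separate entries, so results are never compared across setups that are not identical.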

The Ugly:

Vendors are going to oppose this (or at least not support it). Why? Because plain and simple they have an interest in promoting the most favorable statistics possible about their products. They want to keep feeding you "polygon fill rates" and "texels per second" because their card may not stand up in a direct test program comparison. Plus, they are just dying to convince you that they have new BogusMarketingAcronym (tm) technology and their competitor does not. Never mind that SSE and 3DNow do pretty much the same thing -- companies have an interest in differentiating themselves as much as possible.

If this benchmark actually takes off (and gets widely accepted), we might get cheaters at the firmware or hardware level. This has happened before -- although which company it was and which benchmark they cheated I can't remember. I can't find it on the net or remember to save my life (sigh)...

I also need to say something to the people who think a processor should be judged independently of a compiler. This is just plain dumb. Why? Because a processor and its compiler are a team. You can't use one without the other. When a chip is designed, there is a direct information dependence between the chip architects and the compiler writers. They are designed as a pair (ideally), and they should be tested as such. If a given compiler has great optimizations, then great! That means the compiler understands its target real well. It is a win for both the CPU and the compiler for pulling it off. This compiler is going to do the same kinds of optimizations when vendors use it to write programs, so that helps the comparison between benchmark code and apps.

However, I can see the need to compare not only the best compiler, but GCC as well, because of its broad acceptance. But if you are serious about performance, and want to get every ounce of juice out of your chip, you use the vendor provided compilers, not GCC. Don't get me wrong, GCC is great for compliance and portability, but it usually doesn't compare well with vendor compilers for generated code speed (with the possible exception of IA-32).

Ars Technica also published, a while back, some good information regarding CPU benchmarks. Check it out if you are interested in SPEC or CPU benchmarks in general.

--
The opinions I post here have nothing to do with my employer.
Benchmarks by nature are subjective by zerodvyd · 2000-04-14 00:47 · Score: 3

To be truly objective, the actual benchmark code should be written in a cross platform capacity. I question the reliability of benchmarking software in general, go ahead and call me a skeptic or whatnot...but I stand by that claim. What defines a benchmark? Is it not a measurement of the performance of one aspect of a system? Benchmarks should be open sourced, the community that uses the system(s) at large should define what the tests (torturous as they should be) actually test. That will determine the difference between fluff and actual fact.

...just as long as they keep the BogoMIPS around I'm okay with it :) lol

zerodvyd
I'll tell ya who. . . by xant · 2000-04-14 02:36 · Score: 3

Well . . . kind of the POINT of this whole exercise is to take the ability to perform referenceable benchmarks out of the hands of the interested parties (those who make money from them). Closed-source, commercial benchmarks are inherently flawed for some of the same reasons closed-source, commercial security is flawed. The difference is that those interested in finding and exploiting these flaws aren't crackers, but hardware companies.
So to answer your question: Tom's Hardware, and other reputable benchmarking authorities, would use it. TH has rapidly become one of the highest-integrity, best-respected hardware/computing sites around, even (indeed especially) for the Windows crowd. (After all, Win32 is still the dominant gaming platform.) If such a thing as open benching became popular, then commercial entities would be FORCED to use the open benchmarks or be accused of marketing skewed numbers, whether those accusations had merit or not.

--
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
Here are some suggestions... by Signail11 · 2000-04-14 01:29 · Score: 4

I suggest basing an open-source benchmark suite on the existing Spec benchmarks, as most of the code (or functionally equivalent code) is relatively freely available. Of the 12 SpecINT 2000 benchmarks, 5 (gzip, gcc, crafty, perlbmk, and bzip) already exist as open-source programs. The combinatorial optimization (181.mcf) benchmark's code is also on the Internet at www.zib.de, free for academic use. I'm sure someone could make a cleanroom interpretation of something similar. 175.vpr (a place and route program) can be found at http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html. 197.parser is essentially a CS student's problem about parsing and extracting strings. 252.eon is a raytracer (we can use POVRay instead). 254.gap is a general purpose math library (Victor Shoup's NTL library exercises most of the same functions). 255.vortex is a standard RDBMS; MySQL or an equivalent could be used here. 300.twolf seems rather similar to 175.vpr; as circuit designing is really far removed from my field, I'll leave this to someone else.
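
The inventory above can be written down as a small table to see how much of the suite is already covered. The sketch below only restates the suggestions in this comment; the status labels ("open", "available", "rewrite", "substitute", "unresolved") and the `by_status` helper are hypothetical names chosen for illustration.

```python
# Component -> (status, suggested source or replacement), per the comment above.
SUITE = {
    "gzip": ("open", "existing open-source program"),
    "gcc": ("open", "existing open-source program"),
    "crafty": ("open", "existing open-source program"),
    "perlbmk": ("open", "existing open-source program"),
    "bzip": ("open", "existing open-source program"),
    "181.mcf": ("rewrite", "cleanroom version; original is academic-use only"),
    "175.vpr": ("available", "downloadable from the URL given above"),
    "197.parser": ("rewrite", "string parsing exercise"),
    "252.eon": ("substitute", "POVRay"),
    "254.gap": ("substitute", "NTL"),
    "255.vortex": ("substitute", "MySQL or an equivalent"),
    "300.twolf": ("unresolved", None),
}


def by_status(status):
    """List the components carrying the given status label."""
    return [name for name, (s, _) in SUITE.items() if s == status]


for status in ("open", "available", "substitute", "rewrite", "unresolved"):
    print(f"{status:>10}: {', '.join(by_status(status))}")
```

By this count only 300.twolf is left without a suggested source or replacement.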