Proposal For Open-Source Benchmarks
nd writes: "Van Smith from Tom's Hardware has written a proposal that calls for open source benchmarking. He talks about the need for increasing the objectivity of benchmarking. The proposal is basically to develop a suite of open-source benchmarking tools and new methodologies. It's a rather dramatic column, as he discusses Transmeta and bias towards Intel, among other things." Well, once you get through the initial umpteen pages of preamble, the generically named A Modest Proposal is the actual point. Interesting idea - but I shall weep for the passing of bogo-MIPs as the definitive measure of system performance. *grin*
Tom: "Open Source Babble Transmeta Crusoe Linux Ramble Internet Cyber-World Paradigm Revolution"
Slashdot Multitudes: Yay! (clapclapclapclap)
Jon Katz: "Open Source Babble Transmeta Crusoe Linux Ramble Internet Cyber-World Paradigm Revolution"
Slashdot Multitudes: Windbag! Parasite! Media Whore!(boooo, hissssss)
In an industry where hard disk capacities are still measured in 1,000,000 bytes per megabyte, and 19" monitors are still 17.9" viewable, what makes you think that any company would adopt a benchmarking standard that was actually impartial to their product? The whole point of benchmarking your own product is to give the marketing department something to crow about. So, logically, they gear their hardware (and choose their benchmarks) accordingly.
Sure, it's a great thing for the rest of us, because we don't have anything we're trying to sell. Just don't expect anyone on the outside to hop on the bandwagon.
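For a sense of scale on the megabyte discrepancy mentioned above, here is a quick sketch of the arithmetic. The 20,000 MB drive size is only an illustrative figure and does not come from the post.

```python
DECIMAL_MB = 1_000_000  # bytes per "megabyte" as printed on drive labels
BINARY_MB = 1 << 20     # 1,048,576 bytes per megabyte as most OSes count it


def binary_megabytes(advertised_mb):
    """Convert advertised (decimal) megabytes to binary megabytes."""
    return advertised_mb * DECIMAL_MB / BINARY_MB


def shortfall_percent():
    """Percentage by which a decimal megabyte falls short of a binary one."""
    return (1 - DECIMAL_MB / BINARY_MB) * 100


if __name__ == "__main__":
    advertised = 20_000  # hypothetical drive size, in advertised MB
    print(f"{advertised} advertised MB = "
          f"{binary_megabytes(advertised):.0f} binary MB")
    print(f"shortfall: {shortfall_percent():.2f}%")
```

The gap is a bit under 5% at the megabyte level, and it widens with each larger unit prefix.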
Yours In Science,
Bowie J. Poag
Project Founder, PROPAGANDA For Linux (http://metalab.unc.edu/propaganda)
The Good:
There already is a great benchmark for processors, and it's called SPEC. Yes, it's not open source, but it's really quite reliable for comparing CPUs of any architecture. As Slashdot user "cweber" pointed out in his post, they have been doing this for 11 years, and they periodically revise their benchmark suite to stress CPUs more uniformly.
The open-source method. This is really good to ensure that there are no cheaters at the benchmark level.
Tom's interesting ideas on Crusoe. This stems from the fact that SPECmarks don't quite approximate the real usage patterns that Crusoe depends on for its hotspot optimizations. However, we are interested in the raw sustained speed of the processor (in this case), not the speed of the OS or its task swap latency. Tough problems to solve.
Open-source means that the benchmark code will be able to take advantage of the best compiler available for the target CPU (see comment at end).
The Bad:
Anyone who has done benchmarks knows that even small variations in system config can have strange or harmful effects on the benchmark results. This open-source effort is going to need a database of hardware configs in order to be useful.
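As a rough sketch of what such a config database might look like: each result is stored together with the full configuration it was measured on, and scores are only grouped for comparison when everything except the component under test is identical. All names here (`SystemConfig`, `Result`, `ResultStore`, and the choice of fields) are hypothetical and are not taken from the proposal.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SystemConfig:
    # Hypothetical field list; a real database would record far more.
    cpu: str
    ram_mb: int
    os: str
    compiler: str
    compiler_flags: str


@dataclass(frozen=True)
class Result:
    benchmark: str
    config: SystemConfig
    score: float


class ResultStore:
    """In-memory stand-in for a benchmark results database."""

    def __init__(self):
        self._results = []

    def add(self, result):
        self._results.append(result)

    def comparable(self, benchmark, vary="cpu"):
        """Group results for one benchmark so that, inside each group,
        only the `vary` field of the config differs."""
        groups = defaultdict(list)
        for r in self._results:
            if r.benchmark != benchmark:
                continue
            # Key on every config field except the one under test.
            fixed = tuple(sorted(
                (k, v) for k, v in asdict(r.config).items() if k != vary
            ))
            groups[fixed].append(r)
        # A group with a single entry has nothing to be compared against.
        return [g for g in groups.values() if len(g) > 1]
```

With this shape, two submissions that differ in RAM size or compiler flags never land in the same CPU comparison, which is the kind of stray variation the point above warns about.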
The Ugly:
Vendors are going to oppose this (or at least not support it). Why? Because plain and simple they have an interest in promoting the most favorable statistics possible about their products. They want to keep feeding you "polygon fill rates" and "texels per second" because their card may not stand up in a direct test program comparison. Plus, they are just dying to convince you that they have new BogusMarketingAcronym (tm) technology and their competitor does not. Never mind that SSE and 3DNow! do pretty much the same thing -- companies have an interest in differentiating themselves as much as possible.
If this benchmark actually takes off (and gets widely accepted), we might get cheaters at the firmware or hardware level. This has happened before -- although which company it was and which benchmark they cheated on I can't remember. I can't find it on the net or remember to save my life (sigh)...
I also need to say something to the people who think a processor should be judged independently of a compiler. This is just plain dumb. Why? Because a processor and its compiler are a team. You can't use one without the other. When a chip is designed, there is a direct information dependence between the chip architects and the compiler writers. They are designed as a pair (ideally), and they should be tested as such. If a given compiler has great optimizations, then great! That means the compiler understands its target real well. It is a win for both the CPU and the compiler for pulling it off. This compiler is going to do the same kinds of optimizations when vendors use it to write programs, so that helps the comparison between benchmark code and apps.
However, I can see the need to compare not only the best compiler, but GCC as well, because of its broad acceptance. But if you are serious about performance, and want to get every ounce of juice out of your chip, you use the vendor-provided compilers, not GCC. Don't get me wrong, GCC is great for compliance and portability, but it usually doesn't compare well with vendor compilers for generated code speed (with the possible exception of IA-32).
Ars Technica also published, a while back, some good information regarding CPU benchmarks. Check it out if you are interested in SPEC or CPU benchmarks in general.
The opinions I post here have nothing to do with my employer.
I suggest basing an open-source benchmark suite on the existing SPEC benchmarks, as most of the code (or functionally equivalent code) is relatively freely available. Of the 12 SPECint 2000 benchmarks, 5 (gzip, gcc, crafty, perlbmk, and bzip2) already exist as open-source programs. The combinatorial optimization (181.mcf) benchmark's code is also on the Internet at www.zib.de, free for academic use. I'm sure someone could make a cleanroom interpretation of something similar. 175.vpr (a place and route program) can be found at http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html. 197.parser is essentially a CS student's problem about parsing and extracting strings. 252.eon is a raytracer (we can use POVRay instead). 254.gap is a general purpose math library (Victor Shoup's NTL library exercises most of the same functions). 255.vortex is a standard RDBMS; MySQL or an equivalent could be used here. 300.twolf seems rather similar to 175.vpr; as circuit design is really far removed from my field, I'll leave this to someone else.