Benchmark Program Rewritten to Favor Intel?



Posted by ryuzaki0 on Saturday August 24, 2002 @02:02AM from the if-all-else-fails-change-the-requirements dept.

BrookHarty writes "Interesting article over at Van's Hardware: BAPCo, the maker of the SysMark benchmarking program, has rewritten its SysMark 2002 benchmark in favor of Intel's P4. AMD joined BAPCo in order to "correct" these "broken" results. AMD reports that BAPCo's SysMark 2002 (written by Intel engineers) is a collection of tasks meant to summarize "Real World" performance. Interestingly, these tasks were selected to favor Intel's performance, while certain tasks that favor AMD were removed. Van's Hardware has additional information on BAPCo's shady history."

12 of 228 comments

should be open. by GoatPigSheep · 2002-08-24 02:12 · Score: 5, Insightful

Obviously, the best bet for CPU benchmarks would be an open-source one compiled using a standard compiler. This is a case where open source really shines.

--
GoatPigSheep, the 3 most important food groups
1. Re:should be open. by GoatPigSheep · 2002-08-24 02:36 · Score: 4, Interesting
  
  Well, in this case the comparison is between two x86 CPUs, the Athlon and the Pentium 4. Both support standard x86 instructions. If you want to measure how fast the CPU is, you would want the program to be unoptimized. Perhaps SSE would be fine, since both CPUs support it.
  
  Using optimizations wouldn't be fair unless you had a good idea of the percentage of programs that ARE optimized for one or both CPUs. Many new programs are optimized for both CPUs, such as Cubase SX, a software studio program. I suppose you could use one of those programs as a benchmark in addition to the raw, unoptimized open-source one, so you can get an idea of how well the CPU performs with or without its appropriate optimizations. Also, it makes a difference whether there is a free version of the optimized compiler, because if there isn't, there is a higher chance that programs made by individuals at home (who can't afford a $500 compiler) would not be optimized.
  
  --
  GoatPigSheep, the 3 most important food groups
Re:Big deal by ergo98 · 2002-08-24 02:27 · Score: 5, Insightful

Intel has used the "once compilers catch up..." scam for years, and every time, people find themselves with a long-obsolete processor by the time the software that theoretically exploits it arrives.

My general practice is to ignore synthetic benchmarks because they have no real-world value whatsoever. Instead I look to application benchmarks, like compressing DivX movies or rendering 3D scenes, if that is what I plan to use my PC for.
Re:Partially AMD's Fault by Ninja+Programmer · 2002-08-24 02:47 · Score: 5, Insightful

BAPCo's headquarters are on the Intel campus. It's been Intel-biased from day 1 (back when AMD was making K5s and thinking about making K6s), and AMD has known this.

The fact is, prior to the release of the Athlon, nearly all benchmarks were biased towards Intel. AMD's strategy when they released the Athlon was to make a CPU so good it could beat Intel's CPUs even on these benchmarks. SysMark just happens to be the one benchmark where Intel exercises so much control that it could literally say whatever Intel wanted it to say.

What you are seeing is AMD just starting to switch strategies from "let's just beat them on every benchmark under the sun regardless of bias" to "let's expose the bias where it is worst so people can know the truth".

This is all just preparation for the K8 launch, I think. If AMD can properly put SysMark results into perspective, maybe everything that is left will show what a monster the K8 is versus any Intel offering. It may also indicate that the K8 is not winning on SysMark in internal testing, or not by a sufficient margin.
Read the linked article by Anonymous Coward · 2002-08-24 02:47 · Score: 5, Insightful

> As compilers become tuned to exploit this, it's plausible that the Athlon's performance is going to lag quite a bit more than it already does. That there is some benchmark out there that is specifically designed to show off this strength of the P4 is no real surprise to anyone, is it?
That's not the complaint at all. Read the linked article. The complaint is that Sysmark 2002 has been systematically altered relative to Sysmark 2001 so as to favour the P4 over Athlon.
For example, the PhotoShop test in Sysmark 2001 had 13 filters, of which 8 run faster on the Athlon and 5 faster on P4. The Sysmark 2002 PhotoShop test has 6 filters, of which 3 are filters from Sysmark 2001 on which P4 wins and the other 3 are additions on which the P4 also wins. The 8 filters on which the Athlon does better have all been removed.
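A minimal Python sketch of the filter arithmetic above, built only from the win counts quoted in the comment (the individual filter names are not listed, so the lists are placeholders), shows how pruning tests flips a suite's verdict without any single filter's result changing:

```python
from collections import Counter

# Winner of each PhotoShop filter, built only from the counts quoted above;
# the individual filter names are not given, so these are placeholders.
sysmark_2001 = ["Athlon"] * 8 + ["P4"] * 5
kept_from_2001 = ["P4"] * 3   # 3 of the 5 P4-winning 2001 filters survived
added_in_2002 = ["P4"] * 3    # 3 new filters, all P4 wins
sysmark_2002 = kept_from_2001 + added_in_2002


def summarize(name, suite):
    """Count wins per CPU and the share of filters the P4 wins."""
    wins = Counter(suite)
    share = wins["P4"] / len(suite)
    return f"{name}: {dict(wins)}, P4 wins {share:.0%} of {len(suite)} filters"


print(summarize("SysMark 2001", sysmark_2001))
print(summarize("SysMark 2002", sysmark_2002))
```

This prints a P4 share of 38% of 13 filters for SysMark 2001 and 100% of 6 filters for SysMark 2002.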
There are several other examples in the article. Read the article.
BTW, an interesting point is that this whole thing is basically an AMD publication that AMD have chosen to proxy via Van's. Van is at least open about it. The AMD presentation containing all the information in that article is linked at the end and is available here.
Kyle @ HardOCP covered this yesterday by Cutriss · 2002-08-24 02:49 · Score: 5, Interesting

Here's Kyle's 4th Edition post from yesterday. Excerpts from Van's comments are quoted with ">".

VansHardware & AMD: There is a report on VansHardware this morning that examines the differences between BAPCo's SysMark 2001 and SysMark 2002. The report's basic theme is that SysMark 2002 is skewed toward making the Intel Pentium 4 results look better than the AMD CPU results could have looked. It shows examples of things that were changed in SysMark 2002 to cherry-pick areas in certain programs where the Pentium 4 excels. While the article might seem to be work done by VansHardware, there is something you need to know: all of the data shown in that article was put together by AMD, not VansHardware. Take note of this one statement in the article.

> However, AMD has been able to "pick the lock" on SysMark to gain a much keener understanding into the internal workings of these tests.

VansHardware is not the one with the "keener understanding"; AMD is.

The original PDF document from AMD is linked for download, so the fact that this data is not Van's is not exactly hidden either.

Their opening paragraphs also state this:

> At this moment we will pause from the long march through our benchmark results to revisit the significant issues regarding BAPCo's SysMark 2002 brought up by AMD during our recent meeting with representatives from that chipmaker.

> We must state up front that despite the condemning information divulged to us, the AMD spokesmen repeatedly expressed support and guarded optimism for the reformation of BAPCo.

The "significant issues" and "condemning information" shown were not harvested by VansHardware; all they actually do is interject a little bit of commentary.

AMD has verified to me this morning that all of the graphed and tabled data shown in the VansHardware report is data that has been mined by AMD. Does this make the data inaccurate? Of course not, but I am sure that it hardly shows both sides of the story. AMD is not going to supply VansHardware with information that makes Intel look good. To me, VansHardware represents nothing more than an AMD fansite that takes shots at Intel every chance it gets. I think they are far from what anyone could consider objective journalists and reporters. Their cut-and-paste job with AMD's data bears that out, in my opinion. Websites get fed information all the time; trust us, we know. It is our job to go back and verify data and claims in our labs on our own time, not to repost corporate data that can be considered far from objective. Independent sites in our hardware community should not be reposting PR spin in such a way as this. There is a fine line here, but I think this is stepping across it.

VansHardware does not exactly hide the fact that the data shown is not theirs but rather AMD's, but they certainly did not present it in an upfront manner so that the reader sees the information for exactly what it is: data released by the AMD PR machine.

I am a huge AMD fan but I just don't like big companies being able to pump their corporate data into our community when it is not presented as such. I think AMD should have the balls to post information like this on their own website and not try and "slip it in" through a back door. In fact, I would consider the information to be much more credible if it were posted on AMD's own website as AMD research.

I know Van has gotten upset recently with his past employer removing his name from articles he has written. It seems to me that Van has done little to deserve his name being on this article, and it should be shown as authored by AMD.

(ED NOTE: This refers to some allegedly plagiarised articles that Tom's Hardware published after removing Van's name from them.)

Also worthy of mentioning is that AMD is now fully working with BAPCo, which they have not done in the past. AMD has had the ability to work with BAPCo for a long time now to make sure their products get represented properly and we are certainly happy to finally see AMD join the party to give the boat a more even keel.

Lastly, another tidbit worth throwing into the mix is that Van Smith, owner of VansHardware, possibly either works for or is contracted to VIA as a CPU validation tester. We are working on a confirmation of this from VIA now. Do we need hardware websites that do work for the companies they end up reporting on? Just another thing to consider when objectivity is in question.

--
"Mod, mod, mod...and another troll bites the dust."
Pick what you consider for your benchmark by (H)elix1 · 2002-08-24 03:04 · Score: 5, Insightful

When I dig through reviews on the latest CPU and/or mainboard, I initially groaned at the increasing number of benchmarks folks would put out. It is more than just increasing click-through rates (well, maybe not for some, but...): it lets me see applications that I use. Synthetic benchmarks and politicians' promises garner the same level of trust from me.

Anyhow, I game and code, but I use games to judge where my cash goes. When the P4 came out, I saw it did a great job with Quake and I started to get excited about the CPU. Then I saw the benchmarks on the games I actually play (UT, CS, and a few others), and it was not black and white. After the ATI fiasco, Quake is up there with synthetic benchmarks, IMHO. As for Photoshop, you can pick which platform you want to 'win' by tuning the filters. Apple does it and their dual-processor box wipes out the competition; the others do it and the tables are turned.

There are great graphs out there that show benchmarks using different sizes of data. It's like comparing a small turbocharged engine to a larger normally aspirated one: at what RPM were you when you ran your test? BMW's M5 feels slower than an Audi S4 at the start, but get the RPMs up there and it is a different story. Even pickup trucks can beat a Ferrari if you tune the test to take advantage of a sweet spot.

I've done my homework, and my personal cluster is mostly AMD today. I still have one Celeron 566@800 as a CS server, but my workstation (an Intel Xeon box) was replaced by AMD MP chips. Secondary boxes are all XP chips, but they used to be PIIs and PIIIs when Cyrix and the K5 sucked. They run Oracle, WebLogic, LDAP, and other stuff quite well when I'm working, and one hard drive swap later I'm getting some solid fragging in on the same box. In another year or so, if Intel really holds the crown, the price is right, and my boxes are 'only fast enough for web browsing and email', I'll choose them.

--
+++ UGUCAUCGUAUUUCU
No, just nonsense. by fmaxwell · 2002-08-24 03:47 · Score: 5, Insightful

> If AMD would stick to making totally Intel compatible chips instead of trying to infuse their own personality, we wouldn't have this problem. Hint: my software shouldn't need to know it's running on an AMD chip.

This is so wrong on so many counts...

1. Intel's chips aren't "totally Intel compatible". The Pentium 4 contains instructions that were not present in the Pentium, P2, and P3. Why should your software have to "know it's running on a" Pentium 4 rather than a P3, P2, or Pentium? Hell, there was even a Pentium and a Pentium MMX (the latter adding the MMX instructions).

2. Intel tries every trick possible to patent their instructions to prevent people from implementing them. They do it with hardware, too. Remember when you could plug a K6-2 in place of an Intel Socket 7 CPU? Starting with Slot 1, Intel used patents to prevent others from making compatible CPUs, which is why AMD and Intel motherboards are now incompatible.

3. Why should AMD not provide useful processor extensions that improve on Intel's base instructions? That's what provides useful competition and makes the industry grow.

4. What interest do you have in seeing AMD in a constant catch-up mode? In your scenario, Intel gets an advantage every time they release new instructions -- that will take AMD months to implement in silicon. Do you own Intel stock?

5. Why doesn't Intel just stick to providing processors that are 'totally AMD compatible'?
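Point 1 is how software is normally meant to cope with differing instruction sets: test for the feature, not the vendor. A minimal Python sketch, assuming Linux-style `/proc/cpuinfo` text (the two sample strings are abbreviated, hypothetical excerpts, and the code-path names are just labels), picks a path from the advertised flags, so any chip that reports SSE gets the SSE path regardless of who made it:

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from Linux /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_code_path(flags):
    """Choose an implementation by advertised feature, never by vendor."""
    for feature in ("sse2", "sse", "mmx"):  # best available first
        if feature in flags:
            return feature
    return "generic x86"


# Abbreviated, hypothetical excerpts; real cpuinfo output lists many more flags.
athlon_xp = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu tsc mmx sse mmxext 3dnowext 3dnow\n"
pentium_4 = "vendor_id\t: GenuineIntel\nflags\t\t: fpu tsc mmx sse sse2\n"

for name, text in (("Athlon XP", athlon_xp), ("Pentium 4", pentium_4)):
    print(name, "->", pick_code_path(parse_cpu_flags(text)))
```

The `vendor_id` line is never consulted. On a real Linux box the text would come from reading `/proc/cpuinfo`; native code would query CPUID directly.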
Re:Big deal by Sivar · 2002-08-24 04:18 · Score: 5, Interesting

> Besides, AMD has always been the value chip company. You can't expect them to keep up with Intel forever.

AMD has had a processor architecture superior in design to Intel's since the K6 was released (though the K6 had mediocre FPU performance, the design was still more elegant; ask any x86 assembly programmer). The Athlon has given the P2, P3, AND P4 a run for their money, and early benchmarks of the Hammer seem to indicate that the expensive Itanium 2, which almost nobody actually uses, is going to be outrun as well.
The Pentium IV's really looong pipeline does allow the P4 to run at higher clock speeds, but the branch prediction you mentioned is instant death. Branch mispredictions happen VERY frequently in any CPU (note that the K6 had the most sophisticated branch prediction unit up until the "XP" series of Athlons), but with the Pentium IV, a single branch misprediction forces up to 20 full clock cycles of work to be discarded.
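The cost of a long pipeline can be put into rough numbers with the textbook estimate: extra cycles per instruction = branch frequency × misprediction rate × penalty. In this minimal Python sketch, the 20-cycle P4 penalty comes from the comment, while the branch frequency, the misprediction rate, and the 10-cycle short-pipeline penalty are illustrative assumptions, not measurements:

```python
def mispredict_stall_cpi(branch_fraction, mispredict_rate, penalty_cycles):
    """Extra cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


# Illustrative assumptions (not measured values):
BRANCH_FRACTION = 0.20  # roughly one instruction in five is a branch
MISPREDICT_RATE = 0.05  # 5% of branches mispredicted

for label, penalty in (("short pipeline (assumed 10 cycles)", 10),
                       ("P4-style pipeline (20 cycles)", 20)):
    extra = mispredict_stall_cpi(BRANCH_FRACTION, MISPREDICT_RATE, penalty)
    print(f"{label}: +{extra:.2f} cycles per instruction")
```

Under these assumptions the deeper pipeline loses twice as many cycles per instruction (0.20 versus 0.10), which its higher clock speed has to make up. A better predictor (a lower `MISPREDICT_RATE`) shrinks both figures proportionally.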
The Pentium IV has other questionable design decisions that hurt performance as well. It has 8K of L1 cache, the same amount found in the ancient 486 processor, whereas the Athlon has that amount squared and doubled (128K). Current P4s have more L2 cache, but L2 cache is less important and slower. (Note, though, that the P4's L2 cache is particularly fast for an L2 cache.)
The P4 has buffers that remember a series of decoded x86 instructions so that it does not have to decode them again (these are almost required because of the terribly long pipeline), but it doesn't have enough of them to speed things up in server environments. Most servers execute a wide variety of instructions, so the buffered instructions get very little use before being replaced by new ones. This is even more of a problem on systems that run many different applications at once, but it can be demonstrated with DB servers alone (which use plenty of instructions): the P4 tends not to scale as well as the Athlon MP when a second or third task is added (such as mail serving, web serving, etc.).
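The thrashing effect described above can be illustrated with a toy least-recently-used (LRU) buffer. This is a deliberately simplified model, not the P4's actual trace cache. The capacity of 1000 "blocks" and the 600-block loops standing in for a database server and a mail server are hypothetical numbers, chosen only to show what happens when a second task pushes the combined hot code past the buffer's size:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity buffer with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, key):
        self.accesses += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
        else:
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


def run(tasks, capacity, passes):
    """Let each task take turns running one pass over its hot code."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for task in tasks:
            for block in task:
                cache.access(block)
    return cache.hit_rate()


CAPACITY = 1000  # hypothetical number of decoded blocks the buffer holds
db = [("db", i) for i in range(600)]      # hypothetical DB-server hot loop
mail = [("mail", i) for i in range(600)]  # hypothetical mail-server hot loop

print("DB alone:", run([db], CAPACITY, passes=10))
print("DB + mail:", run([db, mail], CAPACITY, passes=10))
```

With one task the loop fits, and after the cold first pass every access hits (0.9 overall across ten passes). With two tasks taking turns, each block is evicted just before it is needed again, so the hit rate collapses to 0.0. Real workloads are rarely this pathological, but the direction is the same.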

One disappointment I had with the Athlon is that AMD never used the excellent EV6 bus to its fullest. Athlons are superior in multiprocessor capabilities because different processors needn't share access to the memory bus. On Intel SMP setups, even on P4 Xeons (which, IMO, are inferior to P3 Tualatin chips from the same company), when one CPU accesses main memory, it locks main memory for the other CPUs. All other CPUs have to sit and twiddle their transistors while main memory is in use by only one CPU.
On AMD SMP setups, ALL processors can access memory simultaneously, merely sharing the bandwidth. So, if one CPU is only using 100MB/s of memory bandwidth, the rest can be used by other CPUs at that time.
Unfortunately, this doesn't really matter much with only two CPUs, which is the largest AMD configuration you can get. You can, of course, see it in action with 8+ CPUs on EV6 Alpha setups (AMD licensed the bus from DEC's Alpha team) but Alpha setups are expensive as hell and are a dying breed.
If AMD had created a quad or 8-way setup, we would see the true power of a good design.
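The "sharing the bandwidth" behaviour described above can be modelled, in idealised form, as max-min fair allocation (water-filling): CPUs that need less than an equal share get what they ask for, and the unused remainder is re-divided among the others. This is a hypothetical sketch, not a description of how the EV6 bus or any AMD chipset actually arbitrates. The 2100 MB/s total (roughly one PC2100 DDR channel) and the per-CPU demands for a hypothetical quad setup are assumed numbers:

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among requesters using max-min fairness (water-filling).

    Requesters that need less than an equal share get exactly their demand;
    whatever they leave unused is re-divided among the remaining requesters.
    """
    allocation = [0.0] * len(demands)
    pending = set(range(len(demands)))
    remaining = float(capacity)
    while pending:
        share = remaining / len(pending)
        satisfied = {i for i in pending if demands[i] <= share}
        if not satisfied:
            # Everyone left wants more than an equal share: split evenly.
            for i in pending:
                allocation[i] = share
            break
        for i in satisfied:
            allocation[i] = demands[i]
            remaining -= demands[i]
        pending -= satisfied
    return allocation


# Hypothetical quad setup: total bandwidth and per-CPU demands in MB/s.
print(max_min_fair_share(2100, [100, 500, 1500, 1500]))
```

Here the two light CPUs receive their 100 and 500 MB/s, and the two heavy ones split the remaining 1500 MB/s evenly at 750 MB/s each.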

Fortunately, the Hammer has an even better design (one made by AMD, no less) on an even better CPU. I fully expect the Hammer series to wipe the floor with all Xeons, and possibly the Itanium 2, because of its design: an integrated memory controller that should dramatically reduce memory latency; twice as many general-purpose registers of twice the size (much less pushing and popping, for those who know some assembly); and, unlike the big-vendor 64-bit processors, the ability to split half of the general-purpose registers into chunks of 16 and 32 bits when huge numbers (2^64) are not needed. (On an Alpha/SPARC/R12000, if you want to store the number "42" you must use all of a register that can hold values up to 18,446,744,073,709,551,615. A bit wasteful.)
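The register point can be made concrete: x86-64 lets code address the low 32, 16, and 8 bits of a 64-bit general-purpose register as EAX, AX, and AL (shown here for RAX). The Python function below only models those read views. It is an illustration, not an emulator, and on real hardware a write to EAX also clears the upper 32 bits of RAX:

```python
def subregister_views(value):
    """Model the read views x86-64 exposes for RAX: EAX, AX and AL."""
    value &= (1 << 64) - 1          # keep the value within 64 bits
    return {
        "rax": value,
        "eax": value & 0xFFFF_FFFF,  # low 32 bits
        "ax": value & 0xFFFF,        # low 16 bits
        "al": value & 0xFF,          # low 8 bits
    }


MAX_U64 = 2**64 - 1
print(f"{MAX_U64:,}")          # largest value a 64-bit register can hold
print((42).bit_length())       # bits actually needed to store 42
print(subregister_views(42))
```

This confirms the figure in the comment, 18,446,744,073,709,551,615, and shows that 42 needs only 6 bits, so it fits comfortably in even the 8-bit AL view.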

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
Re:Benchmarking the Benchmarks. by kigrwik · 2002-08-24 04:35 · Score: 5, Funny

> Benchmarking the various "Benchmarking Programs"

Yes but, "Quis benchmarkiet ipsos benchmarkiem ?":

Who benchmarks the benchmarks ?

(s/benchmark/custod/g and Google for the original quote :)
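For anyone without sed at hand, the same global substitution with Python's `re` module turns the mock Latin back into something close to Juvenal's line (the original reads "Quis custodiet ipsos custodes?"):

```python
import re

mock_latin = "Quis benchmarkiet ipsos benchmarkiem ?"
# Equivalent of sed's s/benchmark/custod/g: replace every occurrence.
print(re.sub(r"benchmark", "custod", mock_latin))
```

This prints `Quis custodiet ipsos custodiem ?`.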

--
-- don't discount flying pigs until you have good air defense
No. by FreeUser · 2002-08-24 04:35 · Score: 4, Insightful

> Wouldn't better CPU benchmarks be taken by using the chipmakers' own compilers?

No.

The chipmaker would simply then optimize their compiler for the benchmark(s) in question, rather than for code more generally. In other words, what you suggest would still allow the chipmaker to cheat.

In order to have complete transparency in the benchmarking, both the benchmarks and the compiler should be open source (ideally free software, so that anyone can run and verify the benchmarks as well, allowing repeatable experimentation in the broadest scientific sense). If the chip maker wishes to submit optimizations to such a compiler they would be free to do so, since any such optimizations would in turn be open source (or free software) and subject to peer review.

A good candidate would be gcc, which runs on numerous platforms, and on several operating systems on AMD and Intel hardware.

Cheating would be much harder in this case, perhaps even impossible, something we need given the sordid history of benchmarking by all parties involved (except perhaps AMD? Can anyone recall an instance where AMD has cooked results? I ask because their current chip rating system is extremely conservative ... almost the antithesis of what Intel is trying to do. Has this been a longstanding strategy on AMD's part?).
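A minimal sketch of the "anyone can run and verify it" idea, written in Python rather than as a gcc-compiled suite. The workload is published source, each run is timed with `time.perf_counter`, the median of several repeats is reported, and the build environment is recorded alongside the result. The workload function and the repeat count of 5 are hypothetical placeholders:

```python
import platform
import statistics
import time


def run_benchmark(workload, repeats=5):
    """Time `workload` `repeats` times; return the median and all raw timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), timings


def sample_workload():
    # Hypothetical stand-in; a real open suite would publish every workload.
    return sum(i * i for i in range(200_000))


median, runs = run_benchmark(sample_workload)
report = {
    "median_s": median,
    "runs_s": runs,
    # Record what built and ran the benchmark so others can reproduce it.
    "compiler": platform.python_compiler(),
    "machine": platform.machine(),
}
print(report)
```

Publishing the raw timings next to the median lets readers check the spread themselves instead of trusting a single headline number.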

--
The Future of Human Evolution: Autonomy
Re:Big deal by Sivar · 2002-08-24 05:24 · Score: 5, Informative

> Obviously you flunked your freshman-level computer architecture course. The P4 8K L1's 2-cycle load-use latency is 50% better than Athlon 128k L1's 3-cycle load-use latency (not even accounting for P4's clock speed advantage).
Obviously you are imagining things, as I never said otherwise. Latency is important, but it doesn't matter if the cache isn't large enough to hold the code that would enjoy the low latency.
> The difference in hit rate between 8k and 128k is only about 5% meaning that it is substantially faster to go with the small/fast cache than the big/slow cache.
Really? That's interesting, and here's me wondering why both AMD and, other than in the P4, Intel have wasted so much money adding more cache memory.

Since you seem to be such an expert, why don't you go ahead and list a few common programs for me that have a working set of less than 8K, the size that will fit into the tiny L1 cache? Can't find any? Gee, I guess that makes the size of the cache pretty important then. When a program's working set has to be swapped in and out between L1 and L2 cache, suddenly that latency doesn't matter much. Of course, you may feel free to prove to me that the P4 can run addition loops faster. Those will fit into about 8K.

> Do the math - even an infinitely large 3-cycle load-use cache is slower than an 8k 2-cycle load-use cache.
Who was it, again, that flunked their freshman computer architecture course? You're saying that if the Athlon had 512MB of L1 cache, the system would be slower than the P4 with its 8K of lower-latency cache?
What math is it that I should do? Do you know what the working set of a program is?
Having a tiny amount of cache is analogous to having a tiny amount of RAM. Put 32MB of low-latency RAM in your system. Overclock some DDR SDRAM to 200MHz (AKA "400MHz" by people that don't understand clock speeds) and set it to CAS2. Tell me how your system performs. Just as your system will have to swap just about all running code to disk, the Pentium IV will not be able to contain the core loops of the various running programs in L1 cache. The vast majority will have to be dropped to L2, which is significantly slower and higher latency, kinda defeating the purpose of that 8k of fast memory, no?
Working sets that cannot fit into the P4's 256K or 512K of L2 will then be relegated to main memory and moved to L2 and then L1 when the data is executed, and anything that won't fit in main memory (which very rarely includes the working set of a program) will be swapped to disk if the platform supports virtual memory.
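The disagreement can be put into numbers with the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. This minimal Python sketch takes the 2- and 3-cycle latencies and the "about 5%" hit-rate difference from the quoted comment, and treats the L2 miss penalty and the larger cache's own miss rate as assumptions. Setting the two AMATs equal, the shared miss rate cancels, so the small cache stays ahead only while the L2 penalty is below 1 / 0.05 = 20 cycles. By the same formula, an "infinitely large" 3-cycle cache (no misses) loses to the 8K cache only when the 8K cache's miss rate times the penalty stays under one cycle:

```python
def amat(hit_cycles, miss_rate, miss_penalty):
    """Average memory access time (cycles) for one cache level."""
    return hit_cycles + miss_rate * miss_penalty


def breakeven_penalty(small_hit, big_hit, extra_miss_rate):
    """Miss penalty at which a small, faster cache stops beating a bigger one.

    Solves small_hit + (m + d) * P == big_hit + m * P for P; the shared
    miss rate m cancels, leaving P = (big_hit - small_hit) / d.
    """
    return (big_hit - small_hit) / extra_miss_rate


SMALL_HIT, BIG_HIT = 2, 3  # load-use latencies quoted in the thread
EXTRA_MISS = 0.05          # "about 5%" more misses in the 8K cache
BIG_MISS = 0.05            # assumed miss rate of the 128K cache (hypothetical)

print("break-even L2 penalty:", breakeven_penalty(SMALL_HIT, BIG_HIT, EXTRA_MISS))
for penalty in (10, 30):   # assumed L2 penalties, in cycles
    small = amat(SMALL_HIT, BIG_MISS + EXTRA_MISS, penalty)
    big = amat(BIG_HIT, BIG_MISS, penalty)
    print(f"penalty {penalty}: 8K {small:.2f} vs 128K {big:.2f} cycles")
```

With these assumed numbers the 8K cache is ahead at a 10-cycle L2 penalty (3.00 versus 3.50 cycles) and behind at 30 (5.00 versus 4.50). Working sets that push the small cache's extra miss rate higher lower the break-even penalty, which favours the larger cache.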

In closing, your comment was surprisingly brash and conceited, not to mention rude and totally inaccurate. Thank you.

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra