Slashdot Mirror


Benchmark Program Rewritten to Favor Intel?

BrookHarty writes "Interesting article over at Van's Hardware, that BAPCo the maker of the SysMark benchmarking program, has re-written its SysMark 2002 benchmark program in favor of Intels P4. AMD joined BAPCo in order to "correct" these "broken" results. AMD reports that BAPCo's SysMark 2002 (written by Intel Engineers) is a collection of tasks to summarize "Real World" performance. Interestingly, these tasks are selected for Intel's favored performance, while removing certain tasks that favor AMD. Vans Hardware has additional information on BAPCo's Shady history."

1 of 228 comments (clear)

  1. Re:Big deal by Sivar · · Score: 5, Informative

    Obviously you flunked your freshman-level computer architecture course. The P4 8K L1's 2-cycle load-use latency is 50% better than Athlon 128k L1's 3-cycle load-use latency (not even accounting for P4's clock speed advantage).Obviously you are imagining things, as I never said that was not the case. Latency is important, but it doesn't matter if the cache size isn't large enough to fit enough code in to enjoy the low latency.
    The difference in hit rate between 8k and 128k is only about 5% meaning that it is substantially faster to go with the small/fast cache than the big/slow cache.
    Really? That's interesting, and here's me wondering why both AMD and, other than in the P4, Intel have wasted so much money adding more cache memory.

    Because you seem to be such an expert, so why don't you go ahead and list a few common programs for me that have a working set of less than 8K--the size that will fit into the tiny L1 cache. Can't find any? Gee, I guess that makes the size of the cache pretty important then. When a program's working set has to be swapped in and out between L1 and L2 cache, suddenly that latency doesn't much matter. Of course, you may feel free to prove to me that the P4 can run addition loops faster. Those will fit into about 8k.

    Do the math - even an infinitely large 3-cycle load-use cache is slower than an 8k 2-cycle load-use cache.
    Who was it again flunked their freshman computer architecture course? You're saying that if the Athlon had 512MB of L1 cache that the system would be slower than the P4 and it's 8K of lower latency cache?
    What math is it that I should do? Do you know what the working set of a program is?
    Having a tiny amount of cache is analogous to having a tiny amount of RAM. Put 32MB of low-latency RAM in your system. Overclock some DDR SDRAM to 200MHz (AKA "400MHz" by people that don't understand clock speeds) and set it to CAS2. Tell me how your system performs. Just as your system will have to swap just about all running code to disk, the Pentium IV will not be able to contain the core loops of the various running programs in L1 cache. The vast majority will have to be dropped to L2, which is significantly slower and higher latency, kinda defeating the purpose of that 8k of fast memory, no?
    Working sets that cannot be fit into the P4's 256k or 512k or L2 will then be relegated to main memory and moved to L2 then L1 when the data is executed, and anything that won't fit in main memory (very rarely which includes the working set of a program) will be swapped to disk if the platform supports virtualizing memory.

    In closing, your comment was surprisingly brash and conceited, not to mention rude and totally innacurate. Thankyou.

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra