Slashdot Mirror


Benchmark Program Rewritten to Favor Intel?

BrookHarty writes "Interesting article over at Van's Hardware: BAPCo, the maker of the SysMark benchmarking program, has rewritten its SysMark 2002 benchmark in favor of Intel's P4. AMD joined BAPCo in order to "correct" these "broken" results. AMD reports that BAPCo's SysMark 2002 (written by Intel engineers) is a collection of tasks meant to summarize "Real World" performance. Interestingly, the tasks were selected to favor Intel's performance, while certain tasks that favor AMD were removed. Van's Hardware has additional information on BAPCo's shady history."

29 of 228 comments

  1. should be open. by GoatPigSheep · · Score: 5, Insightful

    Obviously, the best bet for CPU benchmarks would be an open-source one compiled using a standard compiler. This is a case where open source really shines.

    --
    GoatPigSheep, the 3 most important food groups
    1. Re:should be open. by GoatPigSheep · · Score: 4, Interesting

      Well, in this case the comparison is between two x86 CPUs, the Athlon and the Pentium 4. Both would support standard x86 instructions. If you want to measure how fast the CPU is, you would want the program to be unoptimized. Perhaps SSE would be fine since both CPUs support it.

      Using optimizations wouldn't be fair unless you had a good idea of the percentage of programs that ARE optimized for one or both CPUs. Many new programs are optimized for both CPUs, such as Cubase SX, a software studio program. I suppose you could use one of those programs as a benchmark in addition to the raw unoptimized open-source one so you can get an idea of how well the CPU performs with or without its appropriate optimizations. Also, it makes a difference whether there is a free version of the optimized compiler, because if there isn't, there is a higher chance that programs made by individuals at home (who can't afford a $500 compiler) would not be optimized.

      --
      GoatPigSheep, the 3 most important food groups
    2. Re:should be open. by Shadow99_1 · · Score: 3, Interesting

      This would seem to be where Van's COMPREHENSIVE OPEN SOURCE BENCHMARK INITIATIVE (COSBI) would be useful... You could always get in touch with Van about helping out the project...

      --
      we are all invisible unless we choose otherwise
  2. Re:Big deal by GigsVT · · Score: 3, Informative

    I don't think there is much motivation on the part of compiler writers to optimize for this particular implementation of the x86-32 ISA. This isn't like previous chips, where new cache handling opcodes were added, which compilers could use if available. I've talked to people much better versed in compiler writing than myself, and they all seem to agree, when it comes to "optimizing for P4", their answer is going to be "don't hold your breath".

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  3. Re:Big deal by ergo98 · · Score: 5, Insightful

    Intel has used the "once compilers catch up..." scam for years, and every time people find themselves with a long-obsolete processor by the time the software that theoretically exploits it arrives.

    My general practice is to ignore any synthetic benchmarks because they represent no real-world value whatsoever. Instead I look to application benchmarks, like compressing DivX movies or rendering 3D scenes, if that is the use I have planned for my PC.

  4. Not another benchmark... by mustprotectdata · · Score: 3, Informative

    Coming from the Unix world, I'm used to comparing machines based on their SPECint and SPECfp performance...

    In general the SPEC people have done a better job being platform agnostic than some of the "miscellaneous" PC benchmarks.

    Current benchmarks for Intel http://www.spec.org/osg/cpu2000/results/res2002q2/cpu2000-20020506-01357.html

    and AMD http://www.spec.org/osg/cpu2000/results/res2002q3/cpu2000-20020701-01441.html

    Keep in mind that results for more recent AMD CPUs are not shown. If you compare the AMD 2200 with a 2.2G P4 you'll have 734 vs. 784, which gives some credence to AMD's claimed rating.

    html4me!

    1. Re:Not another benchmark... by mustprotectdata · · Score: 3, Informative

      That's actually AMD = 764 vs. Intel = 784, so it's even closer than stated above, i.e. within 3%.
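
      As a rough check of that percentage, here is a minimal Python sketch. The two scores are the figures quoted in this thread; the function name is only an illustrative choice.

```python
def relative_gap(slower: float, faster: float) -> float:
    """Return how far `slower` trails `faster`, as a fraction of `faster`."""
    return (faster - slower) / faster


amd_score = 764    # Athlon figure quoted in this thread
intel_score = 784  # 2.2 GHz P4 figure quoted in this thread

gap = relative_gap(amd_score, intel_score)
print(f"AMD trails Intel by {gap:.1%}")
```

      Measured against the Intel score, the gap comes out a little above 2.5%, which is consistent with "within 3%".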

      Like anyone would be able to tell :-).

      And my poor little Sunblade 100 is only 174. No wonder Solaris seems slower than Linux.

      html 3

  5. Re:Partialy AMD's Fault by Ninja+Programmer · · Score: 5, Insightful

    BAPCo's headquarters are on the Intel campus. It's been Intel-biased from day 1 (back when AMD was making K5s and thinking about making K6s) and AMD has known this.

    The fact is, prior to the release of the Athlon, nearly all benchmarks were biased towards Intel. AMD's strategy when they released the Athlon was to make a CPU so good, it could beat Intel's CPUs even on these benchmarks. Sysmark just happens to be the one benchmark where Intel exercises so much control that it could literally say whatever Intel wanted it to say.

    What you are seeing is AMD just starting to switch strategies from "let's just beat them on every benchmark under the sun regardless of bias" to "let's expose the bias where it is at its worst so people can know the truth".

    This is all just preparation for the K8 launch I think. If AMD can properly put Sysmark results into perspective, maybe everything that is left will show what a monster K8 is versus any Intel offering. It is indicative that the K8 may not be winning on Sysmark on internal testing, or may not be winning by a sufficient margin.

  6. Read the linked article by Anonymous Coward · · Score: 5, Insightful

    As compilers become tuned to exploit this, it's plausible that the Athlon's performance is going to lag quite a bit more than it already does. That there is some benchmark out there that is specifically designed to show off this strength of the P4 is no real surprise to anyone, is it?

    That's not the complaint at all. Read the linked article. The complaint is that Sysmark 2002 has been systematically altered relative to Sysmark 2001 so as to favour the P4 over Athlon.

    For example, the Photoshop test in Sysmark 2001 had 13 filters, of which 8 run faster on the Athlon and 5 faster on P4. The Sysmark 2002 Photoshop test has 6 filters, of which 3 are filters from Sysmark 2001 on which P4 wins and the other 3 are additions on which the P4 also wins. The 8 filters on which the Athlon does better have all been removed.
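
    To make the shift concrete, here is a small Python tally of those filter counts. The counts are the ones reported above; the function and variable names are just illustrative.

```python
def p4_share(p4_wins: int, athlon_wins: int) -> float:
    """Fraction of filters in a suite on which the P4 is the faster chip."""
    return p4_wins / (p4_wins + athlon_wins)


# Sysmark 2001: 13 filters, 5 favor the P4 and 8 favor the Athlon.
share_2001 = p4_share(p4_wins=5, athlon_wins=8)

# Sysmark 2002: 3 carried-over P4 wins plus 3 new P4 wins, no Athlon wins.
share_2002 = p4_share(p4_wins=3 + 3, athlon_wins=0)

print(f"2001: P4 faster on {share_2001:.0%} of filters")
print(f"2002: P4 faster on {share_2002:.0%} of filters")
```

    This counts filters only. It says nothing about how large each win is or how the suite weights them.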

    There are several other examples in the article. Read the article.

    BTW, an interesting point is that this whole thing is basically an AMD publication that AMD have chosen to proxy via Van's. Van is at least open about it. The AMD presentation containing all the information in that article is linked at the end and is available here

  7. Kyle @ HardOCP covered this yesterday by Cutriss · · Score: 5, Interesting

    Here's Kyle's 4th Edition post from yesterday. Excerpts from Van's comments are in italics.

    VansHardware & AMD: There is a report on VansHardware this morning that visits the differences between BAPCo's SysMark 2001 and SysMark 2002. The report's basic theme is that SysMark 2002 is skewed towards making the Intel Pentium 4 results look better than the AMD CPU results could have looked. It basically shows examples of things that were changed in SysMark 2002 that cherry pick areas in certain programs that the Pentium 4 excels at. While the article might seem to be work done by VansHardware there is something you need to know. All of the data shown in that article has been put together by AMD and not VansHardware. Take note of this one statement in the article.

    However, AMD has been able to "pick the lock" on SysMark to gain a much keener understanding into the internal workings of these tests.

    VansHardware is not the one with the "keener understanding", AMD is.

    The original PDF document from AMD is linked for download so the fact that this data is not Van's is not exactly hidden either.

    Also their opening paragraphs state this.

    At this moment we will pause from the long march through our benchmark results to revisit the significant issues regarding BAPCo's SysMark 2002 brought up by AMD during our recent meeting with representatives from that chipmaker.

    We must state up front that despite the condemning information divulged to us, the AMD spokesmen repeatedly expressed support and guarded optimism for the reformation of BAPCo.


    The "significant issues" and "condemning information" shown were not harvested by VansHardware; all they actually do is interject a little bit of commentary.

    AMD has verified to me this morning that all of the graphed and tabled data shown on the VansHardware report is data that has been mined by AMD. Does this make the data inaccurate? Of course not, but I am sure that it hardly shows both sides of the story. AMD is not going to supply VansHardware with information that makes Intel look good. VansHardware represents to me, nothing more than an AMD fansite that takes shots at Intel every chance they get. I think they are far from what anyone could consider objective journalists and reporters. Them doing a cut and paste job with AMD's data goes to show that as true in my opinion. Websites get fed information all the time, trust us, we know. It is our job to go back and prove data and claims in our labs on our own time, not to repost corporate data that can be considered far from objective. Independent sites in our hardware community should not be reposting PR spin in such a way as this. There is a fine line here but I think this is stepping across it.

    VansHardware does not exactly hide the fact that the data shown is not theirs but rather AMD's, but they certainly did not seem to represent that in an upfront manner so the reader sees the information for being exactly what it is...data released by the AMD PR machine.

    I am a huge AMD fan but I just don't like big companies being able to pump their corporate data into our community when it is not presented as such. I think AMD should have the balls to post information like this on their own website and not try and "slip it in" through a back door. In fact, I would consider the information to be much more credible if it were posted on AMD's own website as AMD research.

    I know Van has gotten upset here recently with his past employer removing his name from articles he has written. It seems to me that Van has done little to deserve his name being on this article and it should show authored by AMD.

    (ED NOTE - This is referring to some allegedly plagiarised articles that Tom's Hardware published after removing Van's name from them)

    Also worthy of mentioning is that AMD is now fully working with BAPCo, which they have not done in the past. AMD has had the ability to work with BAPCo for a long time now to make sure their products get represented properly and we are certainly happy to finally see AMD join the party to give the boat a more even keel.

    Lastly, another tidbit worth throwing into the mix is that Van Smith, owner of VansHardware, possibly either works for or is contracted to VIA as a CPU validation tester. We are working on a confirmation of this from VIA now. Do we need hardware websites that do work for the companies they end up reporting on? Just another thing to consider when objectivity is in question.

    --
    "Mod, mod, mod...and another troll bites the dust."
    1. Re:Kyle @ HardOCP covered this yesterday by BrookHarty · · Score: 3, Interesting

      The article at Van's Hardware was clear: he wrote an article using AMD's information... Van Smith should have written the article with a little more distance from AMD, but that doesn't alter the facts from AMD.

      I didn't see that article over at HardOCP when I posted the news last night. But after reading HardOCP's comments, you can see Kyle is really pissed off at Van Smith. Kyle even links to another site, Real World Tech, where people are saying they are glad someone released the information... Could it be HardOCP is getting ready to release a major article, and Van's Hardware took the spotlight?

      There is a hint of back room dealings going on. I picked up a new magazine, "CPU", that has people from various places. Interesting to see what happens in the next year with the major fansites... Here's a list of authors for "CPU" magazine: Rob "CmdrTaco" Malda, Anand Lal Shimpi, Kyle Bennett from Hard OCP, Joan Wood co-founder of Sharky Extreme, Alex "Sharky" Ross, Alex St. John (founder of DirectX at Microsoft), Chris Pirillo (creator of LockerGnome/host on TechTV), Pete Loshin (former editor of BYTE Magazine, runs Internet-standard.com), Lisa Lopuck (author of Web Design for Dummies).

  8. Pick what you consider for your benchmark by (H)elix1 · · Score: 5, Insightful

    When I dig through reviews on the latest CPU and/or mainboard, I initially groaned at the increasing number of benchmarks folks would put out. It is more than just increasing click-through rates (well maybe not for some, but...) - it lets me see applications that I use. Synthetic benchmarks and politicians' promises garner the same level of trust from me.

    Anyhow, I game and code but use games to judge where my cash goes. When the P4 came out, I saw it did a great job with Quake and I started to get excited about the CPU. Then I saw the benchmarks on the games I actually play - UT, CS, and a few others - and it was not black and white. After the ATI fiasco, Quake is up there with synthetic benchmarks IMHO. As for Photoshop, you can pick what platform you want to 'win' by tuning the filters. Apple does it and their dually box wipes out the competition; the others do it and the tables are turned.

    There are great graphs out there that show benchmarks using different sizes of data. It's like comparing a small turbocharged engine to a larger normally aspirated one - so what RPM were you at when you ran your test? BMW's M5 feels slower than an Audi S4 at the start, but get the RPMs up there and it is a different story. Even pickup trucks can beat a Ferrari if you tune the test to take advantage of a sweet spot.

    I've done my homework, and my personal cluster is mostly AMD today. Still have one Celeron 566@800 as a CS server, but my workstation (Intel Xeon box) was replaced by AMD MP chips. Secondary boxes are all XP chips, but they used to be PIIs and PIIIs when Cyrix and the K5 sucked. They run Oracle, Weblogic, LDAP, and other stuff quite well when I'm working, and one swap of a hard drive later I'm getting some solid fragging in on the same box. In another year or so, if Intel really holds the crown, the price is right, and my boxes are 'only fast enough for web browsing and email', I'll choose them.

  9. Re:Big deal by fmaxwell · · Score: 3, Funny

    So you would trust a benchmark from a webpage that is in the midst of bashing another benchmark? A little biased, aren't we...

    "We" aren't, but apparently you are. When a web page takes a critical look at benchmarks and exposes those that are biased, then I would tend to trust that site to choose an appropriate, unbiased benchmark for CPU comparisons.

  10. Re:Big deal by Dr.+Spork · · Score: 3, Insightful
    I'm sorry, but I feel like I have to point out that your statement "Athlon's performance is going to lag quite a bit more than it already does" seems to imply that you actually think the P4 has the performance lead. You must not be reading Slashdot much (check out the 2600+ benchmarks and weep).

    Also, the story didn't imply this was a big deal. It only reminds us of all the dirty tricks Intel is forced to resort to when they try to maintain a market lead with a grossly inferior product. As long as people know this, benchmark-cooking is really no big issue.

  11. Not to defend Intel but.... by DeadBugs · · Score: 3, Informative

    HardOCP notes that Van's got their info from AMD, so it may be a bit biased. A quote from HardOCP:

    "AMD has verified to me this morning that all of the graphed and tabled data shown on the VansHardware report is data that has been mined by AMD"

    "AMD is not going to supply VansHardware with information that makes Intel look good. VansHardware represents to me, nothing more than an AMD fansite that takes shots at Intel every chance they get. I think they are far from what anyone could consider objective journalists and reporters."

    --
    http://www.kubuntu.org/
    1. Re:Not to defend Intel but.... by Chris+Johnson · · Score: 3, Insightful
      If they are telling the truth, it doesn't matter how biased they might be. They could believe in black helicopters and Elvis sightings for all it would matter. The question is: is the information they put forth accurate? If so, then Intel is indeed yanking people's chains with benchmarks (as a Mac dude I can't repress a 'oh, THAT'S a surprise' reaction) and the bias is in how the site draws conclusions from this, and how loudly they remind people of stuff like the class action suit over misleading performance claims for the P4.

      Which, surprise surprise, they do indeed remind people of! And if this is true, they'd be right that it was a smoking gun w.r.t. that lawsuit, too.

      Let them go on being an AMD fanboy site. I don't see INTEL fanboy sites breaking this story.

  12. Re:Partialy AMD's Fault by Dr.+Spork · · Score: 3, Insightful

    I think this speaks volumes: Intel are schmoozing the benchmarkers while AMD are designing kick-ass processors. I hope the stockholders are listening!

  13. Re:Gacy? by fmaxwell · · Score: 3, Funny

    Yes, filtering web access is just like raping and killing children...

    Not exactly. There's more screaming when you filter web access.

  14. No, just nonsense. by fmaxwell · · Score: 5, Insightful

    If AMD would stick to making totally Intel compatible chips instead of trying to infuse their own personality, we wouldn't have this problem. Hint: my software shouldn't need to know it's running on an AMD chip.

    This is so wrong on so many counts...

    1. Intel's chips aren't "totally Intel compatible". The Pentium 4 contains instructions that were not present in the Pentium, P2, and P3. Why should your software have to "know it's running on a" Pentium 4 rather than a P3, P2, or Pentium? Hell, there was even a Pentium and a Pentium MMX (the latter adding the MMX instructions).

    2. Intel tries every trick possible to patent their instructions to prevent people from implementing them. They do it with hardware, too. Remember when you could plug a K6-2 in place of an Intel Socket 7 CPU? Starting with Slot 1, Intel used patents to prevent others from making compatible CPUs, which is why AMD and Intel motherboards are now incompatible.

    3. Why should AMD not provide useful processor extensions that improve on Intel's base instructions? That's what provides useful competition and makes the industry grow.

    4. What interest do you have in seeing AMD in a constant catch-up mode? In your scenario, Intel gets an advantage every time they release new instructions -- that will take AMD months to implement in silicon. Do you own Intel stock?

    5. Why doesn't Intel just stick to providing processors that are 'totally AMD compatible'?

    1. Re:No, just nonsense. by fmaxwell · · Score: 3, Insightful

      Maybe you're missing something with the words "Intel Compatible"...

      No, I am simply more intelligent than you are. The original poster was claiming that AMD should never add extensions to the Intel base instruction set and should support all Intel instructions -- including those for which Intel has patents, leaving AMD open to legal action.

      Intel P4's run the same code a Pentium 60 ran way back when, no recompiling needed, sure you can optimize the compile, but you don't HAVE to.

      So are you saying that AMD Athlons cannot run the code written for a Pentium 60? If so, to what code are you referring?

      We have a situation now where we have the AMD Athlon series and the Intel Pentium 4 series. Both have added extensions relative to the original Pentium. Are you claiming that benchmarks should only support Intel's additional instructions? What's your point?

  15. Re:But the real question is... by barawn · · Score: 3, Insightful

    That's what SysMark is supposed to be: they measure "real-world" performance figures - they run a slew of Photoshop filters, and time it, and other crap.

    Unfortunately, SysMark's testing strategy is really terrible. I'm even a bit confused how it works: they say that they scale each test based on how long it takes to complete: but is the scaling from a "reference system" or from each system? If it's from a reference system, then it's biased against whatever that reference system is good at (since the difficult bits get weighted more). If it's from each system on the fly, then it's really meaningless, as one poorly-chosen benchmark can skew the whole thing.

    Worse yet: in SysMark 2002, AMD claims that BAPCo uses the same benchmark, multiple times: this is just plain bad, because not only does it magnify the importance of this benchmark, it shrinks the importance of all of the other ones. It's just plain idiotic. Take 3 tests, run them 4 times each, and use the results from all of the runs? It's a very very obvious bias - the only reason you would do that is if you wanted to cheat for one specific processor, and you knew which filters it was good at.
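
    A toy Python sketch of how repetition alone can flip a composite result. Everything here is hypothetical: the per-test ratios are made up, and the geometric-mean composite is an assumed scoring rule for illustration, not BAPCo's actual formula.

```python
from math import prod


def composite(ratios):
    """Geometric mean of per-test speed ratios (hypothetical scoring rule)."""
    return prod(ratios) ** (1 / len(ratios))


# Made-up ratios: above 1 means CPU X wins the test, below 1 means CPU Y wins.
tests = {"filter_a": 1.30, "filter_b": 0.85, "filter_c": 0.85}

# Each test counted once.
fair = composite(list(tests.values()))

# Same three tests, but filter_a is run and counted four times.
stacked = composite([tests["filter_a"]] * 4 + [tests["filter_b"], tests["filter_c"]])

print(f"each test once:      {fair:.3f}")
print(f"filter_a four times: {stacked:.3f}")
```

    With these made-up numbers CPU Y wins two of the three tests and the composite is below 1 when each test counts once. Counting filter_a four times pushes the composite above 1 without any change to the hardware.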

  16. Re:Big deal by Sivar · · Score: 5, Interesting

    Besides, AMD has always been the value chip company. You can't expect them to keep up with Intel forever.

    AMD has had a superior (in design) processor architecture to Intel since the K6 was released (though the K6 had mediocre FPU performance, the design was still more elegant--ask any x86 assembly programmer). The Athlon has given the P2, P3, AND P4 a run for their money, and early benchmarks of the Hammer would seem to indicate that the expensive Itanium 2, which almost nobody actually uses, is going to be outrun as well.
    The Pentium IV's really looong pipeline does allow the P4 to run at higher clockspeeds, but the branch prediction you mentioned is instant death. Branch mispredictions happen VERY frequently in any CPU (note the K6 had the most sophisticated branch prediction unit up until the "XP" series of Athlons) but with the Pentium IV, a single branch misprediction requires up to 20 full clock cycles of work to be discarded.
    The Pentium IV has other questionable design decisions that hurt performance as well. It has 8K of L1 cache, the same amount found in the ancient 486 processor, whereas the Athlon has that amount squared and doubled (128K). Current P4's have more L2 cache, but L2 cache is less important and slower. (Note though that the P4's L2 cache is particularly fast L2 cache)
    The P4 has buffers to remember a series of decoded x86 instructions so that it does not have to decode them again--these are almost required because of the terribly long pipeline--but it doesn't have enough to speed things up in server environments. Most servers execute a wide variety of instructions such that the buffered instructions get very little use before being replaced by new instructions. This is even more a problem on systems that run many different applications at once, but this problem can be demonstrated just with DB servers (which use plenty of instructions) as the P4 tends to not scale as well as the Athlon MP when a second or third task is added (such as mail serving, web serving, etc.)

    One disappointment that I had with the Athlon is that AMD never used the excellent EV6 bus to its fullest. Athlons are superior in multiprocessor capabilities because different processors needn't share access to the memory bus. On Intel SMP setups, even on P4 Xeons (Which, IMO, are inferior to P3 Tualatin chips by the same company) when one CPU accesses main memory, it locks main memory for the other CPUs. All other CPUs have to sit and twiddle their transistors while the main memory is in use by only one CPU.
    On AMD SMP setups, ALL processors can simultaneously access memory, merely sharing the bandwidth. So, if one CPU is only using 100MB of memory bandwidth, the rest can be used by other CPUs at that time.
    Unfortunately, this doesn't really matter much with only two CPUs, which is the largest AMD configuration you can get. You can, of course, see it in action with 8+ CPUs on EV6 Alpha setups (AMD licensed the bus from DEC's Alpha team) but Alpha setups are expensive as hell and are a dying breed.
    If AMD had created a quad or 8-way setup, we would see the true power of a good design.

    Fortunately, the Hammer has an even better design (one made by AMD no less) on an even better CPU. I fully expect the Hammer series to wipe the floor with all Xeons and possibly the Itanium 2 because of its design. An integrated memory controller that will tremendously drop memory latency, twice as many general-purpose registers of twice the size (Much less pushing and popping, for those that know some assembly) and, unlike the big vendor 64-bit processors, the ability to split half of the general purpose registers into chunks of 16 and 32 bits when huge numbers (2^64) are not needed. (On an Alpha/SPARC/R12000, if you want to store the number "42" you must use all of a register that can hold values up to 18,446,744,073,709,551,615. A bit wasteful)

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  17. Re:Benchmarking the Benchmarks. by kigrwik · · Score: 5, Funny

    > Benchmarking the various "Benchmarking Programs"

    Yes but, "Quis benchmarkiet ipsos benchmarkiem ?":

    Who benchmarks the benchmarks ?

    (s/benchmark/custod/g and Google for the original quote :)

    --
    -- don't discount flying pigs until you have good air defense
  18. No. by FreeUser · · Score: 4, Insightful

    Wouldn't better CPU benchmarks be taken by using the chipmakers' own compilers?

    No.

    The chipmaker would simply then optimize their compiler for the benchmark(s) in question, rather than for code more generally. In other words, what you suggest would still allow the chipmaker to cheat.

    In order to have complete transparency in the benchmarking, both the benchmarks and the compiler should be open source (ideally free software, so that anyone can run and verify the benchmarks as well, allowing repeatable experimentation in the broadest scientific sense). If the chip maker wishes to submit optimizations to such a compiler they would be free to do so, since any such optimizations would in turn be open source (or free software) and subject to peer review.

    A good candidate would be gcc, which runs on numerous platforms, and on several operating systems on AMD and Intel hardware.

    Cheating would be much harder in this case, perhaps even impossible, something we need given the sordid history of benchmarking by all parties involved (except perhaps AMD? Can anyone recall an instance where AMD has cooked results? I ask because their current chip rating system is extremely conservative ... almost the antithesis of what Intel is trying to do. Has this been a longstanding strategy on AMD's part?).

    --
    The Future of Human Evolution: Autonomy
  19. Re:Big deal by VAXman · · Score: 3, Insightful

    The Pentium IV has other questionable design decisions that hurt performance as well. It has 8K of L1 cache, the same amount found in the ancient 486 processor, whereas the Athlon has that amount squared and doubled (128K).

    Obviously you flunked your freshman-level computer architecture course. The P4 8K L1's 2-cycle load-use latency is 50% better than Athlon 128k L1's 3-cycle load-use latency (not even accounting for P4's clock speed advantage). The difference in hit rate between 8k and 128k is only about 5% meaning that it is substantially faster to go with the small/fast cache than the big/slow cache. Do the math - even an infinitely large 3-cycle load-use cache is slower than an 8k 2-cycle load-use cache.

    Cache size comparisons are more meaningless than megahertz comparisons. Whenever somebody tries to justify a big cache size without looking at performance, just walk away. AMD is playing marketing games with their slow-as-molasses (but massive) L1 cache.

    I won't bother to address the rest of the technical errors in your post...

  20. Re:Big deal by Sivar · · Score: 5, Informative

    Obviously you flunked your freshman-level computer architecture course. The P4 8K L1's 2-cycle load-use latency is 50% better than Athlon 128k L1's 3-cycle load-use latency (not even accounting for P4's clock speed advantage).
    Obviously you are imagining things, as I never said that was not the case. Latency is important, but it doesn't matter if the cache size isn't large enough to fit enough code in to enjoy the low latency.
    The difference in hit rate between 8k and 128k is only about 5% meaning that it is substantially faster to go with the small/fast cache than the big/slow cache.
    Really? That's interesting, and here's me wondering why both AMD and, other than in the P4, Intel have wasted so much money adding more cache memory.

    Since you seem to be such an expert, why don't you go ahead and list a few common programs for me that have a working set of less than 8K--the size that will fit into the tiny L1 cache. Can't find any? Gee, I guess that makes the size of the cache pretty important then. When a program's working set has to be swapped in and out between L1 and L2 cache, suddenly that latency doesn't much matter. Of course, you may feel free to prove to me that the P4 can run addition loops faster. Those will fit into about 8k.

    Do the math - even an infinitely large 3-cycle load-use cache is slower than an 8k 2-cycle load-use cache.
    Who was it again that flunked their freshman computer architecture course? You're saying that if the Athlon had 512MB of L1 cache that the system would be slower than the P4 and its 8K of lower latency cache?
    What math is it that I should do? Do you know what the working set of a program is?
    Having a tiny amount of cache is analogous to having a tiny amount of RAM. Put 32MB of low-latency RAM in your system. Overclock some DDR SDRAM to 200MHz (AKA "400MHz" by people that don't understand clock speeds) and set it to CAS2. Tell me how your system performs. Just as your system will have to swap just about all running code to disk, the Pentium IV will not be able to contain the core loops of the various running programs in L1 cache. The vast majority will have to be dropped to L2, which is significantly slower and higher latency, kinda defeating the purpose of that 8k of fast memory, no?
    Working sets that cannot be fit into the P4's 256k or 512k of L2 will then be relegated to main memory and moved to L2 then L1 when the data is executed, and anything that won't fit in main memory (which very rarely includes the working set of a program) will be swapped to disk if the platform supports virtualizing memory.

    In closing, your comment was surprisingly brash and conceited, not to mention rude and totally inaccurate. Thank you.

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  21. Re:Big deal by VAXman · · Score: 3, Informative

    According to Hennessy & Patterson, 2nd Edition, page 391, the total miss rate (for SPEC92) of an 8k 4-way set associative cache (like the P4's) is 2.9%. The miss rate of a 128k 4-way set associative cache (like Athlon's) is 0.6%.

    The hit time for P4 is 2 cycles, and for Athlon it's 3 cycles. The L2 hit / L1 miss is ~10 cycles for both. Everything further out is approximately the same so we can ignore it for simplicity.

    So, the average memory access time for P4 is (0.971 * 2) + (0.029 * 10) = 2.2 Cycles. The average memory access time for Athlon is (0.994 * 3) + (0.006 * 10) = a little over 3 cycles.

    Suppose Athlon had an infinite size L1 cache (or 512 MB if you like to use numbers). The highest hit rate it could ever achieve is 100% (actually slightly less, since you cannot eliminate compulsory misses). The average memory access time would then be 3 cycles - which is higher than P4's 2.2 cycles!
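
    The arithmetic above can be reproduced with a short Python sketch. It uses the same simplified model as the post (an L1 hit costs the L1 latency, an L1 miss costs a flat 10 cycles, everything further out is ignored) and the miss rates quoted for SPEC92. The function names are illustrative, and the break-even helper is an extra derived from the same formula.

```python
def average_access_cycles(l1_hit_rate, l1_hit_cycles, l2_hit_cycles):
    """Weighted average from the post: hits cost L1 latency, misses cost L2 latency."""
    return l1_hit_rate * l1_hit_cycles + (1 - l1_hit_rate) * l2_hit_cycles


def break_even_hit_rate(target_cycles, l1_hit_cycles, l2_hit_cycles):
    """L1 hit rate at which a cache with the given latencies averages target_cycles."""
    return (l2_hit_cycles - target_cycles) / (l2_hit_cycles - l1_hit_cycles)


p4 = average_access_cycles(0.971, 2, 10)        # 8k L1, 2.9% miss rate
athlon = average_access_cycles(0.994, 3, 10)    # 128k L1, 0.6% miss rate
infinite_l1 = average_access_cycles(1.0, 3, 10) # never misses, still 3 cycles

# Hit rate the 2-cycle cache needs just to match a perfect 3-cycle cache.
needed = break_even_hit_rate(infinite_l1, 2, 10)

print(f"P4: {p4:.3f}  Athlon: {athlon:.3f}  infinite 3-cycle L1: {infinite_l1:.3f}")
print(f"2-cycle L1 break-even hit rate: {needed:.1%}")
```

    Under this model the 2-cycle cache stays ahead of a perfect 3-cycle cache as long as its hit rate is above the break-even value, so the conclusion hinges on whether the SPEC92 miss rate is representative of the workload in question.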

    BTW, Paul DeMone wrote a pretty good article about P4's L1 cache.

  22. Re:Not Quite So by VAXman · · Score: 3, Insightful

    Apples to apples please - when you clock the Athlon at the same clockspeed as the Intel chip, (which are possible with the new chips that AMD just released) the FPU is far faster on the Athlon chips.

    Not that it matters, but P4 and Athlon performance are approximately equivalent on a per-clock basis: Athlon does 624 / 1800 MHz (0.347) "SPECfp points per megahertz" (whatever that means), and P4 does 861 / 2533 MHz (0.340). Of course since P4 can clock much faster it has better absolute FP performance.

    This is what indicates a superior FPU design, not a comparison based on a ~700mhz difference in clockspeed.

    Let's throw Itanium 2 into the picture, which does a whopping 1356 SPECfp at an equally astonishing 1.0 GHz, which puts it at this silly "SPECfp points per megahertz" of 1.356 - quadruple that of Athlon/P4.

    Does that mean Itanium 2's FPU is four times better than Athlon?
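
    The per-megahertz figures are easy to recompute. A quick Python sketch using only the SPECfp scores and clock speeds quoted in this comment; the dictionary layout is just an illustrative choice.

```python
# (SPECfp score, clock in MHz) as quoted in this comment
results = {
    "Athlon": (624, 1800),
    "Pentium 4": (861, 2533),
    "Itanium 2": (1356, 1000),
}

# Score divided by clock: the "SPECfp points per megahertz" figure.
per_mhz = {cpu: score / mhz for cpu, (score, mhz) in results.items()}

for cpu, value in per_mhz.items():
    print(f"{cpu:10s} {value:.3f} SPECfp points per MHz")

# How many times higher the Itanium 2 figure is than the Athlon figure.
itanium_vs_athlon = per_mhz["Itanium 2"] / per_mhz["Athlon"]
print(f"Itanium 2 vs Athlon: {itanium_vs_athlon:.1f}x")
```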

  23. corrections by RelliK · · Score: 3, Informative
    The Pentium IV's really looong pipeline does allow the P4 to run at higher clockspeeds, but the branch prediction you mentioned is instant death.... a single branch misprediction requires up to 20 full clock cycles of work to be discarded.

    The situation is not quite as dire due to P4's trace cache (you actually addressed that later in your post). Nevertheless, your point stands.

    On Intel SMP setups, even on P4 Xeons (Which, IMO, are inferior to P3 Tualatin chips by the same company) when one CPU accesses main memory, it locks main memory for the other CPUs. All other CPUs have to sit and twiddle their transistors while the main memory is in use by only one CPU. On AMD SMP setups, ALL processors can simultaneously access memory, merely sharing the bandwidth. So, if one CPU is only using 100MB of memory bandwidth, the rest can be used by other CPUs at that time.

    P4 Xeons (as well as P3s) have a shared memory bus. That is, multiple CPUs share the bandwidth of the 400MHz or 533MHz bus when accessing memory. However, Athlon has a point-to-point channel for each CPU. That is, each Athlon CPU has the full bandwidth of the 266MHz (soon to be 333MHz) memory bus, regardless of how many CPUs there are in the system. This means that beyond 2-way SMP systems, Athlon has a significant advantage in memory bandwidth over P4.
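
    A back-of-the-envelope Python sketch of the shared bus versus point-to-point difference. It rests on loudly stated assumptions: both buses are treated as 64 bits (8 bytes) wide, the quoted "MHz" figures are treated as effective transfer rates, and the chipset and the memory behind it are ignored entirely, so these are front-side bus ceilings only. The function names are hypothetical.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data path on both buses


def shared_bus_per_cpu(transfers_mhz: float, cpus: int) -> float:
    """Peak MB/s per CPU when all CPUs split one bus evenly (assumed model)."""
    return transfers_mhz * BUS_WIDTH_BYTES / cpus


def point_to_point_per_cpu(transfers_mhz: float, cpus: int) -> float:
    """Peak MB/s per CPU when each CPU has its own link; cpus does not matter."""
    return transfers_mhz * BUS_WIDTH_BYTES


for cpus in (1, 2, 4):
    xeon = shared_bus_per_cpu(400, cpus)        # shared 400MHz bus
    athlon = point_to_point_per_cpu(266, cpus)  # 266MHz link per CPU
    print(f"{cpus} CPU(s): Xeon {xeon:.0f} MB/s each, Athlon {athlon:.0f} MB/s each")
```

    In this simplified model the per-CPU share of the shared bus shrinks as CPUs are added while the point-to-point figure stays flat. Real systems are also limited by what the chipset's memory controller can deliver.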

    --
    ___
    If you think big enough, you'll never have to do it.