Slashdot Mirror


Linux Shootout: Opteron 150 vs. Xeon 3.6GHz Nocona

danalien writes "Anandtech with their previous review have stirred up a bit of controversy, and they've released their follow-up review where they pit AMD's Opteron 150 vs Intel's Xeon 3.6 Nocona (on linux)."

46 of 217 comments

  1. Short version: Xeon RIP. by Anonymous Coward · · Score: 5, Interesting

    No message here. Oh, did you know that an Athlon64 3000+ is within 2fps of a P4 3.4 Extreme Edition in Doom 3?

    Look up the prices for those two items.

    1. Re:Short version: Xeon RIP. by Anonymous Coward · · Score: 5, Informative

      Athlon64 3000+ (2GHz): $167
      Pentium 4 3.4GHz Extreme Edition: $1025

    2. Re:Short version: Xeon RIP. by Ben+Hutchings · · Score: 5, Informative
      That being said, the Pentium 5 is in the works, and it will run between 6-10 GHz and absolutely smoke everything the Opteron can do, except asm code.

      The design intended to become the Pentium 5 (Tejas) was cancelled in favour of Pentium M derivatives. Intel basically had to give up on the Netburst micro-architecture and is now concentrating on increased parallelism (multiple cores) rather than extreme clock rates.

    3. Re:Short version: Xeon RIP. by TheLink · · Score: 2, Funny

      Summary: the cancelled P5 chips would have smoked everything including themselves.

  2. Memory by lachlan76 · · Score: 4, Insightful

    To be able to show the real potential of the Opteron, you need to have more than one processor.

    This lets you take advantage of the on-die memory controller, by letting each processor do its own memory work, rather than making the Northbridge do all the work.

    If you want to use a single processor, you might as well use an FX-Whatever, since they are just an Opteron without MP capability and only one HT bus.

    1. Re:Memory by Ianoo · · Score: 5, Informative

      Provided you have a NUMA-aware operating system, that is. The OS needs to know which memory is attached to which processor, since access to memory attached to the same processor on which a thread is running will obviously be faster and lower latency than going across hypertransport to a different processor and waiting for an answer.
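
As a rough illustration of what a NUMA-aware system has to know: on Linux, sysfs lists which CPUs sit on which memory node under /sys/devices/system/node/nodeN/cpulist. The sketch below only parses that list format. The function names are made up for this example, and on a machine without that directory it simply reports no nodes.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a sysfs-style CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids; empty dict if no node info is found."""
    nodes = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        with open(path) as fh:
            nodes[int(node_dir[len("node"):])] = parse_cpulist(fh.read())
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: cpus {cpus}")
```

A scheduler or allocator that has this map can try to keep a thread and its memory on the same node, which is the case the comment describes as faster.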

    2. Re:Memory by Waffle+Iron · · Score: 4, Informative
      THE INSTANT ANY CPU WRITES A ONE TO GO!!, ALL THE CPUS KNOW IT!! AND CAN START WORKING INSTANTLY!

      How do they know? By cache coherence signals transferred between the CPUs. This isn't free and consumes bus bandwidth.

      The first CPU can't "instantly" write the value either, because it must first obtain exclusive ownership of that cache line by checking with the other CPUs.

      On the Opteron architecture (we call this NUMA, or "point-to-point"), as soon as one CPU writes a value to the 'GO!!' area, well, that's _just the beginning_. It has to tell another CPU in the system that it just did that. etc etc

      It has to use some communication resource to update the other CPUs on the state of that cache line. Just like the bus-based situation.
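
To make the point about coherence traffic concrete, here is a toy model of a single cache line under a simplified MSI scheme (invalid / shared / modified). It is only a sketch: real Opterons and Xeons use richer protocols, and the message counting (one request per miss or upgrade, one invalidation per other holder) is an assumption chosen for illustration.

```python
class ToyCoherentLine:
    """One cache line shared by n CPUs, tracked with simplified MSI states."""

    def __init__(self, n_cpus):
        self.state = ["I"] * n_cpus  # I = invalid, S = shared, M = modified
        self.messages = 0            # coherence messages exchanged so far

    def read(self, cpu):
        if self.state[cpu] == "I":
            # Miss: one request goes out; a modified owner is downgraded to shared.
            self.messages += 1
            self.state = ["S" if s == "M" else s for s in self.state]
            self.state[cpu] = "S"

    def write(self, cpu):
        if self.state[cpu] != "M":
            # The writer must own the line exclusively first:
            # one ownership request plus one invalidation per other holder.
            others = [i for i, s in enumerate(self.state) if i != cpu and s != "I"]
            self.messages += 1 + len(others)
            for i in others:
                self.state[i] = "I"
            self.state[cpu] = "M"


if __name__ == "__main__":
    line = ToyCoherentLine(4)
    for cpu in (1, 2, 3):   # three CPUs spin on the 'GO!!' flag
        line.read(cpu)
    line.write(0)           # CPU 0 writes the flag
    for cpu in (1, 2, 3):   # the spinners have to re-fetch the line
        line.read(cpu)
    print("messages:", line.messages)
```

Even in this toy version the write is not free, and neither is the moment the other CPUs notice it, whichever interconnect carries the messages.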

    3. Re:Memory by kasperd · · Score: 2, Insightful

      I still wish everybody would dump the backward compatibility with the 8086 though. The CPU still has to bootstrap in 16bit real mode FFS!!!!

      I agree. But at least AMD did something right when designing the AMD64 architecture. Virtual 86 mode and segmentation were eliminated from the 64 bit mode, but they still exist in 32 bit mode of course. Completely eliminating 8086 compatibility was not really so much of an option. Backward compatibility is part of the reason for the AMD64 success. But it would have been nice if they had at least offered a way to boot the CPU in 64 bit mode. As it is now, the CPU boots in 16 bit real mode, then you switch to 32 bit protected mode, and only after that has been done can you switch the CPU into 64 bit mode.

      --

      Do you care about the security of your wireless mouse?
    4. Re:Memory by ottffssent · · Score: 4, Insightful

      Actually, even without a NUMA-aware OS, the worst-case dual (and almost quad) memory latency is less than a Xeon's.

      What really sets the Opteron apart in MP scenarios is the bandwidth. Each chip gets 6.4G/sec to memory: add more chips, get more bandwidth. The Xeon on the other hand has to share its 6.4G/sec with all the chips in the system, which severely limits its scaling. A quad Opteron has over 25G/sec of aggregate memory bandwidth, while a quad Xeon is stuck with its 6.4G shared 4 ways. That's half the bandwidth of a 400MHz P4 - no wonder the quad Xeons are often barely faster than the duals.
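
The arithmetic behind those figures can be sketched in a few lines. The 6.4 GB/s value is the one quoted in the comment; the function names are invented here, and the model ignores HyperTransport hops and real-world contention.

```python
PER_CPU_GB_S = 6.4  # figure quoted in the comment above


def opteron_aggregate(n_cpus, per_cpu=PER_CPU_GB_S):
    """Each CPU has its own memory controller, so bandwidth adds up."""
    return n_cpus * per_cpu


def xeon_share_per_cpu(n_cpus, shared_bus=PER_CPU_GB_S):
    """All CPUs sit on one front-side bus, so each gets a slice of it."""
    return shared_bus / n_cpus


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} CPUs: Opteron aggregate {opteron_aggregate(n):.1f} GB/s, "
              f"Xeon share per CPU {xeon_share_per_cpu(n):.1f} GB/s")
```

With four CPUs this gives 25.6 GB/s in aggregate for the Opteron box and 1.6 GB/s per CPU for the Xeon box. Assuming the 3.2 GB/s usually quoted for a 400MHz-FSB P4, that 1.6 GB/s is the "half the bandwidth" the comment refers to.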

      Add to this that cache snoops and other bus traffic all have to share the same FSB on the Xeon whereas on the Opteron local memory accesses don't touch the HT links at all. For a standard 2P system, this frees up 3.2G/sec of HT link bandwidth, and a NUMA-aware OS only increases the efficiency of the system.

      Despite Intel's recent marketing push, they really don't have the best CPU, and don't have the best system either. There are still considerable advantages to choosing a Xeon system but these days they have little to do with the chip or the board and a lot to do with Intel backing. That's an advantage that will quickly evaporate as industry gets comfortable with non-Intel parts.

    5. Re:Memory by andreyw · · Score: 2, Informative

      Well, while Solaris is *technically* going to be ported to x86-64 eventually... IRIX and AIX won't for sure.

  3. Re:opteron by lachlan76 · · Score: 5, Informative

    Athlon 64 is the name used for the desktop line, and Opteron is the name used for the server/workstation processors.

  4. These in-architecture tests are OK, but... by Amiga+Lover · · Score: 4, Interesting

    It's good to see benchmarks between processors in the same family, but is there anywhere that regularly tests CPUs across families? x86, PPC, Sparc, VIA etc. I'd like to see comparisons like that to see how various architectures' strengths & weaknesses stack up.

    1. Re:These in-architecture tests are OK, but... by DarkMantle · · Score: 2, Interesting

      It would not be possible to get an accurate reading from this, because the operating system would have to be compiled for each hardware platform. Also to keep it fair we would have to do the same for the benchmark software. I am aware of no OS that will go across all platforms. BSD comes close (since OS-X is BSD based) but doesn't quite get all architectures.

      --
      DarkMantle I been bored, so I started a blog.
    2. Re:These in-architecture tests are OK, but... by gunpowder · · Score: 3, Informative

      Perhaps this is what you are looking for

  5. Opteron cpu hacked by GuyFawkes · · Score: 5, Interesting

    I submitted this story an hour or two ago, but thinking about it it will be rejected just like everything else, and then pop up under someone else's name.

    so what the hell.

    Opteron Exposed: Reverse Engineering AMD K8 Microcode Updates

    Summary

    This document details the procedure for performing microcode updates on the AMD K8 processors. It also gives background information on the K8 microcode design and provides information on altering the microcode and loading the altered update for those who are interested in microcode hacking.

    Source code is included for a simple Linux microcode update driver for those who want to update their K8's microcode without waiting for the motherboard vendor to add it to the BIOS. The latest microcode update blocks are included in the driver.

    Background

    Modern x86 microprocessors from Intel and AMD contain a feature known as "microcode update", or as the vendors prefer to call it, "BIOS update". Essentially the processor can reconfigure parts of its own hardware to fix bugs ("errata") in the silicon that would normally require a recall.

    This is done by loading a block of "patch data" created by the CPU vendor into the processor using special control registers. Microcode updates essentially override hardware features with sequences of the internal RISC-like micro-ops (uops) actually executed by the processor. They can also replace the implementations of microcoded instructions already handled by hard-wired sequences in an on-die microcode ROM.

    AMD's U.S. Patent 6438664 ("Microcode patch device and method for patching microcode using match registers and patch routines") goes into substantial detail on this.

    Typically microcode update blocks are stored in the BIOS flash ROM and loaded into the processor as the system boots. They can also be loaded by the operating system; for instance, Linux contains a microcode device driver for Intel chips.

    AMD recently released a "BIOS fix" to motherboard makers to address Errata 109, in which REP MOVS instructions caused subsequent instructions to be skipped under specific pipeline conditions.

    Previously it was not clear if and how AMD even supported microcode updates in the K8 family until this announcement. After analyzing a number of BIOS images, it appears that AMD has secretly used the microcode update facility on several occasions over the past few years, but obviously avoided publicly disclosing that it actually had bugs patchable in this manner.

    Early K7 (Athlon) cores initially supported microcode updates as well, until ironically the microcode update mechanism itself was found to be broken and subsequently listed as an erratum!

    The following sections describe the microcode update procedure, obtained by clean room reverse engineering various vendors' BIOS code. The actual microcode update blocks are embedded in the BIOS image; the most recent updates (created June 2004) have been included in the Linux driver source code attached to this description.

    Microcode Update Procedure

    The update procedure expects the 64-bit virtual address of the update data, including the 64 byte header, to be in edx:eax:

    edx = high 32 bits of 64-bit virtual address
    eax = low 32 bits of 64-bit virtual address
    ecx = 0xc0010020 (MSR to trigger update)

    Execute wrmsr with these register values. If the address and update block data are valid, wrmsr completes successfully. Otherwise, a GP fault is taken.
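
For readers who want to see the register setup spelled out, the split of a 64-bit address into the edx:eax pair looks like this. This is only a sketch of the operand arithmetic: it does not touch any MSR, the constant name is made up for the example, and the sample address is arbitrary.

```python
MSR_UCODE_UPDATE = 0xC0010020  # MSR number quoted in the write-up; name is hypothetical


def wrmsr_operands(virtual_address):
    """Split a 64-bit address into the (edx, eax) halves that wrmsr takes."""
    if not 0 <= virtual_address < 2**64:
        raise ValueError("address must fit in 64 bits")
    edx = (virtual_address >> 32) & 0xFFFFFFFF  # high 32 bits
    eax = virtual_address & 0xFFFFFFFF          # low 32 bits
    return edx, eax


if __name__ == "__main__":
    edx, eax = wrmsr_operands(0xFFFF880012345000)  # arbitrary example address
    print(f"ecx={MSR_UCODE_UPDATE:#010x} edx={edx:#010x} eax={eax:#010x}")
```

The actual wrmsr has to run in kernel mode with the update block mapped at that address, which is why the write-up ships a driver rather than a user-space tool.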

    The microcode does not appear to update MSR 0x8B with the new update signature as it does on Intel processors, despite the fact that some BIOS code I have analyzed does seem to check this field. It is possible the MSR is only updated under certain conditions, for instance when microcode is loaded before initializing the cache controller. Nonetheless, as we shall see below, the processor is clearly doing something internally when it claims to accept an update in this manner.

    The update generally takes around 5500 clock

    --
    http://slashdot.org/~GuyFawkes/journal
    1. Re:Opteron cpu hacked by Ianoo · · Score: 4, Insightful

      Microcode updates aren't permanent though - you need to reload them every time the machine boots. So clearly you would need to reload these "hacks" using a piece of software during the boot process.

      Also, the article admits that it's "very unlikely" that a particular processor could be fried using a dodgy microcode update, so why even mention it? It would be much easier to write a BIOS flashing virus, I believe a few of these did exist at one point (although the old memory is failing). I doubt the hoops you'd need to jump through to write such a thing for Intel processors are any higher than for AMD processors, and as such, this is just FUD.

    2. Re:Opteron cpu hacked by arose · · Score: 2, Insightful
      It would be much easier to write a BIOS flashing virus, I believe a few of these did exist at one point
      Chernobyl.
      --
      Analogies don't equal equalities, they are merely somewhat analogous.
    3. Re:Opteron cpu hacked by Sunspire · · Score: 4, Insightful

      I'm not sure if other Linux distributions do this already, but at least Fedora Core 1 and 2 both come with a processor microcode update service that runs in the bootup sequence. It's even enabled by default out of the box.

      Linux has for a long time already mostly ignored the system BIOS since they're notoriously broken because of legacy reasons. Supplying known good microcode is simply another step in eliminating variables that make system testing needlessly complex, I predict we'll see more developments along these lines in general.

      --
      It's like deja vu all over again.
  6. This doesn't look good for Intel by GreatDrok · · Score: 4, Interesting

    I recently installed Fedora 2 on a dual Opteron 248 system (Sun V20Z) and was amazed at the sheer grunt of the thing. Why anyone would even consider buying a Xeon just amazes me. I ran one of my own integer and memory heavy benchmark programs (single threaded) against my Athlon XP 2200+ and a single Opteron processor was 3x faster than the XP for only 400Mhz higher clock speed. These things are amazing, Intel should be crapping themselves and I am sure they would be if it wasn't for the cozy deal with Dell and the number of sites that have a Dell only policy. In a true free market they would be toast.

    --
    "I have the attention span of a strobe lit goldfish, please get to the point quickly!"
    1. Re:This doesn't look good for Intel by iCEBaLM · · Score: 4, Informative

      I wouldn't touch VIA and SIS with a ten foot pole for my own systems, even less for servers. Plenty of bad experiences.

      While in the past VIA and SIS have been really bad chipsets, modern VIA chipsets (KT266A and up) are rock stable and perform well. I have had both KT333 and KT600 boards which have never failed. SIS, while I have no first hand experience, I am told is similar.

    2. Re:This doesn't look good for Intel by drinkypoo · · Score: 3, Informative

      I bought a board with a SiS 745 chipset and it has been perfect, at least under windows. They provide a nice dual-channel PCI bus, even. SiS used to be a horrible joke but they've come a long, long way.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  7. Difficult to trust? by arose · · Score: 4, Insightful
    As we can see above, the difference between the two CPUs seems exaggerated and difficult to trust.
    Maybe it's because Intel still makes processors for MHz and not performance? Maybe because unlike some commercial vendors the POV-Ray Team doesn't feel the need to make processor specific optimizations and leaves that job to the compiler (where it belongs)?
    --
    Analogies don't equal equalities, they are merely somewhat analogous.
    1. Re:Difficult to trust? by ezzzD55J · · Score: 4, Insightful
      This review struck me as a bit clueless, or unfinished. The above quote is a good example of why I think so. They do some measurements, but aren't sure they're doing it fairly (compiler flags), and don't know what to do with the results. There is little in the way of analysis or conclusions. With the openssl measurements they don't even give any conclusions. But analysis and conclusions are the whole point of the review, and a remark like "As we can see above, the difference between the two CPUs seems exaggerated and difficult to trust." really devalues this review - they're just showing us measurements they're not sure are correct ('difficult to trust') and let us figure out what they're worth?

      Well, the conclusion that the opteron kicks the xeon's ass is pretty inescapable to me, finding out opteron is available and the xeon isn't quite yet and more expensive, really closes the deal to me. But the review isn't very scientific, and didn't go very deep.

    2. Re:Difficult to trust? by BlowChunx · · Score: 2, Insightful

      It says to me that compiler flags matter a lot! ...oops, maybe those Gentoo zealots are onto something!

    3. Re:Difficult to trust? by alienw · · Score: 4, Insightful

      That's because it is nearly impossible to do a scientific comparison of two different processors. Anyone who tells you otherwise is a moron.

      You have to evaluate performance (possibly vs price) for your particular application. If you need a faster processor for Doom 3, look at Doom 3 benchmarks. If you need to encode video, look at video benchmarks. If you need to do integer computations, look at integer benchmarks. Xeons probably kick AMD's ass at some applications, and AMD might beat the Xeon at others. You can't just say that one is "better" than the other in general.

    4. Re:Difficult to trust? by drinkypoo · · Score: 2, Insightful

      Yes, yes we are. :)

      You definitely have to be careful about which ones you pick, though, and if you're really worried about performance you have to do what they did in this review, and try different settings with different programs because different flags will produce different results on different programs.

      In general I use -O3 on older processors, like pentium2 cores, and -O2 on newer ones. I don't know if it's still true but -O3 was known to cause problems (errors, not just a performance hit) with GCC3 and Athlon processors, as well as MIPS processors, though that may have changed in 3.3 or 3.4.

      Remember, gentoo's strategy of compiling everything is about more than cflags, it's also about the USE flags that ensure that you get support for the things you want, and skip the support for things you don't need. There's more than one kind of optimization.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  8. Lame conclusion? by Quixote · · Score: 4, Informative
    The comparison concludes with the wishy-washy statement:
    After all is said and done it became difficult (nearly impossible?) to justify the Xeon processor in a UP configuration over the Opteron 150,

    Huh? Here are some numbers:

    • POV-Ray 3.50c: Opteron is 40% faster
    • Crafty v19.15: Opteron is 70% faster
    • TSCP: 10% faster
    • PostgreSQL test-insert and test-select: Opteron takes 60% of the time it takes Xeon
    • MySQL test-insert: Opteron takes 80% of the time it takes Xeon.
    In almost every benchmark, where proper optimizations are used (and why shouldn't they be? Who in his/her right mind would not use proper optimizations??), the Opteron destroys the Xeon.
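
The list mixes two ways of reporting results ("X% faster" and "takes Y% of the time"), which are not directly comparable. A small sketch of the conversion, with function names invented for this example:

```python
def speedup_from_time_fraction(fraction):
    """If A needs `fraction` of B's time, A is 1/fraction times as fast."""
    return 1.0 / fraction


def percent_faster(fraction):
    """Express the same result as 'percent faster'."""
    return (speedup_from_time_fraction(fraction) - 1.0) * 100.0


if __name__ == "__main__":
    for name, frac in [("PostgreSQL", 0.60), ("MySQL test-insert", 0.80)]:
        print(f"{name}: {percent_faster(frac):.0f}% faster")
```

Taking 60% of the time works out to roughly 67% faster, and 80% of the time to 25% faster, which puts the database results on the same scale as the POV-Ray and Crafty figures.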
  9. Re:very little grey area by eddy · · Score: 5, Informative

    There are older dual and quad Opteron vs Xeon reviews around.

    When it comes to (Java) webservers and/or MySQL, the Opteron definitely has the advantage. In some cases, the Opteron simply annihilates the Xeon, but luckily for Intel the latter offers some resistance in our GZIP dominated benchmarks.

    Humorously, they also say this:

    The Opteron will probably remain the fastest CPU for the server tasks tested here until Intel introduces Nocona, the Xeon Prescott at 3.4-3.6 GHz (1 MB L2, 800 MHz FSB) at the end of the 2nd quarter of 2004.

    Now we know that the Nocona is here, and it's getting slaughtered at the Altar of The Opteron.

    --
    Belief is the currency of delusion.
  10. Hyperthreading is not good for these benchmarks by DJStealth · · Score: 4, Insightful

    As hyperthreading cuts the L2 cache in HALF, it should be disabled before doing any of these benchmarks. Hyperthreading only seems to improve the multithreading ability. These benchmarks being run on a single process are not realistic.

    1. Re:Hyperthreading is not good for these benchmarks by mczak · · Score: 2, Informative

      Hyperthreading doesn't cut the L2 cache in half. At least not statically; it is possible that if you're unlucky it will indeed look like each process only has half the cache.

    2. Re:Hyperthreading is not good for these benchmarks by Glock27 · · Score: 2, Insightful
      As hyperthreading cuts the L2 cache in HALF, it should be disabled before doing any of these benchmarks. Hyperthreading only seems to improve the multithreading ability. These benchmarks being run on a single process are not realistic.

      Since it isn't possible to dynamically turn hyperthreading on and off while the system is running, the benchmarks should be run in the mode most systems will use - with (highly touted) HYPErthreading turned on. After all, it is supposed to be a useful feature...

      Personally, if I really need to run two threads with top efficiency, I'll invest in a dual 2xx Opteron box - with two real processors rather than an extra sorta-processor. That way if I have to pay a dual CPU license fee at least I'll get my money's worth. :-)

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
  11. Re:Opteron vs. A64 by Ianoo · · Score: 3, Interesting

    Actually, your information is out-of-date. The new Socket 939 Athlon 64's (both the + and FX series) feature a dual-channel memory controller for unregistered DDR SDRAM (this is one of the big reasons for introducing the new socket in the first place).

    This still leaves me wondering why an Opteron 250 (2.4GHz, 1MB L2 cache) seems to so seriously outperform an Athlon 64 3500+ (2.2GHz, 512KB L2 cache).

  12. AnandTech Biased by ironwill96 · · Score: 2, Interesting

    It seems to me that AnandTech seems to be biased in Intel's favor for some odd reason. Either that, or that particular reviewer happens to be. Last week in their other review they said the Intel Xeon processor was way better - even when the results were about the same skewed in Intel's favor. Now that the results are skewed toward AMD the reviewer still refuses to see that the Opteron is a better processor, is available NOW, and is $250 cheaper than the Xeon-yet-to-be-released that they are comparing it to.

    *Sigh* I've lost all faith in reviews by some of these hardware sites lately - they seem to be getting paid by someone to make invalid conclusions (or none at all) from fairly conclusive data.

    --
    "To strive, to seek, to find, and not to yield." - Tennyson
    1. Re:AnandTech Biased by swordgeek · · Score: 2, Insightful

      Anandtech isn't biased, it's incompetent.

      Don't get me wrong--I've liked Anand and company since they first hit the internet. They don't generally have an axe to grind or an ego to boost (both of which TomsHardware suffers from terribly), but they don't have the slightest bloody clue about statistics, or significance.

      Fun to read, and not consistently biased, but not a great source of actual benchmarking or review information. (techreport.com is better for that)

      --

      "People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
  13. Those results speak for themselves by Stevyn · · Score: 2, Informative

    Other than a few benchmarks that were either synthetic or not compiled specifically for the processor, AMD whooped Intel's ass. Some of the gains were quite significant.

    However, this speed increase seems to depend on being able to compile your software from scratch which is generally unknown in the windows world. That should change in the future, but for now it's still a tough call whether or not to buy one now. But if you're running gentoo, let the funroll-loops begin!

  14. Re:Opteron vs. A64 by at_18 · · Score: 5, Informative

    This still leaves me wondering why an Opteron 250 (2.4GHz, 1MB L2 cache) seems to so seriously outperform an Athlon 64 3500+ (2.2GHz, 512KB L2 cache).

    When people say that the first article was bad, it's because it was really bad: 64-bit binaries for Intel vs. 32-bit binaries for AMD, copy&pasted benchmark results from previous 32-bit benchmarks, tests (PI digit computation) that measured the libc optimization instead of the actual benchmark (when removing the printf() it got about a 10x boost). People on aceshardware forums were posting TSCP scores about 2x what Anandtech got, on the same processor. So the A64 3500+ scores you saw in that article are trash. Forget them.

  15. Other Ideas for benchmarks by johnhennessy · · Score: 4, Interesting

    How about profiling bytecode interpreters for the new breed 64 bit processors.

    Both Sun (the original innovators) and now Microsoft are putting their money on their bytecode (rather than binary) executables to try and avoid the whole backwards compatibility problem when moving architectures. To get to grips with how important this is - Microsoft has only just recently managed to escape from the 16 bit code hell that it lived in for years (need proof - check out the Win16Lock you needed to get access to the video memory in DirectX).

    That said, I can't imagine that many (someone might enlighten us here) performance benchmarks that a 64 bit bytecode interpreter could do better in when compared to its 32 bit smaller brother.

    What would be interesting here would be to see how Java's bytecode and CIL scale to 64 bit. My first guess would be that Java should scale better (with Sun's heritage of 64 bit platforms) but I wouldn't be surprised if MSFT weren't too far behind, as they were always keeping their eye on this test when designing the CIL. This would also be a good chance for the Mono project to try a "ours is better than yours" benchmark for their interpreters.

    --
    [ Monday is a terrible way to spend one seventh of your life. ]
  16. Re:opteron by NoMercy · · Score: 2, Informative

    The critical point being that Opterons, unlike their Athlon 64 cousins, have more HyperTransport interfaces, allowing them to be used in a multi-processor environment. Depending on the series number, up to 8-way systems can be built, though I think the largest Tyans only carry 4 at present.

    There's other minor differences but *goes off dreaming about a 4-way processor in a database server*

  17. ?!?! nocona ??! by rmr00t-f · · Score: 2, Funny

    cona in portuguese jargon, is a vagina... :D

  18. Re:Original source? by GuyFawkes · · Score: 2, Informative

    quoted in the submission
    http://www.realworldtech.com/forums/index.cfm?action=detail&PostNum=2527&Thread=1&entryID=35446&roomID=11

    hasn't been rejected yet, still pending, but no doubt will be.

    --
    http://slashdot.org/~GuyFawkes/journal
  19. Re:set-up benchmarks? by sigaar · · Score: 2, Interesting


    It really depends on what the rest of the hardware in the box is. AMD's (especially K6-II/III and Duron) CPUs tend to be seen as the low cost alternative and put in a box with a cheapo mobo, cheap mem and everything that goes with it, more often than Intel's CPUs. This is just my observation in dealing with a lot of SMEs, some who go all out and some who try to save where ever possible.

    Shining example. We run an Astaro firewall for one of our clients. At first they didn't have a machine available when we wanted to start it as a proof of concept. We used one of our own boxes standing around the office, a Duron 800mhz on a PC-Chips board with SiS everything onboard, 512MB SD-RAM running at 100mhz. This PC worked quite nicely, and load never went past about 0.90.

    Later they retired one of their desktops to be the Astaro box. It's a P4 core 2Ghz Celeron, Intel board, 512MB SD-RAM (at 133Mhz). Load is constantly on 5.0. We've swapped out everything on that box, except the CPU. Even with a DDR board, it's still running at an excessively high load.

    Another example. I have an AthlonXP 2400+ on a SD-RAM board. A friend of mine has a 3ghz HT P4 with DDR333. He helped me once make ogg files of various quality of a movie's sound to compare. The P4 was only a fraction faster per file than the Athlon. Encoding two files at a time, we expected the P4 to be much quicker overall, but despite the HT, the Athlon was actually quicker per file. The encoding time per file stayed the same (time divided by two files), while on the P4 it took longer per file if we did two at a time.

    This doesn't mean that the Athlon is always a faster CPU. My friend's gaming is a bit smoother, and he compiles KDE for example quite a bit quicker too. It's just that the performance depends entirely on what you do, and what quality hardware you use. If you put an Athlon on a good motherboard, it will kick arse. If you put a P4 on a crap board, it will suck.

    --
    sigaar
  20. Re:Is there somewhere that details all the opteron by Fweeky · · Score: 3, Interesting

    240 = 1.4GHz, £145
    242 = 1.6GHz, +£15 / +14% faster clock
    244 = 1.8GHz, +£90 / +28%
    246 = 2.0GHz, +£190 / +43%
    248 = 2.2GHz, +£345 / +57%
    250 = 2.4GHz, +£465 / + 71%

    First step's a no-brainer; next one isn't too bad, after that you're hitting significant diminishing returns, with each 200MHz gap being a smaller proportion of the total clock, not to mention other things becoming more likely to bottleneck (IO; memory bandwidth, disk latency, network, PCI bus, etc).
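
The diminishing returns on clock speed can be checked from the list alone. This sketch uses only the clock figures (the prices are left out because it is not stated whether they are per step or cumulative) and computes each part's gain over the slowest part and over the part just below it.

```python
CLOCKS_GHZ = {240: 1.4, 242: 1.6, 244: 1.8, 246: 2.0, 248: 2.2, 250: 2.4}


def gains(clocks):
    """Return (model, % over slowest part, % over previous part) per model."""
    models = sorted(clocks)
    base = clocks[models[0]]
    rows = []
    prev = None
    for model in models:
        over_base = (clocks[model] / base - 1) * 100
        over_prev = 0.0 if prev is None else (clocks[model] / clocks[prev] - 1) * 100
        rows.append((model, over_base, over_prev))
        prev = model
    return rows


if __name__ == "__main__":
    for model, over_base, over_prev in gains(CLOCKS_GHZ):
        print(f"{model}: +{over_base:.1f}% vs 240, +{over_prev:.1f}% vs previous")
```

The gain over the previous model shrinks from about 14% for the first step to about 9% for the last, which is the "smaller proportion of the total clock" effect described above.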

    Core differences are going to be minimal, and hypertransport's remained at 800MHz across the S940 range afaik, so the clocks *should* be a pretty accurate upper bound on the performance differences within each range.

  21. My own results by davros74 · · Score: 2, Interesting

    We have been benchmarking several loaner boxes at work to determine what will be our next purchases for our compute farm. We do primarily ASIC and FPGA design, simulation and verification. We have been in dire need of >4GB boxes, and until just recently, we had been forced to run on Solaris machines to get 8GB.

    The day of the Opteron, however, has come at last:
    All these were run with stock tools in 32-bit mode, no fancy compiler optimizations. These are the same programs that we run on 2GHz P4s.

    Agilent 3070 VCL vector conversion Perl program (which I wrote, this is very typical of the Perl programs we run to process large vector files - the benchmark only times data processing in memory, no file IO on read/write):
    Sun Blade-1000 750MHz: 103.08 sec
    P4 3.06GHz: 36.93 sec
    Opt 148 (2.2GHz): 27.01 sec
    Quad Opt 848: 27.42 sec
    Quad Xeon64 (3.6GHz): 31.17 sec

    Modelsim 5.8c simulation of LogicBIST simulation on 50K Flop ASIC:
    P4 3.06GHz: 5955 sec
    Opt 148: 3798 sec
    4x Opt 848: 5985 sec (See note below)
    4x Xeon64: 4858 sec

    Mentor Flextest fault grading using make -j1, -j2 and -j4 (parallel runs, results combined in later step that is not benchmarked):
    Sun Blade-1000: 7362 sec(-j1)
    P4 3.06GHz: 2188 sec(-j1)
    Dual P4 3.06GHz: 2189 sec/1333 sec (j2)
    Opt 2.2GHz 128: 1493 sec
    4x Opt 2.2GHz 848: 1562 sec(j1)/ 779 sec(j2)/ 393 sec (j4)
    4x Xeon64 3.6GHz: 1465 sec(j1)/796 sec(j2)/ 879 sec(j4)

    Mentor LbistArchitect on 50K ASIC:
    Sun Blade-1000: 15698 sec
    P4 3.06GHz: 3877 sec
    Opt 148: 2845 sec
    4x Opt 848: 3534 sec (See note below)
    4x Xeon-64 3.6GHz: 2604 sec

    Note - the poor performance of the quad opteron box was done on RedHat Enterprise Linux 3 AS-6, and I noticed that the SMP kernel did NOT have CONFIG_K8_NUMA set to y, so it's not fair to judge those numbers until we get a new kernel with ccNUMA support. I have run synthetic benchmarks on them too, and the memory performance on the Quad Opteron was indeed hurt by the lack of CONFIG_K8_NUMA in the linux kernel.
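
Checking a kernel build for an option like this is easy to script. The sketch below parses the usual `CONFIG_X=y` / `# CONFIG_X is not set` format; the function name is invented here, the /boot/config-<release> path is a common convention rather than a guarantee, and the option name is the one given in the post (other kernel versions may name it differently).

```python
import platform
import re


def config_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...) or None if unset."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = "/boot/config-" + platform.release()  # common location, not guaranteed
    try:
        with open(path) as fh:
            print("CONFIG_K8_NUMA =", config_value(fh.read(), "CONFIG_K8_NUMA"))
    except OSError as exc:
        print("could not read", path, "-", exc)
```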

    Clearly though, the HyperTransport makes the Quad Opteron box scale very well, whereas the Quad Xeon box choked on 4 threads, probably because the memory bus became saturated and the processors starved for data.
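
The scaling claim can be put into numbers using the Flextest timings posted above. This is a simple sketch: speedup is the -j1 time divided by the -jN time, and efficiency is speedup divided by N. The dictionary layout and function name are just for this example.

```python
FLEXTEST_SECONDS = {
    "4x Opt 848": {1: 1562, 2: 779, 4: 393},
    "4x Xeon64": {1: 1465, 2: 796, 4: 879},
}


def scaling(times):
    """Return {jobs: (speedup, efficiency)} relative to the -j1 run."""
    base = times[1]
    return {jobs: (base / t, base / t / jobs) for jobs, t in sorted(times.items())}


if __name__ == "__main__":
    for box, times in FLEXTEST_SECONDS.items():
        for jobs, (speedup, eff) in scaling(times).items():
            print(f"{box} -j{jobs}: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

On these figures the quad Opteron reaches about 3.97x at -j4, while the quad Xeon drops from about 1.84x at -j2 to about 1.67x at -j4.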

    Also, any serious optimizations need to use gcc-3.4.1 - which has specific optimizations for both Opteron and Nocona cores. gcc-3.4.0 does not have specific optimizations for Nocona ("Xeon64") cores. gcc-3.x does not have specific optimizations for Opteron.

    Anyway, our decision has been made - we are buying Opteron 150s for all our new compute farm boxes.

  22. Re:Xeon lose at SPEC too. by turgid · · Score: 2, Informative

    Synthetic benchmarks like SPEC often give very different results to real world applications. The fact that the Opteron is faster in the SPEC benchmarks and many real-world tests speaks for itself.

  23. Fair comparison? by illumina+us · · Score: 2, Insightful
    This is comparing Intel's latest chip to a very old Opteron.

    First of all, AMD's Opteron 150 is the highest performing AMD workstation CPU money can buy
    What about the AMD Opteron 850?
    --
    -illumina+us "I put on my robe and wizard hat..."
    1. Re:Fair comparison? by Lucretian · · Score: 2, Informative

      The way the Opteron numbering system works is that the first digit is the number of CPUs you can run in SMP. The 150 is the fastest single CPU you can buy right now; the 850 runs at the same clock rate as the 150 but can run in an 8-way Opteron system (if the boards ever become available). Right now you'd mainly find the 850s in a 4-way system. The 850 would definitely be the most expensive Opteron, but as a single chip would perform the same as the 150.
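
The numbering scheme described here can be sketched as a small decoder. The first digit comes straight from this comment; the clock mapping (x40 = 1.4 GHz, each +2 adding 200 MHz) is an assumption inferred from the 2xx list and the 148/848 figures posted elsewhere in the thread, so treat it as valid only for the models mentioned there.

```python
def decode_opteron_model(model):
    """Split a three-digit Opteron model number into (max_cpus, clock_ghz)."""
    if not 100 <= model <= 999:
        raise ValueError("expected a three-digit model number")
    max_cpus = model // 100
    suffix = model % 100
    # Assumption: x40 = 1.4 GHz and every +2 in the suffix adds 200 MHz.
    clock_ghz = round(1.4 + (suffix - 40) / 2 * 0.2, 1)
    return max_cpus, clock_ghz


if __name__ == "__main__":
    for model in (150, 248, 850):
        cpus, ghz = decode_opteron_model(model)
        print(f"Opteron {model}: up to {cpus}-way, {ghz} GHz")
```

Under that assumption the 150 and the 850 decode to the same clock, which matches the point that the 850 is no faster as a single chip.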