Linux Shootout: Opteron 150 vs. Xeon 3.6GHz Nocona
danalien writes "AnandTech stirred up a bit of controversy with their previous review, and they've now released a follow-up review in which they pit AMD's Opteron 150 against Intel's Xeon 3.6GHz Nocona (on Linux)."
No message here. Oh, did you know that an Athlon64 3000+ is within 2fps of a P4 3.4 Extreme Edition in Doom 3?
Look up the prices for those two items.
I thought it was Athlon 64 now?
To be able to show the real potential of the Opteron, you need to have more than one processor.
This lets you take advantage of the on-die memory controller by letting each processor do its own memory work, rather than making the northbridge do all the work.
If you want to use a single processor, you might as well use an FX-Whatever, since they are just an Opteron without MP capability and only one HT bus.
It's good to see benchmarks between processors in the same family, but is there anywhere that regularly tests CPUs across families? x86, PPC, SPARC, VIA, etc. I'd like to see comparisons like that to see how the various architectures' strengths and weaknesses stack up.
There's been some controversy about benchmarks being set up in Intel's favor, allowing its processors to win most of them, even at a lower clock frequency.
But I've seen some computers get a speed boost after switching only the processor, from AMD to an Intel chip of comparable clock frequency, especially in games. And I've seen some go from Intel to AMD and lose speed, mainly in games.
I don't know if recent games (I've seen these effects mainly with Neverwinter Nights) are compiled with optimizations for Intel's i686 line, but the fact is that AMD performed worse. On these particular computers, with these particular games, of course.
I submitted this story an hour or two ago, but thinking about it, it will probably be rejected just like everything else and then pop up under someone else's name.
So what the hell.
Opteron Exposed: Reverse Engineering AMD K8 Microcode Updates
Summary
This document details the procedure for performing microcode updates on the AMD K8 processors. It also gives background information on the K8 microcode design and provides information on altering the microcode and loading the altered update for those who are interested in microcode hacking.
Source code is included for a simple Linux microcode update driver for those who want to update their K8's microcode without waiting for the motherboard vendor to add it to the BIOS. The latest microcode update blocks are included in the driver.
Background
Modern x86 microprocessors from Intel and AMD contain a feature known as "microcode update", or as the vendors prefer to call it, "BIOS update". Essentially the processor can reconfigure parts of its own hardware to fix bugs ("errata") in the silicon that would normally require a recall.
This is done by loading a block of "patch data" created by the CPU vendor into the processor using special control registers. Microcode updates essentially override hardware features with sequences of the internal RISC-like micro-ops (uops) actually executed by the processor. They can also replace the implementations of microcoded instructions already handled by hard-wired sequences in an on-die microcode ROM.
AMD's U.S. Patent 6438664 ("Microcode patch device and method for patching microcode using match registers and patch routines") goes into substantial detail on this.
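The patent's title names the mechanism: match registers that watch microcode fetch addresses and divert execution to patch routines. The toy model below only illustrates that idea and is not AMD's actual design; the `MicrocodeEngine` class, the ROM contents, and the simplification that a patch routine always resumes at the next ROM address are all hypothetical.

```python
class MicrocodeEngine:
    """Toy sequencer: ROM micro-ops plus match registers that divert to patch routines."""

    def __init__(self, rom):
        self.rom = dict(rom)      # ROM address -> micro-op name (read-only in hardware)
        self.match = {}           # match registers: ROM address -> patch routine

    def load_patch(self, rom_addr, routine):
        """Arm a match register so that fetching rom_addr runs `routine` instead."""
        self.match[rom_addr] = list(routine)

    def run(self, start, count):
        """Sequence `count` ROM addresses from `start`, returning the micro-ops executed."""
        trace = []
        for addr in range(start, start + count):
            if addr in self.match:
                # Match hit: execute the patch routine, then resume at the next ROM address.
                trace.extend(self.match[addr])
            else:
                trace.append(self.rom[addr])
        return trace


rom = {0: "load", 1: "buggy_add", 2: "store"}   # hypothetical ROM contents
engine = MicrocodeEngine(rom)
before = engine.run(0, 3)                       # ['load', 'buggy_add', 'store']
engine.load_patch(1, ["fixed_add", "fence"])
after = engine.run(0, 3)                        # ['load', 'fixed_add', 'fence', 'store']
```

The point of the design is that the ROM itself never changes; only the small, writable match/patch state does, which is why an update has to be reloaded on every boot.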
Typically microcode update blocks are stored in the BIOS flash ROM and loaded into the processor as the system boots. They can also be loaded by the operating system; for instance, Linux contains a microcode device driver for Intel chips.
AMD recently released a "BIOS fix" to motherboard makers to address Erratum 109, in which REP MOVS instructions caused subsequent instructions to be skipped under specific pipeline conditions.
Previously it was not clear if and how AMD even supported microcode updates in the K8 family until this announcement. After analyzing a number of BIOS images, it appears that AMD has secretly used the microcode update facility on several occasions over the past few years, but obviously avoided publicly disclosing that it actually had bugs patchable in this manner.
Early K7 (Athlon) cores initially supported microcode updates as well, until, ironically, the microcode update mechanism itself was found to be broken and was subsequently listed as an erratum!
The following sections describe the microcode update procedure, obtained by clean room reverse engineering various vendors' BIOS code. The actual microcode update blocks are embedded in the BIOS image; the most recent updates (created June 2004) have been included in the Linux driver source code attached to this description.
Microcode Update Procedure
The update procedure expects the 64-bit virtual address of the update data, including the 64 byte header, to be in edx:eax:
edx = high 32 bits of 64-bit virtual address
eax = low 32 bits of 64-bit virtual address
ecx = 0xc0010020 (MSR to trigger update)
Execute wrmsr with these register values. If the address and update block data are valid, wrmsr completes successfully. Otherwise, a GP fault is taken.
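The register setup above amounts to splitting one 64-bit pointer into two 32-bit halves. The sketch below only computes the operand values; actually executing `wrmsr` requires ring 0, so it is not shown. The helper name `wrmsr_operands` and the example address are hypothetical.

```python
MSR_TRIGGER_UPDATE = 0xC0010020  # MSR written to trigger the update (per the text above)

def wrmsr_operands(update_vaddr):
    """Return (ecx, edx, eax) for the wrmsr that loads an update block at update_vaddr.

    update_vaddr is the 64-bit virtual address of the block, 64-byte header included.
    """
    if not 0 <= update_vaddr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    edx = (update_vaddr >> 32) & 0xFFFFFFFF   # high 32 bits
    eax = update_vaddr & 0xFFFFFFFF           # low 32 bits
    return MSR_TRIGGER_UPDATE, edx, eax

ecx, edx, eax = wrmsr_operands(0xFFFF81000012A000)  # hypothetical kernel address
print(f"ecx={ecx:#010x} edx={edx:#010x} eax={eax:#010x}")
```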
The microcode does not appear to update MSR 0x8B with the new update signature as it does on Intel processors, despite the fact that some BIOS code I have analyzed does seem to check this field. It is possible the MSR is only updated under certain conditions, for instance when microcode is loaded before initializing the cache controller. Nonetheless, as we shall see below, the processor is clearly doing something internally when it claims to accept an update in this manner.
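To check from Linux whether MSR 0x8B changes after an update, it can be read from userspace. This sketch assumes the kernel's msr driver is loaded (`modprobe msr`, root required), which exposes each MSR as 8 bytes at a file offset equal to the MSR index in `/dev/cpu/N/msr`; the `read_msr` helper and its `path` override (useful for testing against a plain file) are my own illustration.

```python
import os
import struct

MSR_SIGNATURE = 0x8B  # MSR that Intel parts update with the new update signature

def read_msr(msr, cpu=0, path=None):
    """Read a 64-bit MSR through Linux's msr driver.

    The driver maps MSR index -> file offset, and each read returns 8 little-endian bytes.
    """
    path = path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {msr:#x}")
    return struct.unpack("<Q", data)[0]
```

Usage: compare `hex(read_msr(MSR_SIGNATURE))` before and after loading an update.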
The update generally takes around 5500 clock
http://slashdot.org/~GuyFawkes/journal
I recently installed Fedora Core 2 on a dual Opteron 248 system (Sun V20z) and was amazed at the sheer grunt of the thing. Why anyone would even consider buying a Xeon just amazes me. I ran one of my own integer- and memory-heavy benchmark programs (single-threaded) against my Athlon XP 2200+, and a single Opteron processor was 3x faster than the XP with only a 400MHz higher clock speed. These things are amazing. Intel should be crapping themselves, and I am sure they would be if it weren't for the cozy deal with Dell and the number of sites that have a Dell-only policy. In a true free market they would be toast.
"I have the attention span of a strobe lit goldfish, please get to the point quickly!"
Analogies don't equal equalities, they are merely somewhat analogous.
After all is said and done it became difficult (nearly impossible?) to justify the Xeon processor in a UP configuration over the Opteron 150,
Huh? Here are some numbers:
- POV-Ray 3.50c: Opteron is 40% faster
- Crafty v19.15: Opteron is 70% faster
- TSCP: 10% faster
- PostgreSQL test-insert and test-select: Opteron takes 60% of the time it takes Xeon
- MySQL test-insert: Opteron takes 80% of the time it takes Xeon.
In almost every benchmark where proper optimizations are used (and why shouldn't they be? Who in his/her right mind would not use proper optimizations?), the Opteron destroys the Xeon.
"As we can see above, the difference between the two CPUs seems exaggerated and difficult to trust."
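The list mixes two kinds of figure: "X% faster" and "takes Y% of the time". To put them on one scale, a time ratio can be converted into a throughput speedup. A small sketch; the helper name is my own.

```python
def speedup_from_time_ratio(time_ratio):
    """Convert 'A takes time_ratio of B's time' into 'A is N% faster than B'."""
    return (1.0 / time_ratio - 1.0) * 100.0

for label, ratio in [("PostgreSQL insert/select", 0.60), ("MySQL insert", 0.80)]:
    print(f"{label}: Opteron {speedup_from_time_ratio(ratio):.0f}% faster")
```

So "60% of the time" is roughly 67% faster, and "80% of the time" is 25% faster.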
---
The Opteron seems to trounce the Xeon in most of the tests. In the tests where it doesn't win outright, it is only barely edged out. And this is an Intel processor that mere mortals can't even buy right now, up against an Opteron I can order today and have in my hands tomorrow. That doesn't bode well for Intel.
Given the wide performance delta in AMD's favour in the database tests in this uniprocessor setting, I can't wait to see how things stack up with dual or quad processors. I know what I'LL be using next time I need to spec a departmental or enterprise database box.
Cheers,
Anandtech originally posted an article which was a comparison of the Intel Xeon Nocona ("3.6F") and the AMD Athlon 64 3500+. The Xeon "won" most of the benchmarks by a good amount.
The criticisms were that the Xeon is not a desktop CPU or, vice versa, that the Athlon is not a workstation/server CPU. But are they so different? The Xeon has 1MB of L2 cache, and so does the P4 Prescott (and presumably the Prescott with x86-64 enabled), and both run at the same speed.
Similarly, the 3500+ runs at 2.2GHz and has a 512KB L2 cache, whilst the Opteron 250 runs at 2.4GHz and has a 1MB L2 cache.
With this in mind, can anyone explain to me why the Opteron seems to perform much better? The benchmarks appear to show the Xeon trouncing the A64 and the Opteron 250 trouncing the very same Xeon in 64-bit.
So what's the deal? Are AMD's desktop chips choked by only 512KB of L2 cache? (And yet the new Socket 939 A64s seem to be dropping back to 512KB of L2 cache, whereas a few of the Socket 754 chips had a full 1MB.)
I'm no processor expert, I just wondered if anyone could explain the "big difference".
When racism affects your judgement of an article, it's the equivalent of having no judgement in the first place. Racism is the single most evocative measure of a person's lack of judgement or of mental abstraction capabilities.
I can say it kicks all kinds of ass in Doom3. :)
I see that all sorts of business/content creation benchmarks were used in the AnandTech review, but when a Baron of Hell corners you in an ancient underground Mars excavation site... well, let's just say your death animation looks all the sweeter with those extra fps.
You seem to be confused about the MHz thing... An Opteron at the same clock frequency would win every benchmark against a P4: an Athlon 64 at 2GHz is faster than a 3.4GHz P4 in many benchmarks (and slower in others). Intel's design makes very high clock frequencies possible, but at the expense of a very long pipeline, which hurts applications with many branches or unpredictable memory accesses.
If you really need high performance, you'll have to try and figure out the performance for the specific application you need.
A rough summary (when comparing processors with the same rating, not same clock frequency):
The Athlon 64 wins most gaming benchmarks against the P4.
The Athlon (and especially the Opteron with its larger cache) dominates database benchmarks.
Things are very different with video encoding - there Intel usually wins.
The main difference between the two is rather simple. The Opteron uses a dual-channel memory controller; you'll see that most motherboards for the Opteron require registered memory. The A64, on the other hand, being a consumer chip, uses a single-channel memory controller. All in all, the Opteron has double the theoretical memory bandwidth.
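The "double the bandwidth" claim follows from peak bandwidth being transfer rate times bus width times channel count. A sketch assuming DDR400 (PC3200) memory on 64-bit channels for both chips; the memory speed is my assumption, and the 2:1 ratio holds whatever the speed.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9

single = peak_bandwidth_gbs(400, channels=1)   # single-channel A64, assuming DDR400
dual = peak_bandwidth_gbs(400, channels=2)     # dual-channel Opteron, assuming DDR400
print(f"single: {single:.1f} GB/s, dual: {dual:.1f} GB/s")
```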
On the other hand, Intel's version of 64-bit computing still does not have a memory controller built into the chip. Although it uses dual-channel memory, it still has the northbridge bottleneck that AMD set out to avoid from the beginning.
What would be nice to see, as others have commented, is a scaled series of benchmarks. You think the Opteron is impressive now? Try 2- and 8-processor configurations against the Xeons. You'll see how much of a role the memory controller plays. In the end, add up the GHz and price points and you'll be able to see how much bang you'll get for your server $.
With the crap stories that have been posted today, I think perhaps the Slashdot editors ARE taking it easy on the seventh day.
As hyperthreading cuts the L2 cache in HALF, it should be disabled before doing any of these benchmarks. Hyperthreading only seems to improve the multithreading ability. These benchmarks being run on a single process are not realistic.
>This makes the Opteron 17% faster, not 70% faster.
So Xeons get creamed at SPEC too, you troll. Where's the beef?
It seems to me that AnandTech is biased in Intel's favor for some odd reason. Either that, or that particular reviewer is. Last week, in their other review, they said the Intel Xeon processor was way better, even though the results were about the same or only slightly skewed in Intel's favor. Now that the results are skewed toward AMD, the reviewer still refuses to see that the Opteron is a better processor, is available NOW, and is $250 cheaper than the yet-to-be-released Xeon they are comparing it to.
*Sigh* I've lost all faith in reviews by some of these hardware sites lately - they seem to be getting paid by someone to make invalid conclusions (or none at all) from fairly conclusive data.
"To strive, to seek, to find, and not to yield." - Tennyson
Other than a few benchmarks that were either synthetic or not compiled specifically for the processor, AMD whooped Intel's ass. Some of the gains were quite significant.
However, this speed increase seems to depend on being able to compile your software from scratch, which is generally unheard of in the Windows world. That should change in the future, but for now it's still a tough call whether or not to buy one. But if you're running Gentoo, let the funroll-loops begin!
http://www.debian.org/ports/s -status
http://www.debian.org/devel/debian-installer/port
Sarge in September?
http://debianplanet.org/node.php?id=1131
What?!
Cigarettes are mentally diseased?
My God!
How about profiling bytecode interpreters on the new breed of 64-bit processors?
Both Sun (the original innovators) and now Microsoft are putting their money on bytecode (rather than binary) executables to try to avoid the whole backwards-compatibility problem when moving architectures. To get to grips with how important this is: Microsoft only recently managed to escape from the 16-bit code hell it lived in for years (need proof? Check out the Win16Lock you needed to get access to video memory in DirectX).
That said, I can't imagine many performance benchmarks (someone might enlighten us here) in which a 64-bit bytecode interpreter would beat its smaller 32-bit brother.
What would be interesting here would be to see how Java's bytecode and CIL scale to 64 bits. My first guess would be that Java should scale better (given Sun's heritage of 64-bit platforms), but I wouldn't be surprised if MSFT weren't too far behind, as they always kept an eye on this when designing CIL. This would also be a good chance for the Mono project to try an "ours is better than yours" benchmark for their interpreters.
[ Monday is a terrible way to spend one seventh of your life. ]
When you don't have the material nor the time to validate an article, you have to judge its validity based on reputation. If you don't know the person who is making the article, then you have to use general knowledge, like cultural difference, to determine the value of the article.
For example, I won't trust an article praising Intel over AMD if it was written by a Jewish person. I'd think there is a possibility he may have a political agenda. The opposite is also true: I won't trust someone praising AMD if he is openly against Israel. You may call this racism if you like, but it's certainly not a lack of judgment.
Also, most people use racist insults without really being racist. It's a way to differentiate themselves from the other. It's like the brunette who bitches about blondes.
The parent poster also said the guy was only a kid, but you didn't reply to this. My guess is you're the kind of person who feels good about himself by "fighting" racism. My guess is you're the kind of person who refuses to see cultural differences. And to me that's the most evocative measure of a person's lack of judgement.
Ehm, try both Linux and BSD? The processors named are not exactly rare. Both are supported by Linux and maybe BSD.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Does the above article have an original source? I'm guessing it didn't just spontaneously appear on half a dozen weblogs, it was probably written by someone who would like credit for his/her work? Perhaps this is why the story was rejected?
-jim
Why don't they just summarize each test/graph with the PERCENT difference?
Instead, we get gems like this:
This becomes our first real world test where we see Intel come out ahead. This coincides with what we saw on the previous page with the synthetic benchmark.
On a result where the Xeon is 55.894 and the Opteron is 56.26... Since when is half a percent significant?
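Computing the relative gap the way the poster asks makes the point concrete. Whichever direction is "better" in that test, the difference is well under one percent (about 0.65%). The helper name is my own.

```python
def percent_difference(a, b):
    """Relative difference of b over a, in percent."""
    return (b - a) / a * 100.0

xeon, opteron = 55.894, 56.26
print(f"{percent_difference(xeon, opteron):.2f}%")
```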
And where is the kernel compile benchmark?
In Portuguese slang, "cona" means vagina... :D
But the problem is that many of those shortcut decisions aren't really accurate. The other problem is that you don't need to trust the article if the facts are supported by a methodology. If you still don't trust that, then you've got to believe that actual lies, rather than bias, are taking place. And you can bet that people will check the facts if the methodology is included. It's just silly to use cultural stereotypes when there are more fine-grained indicators available (such as the style, the presentation of methodology, etc.).
And the sort of differentiation you refer to (blonde vs. brunette, for instance) is rather insidious. It sounds harmless, but it's amazing how quickly that sort of us-versus-them mentality can balloon. You may recall a psychology study in which the experimenter taught young children to form groups based on eye color; after a while, the groups started to display shockingly divisive behavior. Another is the Zimbardo prison experiment (Google "Stanford Prison Experiment"), which showed that when two randomly formed groups, one of prisoners and one of guards, were asked to role-play, the role play went beyond simple acting to actual maliciousness. The point is that these distinctions are self-reinforcing and can lead to excessive bias. Yes, you might not trust an article that has no methodology, but if the article is up-front about its methodology and you can evaluate it, then you should judge it on the merits: it doesn't matter who wrote it. If you really just want an opinion without doing any analysis yourself, that's your prerogative, and you might want to seek out the best sources possible. But I think cultural distinctions are far less important than the concrete evidence the article provides.
There seem to be a million of them.
A 240 is 1/3 the price of a 250; is it still a great processor? I haven't found a site that goes into all of them and, of course, the reasons for favoring one over another.
I don't know, the world is full of religious fundamentalists and they don't even know simple facts about their religions.
Genesis: And the evening and the morning was the first day.
Go read this InfoWorld review. They don't list any real figures that I could track; do other reviews replicate their results?
These benchmarks show the Opteron 150, a $600, real-world-available chip, handily beating an $850, non-real-world-available chip. But that's still not the half of it.
Wait until tests are run on multi-CPU machines. Because the Opterons scale so much better than Xeons, the performance advantage of the Opteron will be even greater.
When I've bought a quad Xeon machine, I've never been at all impressed with the scalability. When I bought a quad Opteron, I was blown away.
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
240 = 1.4GHz, £145
242 = 1.6GHz, +£15 / +14% faster clock
244 = 1.8GHz, +£90 / +29%
246 = 2.0GHz, +£190 / +43%
248 = 2.2GHz, +£345 / +57%
250 = 2.4GHz, +£465 / +71%
First step's a no-brainer; the next one isn't too bad; after that you're hitting significant diminishing returns, with each 200MHz gap being a smaller proportion of the total clock, not to mention other things becoming more likely to bottleneck (I/O, memory bandwidth, disk latency, network, PCI bus, etc.).
Core differences are going to be minimal, and HyperTransport has remained at 800MHz across the Socket 940 range AFAIK, so the clocks *should* be a pretty accurate upper bound on the performance differences within each range.
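The diminishing returns can be read off the price list by looking at each 200MHz step on its own: the clock gain per step shrinks from about 14.3% to 9.1%, while the extra cost per step goes from £15 to £100 or more. A sketch using the prices quoted above (premiums over the 240); the function name is my own.

```python
# (model, clock in GHz, price premium over the 240 in GBP), from the list above
MODELS = [(240, 1.4, 0), (242, 1.6, 15), (244, 1.8, 90),
          (246, 2.0, 190), (248, 2.2, 345), (250, 2.4, 465)]

def step_costs(models):
    """For each consecutive pair: (from, to, % clock gain of this step, extra GBP for it)."""
    steps = []
    for (m0, f0, p0), (m1, f1, p1) in zip(models, models[1:]):
        steps.append((m0, m1, (f1 / f0 - 1) * 100, p1 - p0))
    return steps

for m0, m1, gain, cost in step_costs(MODELS):
    print(f"{m0}->{m1}: +{gain:4.1f}% clock for +£{cost} (£{cost / gain:.1f} per 1%)")
```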
We have been benchmarking several loaner boxes at work to determine our next purchases for our compute farm. We do primarily ASIC and FPGA design, simulation and verification. We have been in dire need of >4GB boxes, and until just recently we had been forced to run on Solaris machines to get 8GB.
The day of the Opteron, however, has come at last:
All these were run with stock tools in 32-bit mode, no fancy compiler optimizations. These are the same programs that we run on 2GHz P4s.
Agilent 3070 VCL vector conversion Perl program (which I wrote; it is very typical of the Perl programs we run to process large vector files. The benchmark times only the in-memory data processing, with no file I/O on read/write):
Sun Blade-1000 750MHz: 103.08 sec
P4 3.06GHz: 36.93 sec
Opt 148 (2.2GHz): 27.01 sec
Quad Opt 848: 27.42 sec
Quad Xeon64 (3.6GHz): 31.17 sec
Modelsim 5.8c simulation of LogicBIST simulation on 50K Flop ASIC:
P4 3.06GHz: 5955 sec
Opt 148: 3798 sec
4x Opt 848: 5985 sec (See note below)
4x Xeon64: 4858 sec
Mentor Flextest fault grading using make -j1, -j2 and -j4 (parallel runs, results combined in a later step that is not benchmarked):
Sun Blade-1000: 7362 sec (-j1)
P4 3.06GHz: 2188 sec (-j1)
Dual P4 3.06GHz: 2189 sec (-j1) / 1333 sec (-j2)
Opt 148 (2.2GHz): 1493 sec
4x Opt 848 (2.2GHz): 1562 sec (-j1) / 779 sec (-j2) / 393 sec (-j4)
4x Xeon64 (3.6GHz): 1465 sec (-j1) / 796 sec (-j2) / 879 sec (-j4)
Mentor LbistArchitect on 50K ASIC:
Sun Blade-1000: 15698 sec
P4 3.06GHz: 3877 sec
Opt 148: 2845 sec
4x Opt 848: 3534 sec (See note below)
4x Xeon-64 3.6GHz: 2604 sec
Note: the quad Opteron box's poor performance was measured on RedHat Enterprise Linux 3 AS-6, and I noticed that the SMP kernel did NOT have CONFIG_K8_NUMA set to y, so it's not fair to judge those numbers until we get a new kernel with ccNUMA support. I have run synthetic benchmarks on them too, and the memory performance of the quad Opteron was indeed hurt by the lack of CONFIG_K8_NUMA in the Linux kernel.
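Checking for this before benchmarking is easy. The sketch below assumes the build config is available either as `/proc/config.gz` (only if the kernel was built with CONFIG_IKCONFIG_PROC) or as `/boot/config-<release>`, the usual Red Hat location; the helper names are my own.

```python
import gzip
import os
import platform

def kernel_config_lines():
    """Read the running kernel's build config from /proc/config.gz or /boot/config-<release>."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read().splitlines()
    with open(f"/boot/config-{platform.release()}") as f:
        return f.read().splitlines()

def option_enabled(lines, name):
    """True if the config contains NAME=y; disabled options appear as '# NAME is not set'."""
    return any(line.strip() == f"{name}=y" for line in lines)
```

Usage: `option_enabled(kernel_config_lines(), "CONFIG_K8_NUMA")` should return True before the quad Opteron numbers are worth trusting.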
Clearly, though, HyperTransport makes the quad Opteron box scale very well, whereas the quad Xeon box choked on 4 threads, probably because the memory bus became saturated and the processors were starved for data.
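The Flextest -j1/-j2/-j4 times quoted above make the scaling difference explicit: speedup is the -j1 time divided by the -jN time, and parallel efficiency is speedup divided by N. The Opteron stays near 100% efficiency (about 3.97x at -j4, even on the non-NUMA kernel), while the Xeon box drops to about 42%. The helper name is my own.

```python
def scaling(times):
    """Given {jobs: seconds}, return {jobs: (speedup vs -j1, parallel efficiency)}."""
    base = times[1]
    return {j: (base / t, base / t / j) for j, t in times.items()}

opteron_848 = {1: 1562, 2: 779, 4: 393}   # 4x Opt 848, Flextest
xeon64 = {1: 1465, 2: 796, 4: 879}        # 4x Xeon64, Flextest

for name, times in [("4x Opteron 848", opteron_848), ("4x Xeon64", xeon64)]:
    for j, (speedup, eff) in scaling(times).items():
        print(f"{name} -j{j}: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```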
Also, any serious optimization needs gcc-3.4.1, which has specific optimizations for both the Opteron and Nocona cores. gcc-3.4.0 does not have specific optimizations for Nocona ("Xeon64") cores, and earlier gcc-3.x releases do not have specific optimizations for Opteron.
Anyway, our decision has been made - we are buying Opteron 150s for all our new compute farm boxes.
What I find interesting is that in the benchmarks the Opteron 150 kicks the pX to hell and back, yet the pX's 32/64-bit GFLOPS figures are higher. That suggests to me that they want people to believe those figures reflect real-life performance, when in actual fact they don't...
Well you can't win em all. Thank god you didn't accuse me of ageism too, or my post would have started to look REALLY bad!
Especially Australian ones on weekends! ;-)
What about the AMD Opteron 850?
-illumina+us "I put on my robe and wizard hat..."
Things are very different with video encoding - there Intel usually wins.
;)
When it comes to a standard A64 under Windows, yes. Opterons are pretty close (and in some cases faster), and when you move to dual-processor systems, Opterons dominate dual Xeons, even in media encoding, according to the reviews I've read and the tests I've done on my own system.
I know you said "usually", but I wanted to add a bit more specificity.
For example, I won't trust an article praising Intel over AMD if it was written by a Jewish person. I'd think there is a possibility he may have a political agenda. The opposite is also true: I won't trust someone praising AMD if he is openly against Israel. You may call this racism if you like, but it's certainly not a lack of judgment.
Okay, pardon my big fat ignorance here, but why do you say this? I have to know. The thing is, I am strongly critical of Israel (I don't see myself as against them so much as against some of the things they do as a matter of policy), and I happen to favor AMD, BUT my choice of AMD had nothing to do with this. Is there something about AMD being anti-Israel that I'm not aware of?
I think it has to do with Intel having a (very good) corps of engineers in Israel. Some people can make the massive leap from this tidbit to "Intel supports the murder of Palestinians!!!"
Or it could be something else.
Dissolve... Resolve... Evolve...
That group in Israel designed a fully asynchronous instruction decoder. They were able to get a 50% improvement over synchronous designs. This is very, very, very hard to do in digital logic design.
I originally wrote it for Real World Tech as a forum post. It was just a technical memo, not a full article like Crusoe Exposed was. I'm actually surprised neither of these made it as a Slashdot story (I wasn't aware they rejected unsigned articles.)
Obviously I cannot reveal my identity. That's why it was posted anonymously, like all my chip reverse engineering work.
Suffice it to say that I've been in contact with top level engineers at AMD and am working with them to fix the problem ASAP. An exploit at this level could be nasty.
One year ago I bought an Athlon XP 2600-based PC. At the time the chip itself cost 65 euros, which I thought was good value for a relatively speedy processor. I just checked the shop where I bought the PC (http://www.softworld.es/micro_amd/), and it seems the 2600 is still the best value for price/performance: the cheapest Athlon 64 is 253.95 euros, three times as expensive and only 15% faster. So much for a year's worth of technological advancement.
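Performance per euro makes the comparison concrete. This sketch assumes the 65 euros paid last year (today's shop price for the 2600 may differ) and takes "15% faster" at face value; the function name is my own.

```python
def value_ratio(price_a, perf_a, price_b, perf_b):
    """How many times more performance per euro chip A delivers than chip B."""
    return (perf_a / price_a) / (perf_b / price_b)

# Performance relative to the Athlon XP 2600 (= 1.0); the Athlon 64 is "only 15% faster".
ratio = value_ratio(65.00, 1.00, 253.95, 1.15)
print(f"Athlon XP 2600: {ratio:.1f}x the performance per euro")
```

Under those assumptions the XP 2600 delivers roughly 3.4 times the performance per euro.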
Shouldn't it be?:
47 65 65 6b 21 20 3a 70
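For anyone who doesn't read ASCII in hex, a one-line Python helper decodes it (the helper name is my own):

```python
def hex_to_ascii(hex_bytes):
    """Decode space-separated hex byte values into an ASCII string."""
    return bytes.fromhex(hex_bytes).decode("ascii")

print(hex_to_ascii("47 65 65 6b 21 20 3a 70"))  # Geek! :p
```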
Remembering that Sun generally uses pretty good quality componentry, as well as possibly some enhancements to make these systems fly, of course... The opteron has a higher bus speed and memory control and 64-bit goodness and... well... the list goes on, no?
Founder & COO, Hayai India (hayai.in) / USA (hayaibroadband.com)