Xeon vs. Opteron Performance Benchmarks
QuickSand writes "Anand got his hands on some of Intel and AMD's enterprise processors including 4MB L3 Xeons, and put them to the test. Results were a little varied as 4-way Opteron systems seemed to fare the best, although dual Xeon configurations almost always beat dual Opterons. The exact benchmarks are here."
Can somebody tell me if the IA-32e processors will be in the socket 478 format to work with existing boards, or will they require a whole new socket and chipset (rather than a BIOS update)? If they really are just "extensions" then I don't see why anything special would need to be on the motherboard, correct? The CPU should switch into 64-bit mode whenever the OS tells it to, right?
Xeons are almost always for servers, whereas Opterons can be for anything. Try running a Windows XP workstation on a dual Xeon system and you'll be very disappointed.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
It's probably due to the lack of knowledge/tools to bench-test anything else. I'd like to see SQL bench tests, IIS/Apache tests, etc., but just like a lot of other people, I don't know exactly how to do that. Though if I ran a site which made it my business to test hardware I'd definitely find out and learn how to do it.
I'd like to see more "Consumer Reports" type tests too. Test hardware configuration X as a high-volume SQL server, and show me how it's held up after a month, 3 months, 6 months, and a year. Yes, maybe I'd upgrade before then, but not everyone would, and I'd like to see common failures and problems down the line, not a 1-2 day test.
Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
One thing I did not understand is how the 3MB cache helps with a big database query. I thought a big query would thrash the cache, so there would not be much performance gain when working with a bigger code/data set. Also, for the four-CPU Opteron, is there a HyperTransport link from every CPU to every other CPU? Is it a mesh, or a ring where every CPU has only two connections, to its nearest neighbors?
Another thing I did not get is how Linux is handling (or not handling) the memory local to each CPU. This thing looks like a mini-NUMA type system. Does Linux actually try to keep the data in the RAM attached to the CPU that processes it? How does this really work?
Maybe you guys can help clear up my ideas.
- People who believe other people have no right to live, got no right to live ...
So I see that M$ Windows was used as the OS. Unless this was a prerelease of the 64-bit XP, they were running a 32-bit OS on the chips. So wouldn't that mean that this isn't a true test of the power? You're not taking full advantage of the 64-bit power.
Evolution or ID?
Folks were avoiding the Itanium because it was a disaster; slow and expensive. We've been looking at 64 bit computing for a while, because of the seamless > 4GB RAM capabilities. Intel's PAE extensions are OK, but they really didn't solve any of the problems we were having.
The net result was we went to 64-bit PPC architecture 3 years ago on those critical systems, and everything has been fine. AIX works great, and IBM's embrace of GNU/Linux means an easy learning curve for us Linux users.
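For readers who want the numbers behind the ">4GB" remark, here is a minimal sketch of the address-space arithmetic. It assumes the commonly cited widths: 32-bit addresses for plain x86, 36-bit physical addresses under PAE, and 48-bit virtual addresses on the first AMD64 parts. The function name is made up for illustration.

```python
GIB = 2 ** 30  # bytes in one GiB


def addressable_gib(address_bits):
    """Return how many GiB can be addressed with the given number of address bits."""
    return 2 ** address_bits // GIB


# Plain 32-bit addressing: the familiar 4 GiB ceiling.
print(addressable_gib(32))
# PAE widens *physical* addresses to 36 bits; each process still sees 32-bit pointers.
print(addressable_gib(36))
# Assumed 48-bit virtual address width for early AMD64 implementations.
print(addressable_gib(48))
```

The point of the comparison is that PAE raises the amount of RAM the machine can hold but not what a single process can address directly, which is why it is not "seamless".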
You are in a maze of twisted little posts, all alike.
Alright, I have had about 3 AMD processors die on me. I have owned about 4 Intel processors, all the way back to the original Pentium. Not one has ever had a problem.
Now... given this kind of statistics, as sad as it may sound I'd say I am willing to pay anything for an Intel just to avoid the headaches.
I thought the very definition of L3 cache was off die. If it is on die, wouldn't it be L2 cache, unless it does not run at core CPU speed?
But these days, with all the virtualization getting hot (VMware etc.), a server architecture with a single memory bus/controller is getting old.
I'd like to see some test on servers like the IBM x445 with NUMA.
The tests in this article involved running the exact same binaries (out-of-the-box Microsoft 386 stuff) on both types of CPUs, rather than code compiled to run natively. The Opterons were fighting with one hand tied behind their backs.
In other words, this benchmark is mainly only of interest to Microsofties. If that's what you run, then fine, the article may be useful to you and you may get something out of reading it.
If you are trying to maximize speed, though, then the software constraints that this test took place under are totally contrary to what you'd actually be doing (running code that is appropriate for the hardware).
BTW, another weird thing I noticed about this article: these guys use flash for static images of bar graphs. WTF? Anandtech, your w3b d3$1gn3rz R S0 31337!!!1
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
In addition, Anand used sub-optimal memory in the Opteron, and non-NUMA config. Looks like he had some Intel "assistance" in designing the "benchmarks" as well... the database read/write ratio is not at all realistic, favors the Xeon.
I have not heard that the Itaniums were slow since Intel came out with the Itanium2. Yes, the Itanium1's were dog slow. I've got 65 Itanium2 processors downstairs, and I'm happy with them. For our purposes (crunching numbers on very large datasets) the Itanium2 was the platform of choice because of its 64bit addressing, high memory bandwidth and good processor speed.
I wish we could get by with cheap Xeons, but they just don't cut the mustard for our applications.
I'll soon be finishing up my extensive long term testing of an Apollo 735 HP-UX Unix Workstation with a 125Mhz PA-7200 PA-RISC processor. I'll post the results for you if you are in the market for one of these. You can still find them on Ebay for about $5. ;)
Bad boys rape our young girls but Violet gives willingly.
Anandtech had an Opteron vs Xeon test earlier too, "AMD Opteron 248 vs. Intel Xeon 2.8: 2-way Web Servers go Head to Head", where the Opteron thrashes the Xeon handily. I guess that was more focused towards web serving, and now that Anandtech intended to replace their forums database server, they naturally based their latest test, "AMD Opteron vs. Intel Xeon: Database Performance Shootout", on database performance.
"In a 4-way configuration AMD's Opteron cannot be beat, and thus it is our choice for the basis for our new Forums database server. We'll be documenting that upgrade in a separate article so stay tuned."
Heh, I guess the Cray Red Storm system kind of shoots down that theory... ;-)
Actually the design of Opteron beats Itanium for HPC, and the relative number of Opteron vs. Itanium HPC design wins bears that out nicely.
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Cache may always help but this is not as straightforward a statement as you indicate. It is highly dependent upon the architecture of the processor.
The reason the 4MB Xeons are significantly outperforming the 2MB Xeons is the shared-bandwidth architecture of the Xeons. The cache makes up for the limited access to data via the FSB and keeps the very deep pipeline of the P4 series processors full. The long pipeline is the reason that cache misses impact the speed of the P4s so much, despite Intel's attempts to improve branch prediction. Simply look at the P4 Celerons to see how they can be so utterly trounced by regular P4s at the same clock speed with little architectural difference but cache size.
Opterons/AMD 64s do not benefit as much from the boost in L2 cache. A perfect example of this is the Athlon 64 3000+ and Athlon 64 3200+.
The 3200+ has 1 MB of L2 and the 3000+ has 512 KB, and both run at 2 GHz. The performance difference between these two (in most benchmarks) is less than 10%.
Anand Review of Athlon 64 3400+
So a doubling of cache at the same processor speed results in a 10% boost in performance 'maybe'.
Finally, some applications are more sensitive to L2 cache sizes than others.
Therefore your statement "more L2 cache always helps" is strictly true - but the degree of performance increase must be compared against the increase in cost. And this benefit will change from processor to processor and application to application.
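The argument above can be sketched with the textbook average memory access time formula (hit time plus miss rate times miss penalty). This is only a toy model: every number below is a made-up assumption, not a measurement of any Xeon or Opteron, and the function names are hypothetical. It only illustrates why the same cache increase pays off more when a miss is more expensive.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in cycles: hit time plus the expected miss cost."""
    return hit_time + miss_rate * miss_penalty


def speedup_from_bigger_cache(hit_time, miss_before, miss_after, miss_penalty):
    """Ratio of access time before vs. after the cache grows (above 1 means faster)."""
    before = amat(hit_time, miss_before, miss_penalty)
    after = amat(hit_time, miss_after, miss_penalty)
    return before / after


# Hypothetical: the bigger cache halves the miss rate in both cases.
# Case 1: expensive misses (e.g. memory reached over a shared bus).
expensive = speedup_from_bigger_cache(10, 0.10, 0.05, miss_penalty=300)
# Case 2: cheaper misses (e.g. a memory controller on the CPU).
cheap = speedup_from_bigger_cache(10, 0.10, 0.05, miss_penalty=100)

print(round(expensive, 2), round(cheap, 2))
```

With these assumed figures the memory-access speedup is larger in the expensive-miss case, which matches the direction of the claim. The actual size of the effect depends on the workload and the real latencies.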
That's nice, but I'm not sure that difference will show up in the system price -- big vendors like Dell and IBM get huge discounts off Intel list.
``Results were a little varied as 4-way Opteron systems seemed to fare the best, although dual Xeon configurations almost always beat dual Opterons.''
Varied, perhaps, but not surprising. AMD has integrated the memory controller on the CPU, which could explain their getting better when the number of CPUs increases (the Intels being held back by having to go through the same memory controller).
As for Intel winning out on the dual CPU systems, well, they are ahead of AMD in the CPU speed race, aren't they?
Please correct me if I got my facts wrong.
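A rough way to picture the memory controller point is the toy model below. The bandwidth figures are invented placeholders and the function names are hypothetical. It also ignores real effects such as remote memory accesses between Opterons, so treat it as a sketch of the scaling argument only.

```python
def shared_bus_per_cpu(bus_bandwidth, cpus):
    """Bandwidth each CPU gets when all CPUs share one memory controller/bus."""
    return bus_bandwidth / cpus


def integrated_controller_aggregate(per_socket_bandwidth, cpus):
    """Total bandwidth when every CPU brings its own memory controller."""
    return per_socket_bandwidth * cpus


# Hypothetical 4.0 GB/s per bus or per socket, purely for illustration.
for cpus in (1, 2, 4):
    print(cpus,
          shared_bus_per_cpu(4.0, cpus),
          integrated_controller_aggregate(4.0, cpus))
```

Under these assumptions the shared design gives each CPU a shrinking slice as CPUs are added, while the integrated design grows total bandwidth with the CPU count.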
Opteron systems seemed to fare the best, although dual Xeon configurations almost always beat dual Opterons.
Perhaps the benchmarks show the 2P Xeons doing OK against 2P Opterons, but for the price of two Xeon MP chips you can buy five Opteron 848s. Rounding that down, I wonder how well the 2P Xeon does against the 4P Opteron? Oops, Anand already thought of that. He says "it would not be pretty." Indeed.
In the end they will lay their freedom at our feet and say to us, Make us your slaves, but feed us. - Fyodor Dostoyevsky
Yes, the tests weren't exactly apples-to-apples - the outcomes are actually much better for AMD than the graphs would initially appear.
The graphs mean that Opterons with a "measly" 1 meg of cache are beating out Xeons that have (a) four times the cache, (b) 50% higher clock speed, and (c) a price tag that's three times higher.
Hats off to AMD. In times past (K2/K3), price was the only thing they had better than Intel. Now they've got both price and performance.
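To put the price point in numbers, here is a small sketch of a performance-per-dollar comparison. The 3x price ratio comes from the comment above. The performance ratio of 1.0 is a deliberately conservative assumption (a tie), since the comment says the Opterons actually win. The function name is made up.

```python
def perf_per_dollar_advantage(perf_ratio, price_ratio):
    """How many times more performance per dollar chip A gives than chip B.

    perf_ratio:  performance of A divided by performance of B
    price_ratio: price of B divided by price of A
    """
    return perf_ratio * price_ratio


# Assume a tie in performance and the 3x price gap quoted above.
print(perf_per_dollar_advantage(perf_ratio=1.0, price_ratio=3.0))
```

Any real benchmark lead for the cheaper chip only pushes the figure higher.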
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
"People bash the x86 architecture and at the same time, bash anything that isn't x86."
Well, I think that people look at the x86 architecture, and they can see the many, many horrible hacks that have been used to sustain it. That much is pretty obvious if you spend even 10 minutes looking over things. You sit there scratching your head and going, "What the hell? Why'd they do that?", and then realize it's because something, somewhere, was broken until they did it. The reason people don't like to start looking into replacement architectures is exactly as you expressed; the must-have software. You can try running that software under emulation, but the best architecture in the world is always going to take a performance nosedive when running code under emulation. I can look at what IBM has been doing, or even at what Intel was doing with EPIC back in the day, and I can say, "wow, that's pretty cool". But what I can't do is put down the x86, toss all the old software, and hope that all the new software, written for a completely new architecture, is going to work in some sort of reliable fashion. What you really get with x86 is 20 years of experience, and thus, a measure of predictability. In essence, you're paying for predictable problems (much better than unpredictable ones) with old, poor architecture.
"The AMD solution doesn't do away with x86"
AMD64 actually does get rid of a lot of garbage in x86 that is no longer in use. Take a look at the presentation (link at Ace's) by the guy who designed AMD64. He was actually pretty thrilled (well, as thrilled as this guy gets) about being able to dump a lot of the cruft x86 has accumulated. Unfortunately, many things had to remain intact, for the obvious reason of compatibility. I have to warn you though, the guy from the AMD presentation is a real ball of fire. (Although the ex-Intel guy from the other presentation was pretty interesting and funny.)
-- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
But yes, I agree with you, AMD cannot neglect the desktop market, unless it makes AMD64 cheap enough that it can put them in all computers (which I think is their inevitable goal). Hell, once eMachines starts stocking them in Computer City, I think they'll have achieved it.
The Mobile Athlon64 3000+-based eMachines M6807 laptop is available at Circuit City and Best Buy (M6805).
The Athlon64 3200+-based Compaq s6900NX is also available at Circuit City.
The Athlon64 3200+-based eMachines T6000 is available at Best Buy.
That good enough?
Portable versions of Firefox, GIMP, LibreOffice, etc
Has anyone noticed how the comparisons of Intel vs AMD always show AMD slightly less than Intel? Has anyone ever suspected that AMD might be faking that it runs at a slower clock speed, with less cache, just to get some people saying that AMD "whoops" Intel's ass? There's something not right about an 800 pound gorilla getting beat up by a monkey.
Because it's not as simple as X vs 2X computers. The chip, while significant, is not the total cost of the computer. Dual-Xeon motherboards, for example, run 200 dollars less for equivalent Tyan boards (from looking at Newegg, anyway).
Because even if the chip is a significant portion of the cost of building the computer, it is only a small fraction of the total cost over the useful lifetime of the cluster.
Because one has to benchmark for one's own problem set. It's possible that one set of instructions is better optimized for Xeons.
Because the fewer nodes there are in a cluster, the more efficient each individual node is. A small performance increase may be substantial enough to require fewer nodes, bringing the numbers into line.
Because if it's big enough, Intel might throw in a few days with an engineer to sweeten the deal. (But then again, so may AMD.)
Numbers arguments get too complex to make such an important decision a no-brainer.
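The points above can be folded into a rough lifetime-cost sketch. Every figure here is a made-up placeholder (chip price, board price, operating cost, per-node performance), and the function names are hypothetical. The only purpose is to show how a 2x chip price can wash out once the other costs and the node count are included.

```python
import math


def nodes_needed(target_throughput, per_node_throughput):
    """Smallest whole number of nodes that reaches the target throughput."""
    return math.ceil(target_throughput / per_node_throughput)


def cluster_cost(target_throughput, per_node_throughput, cpu_cost, board_cost,
                 other_hw_cost, yearly_operating_cost, years):
    """Hardware plus operating cost of the whole cluster over its lifetime."""
    nodes = nodes_needed(target_throughput, per_node_throughput)
    per_node = cpu_cost + board_cost + other_hw_cost + yearly_operating_cost * years
    return nodes * per_node


# Option A: cheaper chips, pricier boards, baseline per-node speed (all hypothetical).
cost_a = cluster_cost(100, 1.0, cpu_cost=1000, board_cost=500,
                      other_hw_cost=1500, yearly_operating_cost=2000, years=3)
# Option B: chips at twice the price, boards 200 cheaper, 10% faster per node.
cost_b = cluster_cost(100, 1.1, cpu_cost=2000, board_cost=300,
                      other_hw_cost=1500, yearly_operating_cost=2000, years=3)

print(cost_a, cost_b)
```

With these invented inputs the option with the more expensive chip ends up slightly cheaper overall, because it needs fewer nodes. Different inputs can easily flip the result, which is the poster's point about it not being a no-brainer.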