Xeons, Opterons Compared in Power Efficiency
Bender writes "The Tech Report has put Intel's 'Woodcrest' and quad-core 'Clovertown' Xeons up against AMD's Socket F Opterons in a range of applications, including widely multithreaded tests from academic fields like computational fluid dynamics and proteomics. They've also attempted to quantify power efficiency in terms of energy use over time and energy use per task, with some surprising results." From the article: "On the power efficiency front, we found both Xeons and Opterons to be very good in specific ways. The Opteron 2218 is excellent overall in power efficiency, and I can see why AMD issued its challenge. Yes, we were testing the top speed grade of the Xeon 5100 and 5300 series against the Opteron 2218, but the Opteron ended up drawing much less power at idle than the Xeons ... We've learned that multithreaded execution is another recipe for power-efficient performance, and on that front, the Xeons excel. The eight-core Xeon 5355 system managed to render our multithreaded POV-Ray test scene using the least total energy, even though its peak power consumption was rather high, because it finished the job in about half the time that the four-way systems did. Similarly, the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly."
I know this is slashdot, but maybe I wanted to RTFA?
AMD needs to deliver some real quad-core (or eight-core) chips that will beat Intel's performance. If they don't do so soon, AMD will quickly get kicked back to being the second-rate Intel cloner that everyone knew them as before their groundbreaking AMD64 and dual-core chips briefly took the performance lead from Intel. I'm keeping my fingers crossed that AMD will deliver. I've always liked (and bought) their chips as long as the performance is similar to Intel's.
Crack - Free with every butt and set of boobs
AMD needs to do what they have been doing - thinking independently and coming up with original solutions.
the Opteron ended up drawing much less power at idle than the Xeons
...
the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly.
So what does this mean for people shopping for servers?
If your servers constantly tick along at nearly 100% CPU use, you might do better going with the Xeon system. If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity.
Of course, this raises a third possibility - Would running a number of virtual servers on one large Xeon machine waste more energy than it saves, or give a net gain?
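To put rough numbers on that trade-off, here is a minimal sketch in Python. It assumes a fixed number of identical tasks per day and two made-up machines; the wattages and task times are hypothetical placeholders, not figures from the article, so plug in your own measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_energy_kwh(idle_watts, load_watts, task_seconds, tasks_per_day):
    """Energy used in one day by a box that runs a fixed workload and
    sits idle for the rest of the time."""
    busy_seconds = task_seconds * tasks_per_day
    if busy_seconds > SECONDS_PER_DAY:
        raise ValueError("workload does not fit in one day on this machine")
    idle_seconds = SECONDS_PER_DAY - busy_seconds
    joules = load_watts * busy_seconds + idle_watts * idle_seconds
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


# Hypothetical machines, NOT measurements from the article:
# one idles low but is slower, the other idles high but finishes sooner.
LOW_IDLE_SLOW = {"idle_watts": 150.0, "load_watts": 280.0, "task_seconds": 120.0}
HIGH_IDLE_FAST = {"idle_watts": 210.0, "load_watts": 330.0, "task_seconds": 60.0}

for tasks in (10, 300, 700):
    slow = daily_energy_kwh(tasks_per_day=tasks, **LOW_IDLE_SLOW)
    fast = daily_energy_kwh(tasks_per_day=tasks, **HIGH_IDLE_FAST)
    print(f"{tasks} tasks/day: low-idle box {slow:.2f} kWh, fast box {fast:.2f} kWh")
```

Where the crossover falls depends entirely on the numbers you feed it, which is the point: idle draw dominates on a mostly idle box, and time-to-finish dominates on a busy one.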
It's almost 2007 and we're still hanging bags on the side of the 8080. No matter how many cores, caches or pipelines, no matter the clock rate, it's still the same-old same-old single-accumulator, bizarro CISC instruction set piece of shite. I blame Microsoft as much as I blame Intel for this; if Windows weren't married to the architecture, we could do something better. And don't get me started about Itanic; that bit of FUD cost us Alpha, PA-RISC, SPARC, MIPS and others. Looks like Cell and Power are our only hope.
/rant
Thanks, I feel better now.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Apples compared to Oranges: Our findings on the page after the banner ads!
.. nothing to see here, move along...
Presumably, the article tests power consumption because businesses are concerned with how much running each of these systems will cost them. If the Xeons managed to win in power consumption because they completed the task in half the time, that has other cost-saving benefits even beyond power consumption. They can use fewer systems to complete tasks within the deadline, complete tasks ahead of schedule (making their business slightly more agile), and/or spend less money on animators waiting for their animations to render.
Are you seriously claiming "spoiler" on a tech article? That's a new level of silliness.
I love the thinking in this report, look at total energy consumption for a given render... Brilliant... TCO FTW!
A friend who works for Oracle, in their datacenter, told me that they are swapping their Dell Intel Xeon servers for Sun AMD Opteron servers. The main reason behind the swap is the power efficiency of the new Sun servers. So big corps already have their eye on AMD CPUs :)
It has always been my understanding that best practices dictate a server running at a constant 100% CPU utilization is underpowered and needs to be upgraded. Normal, everyday, steady CPU utilization should hover no higher than around 50% (closer to 75%, if you like living on the edge), leaving enough CPU to handle peak loads. Very few functions require a system that maintains a constant CPU utilization and never peaks over it.
"I got a 2KW Optitron running GoogOS, you?"
"3KW Sexium on Microsoft Linux."
"Shut up and roll."
It's very useful to have some normalized way of measuring watts/performance, as they try to do in this article. But at least they could have used a more general and useful benchmark, like those offered by www.spec.org.
I'd like to see these efficiency curves plotted against 100%, the maximum theoretical efficiency of the transfer function through the semiconductors. Anyone know how to calculate the minimum W:b (watts per bit) necessary for these real-world tasks? Or is that just way too complex a stat to compute without melting the datacenter at which it's computed?
--
make install -not war
I know this is Slashdot, but what's stopping you from R'ingTFA? The suspense lost by the spoiler?
I know of and have worked with too many organizations that figure it's just a matter of slapping all the computers in an air-conditioned room. Every watt of waste heat adds to the A/C bill.
Old-fashioned water-cooled mainframes and big iron (for its time) often recirculated the wasted heat into the heating systems of the surrounding buildings. We've known all along how to be more energy efficient, if companies and management would only place the emphasis on the environment in their budgets.
I do not fail; I succeed at finding out what does not work.
It's not going anywhere. Intel actually wanted to replace it, though it's arguable whether their replacement was better or worse, but AMD won the 64-bit round with x86-64. That's what Linux uses, that's what Windows uses; it's a done deal.
Now personally to me you sound like someone who's spent a little too much time in a computer science architecture class soaking up theories about ISAs and too little time actually looking at how chips are made these days and what works. When you get right down to it, x86 works just fine. The chips built on it are very fast, the compilers are able to generate efficient code for it, it plain works in the real world. You may not like it, but it does work well in the real world.
Will something like the Cell kill it? Maybe, but forgive me if I'm more than a little skeptical. There have been things that were going to kill x86 for a long time, and none of them has panned out. You can try to make your ISA as brilliant as you like; what it really seems to come down to is good chip design for the money, and Intel and AMD are hard to beat at that.
"If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity."
More importantly, I think, is that power consumption translates to heat output. If you have mostly idle servers with occasional spikes, you can either cool them for less or put more in the same space depending on what you need. And don't forget that you actually save money twice with the AMD since you have to pay to power and cool the Xeons.
Virtualization, if done correctly, should save you more money on hardware than anything else. You load up a Xeon machine with 6 virtual servers and keep it humming at 70% load. Then you're probably putting out less heat than 5 lightly loaded AMD processors. You've saved the money on the extra hardware, and gained a lot of good things about machine portability in the future.
>I know this is slashdot, but maybe I wanted to RTFA?
You must be new here...
are a pathetic developers. The 'You See, even
With the Intel chipset there are only two x8 PCI-E links coming out of the northbridge, and SAS/SATA-2, PCI-X, networking, and the PCI-E slots on the board all have to share them.
So with a lot of network use and disk use you can choke up that bus.
So uh, this memory-mapped IO that I'm using instead of emulated PIO, and these programmable DMA controllers, and the cascading interrupt multiplexer, and this hypercube bus with cache coherency... that all is just a figment of my imagination.
Meanwhile my Sun has OH LOOK, a crossbar, and MY GOD! this newfangled PCI bus. WHAT HATH SCIENCE DONE?
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Here is one test that needs to be done: take a dual AMD Opteron workstation with two Quadro cards in SLI, add a RAID 5 SAS or SATA setup, and do some networking at the same time. There are dual and quad AMD Opteron boards with nForce Professional chipsets. Some have four PCI-E slots (x16, x8, x8, x16), with each half coming from an HTT link.
Also take a dual Intel workstation and try to do the same thing. The best that you can find is x8 x8.
Using hacked SLI drivers is OK.
I think that the AMD system will do better, as it has much better I/O bandwidth.
And so far the conclusion is your server farm should run at 50% utilization average, make it virtual and run it on Xeons at almost 100% and keep the other 50% on idling Opterons waiting for the peaks?
No, I take it.
Then why do you care?
And we "fixed" this with x86_64. The extended instruction set allows for more orthogonal expression of what you want to do with your ops w/r/t regs and memory (although not all of them are equivalent length, the more common ones are shorter, so what does it matter?)
http://techreport.com/reviews/2006q4/xeon-vs-opteron/index.x?pg=7
Very interesting. The benchmark uses a database and is the only one I've seen that seems to test the limits of the CPU cache with a database, and lo and behold, at 8 threads, performance degrades for the 5355 and it's actually slower than the Opteron 2218.
Or it could just be that this benchmark isn't coded well: it might take a global lock frequently, so as you add more threads there's more contention. In any case, someone with more time than me should dig into this benchmark, which might show a weakness in the Core 2 architecture.
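One rough way to reason about the global-lock hypothesis is Amdahl's law: if some fraction of the work is serialized (for example, time spent holding a single lock), the speedup from n threads is capped at 1 / (s + (1 - s) / n). The sketch below is only that formula in Python; the serial fractions are hypothetical and were not measured from the benchmark.

```python
def amdahl_speedup(serial_fraction, threads):
    """Upper bound on speedup when `serial_fraction` of the work cannot
    run in parallel (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Hypothetical serial fractions, purely for illustration.
for serial in (0.05, 0.25, 0.50):
    row = ", ".join(
        f"{n} threads: {amdahl_speedup(serial, n):.2f}x" for n in (1, 2, 4, 8)
    )
    print(f"serial fraction {serial:.0%} -> {row}")
```

Note that Amdahl's law only explains scaling that flattens out. An outright slowdown at 8 threads would need something extra on top, such as the cost of fighting over the lock or cache effects, so this does not settle which explanation is right.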
Finally, good benchmarks. Where were these guys a month ago, before I ordered those 5320s, and when will those 5355s be available for the rest of us?
2 years and no mod points. Join reddit. Because openness is good.
See http://electricrain.com/greg/opteron-powersave.txt .
All AMD K8 (Opteron and Athlon 64) CPUs have the ability to run the clock at an extra-slow speed when in HLT (idle) mode, saving a bunch more power. Many (most?) BIOSes are not smart enough to enable this. A simple setpci command will turn it on under Linux.
To find out if it's on:
setpci -d 1022:1103 87.b
If that returns 00, it's off. To turn on clock-divide-in-hlt in div-by-512 mode, use:
setpci -d 1022:1103 87.b=61
(see the above URL for links to the AMD documentation on the PMM7 register; other values can work).
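If you want to check a bunch of boxes from a script, the read-only query above can be wrapped in a few lines of Python. This is a minimal sketch, not a tested tool: the helper names are made up, it assumes setpci prints one hex byte per matching device, and it treats any nonzero value as "on" since the only thing stated above is that 00 means off.

```python
import subprocess

# Device selector and register offset are copied from the commands above.
SETPCI_QUERY = ["setpci", "-d", "1022:1103", "87.b"]


def parse_clock_divide(setpci_output):
    """Return one bool per matched device: False if the register reads 00."""
    states = []
    for line in setpci_output.splitlines():
        value = line.strip()
        if value:
            # Assumption: any nonzero byte counts as "on"; the post above
            # only says that 00 means off.
            states.append(int(value, 16) != 0)
    return states


def read_clock_divide():
    """Run the query command (needs root and pciutils) and parse its output."""
    result = subprocess.run(
        SETPCI_QUERY, capture_output=True, text=True, check=True
    )
    return parse_clock_divide(result.stdout)
```

Calling read_clock_divide() as root on a dual-socket box should give a list with one entry per CPU. It deliberately only reads the register; writing it is left to the setpci command above.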
Complex instructions reduce the overall code size, reducing the need for code cache and RAM. Especially with 64-bit architectures this makes a big difference. Instead of 8-byte RISC instructions, the average instruction size is probably closer to 3 or 4 bytes (not including immediate values, which of course in 80x86 can be smaller than the machine word size). Obviously RISC chips can be designed with small instruction word sizes, and for instance a pretty good RISC instruction set could live in 32-bit words, but then there are extra alignment issues to deal with. Overall, I think the idea of having a compact instruction set wins out over the simplicity of a full RISC design. Not that there aren't things I'd change with 80x86; for instance, it would be nice if the next generation of x86-64 chips would support a more RISCy 64-bit mode of execution for pure 64-bit code, allowing developers (or compilers) to make the tradeoff between code size and RISC speed advantages. x86-64 already includes 8 extra registers, so perhaps having another 16 (or 48) available only from a 64-bit RISC mode could help hasten the transition to a saner instruction set.
How can you subtract a unit of time (seconds) from a unit of power (watts)?
Assuming multiplication was intended instead of subtraction, why use watt-seconds instead of joules? Still, kudos for using SI units and not something like boe.
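For anyone following along, the arithmetic is just energy = average power x time, and one watt-second is exactly one joule. A minimal Python sketch, using two hypothetical systems rather than the article's measurements:

```python
def energy_joules(average_watts, seconds):
    """Energy = average power x time; 1 watt-second is 1 joule."""
    return average_watts * seconds


def joules_to_kwh(joules):
    """Convert joules to kilowatt-hours (1 kWh = 3.6 million joules)."""
    return joules / 3.6e6


# Hypothetical systems, NOT figures from the article: one draws more
# power but finishes the task sooner.
fast_hot = energy_joules(average_watts=400.0, seconds=60.0)
slow_cool = energy_joules(average_watts=300.0, seconds=120.0)

print(f"fast, high-power box: {fast_hot:.0f} J ({joules_to_kwh(fast_hot):.4f} kWh)")
print(f"slow, low-power box:  {slow_cool:.0f} J ({joules_to_kwh(slow_cool):.4f} kWh)")
```

With these made-up inputs the higher-power box uses less energy per task simply because it runs for half as long, which is the same shape of result the article reports for the rendering test.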
Flourescent (adj): smelling like ground wheat.
Up here in The Great White North, there is a second important feature (mostly for desktop and deskside systems), and that's efficiency as a space heater. When these boxes are running at full bore, how many BTUs do they generate, and how many BTUs/watt? How many Xeons or K7s would it take to heat the average house?
More importantly, how does that compare to a dedicated space-heater?
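A back-of-the-envelope answer: essentially all the electricity a computer draws ends up as heat, so watt for watt it does about what a plain resistance heater does, roughly 3.412 BTU per hour per watt. A minimal Python sketch follows; the house heating load and the per-box wattage are hypothetical placeholders.

```python
import math

BTU_PER_HOUR_PER_WATT = 3.412142  # standard conversion factor


def watts_to_btu_per_hour(watts):
    """Heat output of a device that turns all its input power into heat."""
    return watts * BTU_PER_HOUR_PER_WATT


def boxes_needed(heat_load_btu_per_hour, watts_per_box):
    """Whole number of boxes needed to cover a given heating load."""
    return math.ceil(heat_load_btu_per_hour / watts_to_btu_per_hour(watts_per_box))


# Hypothetical inputs: a 40,000 BTU/h heating load and a 300 W server.
HOUSE_LOAD_BTU_PER_HOUR = 40000.0
SERVER_WATTS = 300.0

print(f"one box: {watts_to_btu_per_hour(SERVER_WATTS):.0f} BTU/h")
print(f"boxes needed: {boxes_needed(HOUSE_LOAD_BTU_PER_HOUR, SERVER_WATTS)}")
```

The BTUs/watt figure is the same constant for any box under that assumption, so the only thing that changes between CPUs is how many watts they pull at full bore.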
Sometimes boldness is in fashion. Sometimes only the brave will be bold.
Cyrix man just flat out is uh.ja98u&^Y)#CN(&n q dang over heating problem againa
The only ones affected are the tape monkeys, and their jobs were replaced by robotics years ago.
Twenty years ago satellite ground stations were dropped off up north with nothing more than a big tank of diesel, a power generator, and a fault-resilient or fault-tolerant server, left alone for months at a time.
With modern high speed networks and VPN access, it's often hard to tell the difference between being at work and remote access, other than the environment. Don't forget how much sysadmin work has been offshored to India and other regions, or how many global operations have geographically distributed locations, with staff at each covering the entire globe's sysadmin functions from different time zones.
Your theoretical idea has been possible for over 10 years.
The old Western Electric (A.T. & T.) CPUs (WE31000/WE32000, as in the 3B-series computers) Huffman-coded the instruction set. More-frequently used opcodes were smaller than less-frequently used opcodes, so "instructions/memory word" was denser than typical RISC. Lots of registers, very powerful instructions. The processors did not fetch "instructions", they read cache lines from memory at the next uncached address of instructions.
Unless your chip is very recent, the timestamp counter speeds will vary.
Unless your Linux kernel is very recent, this condition will not be detected automatically. Linux will assume that the discrepancy means you are losing clock ticks.
You can try kernel parameters like clocksource=pmtmr to fix it. Good luck, you may need it...
The BIOS vendors disable this power-saving feature because there are Windows games that, like Linux, assume the timestamp counters don't vary in speed.
How did you come to the conclusion that AMD has better chipsets? I can get an nforce/crossfire/via motherboard for either AMD or Intel with pretty much identical specs. Intel has the advantage of making their own chipset, so Intel is the one that has the chipset advantage IMO.