Slashdot Mirror


Xeons, Opterons Compared in Power Efficiency

Bender writes "The Tech Report has put Intel's 'Woodcrest' and quad-core 'Clovertown' Xeons up against AMD's Socket F Opterons in a range of applications, including widely multithreaded tests from academic fields like computational fluid dynamics and proteomics. They've also attempted to quantify power efficiency in terms of energy use over time and energy use per task, with some surprising results." From the article: "On the power efficiency front, we found both Xeons and Opterons to be very good in specific ways. The Opteron 2218 is excellent overall in power efficiency, and I can see why AMD issued its challenge. Yes, we were testing the top speed grade of the Xeon 5100 and 5300 series against the Opteron 2218, but the Opteron ended up drawing much less power at idle than the Xeons ... We've learned that multithreaded execution is another recipe for power-efficient performance, and on that front, the Xeons excel. The eight-core Xeon 5355 system managed to render our multithreaded POV-Ray test scene using the least total energy, even though its peak power consumption was rather high, because it finished the job in about half the time that the four-way systems did. Similarly, the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly."

29 of 98 comments

  1. AMD needs to get back in the game, quick by Salvance · · Score: 4, Insightful

    AMD needs to deliver some real quad-core chips (or 8-core chips) that will beat Intel's performance. If they don't soon, AMD will quickly get kicked back to being the second-rate Intel cloner that everyone knew them as before their groundbreaking AMD 64s and dual-core chips briefly took the performance lead from Intel. I'm keeping my fingers crossed that AMD will deliver; I've always liked (and bought) their chips as long as the performance is similar to Intel's.

    --
    Crack - Free with every butt and set of boobs
    1. Re:AMD needs to get back in the game, quick by aminorex · · Score: 2, Insightful

      Evidently you didn't read the review. Intel has serious problems for large scale computing. It does not scale up. It's fine as a thread engine for processing small transactions, but for the kind of problems that people like Google and NCAR are doing -- and it is people like that who drive some very large CPU buys -- the external MMU bites their ass every time. Is the current generation of Opterons a gamer buy? No. AMD probably won't dominate the gamer market until a high-end GPU is integrated on die at 45 nm. Meanwhile, it will eat up Intel's server share as the roadmap materializes for quad core. People who buy systems with upgradability in mind are the only real market right now.

      --
      -I like my women like I like my tea: green-
  2. AMD's path by homey+of+my+owney · · Score: 4, Insightful

    AMD needs to do what they have been doing - thinking independently and coming up with original solutions.

  3. Hmm, so which better reflects real-world usage? by pla · · Score: 4, Interesting

    the Opteron ended up drawing much less power at idle than the Xeons
    ...
    the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly.

    So what does this mean for people shopping for servers?

    If your servers constantly tick along at nearly 100% CPU use, you might do better going with the Xeon system. If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity.

    Of course, this raises a third possibility - Would running a number of virtual servers on one large Xeon machine waste more energy than it saves, or give a net gain?
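The idle-versus-loaded trade-off above comes down to energy = power x time, summed over the busy and idle parts of the day. Here is a minimal sketch of that arithmetic. The wattages and the "finishes in half the time" factor are hypothetical placeholders chosen for illustration, not figures from the article, and the function name is made up for this example.

```python
def daily_energy_kwh(idle_watts, load_watts, busy_hours, day_hours=24.0):
    """Energy used over one day, in kWh, for a box that is busy for
    `busy_hours` at `load_watts` and idle at `idle_watts` the rest of the time."""
    if not 0.0 <= busy_hours <= day_hours:
        raise ValueError("busy_hours must be between 0 and day_hours")
    idle_hours = day_hours - busy_hours
    return (load_watts * busy_hours + idle_watts * idle_hours) / 1000.0


# Hypothetical systems (assumed numbers, not measurements):
low_idle = {"idle_watts": 150.0, "load_watts": 300.0}     # idles low, slower under load
fast_finisher = {"idle_watts": 220.0, "load_watts": 350.0}  # idles high, done in half the time

for work_hours in (1.0, 24.0):
    a = daily_energy_kwh(busy_hours=work_hours, **low_idle)
    b = daily_energy_kwh(busy_hours=work_hours / 2.0, **fast_finisher)
    print(f"{work_hours:>4.0f} h of work/day: low-idle {a:.2f} kWh, fast-finisher {b:.2f} kWh")
```

With these assumed numbers the low-idle box wins when there is little work and the fast finisher wins when the work fills the day. Where the crossover falls depends entirely on the real idle and load figures.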

    1. Re:Hmm, so which better reflects real-world usage? by archen · · Score: 3, Insightful

      Although some people will pipe in with their number crunching server stories, are there any normal usage servers that really come in at 100% CPU usage? Of the 20-odd servers I run, few ever run at that rate for more than 30 minutes a day or so, and usually while doing backups for that matter. Other system components often keep you from reaching that target, and most 24-7 servers I've seen do most of their work during a certain period then spend the rest of their time twiddling their thumbs.

    2. Re:Hmm, so which better reflects real-world usage? by Anonymous Coward · · Score: 2, Insightful

      Any server running at that rate for more than a few short peaks a day is under capacity. Ideally, you'd like to keep them at 100% but you don't control scheduling of server demand. It's too ad-hoc. You trend then build enough excess capacity to handle projected peak loads. Of course, this depends on the level of service you want to deliver. Most server "customers" expect the server to be always as responsive as it can be, regardless of load. (expectation of IT is always 100% all the time). So server farms or clusters are built to handle the peak, which typically happens only for short durations. Being able to use machines at full capacity and still maintain enough service overhead would require excess machines that can be brought up quickly.

      What if you had a mix of processors with clustering software savvy enough to push jobs onto idling machines? You keep the Xeons humming along at nearly 100% and you push peak loads onto a bunch of idling Opterons? Could Xen be made to do that? Have the cluster optimize for best power efficiency at whatever load. On the scale of an enterprise, the cost savings in power over time would be significant.

    3. Re:Hmm, so which better reflects real-world usage? by ptbarnett · · Score: 2, Interesting
      Although some people will pipe in with their number crunching server stories, are there any normal usage servers that really come in at 100% CPU usage?

      For capacity planning purposes, most of my clients target 40-50% CPU utilization on servers. If it starts creeping above 60% on a consistent basis (or is forecasted to do so soon), they begin the acquisition process to either upgrade or add servers.

      Queuing theory (M/M/1) shows that while the average response time doesn't increase that much, the standard deviation increases rapidly as utilization grows above 60%. Restated in simpler terms: a larger proportion of response times become significantly larger -- to the point that users start to notice and either complain or go elsewhere.
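To put rough numbers on the queuing argument, here is a small sketch of the textbook M/M/1 model (first-come first-served, exponential arrivals and service). In that model the response time is exponentially distributed, so its mean and its standard deviation both equal service_time / (1 - utilization), and the tail percentiles scale the same way. The function name and the choice of the 95th percentile are assumptions made for this illustration. Real servers are not M/M/1 queues, so treat the output as a shape, not a prediction.

```python
import math


def mm1_response_time(utilization, service_time=1.0, percentile=0.95):
    """Mean and tail response time for an M/M/1 FCFS queue.

    Response time is exponential with mean service_time / (1 - utilization),
    so the p-th percentile is -ln(1 - p) times that mean.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    if not 0.0 < percentile < 1.0:
        raise ValueError("percentile must be in (0, 1)")
    mean = service_time / (1.0 - utilization)
    tail = -math.log(1.0 - percentile) * mean
    return mean, tail


for rho in (0.4, 0.5, 0.6, 0.8, 0.9):
    mean, p95 = mm1_response_time(rho)
    print(f"utilization {rho:.0%}: mean {mean:.1f}x, 95th percentile {p95:.1f}x the service time")
```

The slowest few percent of requests are what users notice, and in this model they sit at roughly three times the mean, which itself climbs steeply as utilization approaches 100%.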

      Other system components often keep you from reaching that target,

      Yes, system overhead starts to increase rapidly on most systems as you approach 100% CPU utilization. In many cases, total throughput actually decreases above system utilization of about 85-90%.

      most 24-7 servers I've seen do most of their work during a certain period then spend the rest of their time twiddling their thumbs.

      I've looked at usage patterns for a number of systems. Whether they are public (online banking) or internal-use-only, they all seem to have the same pattern: usage peaks about 10:00 AM, with a smaller secondary peak about 1:30 PM. The second peak usually disappears on Friday afternoon.

    4. Re:Hmm, so which better reflects real-world usage? by twiddlingbits · · Score: 2, Interesting

      If I'm doing General Purpose computing I would trade the 10W difference in power consumption for the redundancy and flexibility of the 4-way Opteron. With two 4-way boxes you can use one as the failover for the other, or load balance between them, keeping low CPU use on each. General purpose computing really doesn't need the power of an 8-way SMP solution even with 1000's of users. You can virtualize either the 4-way or the 8-way with VMWare or Xen or Solaris Containers so that (IMHO) is a wash.

      It's really back to the old Horizontal vs Vertical scaling argument, which involves a lot of factors along with power consumption. If floor space in your data center is at a premium you probably want the 8-ways, as you can double your server density per rack (assumes you have the power and cooling). If your servers idle most of the time, space is not an issue, and you are at close margins on data center power and cooling, the Opteron 4-way might be a better choice. There are also cost differences to consider. Opterons are usually priced below Xeons, so if the bottom-line hardware costs are important that pushes to Opterons. You also have to look at the number of HBAs and network connections a 4-way and an 8-way will support. There are SO many combinations to consider, including how much IT growth will occur, that it is mind boggling! It all depends on the strategic and tactical decisions made by the Data Center Team and the IT Organization; some places are all about performance, some are all about cost, and others try to get a knife-edge balance. Also keep in mind that what you buy today is probably obsolete in 18 months and likely will be replaced in 36-48 months.

      There is also a 3rd Option. If you don't mind running on Sun SPARC equipment then the SPARC T1 based servers blow both options out of the water in terms of power consumption (just don't do a lot of floating point... they suck at that). If you are running Java and other products that have SPARC and Solaris 10 (Linux soon) versions then changing to a SPARC architecture might get some really big gains. However, if you are a .NET shop or a Windows server shop you are stuck with the X86 Architecture with Xeon or Opteron.

    5. Re:Hmm, so which better reflects real-world usage? by rbanffy · · Score: 4, Insightful

      Well... If you have a couple servers that idle most of the time, I suggest that, instead of AMD, you buy VMWare.

      Or go Xen, OpenVz or whatever does the trick.

      But, most important, get rid of the idling boxes.

  4. This just in! by gentimjs · · Score: 4, Insightful

    Apples compared to Oranges: Our findings on the page after the banner ads!
    .. nothing to see here, move along...

  5. Conclusions converted to $$$ by ben+there... · · Score: 2, Interesting
    "The eight-core Xeon 5355 system managed to render our multithreaded POV-Ray test scene using the least total energy, even though its peak power consumption was rather high, because it finished the job in about half the time that the four-way systems did. Similarly, the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly."

    Presumably, the article tests power consumption because businesses are concerned with how much running each of these systems will cost them. If the Xeons managed to win in power consumption because they completed the task in half the time, that has other cost-saving benefits even beyond power consumption. They can use fewer systems to complete tasks within the deadline, complete tasks ahead of schedule (making their business slightly more agile), and/or spend less money on animators waiting for their animations to render.
  6. Re:God, I'm sick of this architecture by gentimjs · · Score: 2, Funny

    /me hugs his ultrasparc system
    Couldn't agree more. Oh wait, something's sending an Int. Req. , cant type have; to see what it wants.....

  7. Re:God, I'm sick of this architecture by ben+there... · · Score: 2, Interesting

    Aren't newer x86 processors essentially CISC that convert the instructions down to RISC? And RISC processors, like G4/G5, that use instruction sets such as Altivec are actually using some aspects of CISC?

    That was my understanding, after reading articles like this one on Ars Technica. If true, it would make fighting over CISC vs. RISC not make a lot of sense.

  8. Re:God, I'm sick of this architecture by Ancil · · Score: 5, Informative
    bizzaro CISC instruction set piece of shite
    I guess you didn't get the memo. Turns out RISC wasn't the good idea everyone thought it would be in the 1990's.

    RISC worked well when speed of memory and CPU's were at parity. The simplified instructions let the CPU be clocked a lot faster, not to mention their shallow pipelines made it less costly when branch prediction failed. The tradeoff was that it usually took more instructions to accomplish a given task.

    But as CPU's have spent more and more time waiting for memory, CISC has really come into its own. Think of CISC as a compression algorithm: An x86 instruction which fits in 16-32 bits might take 4 or 5 instructions on a RISC processor, weighing in at 96-128 bits. It's no surprise why CISC processors have destroyed RISC in the past decade.
  9. oracle datacenter by chap_hyd · · Score: 4, Informative

    A friend who works for Oracle, in their datacenter, told me that they are swapping the Dell Intel Xeon servers for Sun AMD Opteron servers. The main reason behind this server swap is the power efficiency of the new Sun servers. So that means big corps already have their eye on AMD CPUs :)

    1. Re:oracle datacenter by aczisny · · Score: 2, Informative
      A basic server, costing about $4k (nothing fancy), running 24x7x365.25 at about 300Watts, will use 18408.6 KWH in one year. At $0.07/KWH, thats $1288.60 per year just to power the box.

      It took me forever to figure out what was wrong with this. I knew your numbers didn't add up but I couldn't put my finger on it until I realized you multiplied out exactly what people say when they mean constant uptime. The problem, of course, is that it should be 300(watts)x24(hours/day)x365(days/year) or 300(watts)x24(hours/day)x7(days/week)x52(weeks/year) to get the power used in a year. You end up with 2628 KWH a year. At $0.07/KWH you get $183.96, which is much more reasonable. Not something I'd ignore as a business with hundreds of machines, but not a quarter of the cost of the machine itself either.

      As my chemistry teacher always used to tell me, UNITS! It's all about keeping proper track of your units!
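The unit bookkeeping above is easy to check in a few lines. This sketch uses only the figures quoted in the thread (300 W, $0.07 per kWh) and reproduces both the corrected result and the inflated one that comes from multiplying 24 x 7 x 365.25. The function name is a placeholder for this example.

```python
HOURS_PER_YEAR = 24 * 365  # hours/day * days/year


def annual_energy_cost(watts, dollars_per_kwh, hours=HOURS_PER_YEAR):
    """Return (kWh used, cost in dollars) for a constant load over `hours`."""
    kwh = watts * hours / 1000.0  # W * h = Wh, divided by 1000 gives kWh
    return kwh, kwh * dollars_per_kwh


kwh, cost = annual_energy_cost(300, 0.07)
print(f"correct: {kwh:.0f} kWh per year, ${cost:.2f}")

# The "24x7x365.25" slip counts each day seven times over.
bad_kwh, bad_cost = annual_energy_cost(300, 0.07, hours=24 * 7 * 365.25)
print(f"slip:    {bad_kwh:.1f} kWh per year, ${bad_cost:.2f}")
```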

      --
      Now, landing thrusters.. landing thrusters, hmm. Now if I were a landing thruster, which one of these would I be?
  10. Best Practices by killmenow · · Score: 5, Insightful

    It has always been my understanding that best practices dictate a server running at a constant 100% CPU utilization is underpowered and needs to be upgraded. Normal, every day, steady CPU utilization should hover no higher than around 50% (closer to 75%, if you like living on the edge), leaving enough CPU to handle peak loads. Very few functions require a system that maintains a constant CPU utilization and never peaks over it.

  11. Re:God, I'm sick of this architecture by $RANDOMLUSER · · Score: 2, Insightful

    What I'm really referring to here is the extreme non-orthogonality of the ISA and the register set. I'm certainly not a purist when it comes to what individual instructions are allowed to do, but there's a lot to be said for having instructions all be the same width.

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
  12. Business needs to pay attention by msobkow · · Score: 4, Insightful

    I know of and have worked with too many organizations that figure it's just a matter of slapping all the computers in an air-conditioned room. Every watt of waste heat adds to the A/C bill.

    Old fashioned water-cooled mainframes and big iron (for its time) often recirculated the wasted heat into the heating systems of the surrounding buildings. We've known all along how to be more energy efficient, if companies and management would only place the emphasis on the environment in their budgets.

    --
    I do not fail; I succeed at finding out what does not work.
  13. Well too bad get used to it by Sycraft-fu · · Score: 2, Interesting

    It's not going anywhere. Intel actually wanted to replace it, though it's arguable whether their replacement was better or worse, but AMD won the 64-bit round with x86-64. That's what Linux uses, that's what Windows uses, it's a done deal.

    Now personally to me you sound like someone who's spent a little too much time in a computer science architecture class soaking up theories about ISAs and too little time actually looking at how chips are made these days and what works. When you get right down to it, x86 works just fine. The chips built on it are very fast, the compilers are able to generate efficient code for it, it plain works in the real world. You may not like it, but it does work well in the real world.

    Will something like the Cell kill it? Maybe, but forgive me if I'm more than a little skeptical. There have been things that were going to kill x86 for a long time and none of them has panned out. You can try and make your ISA as brilliant as you like; what it really seems to come down to is good chip design for the money, and Intel and AMD are hard to beat at that.

  14. Power = Heat by mungtor · · Score: 2, Insightful

    "If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity."

    More importantly, I think, is that power consumption translates to heat output. If you have mostly idle servers with occasional spikes, you can either cool them for less or put more in the same space depending on what you need. And don't forget that you actually save money twice with the AMD since you have to pay to power and cool the Xeons.

    Virtualization, if done correctly, should save you more money on hardware than anything else. You load up a Xeon machine with 6 virtual servers and keep it humming at 70% load. Then you're probably putting out less heat than 5 lightly loaded AMD processors. You've saved the money on the extra hardware, and gained a lot of good things about machine portability in the future.

  15. Re:Way to put the conclusion in the article summar by daybot · · Score: 2, Funny

    >I know this is slashdot, but maybe I wanted to RTFA?

    You must be new here...

  16. Re:God, I'm sick of this architecture by Anonymous Coward · · Score: 2, Insightful

    This is foolish. Variable-width instructions provide higher instruction throughput by having lower memory bandwidth requirements and consuming less cache space. You want to code your instructions so that the most-frequently used instructions are as small as possible. This has been an active area of research for tailoring ISAs to workloads, but even an ad-hoc scheme that improves those two areas in the general case is better than none at all.

    This coding is more complicated than fixed-width instructions, but this complexity is less expensive than cache in power, latency, and die space. This isn't to say that x86 ISA is optimal, but it isn't bad enough to warrant the incessant whining that people bring up every time they discuss ISAs.
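The code-density argument can be illustrated with a weighted average: if the most frequent instructions get the shortest encodings, the mean instruction size drops below a fixed width. The instruction mix below is entirely hypothetical (it is not measured from x86 or any real ISA), and the sketch assumes both encodings need the same number of instructions for the same work, which is not true in general.

```python
def average_bits(mix):
    """Mean encoded size for a list of (frequency, bits) pairs.

    Frequencies are fractions of the dynamic instruction stream and must sum to 1.
    """
    total = sum(freq for freq, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * bits for freq, bits in mix)


# Hypothetical mixes, for illustration only.
variable_width = [(0.50, 16), (0.30, 24), (0.15, 32), (0.05, 64)]
fixed_width = [(1.00, 32)]

print(f"variable-width: {average_bits(variable_width):.1f} bits per instruction")
print(f"fixed-width:    {average_bits(fixed_width):.1f} bits per instruction")
```

Fewer bits per instruction means fewer bytes to fetch and cache for the same instruction count, which is the bandwidth and cache-space saving described above. The cost is the harder decode.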

  17. HOWTO: save 20W/socket when idle on Opteron or A64 by Splork · · Score: 4, Informative

    See http://electricrain.com/greg/opteron-powersave.txt .

    All AMD K8 (Opteron and Athlon 64) CPUs have the ability to run the clock at an extra slow speed when in HLT (idle) mode, saving a bunch more power. Many (most?) BIOSes are not smart enough to enable this. A simple setpci command will turn it on under Linux.

    To find out if it's on:

      setpci -d 1022:1103 87.b

    If that returns 00, it's off. To turn on clock-divide-in-hlt in divide-by-512 mode use:

      setpci -d 1022:1103 87.b=61

    (see the above URL for links to the AMD documentation on the PMM7 register; other values can work).

  18. Re:God, I'm sick of this architecture by Chris+Burke · · Score: 2, Insightful

    It's no surprise why CISC processors have destroyed RISC in the past decade.

    Sorry, but CISC, specifically x86 and children, has won simply by being the architecture for which most software was written. The dominance of CISC is a similar story to (but not the same as; I'm trying to stave off an off-topic rant) the dominance of Windows -- backward compatibility is King.

    The RISC makers knew this too. Back when RISC was the hot new thing in the early 90s, they were touting that RISC would be so much faster than CISC that you could emulate/translate x86 code and run it faster than a native x86 machine. If this had come to pass, then the reason to have, and thus the dominance of, x86 would have ended.

    But it never did come to pass. CISC machines, starting with the Pentium Pro, started to translate CISC instructions into RISC micro-instructions internally, and then used all the benefits that RISC machines got with the main penalty being the complicated decoders on the front-end. Intel could push the performance of their chips, in large part by leveraging the enormous profits of the lucrative desktop PC business, and thus kept rough parity with RISC machines, often being faster. Since the fundamental performance problem with CISC had been solved, and it still ran all the software, CISC won and RISC lost in the mainstream processor market.

    Now of course there are performance pros and cons to both. While potentially reduced code size is the main advantage of CISC, I don't think it adds up to much, especially since things like SSE2 instructions have gotten large anyway. The main advantages of RISC are simpler decoders and more registers. x86-64 gives more registers, plus with a fast L1 cache stack accesses aren't expensive, and the x86 makers learned a long time ago how to make good super-scalar x86 decoders. In the end the pluses and minuses don't add up to much, and it's more about the specific architectures of each chip. In this sense x86 has done a fine job of keeping performance high.

    It's unfortunate from an aesthetic point of view, because x86 is an ugly beast, but in the end practicality won, and generally there's no practical reason to care any more.

    --

    The enemies of Democracy are
  19. What About Efficiency as a Space Heater by darkonc · · Score: 2, Funny

    Up here in The Great White North, there is a second important feature (mostly for desktop and deskside systems) -- and that's efficiency as a space heater. When these boxes are running at full bore, how many BTUs do they generate, and how many BTUs/watt do they generate? How many Xeons or K7s would it take to heat the average house?
    More importantly, how does that compare to a dedicated space-heater?

    --
    Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    1. Re:What About Efficiency as a Space Heater by Cassini2 · · Score: 2, Insightful

      Computers are almost 100% efficient as space heaters. Almost every watt consumed gets converted to heat.

      The energy in the light radiated from the monitor or from the LEDs in the computer case is very small compared to the energy consumed by the computer. Computers do no useful physical work. The result is that almost all energy consumed by a computer is converted to heat.

    2. Re:What About Efficiency as a Space Heater by frieko · · Score: 2, Informative

      That light you mention ends up as heat too.
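For anyone who wants the space-heater numbers, the conversion is fixed: one watt of continuous dissipation is about 3.412 BTU per hour. This sketch takes the replies above at their word and defaults to treating all of the electrical input as heat. The 300 W example load is an assumed round figure, not a measurement of any of the reviewed systems.

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # 1 W sustained for one hour is about 3.412 BTU


def heat_output_btu_per_hour(electrical_watts, heat_fraction=1.0):
    """Heat delivered to the room, in BTU/h, for a given electrical draw.

    heat_fraction=1.0 assumes every watt ends up as heat in the room.
    """
    if not 0.0 <= heat_fraction <= 1.0:
        raise ValueError("heat_fraction must be between 0 and 1")
    return electrical_watts * heat_fraction * BTU_PER_HOUR_PER_WATT


# Hypothetical 300 W box running flat out.
print(f"{heat_output_btu_per_hour(300):.0f} BTU/h")
```

Watt for watt, that matches a resistive space heater, since both turn essentially all of their input into heat.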

  20. Re:God, I'm sick of this architecture by Salamander · · Score: 3, Interesting

    You're forgetting the basic formula from Hennessy and Patterson:

    WorkPerSec = WorkPerInstruction * InstructionsPerCycle * CyclesPerSecond

    Yes, CISC has better work per instruction, except for one glaring issue I'll get to in a moment, but - for various reasons explained throughout H&P - it loses on the other two and thus overall. That's why nobody's making new processors that are CISC internally any more; they just couldn't hit the issue widths and clock speeds that are achievable with a RISC core (even if that core has a CISC ISA bolted on the front). What's missing here is that not all work is useful work. As anyone who has accidentally coded an infinite loop knows, executing lots of instructions is not necessarily a good thing. The glaring issue I mentioned earlier is that a lot of the instructions executed on a register-poor architecture like x86 are not doing useful work. Register thrashing means i-cache bandwidth is wasted fetching instructions which are then used to waste d-cache bandwidth, which more than outweighs any advantage from variable-length instructions.
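The formula above is just a product of three factors, so a gain in one is easily swamped by losses in the other two. A quick sketch with made-up numbers follows; the two "designs" and every figure in them are hypothetical and only show how the factors trade off.

```python
def work_per_second(work_per_instruction, instructions_per_cycle, cycles_per_second):
    """WorkPerSec = WorkPerInstruction * InstructionsPerCycle * CyclesPerSecond."""
    return work_per_instruction * instructions_per_cycle * cycles_per_second


# Hypothetical designs (illustrative values, not benchmarks of real chips):
# one does 30% more work per instruction, the other issues twice as many
# instructions per cycle at a 50% higher clock.
dense_isa = work_per_second(1.3, 1.0, 2.0e9)
wide_fast_core = work_per_second(1.0, 2.0, 3.0e9)

print(f"dense ISA:      {dense_isa:.2e} work units/s")
print(f"wide/fast core: {wide_fast_core:.2e} work units/s")
```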

    So, you say, wouldn't variable-length instructions on a register-rich processor be the best of both worlds? Not so fast. A regular instruction set makes superscalar execution easier because it means that multiple instructions can be fetched literally at the same time without having to examine the first one to figure out where the second one begins and so on. It also makes deeper pipelines easier because it allows many internal activities (e.g. register allocation, hazard detection) to start after a simple pre-decode stage, in parallel with the remainder of decode. Either way, regular instruction sets allow for more parallelism - and parallelism in some form is generally the key to CPU performance. If you're willing to give up performance by eschewing most modern processor-design techniques, which might be the case for a deeply embedded system with extreme size and/or power requirements, then variable-width instructions might still be a reasonable choice. In that case you might as well use an older architecture; there are plenty to choose from. For new processor designs, though, variable-width instructions are almost invariably a way to lose.

    --
    Slashdot - News for Herds. Stuff that Splatters.