Slashdot Mirror


AMD Finally Unveils Barcelona Chip

Justin Oblehelm writes "AMD has finally unveiled its first set of quad-core processors, three months after its original launch date due to its "complicated" design. Barcelona comes in three categories: high-performance, standard-performance and energy-efficient server models, but only the standard (up to 2.0 GHz) and energy-efficient (up to 1.9 GHz) categories will be available at launch. The high-performance Opterons, together with higher frequencies of the standard and energy-efficient chips, are expected out in the fourth quarter of this year. But it's far from clear that this is the product that will help right AMD's ship."

32 of 118 comments

  1. how well will it overclock? by Anonymous Coward · · Score: 2, Interesting

    Since it's essentially the same tech as their X2 design?

    I get 2.7ghz out of a 2.0ghz rated X2 (on air).

    Once again they have beaten Intel's prices by at least $100 so we all win.

    1. Re:how well will it overclock? by ZachPruckowski · · Score: 3, Informative

      Barcelona is a different architecture from K8 (the architecture of the current X2s). Its overclocking performance is currently unknown. Just as Intel's overclocking potential improved as it went from Pentium -> Core 2 Duo, Barcelona may increase or decrease AMD's overclocking potential.

    2. Re:how well will it overclock? by jimstapleton · · Score: 2, Insightful

      Core2 Quad = Desktop

      They are talking about server chips, which typically are more expensive than desktop chips.

      --
      34486853790
      Connection too slow for X forwarding? Try "ssh -CX user@host"
    3. Re:how well will it overclock? by cHiphead · · Score: 2, Funny

      No cuz when you call it a 'server chip' it's magically more super and fancy.

      Kinda like an Associates Degree vs. a High School Diploma.

      Cheers. ;)

      --

      This is my sig. There are many like it, but this one is mine.
  2. Benchmarks by eebra82 · · Score: 5, Informative

    Here's some benchmarking done by Anandtech.

    And a performance preview for Barcelona desktop as well.

    1. Re:Benchmarks by ceeam · · Score: 3, Insightful

      Most of the benchmarks are on 32-bit code. Can we at least start considering that as "legacy" and use AMD64 when performance really matters?

    2. Re:Benchmarks by 644bd346996 · · Score: 4, Insightful

      Since Barcelona is one of the bigger architectural changes from AMD in the past few years, the 32-bit benchmarks are relevant because they are good predictors of what's to come for the entire product line, including the desktop processors, where 32-bit code dominates. Also, if they used exclusively 64-bit code, they would be accused of using unrealistic benchmarks to highlight the fact that AMD has better 64-bit performance than Intel.

    3. Re:Benchmarks by evilbessie · · Score: 4, Insightful

      It could be argued, however, that these are server and workstation chips and so would be expected to run mainly 64-bit tasks to get the full use out of the performance. So 64-bit benchmarks would make more sense. Once the Phenom chips are out, both 32-bit and 64-bit benchmarks would be useful, as over the next few years most software will convert to 64-bit and drop 32-bit.

  3. "Full generation behind"? by Stentapp · · Score: 2, Informative

    "The delay puts the chip maker a full generation behind its archrival in terms of chip manufacturing processes. Intel's quad-core processor, which was launched in November last year, melds two of its duo-core processors into a single package."

    Heh, shouldn't that be "full generation ahead" since AMD manages to put four cores on a single die?

    1. Re:"Full generation behind"? by Mr+Z · · Score: 4, Insightful

      This is a direct reference to 65nm vs. 45nm geometry. If AMD brings their quad core to a 45nm process, that should help yield, power and performance. If nothing else, it puts them on a level playing field with Intel (who already have product at 45nm) so that it's down to "design vs. design." Being stuck one silicon technology generation back, they need to resort to other tricks to "keep up."

      In other words, to be at overall performance parity with Intel, they have to have a more advanced design in 65nm to keep up with Intel's 45nm work.

      Another thing worth noting: By being 1 generation back, the quad core setup is a double whammy. The die area of a given chip roughly halves with each technology node. Not only is AMD putting twice as much on one chip, it's also making chips that are twice the size per transistor. (Remember, to double square area, you only increase your linear feature size by sqrt(2). 65/45 = 1.444... which is about sqrt(2).) Each additional sq mm of die area causes greater yield loss than the one before it (driven by defect density in the source silicon). Doubling die size has a huge impact on yield. So, AMD will potentially suffer significantly higher yield loss, and correspondingly higher costs. Even if it can keep its ASP (average selling price) up, the profit margins will suck.
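      The area and yield argument above can be put into rough numbers. This is only a sketch: it assumes a simple Poisson defect model (yield = exp(-area x defect density)) and ideal scaling of area with the square of the node size. The defect density and the dual-core die area are hypothetical values picked to show the shape of the curve, not figures for any AMD or Intel part.

```python
import math


def poisson_yield(die_area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def area_scale(old_node_nm, new_node_nm):
    """Ideal area ratio when every linear feature scales with the node."""
    return (new_node_nm / old_node_nm) ** 2


# Hypothetical inputs, for illustration only.
DEFECT_DENSITY = 0.005   # defects per mm^2 (assumed)
DUAL_CORE_AREA = 140.0   # mm^2 at 65nm (assumed)

shrink = area_scale(65, 45)  # roughly 0.48, i.e. about half the area

for label, area in [
    ("dual core, 65nm", DUAL_CORE_AREA),
    ("quad core, 65nm", 2 * DUAL_CORE_AREA),
    ("quad core, 45nm", 2 * DUAL_CORE_AREA * shrink),
]:
    y = poisson_yield(area, DEFECT_DENSITY)
    print(f"{label}: {area:6.1f} mm^2, yield {y:.0%}")
```

      Under this model, doubling the die area squares the yield fraction, which is why a native quad core on the older node is hit twice: once for the extra cores and once for the larger transistors.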

      It'll be interesting to see if AMD can quickly shrink this design to 45nm and get closer to parity. The benefits of the quad core design probably become much more apparent at 45nm.

      --Joe
    2. Re:"Full generation behind"? by struberg · · Score: 3, Informative

      Intel and AMD are using different production technologies for their dies. From what I know, AMD is using IBM's SOI (Silicon On Insulator), which has much less drain current and therefore is much better at the same size. But it also seems more complicated to shrink this technology to 45nm.

    3. Re:"Full generation behind"? by Mr+Z · · Score: 3, Insightful

      That could help with leakage power, but that doesn't address the yield and cost issues at all.

    4. Re:"Full generation behind"? by asliarun · · Score: 2, Interesting

      "The die area of a given chip roughly halves with each technology node."

      This is not entirely true. Although I agree overall with what you're saying, core logic transistors scale much worse than cache as the manufacturing process decreases in size. I'm not sure if AMD factors this process disadvantage into their chip design, but it is an interesting design choice that they choose to stuff their chip real estate with logic transistors instead of cache. I'm sure that I'm oversimplifying, but I have a gut feeling that they possibly might be choosing to use less cache and more logic precisely because they know they will always be a process node behind Intel, and at least this way, their process disadvantage is somewhat compensated.

      Interestingly enough, Intel has traditionally adopted the exact opposite chip design strategy. IMHO, Intel's design ethos is first driven by manufacturing, and only secondly by pure design. Of course, they have every right to do so as they've consistently led the industry in process and manufacturing technology. However, this sometimes teeters into arrogance, and they have tended to fix a shoddy design by throwing cache at it, and/or relying on a die shrink (which also fabulously shrinks cache!).

      This process/cache luxury, compounded by bureaucracy, can tend to make Intel come up with conservative designs. I'm not even going to talk about P4 or Prescott as it has been beaten to death and beyond. However, Justin Rattner recently hinted the same thing as well when he encouraged his research teams to come up with bolder and even impractical designs, and not start thinking about commercial viability so early on in the research/design stage. AMD, OTOH, simply cannot afford this luxury as they're usually getting whipped by Intel manufacturing AND by Intel marketing muscle, and are usually in a "do or die" mode. This usually makes them come up with riskier or bolder designs.

      Having said that, Core2 is a superb architecture, and in my opinion, will be neck and neck with even Barcelona (win some benchmarks, lose some benchmarks). It's only in the server space that AMD will have two distinct advantages: Hypertransport for scalability, and DDR2 instead of FBD for power consumption. Sigh... if only Intel had not scrapped Whitefield. I guess it would have been released by now.. and that too with a native quadcore design and CSI. Look at the Tigerton hack-job for a contrast... pathetic (but of course, easier to manufacture).
    5. Re:"Full generation behind"? by Frumious+Wombat · · Score: 2, Interesting

      Don't knock "easier to manufacture". The Cray3 and many other interesting designs failed because yields of some critical part never reached commercial viability. My first Opteron servers (right out of the gate from a major vendor) had several failures, all due to the onboard memory controller frying. A little slower but fewer defects results in fewer recalls and less bad press.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
  4. Techreport by Eukariote · · Score: 5, Informative

    The Techreport also has a review up: http://techreport.com/articles.x/13176/1. Barcelona is similar to Core2, clock for clock. It has better energy efficiency and SMP scaling. But the clock frequencies will need to come up in order to beat Intel's highest clocking chips in absolute performance.

    1. Re:Techreport by tietokone-olmi · · Score: 3, Insightful

      AMD's had I/O performance and memory latency advantages over Intel even before Barcelona though. I suppose Intel will be in even more serious trouble than before in the server space, until it can get its next-generation bus thingy (CSI they called it?) up and running in a year or three. Until then, Intel's stuck in an SMP scaling black hole... and I don't really see Intel coming out with integrated memory controllers and native NUMA like AMD did with their whiz-bang DEC Alpha engineers.

      Once Barcelona ramps up, Intel's going to be hard pressed to come up with an advantage besides clock speed for the C2 microarchitecture, given that Barcelona finally ups the SSE units to proper 128-bit wide computation; i.e. none of that splitting of SSE operations into pieces that are executed 2 pairs of operands at a time.

      Remember, high-performance floating point is not the mainstream workload that determines the success or failure of a microarchitecture. (Though it is one of the sexier ones.) So no yammering about "absolute performance" there; AMD's previous-gen offerings were crazy fast before the C2D and aren't half bad even after C2D.

  5. Re:I'm curious by llirik · · Score: 3, Informative

    * 2347 - 1.9 GHz, $316
    * 2350 - 2.0 GHz, $389
    * 8347 - 1.9 GHz, $786
    * 8350 - 2.0 GHz, $1019
    * 2344 HE - 1.7 GHz, $209
    * 2346 HE - 1.8 GHz, $255
    * 2347 HE - 1.9 GHz, $377
    * 8346 HE - 1.8 GHz, $698
    * 8347 HE - 1.9 GHz, $873

  6. Cool by afidel · · Score: 2, Interesting

    Literally. I can't wait to get in our first DL585 G2 with 4 of these beasties and 64GB of RAM. The only regret I have is that we probably won't use em for DB servers because of Oracle's asinine policy of charging per core. Sometimes I wish we had gone SQL2005 for more stuff, as it is going to scale better with improving hardware. Then again, maybe the proliferation of quad core (and above) server CPUs will make Oracle rethink their pricing policy again. I hope they go to what the rest of the industry is doing and license per socket.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    1. Re:Cool by afidel · · Score: 2, Informative

      You are charged per core and can only go below the number of physical cores in the machine if the architecture has hard partitioning of resources, for instance a zone with hard resource limits is acceptable but a container with soft limits is not (well, it is but you need licenses for the max possible resources the container has access to).

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  7. Re:Not another fake number AMD! by afidel · · Score: 4, Insightful

    Uh, they are doing this to come closer to Intel's TDP numbers, which have been average high use numbers instead of worst case for at least the last couple of generations of chips. AMD is actually being much more upfront here by offering both worst case and average case numbers. I hope Intel follows their lead and offers both numbers.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  8. I've been buying Intel/Nvidia . . . by WhiteWolf666 · · Score: 2, Interesting

    For a few years now, as that was the only platform that really, reliably ran Linux.

    Intel's been good to us Linux folk, and Nvidia has been easy enough to deal with.

    If AMD comes out with an end-to-end Linux solution, CPU, GPU, and a good Linux-friendly partner for chipset, I'll seriously consider switching back to AMD parts.

    --
    WhiteWolf666 an exBush supporter. All you new-school,compassionate,save the children Republicans can rot in hell
    1. Re:I've been buying Intel/Nvidia . . . by Anonymous Coward · · Score: 4, Interesting

      Interesting....I've been buying AMD/NVidia for a few years now for the exact same reason....

  9. Re:If you considered using MSSQL by afidel · · Score: 3, Insightful

    Uh, MSSQL 2005 is a serious enterprise DB, this isn't SQL 7 anymore. Also none of our enterprise software supports PostgreSQL so invalidating our 6 or 7 figure support contracts just isn't an option even if it WOULD work.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  10. Re:question for the local geniuses... by TheThiefMaster · · Score: 3, Informative

    It depends on three things:
    1: Whether the software CAN use multiple cores.
    2: How efficiently it uses the extra cores.
    3: Whether the program is currently limited by cpu power or by something else.

    For "1:", if the program can't use the extra cores, then you'll only see a speed improvement from the fact that the cores are 15% more efficient. i.e. A 2GHz one of these quads performs the same as a 2.3GHz (+15%) dual core from the previous generation for applications in this category.

    For "2:", if the program can use the extra cores, but not as efficiently as the first, then you'll see a correspondingly smaller speed increase. For example, if the program does two tasks at once, one that takes 70 seconds and one that takes 30, then on one core it'll take 100 seconds. On two cores it would do the 70 second task on one core and the 30 second task on the other, reducing the total time to 70 seconds: 100/70 is about 1.43x, a ~43% speed improvement.

    For "3:", if the application is limited by something other than the cpu, e.g. "how quickly it can pull data from the hard-disk", you will likely see no improvement whatsoever.

    In conclusion, depending on what applications you use, you will see anywhere from no improvement up to 2.3x the previous speed (x2 for double the cores and +15% from the improved efficiency).
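    The three cases above can be checked with a few lines of arithmetic. This is a rough sketch, not a benchmark: it assumes the tasks are independent, that they are handed out longest first to whichever core is least loaded, and that the 15% per-core gain applies uniformly. The function names are made up for this example.

```python
def parallel_time(task_seconds, cores):
    """Wall time when independent tasks go, longest first, to the least-loaded core."""
    loads = [0.0] * cores
    for seconds in sorted(task_seconds, reverse=True):
        loads[loads.index(min(loads))] += seconds
    return max(loads)


def relative_speed(task_seconds, old_cores, new_cores, per_core_gain=1.0):
    """Speed of the new chip relative to the old one for a fixed set of tasks."""
    old = parallel_time(task_seconds, old_cores)
    new = parallel_time(task_seconds, new_cores) / per_core_gain
    return old / new


# Single-threaded program, dual core -> quad core: only the per-core gain shows up.
print(relative_speed([100], old_cores=2, new_cores=4, per_core_gain=1.15))

# The 70s + 30s example, one core -> two cores, same per-core speed.
print(relative_speed([70, 30], old_cores=1, new_cores=2))

# Four equal tasks, dual core -> quad core: the best case described above.
print(relative_speed([25, 25, 25, 25], old_cores=2, new_cores=4, per_core_gain=1.15))
```

    A program limited by the disk rather than the CPU (case 3) is not modelled here; its task times would simply not shrink with more cores.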

    Note: As these cpus also have an extra instruction set extension, applications that make use of this could exceed the speed improvements I noted above.

  11. More Barcelona by bigwophh · · Score: 2, Informative

    Specs of the entire new Barcelona line-up, more details, and pricing are available here as well:

    http://www.hothardware.com/Articles/AMD_Barcelona_Architecture_Launch_Native_QuadCore

  12. Re:Not another fake number AMD! by Chris+Burke · · Score: 3, Informative

    Ah, my bad, thanks for clearing this up...so that explains Intels ability to suddenly have lower power chips...so it is they that are playing with the numbers this time, interesting :)

    To some extent. The Pentium 4 is where this started. The Netburst architecture was very power hungry normally, but its maximum power was insane. The graph of power consumption vs benchmark had a long "tail", which Intel sought to chop off. See, TDP is a real-life number, since it's used by OEMs and others to design thermal solutions for the parts. If the thermal solution is insufficient, then the parts fail. So it's not actually possible to fudge TDP numbers.

    What Intel decided to do was implement an on-chip thermal diode and some logic that halved the effective clock cycle* if the temperature went above a certain threshold. What this meant is that based on how they programmed this logic, they could guarantee that the chip's power consumption would never go above a certain level no matter what code you were running. They had effectively lopped off the long tail. The downside is that if your application does draw more power than the limit, then you'll see vastly reduced performance because of the clock throttling. Most of the time this is transient so it's not that noticeable, but there were benchmarks out there that showed this effect very clearly. For example, a certain game benchmark would get lower scores at 640x480 than at 1600x1200 because at the lower res the game was CPU-bound and was crossing the thermal threshold.

    So theoretically with this feature Intel could fudge the numbers however they wanted and claim whatever TDP they desired. In practice they don't have that much flexibility because if they set the bar too low then their effective performance would suck, and their TDP numbers are set at average power + several standard deviations.
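    To make the "average power + several standard deviations" idea concrete, here is a small sketch comparing that kind of rating with a true worst-case number. Everything in it is an assumption for illustration: the power samples are invented to have a long tail, and k=3 is an arbitrary choice for "several". It does not reflect how either vendor actually derives its figures.

```python
import statistics


def rated_tdp(power_samples_w, k=3.0):
    """Mean power plus k standard deviations (k is an assumed parameter)."""
    mean = statistics.mean(power_samples_w)
    spread = statistics.pstdev(power_samples_w)
    return mean + k * spread


def worst_case(power_samples_w):
    """Maximum observed power, the kind of number a "power virus" would produce."""
    return max(power_samples_w)


# Hypothetical per-benchmark power draw in watts: most runs cluster low,
# a single pathological workload sits far out in the tail.
samples = [60] * 50 + [70] * 45 + [80] * 4 + [130]

print(f"mean + 3 sigma: {rated_tdp(samples):.1f} W")
print(f"worst case:     {worst_case(samples):.1f} W")
```

    With a long-tailed distribution like this one, the statistical rating lands well below the worst case, which is the gap the throttling logic is there to cover.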

    The main reason why Intel was able to suddenly have low power chips is because they ditched the Netburst architecture and went back to a design that was more balanced between high clock speeds and high IPC.

    They kept the clock throttling logic, though, since it does still give them some benefit in reporting lower TDP numbers. AMD doesn't have this feature, so their TDP is truly the maximum power (as determined by running a "power virus") that you would ever see, even though it's unlikely. Since power has become ever more important as a marketing feature even outside of mobile, I'm not surprised that AMD would decide to start touting expected numbers vs maximum.

    * Actually a 50% duty cycle of full speed for some number of microseconds followed by completely off.

    --

    The enemies of Democracy are
  13. Re:question for the local geniuses... by fm6 · · Score: 3, Insightful

    Doesn't the software have to be optimised for multiprocessors?
    Well, it has to be multithreaded. Thing is, a lot of software is multithreaded already; even on a single-core system, it makes sense to distribute functionality among multiple threads so that resources are used efficiently. On server systems (which is where Opterons are mostly used) software pretty much has to be multithreaded — you don't want all your other clients hanging when one client is waiting on a resource. A web server is a classic example.

    When you move a multithreaded program to a system with more cores, then any given thread is more likely to get a core to run on when it needs it. Assuming, of course, that you have enough threads so that's an issue.
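    The "one client waiting shouldn't hang the others" point can be seen in a toy example. This is a sketch only: handle_client is a made-up stand-in for a request that blocks on some resource, and the wait is simulated with a sleep. Note that in standard CPython, threads overlap waiting like this but do not spread CPU-bound work across cores, so this shows the server-style case rather than raw multicore scaling.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_client(client_id, wait_seconds=0.05):
    """Stand-in for a request that blocks on a resource (disk, database, network)."""
    time.sleep(wait_seconds)
    return f"client {client_id} served"


def serve(client_ids, workers):
    """Serve all clients with a pool of worker threads; return replies and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        replies = list(pool.map(handle_client, client_ids))
    return replies, time.perf_counter() - start


# One worker handles the clients back to back; four workers let the waits overlap.
replies_1, t1 = serve(range(4), workers=1)
replies_4, t4 = serve(range(4), workers=4)
print(f"1 worker: {t1:.2f}s, 4 workers: {t4:.2f}s")
```

    The replies are identical either way; only the time spent waiting changes.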

    Shameless plug: I'm the docs lead for this Opteron-based server, which can have up to 8 CPUs, for a total of 16 cores. When the Barcelona-based CPU modules are ready, customers will be able to upgrade their systems to a maximum of 32 cores. (Don't ask me when this will happen; Marketing would have me killed.) Obviously any software running on such a system has already dealt with the multicore optimization issue.
  14. Re:Clock for clock Barcelona is faster than Clover by Wavicle · · Score: 3, Insightful

    specfp_rate was running faster on pre-Barcelona dual core Opterons than on Intel's dual core Woodcrest. The reason is no big secret: specfp is memory bandwidth limited and specfp_rate is multiple copies of specfp running in parallel. Here is a good AnandTech article on the subject.

    We already know that AMD has superior memory performance. If you are doing bandwidth-limited floating point, Barcelona is the clear winner.

    If you're making a general statement about floating point performance, you're wrong.

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)
  15. Re:haha, oh man, charging per core is a break by Chandon+Seldon · · Score: 2, Insightful

    I've never used it but Oracle is either one hell of a database, or one hell of a brand for people to put up with tactics like that. That shouldn't even be legal.

    Oracle is an amazingly powerful brand and managers think that "scalability" is something you buy rather than an engineering problem for programmers and system architects to solve. That's really the whole story. Given what servers cost and the actual performance differences between different database software given appropriately written client software, purchasing Oracle licenses is largely inexcusable unless you have existing Oracle dependent software and no time to switch databases and re-address scaling related design questions.

    --
    -- The act of censorship is always worse than whatever is being censored. Always.
  16. Re:"right AMD's Ship" ? by Wavicle · · Score: 4, Insightful

    I don't understand why everyone always talks about AMD's problems.

    Because it doesn't matter how many fronts you are leading on, if you run out of money and can't borrow any more, you lose.

    AMD has been running out of money; fortunately they can still borrow. If they don't stop losing money their credit rating will tank and then they will not be able to borrow any more.

    THAT is what righting the ship means.

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)
  17. Re:Clock for clock Barcelona is faster than Clover by Wavicle · · Score: 2, Informative

    When only measuring single core performance, clock for clock, Barcelona is on par with Clovertown.

    Unfortunately processors are not generally sold "clock for clock." If you're on par clock for clock, but the other guy is clocked more than 50% faster than you... that could be trouble.

    What good is an Intel chip that has fast floating point but the bus cannot feed it data fast enough?

    Plenty good if the data can fit in cache, in which case the unit can be fed fast enough. For instance, say you're running LinPack. But then, who uses LinPack as a benchmark?

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)
  18. Re:Clock for clock Barcelona is faster than Clover by Wavicle · · Score: 3, Informative

    I simply want to use the chip that gives me the greatest floating point throughput I can get.

    Define throughput. At some point you need to decide if you are solving equations like LinPack or equations like spec_fp. One causes lots of cache misses and benefits from memory bandwidth, the other does not.

    Right now that chip appears to be Barcelona.

    Well that's a hypothetical statement based on perception of your needs and their marketing.

    I'm not interested with hypothetical arguments

    That explains why you're making them (???)

    I am looking forward to using Barcelona processors because they will get my mathematical computations done faster.

    Hypothetically. Are you going to hypothetically switch when Intel's Penryn with SSE4 comes out? What about Intel's Nehalem?

    By the way, check out number 2 and 3 on your top 500 supercomputer list - they're Opterons.

    And?? They were designed and built before Core 2 was released. Do you think I'm going to argue they should have used Pentium 4's? Those systems also make solid use of NUMA through a custom Cray crossbar (Seastar), and Intel doesn't have that. If they made them today I see no reason for them not to use Opterons. Do you have a computer with lots of Opterons and a Cray Seastar router on order?

    The performance of those systems is measured using LinPack. As I mentioned at the beginning, declaring a 2.0 GHz Barcelona as having faster fp throughput than a 3.2 GHz Core 2 depends wholly on which types of calculations you are doing. spec_fp does calculations that are memory bound, LinPack does not (at least not as much). Barcelona's faster fp throughput is not due to a markedly superior fp unit (though it may be marginally better) but to its onboard memory controller. If you need that sort of thing, great, go with Barcelona. If you need raw speed on smaller units (under a couple of megabytes) chances are good that the higher clocked Core 2 with huge cache will win.
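    One common way to frame the memory-bound versus compute-bound distinction is a roofline-style estimate: attainable throughput is the smaller of the chip's peak rate and what the memory system can feed it. The sketch below uses that model with two hypothetical chips; the peak and bandwidth figures are invented to show the crossover and are not measurements of Barcelona or Core 2.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical chips (all numbers assumed for illustration).
high_clock = {"peak_gflops": 50.0, "mem_bandwidth_gbs": 8.0}       # faster core, slower memory path
high_bandwidth = {"peak_gflops": 32.0, "mem_bandwidth_gbs": 16.0}  # slower core, faster memory path

# Low flops-per-byte means lots of memory traffic per calculation (memory bound);
# high flops-per-byte means the working set is mostly served from cache (compute bound).
for label, intensity in [("memory bound", 0.5), ("cache friendly", 16.0)]:
    a = attainable_gflops(flops_per_byte=intensity, **high_clock)
    b = attainable_gflops(flops_per_byte=intensity, **high_bandwidth)
    print(f"{label}: high-clock chip {a:.1f} GFLOPS, high-bandwidth chip {b:.1f} GFLOPS")
```

    Which chip "has the faster floating point" flips depending on the workload's flops-per-byte ratio, which is the whole point of asking what kind of calculation is being run.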

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)