AMD Finally Unveils Barcelona Chip
Justin Oblehelm writes "AMD has finally unveiled its first set of quad-core processors, three months after its original launch date due to its "complicated" design. Barcelona comes in three categories: high-performance, standard-performance and energy-efficient server models, but only the standard (up to 2.0 GHz) and energy-efficient (up to 1.9 GHz) categories will be available at launch. The high-performance Opterons, together with higher frequencies of the standard and energy-efficient chips, are expected out in the fourth quarter of this year.
But it's far from clear that this is the product that will help right AMD's ship."
Here's some benchmarking done by Anandtech.
And a performance preview for Barcelona desktop as well.
Full Tilt
Barcelona is a different architecture from K8 (the architecture of the current X2s). Its overclocking performance is currently unknown. Just as Intel's overclocking potential improved as it went from Pentium -> Core 2 Duo, Barcelona may increase or decrease AMD's overclocking potential.
The Techreport also has a review up: http://techreport.com/articles.x/13176/1. Barcelona is similar to Core2, clock for clock. It has better energy efficiency and SMP scaling. But the clock frequencies will need to come up in order to beat Intel's highest clocking chips in absolute performance.
* 2347 - 1.9 GHz, $316
* 2350 - 2.0 GHz, $389
* 8347 - 1.9 GHz, $786
* 8350 - 2.0 GHz, $1019
* 2344 HE - 1.7 GHz, $209
* 2346 HE - 1.8 GHz, $255
* 2347 HE - 1.9 GHz, $377
* 8346 HE - 1.8 GHz, $698
* 8347 HE - 1.9 GHz, $873
This is a direct reference to 65nm vs. 45nm geometry. If AMD brings their quad core to a 45nm process, that should help yield, power and performance. If nothing else, it puts them on a level playing field with Intel (who already have product at 45nm) so that it's down to "design vs. design." Being stuck one silicon technology generation back, they need to resort to other tricks to "keep up."
In other words, to be at overall performance parity with Intel, they have to have a more advanced design in 65nm to keep up with Intel's 45nm work.
Another thing worth noting: By being 1 generation back, the quad core setup is a double whammy. The die area of a given chip roughly halves with each technology node. Not only is AMD putting twice as much on one chip, it's also making chips that are twice the size per transistor. (Remember, to double square area, you only increase your linear feature size by sqrt(2). 65/45 = 1.444... which is about sqrt(2).) Each additional sq mm of die area causes greater yield loss than the one before it (driven by defect density in the source silicon). Doubling die size has a huge impact on yield. So, AMD will potentially suffer significantly higher yield loss, and correspondingly higher costs. Even if it can keep its ASP (average selling price) up, the profit margins will suck.
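To put rough numbers on the yield argument, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area x defect density)), which is a common first-order approximation and not necessarily what any fab actually uses. The die areas, the defect density and the cost per square millimetre are made-up illustration values, not AMD or Intel figures.

```python
import math

def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-area_mm2 * defects_per_mm2)

def cost_per_good_die(area_mm2, defects_per_mm2, cost_per_mm2=1.0):
    """Silicon cost of one die divided by the fraction of dies that work."""
    return area_mm2 * cost_per_mm2 / poisson_yield(area_mm2, defects_per_mm2)

if __name__ == "__main__":
    defect_density = 0.005  # hypothetical defects per mm^2
    for area in (140, 280):  # hypothetical die areas; the second is double the first
        y = poisson_yield(area, defect_density)
        c = cost_per_good_die(area, defect_density)
        print(f"{area} mm^2: yield {y:.1%}, relative cost per good die {c:.0f}")
```

Under this model, doubling the area squares the yield fraction, so the cost per good die more than doubles.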
It'll be interesting to see if AMD can quickly shrink this design to 45nm and get closer to parity. The benefits of the quad core design probably become much more apparent at 45nm.
--JoeProgram Intellivision!
Uh, they are doing this to come closer to Intel's TDP numbers, which have been average high-use numbers instead of worst case for at least the last couple of generations of chips. AMD is actually being much more upfront here by offering both worst-case and average-case numbers; I hope Intel follows their lead and offers both.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Uh, MSSQL 2005 is a serious enterprise DB, this isn't SQL 7 anymore. Also none of our enterprise software supports PostgreSQL so invalidating our 6 or 7 figure support contracts just isn't an option even if it WOULD work.
It depends on three things:
1: Whether the software CAN use multiple cores.
2: How efficiently it uses the extra cores.
3: Whether the program is currently limited by cpu power or by something else.
For "1:", if the program can't use the extra cores, then you'll only see a speed improvement from the fact that the cores are 15% more efficient, i.e., a 2GHz one of these quads performs the same as a 2.3GHz (+15%) dual core from the previous generation for applications in this category.
For "2:", if the program can use the extra cores, but not as efficiently as the first, then the speed increase depends on how evenly the work splits. E.g., if the program does two tasks at once, one that takes 70 seconds and one that takes 30, then on one core it'll take 100 seconds. On two cores it would do the 70 second task on one core and the 30 second task on the other, reducing the total time to 70 seconds, roughly a 43% speed improvement (100/70 = ~1.43x).
For "3:", if the application is limited by something other than the CPU, e.g. "how quickly it can pull data from the hard-disk", you will likely see no improvement whatsoever.
In conclusion, depending on what applications you use, you will see anywhere from no improvement up to 2.3x the previous speed (x2 for double the cores and +15% from the improved efficiency).
Note: As these cpus also have an extra instruction set extension, applications that make use of this could exceed the speed improvements I noted above.
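The arithmetic in the three cases above can be sketched in a few lines of Python. This is only an illustration: the greedy longest-task-first scheduling and the function names are my own assumptions, not how any real OS scheduler works, and it ignores case 3 (non-CPU bottlenecks) entirely.

```python
def parallel_time(task_seconds, cores):
    """Wall-clock time when independent tasks are spread over `cores`,
    always handing the longest remaining task to the least loaded core."""
    loads = [0.0] * cores
    for t in sorted(task_seconds, reverse=True):
        idx = loads.index(min(loads))
        loads[idx] += t
    return max(loads)

def speedup(task_seconds, cores, per_core_gain=1.0):
    """Speedup over one old core; per_core_gain=1.15 models the +15% per core."""
    serial = sum(task_seconds)
    return serial / parallel_time(task_seconds, cores) * per_core_gain

if __name__ == "__main__":
    print(speedup([70, 30], cores=2))                      # uneven split from case 2
    print(speedup([50, 50], cores=2, per_core_gain=1.15))  # perfect split plus +15%
    print(speedup([100], cores=2, per_core_gain=1.15))     # single-threaded, case 1
```

The 70/30 split gives about 1.43x, the perfectly balanced case gives the 2.3x upper bound, and the single-threaded case gives only the 1.15x per-core gain.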
Interesting....I've been buying AMD/NVidia for a few years now for the exact same reason....
Intel and AMD are using different production technologies for their dies. From what I know, AMD is using IBM's SOI (Silicon On Insulator), which has much less drain current and is therefore much better at the same size. But it also seems more complicated to shrink this technology to 45nm.
That could help with leakage power, but that doesn't address the yield and cost issues at all.
Program Intellivision!
Ah, my bad, thanks for clearing this up... so that explains Intel's ability to suddenly have lower power chips... so it is they that are playing with the numbers this time, interesting :)
To some extent. The Pentium 4 is where this started. The Netburst architecture was very power hungry normally, but its maximum power was insane. The graph of power consumption vs benchmark had a long "tail", which Intel sought to chop off. See, TDP is a real-life number, since it's used by OEMs and others to design thermal solutions for the parts. If the thermal solution is insufficient, then the parts fail. So it's not actually possible to fudge TDP numbers.
What Intel decided to do was implement an on-chip thermal diode and some logic that halved the effective clock cycle* if the temperature went above a certain threshold. What this meant is that based on how they programmed this logic, they could guarantee that the chip's power consumption would never go above a certain level no matter what code you were running. They had effectively lopped off the long tail. The downside is that if your application does draw more power than the limit, then you'll see vastly reduced performance because of the clock throttling. Most of the time this is transient so it's not that noticeable, but there were benchmarks out there that showed this effect very clearly. Like a certain game benchmark would get lower scores at 640x480 than 1600x1200 because at the lower res the game was CPU bound and was crossing the thermal threshold.
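As a toy illustration of the throttling described above (not Intel's actual logic), the effect on clock speed can be sketched like this. The temperatures, the threshold and the function names are all made up for the example; the only thing taken from the description is the 50% duty cycle once the threshold is crossed.

```python
def effective_clock_ghz(base_ghz, temp_c, threshold_c, duty_cycle=0.5):
    """Clock the software effectively sees during one sampling interval."""
    if temp_c > threshold_c:
        return base_ghz * duty_cycle  # throttled: clock gated on and off
    return base_ghz

def average_clock_ghz(base_ghz, temps_c, threshold_c, duty_cycle=0.5):
    """Average effective clock over a series of temperature samples."""
    clocks = [effective_clock_ghz(base_ghz, t, threshold_c, duty_cycle)
              for t in temps_c]
    return sum(clocks) / len(clocks)

if __name__ == "__main__":
    # Hypothetical 3.0 GHz part with a 70 C trip point.
    print(average_clock_ghz(3.0, [60, 65, 68, 69], threshold_c=70))  # never trips
    print(average_clock_ghz(3.0, [60, 75, 80, 72], threshold_c=70))  # trips in 3 of 4 samples
```

A workload that keeps the chip over the threshold ends up running well below the rated clock, which is the kind of effect the low-resolution game benchmark exposed.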
So theoretically with this feature Intel could fudge the numbers however they wanted and claim whatever TDP they desired. In practice they don't have that much flexibility because if they set the bar too low then their effective performance would suck, and their TDP numbers are set at average power + several standard deviations.
The main reason why Intel was able to suddenly have low power chips is because they ditched the Netburst architecture and went back to a design that was more balanced between high clock speeds and high IPC.
They kept the clock throttling logic, though, since it does still give them some benefit in reporting lower TDP numbers. AMD doesn't have this feature, so their TDP is truly the maximum power (as determined by running a "power virus") that you would ever see, even though it's unlikely. Since power has become ever more important as a marketing feature even outside of mobile, I'm not surprised that AMD would decide to start touting expected numbers vs maximum.
* Actually a 50% duty cycle of full speed for some number of microseconds followed by completely off.
The enemies of Democracy are
When you move a multithreaded program to a system with more cores, then any given thread is more likely to get a core to run on when it needs it. Assuming, of course, that you have enough threads so that's an issue.
Shameless plug: I'm the docs lead for this Opteron-based server, which can have up to 8 CPUs, for a total of 16 cores. When the Barcelona-based CPU modules are ready, customers will be able to upgrade their systems to a maximum of 32 cores. (Don't ask me when this will happen; Marketing would have me killed.) Obviously any software running on such a system has already dealt with the multicore optimization issue.
specfp_rate was running faster on pre-Barcelona dual core Opterons than on Intel's dual core Woodcrest. The reason is no big secret: specfp is memory bandwidth limited, and specfp_rate is multiple copies of specfp running in parallel. Here is a good AnandTech article on the subject.
We already know that AMD has superior memory performance. If you are doing bandwidth-limited floating point, Barcelona is the clear winner.
If you're making a general statement about floating point performance, you're wrong.
Education is a better safeguard of liberty than a standing army.
Edward Everett (1794 - 1865)
I don't understand why everyone always talks about AMD's problems.
Because it doesn't matter how many fronts you are leading on, if you run out of money and can't borrow any more, you lose.
AMD has been running out of money; fortunately, they can still borrow. If they don't stop losing money, their credit rating will tank and then they will not be able to borrow any more.
THAT is what righting the ship means.
I simply want to use the chip that gives me the greatest floating point throughput I can get.
Define throughput. At some point you need to decide if you are solving equations like LinPack or equations like spec_fp. One causes lots of cache misses and benefits from memory bandwidth, the other does not.
Right now that chip appears to be Barcelona.
Well that's a hypothetical statement based on perception of your needs and their marketing.
I'm not interested in hypothetical arguments
That explains why you're making them (???)
I am looking forward to using Barcelona processors because they will get my mathematical computations done faster.
Hypothetically. Are you going to hypothetically switch when Intel's Penryn with SSE4 comes out? What about Intel's Nehalem?
By the way, check out number 2 and 3 on your top 500 supercomputer list - they're Opterons.
And?? They were designed and built before Core 2 was released. Do you think I'm going to argue they should have used Pentium 4's? Those systems also make solid use of NUMA through a custom Cray crossbar (Seastar), and Intel doesn't have that. If they made them today I see no reason for them not to use Opterons. Do you have a computer with lots of Opterons and a Cray Seastar router on order?
The performance of those systems is measured using LinPack. As I mentioned at the beginning, declaring a 2.0 GHz Barcelona as having faster fp throughput than a 3.2 GHz Core 2 depends wholly on which types of calculations you are doing. spec_fp does calculations that are memory bound; LinPack is not (at least not as much). Barcelona's faster fp throughput is not due to a markedly superior fp unit (though it may be marginally better) but to its onboard memory controller. If you need that sort of thing, great, go with Barcelona. If you need raw speed on smaller working sets (under a couple of megabytes), chances are good that the higher clocked Core 2 with its huge cache will win.
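The memory-bound vs. compute-bound distinction can be sketched with the usual roofline-style estimate: attainable throughput is the smaller of peak compute and bandwidth times the flops done per byte fetched. The two "chips" below are hypothetical, with invented peak and bandwidth numbers chosen only to show the crossover; they are not measurements of Barcelona or Core 2.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Hypothetical parts: A has the better memory system, B the higher peak.
CHIP_A = {"peak_gflops": 32.0, "bandwidth_gb_s": 10.0}
CHIP_B = {"peak_gflops": 48.0, "bandwidth_gb_s": 6.0}

if __name__ == "__main__":
    for label, intensity in (("memory-bound kernel", 0.5),
                             ("cache-friendly kernel", 16.0)):
        a = attainable_gflops(flops_per_byte=intensity, **CHIP_A)
        b = attainable_gflops(flops_per_byte=intensity, **CHIP_B)
        print(f"{label}: chip A {a} GFLOPS, chip B {b} GFLOPS")
```

With low flops per byte the chip with more bandwidth wins; once the working set fits in cache and intensity is high, the chip with the higher peak wins.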