AMD Finally Unveils Barcelona Chip
Justin Oblehelm writes "AMD has finally unveiled its first set of quad-core processors, three months after its original launch date due to its "complicated" design. Barcelona comes in three categories: high-performance, standard-performance and energy-efficient server models, but only the standard (up to 2.0 GHz) and energy-efficient (up to 1.9 GHz) categories will be available at launch. The high-performance Opterons, together with higher frequencies of the standard and energy-efficient chips, are expected out in the fourth quarter of this year.
But it's far from clear that this is the product that will help right AMD's ship."
Since it's essentially the same tech since their X2 design?
I get 2.7 GHz out of a 2.0 GHz-rated X2 (on air).
Once again they have beaten Intel's prices by at least $100 so we all win.
Here's some benchmarking done by Anandtech.
And a performance preview for Barcelona desktop as well.
Full Tilt
"The delay puts the chip maker a full generation behind its archrival in terms of chip manufacturing processes. Intel's quad-core processor, which was launched in November last year, melds two of its duo-core processors into a single package."
Heh, shouldn't that be "full generation ahead" since AMD manages to put four cores on a single die?
The Tech Report also has a review up: http://techreport.com/articles.x/13176/1. Barcelona is similar to Core 2, clock for clock. It has better energy efficiency and SMP scaling. But the clock frequencies will need to come up in order to beat Intel's highest-clocked chips in absolute performance.
* 2347 - 1.9 GHz, $316
* 2350 - 2.0 GHz, $389
* 8347 - 1.9 GHz, $786
* 8350 - 2.0 GHz, $1019
* 2344 HE - 1.7 GHz, $209
* 2346 HE - 1.8 GHz, $255
* 2347 HE - 1.9 GHz, $377
* 8346 HE - 1.8 GHz, $698
* 8347 HE - 1.9 GHz, $873
Literally. I can't wait to get in our first DL585 G2 with 4 of these beasties and 64GB of RAM. The only regret I have is that we probably won't use 'em for DB servers because of Oracle's asinine policy of charging per core. Sometimes I wish we had gone SQL2005 for more stuff, as it is going to scale better with improving hardware. Then again, maybe the proliferation of quad-core (and above) server CPUs will make Oracle rethink their pricing policy again. I hope they go to what the rest of the industry is doing and license per socket.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Uh, they are doing this to come closer to Intel's TDP numbers, which have been average high-use numbers instead of worst case for at least the last couple of generations of chips. AMD is actually being much more upfront here by offering both worst-case and average-case numbers. I hope Intel follows their lead and offers both numbers.
For a few years now, as that was the only platform that really, reliably ran Linux.
Intel's been good to us Linux folk, and Nvidia has been easy enough to deal with.
If AMD comes out with an end-to-end Linux solution, CPU, GPU, and a good Linux-friendly partner for chipset, I'll seriously consider switching back to AMD parts.
WhiteWolf666 an exBush supporter. All you new-school,compassionate,save the children Republicans can rot in hell
Uh, MSSQL 2005 is a serious enterprise DB, this isn't SQL 7 anymore. Also none of our enterprise software supports PostgreSQL so invalidating our 6 or 7 figure support contracts just isn't an option even if it WOULD work.
It depends on three things:
1: Whether the software CAN use multiple cores.
2: How efficiently it uses the extra cores.
3: Whether the program is currently limited by cpu power or by something else.
For "1:", if the program can't use the extra cores, then you'll only see a speed improvement from the fact that the cores are 15% more efficient, i.e. a 2 GHz one of these quads performs the same as a 2.3 GHz (+15%) dual core from the previous generation for applications in this category.
For "2:", if the program can use the extra cores, but not as efficiently as the first, then the speed increase depends on how evenly the work splits. E.g., if the program does two tasks at once, one that takes 70 seconds and one that takes 30, then on one core it'll take 100 seconds. On two cores it would do the 70 second task on one core and the 30 second task on the other, reducing the total time to 70 seconds, roughly a 1.4x speedup (100/70).
For "3:", if the application is limited by something other than the cpu, e.g. "how quickly it can pull data from the hard-disk", you will likely see no improvement whatsoever.
In conclusion, depending on what applications you use, you will see anywhere from no improvement up to 2.3x the previous speed (x2 for double the cores and +15% from the improved efficiency).
Note: As these cpus also have an extra instruction set extension, applications that make use of this could exceed the speed improvements I noted above.
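To put rough numbers on the three cases above, here is a minimal sketch. The function name `wall_time` and the idea of modelling a program as a list of independent task durations are my own assumptions for illustration; the 70 s / 30 s tasks and the 15% per-core gain are the figures used above. It ignores the I/O-bound case (3) entirely, where extra cores change nothing.

```python
def wall_time(task_seconds, cores, per_core_gain=0.0):
    """Estimate wall-clock time for independent tasks on `cores` cores.

    Hypothetical model: tasks are handed out longest-first, each to the
    core with the least work so far. per_core_gain=0.15 means every core
    is 15% faster than the baseline core.
    """
    loads = [0.0] * cores
    for t in sorted(task_seconds, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += t
    # The run finishes when the busiest core finishes.
    return max(loads) / (1.0 + per_core_gain)


# Case 2 from above: a 70 s task and a 30 s task.
one_core = wall_time([70, 30], cores=1)
two_cores = wall_time([70, 30], cores=2)
uneven_speedup = one_core / two_cores

# Best case: perfectly divisible work, double the cores, +15% per core.
even_work = [25, 25, 25, 25]
best_speedup = wall_time(even_work, cores=2) / wall_time(
    even_work, cores=4, per_core_gain=0.15
)

print(f"uneven split speedup: {uneven_speedup:.2f}x")
print(f"best-case speedup:    {best_speedup:.2f}x")
```

The uneven case is limited by the longest single task, which is why doubling the cores does not double the speed there.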
Specs of the entire new Barcelona line-up, more details, and pricing are available here as well:
http://www.hothardware.com/Articles/AMD_Barcelona_Architecture_Launch_Native_QuadCore
Ah, my bad, thanks for clearing this up... so that explains Intel's ability to suddenly have lower power chips... so it is they that are playing with the numbers this time, interesting :)
To some extent. The Pentium 4 is where this started. The Netburst architecture was very power hungry normally, but its maximum power was insane. The graph of power consumption vs benchmark had a long "tail", which Intel sought to chop off. See, TDP is a real-life number, since it's used by OEMs and others to design thermal solutions for the parts. If the thermal solution is insufficient, then the parts fail. So it's not actually possible to fudge TDP numbers.
What Intel decided to do was implement an on-chip thermal diode and some logic that halved the effective clock rate* if the temperature went above a certain threshold. This meant that, depending on how they programmed this logic, they could guarantee that the chip's power consumption would never go above a certain level no matter what code you were running. They had effectively lopped off the long tail. The downside is that if your application does draw more power than the limit, then you'll see vastly reduced performance because of the clock throttling. Most of the time this is transient so it's not that noticeable, but there were benchmarks out there that showed this effect very clearly. For example, a certain game benchmark would get lower scores at 640x480 than at 1600x1200 because at the lower resolution the game was CPU bound and was crossing the thermal threshold.
So theoretically with this feature Intel could fudge the numbers however they wanted and claim whatever TDP they desired. In practice they don't have that much flexibility because if they set the bar too low then their effective performance would suck, and their TDP numbers are set at average power + several standard deviations.
The main reason why Intel was able to suddenly have low power chips is because they ditched the Netburst architecture and went back to a design that was more balanced between high clock speeds and high IPC.
They kept the clock throttling logic, though, since it does still give them some benefit in reporting lower TDP numbers. AMD doesn't have this feature, so their TDP is truly the maximum power (as determined by running a "power virus") that you would ever see, even though it's unlikely. Since power has become ever more important as a marketing feature even outside of mobile, I'm not surprised that AMD would decide to start touting expected numbers vs maximum.
* Actually a 50% duty cycle of full speed for some number of microseconds followed by completely off.
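For anyone who wants to see the difference between the two ways of quoting TDP, here is a rough sketch. The wattage samples are made up purely for illustration (mostly ordinary load plus one "power virus" spike), the function names are hypothetical, and k=3 is just my stand-in for "several standard deviations"; this is not either vendor's actual procedure.

```python
import statistics


def tdp_worst_case(samples_watts):
    """Worst-case style figure: the highest draw ever observed."""
    return max(samples_watts)


def tdp_mean_plus_sigma(samples_watts, k=3.0):
    """Average draw plus k standard deviations (k is an assumption)."""
    mean = statistics.mean(samples_watts)
    sigma = statistics.pstdev(samples_watts)
    return mean + k * sigma


def throttled_clock_ghz(base_ghz, duty_cycle=0.5):
    """Effective clock when the chip only runs for `duty_cycle` of the time."""
    return base_ghz * duty_cycle


# Invented data: 100 ordinary samples and a single power-virus outlier.
samples = [60, 62, 64, 66] * 25 + [110]

worst = tdp_worst_case(samples)
typical = tdp_mean_plus_sigma(samples)

print(f"worst-case figure:   {worst:.1f} W")
print(f"mean + 3 sigma:      {typical:.1f} W")
print(f"3.0 GHz at 50% duty: {throttled_clock_ghz(3.0):.1f} GHz")
```

With a long tail like this, the mean-plus-sigma figure sits well below the worst case, and the throttle is what keeps the chip from ever actually living out in that tail.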
The enemies of Democracy are
When you move a multithreaded program to a system with more cores, then any given thread is more likely to get a core to run on when it needs it. Assuming, of course, that you have enough threads for that to be an issue.
Shameless plug: I'm the docs lead for this Opteron-based server, which can have up to 8 CPUs, for a total of 16 cores. When the Barcelona-based CPU modules are ready, customers will be able to upgrade their systems to a maximum of 32 cores. (Don't ask me when this will happen; Marketing would have me killed.) Obviously any software running on such a system has already dealt with the multicore optimization issue.
specfp_rate was running faster on pre-Barcelona dual-core Opterons than on Intel's dual-core Woodcrest. The reason is no big secret: specfp is memory bandwidth limited, and specfp_rate is multiple copies of specfp running in parallel. Here is a good AnandTech article on the subject.
We already know that AMD has superior memory performance. If you are doing bandwidth-limited floating point, Barcelona is the clear winner.
If you're making a general statement about floating point performance, you're wrong.
Education is a better safeguard of liberty than a standing army.
Edward Everett (1794 - 1865)
Oracle is an amazingly powerful brand and managers think that "scalability" is something you buy rather than an engineering problem for programmers and system architects to solve. That's really the whole story. Given what servers cost and the actual performance differences between different database software given appropriately written client software, purchasing Oracle licenses is largely inexcusable unless you have existing Oracle dependent software and no time to switch databases and re-address scaling related design questions.
-- The act of censorship is always worse than whatever is being censored. Always.
I don't understand why everyone always talks about AMD's problems.
Because it doesn't matter how many fronts you are leading on, if you run out of money and can't borrow any more, you lose.
AMD has been running out of money; fortunately, they can still borrow. If they don't stop losing money, their credit rating will tank and then they will not be able to borrow any more.
THAT is what righting the ship means.
When only measuring single core performance, clock for clock, Barcelona is on par with Clovertown.
Unfortunately processors are not generally sold "clock for clock." If you're on par clock for clock, but the other guy is clocked more than 50% faster than you... that could be trouble.
What good is an Intel chip that has fast floating point but the bus cannot feed it data fast enough?
Plenty good if the data can fit in cache, in which case the unit can be fed fast enough. For instance, say you're running LinPack. But then, who uses LinPack as a benchmark?
I simply want to use the chip that gives me the greatest floating point throughput I can get.
Define throughput. At some point you need to decide if you are solving equations like LinPack or equations like spec_fp. One causes lots of cache misses and benefits from memory bandwidth, the other does not.
Right now that chip appears to be Barcelona.
Well that's a hypothetical statement based on perception of your needs and their marketing.
I'm not interested in hypothetical arguments
That explains why you're making them (???)
I am looking forward to using Barcelona processors because they will get my mathematical computations done faster.
Hypothetically. Are you going to hypothetically switch when Intel's Penryn with SSE4 comes out? What about Intel's Nehalem?
By the way, check out number 2 and 3 on your top 500 supercomputer list - they're Opterons.
And?? They were designed and built before Core 2 was released. Do you think I'm going to argue they should have used Pentium 4s? Those systems also make solid use of NUMA through a custom Cray crossbar (SeaStar), and Intel doesn't have that. If they made them today I see no reason for them not to use Opterons. Do you have a computer with lots of Opterons and a Cray SeaStar router on order?
The performance of those systems is measured using LinPack. As I mentioned at the beginning, declaring a 2.0 GHz Barcelona as having faster fp throughput than a 3.2 GHz Core 2 depends wholly on which types of calculations you are doing. spec_fp does calculations that are memory bound; LinPack does not (at least not as much). Barcelona's faster fp throughput is not due to a markedly superior fp unit (though it may be marginally better) but to its onboard memory controller. If you need that sort of thing, great, go with Barcelona. If you need raw speed on smaller working sets (under a couple of megabytes), chances are good that the higher clocked Core 2 with its huge cache will win.
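The memory-bound vs. compute-bound distinction can be sketched with a simple roofline-style estimate: attainable throughput is the lesser of the chip's peak FP rate and what memory bandwidth can feed it. Every number below is invented for illustration and is not a measurement of Core 2 or Barcelona; "chip A" just stands for a higher-clocked part with less memory bandwidth, "chip B" for a lower-clocked part with more.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline-style estimate: limited by compute or by memory, whichever is lower.

    flops_per_byte is how much arithmetic the code does per byte it has to
    pull from main memory. Work that mostly stays in cache has a high value;
    streaming code that misses cache constantly has a low one.
    """
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical chips (made-up figures, not real specs).
chip_a = {"peak_gflops": 50.0, "mem_bandwidth_gbs": 8.0}   # faster clock
chip_b = {"peak_gflops": 32.0, "mem_bandwidth_gbs": 16.0}  # more bandwidth

# Hypothetical workloads.
streaming = 0.5    # memory bound: few flops per byte fetched
cache_bound = 20.0  # working set mostly fits in cache

a_stream = attainable_gflops(flops_per_byte=streaming, **chip_a)
b_stream = attainable_gflops(flops_per_byte=streaming, **chip_b)
a_cache = attainable_gflops(flops_per_byte=cache_bound, **chip_a)
b_cache = attainable_gflops(flops_per_byte=cache_bound, **chip_b)

print(f"streaming:   A={a_stream} B={b_stream} GFLOPS")
print(f"cache-bound: A={a_cache} B={b_cache} GFLOPS")
```

Under these assumptions the bandwidth-rich chip wins the streaming case and the higher-peak chip wins once the data stays in cache, which is the whole argument in a nutshell: "fastest floating point" depends on which side of that line your code sits.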