First 16-Core Opteron Chips Arrive From AMD
angry tapir writes "After a brief delay and more than a year of chatter, Advanced Micro Devices has announced the availability of its first 16-core Opteron server chips, which pack the largest number of cores available on x86 chips today. The new Opteron 6200 chips, code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors, the 12-core Opteron 6100 chips, according to AMD."
So... how do these compare to the new Sandy Bridge chips Intel announced on the same day? There must be some overlap of the target market - whether to buy a quad-socket Intel server or dual-socket AMD one, for example.
-- Ed Avis ed@membled.com
The "cores" in Bulldozer are not your typical first-class x86 core. Bulldozer "cores" are worth 2/3 of a modern x86 core. The 6200 is more like a 10 core. Add to that the crappy IPC and I'm not impressed.
I was excited about Bulldozer before it was released. It's not often that CPU makers take chances on radical new architectures. Too bad this one turned out to be a huge pile of fail.
"Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
Pfft, how much harder can it be to design one with 32 :)
Design? Easy.
Manufacture? Tricky.
Make work? Trickier.
To read about? Interesting.
A feeling of having made the same mistake before: Deja Foobar
I so much want some real competition for Intel. Competition that doesn't artificially limit clock speeds and fuse off perfectly good working features in order to market a dozen overlapping and conflicting SKUs at a dozen different price points. And working drivers, current standards (DirectX 11 and OpenCL for starters), and USB-3 that doesn't require a $50 cable between every device would be nice.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
There will be server versions as well...I've seen specs (publicly available) for an 8-core (16-thread) sandy bridge EP with a 95W TDP. I suspect it's clocked a bit lower and maybe binned for efficiency.
Maybe...
It'll be interesting. Most server applications are integer-only and never touch the floating point units. That should mean that Bulldozer designs work close to the full core count in contrast to the poor benchmarking results it puts out in Photoshop filters and video encode.
This isn't the point. You get 16 cores (slowish compared to top of the line, they may be) that will fit in a single socket on a single motherboard, with a single power supply. This is a *huge* cost saving for machines that it makes sense to use them in...servers, where single core performance is relatively stupid to consider.
1: You can buy your new sandy bridge from newegg or such right now, while those new bulldozers are nowhere to be found.
2: Overclocking any chip is bound to require a lot more power than the TDP no matter which one you are using.
3: Dozer's cores, as you said, feel like they are dozing on the job.
They both have the same issues, including that each module (two 4-issue cores) has a single 4-instruction decoder in front of it. Cache latency is also likely to be similar if not the same.
Servers need single-thread too; think stuff like big database writes, joins, ERP, and CRM. Think outside the embarrassingly-parallel web-serving box.
If multithreaded performance was all that matters, the Sun Niagara chips would have done a lot better than they did.
That must be why 3 supercomputers with Dozer Opterons have been ordered in the past 3 weeks.
Read radical news here
No, 8 integer cores per chip, but 4 actual real cores. For a total of 8 cores across 2 chips.
Pic related: amd vs intel decision making.
Umm, joins can be done in parallel in lots and lots of cases. ERP and CRM are applications that ought to see big improvements from more cores, if you have more than a few users anyway. It also simplifies things: you don't have to figure out how to architect the thing to run across 10 hosts anymore. Good multi-core systems deliver the performance these apps need if you can get the disk I/O solved. A good SAN with multipath support and multiple HBAs can get you there.
Niagara failed because each individual core was too slow: a comparably priced Intel CPU could run two jobs serially on one core in less time than Niagara could do one job on one core. The question here is this: for most parallel workloads, like a database where all cores will be used, are AMD's 16-core chips at least 62% the speed of Intel's 10-core chips on a core vs. core basis? If so, then other things being equal, for *some* workloads these Opterons will be better.
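To put a number on that break-even point, here is a rough sketch in Python. It assumes the workload scales perfectly across all cores, which real databases rarely do, and the function names are made up for illustration only.

```python
def break_even_per_core_ratio(many_cores: int, few_cores: int) -> float:
    """Per-core speed the many-core chip needs, relative to the few-core
    chip, to match its total throughput (assumes perfect scaling)."""
    if many_cores <= 0 or few_cores <= 0:
        raise ValueError("core counts must be positive")
    return few_cores / many_cores


def relative_throughput(many_cores: int, few_cores: int, per_core_ratio: float) -> float:
    """Total throughput of the many-core chip divided by that of the
    few-core chip, given the per-core speed ratio (assumes perfect scaling)."""
    return many_cores * per_core_ratio / few_cores


if __name__ == "__main__":
    ratio = break_even_per_core_ratio(16, 10)
    print(f"16 cores vs. 10 cores break even at {ratio:.1%} per-core speed")
```

Anything above that ratio favors the 16-core part on fully parallel work; anything below it favors the 10-core part, before price and power are even considered.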
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
The problem is, while this is true, Bulldozer also suffers from being a fairly crappy arch design compared to Sandy Bridge. The result is that AMD's 8 "core" Bulldozer is only roughly as fast as Intel's 4 core i5 without hyperthreading. Extrapolate this to bolting two 8 "core" Bulldozers together and you get to... well, that would only be about as fast as an 8 core Sandy Bridge with no hyperthreading, or a 6 core with hyperthreading. Given that Intel is selling 6 core E5 Xeons with hyperthreading for less than the $1000 AMD is asking for this, that really doesn't bode well, does it? And this is forgetting that this Bulldozer is heavily underclocked to keep power consumption down. This really doesn't look promising for AMD.
Eh, how about this:
Intel: I know, let's try to see just how many features/cores/cache we can fuse off in our dies and different socket combinations to try to make *puts pinky finger to mouth* one MILLION SKUs! Oh, and while we're at it, let's add a FOURTH memory channel, because more is better! Sure, we could get all the bandwidth we need with two DDR3-1866 or -2133 channels, and you really only get about three channels' worth of bandwidth because we have to clock the IMC down to DDR3-1333 with two modules per channel, but we still have FOUR channels! Oh, and we forgot, it's the start of a new quarter so we need to release a new socket. Can't let those socket suppliers get lazy making last quarter's socket design. What, you guys want us to release Sandy Bridge-based Xeon MPs because MP platforms actually need that much bandwidth and core count? We just released the Westmere-based ones a few months ago! Don'tcha know that Xeon MPs run two years behind everything else? Geez, what did you do, wake up yesterday? Next you'll want us to stop crippling our chips, stop using a new socket every other month or something ridiculous like that. Where do you guys get those ideas?
AMD: Based on market analysis, most server applications use primarily integer code and require a lot of bandwidth, memory capacity, and a high core count. We don't have over a hundred billion dollars in market cap to fund several parallel R&D teams to design a specific CPU for every edge use case, so we will design a CPU that is highly modular, has good integer performance (because that's what our research indicated most server apps are), and has a lot of cores. Experience with Intel's HyperThreading is less than stellar with regards to predictable performance, so we will use our CMT approach that leads to better integer performance than HyperThreading but doesn't increase the die size by a huge amount, since we can't afford to make 400-600 mm^2 dies like Intel does to have a lot of physical cores. Oh, and we'll continue to use the existing server platforms out there so our customers can drop-in upgrade and we'll also not change any feature sets in the SKU stack other than the clock speed and number of enabled modules and their associated caches. We do apologize for being "late" with these parts since we usually release server and client at about the same time...
Just "gittin-r-done," day after day.
I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling. As a result this expensive desktop is only 15% faster in terms of time to compile than the 4 year old PC it replaced (the new PC has twice the RAM of the old one, though, which may account for some of that speed increase). I am seriously unimpressed with all these cores. Maybe they are useful for something, but I've not found anything that I do that shows significant improvement. Putting my development projects on an SSD did much more for my workflow performance than this fancy new computer, that is for certain.
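One way to frame why extra cores do so little for a mostly serial build is Amdahl's law. The sketch below is illustrative only: the parallel fractions are hypothetical values, not measurements from the workstation described above.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core predicted by Amdahl's law when only
    `parallel_fraction` of the work can be spread across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical fractions of a build that can run in parallel.
    for fraction in (0.1, 0.5, 0.9):
        speedup = amdahl_speedup(fraction, 8)
        print(f"parallel fraction {fraction:.0%}: {speedup:.2f}x on 8 cores")
```

If most of the toolchain runs on one thread, the serial part dominates and the core count barely registers, which would be consistent with faster storage helping more than extra cores.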
No, there are 16 integer pipelines with one scheduler and 4 logic units each, 16 128-bit floating point units that can also be combined into 8 256-bit units, and 8 fetch/decode units. This is not a MCU, it's one chip with the above mentioned components. Whether it's 16 cores or 8 or 4 modules is kind of academic unless you are trying to optimize a scheduler for it, in which case the labels still don't matter; only the actual implementation and achievable performance matter.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
The basic point is that it has a total of 8 instruction fetch units, it has a total of 8 instruction decode units that they feed, and it has a total of 8 chunks of L2 cache. The fact that each of these 8 cores has 2 integer units on it is neither here nor there. Hell, for years cores have had several floating point units on them, and it didn't make them more than one core. Not only that, but this CPU behaves badly when the scheduler treats it as 16 cores instead of 8. The bottom line is that this chip behaves in every single way like an 8 core CPU. Moreover, it's slower than Intel's 8 core CPUs at a similar clock even with hyperthreading disabled.
What are you basing this on? As someone that runs both database and web servers using both AMD and Intel I find your conclusions to be completely counter to my experience and to the experience of almost everyone I know that does virtualized infrastructure.
I ran into a number of problems when I first tried to deploy them because SQL 2005 wouldn't install on them. SQL 2008 runs just great with 24 cores, as they were dual-processor 12-core servers. I have no reason to think the 16-core variants would be much different.
and many, many, moooreeee
-mainconcept http://www.lostcircuits.com/mambo//i...&limitstart=17
-mediashow http://www.guru3d.com/article/amd-fx...ssor-review/14
-h.264 http://www.guru3d.com/article/amd-fx...ssor-review/14
-vp8 http://www.guru3d.com/article/amd-fx...ssor-review/17
-sha1 http://www.guru3d.com/article/amd-fx...ssor-review/17
-photoshop cs5 http://www.lostcircuits.com/mambo//i...&limitstart=14
-photoshop cs5 http://www.tomshardware.com/reviews/...x,3043-15.html
-winrar, faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
-winrar, improves over x6 http://www.tomshardware.com/reviews/...x,3043-16.html
-7-zip better than 2600k here: http://images.anandtech.com/graphs/graph4955/41698.png http://www.anandtech.com/show/4955/t...x8150-tested/7
-7-zip same perf as 2600k http://www.tomshardware.com/reviews/...x,3043-16.html
-POV-ray, faster than 2600k http://www.legitreviews.com/article/1741/10/
-POV-ray http://www.nordichardware.se/test-la...art=15#content
-x264(2nd pass AVX enabled) http://www.anandtech.com/show/4955/t...x8150-tested/7
-x264 (2nd pass, better overall than 2600k) http://www.bjorn3d.com/read.php?cID=2125&pageID=11108
-x264 (2nd pass +.3 than SB2600k) http://www.legitreviews.com/article/1741/7/
-handbrake; http://www.legitreviews.com/article/1741/9/
-truecrypt; http://www.bjorn3d.com/read.php?cID=2125&pageID=11111
-solidworks; faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
-ABBYY FineReader http://www.tomshardware.com/reviews/...x,3043-16.html
-C-Ray, as fast as $1k i7-990X, http://i664.photobucket.com/albums/v.../c-rayir38.png
It's common, live with it. Every Cell processor in a PS3 comes with eight cell processing units, with one disabled. That way they can set the standard for seven and use most of the chips that come off the line.
Even AMD had a problem with too-good yield about ten years ago, so they restricted the clock and sold "crippled" low-end chips that were technically rated to run at much higher speeds.
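The spare-unit trick described for the Cell above is easy to put rough numbers on. This is only a sketch: it assumes each unit on a die is good or bad independently with the same probability, and the probability used here is a hypothetical value, not real yield data from any fab.

```python
from math import comb


def usable_fraction(units: int, required: int, p_good: float) -> float:
    """Fraction of dies with at least `required` working units out of
    `units`, assuming each unit works independently with probability p_good."""
    if not 0 <= required <= units:
        raise ValueError("required must be between 0 and units")
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must be between 0 and 1")
    return sum(
        comb(units, k) * p_good**k * (1.0 - p_good) ** (units - k)
        for k in range(required, units + 1)
    )


if __name__ == "__main__":
    p = 0.9  # hypothetical per-unit probability of being defect-free
    print(f"dies with all 8 units good:     {usable_fraction(8, 8, p):.1%}")
    print(f"dies with at least 7 units good: {usable_fraction(8, 7, p):.1%}")
```

Requiring only seven of eight working units always sells at least as many dies as requiring all eight, which is the whole point of setting the standard one unit below what is on the silicon.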
AMD already had the on-die memory controller. Their answer to Intel's Hyperthreading was real cores. The QPI bus that Intel uses is very similar to the one AMD pioneered with HyperTransport. Let's not forget that AMD64 (oh, did you want me to call it EM64T or x86_64?) was a product of AMD's engineering effort, while Intel was pushing people toward the EPIC architecture, which seems to have stayed a niche.
It matters to virtualization. Higher density equates to more systems on a single server, which equates to less power for the same number of servers.
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus