First 16-Core Opteron Chips Arrive From AMD
angry tapir writes "After a brief delay and more than a year of chatter, Advanced Micro Devices has announced the availability of its first 16-core Opteron server chips, which pack the largest number of cores available on x86 chips today. The new Opteron 6200 chips, code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors, the 12-core Opteron 6100 chips, according to AMD."
The "cores" in Bulldozer are not your typical first-class x86 core. Bulldozer "cores" are worth 2/3 of a modern x86 core. The 6200 is more like a 10 core. Add to that the crappy IPC and I'm not impressed.
I was excited about Bulldozer before it was released. It's not often that CPU makers take chances on radical new architectures. Too bad this one turned out to be a huge pile of fail.
"Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
This would compete with the Xeon-E chips that aren't out yet. But in terms of performance about 75%, so this is the equivalent of a 12-core intel chip.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Pfft, how much harder can it be to design one with 32 :)
Design? Easy.
Manufacture? Tricky.
Make work? Trickier.
To read about? Interesting.
A feeling of having made the same mistake before: Deja Foobar
I so much want some real competition for Intel. Competition that doesn't artificially limit clock speeds and fuse off perfectly good working features in order to market a dozen overlapping and conflicting SKUs at a dozen different price points. And working drivers, current standards (DirectX 11 and OpenCL for starters), and USB-3 that doesn't require a $50 cable between every device would be nice.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
There will be server versions as well...I've seen specs (publicly available) for an 8-core (16-thread) sandy bridge EP with a 95W TDP. I suspect it's clocked a bit lower and maybe binned for efficiency.
Maybe...
It'll be interesting. Most server applications are integer-only and never touch the floating point units. That should mean that Bulldozer designs work close to the full core count in contrast to the poor benchmarking results it puts out in Photoshop filters and video encode.
Intel's server chips are 8- and 10-core, and outperform Opterons by a considerable margin.
Servers need single-thread too; think stuff like big database writes, joins, ERP, and CRM. Think outside the embarrassingly-parallel web-serving box.
If multithreaded performance was all that matters, the Sun Niagara chips would have done a lot better than they did.
Put simply, the AMD ones are slower than the intel ones by about 2 fold per core. This isn't because AMD sucked at design, so much as their marketing department sucked at telling the truth. In reality, we're looking at 8 core AMD CPUs with 2 integer units per core - i.e. no more 16 core than intel's are 16 core chips because of hyperthreading.
Once that's ironed out, the AMD chips turn out to have rather good performance if you want lots of integer work done, and the Intel chips to have rather good performance if you want anything else done.
What's the Xeon E5-2650L, 2650, 2660, 2665, 2670, 2680, 2690 and 2687W then?
Hint: they're all 8 core SNB-E chips. Second hint - AMD's 16 "core" CPUs don't have 16 cores – they have 16 integer units. They only have 8 instruction fetch units, 8 decode units, 8 L2 caches, etc. That is, they're 8 core CPUs with strong integer support. SNB-E's particular strength is floating point, but it tends to beat the opterons at pretty much anything that isn't heavily integer biased.
No, 8 integer cores per chip, but 4 actual real cores. For a total of 8 cores across 2 chips.
Pic related: amd vs intel decision making.
Umm, Joins can be done in parallel, in lots and lots of cases. ERP and CRP are applications that ought to see big improvements form more cores, if you have more than a few users anyway. It also simplifies things, you don't have to figure out how to architect the thing to run across 10 hosts anymore, good multi-core systems deliver there performance these apps need if you can get the disk IO solved. A good SAN with mutlipath support and multiple HBAs can get there.
Niagara failed because each individual core was too slow, a comparable cost Intel CPU could do in serial with one core two jobs, in less time than Niagara could do one job with on core. The question is here for most paralleled work loads like a database where all cores will be used are AMDs 16 core chips at least 62% the speed of Intel's 10 core chips on core vs. core basis? If true other things being equal for *some* work loads these Opterons will be better.
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Eh, how about this:
Intel: I know, let's try to see just how many features/cores/cache we can fuse off in our dies and different socket combinations to try to make *puts pinky finger to mouth* one MILLION SKUs! Oh, and while we're at it, let's add a FOURTH memory channel, because more is better! Sure, we could get all the bandwidth we need with two DDR3-1866 or -2133 channels and that you really only get about three channels' worth of bandwidth because we have to clock the IMC down to DDR3-1333 with two modules per channel- but we still have FOUR channels! Oh, and we forgot, it's the start of a new quarter so we need to release a new socket. Can't let those socket suppliers get lazy making last quarter's socket design. What, you guys want us to release Sandy Bridge-based Xeon MPs because MP platforms actually need that much bandwidth and core count? We just released the Westmere-based ones a few months ago! Don'tcha know that Xeon MPs run two years behind everything else? Geez, what did you do, wake up yesterday? Next you'll want us to stop crippling our chips, stop using a new socket every other month or something ridiculous like that. Where do you guys get those ideas?
AMD: Based on market analysis, most server applications use primarily integer code and require a lot of bandwidth, memory capacity, and a high core count. We don't have over a hundred billion dollars in market cap to fund several parallel R&D teams to design a specific CPU for every edge use case, so we will design a CPU that is highly modular, has good integer performance (because that's what our research indicated most server apps are), and has a lot of cores. Experience with Intel's HyperThreading is less than stellar with regards to predictable performance, so we will use our CMT approach that leads to better integer performance than HyperThreading but doesn't increase the die size by a huge amount, since we can't afford to make 400-600 mm^2 dies like Intel does to have a lot of physical cores. Oh, and we'll continue to use the existing server platforms out there so our customers can drop-in upgrade and we'll also not change any feature sets in the SKU stack other than the clock speed and number of enabled modules and their associated caches. We do apologize for being "late" with these parts since we usually release server and client at about the same time...
Just "gittin-r-done," day after day.
I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling. As a result this expensive desktop is only 15% faster in terms of time to compile than the 4 year old PC it replaced (the new PC has twice the ram as the old though which may account for some of that speed increase). I am seriously unimpressed with all these cores. Maybe they are useful for something, but I've not found anything that I do that shows significant improvement. Putting my development projects on a SSD did much more for my work flow performance than this fancy new computer, that is for certain.
No, there are 16 integer pipelines with one scheduler and 4 logic units each, 16 128bit floating point units that can also be combined into 8 256bit units, and 8 fetch/decode units. This is not a MCU, it's one chip with the above mentioned components. Whether it's 16 cores or 8 or 4 modules is kind of academic unless you are trying to optimize a scheduler for it in which case the label's still don't matter, only the actual implementation and achievable performance matter.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
The basic point is that it has a total of 8 instruction fetch units, it has a total of 8 instruction decode units that they feed, and it has a total of 8 chunks of L2 cache. The fact that each of these 8 cores has 2 integer units on it is neither here nor there –hell, for years cores have had several floating point units on them, it didn't make them more than one core. Not only that, but this CPU behaves badly when the scheduler treats it as 16 cores instead of 8. The bottom line is that this chip in every single way behaves like an 8 core CPU, more so, it's slower than intel's 8 core CPUs at a similar clock even with hyper threading disabled.
is that why there have been 3 supercomputer orders in the last 3 weeks with amd's bulldozer opterons ?
Read radical news here
and many, many, moooreeee
-mainconcept http://www.lostcircuits.com/mambo//i...&limitstart=17
-mediashow http://www.guru3d.com/article/amd-fx...ssor-review/14
-h.264 http://www.guru3d.com/article/amd-fx...ssor-review/14
-vp8 http://www.guru3d.com/article/amd-fx...ssor-review/17
-sha1 http://www.guru3d.com/article/amd-fx...ssor-review/17
-photoshop cs5 http://www.lostcircuits.com/mambo//i...&limitstart=14
-photoshop cs5 http://www.tomshardware.com/reviews/...x,3043-15.html
-winrar, faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
-winrar, improves over x6 http://www.tomshardware.com/reviews/...x,3043-16.html
-7-zip better than 2600k here: http://images.anandtech.com/graphs/graph4955/41698.png http://www.anandtech.com/show/4955/t...x8150-tested/7
-7-zip same perf as 2600k http://www.tomshardware.com/reviews/...x,3043-16.html
-POV-ray, faster than 2600k http://www.legitreviews.com/article/1741/10/
-POV-ray http://www.nordichardware.se/test-la...art=15#content
-x264(2nd pass AVX enabled) http://www.anandtech.com/show/4955/t...x8150-tested/7
-x264 (2nd pass, better overall than 2600k) http://www.bjorn3d.com/read.php?cID=2125&pageID=11108
-x264 (2nd pass +.3 than SB2600k) http://www.legitreviews.com/article/1741/7/
-handbrake; http://www.legitreviews.com/article/1741/9/
-truecrypt; http://www.bjorn3d.com/read.php?cID=2125&pageID=11111
-solidworks; faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
-abbyy filereader http://www.tomshardware.com/reviews/...x,3043-16.html
-C-Ray, as fast as $1k i7-990X, http://i664.photobucket.com/albums/v.../c-rayir38.png
Read radical news here
It matters to virtualization. Higher density equates to more systems on a single server, which equates to less power for the same number of servers.
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus