Slashdot Mirror


First 16-Core Opteron Chips Arrive From AMD

angry tapir writes "After a brief delay and more than a year of chatter, Advanced Micro Devices has announced the availability of its first 16-core Opteron server chips, which pack the largest number of cores available on x86 chips today. The new Opteron 6200 chips, code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors, the 12-core Opteron 6100 chips, according to AMD."

24 of 189 comments

  1. Bulldozer Cores are not that Great by TheTyrannyOfForcedRe · · Score: 4, Interesting

    The "cores" in Bulldozer are not your typical first-class x86 core. Bulldozer "cores" are worth 2/3 of a modern x86 core. The 6200 is more like a 10 core. Add to that the crappy IPC and I'm not impressed.

    I was excited about Bulldozer before it was released. It's not often that CPU makers take chances on radical new architectures. Too bad this one turned out to be a huge pile of fail.

    --
    "Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
    1. Re:Bulldozer Cores are not that Great by Theovon · · Score: 5, Informative

      Your description is inaccurate, but that's not surprising since most slashdot readers don't know much about CPU architecture.

      Bulldozers are essentially full-fledged cores, where the two cores in each module are mostly independent. There are two completely independent integer pipelines, so people seem to want to harp on the fact that the FPU is "shared". It's really a single split FPU, where each half can execute independent instructions, as long as the data width is 128 bits or less. Only when it is executing 256-bit AVX instructions is there any competition for resources. This is a very sensible design decision, since you don't find enough AVX software right now to justify completely dedicated AVX logic. (Plus, IIRC sandy bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?) Moreover, even with AVX-heavy workloads, most software won't issue AVX instructions every cycle, and two AVX-heavy tasks on the same module won't really run into much contention. Assuming my memory of Sandy Bridge's FPU is correct, then Bulldozer has the advantage of having lower latency within the FPU on isolated AVX instructions.

      The PROBLEM with Bulldozer is that they just have not done some of the really aggressive and costly things that Intel has done in their design. Bulldozer is still a 3-issue design. While going to 4-issue doesn't help that much that often, it still gives Sandy Bridge a slight edge. But where SB REALLY gets its advantage is the huge instruction window. Intel found clever ways to shrink the logic for various components so that they could make room for a much larger physical register file and reorder buffer. As a result, SB can have many more decoded instructions in flight, which exposes more instruction-level parallelism and, critically, absorbs more memory access latency.

      A Sun engineer (discussing Rock, among other things) once described modern CPU execution as a race between last-level cache misses. When you have a miss on your L3 cache, it can cost hundreds of cycles, upwards of 1000. During that miss, the CPU fills up its reservation station with other instructions and then stalls, waiting on something to retire. This won't happen for a long time. Because of the disparity in speed (and latency) between compute and memory access, this is typically the most significant bottleneck. By enlarging the instruction window, SB can achieve much higher throughput, and it shows in the benchmarks.

      This is Bulldozer's Achilles' heel. I know there are a few benchmarks where Bulldozer is faster than SB, but they're not typical workloads with typical memory footprints. Anyhow, so if you're going to rag on Bulldozer, rag on it for the right reasons. Bulldozer's "shared" FPU is a red herring.

    2. Re:Bulldozer Cores are not that Great by Artraze · · Score: 5, Informative

      The OP is right, and seems to understand the issues far better than you. It isn't that the FPU is shared, it's that nearly _everything_ is shared: instruction cache, fetch and decode, FPU, L2 data cache. The only things that aren't shared are L1 data and integer operations (scheduler and ALU).

      Instruction issuing and cache misses are big performance areas, but these are precisely the resources the cores share! You're running two threads off the same caches (with the exception of L1 data) and the same instruction fetch. So, in reality, the second core in Bulldozer is much more like ultra-hyperthreading than it is a second core. I think the fact that they're even listed as cores is a marketing strategy that has backfired pretty hard.

      P.S. L3 cache has proven to be quite useless in many workloads... It helps a bit in servers, IIRC, but that's about it. So it's more a race to L2 cache, which, again, is a shared resource. AMD, in fact, has indicated that it may drop the L3 from desktop parts.

  2. Re:Compared to Intel? by Surt · · Score: 4, Interesting

    This would compete with the Xeon-E chips that aren't out yet. But in terms of performance it's at about 75%, so this is the equivalent of a 12-core Intel chip.

    --
    "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  3. Re:Only 16? by ackthpt · · Score: 4, Interesting

    Pfft, how much harder can it be to design one with 32 :)

    Design? Easy.

    Manufacture? Tricky.

    Make work? Trickier.

    To read about? Interesting.

    --

    A feeling of having made the same mistake before: Deja Foobar
  4. Wish List by Nom+du+Keyboard · · Score: 3, Informative

    I so much want some real competition for Intel. Competition that doesn't artificially limit clock speeds and fuse off perfectly good working features in order to market a dozen overlapping and conflicting SKUs at a dozen different price points. And working drivers, current standards (DirectX 11 and OpenCL for starters), and USB-3 that doesn't require a $50 cable between every device would be nice.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  5. sandy bridge ep 95W by Chirs · · Score: 3, Informative

    There will be server versions as well...I've seen specs (publicly available) for an 8-core (16-thread) sandy bridge EP with a 95W TDP. I suspect it's clocked a bit lower and maybe binned for efficiency.

  6. Re:really 16 core? by Zan+Lynx · · Score: 4, Interesting

    Maybe...

    It'll be interesting. Most server applications are integer-only and never touch the floating point units. That should mean that Bulldozer designs work close to the full core count in contrast to the poor benchmarking results it puts out in Photoshop filters and video encode.

  7. Re:Compared to Intel? by the+linux+geek · · Score: 5, Informative

    Intel's server chips are 8- and 10-core, and outperform Opterons by a considerable margin.

  8. Re:Poor performance by the+linux+geek · · Score: 3, Insightful

    Servers need single-thread too; think stuff like big database writes, joins, ERP, and CRM. Think outside the embarrassingly-parallel web-serving box.

    If multithreaded performance was all that matters, the Sun Niagara chips would have done a lot better than they did.

  9. Re:Compared to Intel? by beelsebob · · Score: 4, Interesting

    Put simply, the AMD ones are slower than the intel ones by about 2 fold per core. This isn't because AMD sucked at design, so much as their marketing department sucked at telling the truth. In reality, we're looking at 8 core AMD CPUs with 2 integer units per core - i.e. no more 16 core than intel's are 16 core chips because of hyperthreading.

    Once that's ironed out, the AMD chips turn out to have rather good performance if you want lots of integer work done, and the Intel chips to have rather good performance if you want anything else done.

  10. Re:Compared to Intel? by beelsebob · · Score: 4, Interesting

    What's the Xeon E5-2650L, 2650, 2660, 2665, 2670, 2680, 2690 and 2687W then?

    Hint: they're all 8 core SNB-E chips. Second hint - AMD's 16 "core" CPUs don't have 16 cores – they have 16 integer units. They only have 8 instruction fetch units, 8 decode units, 8 L2 caches, etc. That is, they're 8 core CPUs with strong integer support. SNB-E's particular strength is floating point, but it tends to beat the opterons at pretty much anything that isn't heavily integer biased.

  11. Re:Only 16? by beelsebob · · Score: 3, Informative

    No, 8 integer cores per chip, but 4 actual real cores. For a total of 8 cores across 2 chips.

  12. Re:Only 16? by kvvbassboy · · Score: 4, Funny
  13. Re:Poor performance by DarkOx · · Score: 3, Insightful

    Umm, joins can be done in parallel in lots and lots of cases. ERP and CRM are applications that ought to see big improvements from more cores, if you have more than a few users anyway. It also simplifies things: you don't have to figure out how to architect the thing to run across 10 hosts anymore. Good multi-core systems deliver the performance these apps need if you can get the disk IO solved. A good SAN with multipath support and multiple HBAs can get there.

    Niagara failed because each individual core was too slow: a comparably priced Intel CPU could do two jobs in serial on one core in less time than Niagara could do one job on one core. The question here is, for most parallel workloads like a database where all cores will be used, are AMD's 16-core chips at least 62% the speed of Intel's 10-core chips on a core vs. core basis? If so, then other things being equal, for *some* workloads these Opterons will be better.
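
    The 62% figure is just the core-count ratio, 10/16 = 62.5%. Here is a minimal sketch of that break-even arithmetic, assuming the workload scales perfectly across all cores and nothing else (memory, disk IO, clocks) differs. The function names are made up for illustration.

```python
def break_even_per_core_ratio(many_cores: int, few_cores: int) -> float:
    """Per-core speed the many-core chip needs, relative to the few-core chip,
    to match its aggregate throughput (assumes perfect scaling across cores)."""
    if many_cores <= 0 or few_cores <= 0:
        raise ValueError("core counts must be positive")
    return few_cores / many_cores


def aggregate_throughput(cores: int, per_core_speed: float) -> float:
    """Total work per unit time under the same perfect-scaling assumption."""
    return cores * per_core_speed


ratio = break_even_per_core_ratio(16, 10)
print(f"break-even per-core speed: {ratio:.1%}")  # 62.5%
```

    Anything above that ratio and the 16-core part comes out ahead on a fully parallel job; anything below and the 10-core part wins.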

     

    --
    Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
  14. Re:Intel vs AMD's philosophy as of late by level_headed_midwest · · Score: 4, Interesting

    Eh, how about this:

    Intel: I know, let's try to see just how many features/cores/cache we can fuse off in our dies and different socket combinations to try to make *puts pinky finger to mouth* one MILLION SKUs! Oh, and while we're at it, let's add a FOURTH memory channel, because more is better! Sure, we could get all the bandwidth we need with two DDR3-1866 or -2133 channels and that you really only get about three channels' worth of bandwidth because we have to clock the IMC down to DDR3-1333 with two modules per channel- but we still have FOUR channels! Oh, and we forgot, it's the start of a new quarter so we need to release a new socket. Can't let those socket suppliers get lazy making last quarter's socket design. What, you guys want us to release Sandy Bridge-based Xeon MPs because MP platforms actually need that much bandwidth and core count? We just released the Westmere-based ones a few months ago! Don'tcha know that Xeon MPs run two years behind everything else? Geez, what did you do, wake up yesterday? Next you'll want us to stop crippling our chips, stop using a new socket every other month or something ridiculous like that. Where do you guys get those ideas?

    AMD: Based on market analysis, most server applications use primarily integer code and require a lot of bandwidth, memory capacity, and a high core count. We don't have over a hundred billion dollars in market cap to fund several parallel R&D teams to design a specific CPU for every edge use case, so we will design a CPU that is highly modular, has good integer performance (because that's what our research indicated most server apps are), and has a lot of cores. Experience with Intel's HyperThreading is less than stellar with regards to predictable performance, so we will use our CMT approach that leads to better integer performance than HyperThreading but doesn't increase the die size by a huge amount, since we can't afford to make 400-600 mm^2 dies like Intel does to have a lot of physical cores. Oh, and we'll continue to use the existing server platforms out there so our customers can drop-in upgrade and we'll also not change any feature sets in the SKU stack other than the clock speed and number of enabled modules and their associated caches. We do apologize for being "late" with these parts since we usually release server and client at about the same time...

    --
    Just "gittin-r-done," day after day.
  15. When are multiple cores going to help me? by craftycoder · · Score: 4, Interesting

    I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling. As a result this expensive desktop is only 15% faster in terms of time to compile than the 4 year old PC it replaced (the new PC has twice the ram as the old though which may account for some of that speed increase). I am seriously unimpressed with all these cores. Maybe they are useful for something, but I've not found anything that I do that shows significant improvement. Putting my development projects on a SSD did much more for my work flow performance than this fancy new computer, that is for certain.

    1. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 4, Informative

      You're doing it wrong.

      make -j8

    2. Re:When are multiple cores going to help me? by friedmud · · Score: 3, Informative

      What do you mean by "only one of my compilers actually takes advantage of the multiple cores when it is compiling"?

      Are you on Windows? Because any compiling done in linux with a "make" based (or similar) build system can use as many cores as you can throw in a machine (regardless of the actual compiler it's running). It should be the same in Windows...

      Don't look to your compiler to be multithreaded... look at the build system (i.e. in Visual Studio there should be an option somewhere to tell it how many processors to use while compiling). For make you just do "make -j8" to use 8 "jobs" total for compiling (i.e. 8 instances of the compiler will be running).
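
      If you'd rather not hard-code the 8, a rough sketch of picking the job count from the machine is below. It assumes a make-based build; `make_command` is a hypothetical helper, not part of any tool. Note that `os.cpu_count()` reports logical CPUs, so hyperthreads are counted too.

```python
import os


def make_command(jobs=None, target=None):
    """Build a make invocation that runs `jobs` compile jobs in parallel.

    If `jobs` is None, fall back to the number of logical CPUs reported by
    the OS (or 1 if that cannot be determined).
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    cmd = ["make", f"-j{jobs}"]
    if target:
        cmd.append(target)
    return cmd


print(" ".join(make_command(8)))  # make -j8
```

      The returned list can be handed to `subprocess.run` from inside the project directory.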

      Here is a test for one of my software projects doing "make -j#" where # is 1,4,8,12,16,24:

      1 : 15m9.614s
      4 : 3m57.947s
      8 : 2m6.354s
      12 : 1m33.426s
      16 : 1m25.559s
      24 : 1m17.345s

      That is on my dual 6-core hyperthreaded Mac workstation (so it had 12 "real" cores and 12 "hyperthreads"). You can see that hyperthreads definitely aren't as good as real cores... but do provide some speedup. That said, I thank God every time I compile (which is all day long) for the cores he has bestowed upon me...

      Good to hear that you are already on SSD... because parallel compiling does need speedy disk to keep the processors humming. The timings above are for two 256GB SSD's in RAID0.
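
      For anyone who wants the speedups rather than raw times, here is a small sketch that converts the timings above into speedup over `-j1` and efficiency per job. The numbers are copied from the list above; the helper names are just for illustration, and efficiency is measured per job, not per physical core.

```python
import re

# Wall-clock times from the "make -j#" runs listed above.
TIMINGS = {
    1: "15m9.614s",
    4: "3m57.947s",
    8: "2m6.354s",
    12: "1m33.426s",
    16: "1m25.559s",
    24: "1m17.345s",
}


def to_seconds(stamp: str) -> float:
    """Convert a time(1)-style stamp such as '3m57.947s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedups(timings):
    """Speedup of each run relative to the single-job run."""
    base = to_seconds(timings[1])
    return {jobs: base / to_seconds(stamp) for jobs, stamp in timings.items()}


for jobs, speedup in speedups(TIMINGS).items():
    print(f"-j{jobs:<2}  speedup {speedup:5.2f}x  efficiency {speedup / jobs:6.1%}")
```

      That works out to roughly 7.2x at -j8, about 9.7x at -j12, and only about 11.8x at -j24, which is the hyperthread falloff in numbers.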

  16. Re:Only 16? by afidel · · Score: 5, Insightful

    No, there are 16 integer pipelines with one scheduler and 4 logic units each, 16 128bit floating point units that can also be combined into 8 256bit units, and 8 fetch/decode units. This is not a MCU, it's one chip with the above mentioned components. Whether it's 16 cores or 8 or 4 modules is kind of academic unless you are trying to optimize a scheduler for it, in which case the labels still don't matter; only the actual implementation and achievable performance matter.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  17. Re:Only 16? by beelsebob · · Score: 4, Interesting

    The basic point is that it has a total of 8 instruction fetch units, it has a total of 8 instruction decode units that they feed, and it has a total of 8 chunks of L2 cache. The fact that each of these 8 cores has 2 integer units on it is neither here nor there; hell, for years cores have had several floating point units on them, and it didn't make them more than one core. Not only that, but this CPU behaves badly when the scheduler treats it as 16 cores instead of 8. The bottom line is that this chip in every single way behaves like an 8 core CPU; moreover, it's slower than intel's 8 core CPUs at a similar clock even with hyper threading disabled.

  18. Re:Compared to Intel? by unity100 · · Score: 4, Interesting

    is that why there have been 3 supercomputer orders in the last 3 weeks with amd's bulldozer opterons ?

  19. Re:how do they compare ? by unity100 · · Score: 5, Informative

    and many, many, moooreeee

    -mainconcept http://www.lostcircuits.com/mambo//i...&limitstart=17
    -mediashow http://www.guru3d.com/article/amd-fx...ssor-review/14
    -h.264 http://www.guru3d.com/article/amd-fx...ssor-review/14
    -vp8 http://www.guru3d.com/article/amd-fx...ssor-review/17
    -sha1 http://www.guru3d.com/article/amd-fx...ssor-review/17
    -photoshop cs5 http://www.lostcircuits.com/mambo//i...&limitstart=14
    -photoshop cs5 http://www.tomshardware.com/reviews/...x,3043-15.html
    -winrar, faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
    -winrar, improves over x6 http://www.tomshardware.com/reviews/...x,3043-16.html
    -7-zip better than 2600k here: http://images.anandtech.com/graphs/graph4955/41698.png http://www.anandtech.com/show/4955/t...x8150-tested/7
    -7-zip same perf as 2600k http://www.tomshardware.com/reviews/...x,3043-16.html
    -POV-ray, faster than 2600k http://www.legitreviews.com/article/1741/10/
    -POV-ray http://www.nordichardware.se/test-la...art=15#content
    -x264(2nd pass AVX enabled) http://www.anandtech.com/show/4955/t...x8150-tested/7
    -x264 (2nd pass, better overall than 2600k) http://www.bjorn3d.com/read.php?cID=2125&pageID=11108
    -x264 (2nd pass +.3 than SB2600k) http://www.legitreviews.com/article/1741/7/
    -handbrake; http://www.legitreviews.com/article/1741/9/
    -truecrypt; http://www.bjorn3d.com/read.php?cID=2125&pageID=11111
    -solidworks; faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
    -abbyy filereader http://www.tomshardware.com/reviews/...x,3043-16.html
    -C-Ray, as fast as $1k i7-990X, http://i664.photobucket.com/albums/v.../c-rayir38.png

  20. Re:Only 16? by certain+death · · Score: 3, Informative

    It matters to virtualization. Higher density equates to more systems on a single server, which equates to less power for the same number of servers.

    --
    "My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus