Slashdot Mirror


First 16-Core Opteron Chips Arrive From AMD

angry tapir writes "After a brief delay and more than a year of chatter, Advanced Micro Devices has announced the availability of its first 16-core Opteron server chips, which pack the largest number of cores available on x86 chips today. The new Opteron 6200 chips, code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors, the 12-core Opteron 6100 chips, according to AMD."

189 comments

  1. Only 16? by masternerdguy · · Score: 0

    Pfft, how much harder can it be to design one with 32 :)

    --
    To offset political mods, replace Flamebait with Insightful.
    1. Re:Only 16? by unity100 · · Score: 1

      20 next year, 24 the next, and so on.

    2. Re:Only 16? by ackthpt · · Score: 4, Interesting

      Pfft, how much harder can it be to design one with 32 :)

      Design? Easy.

      Manufacture? Tricky.

      Make work? Trickier.

      To read about? Interesting.

      --

      A feeling of having made the same mistake before: Deja Foobar
    3. Re:Only 16? by sjames · · Score: 1

      So what are you waiting for? Hop to it and corner the market!

      Go ahead, I'll just wait over here and read the paper.

    4. Re:Only 16? by unixisc · · Score: 1

      lesser incremental value? Even more difficult!

    5. Re:Only 16? by beelsebob · · Score: 1, Informative

      Pffft, it's only 8 cores anyway, 8 cores each with 2 integer units. It's no more 16 core than intel's 8 cores with hyperthreading.

    6. Re:Only 16? by jessehager · · Score: 1

      It's 8 cores per chip *and 2 chips per package* for a total of 16 cores.

    7. Re:Only 16? by beelsebob · · Score: 3, Informative

      No, 8 integer cores per chip, but 4 actual real cores. For a total of 8 cores across 2 chips.

    8. Re:Only 16? by kvvbassboy · · Score: 4, Funny
    9. Re:Only 16? by afidel · · Score: 5, Insightful

      No, there are 16 integer pipelines with one scheduler and 4 logic units each, 16 128-bit floating point units that can also be combined into 8 256-bit units, and 8 fetch/decode units. This is not a MCU, it's one chip with the above mentioned components. Whether it's 16 cores or 8 or 4 modules is kind of academic unless you are trying to optimize a scheduler for it, in which case the labels still don't matter; only the actual implementation and achievable performance matter.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
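The arithmetic behind the FP-unit description above can be sketched as follows. This is only an illustration built from the unit counts quoted in the comment (16 128-bit units, or 8 256-bit units when paired). The function name `simd_lanes` and the choice of 32-bit single-precision elements are assumptions made for the example, not AMD terminology.

```python
def simd_lanes(units: int, width_bits: int, element_bits: int = 32) -> int:
    """Total number of elements the FP units can hold side by side."""
    return units * (width_bits // element_bits)

# Figures quoted in the comment: 16 x 128-bit units, or 8 x 256-bit when combined.
as_128 = simd_lanes(units=16, width_bits=128)
as_256 = simd_lanes(units=8, width_bits=256)

# Both configurations expose the same total lane count.
print(as_128, as_256)
```

Either way the aggregate vector width is the same. Pairing the units changes how wide a single instruction can be, not the total.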
    10. Re:Only 16? by beelsebob · · Score: 4, Interesting

      The basic point is that it has a total of 8 instruction fetch units, it has a total of 8 instruction decode units that they feed, and it has a total of 8 chunks of L2 cache. The fact that each of these 8 cores has 2 integer units on it is neither here nor there. Hell, for years cores have had several floating point units on them, and it didn't make them more than one core. Not only that, but this CPU behaves badly when the scheduler treats it as 16 cores instead of 8. The bottom line is that this chip in every single way behaves like an 8 core CPU; more so, it's slower than intel's 8 core CPUs at a similar clock even with hyperthreading disabled.

    11. Re:Only 16? by Killjoy_NL · · Score: 1

      pffff why the troll mod, it's funny and on topic :)
      probably not very accurate, but still quite enjoyable :)

      --
      This is the sig that says NI (again)
    12. Re:Only 16? by Vancorps · · Score: 2

      What are you basing this on? As someone that runs both database and web servers using both AMD and Intel I find your conclusions to be completely counter to my experience and to the experience of almost everyone I know that does virtualized infrastructure.

      I ran into a number of problems when I first tried to deploy them because SQL 2005 wouldn't install on it. SQL 2008 runs just great with 24 cores as they were dual processor 12 core servers. I have no reason to think the 16 cores variants would be much different.

    13. Re:Only 16? by Chrisq · · Score: 1

      Pfft, how much harder can it be to design one with 32 :)

      To run at the same speed - very difficult. Think about twice the heat unless you make major changes

    14. Re:Only 16? by Anonymous Coward · · Score: 0

      Well, I got modded insightful after that, which was the last thing I expected or even intended. ;) And you are right, it was just a joke.

    15. Re:Only 16? by Mashiki · · Score: 1

      Doesn't really matter until developers get off their asses and start including multi-threading code. You'd think that after multicore and multiprocessor usage started jumping through the roof, that you'd see it.

      --
      Om, nomnomnom...
    16. Re:Only 16? by certain+death · · Score: 3, Informative

      It matters to virtualization. Higher density equates to more systems on a single server, which equates to less power for the same number of servers.

      --
      "My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus
    17. Re:Only 16? by RalphTheWonderLlama · · Score: 1

      The term "core" has about lost its meaning so this is all useless arguing. Funny that you tried to use how the chip "behaves" as definition for how many cores it has. Is that a better way? Come on.

      --
      simple, fast homepage with your links: http://www.ngumbi.com/
    18. Re:Only 16? by Pence128 · · Score: 1

      A M D is real-ly real-ly great. MOAR CORES!

      --
      404: sig not found.
  2. Compared to Intel? by Ed+Avis · · Score: 2

    So... how do these compare to the new Sandy Bridge chips Intel announced on the same day? There must be some overlap of the target market - whether to buy a quad-socket Intel server or dual-socket AMD one, for example.

    --
    -- Ed Avis ed@membled.com
    1. Re:Compared to Intel? by Anonymous Coward · · Score: 1

      The Sandy Bridge chips released so far are all "Extreme" versions which suck power so much you'd be insane to use them for a server.

    2. Re:Compared to Intel? by Surt · · Score: 4, Interesting

      This would compete with the Xeon-E chips that aren't out yet. But in terms of performance it's at about 75%, so this is the equivalent of a 12-core intel chip.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    3. Re:Compared to Intel? by rrossman2 · · Score: 1

      Not sure what Tyan has planned and what the chips can do, but tyan had boards that supported 4 quad core opterons plus you could add a "daughter board" that allowed you to add 4 more (plus more ram slots)

      Now that setup using 16 core cpus in an eatx format would be crazy

    4. Re:Compared to Intel? by 0123456 · · Score: 2

      Given that an 8-core Bulldozer already needs its own power station to operate, I can't imagine Intel could have a worse TDP than a 16-core.

    5. Re:Compared to Intel? by ByOhTek · · Score: 1

      Yeah. I could ditch my furnace in the winter with a computer like that... Might even have to open a few Windows.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    6. Re:Compared to Intel? by ByOhTek · · Score: 1

      If they aren't out yet, how can you know? I wouldn't trust the performance benchmarks from either manufacturer.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    7. Re:Compared to Intel? by unity100 · · Score: 1

      intel cant field more than 6 cores at the same time in even sandy bridge E. multithreaded apps like server apps, shine in bulldozer.

    8. Re:Compared to Intel? by Talderas · · Score: 2

      The idle heat would be sufficient, no? I don't see why you would need to open some windows just to ramp up the temperature unless you're using this thing to feed heat to a sauna.

      --
      "Lack of speed can be overcome. In the worst case by patience." --Znork
    9. Re:Compared to Intel? by the+linux+geek · · Score: 5, Informative

      Intel's server chips are 8- and 10-core, and outperform Opterons by a considerable margin.

    10. Re:Compared to Intel? by Kjella · · Score: 2

      Even the fastest Sandy Bridge-E draws less power than a Bulldozer even at much higher performance. It also costs 3-4 times as much, so performance/$ is quite shitty (hey, it's an extreme $999 proc) but the winner in performance is clear. But thanks for trolling, come again.

      --
      Live today, because you never know what tomorrow brings
    11. Re:Compared to Intel? by beelsebob · · Score: 4, Interesting

      Put simply, the AMD ones are slower than the intel ones by about 2 fold per core. This isn't because AMD sucked at design, so much as their marketing department sucked at telling the truth. In reality, we're looking at 8 core AMD CPUs with 2 integer units per core - i.e. no more 16 core than intel's are 16 core chips because of hyperthreading.

      Once that's ironed out, the AMD chips turn out to have rather good performance if you want lots of integer work done, and the Intel chips to have rather good performance if you want anything else done.

    12. Re:Compared to Intel? by beelsebob · · Score: 4, Interesting

      What's the Xeon E5-2650L, 2650, 2660, 2665, 2670, 2680, 2690 and 2687W then?

      Hint: they're all 8 core SNB-E chips. Second hint - AMD's 16 "core" CPUs don't have 16 cores – they have 16 integer units. They only have 8 instruction fetch units, 8 decode units, 8 L2 caches, etc. That is, they're 8 core CPUs with strong integer support. SNB-E's particular strength is floating point, but it tends to beat the opterons at pretty much anything that isn't heavily integer biased.

    13. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      The real interesting thing is the new Intel chips are actually 8 core with two of them disabled because of TDP limits.
      Do the new Opterons burn as much power as the new E series? I'd guess not but I haven't seen any TDP comparisons of the new series from either manufacturer.

    14. Re:Compared to Intel? by Surt · · Score: 1

      This assumes that performance is not significantly different from the desktop line, which is usually the case.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    15. Re:Compared to Intel? by Surt · · Score: 2

      Slight correction, on threaded workloads, we'd be talking about a 6-core chip, intel runs 2 threads per core.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    16. Re:Compared to Intel? by KingMotley · · Score: 1

      Well except the 130TDP of the 3960X is less than the 140TDP of the (almost equivalent) 6282 SE from AMD. Don't let facts get in the way of your beliefs.

    17. Re:Compared to Intel? by bloodhawk · · Score: 1

      You need to find a better line for such fanboyism; using stuff that is easily known and proven wrong is just silly. Intel server lines are 8 and 10 core. So far they have also trounced AMD in performance, though it would be nice if AMD can edge closer or even pass them with something new. Competition is much needed in this area and is something Intel has not had for a few years now.

    18. Re:Compared to Intel? by unity100 · · Score: 4, Interesting

      is that why there have been 3 supercomputer orders in the last 3 weeks with amd's bulldozer opterons ?

    19. Re:Compared to Intel? by beelsebob · · Score: 2

      Really? Given that an 8 "core" bulldozer FX-8150 gets beaten by a 4 core i5 2500, you would reasonably expect that this 16 "core" bulldozer would get beaten by an 8 core sandy bridge chip with no hyperthreading at roughly the same clock speed. A little bit of imagination might convince you that a 6 core with hyperthreading might perform similarly too.

      AMD – 16 "core" bulldozer – $1000
      Intel – 6 core + HT Xeon E5-1650 at much higher clock – $583.
      Alternatively, if you want to be able to stick the intel chips in NUMA
      Intel – 6 core + HT Xeon E5-2640 at the same clock as the AMD chip – $884, but with only 95W power consumption.

      Final alternative:
      Intel – 4 core (with no HT) Xeon E5-2609 at roughly the same clock – $294, stick two of them in, and there you are.

    20. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      Source?

    21. Re:Compared to Intel? by the+linux+geek · · Score: 1

      SPECcpu results, TPC-H, and personal experience.

    22. Re:Compared to Intel? by Vancorps · · Score: 2

      Which would be what? It sounds to me like databases and webservers benefit greatly from the AMD approach. Alternatives such as render farms use GPUs, so what strength is Intel actually offering?

    23. Re:Compared to Intel? by beelsebob · · Score: 2

      Not really, no – databases and web servers don't spend their time doing parallel integer work, they spend their time doing logic work. Sandy Bridge kicks the snot out of it there.

    24. Re:Compared to Intel? by gilboad · · Score: 2, Interesting

      While I do agree that AMD is *well* behind Intel's latest and greatest in the 1P / desktop world, I fail to see how you could make such a bold statement, unless you have had the chance to compare an AMD 4S machine to an Intel 4S machine (say, Opteron 62xx based HP DL585G7 vs. Xeon 75xx/E7 based HP DL580G7).

      In my experience (and I venture a guess that it is just as good as yours, if not better) the picture is far from being black-and-white and greatly (!!!) depends on the application that is being tested. The picture becomes even more complex once you factor in the Xeon E7's excessive price. ... So I ask again, have you had any experience in benchmarking the Opteron 6200 or are you simply making things up as you go along?

      - Gilboa

    25. Re:Compared to Intel? by Luyseyal · · Score: 1

      If the logic is parallelizable, then the AMD chips could be a good choice. A webserver would be a good example of parallel logic in run-of-the-mill software were it not hampered by all that pesky I/O.

      -l

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
    26. Re:Compared to Intel? by sneakyimp · · Score: 1

      How about we take all the energy we are putting into this pissing contest and do some actual benchmarking?? Put up or shut up.

      COME ON EVERYONE! BENCHMARK! BENCHMARK! BENCHMARK!

    27. Re:Compared to Intel? by sneakyimp · · Score: 2

      "Honey, it's kinda cold. Can you fire up Linpack on the server?"

    28. Re:Compared to Intel? by sneakyimp · · Score: 1

      BENCHMARK!
      In the meantime, *please* STFU.

    29. Re:Compared to Intel? by im_thatoneguy · · Score: 1

      Render farms don't use GPUs. Good luck fitting a 3D scene into 1GB of memory!

      Maybe for a couple specialty applications custom written for a few narrow pipeline tools but certainly not the backbone which is still all PRman, Arnold, Vray, Brazil and Mental Ray. None of which use the GPU yet. Only production renderer nearing GPU acceleration is Final Render and maybe Brazil/Vray for specialized passes.

    30. Re:Compared to Intel? by hairyfeet · · Score: 1

      Because thanks to Intel rigging their compiler, which they do to this very day, you can't just benchmark because without knowing what compiler the benchmark software was compiled upon the benchmark is useless? I mean Nvidia kicked ass on Q3 with the FX series until you changed the exe to Quack.exe and then whoops! It turned out to be a scam. Same thing here as Intel runs any CPU that gives a CPU-ID of Authentic AMD a pile of shit code while the Intel chip gets the latest SSE optimization, not exactly a fair test now is it?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    31. Re:Compared to Intel? by greg1104 · · Score: 1

      Databases that are working on in-RAM workloads (so not I/O bound) spend most of their time moving data pages around in memory. There are few computational components to database work, compared with how often chunks of data are touched. Neither floating point nor integer speed is the real limiting factor on how fast that can happen. The size of the CPU caches and the speed of the CPU->RAM interconnect are the important factors.

      I've been working on a memory oriented benchmark aimed at testing for this particular area of performance for a while now. The Intel vs. AMD situation is very complicated. It depends quite a bit on how many concurrent programs are running, especially on the big servers where you can't fully utilize all of the memory channels available. I see Intel as having an edge on smaller systems, their performance with only one or two cores going can be much better. At larger active core counts, the two manufacturers are much closer to equal. I don't see this new product line as changing that.
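A very rough idea of what a memory-oriented measurement looks like can be sketched in a few lines. This is not the benchmark described above, whose details are not given in the comment. It is a hypothetical single-threaded buffer-copy timer, and the function name `copy_bandwidth_gb_s`, the buffer size, and the repeat count are all assumptions chosen for illustration.

```python
import time

def copy_bandwidth_gb_s(size_mb: int = 64, repeats: int = 5) -> float:
    """Best observed rate for copying one buffer, in GB of payload per second."""
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full pass: read src, write a new copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return (len(src) / 1e9) / best

result = copy_bandwidth_gb_s(size_mb=16, repeats=3)
print(f"{result:.2f} GB/s")
```

A buffer much larger than the CPU caches pushes the copy out to RAM. Probing the effect of active core count mentioned above would require running several such copies concurrently, which this sketch does not attempt.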

    32. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      The real interesting thing is the new Intel chips are actually 8 core with two of them disabled because of TDP limits.

      Only sort of on the TDP limit thing. It's more like 8 core with 2 disabled because Intel thinks this makes it a better product for the "enthusiast" market.

      Basically... Intel's using one die design for 4/6-core SB-E ("enthusiast" class chips) and 4/6/8-core SB-EP (workstation/server class Xeon chips). They could enable 8 cores in an enthusiast chip, but they'd have to cut the clock rate to keep it inside a 130W TDP (the self-imposed max Intel wants to ship in a consumer CPU). Intel seems to believe that the "enthusiast" (read: high end gamer) market wants clock frequency more than it wants more cores (and they're probably right), so they're not offering an 8-core enthusiast version.

      There is a healthy market for servers which run tons of threads, however, so servers will get a reduced clock 8-core 130W TDP version... and probably even an array of lower TDP bin models (65W, 95W) with dramatically lower clocks, since this is what some server customers want. (You don't have to cut the clock by half to cut the power consumption by half, so a 65W 8-core should have better performance per watt than a 130W 8-core, and some data center designs are driven more by perf/watt than absolute performance.)

      There will also be a "Workstation" 150W TDP 8-core with higher clocks than the server 130W 8-core. However, clocks on this version still won't be as fast as the "Enthusiast" 6-core 130W TDP. Also, Intel's 150W workstation Xeons typically have a reduced maximum case temperature rating compared to all other variants (meaning the cooling had better be damn good).

      (Same thing holds true for AMD's 140W chips, BTW. Whenever the TDP rating goes up, typically the Tcase_max has to go down.)
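The perf/watt point above (halving power does not require halving the clock) follows from the usual first-order model of dynamic power, P ~ C * V^2 * f. The sketch below is an idealized illustration only: it assumes voltage can be lowered along with frequency and ignores static/leakage power, and the 80% operating point is a hypothetical number, not a figure from Intel or AMD.

```python
def relative_dynamic_power(freq_ratio: float, volt_ratio: float) -> float:
    """Dynamic power relative to baseline, using P ~ C * V^2 * f."""
    return volt_ratio ** 2 * freq_ratio

# Hypothetical operating point: clock and voltage both lowered to 80% of baseline.
power = relative_dynamic_power(freq_ratio=0.8, volt_ratio=0.8)

# Assumes performance scales linearly with clock, which is itself a simplification.
perf_per_watt = 0.8 / power

print(round(power, 3), round(perf_per_watt, 3))
```

Under these assumptions a 20% clock reduction cuts dynamic power roughly in half, which is why the lower-TDP bins can come out ahead on performance per watt.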

    33. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      Do you seriously think one core running two threads is anywhere near as fast as two separate cores running those same two threads?

      I don't know what the scaling would be, but assuming 2X scaling for 2 way SMT does not sound remotely reasonable.

    34. Re:Compared to Intel? by Surt · · Score: 1

      I think that Intel's hyperthreads and AMD's Bulldozer 'cores' both use a resource sharing arrangement, and in neither case are full cores. The benchmarks bear this out: intel's hyperthreading is nearly as good as AMD's.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    35. Re:Compared to Intel? by ak3ldama · · Score: 1

      That is why I stick to the benchmarks on Linux for fair comparisons of AMD and Intel. Not to mention most gaming benchmarks (or real performance) don't give two shits about what cpu it is run with. The main benchmarks I've seen so far have been from phoronix. It seems like a perfectly decent cpu though maybe the price should drop a bit. My two cents: the AMD A-series is the way to go, especially in laptops like this A6-3400.

      --
      "but money is the God of Algiers & Mahomet their prophet." - Rich. O'Bryen June 8th 1786
    36. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      Intel's chips do indeed outperform AMD's chips as a general rule, and this is no exception. However, every production Intel motherboard since the Pentium 3 (when they quit using the same socket as AMD) has had 50% or less FSB speed. No exceptions. This means that, while Intel chips have always been faster (especially in benchmarks where they use their own internal boards with comparable FSB) AMD systems actually outperform Intel systems despite having slower processors overall.

      This is the big problem with the AMD/Intel debate. Intel makes better chips, but in real world practice, AMD systems process the same amount of data faster because of the faster FSB, allowing considerably more data IO to and from the chip itself. Because of this, neither can truly claim to be "better" because it varies based upon whether you're in a lab test or the real world.

      As for me, I love AMD, but I'm running 2 Intel systems. Not so much by choice - when you see an ASUS G72 (retail for $1599 at time of purchase) on sale for $729 because there's a single, barely-noticeable scratch on the lid, you have a bad habit of saying "to hell with the processor manufacturer" and just buying it anyway. I'm very happy with this system, but I do believe that I'd prefer the exact same box with an AMD in it. Oh well.

    37. Re:Compared to Intel? by hairyfeet · · Score: 2

      Oh I have NO problem with FOSS benchmarks, it's one of the few places where you can be sure to get a rigging-free test. Not real big on the A-Series though, I find that the Deneb and Thuban chips are a better deal ATM and in many tests Deneb and Thuban stomp Bulldozer. It looks like Bulldozer is gonna be another Phenom I, where it took them a generation to get the bugs out and crank up the clocks like they did with Phenom II.

      But I'd say the E series is another story altogether. It's priced the same as Atom but frankly stomps it and gets scores usually above Atom+ION which is a more expensive option. It's great for netbooks and all in ones, I've even built a couple of HTPCs with it and it works great in that role, quiet as a churchmouse while having no trouble with 1080p. I liked what I saw enough that I put my own money where my mouth was and sold my athlon II Wind for a EEE with an E-350 and I just love the thing. It's light, gets great battery life, never gets hot, and just about every video format under the sun is accelerated with DXVA.

      So I say stay away from the new socket for now, go with a Deneb or Thuban and by the time that is long in the tooth and you are ready to upgrade the chips after Piledriver will be out and they'll have any performance problems licked. Again I intend to put my money where my mouth is and after the holidays upgrade from Deneb quad to thuban. Do I need it? Not really but I WANTS IT precious, I WANTS IT!

      --
      ACs don't waste your time replying, your posts are never seen by me.
    38. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      Yeah, you're much better off with a desktop chip if you want a... desktop. Opteron is a server chip.

      How much for a quad socket motherboard and 4 of those 6 core Intel chips? Because the 16 core opterons start at 85W and $523 and a quad mobo is less than a grand from newegg for the AMD option.

      I've got a stack of opteron 6128's in HP DL165 1U 2P cases. I can order the 6272 opterons from HP today and have them installed this week if I needed more CPU. Even with HP markup, it's only $650 a socket.

    39. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      Didn't your physics teacher tell you not to mix units and names, bitch?

      There's no such thing as "130TDP". There could be a TDP of 130W.

    40. Re:Compared to Intel? by Anonymous Coward · · Score: 0

      How quaint, expecting that performance per core can be linearly extrapolated.

    41. Re:Compared to Intel? by Surt · · Score: 1

      Why not provide some evidence that it can't. Every benchmark says you're wrong and I'm right.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    42. Re:Compared to Intel? by c++0xFF · · Score: 1

      You're right -- it's not really 16 cores. But nor is it just 8 cores with HyperThreading. Bulldozer is an interesting compromise that gives every concurrent thread of execution its own set of computation resources. This should result in faster execution than an 8-core machine, with or without hyperthreading, but probably isn't quite as fast as a true 16-core machine.

      Also, (and I might be very wrong here), I thought there were 16 floating point units, too.

    43. Re:Compared to Intel? by cheesybagel · · Score: 1

      If you actually did your math you would realize AMD's Bulldozer 16-core has the same peak theoretical FP performance as a 16-core Sandy Bridge would if it existed. A Sandy Bridge 256-bit AVX instruction typically has 2 cycles latency while Bulldozer's 2x128-bit AVX has 1 cycle latency for the same math operation. At the same time Bulldozer should have twice the integer performance. My guess is there is some bottleneck, hardware bug, or lack of OS/compiler optimizations to enable it to perform adequately.

  3. really 16 core? by neuro88 · · Score: 1

    Hmmm... According to the article, these new chips seemed to be based on the bulldozer architecture, so it might be better to think of these opterons as 8 core chips that have really good hyperthreading.

    1. Re:really 16 core? by ackthpt · · Score: 1

      Hmmm... According to the article, these new chips seemed to be based on the bulldozer architecture, so it might be better to think of these opterons as 8 core chips that have really good hyperthreading.

      Hold your horse, cowpoke.

      Just because it's based upon doesn't mean it will suffer the same issues as the Bulldozer. Perhaps this is the core which really works well, while the more consumer oriented Bulldozer is the red-headed stepchild.

      --

      A feeling of having made the same mistake before: Deja Foobar
    2. Re:really 16 core? by Zan+Lynx · · Score: 4, Interesting

      Maybe...

      It'll be interesting. Most server applications are integer-only and never touch the floating point units. That should mean that Bulldozer designs work close to the full core count in contrast to the poor benchmarking results it puts out in Photoshop filters and video encode.

    3. Re:really 16 core? by Anonymous Coward · · Score: 1

      Exactly. The way I see Bulldozer is that it is a good chip for things like web hosting, databases, middleware (ie. "the cloud"). Floating point performance is not that important if your threads do not do floating point. Heck, even if 1/2 of the threads do floating point, then you are fine.

      Frankly, I only care how fast each thread can run and access memory. This is what is important in server consolidation. Floating point, meh.

    4. Re:really 16 core? by the+linux+geek · · Score: 2

      They both have the same issues, including that each module (two 4-issue cores) has a single 4-instruction decoder in front of it. Cache latency is also likely to be similar if not the same.

    5. Re:really 16 core? by gweihir · · Score: 1

      Well, as Intel hyperthreading is basically brain-dead (had to disable it for decent performance as some things were glacially slow), really good hyperthreading just means usable hyperthreading for me. If Intel did not have so much money, AMD would have blown them away a long time ago. Intel technology sucks badly.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    6. Re:really 16 core? by Anonymous Coward · · Score: 0

      Let's be clear to distinguish Intel's physical technologies, including fabrication, from their hardware designs. Intel is indisputably years ahead of anyone else in terms of the actual manufacture of semiconductor devices.

    7. Re:really 16 core? by Anonymous Coward · · Score: 0

      Hold your horse, cowpoke.

      Just because it's based upon doesn't mean it will suffer the same issues as the Bulldozer. Perhaps this is the core which really works well, while the more consumer oriented Bulldozer is the red-headed stepchild.

      In this case "based on" means they take two of the exact same pieces of silicon (or 'die' in industry terminology) used in consumer Bulldozer, and mount them in a single package (a multichip module, or MCM), making a 16-core chip from two 8-core die. AMD can't do too many different silicon designs (for budget / human resource reasons), so they tend to do as much sharing between consumer and server products as possible.

    8. Re:really 16 core? by AmiMoJo · · Score: 1

      It's just a shame Bulldozer is their desktop CPU. I really don't know how they could have screwed up so badly... Something major must have happened for them to end up releasing their next gen architecture in a state where even at a high clock speed performance is worse than the previous (and much cheaper) generation for most customers.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    9. Re:really 16 core? by gweihir · · Score: 1

      Does not help if the designs they put on the chips are stupid. And they are.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  4. how do they compare ? by unity100 · · Score: 0

    they are much more capable in multithreaded performance. dozer cores were especially designed for these. for example, despite apparently 'sucking' on desktops, dozer surpasses all normal SBs (not e) in heavily multithreaded apps (4 cores and more) like photoshop cs5.

    servers are heavily multithreaded. since dozer cores are especially suitable for tasks that are run on servers, more cores become even more desirable.

    1. Re:how do they compare ? by PIBM · · Score: 2

      1: You can buy your new sandy bridge from newegg or such right now, while those new bulldozers are nowhere to be found.
      2: Overclocking any chip is bound to require a lot more power than the TDP no matter which one you are using.
      3: Dozer's cores, as you said, feel like they are dozing on the job..

    2. Re:how do they compare ? by KingMotley · · Score: 1

      No, they aren't.

    3. Re:how do they compare ? by beelsebob · · Score: 1

      Really? Because this looks like the FX-8150 getting beaten 3 ways silly by even an i5-2500 at photoshop:
      http://images.anandtech.com/graphs/graph4955/41688.png

    4. Re:how do they compare ? by unity100 · · Score: 1

      really.

      http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-15.html

      radial blur, shape blur, median, polar coordinates.

      This test employs threaded filters, taxing as many cores as we throw at it. Zambezi’s eight integer units capitalize, flying past the Core i5 and Core i7, outright trouncing the six-core Phenom II X6 1100T, too.

    5. Re:how do they compare ? by unity100 · · Score: 5, Informative

      and many, many, moooreeee

      -mainconcept http://www.lostcircuits.com/mambo//i...&limitstart=17
      -mediashow http://www.guru3d.com/article/amd-fx...ssor-review/14
      -h.264 http://www.guru3d.com/article/amd-fx...ssor-review/14
      -vp8 http://www.guru3d.com/article/amd-fx...ssor-review/17
      -sha1 http://www.guru3d.com/article/amd-fx...ssor-review/17
      -photoshop cs5 http://www.lostcircuits.com/mambo//i...&limitstart=14
      -photoshop cs5 http://www.tomshardware.com/reviews/...x,3043-15.html
      -winrar, faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
      -winrar, improves over x6 http://www.tomshardware.com/reviews/...x,3043-16.html
      -7-zip better than 2600k here: http://images.anandtech.com/graphs/graph4955/41698.png http://www.anandtech.com/show/4955/t...x8150-tested/7
      -7-zip same perf as 2600k http://www.tomshardware.com/reviews/...x,3043-16.html
      -POV-ray, faster than 2600k http://www.legitreviews.com/article/1741/10/
      -POV-ray http://www.nordichardware.se/test-la...art=15#content
      -x264(2nd pass AVX enabled) http://www.anandtech.com/show/4955/t...x8150-tested/7
      -x264 (2nd pass, better overall than 2600k) http://www.bjorn3d.com/read.php?cID=2125&pageID=11108
      -x264 (2nd pass +.3 than SB2600k) http://www.legitreviews.com/article/1741/7/
      -handbrake; http://www.legitreviews.com/article/1741/9/
      -truecrypt; http://www.bjorn3d.com/read.php?cID=2125&pageID=11111
      -solidworks; faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html
      -abbyy finereader http://www.tomshardware.com/reviews/...x,3043-16.html
      -C-Ray, as fast as $1k i7-990X, http://i664.photobucket.com/albums/v.../c-rayir38.png

    6. Re:how do they compare ? by beelsebob · · Score: 0

      Notably though, everywhere else, even the i5 beats the shit out of it. Not because the other tests aren't multithreaded, but because they're not leveraging integer work exclusively. Or are you trying to suggest that the more threaded i7 beats the higher clocked i5 based on pure magic?

      Bottom line – Bulldozer isn't good at multithreading, it's good at integer work. Unfortunately, servers are mostly logic work, so sandy bridge is likely to destroy it.

    7. Re:how do they compare ? by beelsebob · · Score: 1

      Good work digging up all the graphs where Bulldozer manages to get between the i5 and the i7 (which, based on its price point *it damn well should*, being priced half way between the two). Unfortunately, while you've dug up a nice bunch of places it just about holds its own, there are many times more where the Sandy Bridge chip eats it for breakfast, including heavily multithreaded work. As I said above – Bulldozer is good at very multithreaded integer work, and pretty much nothing else.

    8. Re:how do they compare ? by unity100 · · Score: 1

      there are many times more

      yes. then instead of shooting from the hip, recount those times and occasions.

    9. Re:how do they compare ? by Rockoon · · Score: 1

      Logic work *is* integer work, fool.

      --
      "His name was James Damore."
    10. Re:how do they compare ? by unity100 · · Score: 1

      Bottom line – Bulldozer isn't good at multithreading, it's good at integer work. Unfortunately, servers are mostly logic work, so sandy bridge is likely to destroy it.

      oh boy. i just saw this. you dont know shit.

      'servers are mostly logic work' hahahahaa. luckily someone else gave your answer.

      next time, dont talk without knowing shit. 'servers' mean heavily multithreaded integer work. in these, bulldozer excels. and that is also one of the reasons why there have been 3 amd opteron (bulldozer 16 core) supercomputer orders in the past 3 weeks. NOT intel. amd. opteron, bulldozer. SUPERcomputer.

    11. Re:how do they compare ? by unity100 · · Score: 1

      Bottom line – Bulldozer isn't good at multithreading, it's good at integer work. Unfortunately, servers are mostly logic work, so sandy bridge is likely to destroy it.

      excuse me but you have posted the same bullshit without knowing SHIT about what you are talking about, for the second time here. apparently you havent read what another slashdotter told you about logic work being integer work.

      i replied to you on your ignorance in the other post. 3 supercomputers that are bulldozer based, in the past 3 weeks. a supercomputer a week. yes. sandy bridge e must be 'LIKELY' to destroy bulldozer in heavily multithreaded workloads.

      how about not talking on stuff you dont know shit about next time, and not coming up like a moron as a consequence ? please.

    12. Re:how do they compare ? by unity100 · · Score: 1

      even the i5 beats the shit out of it

      are you aware that the tooling process and silicon cutting in the factories for this chip have not matured yet ? do you even know what these mean ?

    13. Re:how do they compare ? by beelsebob · · Score: 1

      No, no it's not. Logic work includes all kinds of things like branch prediction, pipeline length and hence amount flushed when it all goes titsup, etc. Notably, Bulldozer does terribly at this, but not so badly at pure integer work.

    14. Re:how do they compare ? by zixxt · · Score: 1

      Good work digging up all the graphs where Bulldozer manages to get between the i5 and the i7 (which, based on its price point *it damn well should*, being priced half way between the two). Unfortunately, while you've dug up a nice bunch of places it just about holds its own, there are many times more where the Sandy Bridge chip eats it for breakfast, including heavily multithreaded work. As I said above – Bulldozer is good at very multithreaded integer work, and pretty much nothing else.

      Nice Trolling

      --
      ---- GENERATION 26: The first time you see this, copy it into your sig on any forum and add 1 to the generation.
  5. Bulldozer Cores are not that Great by TheTyrannyOfForcedRe · · Score: 4, Interesting

    The "cores" in Bulldozer are not your typical first-class x86 core. Bulldozer "cores" are worth 2/3 of a modern x86 core. The 6200 is more like a 10 core. Add to that the crappy IPC and I'm not impressed.

    I was excited about Bulldozer before it was released. It's not often that CPU makers take chances on radical new architectures. Too bad this one turned out to be a huge pile of fail.

    --
    "Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
    1. Re:Bulldozer Cores are not that Great by synapse7 · · Score: 1

      Hopefully they can be improved upon. I remember the first P4s had enough suck to be the target of a class-action suit.

    2. Re:Bulldozer Cores are not that Great by Theovon · · Score: 5, Informative

      Your description is inaccurate, but that's not surprising since most slashdot readers don't know much about CPU architecture.

      Bulldozers are essentially full-fledged cores, where the two cores in each module are mostly independent. There are two completely independent integer pipelines, so people seem to want to harp on the fact that the FPU is "shared". It's really a single split FPU, where each half can execute independent instructions, as long as the data width is 128 bits or less. Only when it is executing 256-bit AVX instructions is there any competition for resources. This is a very sensible design decision, since you don't find enough AVX software right now to justify completely dedicated AVX logic. (Plus, IIRC sandy bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?) Moreover, even with AVX-heavy workloads, most software won't issue AVX instructions every cycle, and two AVX-heavy tasks on the same module won't really run into much contention. Assuming my memory of Sandy Bridge's FPU is correct, then Bulldozer has the advantage of having lower latency within the FPU on isolated AVX instructions.

      The PROBLEM with Bulldozer is that they just have not done some of the really aggressive and costly things that Intel has done in their design. Bulldozer is still a 3-issue design. While going to 4-issue doesn't help that much that often, it still gives Sandy Bridge a slight edge. But where SB REALLY gets its advantage is the huge instruction window. Intel found clever ways to shrink the logic for various components so that they could make room for a much larger physical register file and reorder buffer. As a result, SB can have many more decoded instructions in flight, which exposes more instruction-level parallelism and, critically, absorbs more memory access latency.

      A Sun engineer (discussing Rock, among other things) once described modern CPU execution as a race between last-level cache misses. When you have a miss on your L3 cache, it can cost hundreds of cycles, upwards of 1000. During that miss, the CPU fills up its reservation station with other instructions and then stalls, waiting on something to retire. This won't happen for a long time. Because of the disparity in speed (and latency) between compute and memory access, this is typically the most significant bottleneck. By enlarging the instruction window, SB can achieve much higher throughput, and it shows in the benchmarks.

      This is Bulldozer's Achilles' heel. I know there are a few benchmarks where Bulldozer is faster than SB, but they're not typical workloads with typical memory footprints. Anyhow, so if you're going to rag on Bulldozer, rag on it for the right reasons. Bulldozer's "shared" FPU is a red herring.
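      A rough way to see why the instruction window matters during a long miss is a toy model: the core keeps issuing until the window is full, then sits idle until the miss returns. The sketch below is only that, a toy. The function name, the 300-cycle miss latency, the IPC of 2, and the window sizes of 128 and 168 entries are illustrative assumptions, not measurements, and the model ignores overlapping misses, which is where a bigger window probably helps most.

```python
def stall_cycles(miss_latency, window_entries, ipc):
    """Cycles the core sits idle during one last-level cache miss.

    Toy model: the core keeps issuing at `ipc` instructions per cycle
    until the instruction window is full, then stalls until the miss
    returns. Overlapping misses are deliberately ignored.
    """
    cycles_hidden = window_entries / ipc
    return max(0.0, miss_latency - cycles_hidden)


# All numbers below are illustrative assumptions, not measurements.
MISS_LATENCY = 300  # cycles for a miss that goes all the way to DRAM
IPC = 2.0           # sustained instructions per cycle while not stalled

small_window = stall_cycles(MISS_LATENCY, window_entries=128, ipc=IPC)
large_window = stall_cycles(MISS_LATENCY, window_entries=168, ipc=IPC)
print(small_window, large_window)
```

      Under these assumed numbers the larger window hides 20 more cycles of each miss; the real gain depends on how many independent instructions (and further misses) the window can actually find.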

    3. Re:Bulldozer Cores are not that Great by Anonymous Coward · · Score: 0

      true, but P4 was never "improved upon". It was orphaned. The new and fancy features of the P4 architecture (aside from SMT) began, and died, with the P4.

    4. Re:Bulldozer Cores are not that Great by Anonymous Coward · · Score: 0

      It depends on what you're working with. Most logic is integer based and with that a Bulldozer core scales really well.
      http://www.phoronix.com/scan.php?page=article&item=amd_bulldozer_scaling&num=5

    5. Re:Bulldozer Cores are not that Great by ericloewe · · Score: 2

      Bulldozer was very poorly handled from the beginning. What really surprises me is that they tried the NetBurst approach: when all else fails, go for clocks. Unfortunately, ARM seems to be focusing on a similar strategy (more cores, higher clocks, less focus on IPC)... Anyways, I don't buy their "poorly optimized" story. They knew all about it and could've waited - surely they realized at the early stages of development that OSes aren't optimized for this yet. They could've delayed Bulldozer and pushed out yet another incremental upgrade to the Phenoms - the die shrink alone would probably yield better results than those achieved by Bulldozer. Meanwhile Intel is able to get away with what is essentially 50% more performance in multi-threaded applications, 0% more in single-threaded ones (save minor influences from the memory subsystem and cache, which surprisingly have a HIGHER latency than SB). All this for around 100% more cash, plus added costs for "high-end" motherboards (still lacking native USB 3.0 from the chipset, along with only two native SATA 6.0Gb/s ports), quad-channel memory and a cooler.

    6. Re:Bulldozer Cores are not that Great by Anonymous Coward · · Score: 0
    7. Re:Bulldozer Cores are not that Great by Artraze · · Score: 5, Informative

      The OP is right, and seems to understand the issues far better than you. It isn't that the FPU is shared, it's that nearly _everything_ is shared: Instruction cache, fetch and decode, FPU, L2 data cache. The only things that aren't shared are L1 data and integer operations (scheduler and ALU).

      Instruction issuing and cache misses are big performance areas, but these are precisely the resources the cores share! You're running two threads off (with the exception of L1 data) the same caches and instruction fetches. So, in reality, the second core in bulldozer is much more like ultra-hyperthreading than it is a second core. I think the fact that they're even listed as cores is a marketing strategy that has backfired pretty hard.

      P.S. L3 cache has proven to be quite useless in many workloads... It helps a bit in servers, IIRC, but that's about it. So it's more a race to L2 cache, which, again, is a shared resource. AMD, in fact, has indicated that it may drop the L3 from desktop parts.

    8. Re:Bulldozer Cores are not that Great by Tomato42 · · Score: 1

      Then why are Bulldozers slower than Phenom II's in file compression (rar, zip, 7z, pick your poison) clock for clock? That's definitely not a shared FPU problem...

    9. Re:Bulldozer Cores are not that Great by mestlick · · Score: 1

      There are a few big mistakes about Bulldozer here.

      The FP is completely shared between the integer clusters. The FP is 4-wide and the two clusters compete for all the resources in the FP.

      Each Bulldozer integer cluster is 4-wide. The shared instruction fetch is also 4-wide.

      Sandy Bridge has 168 instructions in flight and Bulldozer has 128 per cluster. Sandy Bridge has a combined FP/INT scheduler with 54 entries. Bulldozer has separate schedulers with 40 INT per cluster and 60 FP entries.

      You are correct about BD's Achilles' heel. The L2 and L3 latencies are longer than SB. I think the solution is to reduce the latencies, not increase the in flight window size.

    10. Re:Bulldozer Cores are not that Great by TheTyrannyOfForcedRe · · Score: 0

      Your description is inaccurate, but that's not surprising since most slashdot readers don't know much about CPU architecture.

      Gotta love Slashdotters who think they know you inside and out after reading one post. I graduated from one of the top Computer Engineering programs in the world and with very good grades. I ended up going into software after graduation but I did study computer architecture extensively in college, and designed and built a working CPU for my senior design project.

      --
      "Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
    11. Re:Bulldozer Cores are not that Great by Rockoon · · Score: 1

      If you look at the performance numbers comparing Phenom II x4 830 (2.8GHz) to the new A8-3850 (2.9GHz) you see that the lack of L3 isn't a problem at all when you can also pack on twice as much L2.

      --
      "His name was James Damore."
    12. Re:Bulldozer Cores are not that Great by WilliamBaughman · · Score: 1

      Your description is also inaccurate. Instruction decode and L2 cache are shared between cores in Bulldozer modules as well; I wouldn't ding Bulldozer for the shared L2 cache but the L1 cache is write-through, and there doesn't seem to be enough cache bandwidth to keep both integer cores busy. Bulldozer is not a 3-issue design, it is a 4-issue design. With regards to Bulldozer's Achilles' heel, I think that its deficiency in single-threaded performance comes more from actual cache misses and latency than the smaller instruction window. I could be proven wrong by architectural studies that come out in the future. Either way, those studies will be interesting.

    13. Re:Bulldozer Cores are not that Great by loufoque · · Score: 1

      (Plus, IIRC sandy bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?)

      My SSE code converted to AVX runs two times faster (not all of it though -- certain instructions do run in two cycles)

    14. Re:Bulldozer Cores are not that Great by Anonymous Coward · · Score: 0

      - Bulldozer has a 64kB L1 instr cache (per mod), which is 2x SandyBridge
      - Bulldozer has a 2MB L2 cache (per mod), which is 10x SandyBridge

      Sure they are shared, but given the size increase that isn't necessarily a bad thing. The real question is the cache contention due to cacheline sharing, and access latency to said cache (competing requirements based on associativity). The real question is what is the effectiveness of the shared components within a given workload.

      - Bulldozer has a 16kB L1 data cache (per core), which is 1/2 of SB.

      Personally I'd be more worried about this fact despite that it's not shared. Again, real workloads and real data to indicate a component's effectiveness are more important (i.e. L1 cache miss rates + cycle cost).

    15. Re:Bulldozer Cores are not that Great by Anonymous Coward · · Score: 0

      The real question is the cache contention due to cacheline sharing, and access latency to said cache (competing requirements based on associativity).

      Here's some more complete data, comparing the 4-core/8-thread SB 2600 to the 4-module/8-"core" FX 8150. Latency measurements are as reported by AnandTech. The "Interlagos" 16-core Bulldozer is two 8-core FX-8150 dice mounted in a MCM, and one SB 2600 die generally outperforms one FX-8150 die, so this is a more relevant comparison than you might think.

      4-core SB hierarchy:
      L0: 1.5K entry 8-way uop cache (a special ultrafast icache that stores previously decoded instructions)
      L1: 32K I + 32K D, 8-way set associative, 4 cycle latency
      L2: 256K, 8-way, 11 cycle latency
      L3 (per-core slice): 2MB, 16-way, 25 cycle latency
      L3 (total): 8MB, 64-way (or at least I think it acts as if it's 64-way)

      A note on the L3 slices: each SB core has slightly lower latency access to its "local" 2MB slice. Although the cache as a whole acts like one cache (I think), each core and slice has its own drop on a shared ring bus; to access a remote slice a core must send a message on the ring bus and wait for the response, and this adds a few cycles (I think it's supposed to be ~5 extra cycles worst case).

      4-module Bulldozer hierarchy:
      L1 icache: 64K, 2-way, 4 cycle latency (shared between both "cores" in a module)
      L1 dcache: 16K, 4-way, 4 cycle latency (1 per core, or 2 per module)
      L2: 2MB, 16-way, 21 cycle latency (1 per module)
      L3: 8MB, ??-way, 65 cycle latency

      In Bulldozer the L3 cache apparently operates mainly as a victim buffer for data evicted from the L2 caches.

      As you can see the cache hierarchies are very different! AMD has a lot more cache memory, 16768KB total compared to Intel's 9472KB (not including the L0 uop cache in Intel's total since it's not straightforward how many bytes of cache that translates to).
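      For anyone who wants to check those totals, here is a quick sketch that just adds up the sizes listed in the two hierarchies above (4 SB cores; 4 Bulldozer modules with 8 cores). The variable names are mine, and the uop cache is left out as in the text.

```python
KB = 1
MB = 1024 * KB

# Sandy Bridge 2600: 4 cores, sizes as listed in the hierarchy above.
sb_l1 = 4 * (32 + 32) * KB   # 32K I + 32K D per core
sb_l2 = 4 * 256 * KB         # 256K per core
sb_l3 = 8 * MB               # shared, four 2MB slices
sb_total = sb_l1 + sb_l2 + sb_l3

# Bulldozer FX-8150: 4 modules, 8 cores.
bd_l1i = 4 * 64 * KB         # one icache per module
bd_l1d = 8 * 16 * KB         # one dcache per core
bd_l2 = 4 * 2 * MB           # one L2 per module
bd_l3 = 8 * MB               # shared
bd_total = bd_l1i + bd_l1d + bd_l2 + bd_l3

print(f"SB: {sb_total}KB, Bulldozer: {bd_total}KB")
```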

      However, Intel's L1 caches likely have a significantly better hit rate due to the 8-way set associativity, and AMD has nothing comparable to the SB uop cache. (This is something tightly coupled to the side of the decoder... as the processor fetches instructions and decodes them, the micro-ops (uops) emitted by the decoder are also stored in the uop cache. When the processor's instruction fetcher hits in the uop cache, there is no need to fetch anything from the L1 icache, or re-decode the instruction.)

      And when you hit L2 and beyond... yes, AMD has a lot of bits, but look at the access latencies. Intel's L3 is almost as fast as AMD's L2, and AMD's L3 is horrible. It's very likely that average access time is significantly better on Intel.
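      To put a rough number on "average access time", here is a minimal sketch of the usual average-memory-access-time calculation using the latencies listed above. The hit rates and the 200-cycle DRAM latency are pure assumptions for illustration, and using the same hit rates for both chips is itself an assumption: AMD's bigger L2 would likely hit more often, Intel's more associative L1 likely would too.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (total_hit_latency_cycles, local_hit_rate), ordered
    from L1 outward. Each latency is the full load-to-use latency for a
    hit at that level, not an increment over the previous level.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency


# Hit rates and DRAM latency are assumed, not measured.
HIT_RATES = (0.95, 0.80, 0.50)  # L1, L2, L3 local hit rates
DRAM_CYCLES = 200

sb = amat(list(zip((4, 11, 25), HIT_RATES)), DRAM_CYCLES)
bd = amat(list(zip((4, 21, 65), HIT_RATES)), DRAM_CYCLES)
print(f"SB: {sb:.3f} cycles, Bulldozer: {bd:.3f} cycles")
```

      With these made-up hit rates the gap is a bit over half a cycle per access on average; workloads that miss L1 more often would widen it.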

    16. Re:Bulldozer Cores are not that Great by Anonymous Coward · · Score: 0

      What would be those mistakes other than the 3-issueness of BD and the SB's FPU AVX delay? The BD FPU was marketed from the beginning as 2x128bit design, giving it the benefit over a 1x256bit unit with the older codes.

    17. Re:Bulldozer Cores are not that Great by goarilla · · Score: 1

      Huh, Northwood was a big improvement over Willamette !

  6. Poor performance by Anonymous Coward · · Score: 1

    I have a test machine with the 12-core version and the single-core performance is truly dreadful. Intel chips that are several years older perform way better in this regard. Even with a workload where the 16 cores can all be used to the fullest extent, I doubt the performance comes close to modern Intel chips.

    1. Re:Poor performance by nomel · · Score: 2

      This isn't the point. You get 16 cores (slowish compared to top of the line, they may be) that will fit in a single socket on a single motherboard, with a single power supply. This is a *huge* cost saving for machines that it makes sense to use them in...servers, where single core performance is relatively stupid to consider.

    2. Re:Poor performance by the+linux+geek · · Score: 3, Insightful

      Servers need single-thread too; think stuff like big database writes, joins, ERP, and CRM. Think outside the embarrassingly-parallel web-serving box.

      If multithreaded performance was all that matters, the Sun Niagara chips would have done a lot better than they did.

    3. Re:Poor performance by timster · · Score: 1

      The big question I have is if it will be like AMD's previous 12-core chips, where you could get 4 of them crammed into a 2U server for not all that much money. 4-Xeon configurations are way more expensive.

      --
      I have seen the future, and it is inconvenient.
    4. Re:Poor performance by DarkOx · · Score: 3, Insightful

      Umm, joins can be done in parallel, in lots and lots of cases. ERP and CRM are applications that ought to see big improvements from more cores, if you have more than a few users anyway. It also simplifies things: you don't have to figure out how to architect the thing to run across 10 hosts anymore. Good multi-core systems deliver the performance these apps need if you can get the disk IO solved. A good SAN with multipath support and multiple HBAs can get there.

      Niagara failed because each individual core was too slow: a comparable-cost Intel CPU could do two jobs in serial on one core in less time than Niagara could do one job on one core. The question here is, for most parallel workloads like a database where all cores will be used, are AMD's 16-core chips at least 62% the speed of Intel's 10-core chips on a core vs. core basis? If so, other things being equal, for *some* workloads these Opterons will be better.
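      The 62% figure is just the core-count ratio. As a quick sketch (the function name is made up, and it assumes the workload scales perfectly across all cores, which real databases rarely do):

```python
def breakeven_per_core_ratio(many_cores, few_cores):
    """Per-core speed the many-core chip needs, relative to the
    few-core chip, to match its total throughput.

    Assumes perfect scaling across all cores on both chips.
    """
    return few_cores / many_cores


ratio = breakeven_per_core_ratio(many_cores=16, few_cores=10)
print(f"{ratio:.1%}")  # per-core speed needed by the 16-core part
```

      Anything above that ratio and the 16-core part wins on aggregate throughput under this idealized assumption; anything below and the 10-core part does.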

       

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    5. Re:Poor performance by the+linux+geek · · Score: 1

      The whole point of the Niagara was to provide zero-impact context switches and to effectively hide latency, and to get close to 100% utilization as a result. On embarrassingly parallel workloads (web serving was the one Sun hyped hardest) a 32-thread T1 or 64-thread T2 did quite a bit better than its Intel contemporaries. The problem is that a lot of workloads expect to be able to consistently issue more than 200 million instructions per second per thread to do well, even when you are hiding latency by doing fast thread switches. Databases, a workload you cite as parallelizing well, tended to run like crap on Niagara.

      With Bulldozer, you effectively have 16 2-issue cores, since each module has a shared 4-instruction decoder. I'm skeptical that it will perform all that much better on multithreaded integer workloads than the T3 did, which had 16 1-issue cores with aggressive multithreading and latency hiding on top. On the other hand, Westmere-EX is 10-core, 4-issue, 2-way SMT-capable per core, and has big caches with good latency numbers. On both a technical level and based on early benchmarks (SPECcpu), things don't look good for Interlagos.

  7. Wish List by Nom+du+Keyboard · · Score: 3, Informative

    I so much want some real competition for Intel. Competition that doesn't artificially limit clock speeds and fuse off perfectly good working features in order to market a dozen overlapping and conflicting SKUs at a dozen different price points. And working drivers, current standards (DirectX 11 and OpenCL for starters), and USB-3 that doesn't require a $50 cable between every device would be nice.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:Wish List by Anonymous Coward · · Score: 0

      The real competition is from ARM. The match has not really started yet...

  8. Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 1

    Intel: "Let's improve the memory controller's bandwidth, increase our IPC and also improve our platform by adding more PCIe lanes to the chipset that enthusiasts will find a use for"
    AMD: "MOAR COARS!!!111!one"

    1. Re:Intel vs AMD's philosophy as of late by level_headed_midwest · · Score: 4, Interesting

      Eh, how about this:

      Intel: I know, let's try to see just how many features/cores/cache we can fuse off in our dies and different socket combinations to try to make *puts pinky finger to mouth* one MILLION SKUs! Oh, and while we're at it, let's add a FOURTH memory channel, because more is better! Sure, we could get all the bandwidth we need with two DDR3-1866 or -2133 channels and that you really only get about three channels' worth of bandwidth because we have to clock the IMC down to DDR3-1333 with two modules per channel- but we still have FOUR channels! Oh, and we forgot, it's the start of a new quarter so we need to release a new socket. Can't let those socket suppliers get lazy making last quarter's socket design. What, you guys want us to release Sandy Bridge-based Xeon MPs because MP platforms actually need that much bandwidth and core count? We just released the Westmere-based ones a few months ago! Don'tcha know that Xeon MPs run two years behind everything else? Geez, what did you do, wake up yesterday? Next you'll want us to stop crippling our chips, stop using a new socket every other month or something ridiculous like that. Where do you guys get those ideas?

      AMD: Based on market analysis, most server applications use primarily integer code and require a lot of bandwidth, memory capacity, and a high core count. We don't have over a hundred billion dollars in market cap to fund several parallel R&D teams to design a specific CPU for every edge use case, so we will design a CPU that is highly modular, has good integer performance (because that's what our research indicated most server apps are), and has a lot of cores. Experience with Intel's HyperThreading is less than stellar with regards to predictable performance, so we will use our CMT approach that leads to better integer performance than HyperThreading but doesn't increase the die size by a huge amount, since we can't afford to make 400-600 mm^2 dies like Intel does to have a lot of physical cores. Oh, and we'll continue to use the existing server platforms out there so our customers can drop-in upgrade and we'll also not change any feature sets in the SKU stack other than the clock speed and number of enabled modules and their associated caches. We do apologize for being "late" with these parts since we usually release server and client at about the same time...

      --
      Just "gittin-r-done," day after day.
    2. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 2

      AMD already had the on die memory controller. Their answer to intel's Hyperthreading was real cores. The QPI bus that intel uses is very similar to the one AMD pioneered with Hypertransport. Let's not forget that AMD64 (oh, did you want me to call it EM64T or x86_64?) was a product of AMD's engineering effort rather than forcing people toward the EPIC architecture which seems to be niche based.

    3. Re:Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 0

      I remember in the K6/K6-2 days how AMD was harping on about FPU performance.

      Some of us bought into the FPU thing, only to discover a P-II with faster integer performance was better, as most games used integer-based fixed-point math, and everything else later on was handed off to Glide.

      Mind you, the K6 series chips were much cheaper than Intel-based solutions, so we did get good 'bang for our buck'.

      It's kinda funny that they're on to the integer performance nowadays, whereas most games are floating point based all the way through. Oh well, at least the back-end people will be happy..

    4. Re:Intel vs AMD's philosophy as of late by CajunArson · · Score: 1

      2005 called; they want their list of bragging rights back. Oh, and HyperTransport is mostly technology that AMD bought from Digital along with parts of the Alpha team. They get some credit for bringing their version to market first, but it's hardly like they came up with the idea for HyperTransport out of thin air. As for x86-64, AMD brought it to market first, but Intel had internal builds of 64-bit enabled x86 chips around for some time, which is why they could bolt it onto the P4 and not require a brand new microarchitecture.

      Of course, what people forget most about x86-64 is that Microsoft was the big proponent of a 64 bit x86 chip since it meant they could start moving into larger scale data server applications. The irony is that EPIC, for all the hate it gets in this website, is dominated by Linux and other UNIX derivatives where MS is almost completely absent.

      --
      AntiFA: An abbreviation for Anti First Amendment.
    5. Re:Intel vs AMD's philosophy as of late by CajunArson · · Score: 1

      Oh.. and integrated memory controller:
      1. The 486 had one too.
      2. Look at Bulldozer's atrocious memory performance: There's a difference between slapping any memory controller on-die and slapping a *good* memory controller on-die. Intel has the good one.

      --
      AntiFA: An abbreviation for Anti First Amendment.
    6. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      Which is why they fought tooth and nail and then implemented a poorer version of it (original EM64T had issues with pointers) after the fact. Somebody forgot their history.

      The problems with EPIC are the poor performance for certain applications as well as limited jumps in performance compared to the leaps that x86 gets, despite being a problematic architecture.

    7. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      Intel's memory controller has issues too. The configuration of servers with high amounts of RAM becomes needlessly complicated when dealing with intel, so as to prevent the performance dropping when stepping from 1333 to 1066 to 800MHz in certain configurations. The new Interlagos chips run at 1600. Call me when you have actual benchmarks of Interlagos, and finally decide that explicitly parallel jobs benefit more from actual cores than from hyperthreading.

    8. Re:Intel vs AMD's philosophy as of late by CajunArson · · Score: 1

      --> Call me when you have actual benchmarks of Interlagos,
      You don't have them either. What's amusing is that I'm using known data from 1/2 of an interlagos chip (Bulldozer) at much higher clockspeeds than what Interlagos will operate at to make my assumptions. There's plenty of data from just the 6 core 3960 and 3930 chips that came out today that indicate that even desktop Bulldozer x 2 with theoretically perfect scaling won't beat the upcoming Xeons. You ain't gonna get perfect scaling and you ain't gonna get desktop Bulldozer clock speeds. You're just hoping that reading John Fruehe's blog will result in a miracle.

      --> and finally decide that explicitly parallel jobs benefit more from actual cores than from hyperthreading.

      Nobody has ever argued that hyperthreading is better than *real* cores. AMD's problem is that they didn't really introduce *real* cores but this bizarro quasi-core setup. In practice it looks like AMD's solution is about on par with hyperthreading, so you can insult Intel all you want and scream MOAR COARSS just like AMD told you to, but that won't make Interlagos magically destroy Intel.

      --
      AntiFA: An abbreviation for Anti First Amendment.
    9. Re:Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 0

      How about this: the quad channel controller allows 8 dimms. I don't think it's about bandwidth.
      At work the static FEA wants as much ram as possible but only 1 fast core of CPU. The RAM is for caching disk IO and while 16 GB is good, I'd prefer 32 GB (or 64 GB for 6 times the price).

    10. Re:Intel vs AMD's philosophy as of late by yuhong · · Score: 1

      Don't forget Intel pricing Xeon MP at thousands of dollars per CPU while AMD brags about the lack of this "4P tax".

    11. Re:Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 0

      Which is why they fought tooth and nail and then implemented a poorer version of it (original EM64T had issues with pointers) after the fact. Somebody forgot their history.

      Somebody never knew their history in the first place. Problems with pointers? EM64T never had problems with pointers; saying so shows that you're a poser who has no idea what he's talking about.

      AMD64 and EM64T are virtually identical in user mode (or they wouldn't be compatible). That includes every instruction which might be used to make a pointer calculation or otherwise generate a memory address. If these things did not work identically there would be no hope of ever shipping a single 64-bit program which worked on both AMD64 and EM64T.

      The main differences between the two are in privileged mode instructions, and thus only affect operating systems. (and not very much, either.)

    12. Re:Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 0

      Intel's memory controller has issues too. The configuration of servers with high amounts of RAM becomes needlessly complicated when dealing with intel as to prevent the performance dropping when stepping from 1333 to 1066 to 800MHz in certain configurations. The new Interlagos chips run at 1600.

      Oh good lord, you're a mindless AMD fanboy zombie.

      The "certain configurations" you're referring to are when you install multiple DIMMs in a single memory channel. Intel's "issue" here is cold hard reality for everyone: each DIMM adds capacitive load and stubs to the memory channel, and you can't run busses with lots of stubs and capacitance at super high clock rates. More load and more sources of signal reflections equals lower clock rate. (Unless you like unreliability!)

      AMD has not been granted any exceptions to the laws of physics. Just look at the user's manual for any Socket G34 (Magny-Cours/Interlagos) motherboard, you'll find there are also complex rules there about how fast the memory can run based on how much you're trying to install, as well as memory type. Registered DIMMs reduce load compared to unbuffered DIMMs (UDIMMs), which increases clock rate when installing lots of memory, but really large RDIMMs are quad rank, which adds some load back and can drop speed back down again. Reduced voltage DDR3 (used to reduce power) also reduces memory clocks. So on and so forth.

      I guarantee you an Interlagos won't be able to run DDR3 at 1600 with 3 UDIMMs hanging off each memory controller channel. For that matter, it probably won't even make that speed with registered DIMMs, 3 per channel is a lot. Once again, this is physics.

    13. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      The point was "There are no benchmarks", so hold off judgement until we see how it handles work. AMD did introduce real cores; what they do have, however, is a poorly implemented shared FPU design. It is like they are trying to shoe-horn server chips at people. FPU matters a lot less to most server tasks.

    14. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      While not an issue right NOW, the original EM64T did have an issue with pointers. The issue was that there was no hardware IOMMU for them. Thus, in order to DMA memory above 32bit allocation, they had to use pointers.

      http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/3/html/Release_Notes/as-amd64/RELEASE-NOTES-U2-x86_64-en.html

      Software IOTLB — Intel® EM64T does not support an IOMMU in hardware while AMD64 processors do. This means that physical addresses above 4GB (32 bits) cannot reliably be the source or destination of DMA operations. Therefore, the Red Hat Enterprise Linux 3 Update 2 kernel "bounces" all DMA operations to or from physical addresses above 4GB to buffers that the kernel pre-allocated below 4GB at boot time. This is likely to result in lower performance for IO-intensive workloads for Intel® EM64T as compared to AMD64 processors

    15. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      Actually I am not an AMD Fanboy. I currently only own 1 AMD machine, an older Dell that I bought back in '07 that acts as a fileserver. My 2 laptops, and desktop are all running intel CPUs. At work, we use intel almost exclusively (I'm a systems engineer). At my previous company, we did switch to AMD, at my recommendation, because they needed the actual threads and HT would have cost them about 30% performance.

      My issue with intel was not the fact that there are electrical signaling issues when trying to install a lot of memory, but rather that intel's design required a multiple of 3 instead of 4 which often caused a dance of the slots. I have spoken to AMD and Intel engineers. AMD's new offering will run at 1600 until you fill past a major point, then step down to 1333. Intel does not make it as simple, and when dealing with field service it is easier to have them work on an AMD machine.

    16. Re:Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 0

      Actually I am not an AMD Fanboy. I currently only own 1 AMD machine, an older Dell that I bought back in '07 that acts as a fileserver. My 2 laptops, and desktop are all running intel CPUs. At work, we use intel almost exclusively (I'm a systems engineer). At my previous company, we did switch to AMD, at my recommendation, because they needed the actual threads and HT would have cost them about 30% performance.

      What the hell? The only time Intel HT ever cost significant performance was the original Pentium IV HT, and it could be turned off. Did you lose that job when your boss noticed that you invent crazy reasons to do things?

      My issue with intel was not the fact that there are electrical signaling issues when trying to install a lot of memory, but rather that intel's design required a multiple of 3 instead of 4 which often caused a dance of the slots.

      Please, stop equivocating. Your issue was:

      The configuration of servers with high amounts of RAM becomes needlessly complicated when dealing with intel as to prevent the performance dropping when stepping from 1333 to 1066 to 800MHz in certain configurations.

      Not a single word in there about Intel having 3 channels instead of 4, plenty of them about memory clock speeds dropping (which is what happens when you put 2 or 3 DIMMs in a single channel).

      As for correct population of three DDR3 memory controller channels being too hard, whatever kind of "systems engineer" you claim to be, you aren't much of one if you can't handle multiples of 3 instead of multiples of 2 or 4. I can't believe you're actually whining about how haarrrrrrdd that is.

      I have spoken to AMD and Intel engineers.

      If you did, your biases and inability to grasp simple concepts such as "three" caused you to misunderstand practically everything they said.

      AMD's new offering will run at 1600 until you fill past a major point, then step down to 1333.

      Bull. I actually looked up the docs for a Tyan Socket G34 motherboard (that'd be the socket Interlagos goes in, FYI, doubt you knew that) and it is nowhere near as simple as that.

      The fact that you think AMD is in total control of this and Intel isn't speaks volumes: it means you have no idea what you're talking about. AMD can't guarantee "fill to a major point, then 1333" since signal integrity is not just about the CPU, it's also very dependent on motherboard design & manufacturing quality, and even DIMM PCB design & quality. (That last is why systems which need to be reliable are almost never bought with generic memory, instead they do real qualification testing in a lab on the specific DIMM designs intended to be shipped with the system and they require a re-qual if the supplier changes, if they do a change to the DIMM PCB or memory chip supplier, etc etc.)

      Intel does not make it as simple, and when dealing with field service it is easier to have them work on an AMD machine.

      Total nonsense.

    17. Re:Intel vs AMD's philosophy as of late by Anonymous Coward · · Score: 0

      While not an issue right NOW, the original EM64T did have an issue with pointers. The issue was that there was no hardware IOMMU for them. Thus, in order to DMA memory above 32bit allocation, they had to use pointers.

      /forehead

      You are terminally confused about basic concepts. Bounce buffers are not pointers! And everyone "has to" use pointers, for everything, all the time!

      A MMU translates virtual addresses used by programs to physical memory addresses. An IOMMU is the same, except instead of sitting between the CPU and memory, it sits between bus-master IO devices (such as a PCI network card with DMA support) and memory.

      http://en.wikipedia.org/wiki/IOMMU

      IOMMUs are not inherently a processor feature. They want to sit between IO devices and memory, and in system architectures where the memory interface isn't part of the CPU, that means an IOMMU can live wherever that memory interface happens to be. In fact, countless 32-bit AMD and Intel x86 machines equipped with IOMMUs were built and sold long before AMD64 processors hit the market: as the Wikipedia article says, the AGP GART (Graphics Address Remapping Table) was in fact an IOMMU. (The GART wasn't a fully general purpose IOMMU, in particular because it only translated addresses for the video card, but it was a real IOMMU.)

      What the RHEL release note is talking about is not your "issue with pointers". Rather, it was talking about a common limitation in bus mastering IO devices of the time -- most PCI cards and so forth only generated 32-bit addresses, and thus could not target physical memory addresses above 4GB for DMA. If an AMD64/EM64T system had an IOMMU, the kernel could work around this by programming the IOMMU with translations which mapped 64-bit memory addresses into the 32-bit space the PCI device understood.

      AMD chose to integrate the north bridge into every AMD64 processor, and made an IOMMU a standard feature of that integrated north bridge. Intel was still using discrete north bridges, not integrated, and chose to leave IOMMUs out of every NB except expensive server chipsets. Thus, it was possible (and common) to get an EM64T system with no IOMMU. All the RHEL release note is saying is that because they couldn't depend on an IOMMU on 64-bit Intel machines, they were punting for a while and just letting the kernel use the traditional solution, a bounce buffer. A bounce buffer is simply an I/O buffer deliberately allocated somewhere in the low 4GB of physical memory to make sure a 32-bit PCI device can target it.

      That was not an "issue with pointers". There were no issues with pointers. None. EM64T handled them fine. Once again, if it didn't, it never would have worked at all.
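      The decision described above (direct DMA, IOMMU remapping, or a bounce buffer) can be sketched as a toy model. This is only an illustration of the logic, not kernel code; the function name `plan_dma` and its parameters are made up for the example.

```python
# Toy model only: how a transfer to a physical address might be handled.
# plan_dma and its parameters are hypothetical names, not a real kernel API.

FOUR_GB = 1 << 32  # first physical address a 32-bit bus master cannot reach


def plan_dma(phys_addr, device_addr_bits=32, has_iommu=False):
    """Return 'direct', 'iommu-remap' or 'bounce-buffer' for one transfer."""
    device_limit = 1 << device_addr_bits
    if phys_addr < device_limit:
        # The device can generate this address itself.
        return "direct"
    if has_iommu:
        # The IOMMU maps the high address into the device's reachable range.
        return "iommu-remap"
    # No IOMMU: copy through a buffer allocated below the device limit.
    return "bounce-buffer"


if __name__ == "__main__":
    high = FOUR_GB + 0x1000
    print(plan_dma(0x1000))                # low memory
    print(plan_dma(high, has_iommu=True))  # high memory, IOMMU present
    print(plan_dma(high))                  # high memory, no IOMMU
```

      In this model the extra cost of the last case is the copy between the bounce buffer and the real destination, which is the IO performance hit the RHEL note warns about.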

    18. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      Ok, I tried to be somewhat cordial, but perhaps you need the proverbial smack in the mouth.

      HT is a performance sync in a lot of applications. Maybe you work in a small shop and do stupid things, but I do not. If the word enterprise is a foreign one, perhaps you should pay attention. Oracle recommends turning off HT on databases, due to performance problems. Solid NTP is near impossible with HT without locking NTP down with processor affinity. Also, changing clock speeds with any form of turbo or reduced clock speed can impact your time resolution. When you need to be within about 300 microseconds (that's right, not milli, micro) between machines, that is a Big F'n Deal! If machines are not all performing the same, this can cause an SEC investigation if the results are shown to favor someone in the market more than others. Oh, you didn't work for an exchange? I did.

      My point was that HT is not necessarily the performance boost that it is marketed as. It SHOULD be turned off in numerous cases and still causes issues when there is contention for resources. That can cause a context switch, which in turn is usually a 50-150 microsecond penalty. The reason AMD was chosen was that the application needed more threads than could be serviced by a similar Intel setup without HT (it would have needed 4 sockets rather than 2). HT was a non-starter.

      Also, the reason that a 3 channel memory configuration was a pain in the ass was not because I, as a systems engineer couldn't figure it out, but again, as enterprise seems to be a difficult concept for you, imagine having to do a field upgrade or change, where all machines have to be identical, but 5 different field service techs are dispatched, all with the equivalent of a GED and rudimentary grasp of English or technology. I did not sit in the data center. Do you know how lights-out data centers operate? The machines had to be serviced around the world. The configurations being simpler on the AMD side made it more likely that the field service guys would not fuck it up.

      Lastly, I give a shit about your Tyan motherboard (Oh, and I did know about the socket name, thanks very much. I was there for NDA screenings of AMD's roadmap at my previous employer's premises). I am talking about vendors like HP, Dell and IBM who have world support. Tell me when a technician from Tyan is going to drop into a DC in Switzerland to do a rip and replace, then I might care.

      So, in conclusion, your working on a few racks of machines is irrelevant. When you have to design an infrastructure around how an application behaves in a complex interconnected way, on different continents as well as taking into account the serviceability of the machines, latency, throughput, and overall performance characteristics to the point where the machines cannot be more than X feet apart due to preferential lengths of network cable you and I can talk more.

    19. Re:Intel vs AMD's philosophy as of late by Anarke_Incarnate · · Score: 1

      Sink, not sync

  9. sandy bridge ep 95W by Chirs · · Score: 3, Informative

    There will be server versions as well...I've seen specs (publicly available) for an 8-core (16-thread) sandy bridge EP with a 95W TDP. I suspect it's clocked a bit lower and maybe binned for efficiency.

  10. Can you imagine a beowulf cluster of these? by Anonymous Coward · · Score: 0

    Sorry, had to try and bring it back...

  11. Compared to Intel by Anonymous Coward · · Score: 0

    We have a Sandy Box here at the office and we've benchmarked it and got some pretty impressive results. We have an NDA, so all I can say is that the performance is game-changing. So excited...

    University Employee
    Research Computing

  12. yes by unity100 · · Score: 2

    that must be why 3 supercomputers with dozer opterons have been ordered in the past 3 weeks.

    1. Re:yes by cheekyboy · · Score: 1

      AMD still has a 5% server market share.

      3 orders might push it to 6%

      --
      Liberty freedom are no1, not dicks in suits.
    2. Re:yes by unity100 · · Score: 1

      these are supercomputers. not servers. but you are right, dozer will push server share up, a lot.

  13. fool. by unity100 · · Score: 1

    they are like 3/4 cores. neither 1 core, nor half core.

    1. Re:fool. by beelsebob · · Score: 2

      The problem is, while this is true, Bulldozer also suffers from being a fairly crappy arch design compared to Sandy Bridge. The result is that AMD's 8 "core" Bulldozer is only roughly as fast as Intel's 4 core i5 without hyperthreading. Extrapolate this to bolting two 8 "core" Bulldozers together and you get to... well, that would only be about as fast as an 8 core Sandy Bridge with no hyperthreading, or a 6 core with hyperthreading. Given that Intel is selling 6 core E5 Xeons with hyperthreading for less than the $1000 AMD is asking for this, that really doesn't bode well, does it? And that is before considering that this Bulldozer is very underclocked to keep power consumption down. This really doesn't look promising for AMD.

    2. Re:fool. by Afell001 · · Score: 1

      1) Sandy Bridge is on its second generation. It inherits from the long line of progression from the Core legacy and has done very well considering the amount of money that Intel has pumped into developing these processors. To say that these chips are very mature would be an understatement.

      2) AMD has invested in BD development a fraction of the R&D expense that Intel has sunk into developing the SB/Core architecture. On top of that, BD is in its infancy and is exploring new paths to try and gain efficiencies. I think BD developers need to be proud of their accomplishment, even if it doesn't quite match up clock-for-clock against SB. As the design for these processors matures, and AMD releases a few more steppings, we will probably see improvements in power usage and performance.

      3) As this was a new model, none of the OS kernels out today use these processors in the most optimal way. As the architecture matures, I'm sure that the OS developers will redevelop thread initiation and assignment to make better use of BD's assets. This in itself will net better performance even without improvements in the overall design.

      You might think I am just rooting for the underdog, but as a consumer, so should you. Without AMD to keep Intel on its toes in the x86 market, we will eventually see new chips from Intel that are nothing more than speed bumps, but at prices that will make it difficult for anyone to afford. Intel still prices competitively where AMD has an alternative product, but look at where AMD has no kit to compete. Intel will price there accordingly, because they can. No competition means that the price will float as high as demand allows.

      I try to alternate my personal machines. One year, I will buy AMD, while the next, I will buy Intel. For one machine, I may buy NVidia graphic cards, while another will use AMD. The home media server in the closet is due for an upgrade. I went with an Intel Xeon build three years ago. This time, I will build it with BD Opterons. Do you think anyone besides me will notice the difference, unless I told them? Probably not.

  14. When are multiple cores going to help me? by craftycoder · · Score: 4, Interesting

    I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling. As a result this expensive desktop is only 15% faster in terms of time to compile than the 4 year old PC it replaced (the new PC has twice the ram as the old though which may account for some of that speed increase). I am seriously unimpressed with all these cores. Maybe they are useful for something, but I've not found anything that I do that shows significant improvement. Putting my development projects on a SSD did much more for my work flow performance than this fancy new computer, that is for certain.

    1. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 4, Informative

      You're doing it wrong.

      make -j8

    2. Re:When are multiple cores going to help me? by fyngyrz · · Score: 2

      Try doing DSLR image editing with Lightroom or Aperture. Those cores make one hell of a difference.

      --
      I've fallen off your lawn, and I can't get up.
    3. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      GNU Make takes the argument -j for the number of jobs to run. What language and what platform are you using?

    4. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      You should try getting a better compiler. Most modern compilers will take full advantage of an 8 core system by using certain switches. For example, when building with gcc through make, you can tell make to run 8 compile jobs at once with the -j 8 switch.

    5. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 1

      Consider taking advantage of "make -j" if you use that tool.

    6. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      Is that compiling a single source file, or multiple files? With GNU Make, at least, "make -j N" will run N separate processes for compiling separate files.

    7. Re:When are multiple cores going to help me? by onefriedrice · · Score: 1

      I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling.

      If your compiler isn't threaded, then at least run multiple compile jobs simultaneously--this is probably better anyway. If your build system can't do this, your tools are broken.

      --
      This author takes full ownership and responsibility for the unpopular opinions outlined above.
    8. Re:When are multiple cores going to help me? by JohnnyBGod · · Score: 2

      And if it's too much for one machine, use distcc.

    9. Re:When are multiple cores going to help me? by Renegrade · · Score: 1

      Yeah, they do in image editing.

      However, there will always be things that must be done in series, and always a maximum speed-up you can get from multiprocessing. (Amdahl's Law comes to mind) Plus, you'll often hit other bottlenecks, especially if you have an obscene number of cores. Memory, disk, video, network..
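      For reference, Amdahl's Law puts a number on that maximum speed-up. The sketch below assumes a workload where a fixed fraction can be spread perfectly across workers; the 90% figure in the usage lines is an illustrative assumption, not a measurement.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speed-up when only parallel_fraction of the work can be split."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed example: 90% of the job parallelizes.
    for cores in (1, 2, 4, 8, 16):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

      With 90% parallel work, 8 cores give about a 4.7x speed-up, and no number of cores can exceed 10x, because the serial 10% always remains.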

      Memory has always been a problem after the 6502 era. Even single core systems splat into the performance barrier that is main memory.

      I'd rather have a single-core system that's 8x faster than an 8-core system. However, it's my belief that we're seeing crazy core increases not because it's the best way to better performance, but rather because the CPU makers are hitting walls (or at least massive difficulties) with traditional speed increases (MHz/IPC/branch prediction accuracy/etc).

      Intel engineer: Our new architecture, with the die shrink, is about five percent faster...
      Intel manager: How are we going to sell that to people for $300-1200??
      Intel engineer: Well, we COULD put two/four/six/eight of them into a single die, as they're much smaller than before..
      Intel manager: Do it!!
      Intel engineer: Sir, it would cost us thirty or forty percent more due to--
      Intel manager: Nobody's going to buy a 5% increase without this! DO IT NOW!!

    10. Re:When are multiple cores going to help me? by friedmud · · Score: 3, Informative

      What do you mean by "only one of my compilers actually takes advantage of the multiple cores when it is compiling"?

      Are you on Windows? Because any compiling done in linux with a "make" based (or similar) build system can use as many cores as you can throw in a machine (regardless of the actual compiler it's running). It should be the same in Windows...

      Don't look to your compiler to be multithreaded... look at the build system (i.e. in Visual Studio there should be an option somewhere to tell it how many processors to use while compiling). For make you just do "make -j8" to use 8 "jobs" total for compiling (i.e. 8 instances of the compiler will be running).

      Here is a test for one of my software projects doing "make -j#" where # is 1,4,8,12,16,24:

      1 : 15m9.614s
      4 : 3m57.947s
      8 : 2m6.354s
      12 : 1m33.426s
      16 : 1m25.559s
      24 : 1m17.345s

      That is on my dual 6-core hyperthreaded Mac workstation (so it had 12 "real" cores and 12 "hyperthreads"). You can see that hyperthreads definitely aren't as good as real cores... but do provide some speedup. That said, I thank God every time I compile (which is all day long) for the cores he has bestowed upon me...
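      To put numbers on that, here is a small sketch that turns the timings above into speed-up and per-job efficiency. The timings are copied from the list; the helper names are made up for the example.

```python
# Wall-clock times from the "make -j#" runs listed above.
TIMINGS = {
    1: "15m9.614s",
    4: "3m57.947s",
    8: "2m6.354s",
    12: "1m33.426s",
    16: "1m25.559s",
    24: "1m17.345s",
}


def to_seconds(stamp):
    """Convert a 'XmY.ZZZs' string, as printed by `time`, to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def scaling_table(timings):
    """Return (jobs, speedup, efficiency) tuples relative to the -j1 run."""
    base = to_seconds(timings[1])
    rows = []
    for jobs in sorted(timings):
        speedup = base / to_seconds(timings[jobs])
        rows.append((jobs, speedup, speedup / jobs))
    return rows


if __name__ == "__main__":
    for jobs, speedup, efficiency in scaling_table(TIMINGS):
        print(f"-j{jobs:<2} speedup {speedup:5.2f}x  efficiency {efficiency:4.0%}")
```

      Scaling stays close to linear through 8 jobs (about 7.2x), then flattens: going from 12 to 24 jobs, which is where the hyperthreads come in, only moves the speed-up from roughly 9.7x to roughly 11.8x.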

      Good to hear that you are already on SSD... because parallel compiling does need speedy disk to keep the processors humming. The timings above are for two 256GB SSDs in RAID0.

    11. Re:When are multiple cores going to help me? by craftycoder · · Score: 1

      Neither my Java nor Android compilers do a good job of taking advantage of the multiple cores from within Eclipse. I am able to get significant improvement when compiling GWT projects by giving it a 6 core directive. I save about 40% of the time it used to take. Plain old Java and Android showed little improvement though.

    12. Re:When are multiple cores going to help me? by craftycoder · · Score: 1

      I mostly work with Eclipse doing Java, Android, and GWT. Only GWT offered an effective way to use those cores. It is VERY possible that I just don't know how to use Eclipse to the best of its ability, but I can tell you that Eclipse never pushes more than one core during a build except when its building GWT projects for me (I had to tell it explicitly to do that though).

    13. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      Try make -j 8 foo. That should speed up large builds, especially if you have the cache and disk throughput for it.

    14. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      The compiler doesn't really need to be parallel unless you're compiling a lot of really, really big files. Otherwise, just doing

      make -j 9

      would start 9 instances of the compiler in parallel, each of them compiling one file. For most projects this speeds up compilation a lot. On my dual-core machine at work, issuing make with 3-4 jobs is much faster than with one. If your project happens to all be in one or two source files though, it won't help much since the individual files aren't compiled in parallel.
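      The idea behind "make -j N" can be sketched as a bounded pool of independent jobs. This is a simplified illustration, not how make is implemented: real make also orders jobs by dependencies. The function name `run_jobs` is made up, and the commands in the usage lines are stand-ins for compiler invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, jobs):
    """Run each command as its own process, at most `jobs` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(command):
        return subprocess.run(command).returncode

    # Each worker thread just waits on one child process, so the real
    # parallelism comes from the child processes themselves.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-ins for "compile one source file" commands.
    fake_compiles = [[sys.executable, "-c", "pass"] for _ in range(6)]
    print(run_jobs(fake_compiles, jobs=3))
```

      With one or two huge source files the pool has almost nothing to hand out, which is why extra jobs don't help in that case.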

    15. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      You realize that there is "make -j ", if you're running make, or a project option, if you're running VC++. The third possibility, is that you're just faking for ratings, or you're just a plain fucking moron.

    16. Re:When are multiple cores going to help me? by lexman098 · · Score: 1

      To everyone who's replying about compiler switches to activate multithreading: it's only relevant in this instance (maybe). I can totally sympathize with the disappointment in this machine, as I too have recently begun using this exact computer and have realized zero gains in having 5 extra cores. My work day involves running vim and about 50 verilog logic simulations, which are single threaded, and having this supercomputer under my desk (which I connect to remotely anyway) is absolutely useless. I wish they would give me a crappier one and a $4000 bonus instead.

    17. Re:When are multiple cores going to help me? by Bitmanhome · · Score: 1

      Yeah, you're screwed, sorry. Eclipse integrates nicely with Ant, but Ant doesn't do multi-core builds either. And Ant tasks are very heavy, so parallelizing them wouldn't help much anyway. You might try rebuilding your build process in plain 'make' and try that -j option.

      Also, I'm sorry you have to use GWT. That thing was just absurdly slow last time I used it, to the point that it would be faster to hand-code JavaScript.

      --
      Not that this wasn't entirely predictable.
    18. Re:When are multiple cores going to help me? by cbhacking · · Score: 1

      Well, you could consider using a better compiler, or a better configuration for it. Many parts of compilation parallelize reasonably well, especially if you have a lot of source files. Some things will have dependencies on other parts (which limits parallelism) and some have dependencies on the entire previous stage (which severely limits or prevents it, for that stage).

      Besides, unless you're just building a pure build machine (and I doubt it, if your compilation setup is so bad), multiple cores can help a lot in other places too. Things like background syntax checking and storing symbol information can be done in parallel with your workload. Looking up stuff online, or streaming music or even video, can be done without impacting the performance of your dev tools. Many web browsers themselves will get much faster (even Firefox to some degree, since it's multi-threaded even though it still uses a single process). There's plenty of places for heavy workloads to be spread across cores.

      Granted, 16 cores is more workloads than I almost ever have, but software is also becoming increasingly parallel. My build system defaults to splitting the workload across 4 cores, some of my games can use 5 or 6 pretty well, and my computer can remain responsive for doing other things too.

      --
      There's no place I could be, since I've found Serenity...
    19. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      I wish Adobe would get around to multithreading Acrobat's OCR function. It is an embarrassingly parallel problem, people...

    20. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      Video encoding, BOINC, and that's about it.

      I was dreading taking home a 600-page document to work on, but my baby Macbook Pro is easily a match for the dual-Xeon 2008 Mac Pro beastie at work. The little dual-core Core i5 ran InDesign *at least* as well as the eight-core Mac Pro rig. Of course, the lappy has more/faster RAM, and better graphics, which can't hurt, I suppose.

      It took a decade for 64-bit to take hold, I expect it will be a few years yet before Adobe can find uses for more than a couple of cores.

    21. Re:When are multiple cores going to help me? by craftycoder · · Score: 1

      That's what I thought. I research it every couple months when I get annoyed by multi minute builds. I never get any answers.

      GWT is slow and deployment is a little cumbersome, but the code is so elegant I just don't care. I love GWT. I wish Google provided more libraries, but I'm pleased with it. I'm not certain it has a future though.

      I loathe Ruby. What's a fella to do if he wants a strongly typed object oriented website?

    22. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      make -j 8

    23. Re:When are multiple cores going to help me? by lexman098 · · Score: 1

      Scratch that. $1000 bonus.

    24. Re:When are multiple cores going to help me? by evilviper · · Score: 1

      only one of my compilers actually takes advantage of the multiple cores when it is compiling.

      Send your octo-core my way, I'll see that it gets some use...

      For any RPM based Linux distro, just edit your RPM macros file to add eg. -j8 option to make, and every "rpmbuild" will max-out all 8 cores with 8 instances of gcc operating on different files each.

      And if you're lzma compressing the RPMs in question, and they're a non-trivial size, you can get a pretty good speed-up using either parallel-xz or p7zip across multiple cores. If you're packing-up large quantities of data in RPMs, or just using xz in general, we're talking a serious number of wall-clock hours savings.

      For video encoding, while you lose a bit of quality with threading (so I discourage it on mere dual-core systems) you can see a pretty impressive speed-boost. And for video-decoding, multithreading is a no-brainer.

      In conclusion, you bought an SUV before measuring if it had enough cargo room to haul your toys, and lost out. Those who need to haul different cargo find it a great solution. There will always be some use cases which don't benefit.

      The real benefit is servers, though. I can't remember the last time you could get as big a performance boost on your server from upgrading the CPU as you can today, going from dual-core to 16 core, without needing a new mobo due to socket changes. And if you're lucky, your server can take 2 or 4 of them...

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    25. Re:When are multiple cores going to help me? by Bitmanhome · · Score: 1

      What's a fella to do if he wants a strongly typed object oriented website?

      I think JBoss is the usual answer to that. That only takes care of the back end, but GWT has your front end covered anyway, and the more code you can move into JBoss, the less you have to crank through GWT's slow processor.

      --
      Not that this wasn't entirely predictable.
    26. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      Dude, split up your codebase into separately compilable libraries or components. No sense re-building it all EVERY time you want to test. This is a non-issue.

    27. Re:When are multiple cores going to help me? by cthulhu11 · · Score: 1

      I was about to post the same thing -- Lightroom can profitably use as many as 8 cores. More cores vs fewer faster cores depends a lot on the workload.

    28. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      more like -j11 at least, the main thread won't be able to feed the lower threads fast enough to keep all cores busy otherwise

    29. Re:When are multiple cores going to help me? by Anonymous Coward · · Score: 0

      With make, use -j8 (or -j 8) to run 8 jobs in parallel when building large projects; your compile times will be reduced.
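
      A small sketch of picking the job count from the machine instead of hard-coding it. The helper name make_command and its extra_jobs parameter are hypothetical; extra_jobs is only there to allow the "a few more jobs than cores" approach suggested elsewhere in this thread.

```python
import os


def make_command(target: str = "all", extra_jobs: int = 0) -> list[str]:
    """Build a `make -jN` argument list sized to this machine."""
    # os.cpu_count() can return None, so fall back to a single job.
    cores = os.cpu_count() or 1
    jobs = max(1, cores + extra_jobs)
    return ["make", f"-j{jobs}", target]


if __name__ == "__main__":
    # Print the command rather than run it; pass the list to subprocess.run()
    # from a directory that actually has a Makefile.
    print(" ".join(make_command()))
    print(" ".join(make_command(extra_jobs=3)))
```

      Whether oversubscribing the cores helps depends on how much time each compile spends waiting on disk, so it is worth timing both settings on your own project.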

    30. Re:When are multiple cores going to help me? by craftycoder · · Score: 1

      GWT is only the front end. I use Glassfish for the back end of websites. That is just J2EE stuff so it's not nearly as slow as GWT though it does only use one core. In essence I already am doing as you suggest.

      As a developer and not a gamer or video maker I stand by my original complaint. These multi-core processors have been a step in the wrong direction for me from a performance standpoint. Some of us would benefit from 4 cores that packed as much punch (whatever that really means) as the 8 cores we can buy. I really do appreciate that my OS doesn't lock up anymore when it's working hard. I just get annoyed when I'm waiting and waiting on a build and my CPU is pegged at 13%.

      I also feel like the marketing of these processors is confusing. Back in the day, when I was doing my first professional software projects on an IBM XT, it was very clear what the performance boost would be when transitioning to the 80286 processor or adding the 80287 to your IBM AT. I continued to understand what I was buying throughout the next decade or two. I chose the 486DX 50 rather than the DX2 66 because the distinction was just as clear. At some point though the marketing materials just got too confusing. Maybe it's just that I'm old now, but I can't figure out what they are selling or why I would choose one processor over another anymore. I went to Dell and ordered the fanciest one (based on price). That clearly was not the right answer. I've asked people who appeared knowledgeable on the topic to explain it to me, but the answers sounded more like BS than CS. Perhaps as it has become more difficult to differentiate their products based on merit they chose to obscure their offerings with lingo and slogans in hopes of gaining sales through confusion. Or, maybe I'm just too old and dumb to get it anymore.

    31. Re:When are multiple cores going to help me? by BuildMonkey · · Score: 1

      I run a recent Dell T3500, 24GB RAM and dual GTS450 graphics cards. The extra cores help a lot with Adobe After Effects. A. LOT. Not so much with Adobe Premiere, because you get far more bang for your buck with a medium-to-high end NVIDIA graphics card: Premiere makes excellent use of CUDA, particularly for encoding. (By default, Premiere will only see and use "pro" cards: Quadro, Fermi, Tesla. There is an easy configuration hack that lets it use any 200 series or better card. My GTS450 encodes full 1080 HD in real-time.) There is a caveat there: only for encoding in the foreground, and it only uses a single NVIDIA graphics card: SLI does not matter.

      I find it strange that the foreground encoding uses GPU acceleration, but batch encoding does not. So the extra cores would help you there, too.

      In my programming (large-scale network simulation, real-time audio processing), any cores beyond 2 do little to nothing. When I re-compile the latest version of Boost, the extra cores substantially speed the build, but this is something I only do 2-3X per year.

    32. Re:When are multiple cores going to help me? by Bitmanhome · · Score: 1

      I believe we live in "the between times:" The laws of physics have made faster cores impossible, so we now have multi-core chips... but we don't have enough cores to make multi-core software effective. You can either run on one core and ruin performance by not taking advantage of the chip, or you can run on all cores and ruin performance with synchronization overhead.

      I suspect this problem won't be resolved until we top 100 cores, where the new programming paradigm (whatever that turns out to be) will be able to be effective. In the meantime... we're just screwed.
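
      One textbook way to put rough numbers on this is Amdahl's law, which is not something from this thread but a standard model: if a fraction p of the work can run in parallel on n cores, the ideal speedup is 1 / ((1 - p) + p / n). The sketch below assumes that model and ignores synchronization overhead entirely, so real results would be worse; the function name and the 0.90 figure are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only part of the work can be spread over cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.90 is an illustrative guess, not a measurement of any real program.
    for n in (1, 2, 8, 16, 100):
        print(f"{n:3d} cores: {amdahl_speedup(0.90, n):.2f}x")
```

      Under this model a program that is 90% parallel can never go more than 10x faster, however many cores it gets, which is one reason piling on cores alone does not rescue software with a large serial portion.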

      --
      Not that this wasn't entirely predictable.
    33. Re:When are multiple cores going to help me? by craftycoder · · Score: 1

      That rings true to me. I don't know the numbers, but I do know that a lot of software I use is not very thoughtful about using available resources. Until developers or our tools are smarter about using the resources on the target devices we will continue to see disappointing performance numbers. We've been spoiled by Intel for a long time now. I think it's our turn to start writing better software because Intel isn't saving our bacon anymore.

  15. Crippling chips by Quila · · Score: 2

    It's common, live with it. Every Cell processor in a PS3 comes with eight SPEs (Synergistic Processing Elements), with one disabled. That way they can set the standard at seven and use most of the chips that come off the line.

    Even AMD had a problem with too-good yield about ten years ago, so they restricted the clock and sold "crippled" low-end chips that were technically rated to run at much higher speeds.

  16. Build your own tablet? by Eggbloke · · Score: 0

    I have been thinking for a while that you could build your own tablet with one of these boards. Strap a touchscreen to one side and a battery to the other and install some tablet edition of Windows or Linux and it should work pretty well. Certainly more powerful than most tablets available today.
    The only issue might be power consumption but it's quite a good trade off for performance and modularity. You could just use a bigger battery anyway.

    --
    I care not for your karma and your mod points.
    1. Re:Build your own tablet? by Eggbloke · · Score: 1

      derp, wrong article

      --
      I care not for your karma and your mod points.
    2. Re:Build your own tablet? by Anonymous Coward · · Score: 0

      fuck you, buddy. I want to hear more about this 16 core tablet.

    3. Re:Build your own tablet? by Sloppy · · Score: 0

      derp, wrong article

      You never should have admitted that. Everyone was so boggled by your post, that they couldn't even flame it. That was awesome.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  17. What do pennies have to do with it? by Anonymous Coward · · Score: 0

    "code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors"

    So for a nickel you can get something that's twice as fast as your previous generation machine?

  18. Thats cool but. by xmorg · · Score: 0

    Does it play Skyrim?

  19. Reality Check for my pampered princess by Sloppy · · Score: 0

    I have a test machine with the 12-core version and the single-core performance is truly dreadful.

    Usually when I talk about a pampered princess I'm referring to my chorkie, but today it's you. You just called Bulldozer "truly dreadful." Not "not as good as Sandy Bridge" but "truly dreadful."

    I'm typing this on an Atom 330 HTPC and its performance is far, far above "truly dreadful" so I must conclude you are from the year 2030 if you think any 2011 processor, even the bottom-of-the-line, is dreadful even in hyperbole, much less "truly." Get back in your time machine, pampered princess.

    The worst shit you can get today, is a monstrous beast. Fucking kids. Get off my lawn.

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  20. build system launches multiple compilers by Anonymous Coward · · Score: 0

    I use the Green Hills compiler. The build system compiles four files at the same time by launching four instances of the compiler. It is a simple solution; I just wish more build systems (make) would implement it.
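
    A rough sketch of that approach, not the Green Hills build system itself: a small pool of worker threads, each of which launches one compiler process and waits for it. The function name run_jobs and the placeholder commands are hypothetical. For what it's worth, GNU make does the same kind of thing when given -j.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands: list[list[str]], max_parallel: int = 4) -> list[int]:
    """Run each command as its own process, at most max_parallel at a time."""
    def run_one(cmd: list[str]) -> int:
        # Each worker thread just blocks on one child process.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in for real compiler invocations such as ["cc", "-c", "a.c"]:
    # here each "compile" is a trivial Python child process.
    fake_compiles = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_jobs(fake_compiles))
```

    The returned list holds one exit code per command, in the order the commands were given, so a build script can stop before linking if any of them is non-zero.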

  21. read the I7-3000s TH review by cheekyboy · · Score: 1

    You want fast IO? Check the i7-3ks.

    And don't forget power: the Intels use less at idle and at max usage.

    Compute power usage over 4 years, and the AMD will cost as much in power as it costs to buy an Intel.

    Don't forget AES speeds are 6-8x FASTER on Intel.

    Every benchmark, single core and combined cores, yields faster results.

    But hey, if you want a 48-core AMD server and it's $5000 cheaper for you, go for it. (If your utilization is 20%, it doesn't matter.)

    --
    Liberty freedom are no1, not dicks in suits.
  22. I am an AMD fan by Travoltus · · Score: 1

    and I can clearly see that beelsebob has done his/her research.

    http://hardware.slashdot.org/comments.pl?sid=2524922&cid=38053256

    --
    --- Grow a pair, liberals... stop letting the Republicans bully you!
  23. wow by unity100 · · Score: 1

    branch prediction, pipeline length and all the calculations happen over what? floating point?