Slashdot Mirror


A Three-Way AMD Opteron Server

Abdul tips a thin little review up at The Inquirer of the Themis Slice. "The Slice is a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports... Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."

28 of 137 comments

  1. nothing new by Exter-C · · Score: 3, Informative

    There is nothing new in this product at all, IBM have had this type of server platform (3 socket supported) for some time in the form factor of the x3755.

  2. IBM System x3755 by OS24Ever · · Score: 5, Informative

    Disclaimer, I work for IBM.

    The IBM System x3755 has offered this feature since it came out as well. Instead of the fourth processor card you install a pass through card and it turns it into a three way. We've done a few benchmarks (warning pdf) with the Pass Through card and what it could do between 3CPU and 4CPU operations.

    pretty cool ability for a few things.

    --

    As a rock-in-roll Physicist once said, No matter where you go, there you are.

    1. Re:IBM System x3755 by Anonymous Coward · · Score: 5, Funny

      OS24Ever wrote, "Disclaimer, I work for IBM."

      You don't say... : p

    2. Re:IBM System x3755 by mr_mischief · · Score: 2, Interesting

      Actually, I've never worked for IBM, and I keep pricing eComStation. I'd kind of like to use that on a system or two. Warp 3 is getting a bit paunchy. I don't want to drop it, though, because then I'd be down to Linux, BSD, Windows, OS X, DOS, and AmigaOS.

      Visopsys, ReactOS, OpenSolaris, plan9, Minix, QNX, MMURTL, OpenVMS, Haiku, and some others could serve for utility and novelty in varying degrees, but I already have plenty of software for OS/2.

      Yes, I'm an avid system collector. If you have hardware or software that's old, obsolete, and quirky, I probably want it.

  3. Re:Weird by Anonymous Coward · · Score: 5, Informative

    This is also a problem on FSB systems, as all CPUs need to snoop the bus for cache coherency information. On Intel's dual-bus systems, this information needs to go across buses. Intel's 4-FSB systems are even worse. AFAIK, Opteron is the only x86 chip that would support 6 cores (12 cores with Barcelona) with a single hop.

  4. What is this article about? by WFFS · · Score: 2, Funny

    Sorry... I tuned out after 'A Three-Way'.

  5. CoProcessors? by tji · · Score: 4, Interesting

    Wasn't AMD also talking about licenses or agreements with other companies to allow for different types of coprocessor chips to be used alongside their processors?

    There is some interesting potential in that realm.. Crypto accelerators for VPN, SAN, or other devices. Multimedia encode/decode accelerators (encode 1080P H.264 in real time?). Inevitable video game acceleration devices (physics co-processor, accelerated NIC chip, 3D GPU offload processor?).

    Those would be even more interesting in home-user-oriented Athlon64 boards. Multi-socket Opteron boards are out of my price range.

    1. Re:CoProcessors? by DigiShaman · · Score: 2, Insightful

      That's why we have buses to open up expansion possibilities.

      For example, we have NIC chips that offload TX checksum processing, audio accelerators (Creative X-Fi), 3D GPU cards (nVidia and ATI cards), and physics cards (the ASUS-branded AGEIA card). The only reason you would want a dedicated socket is for extremely fast and wide I/O to RAM. So far, only the GPU has come close to needing that, and it is getting by just fine with the PCI Express interface.

      --
      Life is not for the lazy.

  6. What would you do... by Tackhead · · Score: 5, Funny

    ...with a million dollars?

    > Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."

    Lawrence: Three chips at the same time, man.
    Peter: That's it? If you had a million dollars, you'd use three sockets at the same time?
    Lawrence: Damn straight. I always wanted to do that, man. And I think if I worked at AMD I could hook that up, too; 'cause I hate motherboard layouts with latency.
    Peter: Well, not all layouts.
    Lawrence: Well, the type of chips that'd triple up on a board like that would.
    Peter: Good point.
    Lawrence: Well, what about you now? What would you do?
    Peter: Besides three chips at the same time?
    Lawrence: Well, yeah.
    Peter: Idle.
    Lawrence: Idle, huh?
    Peter: I would relax... I would sit on my ass all day... I would idle.
    Lawrence: Well, you don't need a million dollars to idle, man. Take a look at that fourth chip: it's two hops away, don't do shit.

  7. Mac OS X on this machine... by andrewd18 · · Score: 2, Funny

    > Any CPU in a 3S system is one hop away from any other CPU.

    So... if I run Mac OS X on this box, can we call it an iHOP?

    1. Re:Mac OS X on this machine... by mrchaotica · · Score: 2, Funny

      Only if you use it to fry pancakes!

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

  8. Where's the specs? by achbed · · Score: 2, Interesting

    There's no reference to this board/blade anywhere on the manufacturer's site. The only thing I can find is that this guy saw this board at a conference and took a shot and wrote a really short article about it. Ok, so a 3-way is a bit of a novelty, but good luck getting it to work. Isn't most microcode on the processors designed with 1, 2, or 4 way in mind? And isn't the cache coherency microcode embedded (at least in part) on the processors themselves? So setting up a 3-way using current processors would actually increase latency and error-checking, correct? IANAPD, but this seems like a dead end.

    1. Re:Where's the specs? by SQL+Error · · Score: 4, Funny

      No.

  9. Threesome by macdaddy · · Score: 2, Funny

    So what kind of doe will this Opteron Threesome run me?

    1. Re:Threesome by smurphmeister · · Score: 5, Funny

      > So what kind of doe will this Opteron Threesome run me?

      Probably a couple of bucks at least!

  10. Same latency with 4 processors by Laxator2 · · Score: 4, Interesting

    The article states that with 3 processors one gets better performance, latency-wise, because in a triangle configuration any processor's cache is just one hop away. You can have 4 processors in a tetrahedron configuration and still have any processor one hop away. Of course, that takes 3 HyperTransport connections per processor just for the internal communications, so a 4th connection is needed on at least one processor to connect to the northbridge. The quad-core Opteron will have a maximum of 4 HyperTransport connections, is that right?
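
    A minimal sketch of the hop-count argument, assuming sockets are modelled as graph nodes and CPU-to-CPU links as edges. The helper name `max_hops` and the square-ring layout for four sockets are illustrative assumptions, not taken from any vendor documentation.

```python
from collections import deque
from itertools import combinations


def max_hops(links, n):
    """Worst-case hop count between any two of n sockets (graph diameter).

    Assumes the link list describes a connected topology.
    """
    adjacency = {i: set() for i in range(n)}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    worst = 0
    for start in range(n):
        # Breadth-first search gives the hop count from `start` to every socket.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


# Three sockets, two CPU links each: a triangle.
triangle = [(0, 1), (1, 2), (0, 2)]
# Four sockets, two CPU links each: an assumed square ring.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Four sockets, three CPU links each: the "tetrahedron" (full mesh).
tetrahedron = list(combinations(range(4), 2))

for name, links, n in [("triangle", triangle, 3),
                       ("square", square, 4),
                       ("tetrahedron", tetrahedron, 4)]:
    print(name, "worst-case hops:", max_hops(links, n))
```

    Under these assumptions the triangle and the tetrahedron both come out at one hop, while the ring of four needs two for the diagonal pairs.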

    1. Re:Same latency with 4 processors by default+luser · · Score: 5, Informative

      Yes, the quad-core chips will have the fourth link. In addition, the chips will be able to split their 16-bit HT links into dual 8-bit HT links, allowing for 8-way CPU configurations without hops (8 x 8-bit HT links per socket). In reality, this is the reason why AMD is pushing the new HyperTransport 3.0: so they can cut the bus lines to 8 without sacrificing too much bandwidth.

      Check it out here.

      --

      Man is the animal that laughs.
      And occasionally whores for Karma.

  11. Re:Weird by TheRaven64 · · Score: 5, Informative

    Yes, it's possible. The main problem in general is that the number of links grows with the square of the number of nodes (a full mesh of n nodes needs n(n-1)/2 links). The main problem in the specific case of Opterons is that each chip needs one HyperTransport link per other CPU. Current Opterons come with up to three HT links, and you need one for connecting to the PCIe bus and other peripherals, leaving two for CPU-to-CPU connections.
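
    The link budget above can be tabulated with a short sketch. It assumes every socket reserves one link for I/O; both function names are hypothetical, and the 4-link and 8-link rows reflect the figures quoted elsewhere in this thread rather than a datasheet.

```python
def max_full_mesh_sockets(links_per_socket, io_links=1):
    """Largest socket count in which every socket links directly to every other.

    Assumes each socket reserves `io_links` of its links for I/O.
    """
    cpu_links = links_per_socket - io_links
    # A socket with k CPU links can reach k neighbours directly, so k + 1 sockets.
    return max(cpu_links + 1, 1)


def total_cpu_links(sockets):
    """Number of point-to-point links in a full mesh of `sockets` sockets."""
    return sockets * (sockets - 1) // 2


for links in (3, 4, 8):
    sockets = max_full_mesh_sockets(links)
    print(f"{links} links per socket -> {sockets} sockets, "
          f"{total_cpu_links(sockets)} CPU-to-CPU links in total")
```

    With three links per socket this gives the three-socket triangle; four links would allow four sockets, and eight links would allow eight.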

    --
    I am TheRaven on Soylent News

  12. Re:Weird by poopdeville · · Score: 2, Interesting

    I was under the impression that this latency issue was caused by the fact that there is no positive solution to the utility problem. Essentially, each core is connected directly to the other two, in a planar graph. There's no way to connect each of 4 cores to the other three without the connections intersecting, at least if the connections are made on anything topologically the same as a convex subset of the plane (that is, no planar graph exists).

    This can be solved directly by creating chips with multiple planes on which connections can be made, or indirectly by running messages through other cores, at the cost of latency. Then again, I have no idea if multi-layer chips are in production.

    --
    After all, I am strangely colored.

  13. Re:Weird by pla · · Score: 2, Interesting

    > If it is impossible, please explain why.

    Problem 1)
    Draw four circles on a piece of paper.
    Now draw a line from every circle to every other circle without crossing any lines.

    Problem 2)
    Draw four circles on a piece of paper. Draw two "pins" on each.
    Now draw a minimal path between any two circles such that you can only start and stop at a pin, and only one connection can go to a single pin.



    You have the right idea for problem 1, that for low N you can just route connections through different layers of the board. But that only works for low N and doesn't generalize (though in fairness, neither does the "3-CPU" solution).

    For problem #2, no real solution exists other than limiting the degree of connectedness to some low number of pins (2 gives the simplest case above single-CPU, a daisy-chain or ring topology), or having centralized signal switching (star topology).

  14. hard to justify by aapold · · Score: 5, Funny

    I mean how to convince the wife that we need a three-way?

    --
    "Waste not one watt!" - CZ

    1. Re:hard to justify by swb · · Score: 2, Funny

      Especially when you haven't shown her the value in a two-way yet.

    2. Re:hard to justify by pimpimpim · · Score: 2, Insightful

      tell her it will mean less hops in general, and she might be fine with it.

      (sorry about this)

      --
      molmod.com - computing tips from a molecular modeling

  15. Multi core by jshriverWVU · · Score: 2, Interesting

    Curious if it can take multi-core CPUs. Having a 3-way system with dual-core Opterons sounds really nice.

  16. Re:Weird by TheRaven64 · · Score: 2, Informative

    Not really, because modern circuit boards are not planes. A modern motherboard is typically 7 layers, with wires in one layer all running parallel to each other. Within a die the utility problem is much more of an issue, but this is largely due to constraints other than those under discussion.

    --
    I am TheRaven on Soylent News

  17. Re:Weird by rrhal · · Score: 5, Insightful

               x
              /|\
             / | \
            /  x  \
           / .   . \
          x---------x
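
    The sketch above can be checked mechanically. This is a minimal example, assuming made-up coordinates for the four sockets and straight-line links between every pair; `count_crossings` is a hypothetical helper that counts link pairs crossing at an interior point.

```python
from itertools import combinations


def orient(a, b, c):
    """Sign of the turn a -> b -> c (positive means counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(p, q, r, s):
    """True if segments pq and rs cross at a point interior to both."""
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def count_crossings(points):
    """Count crossing pairs when every point is joined to every other point."""
    edges = list(combinations(points, 2))
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b} & {c, d}:
            continue  # links that share a socket only meet at that socket
        if properly_cross(points[a], points[b], points[c], points[d]):
            crossings += 1
    return crossings


# Assumed coordinates: three sockets on a triangle, the fourth in the middle.
triangle_layout = {"top": (0, 3), "left": (-3, 0), "right": (3, 0), "centre": (0, 1)}
# For contrast: four sockets on the corners of a square, where the diagonals cross.
square_layout = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

print("triangle with centre:", count_crossings(triangle_layout), "crossings")
print("square corners:", count_crossings(square_layout), "crossings")
```

    With the fourth socket placed inside the triangle, all six links fit in one plane without crossing; the square layout only crosses because of where the sockets sit, not because of how many there are.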

    --
    All generalizations are false, including this one. Mark Twain

  18. Re:Not as good as it sounds by dlapine · · Score: 2, Informative

    Ok, so it's not for HPC systems. I'm betting that the number of servers/server farms out there may make this attractive for non-HPC users, if the 3-way is significantly cheaper than a 4-way. If you can get this on a blade, you get a 50% increase in CPU power for non-parallel tasks.


    Hmmm, now that I think about it, a three-way box might be really interesting for some HPC loads as well. Low latency is a really big issue for some codes, and the three-way could be more scalable (with some hand coding and profiling) than a 4-socket box with non-uniform latencies. This would apply to MPI code written and optimized for specific tasks, not the simple parallelization that some compilers can do. There's a significant number of HPC users who are happy running non-parallel code on hundreds of dual-socket systems who might be able to scale fairly easily to 3-way systems. Actually, the code is parallel, to the extent that it runs on both CPUs, but these particular users don't want the network latency for MPI code, even on fast networks. They could scale to three-way with little loss of performance on one of these.

    Hmmm, a third thought occurs to me. A 3-socket system might also be really, really useful for codes that are I/O intensive: let the traditional MPI code run on the first two CPUs and let the third handle OS tasks, network operations and high-performance filesystem operations. The latency is less of a value in this case, but simply keeping the OS from interrupting the 2 CPUs running MPI could be a big win as well. Call it 2N+1 computing.
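
    A rough sketch of the 2N+1 split, assuming a hypothetical layout of three dual-core sockets with contiguous core IDs. `plan_2n_plus_1` and `pin_current_process` are illustrative names; `os.sched_setaffinity` is a Linux-only call, and steering interrupts away from the compute sockets would need separate OS configuration that is not shown here.

```python
import os


def plan_2n_plus_1(cores_by_socket):
    """Split sockets into compute sockets plus one service socket (the last)."""
    sockets = sorted(cores_by_socket)
    if len(sockets) < 2:
        raise ValueError("need at least two sockets to reserve one for service work")
    *compute, service = sockets
    compute_cores = {core for s in compute for core in cores_by_socket[s]}
    service_cores = set(cores_by_socket[service])
    return compute_cores, service_cores


def pin_current_process(cores):
    """Pin the calling process to `cores` where the OS supports it."""
    if hasattr(os, "sched_setaffinity"):  # only available on Linux
        os.sched_setaffinity(0, cores)


# Hypothetical layout: socket number -> core IDs on that socket.
layout = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
compute_cores, service_cores = plan_2n_plus_1(layout)
print("MPI ranks on cores:", sorted(compute_cores))
print("OS, network and filesystem work on cores:", sorted(service_cores))
```

    An MPI rank would call `pin_current_process(compute_cores)` at startup, and service daemons would be pinned to `service_cores` in the same way.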

    Ok, I admit it- I like options when it comes to designing systems to meet the needs of different users.

    --
    The Internet has no garbage collection

  19. Tell it to a BMW or Jaguar driver by jkevin99 · · Score: 3, Informative

    Sorry, this just isn't true in practice. The Geos, Suzukis, VWs and Audis which used odd numbers of cylinders did so only for packaging considerations, not because the engineering (smoothness, etc.) made sense. They represented a cylinder added onto or removed from a 4-cylinder engine to meet displacement needs while still fitting in the car.

    The smoothest piston automotive engines are in-line 6 cylinder engines or V-12 engines, which provide a power pulse with every 30 degrees of crankshaft rotation.

    Anything else (3-, 4-, 5- cylinder in-line, V6, V8) has more widely-spaced power pulses and is less smooth. Most of these engines use a rotating counterweight (either an off-balanced flywheel or a separate rotating countershaft) in order to dampen these power pulses and increase smoothness. This works imperfectly and comes at the price of increased weight, rotating mass, and/or complexity.

    Yet another approach which should be very smooth is the boxer design, which is used by Subaru and Porsche: cylinders are horizontally opposed at 180 degrees; this works quite well for Porsche, somewhat less well for Subaru.

    Of course the smoothest automotive engine is the Wankel rotary currently used by Mazda - the "pistons" (rotors) rotate rather than reciprocate, and each power pulse lasts for 270 degrees.