Slashdot Mirror


A Three-Way AMD Opteron Server

Abdul tips a thin little review up at The Inquirer of the Themis Slice. "The Slice is a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports... Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."

10 of 137 comments

  1. nothing new by Exter-C · · Score: 3, Informative

    There is nothing new in this product at all. IBM has had this type of server platform (three sockets supported) for some time, in the form of the x3755.

  2. IBM System x3755 by OS24Ever · · Score: 5, Informative

    Disclaimer, I work for IBM.

    The IBM System x3755 has offered this feature since it came out as well. Instead of the fourth processor card, you install a pass-through card, which turns it into a three-way. We've done a few benchmarks (warning: PDF) with the pass-through card, comparing 3-CPU and 4-CPU operation.

    Pretty cool ability for a few things.

    --

    As a rock-in-roll Physicist once said, No matter where you go, there you are.

  3. Re:Weird by Anonymous Coward · · Score: 5, Informative

    This is also a problem on FSB systems, as all CPUs need to snoop the bus for cache coherency information. On Intel's dual-bus systems, this information needs to go across buses. The Intel 4-FSB systems are even worse. AFAIK, Opteron is the only x86 chip that would support 6 cores (12 cores with Barcelona) with a single hop.

  4. Re:Weird by TheRaven64 · · Score: 5, Informative

    Yes, it's possible. The main problem in general is that cost grows quadratically with the number of nodes: fully connecting n nodes takes n(n-1)/2 links. The main problem in the specific case of Opterons is that each chip needs one HyperTransport link per other CPU. Current Opterons come with up to three HT links, and you need one for connecting to the PCIe bus and other peripherals, leaving two for CPU-to-CPU connections.
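
    The link arithmetic in this thread can be checked with a few lines of Python. This is only a sketch: it assumes, as the comments here state, that one HT link per socket is reserved for I/O, and the function names are made up for illustration.

```python
def full_mesh_links(n_sockets: int) -> int:
    """Total point-to-point links needed so every socket is one hop from every other."""
    return n_sockets * (n_sockets - 1) // 2


def max_single_hop_sockets(links_per_cpu: int, io_links: int = 1) -> int:
    """Largest socket count that can be fully meshed with the links left over after I/O.

    Assumption (from the thread, not from any datasheet): io_links links per
    socket go to PCIe and other peripherals; the rest are CPU-to-CPU links.
    """
    cpu_links = links_per_cpu - io_links
    return cpu_links + 1 if cpu_links > 0 else 1


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, "sockets:", n - 1, "CPU links per socket,",
              full_mesh_links(n), "links in total")
    # Three links per socket, one used for I/O
    print("max single-hop sockets with 3 links:", max_single_hop_sockets(3))
```

    With three links per socket the sketch tops out at three sockets, which matches the Slice; a fourth link would allow four.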

    --
    I am TheRaven on Soylent News

  5. Re:Weird by TheRaven64 · · Score: 2, Informative

    Not really, because modern circuit boards are not planes. A modern motherboard is typically 7 layers, with wires in one layer all running parallel to each other. Within a die the utility problem is much more of an issue, but this is largely due to constraints other than those under discussion.

    --
    I am TheRaven on Soylent News

  6. Re:think three-dimensional by Anonymous Coward · · Score: 1, Informative

    They are talking specifically about the Opteron. Each CPU has two links available for CPU-to-CPU connections. You'd need three such links from each CPU to form a tetrahedron.

  7. Re:4 way? by Anonymous Coward · · Score: 1, Informative

    As I understand it, this is more analogous to a chemistry problem than a topological one. You can consider each CPU as, say, an oxygen atom, with two available HT "bonds" (three minus the one required for PCIe/etc). You can't get four oxygen atoms to mutually bond with each other, no matter what geometry you try.
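
    The "bonds" argument can be checked by treating the sockets as a graph and measuring the worst-case hop count. A rough Python sketch follows; the topologies are assumptions for illustration (a triangle for three sockets, a ring for four sockets with two CPU links each), not a description of any particular board.

```python
from collections import deque


def max_hops(adjacency):
    """Return the largest shortest-path hop count between any two nodes.

    Runs a breadth-first search from every node; assumes a connected graph.
    """
    worst = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in adjacency[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst


# Each socket uses two CPU-to-CPU links in the first two layouts.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
# Fully connecting four sockets needs three CPU links per socket.
tetrahedron = {i: [j for j in range(4) if j != i] for i in range(4)}

if __name__ == "__main__":
    for name, graph in [("triangle", triangle), ("square", square),
                        ("tetrahedron", tetrahedron)]:
        links = max(len(peers) for peers in graph.values())
        print(name, "- links per socket:", links, "- worst case hops:", max_hops(graph))
```

    The triangle and the tetrahedron are both single-hop, but only the triangle fits within two CPU links per socket.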

  8. Re:Same latency with 4 processors by default+luser · · Score: 5, Informative

    Yes, the quad-core chips will have the fourth link. In addition, the chips will be able to split their 16-bit HT links into dual 8-bit HT links, allowing for 8-way CPU configurations without hops (8 x 8-bit HT links per socket). In reality, this is the reason why AMD is pushing the new HyperTransport 3.0: so they can cut the bus lines to 8 without sacrificing too much bandwidth.

    Check it out here.

    --

    Man is the animal that laughs.
    And occasionally whores for Karma.

  9. Re:Not as good as it sounds by dlapine · · Score: 2, Informative

    Ok, so it's not for HPC systems. I'm betting that the number of servers/server farms out there may make this attractive for the non-HPC users, if the 3-way is significantly cheaper than a 4-way. If you can get this on a blade, you get a 50% increase in CPU power over a dual-socket blade for non-parallel tasks.

    Hmmm, now that I think about it, a 3-way box might be really interesting for some HPC loads as well. Low latency is a really big issue for some codes, and the 3-way could be more scalable (with some hand coding and profiling) than a 4-socket box with non-uniform latencies. This would apply to MPI code written and optimized for specific tasks, not the simple parallelization that some compilers can do. There's a significant number of HPC users who are happy running non-parallel code on hundreds of dual-socket systems who might be able to scale fairly easily to 3-way systems. Actually, the code is parallel, to the extent that it runs on both CPUs, but these particular users don't want the network latency for MPI code, even on fast networks. They could scale to 3-way with little loss of performance on one of these.

    Hmmm, a third thought occurs to me. A 3-socket system might also be really, really useful for codes that are I/O intensive: let the traditional MPI code run on the first two CPUs and let the third handle OS tasks, network operations and high-performance filesystem operations. Latency matters less in this case, but simply keeping the OS from interrupting the 2 CPUs running MPI could be a big win as well. Call it 2N+1 computing.
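
    One way to picture the 2N+1 split is as a CPU affinity plan. The Python sketch below is an illustration only: the helper names are hypothetical, it treats each socket as a single CPU id for simplicity, and it relies on os.sched_setaffinity, which Python only provides on Linux. Pinning a process this way does not by itself move interrupts or kernel threads onto the service CPU; that would need separate OS configuration.

```python
import os


def split_cpus(cpu_ids, n_service=1):
    """Split CPU ids into (compute, service) sets.

    The highest-numbered n_service ids are set aside for OS, network and
    filesystem work; the rest are left for the compute (e.g. MPI) ranks.
    """
    ordered = sorted(cpu_ids)
    if not 0 < n_service < len(ordered):
        raise ValueError("need at least one compute CPU and one service CPU")
    return set(ordered[:-n_service]), set(ordered[-n_service:])


def pin_current_process(cpus):
    """Pin the calling process to `cpus` where supported; return True if applied."""
    if hasattr(os, "sched_setaffinity"):  # Linux only
        os.sched_setaffinity(0, cpus)     # pid 0 means the calling process
        return True
    return False


if __name__ == "__main__":
    # Hypothetical 3-socket box, one CPU id per socket.
    compute, service = split_cpus({0, 1, 2})
    print("compute CPUs:", sorted(compute), "service CPUs:", sorted(service))
```

    A compute rank would call pin_current_process(compute) at startup, and any I/O helper daemons would be started with the service set instead.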

    Ok, I admit it: I like options when it comes to designing systems to meet the needs of different users.

    --
    The Internet has no garbage collection

  10. Tell it to a BMW or Jaguar driver by jkevin99 · · Score: 3, Informative

    Sorry, this just isn't true in practice. The Geos, Suzukis, VWs and Audis which used odd numbers of cylinders did so only for packaging considerations, not because the engineering (smoothness, etc.) made sense. They represented a cylinder added onto or removed from a 4-cylinder engine to meet displacement needs while still fitting in the car.

    The smoothest piston automotive engines are in-line 6 cylinder engines or V-12 engines, which provide a power pulse with every 30 degrees of crankshaft rotation.

    Anything else (3-, 4-, 5-cylinder in-line, V6, V8) has more widely spaced power pulses and is less smooth. Most of these engines use a rotating counterweight (either an off-balance flywheel or a separate rotating countershaft) in order to damp these power pulses and increase smoothness. This works imperfectly and comes at the price of increased weight, rotating mass, and/or complexity.

    Yet another approach which should be very smooth is the boxer design, which is used by Subaru and Porsche: cylinders are horizontally opposed at 180 degrees. This works quite well for Porsche, somewhat less well for Subaru.

    Of course the smoothest automotive engine is the Wankel rotary currently used by Mazda - the "pistons" (rotors) rotate rather than reciprocate, and each power pulse lasts for 270 degrees.