A Three-Way AMD Opteron Server
Abdul tips a thin little review up at The Inquirer of the Themis Slice. "The Slice is a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports... Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."
There is nothing new in this product at all, IBM have had this type of server platform (3 socket supported) for some time in the form factor of the x3755.
Disclaimer, I work for IBM.
The IBM System x3755 has offered this feature since it came out as well. Instead of the fourth processor card you install a pass through card and it turns it into a three way. We've done a few benchmarks (warning pdf) with the Pass Through card and what it could do between 3CPU and 4CPU operations.
pretty cool ability for a few things.
As a rock-in-roll Physicist once said, No matter where you go, there you are.
This is also a problem on FSB systems, as all CPUs need to snoop the bus for cache coherency information. On Intels dual-bus systems, this information needs to go across busses. The Intel 4 FSB systems are even worse. AFAIK, Opteron is the only x86 chip that would support 6 cores (12 cores with Barcelona) with a single hop.
Wasn't AMD also talking about licenses or agreements with other companies to allow for different types of coprocessor chips to be used alongside their processors?
There is some interesting potential in that realm.. Crypto accelerators for VPN, SAN, or other devices. Multimedia encode/decode accelerators (encode 1080P H.264 in real time?). Inevitable video game acceleration devices (physics co-processor, accelerated NIC chip, 3D GPU offload processor?).
Those would be even more interesting in home-user oriented Athlon64 boards. Multi-socket opteron boards are out of my price range.
> Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."
Lawrence: Three chips at the same time, man.
Peter: That's it? If you had a million dollars, you'd use three sockets at the same time?
Lawrence: Damn straight. I always wanted to do that, man. And I think if I worked at AMD I could hook that up, too; 'cause I hate motherboard layouts with latency.
Peter: Well, not all layouts.
Lawrence: Well, the type of chips that'd triple up on a board like that would.
Peter: Good point.
Lawrence: Well, what about you now? what would you do?
Peter: Besides three chips at the same time?
Lawrence: Well, yeah.
Peter: Idle.
Lawrence: Idle, huh? Peter: I would relax... I would sit on my ass all day... I would idle.
Lawrence: Well, you don't need a million dollars to idle, man. Take a look at that fourth chip: it's two hops away, don't do shit.
The article states that with 3 processors one gets better performance, latency wise, because in a triangle configuration any processor cache is just one hop away. You can have 4 processors in a tetrahedron configuration and still have any processor one hop away. Of course it will take 3 hypertransport connections per processor just for the internal communications, so a 4th connection is needed for at least one processor to connect to the northbridge. The quad-core Opteron will have a maximum of 4 hypertransport connections, is that right ?
Yes, it's possible. The main problem in general is that cost scales in proportion to the factorial of the number of nodes. The main problem in the specific case of Opterons is that each chip needs one HyperTransport controller per other CPU. Current Opterons come with up to three HT connections, and you need one for connecting to the PCIe bus, and other peripherals, leaving two for CPU-to-CPU connections.
I am TheRaven on Soylent News
I mean how to convince the wife that we need a three-way?
"Waste not one watt!" - CZ
No.
x
/|\
/ x \ . . \
/ | \
/
x---------x
All generalizations are false, including this one. Mark Twain
Sorry, this just isn't true in practice. The Geo's, Suzuki's, VW's and Audi's which used odd-numbers of cylinders did so only for packaging considerations, not because the engineering (smoothness, etc.) made sense. They represented a cylinder added onto or removed from a 4 cylinder engine to meet displacement needs while still fitting in the car.
The smoothest piston automotive engines are in-line 6 cylinder engines or V-12 engines, which provide a power pulse with every 30 degrees of crankshaft rotation.
Anything else (3-, 4-, 5- cylinder in-line, V6, V8) has more widely-spaced power pulses and is less smooth. Most of these engines use a rotating counterweight (either an off-balanced flywheel or a separate rotating countershaft) in order to dampen these power pulses and increase smoothness. This works imperfectly and comes at the price of increased weight, rotating mass, and/or complexity.
Yet another approach which should be very smooth is the boxter design, which is used by Subaru and Porsche: cylinders are horizontally opposed at 180 degrees; this works quite well for Porsche, somewhat less well for Subaru.
Of course the smoothest automotive engine is the Wankel rotary currently used by Mazda - the "pistons" (rotors) rotate rather than reciprocate, and each power pulse lasts for 270 degrees.