A Three-Way AMD Opteron Server
Abdul tips a thin little review up at The Inquirer of the Themis Slice. "The Slice is a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports... Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."
That is one weird-looking board. "you take a big hit on cache coherency latency" Isn't this only a problem with NUMA-based systems (which Opteron is)? The article also mentions UltraSparc and PowerPC-64....
If you post as Anonymous Coward, don't expect a reply.
There is nothing new in this product at all; IBM have had this type of server platform (three sockets supported) for some time in the form of the x3755.
Disclaimer, I work for IBM.
The IBM System x3755 has offered this feature since it came out as well. Instead of the fourth processor card you install a pass-through card, and it turns the machine into a three-way. We've done a few benchmarks (warning: PDF) with the pass-through card, comparing 3-CPU and 4-CPU operation.
Pretty cool ability for a few things.
As a rock-and-roll physicist once said, No matter where you go, there you are.
Can't post that one on Youtube.
Sorry... I tuned out after 'A Three-Way'.
Wasn't AMD also talking about licenses or agreements with other companies to allow for different types of coprocessor chips to be used alongside their processors?
There is some interesting potential in that realm: crypto accelerators for VPN, SAN, or other devices; multimedia encode/decode accelerators (encode 1080p H.264 in real time?); and the inevitable video game acceleration devices (physics co-processor, accelerated NIC chip, 3D GPU offload processor?).
Those would be even more interesting in home-user-oriented Athlon64 boards. Multi-socket Opteron boards are out of my price range.
> Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance.
Lawrence: Three chips at the same time, man.
Peter: That's it? If you had a million dollars, you'd use three sockets at the same time?
Lawrence: Damn straight. I always wanted to do that, man. And I think if I worked at AMD I could hook that up, too; 'cause I hate motherboard layouts with latency.
Peter: Well, not all layouts.
Lawrence: Well, the type of chips that'd triple up on a board like that would.
Peter: Good point.
Lawrence: Well, what about you now? What would you do?
Peter: Besides three chips at the same time?
Lawrence: Well, yeah.
Peter: Idle.
Lawrence: Idle, huh?
Peter: I would relax... I would sit on my ass all day... I would idle.
Lawrence: Well, you don't need a million dollars to idle, man. Take a look at that fourth chip: it's two hops away, don't do shit.
There's no reference to this board/blade anywhere on the manufacturer's site. The only thing I can find is that this guy saw this board at a conference and took a shot and wrote a really short article about it. Ok, so a 3-way is a bit of a novelty, but good luck getting it to work. Isn't most microcode on the processors designed with 1, 2, or 4 way in mind? And isn't the cache coherency microcode embedded (at least in part) on the processors themselves? So setting up a 3-way using current processors would actually increase latency and error-checking, correct? IANAPD, but this seems like a dead end.
So what kind of dough will this Opteron Threesome run me?
The article states that with 3 processors one gets better performance, latency-wise, because in a triangle configuration any processor's cache is just one hop away. You can have 4 processors in a tetrahedron configuration and still have any processor one hop away. Of course, that takes 3 HyperTransport links per processor just for the internal communications, so a 4th link is needed on at least one processor to connect to the northbridge. The quad-core Opteron will have a maximum of 4 HyperTransport links, is that right?
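To put numbers on the hop counts being argued about, here is a minimal sketch that runs a breadth-first search over three layouts. The wiring is an assumption, not something the article spells out: the "4S square" is four sockets in a ring, which is one common way to end up two hops away, and the "tetrahedron" is the fully connected layout suggested above. Only CPU-to-CPU links are counted; the link to the northbridge is left out.

```python
from collections import deque

def hops(links, src):
    """Breadth-first search: hop count from src to every reachable CPU."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def summarize(links):
    """Return (worst-case hops between any two CPUs, CPU-to-CPU links per CPU)."""
    worst = max(max(hops(links, cpu).values()) for cpu in links)
    per_cpu = max(len(peers) for peers in links.values())
    return worst, per_cpu

def full_mesh(n):
    """Every CPU linked directly to every other CPU."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

# Assumed layouts (hypothetical wiring, for illustration only).
triangle = full_mesh(3)
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tetrahedron = full_mesh(4)

for name, topo in [("3S triangle", triangle),
                   ("4S square", square),
                   ("4S tetrahedron", tetrahedron)]:
    worst, per_cpu = summarize(topo)
    print(f"{name}: worst case {worst} hop(s), {per_cpu} CPU links per socket")
```

Under those assumptions the triangle and the tetrahedron both come out at one hop worst case, but the tetrahedron spends three links per socket on CPU-to-CPU traffic instead of two, which is the trade-off described above.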
So 3 is better than 4?
Is this AMD's way of saying "oh look, we can't make a proper quad-core system like Intel, so we just make 3 the magic number, and everyone will buy our marketing technobabble crap"?
This reminds me of some 6-way systems that I'm told Data General used to sell. They took two 4-way systems, and used one of the processor slots on each as a bridge between the two boards.
Yep, 4-way lines don't fit on a 2-dimensional plane without crossing each other. But who said we have a single 2-dimensional plane?
Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.
Isn't this only a problem if the OS doesn't manage the NUMA architecture well? Surely there is an OS out there smart enough to recognize separate processors with separate memory regions and assign physical addresses appropriately....
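For what "assign physical addresses appropriately" could mean in practice, here is a toy sketch of a nearest-node placement rule. Everything in it is hypothetical: the distance table only imitates the style of the node-distance matrix printed by `numactl --hardware` (10 for local memory, larger for more distant nodes), the values are made up for a three-socket box, and `pick_memory_node` is an illustrative helper, not any real kernel's allocator.

```python
# Hypothetical node-distance table: 10 = local memory, larger = farther away.
# The numbers are invented for a fully connected three-socket system.
DISTANCE = [
    [10, 20, 20],
    [20, 10, 20],
    [20, 20, 10],
]

def pick_memory_node(cpu_node, free_pages):
    """Choose the closest node that still has free pages.

    cpu_node   -- node the requesting CPU belongs to
    free_pages -- free page count per node, indexed by node id
    Ties on distance go to the lowest node id.
    """
    candidates = [n for n, free in enumerate(free_pages) if free > 0]
    if not candidates:
        raise MemoryError("no free pages on any node")
    return min(candidates, key=lambda n: (DISTANCE[cpu_node][n], n))

print(pick_memory_node(1, [5, 5, 5]))  # local node has room
print(pick_memory_node(1, [5, 0, 5]))  # local node full, fall back to a neighbour
```

A rule like this keeps ordinary memory traffic local, but it does nothing for lines that are genuinely shared between sockets, so it would not remove the coherency traffic the article is talking about.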
I mean how to convince the wife that we need a three-way?
"Waste not one watt!" - CZ
Curious if it can take multi-core CPUs. Having a 3-way system with dual-core Opterons sounds really nice.
I guess you shouldn't have tuned out, now look what you're stuck with.
Twice.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
How are the HyperTransport links arranged?
How about a tetrahedron for four CPUs?
This architecture might be good for server applications - i.e. lots of instances of a single-CPU task.
However, it doesn't work that well for large apps that get parallelized across multiple CPUs. It turns out that most code, and most compilers, are good at splitting tasks in two - or in powers of two - so having three CPUs is no faster than having two.
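A quick way to sanity-check that claim is to count scheduling rounds. This is a minimal sketch under a strong assumption that is mine, not the poster's: the job is cut into equal-sized chunks with no dependencies, and each CPU runs one chunk per round.

```python
import math

def rounds_needed(chunks, cpus):
    """Rounds to finish `chunks` equal pieces when each CPU runs one piece per round."""
    return math.ceil(chunks / cpus)

for chunks in (2, 4, 8):
    per_cpu_count = {cpus: rounds_needed(chunks, cpus) for cpus in (2, 3, 4)}
    print(f"{chunks} chunks -> rounds by CPU count: {per_cpu_count}")
```

With 2 or 4 chunks, three CPUs need as many rounds as two, which matches the claim. With 8 chunks, three CPUs finish in 3 rounds against 4 for two CPUs, so the penalty shrinks as the work is split more finely.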
I'm kinda new to enterprise servers. In the picture it looks as though each CPU has its own bank of memory. If so, is that efficient or not?
The game.
I'm probably missing something, but you can definitely have a fully-connected planar graph with four nodes. Make a triangle out of three, stick the fourth in the middle of the triangle and connect it out to the other three.
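That layout can be checked mechanically. The sketch below puts three CPUs at the corners of a triangle and the fourth inside it, draws all six links as straight segments, and looks for any pair that crosses. The coordinates are arbitrary illustrative values, and the square layout is included only for contrast.

```python
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def properly_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def crossings(points, edges):
    """List every pair of links that cross when drawn as straight lines."""
    bad = []
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) < 4:
            continue  # links sharing a CPU only meet at that CPU
        if properly_cross(points[u], points[v], points[w], points[x]):
            bad.append(((u, v), (w, x)))
    return bad

links = list(combinations(range(4), 2))  # all six links of a fully connected 4-way

# CPU 3 sits inside the triangle formed by CPUs 0, 1 and 2.
triangle_layout = {0: (0, 0), 1: (6, 0), 2: (3, 6), 3: (3, 2)}
# Same links with the CPUs at the corners of a square: the diagonals collide.
square_layout = {0: (0, 0), 1: (4, 0), 2: (4, 4), 3: (0, 4)}

print("triangle + centre:", crossings(triangle_layout, links))
print("square corners:   ", crossings(square_layout, links))
```

The first layout reports no crossings, while the square-corner placement reports the two diagonals, so whether the traces cross depends on where the sockets sit rather than on the number of links.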
people are more surprised by the 3 CPU sockets than they are by the IB ports.
I thought IB was dead - replaced by 10GigE?
Clear, Dark Skies
Is that a Flux Capacitor?
Or better yet, bond the memory to the cores like Intel and IBM are working on.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
I thought a while ago that AMD, specifically, should create a 3-core processor. Why? Because they can call it the TriAthlon!
From the picture the sockets look to be of the 940 type. Why not make an L1 version of this so you can at least get DDR2 or Barcelona running.
Something about weird non-standard systems gets me going. I think I want this system. Dunno what for or why, but I want it.
Blar.
A 3-way server could sell better than 4-way ones in China, as the number 4 in China is associated with death.
Does anyone know how the Opteron is designed? I'll give you a hint: Two cores/CPU, two CPUs/system is the optimum configuration. There is the ability to run signals across core cross-links, such that each core is only one step away from any other--in a four way system.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
A 3-cylinder engine is smoother than a 4-cylinder, and a 5-cylinder engine is smoother than a 6 (or an 8, for that matter). With an even number of cylinders, one is on a power stroke lined up with one on an intake stroke. With odd numbers, no 2 cylinders move at the same time.
"The Most Fun Possible on 4 wheels" is at SunBuggy in Las Vegas
Sorry, this just isn't true in practice. The Geos, Suzukis, VWs and Audis which used odd numbers of cylinders did so only for packaging considerations, not because the engineering (smoothness, etc.) made sense. They represented a cylinder added onto or removed from a 4-cylinder engine to meet displacement needs while still fitting in the car.
The smoothest piston automotive engines are in-line 6 cylinder engines or V-12 engines, which provide a power pulse with every 30 degrees of crankshaft rotation.
Anything else (3-, 4-, 5- cylinder in-line, V6, V8) has more widely-spaced power pulses and is less smooth. Most of these engines use a rotating counterweight (either an off-balanced flywheel or a separate rotating countershaft) in order to dampen these power pulses and increase smoothness. This works imperfectly and comes at the price of increased weight, rotating mass, and/or complexity.
Yet another approach which should be very smooth is the boxer design, which is used by Subaru and Porsche: cylinders are horizontally opposed at 180 degrees; this works quite well for Porsche, somewhat less well for Subaru.
Of course the smoothest automotive engine is the Wankel rotary currently used by Mazda - the "pistons" (rotors) rotate rather than reciprocate, and each power pulse lasts for 270 degrees.