SGI Introduces World's Densest Server
Twirlip of the Mists writes "Today SGI announced the Origin 3900 server, the world's densest computer. How dense? How about 16 MIPS R14000A processors and 32 GB of RAM in a 4-rack-unit 'superbrick,' for a grand total of 128 processors and 256 GB of RAM in a single rack. That makes the new machine the densest single-system-image computer in the world; it's even denser than most blade systems. Just for fun, the server also includes a whole bunch of 64-bit, 133 MHz PCI-X slots (from 11 up to hundreds and hundreds, depending on configuration). There's coverage of the announcement on ZDNet, CNET, and InfoWorld, as well as on SGI's own site."
Obviously, that should be 64 gigabytes of RAM, not 64 megs.
The interesting thing about this system will be the minimum RAM required, rather than the maximum RAM capacity. The original Origin 3000 required some minimal amount of RAM-- 256 or 512 MB or something-- for every four processors. I'm not sure if this new model has the same requirement, but I'd imagine that it does. (It's an architectural thing. Every node board has to have some RAM on it, because that node board may be nominated at boot time to act as the boot master, among other reasons.)
If that's true, then a 128-processor system would have 32 node boards and would require a minimum of either 8 or 16 GB of RAM, depending on whether you can put 256 MB on a node board.
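Working that through as a quick sketch: the four-processors-per-node-board figure and the 256/512 MB minimums are the guesses from this comment, not confirmed specs for the 3900, and the function name is just made up for illustration.

```python
# Rough minimum-RAM estimate for an Origin-style system.
# Assumptions (guesses from the comment, not confirmed specs):
#   - one node board per 4 processors
#   - every node board must carry at least 256 MB or 512 MB of RAM

def min_ram_gb(processors, cpus_per_node=4, min_mb_per_node=256):
    """Minimum total RAM in GB if every node board must be populated."""
    nodes = processors // cpus_per_node
    return nodes * min_mb_per_node / 1024

for mb in (256, 512):
    gb = min_ram_gb(128, min_mb_per_node=mb)
    print(f"{mb} MB per node board -> {gb:g} GB minimum for 128 processors")
```

With 128 processors that is 32 node boards, so the floor scales linearly with whatever the real per-board minimum turns out to be.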
I write in my journal
Just an FYI - the CNet article (linked above) talks about its possible use on oil rigs - that type of mapping usually takes some horsepower and as usual, anything that is sea-based will be somewhat cramped for space!
www.clustercompute.com
Well, on a per-MIPS basis maybe, but then again I could use faster CPUs today.
MP3 Search Engine
There are 128-CPU Intel/AMD solutions that fit in a single rack. I know of at least three companies that produce them, and they are cheap.
There are a few blade systems that can squeeze 128 or more processors into a rack, but those are blade systems, not single-system-image compute servers. You can't use a blade server to do the job of an Origin 3900. (Of course, the converse is also true; you wouldn't buy an Origin 3900 to do something you could do with a blade server instead.)
SGI tends to produce exactly what the customer wants. It's just that their customer is more often than not the federal government, or a very large corporation. It's not well-known-- in fact, for a time it was classified-- but SGI designed, manufactured, and sold an entire line of what were basically DSP coprocessor units specifically for Lockheed's satellite division. Called the "tensor processing unit," each one was basically an expansion module for the Origin 2000. SGI built it just like a commercial product, complete with documentation and everything, and manufactured them in large quantities. It's just that you couldn't buy them unless you were Lockheed.
It's only when SGI tries to branch out that they do poorly. I don't know WTF they were thinking when they decided to try selling inexpensive (relative to other SGI products) workstations running NT or Linux. That was just insane. But as SGI strips more and more of that BS away, they get closer and closer to being a sound company again.
I'd worry about the bus chipset heating up more than the processors.
It does. The Bedrock chip is both considerably larger and considerably hotter than the R14000A is. (Bedrock is the memory controller, node crossbar, and "bus" arbitrator.)
As to your other comment, SGI got a lot for their money when they bought Cray back in the mid 90's. They took a lot of good Cray technology-- like crossbar-based NUMA system design principles-- and incorporated them into their large server systems. I believe SGI was the first company-- other than Cray itself-- to break the one-hundred CPU barrier on a single system image. (The T3 series was a monster, but I don't recall exactly how many CPUs you could cram into one.)
I think it was Seymour himself who once said, "A supercomputer is a device for turning compute-bound problems into I/O bound problems."
Close, but no kewpie doll. A superbrick holds 16 processors (not 64; I think that was a typo on your part), and connects externally via NUMAlink to other superbricks. But, if I remember my numbers right, the maximum memory latency across the longest multi-router NUMAlink hop in a 128-processor Origin 3000-series system is less than the normal processor-to-processor latency in the Sun Fire 15K. NUMAlink is incredibly fast. The ratio of local memory latency to remote memory latency is something like 1:1.5, as opposed to about 1:10 in IBM's and Sun's big systems.
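To see why that ratio matters, here's a toy model of average memory latency as a function of how often a processor has to reach for remote memory. The 1:1.5 and 1:10 ratios are the from-memory figures above; the remote-access fractions are made-up illustration values, not measurements of any real machine.

```python
# Toy model: average memory latency, in units of local latency.
# Assumption: every access is either local (cost 1) or remote
# (cost = remote_ratio), with no caching or contention effects.

def avg_latency(remote_fraction, remote_ratio, local=1.0):
    """Weighted average of local and remote access cost."""
    return (1 - remote_fraction) * local + remote_fraction * local * remote_ratio

for frac in (0.1, 0.5, 0.9):  # hypothetical share of remote accesses
    low = avg_latency(frac, 1.5)    # 1:1.5 local-to-remote ratio
    high = avg_latency(frac, 10.0)  # 1:10 local-to-remote ratio
    print(f"{frac:.0%} remote: 1:1.5 -> {low:.2f}x local, 1:10 -> {high:.2f}x local")
```

The point of the sketch is just that with a 1:1.5 ratio, data placement barely matters, while with 1:10 it dominates as soon as a meaningful share of accesses go remote.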
Sure, if you buy a ton of second-hand peecees and glue them together in a Beowulf, you have lots and lots of flops (= CPU power).
;-)
But the flops are not everything. The problem with clusters is the network latency when the nodes talk to each other. That latency is small for your average network application, but immense for a supercomputer trying to make all its CPUs talk together. This is why there are entire classes of problems that cannot be solved properly on clusters (non-parallelizable problems).
As opposed to that, an SGI supercomputer has inter-CPU latency that is orders of magnitude lower. Same total GFlops (same CPU power), but certain problems are solved orders of magnitude faster.
That's the power of latency.
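A crude way to put numbers on that: model a run as compute time split evenly across the CPUs, plus a number of synchronization steps that each cost one interconnect latency. Every number below is a hypothetical placeholder picked for illustration; the latencies are not measurements of any real cluster or SGI machine.

```python
# Crude model: run time = compute / cpus + sync_steps * latency.
# Assumption: sync steps are serialized and each costs one full
# interconnect latency; real codes overlap communication with compute.

def run_time(total_compute_s, cpus, sync_steps, latency_s):
    """Estimated wall-clock seconds for a tightly coupled job."""
    return total_compute_s / cpus + sync_steps * latency_s

COMPUTE_S = 1000.0          # hypothetical total CPU-seconds of work
CPUS = 128
SYNC_STEPS = 10_000_000     # hypothetical number of inter-CPU exchanges
CLUSTER_LATENCY_S = 100e-6  # placeholder: commodity network round trip
SHARED_LATENCY_S = 1e-6     # placeholder: single-system-image interconnect

cluster = run_time(COMPUTE_S, CPUS, SYNC_STEPS, CLUSTER_LATENCY_S)
shared = run_time(COMPUTE_S, CPUS, SYNC_STEPS, SHARED_LATENCY_S)
print(f"cluster:      {cluster:.1f} s")
print(f"single image: {shared:.1f} s")
```

With few sync steps the two come out nearly identical, which is the case where a cluster is the right buy. The gap only opens up when the CPUs have to talk constantly.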
There are entire classes of problems which cannot be solved fast enough on clusters, but only on single-image systems. Anything that cannot be made into a parallel algorithm falls into that category.
With networked clusters you're always going to have latencies orders of magnitude higher than with single-image supercomputers.
Sure, perhaps in 10 or 15 years we're going to have network latencies as small as those of a PCI bus, but I'm not really talking about a future that far off. Until then, clusters will be slow for certain problems. Deal with it.