SGI Demos 64-Proc Linux Box


Posted by ryuzaki0 on Monday September 9, 2002 @04:54AM from the hardware-to-lust-after dept.

foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
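
For readers unfamiliar with the benchmark: STREAM TRIAD times the loop a[i] = b[i] + q*c[i] over large arrays and reports bytes moved per second, counting three 8-byte values per iteration (two loads and one store). A minimal Python sketch of that accounting follows. The function names are illustrative and not part of the official benchmark, which is written in C and Fortran, and pure Python will come nowhere near real hardware bandwidth.

```python
import time

BYTES_PER_DOUBLE = 8  # STREAM operates on 64-bit floating point values


def triad(a, b, c, q):
    """TRIAD kernel: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n):
    """Bytes STREAM counts for one TRIAD pass over n elements (2 reads + 1 write)."""
    return 3 * n * BYTES_PER_DOUBLE


def measure_triad_bandwidth(n=1_000_000, q=3.0):
    """Time one TRIAD pass and return bytes per second (hypothetical helper)."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, q)
    elapsed = time.perf_counter() - start
    return triad_bytes(n) / elapsed


if __name__ == "__main__":
    print(f"{measure_triad_bandwidth() / 1e9:.3f} GB/s")
```

By this accounting, the 120 GB/s quoted above works out to roughly 5 billion TRIAD iterations per second across the whole machine (120e9 bytes / 24 bytes per iteration).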

So what is faster than it in the TRIAD? by Neon+Spiral+Injector · 2002-09-09 05:01 · Score: 5, Interesting

That was my first thought. So it beats a C90, but what is faster?

Found the answer here.

And if you were wondering about a Beowulf cluster of these, the top ten ranking excludes "cluster results".
impressive w/Linux by d3xt3r · 2002-09-09 05:04 · Score: 5, Interesting

What is most impressive about this to me is that they did it using Linux rather than IRIX. Why? Because this has proven to be Linux's weakest point: scalability. Most of the changes in 2.5 concentrate on scalability; could this be reaping those benefits?
Linux running at 120 GB/s with 64 processors is impressive for an OS that has been criticized as inefficient when running on more than 8.
I would be very interested to know what version of the kernel they are using.
1. Re:impressive w/Linux by tempest303 · 2002-09-09 05:15 · Score: 5, Interesting
  
  I'm wondering the same thing - I wouldn't be surprised if this were a very customised 2.4/2.5 hybrid or some such.
  
  What I'm more curious about is what the licensing of all this will be like... are they just doing standard kernel patching, in which case the changes might get rolled back into the vanilla kernel? I'm a little worried that they might be doing it all via binary-only modules, which means that Linux proper gets none of the changes rolled back in... :-( I'd be somewhat surprised if SGI did this, though - they seem to have been pretty damn OSS friendly. (XFS!)
  
  --
  The Free desktop that Just Works
2. Re:impressive w/Linux by Angry+White+Guy · 2002-09-09 05:28 · Score: 4, Interesting
  
  I think that the big question is will this get Big Iron back into the rendering farms, and what will be the effect?
  With the major animation companies going to Linux server farms to save cost and get better performance, maybe moving back away from x86 architecture to these large machines may be beneficial cost/productivity wise.
  
  --
  You think that I'm crazy, you should see this guy!
Historical comparison... by Durinia · 2002-09-09 05:05 · Score: 3, Interesting

...interesting that SGI chose the Cray C90 - a system released in *1991* - to compare against. It's nice to know that it's only taken them 10+ years to catch up. :)
They also mention the SV1, which is a "low-end" Cray. I'm curious how the new X1 (née SV2) does on the STREAM suite.
It's good to see that their "scalable linux" work seems to be doing pretty well! I'm sure it was much easier for them to use the IA-64 port of Linux than to port IRIX...
Re:Two things by foobar104 · 2002-09-09 05:40 · Score: 4, Interesting

The second thought is: can it be partitioned?

Since this machine is a standard Origin 3000 with McKinley processor modules, I'm going to assume the answer will be yes. You can partition an O3000 down to a single processor brick + base IO brick, so I imagine that SGI will implement the necessary software bits to make that happen on the SN1-IA systems. I know there are both user space bits (mkpart, partmgr) and kernel space bits (the TCP-over-NUMAlink driver).

I personally have only seen partitioning used on HA systems and lab systems. For a fully fault-tolerant N-processor system, you can buy one 2N-processor Origin and partition it down the middle. The two nodes can run in parallel, passing data back and forth over the NUMAlink via TCP/IP, until one goes down. Also, partitioning is great in a lab environment. It's nice to be able to carve up a big multiprocessor system and give each user a 4-processor (or multiple of 4) node.

I wonder what linux apps would someone run on a system this big?

Anything you'd run on an IRIX system of that size, I'd imagine. I believe-- not positive-- that MSC has already released Nastran for Itanium 2 Linux. (Nastran is a computer-aided engineering tool used extensively in the automotive industry, and other manufacturing industries. It's used for things like stress, heat transfer, and vibration analysis.)

And, as long as the Fortran compilers are worth a damn, you can run just about any other scientific, analytical, or technical software, I'd imagine.
there's no point in doing that by halfelven · 2002-09-09 05:57 · Score: 5, Interesting

The whole point with the SGI supercomputers (there are Origin servers running Irix on 1024 processors) is that there's one single copy of the OS running across all those CPUs, and the entire memory is available to all CPUs on the same piece of hardware. That means, any CPU can access any piece of information at the speed of mem-IO, and you can easily create a large matrix (think many tens or hundreds of GB) to keep all your data in one piece.
Networked clusters (Mosix, Beowulf) split the CPU bunch across the network, and the memory is split too. That means there's a huge latency when a CPU wants to access data that happens to be on a different node on the network: the network latency is many times larger than memory latency.

There are problems that simply cannot be solved on networked clusters, precisely because of network latency, while true supercomputers (all CPUs on the same machine) do not have this limitation.
Well, ok, so you can split the matrix across nodes in a Beowulf, but even if you have the same CPU power as the SGI supercomp, you're going to solve the problem several times slower (if not several orders of magnitude slower). Such is the importance of latency.

This is why there's no point in clustering this kind of computer: you lose its biggest advantage, a single OS copy with all memory on the same machine.
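
To put rough numbers on the latency argument above, here is a back-of-the-envelope sketch. Both latency figures are assumptions chosen for illustration (they do not come from the post or from SGI), and the model assumes every access pays the full latency with no caching, prefetching, or overlap, so treat it as a worst case rather than a measurement.

```python
# Assumed, illustrative latencies; real values depend heavily on the hardware.
ASSUMED_MEMORY_LATENCY_S = 500e-9   # one access to shared memory on the same machine
ASSUMED_NETWORK_LATENCY_S = 50e-6   # one round trip to another node in a cluster


def total_access_time(num_accesses, latency_seconds):
    """Time spent if each access waits out the full latency, one after another."""
    return num_accesses * latency_seconds


def slowdown(network_latency, memory_latency):
    """How many times slower a remote access is than a local one."""
    return network_latency / memory_latency


if __name__ == "__main__":
    accesses = 1_000_000
    shared = total_access_time(accesses, ASSUMED_MEMORY_LATENCY_S)
    cluster = total_access_time(accesses, ASSUMED_NETWORK_LATENCY_S)
    ratio = slowdown(ASSUMED_NETWORK_LATENCY_S, ASSUMED_MEMORY_LATENCY_S)
    print(f"shared memory: {shared:.2f} s")
    print(f"cluster:       {cluster:.2f} s")
    print(f"slowdown:      about {ratio:.0f}x")
```

Under these assumed figures, a workload dominated by scattered accesses to remote data would run about two orders of magnitude slower on the cluster, which is the effect the comment describes. Workloads that can batch or overlap their communication will see a much smaller gap.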