SGI to Scale Linux Across 1024 CPUs

Posted by CmdrTaco on Sunday July 18, 2004 @03:37AM from the thats-a-lotta-chips dept.

im333mfg writes "ComputerWorld has an article up about an upcoming SGI Machine, being built for the National Center for Supercomputing Applications, "that will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors and 3TB of shared memory.""

8 of 360 comments

Whoa! by rylin · 2004-07-18 03:39 · Score: 5, Funny

Sweet, now we'll be able to run Doom3 at highest detail in *SOFTWARE*-rendering mode!

Ok by CableModemSniper · 2004-07-18 03:39 · Score: 5, Funny

But does it run--crap. I mean what about a Beowulf--doh!
Damn you SGI!

--
Why not fork?

Re:In other news... by levram2 · 2004-07-18 04:08 · Score: 5, Informative

The limit for Windows Server 2003, Datacenter Edition for 64-bit Itaniums is actually 64 processors and 512 GB RAM. http://www.microsoft.com/windowsserver2003/64bit/ipf/datacenter.mspx

Re:Solaris by kasperd · 2004-07-18 04:20 · Score: 5, Interesting

Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

I wouldn't be surprised to see these changes in the 2.8 kernel. And what will people do until then, I hear some people ask. I can tell you that right now very few people actually need to scale to 1024 CPUs, and that will probably still be true by the time Linux 2.8.0 is released. AFAIK Linux 2.6 does scale well to 128 CPUs, but I don't have the hardware to test it, and neither do any of my friends. So I'd say there is no rush to get this into the mainstream kernel; the few people who need it can patch their kernels. My guess is that between now and the release of 2.8.0, we will see fewer than 1000 such machines worldwide.

--

Do you care about the security of your wireless mouse?

The solution! by Sidicas · 2004-07-18 04:22 · Score: 5, Funny

"will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors..."
"The National Center for Supercomputing Applications will use it for research"

1. Make a system that generates more heat than a supernova.
2. Research a solution to global warming.
3. Profit!

Re:Similar software available? by dwgranth · 2004-07-18 04:39 · Score: 5, Informative

Well, SGI uses/hacks NUMA, spinlocks, etc. to make this happen in a more efficient manner. We recently had an SGI rep come and explain their 512-CPU architecture at our LUG meeting... and he basically said that SGI has its own implementation of all of the clustering/CPU-stacking techs... which they will eventually feed back into the community... all good stuff... understandably they will wait a year or so, so they can get their money's worth before they release their changes.

Re:In other news... by caluml · 2004-07-18 04:48 · Score: 5, Funny

We don't care about your actual facts for Windows - here at Slashdot we have FUD, rumour, and downright persistence. I think you will find if you read up on it more closely that 2003 Datacentre can only support up to 2 CPUs, and 256Mb maximum.
Please stop letting facts get in the way of a good MS bashing session.

Minister for Dis-Information.

--
Get your own free personal location tracker

Re:Scalability of applications by xtp · 2004-07-18 05:48 · Score: 5, Informative

SGI has had 512- and 1024-cpu MIPS-based systems in operation for more than 5 years. Much work was done on the Irix systems to initialize large parallel computations and provide libraries and compiler support for these configurations. One technique is to provide message-passing libraries that use shared memory. A better technique is to morph (slightly) parallel mesh apps so that each computational mesh node exposes the array elements to be shared with neighbors. No message-passing needed - you push data after a big iteration and then use the (really fast) sync primitives to launch into the next iteration. With shared-nothing clusters (e.g. Beowulf) a computation (and its memory) must be partitioned among the compute nodes. The improvement over a "classical" cluster can be startling, especially with computations that are more communications-bound than compute-bound. This means there is no value in replacing a render farm with a big system. But there are big compute problems, e.g. finite element, for which the shared-nothing cluster is often inadequate.
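A rough sketch of the shared-memory mesh idea described above, in Python. Everything named here is an assumption made for illustration: the function names, the 1-D averaging stencil, and the use of `threading.Barrier` as a stand-in for the fast hardware sync primitives. It is not SGI's library code, and Python threads will not show a real speedup; the point is only that each worker reads its neighbors' cells directly from shared arrays instead of exchanging messages.

```python
import threading


def jacobi_shared(values, iterations, workers):
    """Smooth a 1-D mesh with threads that all see the same two buffers."""
    n = len(values)
    bufs = [list(values), list(values)]   # double buffer shared by every worker
    barrier = threading.Barrier(workers)  # stand-in for the hardware sync primitive

    def work(rank):
        # Each worker owns a contiguous slice of the interior cells;
        # cells 0 and n-1 are fixed boundary values.
        lo = 1 + rank * (n - 2) // workers
        hi = 1 + (rank + 1) * (n - 2) // workers
        for it in range(iterations):
            src, dst = bufs[it % 2], bufs[(it + 1) % 2]
            for i in range(lo, hi):
                # Cells owned by a neighboring worker are read in place:
                # no message is sent or received.
                dst[i] = (src[i - 1] + src[i + 1]) / 2.0
            barrier.wait()  # all writes finish before the next sweep starts

    threads = [threading.Thread(target=work, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[iterations % 2]


def jacobi_serial(values, iterations):
    """Single-threaded reference used to check the threaded version."""
    cur = list(values)
    for _ in range(iterations):
        nxt = list(cur)
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i + 1]) / 2.0
        cur = nxt
    return cur


if __name__ == "__main__":
    mesh = [0.0] * 17 + [100.0]  # hypothetical mesh: cold everywhere, hot right edge
    assert jacobi_shared(mesh, 10, 4) == jacobi_serial(mesh, 10)
```

On a shared-nothing cluster the two reads `src[i - 1]` and `src[i + 1]` at a partition edge would each turn into a send/receive pair between nodes, which is where communications-bound codes lose their time.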

With a single memory image system the computation can easily repartition dynamically as the computation proceeds. It's very costly (never say impossible!) to do this on a cluster because you have to physically move memory segments from one machine to another. On the NUMA system you just change a pointer. The hardware is good enough that you don't really have to worry about memory latency.
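To make the "just change a pointer" contrast concrete, here is a minimal Python sketch. The helper names (`make_partitions`, `shift_boundary`, `cluster_shift`) are hypothetical, and plain lists stand in for physical memory; the sketch only shows that repartitioning shared data means adjusting index ranges, while the shared-nothing version has to copy the items themselves.

```python
def make_partitions(n_items, n_workers):
    """Split range(n_items) into contiguous (start, end) index ranges."""
    return [(w * n_items // n_workers, (w + 1) * n_items // n_workers)
            for w in range(n_workers)]


def shift_boundary(partitions, left, amount):
    """Give `amount` items from partition left+1 to partition `left`.

    Only two index pairs change; the shared data is never touched.
    A negative amount moves items the other way.
    """
    s0, e0 = partitions[left]
    s1, e1 = partitions[left + 1]
    new_edge = e0 + amount
    if not s0 <= new_edge <= e1:
        raise ValueError("boundary would leave the two partitions")
    partitions[left] = (s0, new_edge)
    partitions[left + 1] = (new_edge, e1)


def cluster_shift(node_a, node_b, amount):
    """Shared-nothing equivalent: items are physically moved between nodes."""
    moved = node_b[:amount]
    del node_b[:amount]
    node_a.extend(moved)
    return len(moved)  # how many items had to travel


if __name__ == "__main__":
    data = list(range(12))          # one shared array
    parts = make_partitions(12, 3)  # three workers
    shift_boundary(parts, 0, 2)     # worker 0 takes two items from worker 1
    print(parts)
```

The cost of `shift_boundary` is constant no matter how much data changes owner, whereas `cluster_shift` scales with the amount moved (and, on a real cluster, with the network).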

And let's not forget io. Folks seem to forget that you can dump any interesting section of the computation to/from the file system with a single io command. On these systems the io bandwidth is limited only by the number of parallel disk channels - a system like the one mentioned in the article can probably sustain a large number of GBytes/sec to the file system.

Let's not forget page size. The only way you can traverse a few TB of memory without TLB-faulting to death is to have multi-MByte-size pages (because TLB size is limited). SGI allowed a process to map regions of main memory with different page sizes (up to 64 MB I think) at least 10 years ago in order to support large image database and compute apps.
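The page-size argument is easy to check with a little arithmetic. The sketch below assumes "3TB" means 3 * 2**40 bytes and uses a hypothetical 128-entry TLB purely for illustration; real TLB sizes vary by processor and are not given in the thread.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40

MEMORY_BYTES = 3 * TIB  # "3TB of shared memory", read here as 3 * 2**40 bytes
TLB_ENTRIES = 128       # hypothetical TLB size, for illustration only


def pages_needed(memory_bytes, page_bytes):
    """Number of pages required to map memory_bytes (rounded up)."""
    return -(-memory_bytes // page_bytes)


def tlb_reach(entries, page_bytes):
    """Bytes addressable without a TLB miss when every entry is in use."""
    return entries * page_bytes


if __name__ == "__main__":
    for label, page in (("4 KB", 4 * KIB), ("16 KB", 16 * KIB), ("64 MB", 64 * MIB)):
        print(label, pages_needed(MEMORY_BYTES, page), tlb_reach(TLB_ENTRIES, page))
```

Under those assumptions, mapping 3 TB takes 805,306,368 pages at 4 KB but only 49,152 pages at 64 MB, and the same 128 TLB entries cover 512 KB in the first case versus 8 GB in the second.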

When I used to work at SGI (5 years ago) the memory bandwidth at one cpu node was about 800 MBytes/s. My understanding is that the Altix compute nodes now deliver 12 GBytes/s at each memory controller. Although I haven't had a chance to test drive one of these new systems, it sounds like they have gradually been porting well-seasoned Irix algorithms to Linux. It is unlikely that a commodity computer really needs all of this stuff, but I'm looking at a 4-cpu Opteron that could really use many of the memory management improvements.

g