NCSA To Build $53 Million, 13-Teraflop Facility

Posted by timothy on Friday August 10, 2001 @02:07AM from the lucky-numbers dept.

Quite a few readers submitted news of a distributed system to be built by four U.S. institutions (mostly) out of IBM computers, and paid for with a whopping grant. DoctorWho and november write: "'The National Science Foundation has awarded $53 million to four U.S. research institutions to build and deploy a distributed terascale facility...' A link to the press release is here." An anonymous reader contributed a link to coverage on Wired, and GreazyMF to coverage of this story at the New York Times.

7 of 162 comments

Re:for comparision by Anonymous Coward · 2001-08-10 04:16 · Score: 2, Informative

McKinley is the second generation Itanium CPU which is at least a year away from production. The SGI cluster is using the first generation Itanium CPU (also known as "Merced") which is actually just a technology demonstration, and not a full-blown product from Intel.
Re:for comparision by ajiva · 2001-08-10 02:30 · Score: 2, Informative

Big deal. The article claims it's going to use McKinley-based Itanium processors, which are at least 2 years away from production. Plus they are using 1300 processors, while the one in Britain only has 152 processors. Quite a bit of a difference if you ask me :)
Linux doesn't have to scale by Anonymous Coward · 2001-08-10 03:54 · Score: 1, Informative

Linux doesn't have to efficiently use all 1300 processors; they're not even going to try to do that. All Linux has to do is efficiently manage one CPU. (Well, for space concerns the nodes are probably going to be dual-proc SMP machines...) You're thinking of something called Single-System-Image (SSI), and IMHO it's the wrong approach to take with this many machines.
This is not a big SMP machine: the kernel does not have to manage all 1300 CPUs at once. Instead, there will be 1300 copies of Linux running (in the long run, you don't really want the OS involved much anyway).
It totally depends on exactly what they'll run on it, but based on what's currently running on the NCSA machines the concerns will be a high-speed, low-latency network (which they got in Myrinet; note that I didn't say cheap) and a good MPI implementation to take advantage of it. Both LAM and MPICH have Myrinet-aware implementations, and they're both pretty fast.
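To make the "many independent copies plus message passing" idea concrete, here is a minimal sketch. It is not MPI and not what NCSA will run: threads stand in for nodes purely so it runs on any machine, and the names `node` and `cluster_sum` are made up for illustration. Each worker computes on its own share of the data and communicates only by sending one message back, which is roughly the shape of a reduction in an MPI program.

```python
# Illustration only: threads stand in for cluster nodes so this runs anywhere.
# On a real cluster each worker would be a separate machine with its own Linux
# kernel, and an MPI library would carry the messages over the interconnect.
# `node` and `cluster_sum` are hypothetical names, not part of any MPI API.
import queue
import threading


def node(rank, size, n, outbox):
    """One 'node': sum its own share of 0..n-1, then send a single message."""
    partial = sum(range(rank, n, size))  # every size-th number, starting at rank
    outbox.put((rank, partial))


def cluster_sum(n, size=4):
    """Run `size` workers and combine the partial sums they send back."""
    outbox = queue.Queue()
    workers = [
        threading.Thread(target=node, args=(rank, size, n, outbox))
        for rank in range(size)
    ]
    for w in workers:
        w.start()
    # Receive exactly one message per worker; arrival order does not matter.
    partials = dict(outbox.get() for _ in range(size))
    for w in workers:
        w.join()
    return sum(partials.values())


if __name__ == "__main__":
    print(cluster_sum(1000))
```

Note that no worker ever sees another worker's data; the only coordination is the final message, which is why the per-node OS never has to know about the other nodes.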
Re:Linux by Jeff+Knox · 2001-08-10 04:48 · Score: 2, Informative

You seem to be slightly confused about how such clusters work. Linux is more than just a good choice; it is the definitive best choice in the supercomputing industry for clusters. If you ever go to the SuperComputing conferences, you will notice that there are many dozens of cluster companies, and they all use Linux. Clustering is what supercomputing is all about now.

Linux does not need to efficiently utilize 1300+ Itanium processors. This isn't a single machine, it's a cluster. The Linux kernel needs to handle its individual node (consisting of a couple of processors or so) efficiently, not all the processors. The distribution and parallelization are handled by other software, such as message passing interfaces like MPI. To be honest, Linux is tested on many clusters with this many processors, and it has been customized and hardened for use in large-magnitude clusters. But like I said, it really isn't a kernel thing; it's the other software in the package that controls distribution of processing payloads to the individual nodes.

Building an operating system from scratch is just a bad idea for something like this. It is not exactly something that can be built in a couple of weeks. Look at all the other OS projects out there besides Linux. Even with a few dozen contributors, a lot of them have been years in the making and are nowhere near the level of Linux, or of an OS that could be used in such a fashion. Basically, it would take a very long time to build an OS from scratch that would do all the things necessary and meet the stability requirements for such a project.

--
Jeff Knox
for comparision by Alien54 · 2001-08-10 02:23 · Score: 4, Informative

For comparison there is the Cosmology Machine in Britain, which among other things consists of an integrated cluster of 128 UltraSPARC III processors and a 24-processor SunFire, and has a total of 112 Gigabytes of RAM and 7 Terabytes of data storage. With all of this power it can perform up to 456 billion arithmetic operations in a second (228 billion floating point and 228 billion integer operations).
This is impressive, but the NCSA machine will blow it out of the water.
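
For a rough sense of scale, the quoted peak figures can be put side by side. This is only back-of-the-envelope arithmetic on the numbers given in the headline and in this comment; peak ratings say little about real workloads, and the constant and function names below are just illustrative.

```python
# Peak figures as quoted in the story headline and in this comment.
# The names are illustrative; nothing here measures real performance.
NCSA_PEAK_FLOPS = 13e12        # "13-Teraflop" from the headline
COSMOLOGY_PEAK_FLOPS = 228e9   # 228 billion floating point operations per second
COSMOLOGY_CPUS = 128 + 24      # UltraSPARC III cluster plus the 24-processor SunFire


def peak_ratio(a, b):
    """How many times larger peak rate `a` is than peak rate `b`."""
    return a / b


if __name__ == "__main__":
    print(COSMOLOGY_CPUS)
    print(round(peak_ratio(NCSA_PEAK_FLOPS, COSMOLOGY_PEAK_FLOPS)))
```

On these numbers the planned facility's floating point peak is about 57 times that of the Cosmology Machine.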

--
"It is a greater offense to steal men's labor, than their clothes"
Re:This is the future of the Internet by Brazilian+Geek · 2001-08-10 02:40 · Score: 3, Informative

We're heading towards a massive parallel global computing system controlled by no single entity

Unless, maybe, it's controlled by MS... Take a look at these two articles on The Register:

- MS poised to switch Windows file systems with Blackcomb
- How Microsoft's file system caper could wrongfoot the DoJ

--
All browsers' default homepage should read: Don't Panic...
Re:OS/software by Anonymous Coward · 2001-08-10 02:24 · Score: 4, Informative

That's NOT the OS software they're using; they're using Linux. Globus is NOT an OS. It's an add-on, and one that's been around for years and years now.