NCSA To Build $53 Million, 13-Teraflop Facility
Quite a few readers submitted news of a distributed system to be built by four U.S. institutions (mostly) out of IBM computers, and paid for with a whopping grant. DoctorWho and november write: "'The National Science Foundation has awarded $53 million to four U.S. research institutions to build and deploy a distributed terascale facility...' A link to the press release is here." An anonymous reader contributed a link to coverage on Wired, and GreazyMF a link to coverage of this story at the New York Times.
McKinley is the second-generation Itanium CPU, which is at least a year away from production. The SGI cluster is using the first-generation Itanium CPU (also known as "Merced"), which is actually just a technology demonstration and not a full-blown product from Intel.
Big deal, the article claims it's going to use McKinley-based Itanium processors, which are at least 2 years away from production. Plus they are using 1300 processors, while the one in Britain only has 152 processors. Quite a bit of a difference if you ask me :)
This is not a big SMP machine - the kernel does not have to manage all 1300 CPUs at once. Instead, there will be 1300 copies of Linux running (in the long run, you don't really want the OS involved much anyway).
It totally depends on exactly what they'll run on it, but based on what's currently running on the NCSA machines the concerns will be a high-speed, low-latency network (which they got in Myrinet - note that I didn't say cheap) and a good MPI implementation to take advantage of it. Both LAM and MPICH have Myrinet-aware implementations, and they're both pretty fast.
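To see why latency matters as much as raw bandwidth for message-passing codes, here is a rough sketch of the usual "fixed latency plus size over bandwidth" cost model. The two interconnects and all of their numbers are hypothetical placeholders picked for illustration; they are not measurements of Myrinet or of any other real network.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical links with the same bandwidth but different latency.
# These figures are illustrative assumptions, not benchmark results.
LOW_LATENCY = {"latency_s": 10e-6, "bandwidth_bytes_per_s": 200e6}
HIGH_LATENCY = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 200e6}


def slowdown(message_bytes):
    """How many times slower the high-latency link is for a single message."""
    slow = transfer_time(message_bytes, **HIGH_LATENCY)
    fast = transfer_time(message_bytes, **LOW_LATENCY)
    return slow / fast


if __name__ == "__main__":
    for size in (64, 64_000, 64_000_000):
        print(f"{size:>10} bytes: {slowdown(size):.2f}x slower")
```

Under these assumed numbers, small messages are dominated by latency and large ones by bandwidth, so a code that exchanges many small messages should feel the difference far more than one that ships a few big buffers.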
You seem to be slightly confused about how such clusters work. Linux is more than just a good choice, it is the definitive best choice in the supercomputing industry for clusters. If you ever go to the SuperComputing conferences, you will notice that there are many dozens of cluster companies, and they all use Linux. Clustering is what supercomputing is all about now.
Linux does not need to efficiently utilize 1300+ Itanium processors. This isn't a single machine, it's a cluster. The Linux kernel needs to be able to handle its individual node (consisting of a couple of processors or so) efficiently, not all the processors. The distribution and parallelization is handled by other software, such as message passing interfaces like MPI. To be honest, Linux has been tested on many clusters with this many processors, and it has been customized and hardened for use in clusters of large magnitude. But like I said, it really isn't a kernel thing; it's the other software in the package that controls distribution of processing payloads to the individual nodes.
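The division of labor described above can be sketched roughly as follows. This is not MPI: it uses Python threads and queues as stand-ins for nodes and network messages, and the names `node` and `sum_of_squares` are made up for the illustration. The point is only that the work is split and recombined by user-level software, while each "node" sees nothing but its own chunk.

```python
import queue
import threading


def node(rank, inbox, outbox):
    """One stand-in 'node': receive a chunk, compute locally, send the result."""
    chunk = inbox.get()                              # plays the role of a receive
    outbox.put((rank, sum(x * x for x in chunk)))    # plays the role of a send


def sum_of_squares(data, num_nodes=4):
    """Split data across num_nodes workers and combine their partial sums."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    workers = [
        threading.Thread(target=node, args=(rank, inboxes[rank], outbox))
        for rank in range(num_nodes)
    ]
    for worker in workers:
        worker.start()
    # Scatter: rank r gets every num_nodes-th element, starting at index r.
    for rank in range(num_nodes):
        inboxes[rank].put(data[rank::num_nodes])
    # Gather and reduce: collect one partial sum per node and add them up.
    total = sum(outbox.get()[1] for _ in range(num_nodes))
    for worker in workers:
        worker.join()
    return total


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

On a real cluster the scatter and gather steps would go over the interconnect through an MPI library, and each worker would be a separate process on its own node running its own copy of the OS.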
Building an operating system from scratch is just a bad idea for something like this. An OS is not exactly something that can be built in a couple of weeks. Look at all the other OS projects out there besides Linux. Even with a few dozen contributors, a lot of them have been years in the making and are nowhere near the level of Linux, or of an OS that could be used in such a fashion. Basically, it would take a very long time to build an OS from scratch that would do all the things necessary and meet the stability requirements for such a project.
Jeff Knox
This is impressive, but the NASA machine will blow it out of the water.
"It is a greater offense to steal men's labor, than their clothes"
We're heading towards a massive parallel global computing system controlled by no single entity
Unless, maybe, it's controlled by MS... Take a look at these two articles on The Register:
- MS poised to switch Windows file systems with Blackcomb
- How Microsoft's file system caper could wrongfoot the DoJ
All browsers' default homepage should read: Don't Panic...
That's NOT the OS software they're using; they're using Linux. Globus is NOT an OS. It's an add-on, and one that's been around for years and years now.