SGI & NASA Build World's Fastest Supercomputer
GarethSwan writes "SGI and NASA have just rolled out the new world number one fastest supercomputer. Its performance test (LINPACK) result of 42.7 teraflops easily outclasses the previous mark set by Japan's Earth Simulator of 35.86 teraflops AND that set by IBM's new BlueGene/L experiment of 36.01 teraflops. What's even more awesome is that each of the 20 512-processor systems runs a single Linux image, AND Columbia was installed in only 15 weeks. Imagine having your own 20-machine cluster?"
This page contains images of the NASA Altix system. After reading the article I was curious as to how much room 10K or so processors take up.
http://www.busyweather.com/
1) This was fully deployed in only 15 weeks.
(Link)
2) This number was using only 16 of the 20 systems, so a full benchmark should be larger too.
(link)
3) The attached storage holds 44 LoCs (Libraries of Congress). (link)
Your hair look like poop, Bob! - Wanker.
When they hit the "TURBO" button on the front of the boxes, they'll really scream.
They did! According to a CNET article, they "quietly submitted another, faster result: 51.9 trillion calculations per second" (equivalent to 51.9 teraflops).
An effective signature identifies a particular user amongst a base of thousands.
The amazing thing about it is that it's built at a fraction of the cost, space, and size of the Earth Simulator. If I remember correctly, they already have some of the systems in place for 36 teraflops. It's the same Blue Gene/L technology from IBM, just on a larger scale.
There's also a dark horse in the supercomputer race; a cluster of low-end IBM servers using PPC970 chips that is in between the BlueGene/L prototype and the Earth Simulator. That pushes the last Alpha machine off the top 5 list, and gives Itanium and PowerPC each two spots in the top 5. It's amazing to see the Earth Simulator's dominance broken so thoroughly. After so long on top, in one list it goes from first to fourth, and it will drop at least two more spots in 2005.
Whoever corrects a mocker invites insult;
whoever rebukes a wicked man incurs abuse.
--Proverbs 9:7
You asked for it: "...with Columbia, scientists are discovering they can potentially predict hurricane paths a full five days before the storms reach landfall."
In other words: RTFA, that's exactly what they're using it for.
-- If no truths are spoken then no lies can hide --
Since no one else has answered my question, I'll post the results of searching on my own:
http://news.com.com/Space+agency+taps+SGI,+Intel+for+supercomputer/2100-1010_3-5286156.html
The article quotes the cost at $45 million over a three-year period, which indicates that the "Columbia" super cluster gets a bit more than 1 teraflop per million dollars. That seems impressive to me, considering the overall performance.
It would be interesting to see how well the Xserve-based architecture held its performance per dollar when scaled up to higher teraflop levels...
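For anyone who wants to check the teraflops-per-dollar arithmetic, here is a quick sketch in Python. It uses only the figures quoted in this thread ($45 million, 42.7 and 51.9 teraflops); the function and variable names are made up for illustration, and the calculation ignores whatever the three-year figure actually covers.

```python
# Figures as quoted in the thread; names here are illustrative only.
COST_MILLION_USD = 45.0  # quoted cost over three years

LINPACK_TFLOPS = {
    "16 of 20 systems": 42.7,   # the result in the story summary
    "later submission": 51.9,   # the faster result mentioned via CNET
}


def tflops_per_million(tflops, cost_million=COST_MILLION_USD):
    """Return LINPACK teraflops per million dollars spent."""
    return tflops / cost_million


for label, tflops in LINPACK_TFLOPS.items():
    print(f"{label}: {tflops_per_million(tflops):.2f} TFLOPS per $1M")
# 16 of 20 systems: 0.95 TFLOPS per $1M
# later submission: 1.15 TFLOPS per $1M
```

So the "bit more than 1 teraflop per million" claim holds for the 51.9 figure and comes in slightly under 1 for the 42.7 figure.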
This is why SMP computers tend to have 2 or 4 processors, and 8 at a pinch, but no more. It's just not practical, using current methods, to directly wire up more than 8 processors in such a tight package.
Let's say you have N processors, each capable of executing I instructions per second. Your total theoretical throughput would be N x I. However, this would only be the case if the system were 100% parallel and no processor needed to communicate with any other. That is rarely the case.
In practice, performance plotted against processor count looks a bit like a squished bell curve. As you increase the number of processors, the gain from each additional processor shrinks, reaches zero, and then becomes negative. At that point, adding more CPUs will actually SLOW the computer down.
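The rise-then-fall shape described above can be reproduced with a toy model. This is only a sketch under assumptions I am making up for illustration: a fixed serial fraction, a perfectly divisible parallel fraction, and a communication cost that grows linearly with the number of processors. None of the constants come from a real machine.

```python
def run_time(n, serial=0.02, parallel=0.98, comm=0.0005):
    """Time for one unit of work on n processors (toy model).

    serial   -- part of the work that cannot be split (assumed)
    parallel -- part that divides evenly across n processors (assumed)
    comm     -- extra cost per additional processor for communication
                (assumed to grow linearly, which is a simplification)
    """
    return serial + parallel / n + comm * (n - 1)


def speedup(n):
    """Speedup over a single processor under the toy model."""
    return run_time(1) / run_time(n)


# Find where the curve peaks in this model.
best_n = max(range(1, 257), key=speedup)

for n in (1, 8, best_n, 128, 256):
    print(f"{n:4d} processors: speedup {speedup(n):6.2f}")
```

With these made-up constants the peak lands in the mid-40s, and by 256 processors the speedup has fallen to less than half of its peak value. Changing `comm` moves the peak, which is the point made above about layout and interconnect quality.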
The exact shape and size of the curve is partly a function of the way the components are laid out. A good layout keeps the amount of traffic on any given line to a minimum, minimizes the distances between nodes, and minimizes the management and routing overheads.
However, layout isn't everything. If your software can't take advantage of the hardware and the topology, then all the layout in the world won't gain you a thing. To take advantage of the topology, though, the software has to comprehend some very complex networking issues. It has to send data by efficient pathways.
If connections are not all the same speed or latency, then the most efficient pathway may NOT be the shortest. This means that the software must understand the characteristics of each path and how to best utilize those paths, by appropriate load-balancing and traffic control techniques.
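To make the "most efficient is not the shortest" point concrete, here is a minimal sketch using Dijkstra's algorithm over per-link latencies. The four-node topology and the latency numbers are invented for illustration; a real interconnect would also have to account for bandwidth and congestion, which this ignores.

```python
import heapq


def lowest_latency_path(links, src, dst):
    """Return (total_latency, path) for the lowest-latency route.

    links is a list of (node_a, node_b, latency) tuples describing
    bidirectional connections. Returns None if dst is unreachable.
    """
    graph = {}
    for a, b, latency in links:
        graph.setdefault(a, []).append((b, latency))
        graph.setdefault(b, []).append((a, latency))

    queue = [(0.0, src, [src])]  # (latency so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, latency in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + latency, neighbor, path + [neighbor]))
    return None


# Hypothetical topology: one slow direct link and a fast three-hop route.
LINKS = [
    ("A", "D", 10.0),  # direct, but slow
    ("A", "B", 1.0),
    ("B", "C", 1.0),
    ("C", "D", 1.0),
]

cost, path = lowest_latency_path(LINKS, "A", "D")
print(cost, path)  # 3.0 ['A', 'B', 'C', 'D']
```

The single-hop route A to D is the shortest by hop count, but the three-hop route wins on latency.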
If you look at extreme-end networking hardware, it can be crudely split into two camps - hardware where the bandwidth is phenomenal, at the expense of latency, and hardware where the latency is practically zero but so's the bandwidth.
The "ideal" supercomputer is going to mix these two extremes. Some data you just need to get to point B fast, and sometimes you're less worried about speed, but do need to transfer an awful lot of information. This means you're going to have two physical networks in the computer, to handle the two different cases. And that means you need something capable of telling which case is which fast enough to matter.
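A rough sketch of "telling which case is which": estimate the transfer time on each network as latency plus size divided by bandwidth, and send the message over whichever is faster. The two networks and their numbers below are hypothetical, chosen only to show the crossover, and the class and function names are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    latency_s: float              # fixed cost per message, in seconds
    bandwidth_bytes_per_s: float  # sustained throughput

    def transfer_time(self, size_bytes):
        """Estimated time to deliver a message of size_bytes."""
        return self.latency_s + size_bytes / self.bandwidth_bytes_per_s


# Hypothetical figures, not taken from any real interconnect.
LOW_LATENCY = Network("low-latency", latency_s=1e-6, bandwidth_bytes_per_s=1e8)
HIGH_BANDWIDTH = Network("high-bandwidth", latency_s=1e-4, bandwidth_bytes_per_s=1e10)


def pick_network(size_bytes, networks=(LOW_LATENCY, HIGH_BANDWIDTH)):
    """Choose the network with the smallest estimated transfer time."""
    return min(networks, key=lambda net: net.transfer_time(size_bytes))


print(pick_network(64).name)       # small control message: low-latency
print(pick_network(10**8).name)    # 100 MB bulk transfer: high-bandwidth
```

With these numbers the crossover sits at roughly 10 KB per message. The decision itself is a couple of multiplications, which matters because the classifier has to be cheap compared with the latency it is trying to save.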
Even when only one type of network is used, latency is a real killer. Software, being the slowest component in the machine, is where most of the latency is likely to accumulate. Nobody in their right mind is going to build a multi-billion dollar machine with superbly optimized hardware if the software adds so much latency to the system that they might as well be using a 386SX with Windows 3.1.
And that means Linux has damn good traffic control and very very impressive latencies. And it looks like these are areas the kernel is going to be improving in still further...
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)