Supercomputing: Raw Power vs. Massive Storage
securitas writes "The NY Times reports that a pair of Microsoft researchers are challenging the federal policy on funding supercomputers. Gordon Bell and Jim Gray argue that the money would be better spent on massive storage instead of ultra-fast computers because they believe today's supercomputing centers will be tomorrow's superdata centers. They advocate building cheap Linux-based Beowulf clusters (PCs in parallel) instead of supercomputers." NYTimes free reg blah blah.
It's nice to see some MS researchers going against the perceived stereotype and being open in their suggestions like this.
And I think they have a good point about massive memory being a very important part of computing advancement right now.
Atheism is a religion to the same extent that not collecting stamps is a hobby.
You're right, this could be more about impeding Sun and IBM than anything else, but I don't see them recommending this as a one-size-fits-all deal - rather, they're making the case that clusters should be pursued over supercomputers for the data-intensive number crunching activities like nuclear explosion modeling, etc.
The BBC has an article on a group of scientists who have built a beowulf cluster of Playstation 2s.
You have to wonder why, all things seriously being equal, they don't recommend a *BSD-based solution instead of a Linux-based one. Esp given the near-equivalent functionality of the *BSDs, and the fact that MS has publicly endorsed the BSD license in the past, citing it as a superior alternative to the GNU License.
From the MS site, the Bay Area Research Center is "... a small Microsoft Research group located in the San Francisco Bay Area. We've been working on two large projects with other universities, companies, other Microsoft Research groups, and with Microsoft product groups in Redmond and Cupertino. These projects are Scalable Servers and Media Presence. "
I can't see scalability involving commodity hardware with MS OSes. In spite of Microsoft's desktop domination strategies, and small business server dominance (arguably, at least for the moment) they know they won't be taken seriously about clustering Windows 2003 server, purely because there is no design AFAIK in the kernel for operating in clusters in the first place. This is supercomputing using commodity hardware, not supercrashing using commodity OSes. Linux is perfectly situated to be recommended by anyone because it is not a competitor's product, per se.
The homepages of the two men can be seen here, if anyone is interested in some of the more interesting history of the two. Little of it has to do with Microsoft propaganda and the marketing machine:-
Gordon Bell
Jim Gray
Regarding Gordon Bell, having an award for chip design named after him doesn't make him an expert in applied mathematics or scientific computing.
Mr. Bell is no Peter Lax, the applied mathematician whose proposal led to the creation of the national supercomputing centers.
Bell's suggestion is off-base. It will cripple efforts to solve fundamental scientific problems such as cardiac simulation only to facilitate Poindexter's wet dream, Total Information Awareness.
Raw speed will always be useful for problems that are hard to parallelize. Right now those problems (parts of crypto, some quantum physics calculations, etc.) are important scientifically, but away from the money.
Industry will spend R&D money on clustering for storage and reliability, without major government subsidy, because there's a crying need for it. How much government money went into Google/eBay/Amazon?
Government research is supposed to complement industry R&D - to be aimed at fields where the results are still important, but maybe not as profitable. This is why government should not abandon raw speed as a research goal.
To a Lisp hacker, XML is S-expressions in drag.
I've seen reports the US decided not to sell supercomputers to countries like Pakistan and India. So my question is: can these countries do a good enough job with Beowulf clusters? What is it they absolutely cannot do without a supercomputer?
"The core of our argument is to give money back to the sciences and let them do the planning," he said.
He says it himself. Stop deciding where to put the money and let science decide itself.
---
"The chances of a demonic possession spreading are remote -- relax."
I agree to the point that money should be spent on data storage, but I'm not sure that money should be taken out of the "super computing" budget or wherever the money comes from. I think it should be another priority, but really, we need both. Clusters aren't the solution to every problem, and super computers have their place. All in all I think it amounts to we need more government spending in the IT sector, and better spending in general. The ISP where I work is also a geological data and oil reservoir company. We recently did a project for the DOE and they budgeted us $2 Mil. just for a web page about the project. Ridiculous. That $2 Million would buy a pretty nice data storage center I would think. But I guess that's what happens when your govt pays $500 for a hammer.
Everyone is entitled to their own opinion. It's just that yours is stupid.
Actually, Beowulf clusters of 800-1,000 machines running Linux can be competitive with supercomputers.
I remember reading in Wired magazine a few years ago about a biotech company here in the San Francisco Bay Area that clustered several hundred machines running Pentium III 600 MHz CPU's to do DNA mapping and analysis--and the results were just as fast as most supercomputers costing several times what that cluster cost.
Imagine what a cluster of 700 to 1,000 blade servers running the latest Intel Xeon CPU's can do now! =)
Many scientists agree with the "moron-in-chief", as you termed him. Computer simulations provide the necessary assurance that the nuclear stockpile is safe and reliable today. There are serious questions about whether this is an adequate long-term solution. To have credible deterrence value, potential adversaries must believe that the weapons will work as advertised. Without data from physical experiments, computer simulations can be misleading or useless. The physical experiments are a necessary reality check. As the simulations become more complex and sophisticated, so does the uncertainty as to the accuracy of their results. At some point we may be forced to resume limited nuclear testing in order to check and validate the models used in the simulations.
Mea navis aericumbens anguillis abundat
Try using a Beowulf-style cluster for a CFD problem, and watch as all computation grinds to a halt as your processors and interconnects devote all their capacity to inter-node coherency and synchronization. You need a traditional supercomputer like an SGI Origin for jobs like that, because of its massive internal bandwidth.
That used to be true, but I don't think it is anymore. A high-end Beowulf compute node these days typically gives you 2 processors and 2-4 Gigabit Ethernet channels, going into a high-end switch. That seems like it's in the same ballpark as the SGI Origin, which gives you nodes with up to 16 processors, up to 12GB/sec aggregate memory bandwidth, and 8 channels going into the router. They aren't going to perform identically, but I think the differences are diminishing.
Furthermore, with distributed shared memory software, parallel linear algebra libraries, and SIMD-on-MIMD libraries, you can program it more or less like you would have a traditional supercomputer, without having to worry a lot about synchronization.
OpenMosix, in an upcoming release, even promises to give you address spaces that cross machines, giving you effectively a NUMA machine on a network of PCs.
B...S...; we use a small Beowulf (16 dual 1 GHz PIII boards with a fast ethernet backplane from PSSC) for oceanic numerical modeling and the problem scaled almost perfectly with number of processors.
Our models are 3-dimensional, but subdivision and message passing take place only in the horizontal two-D direction. And message passing only needs to account for the boundary nodes.
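For anyone wondering why that kind of decomposition scales, here is a rough sketch (not our model code; the grid size, the tile layout, and the name `boundary_fraction` are made up for illustration). If the horizontal grid is cut into rectangular tiles and each tile only trades a one-cell-wide ring with its neighbours, the share of cells involved in message passing is small and shrinks as the tiles get bigger. Because the vertical direction is never split, the fraction is the same for any number of levels.

```python
def boundary_fraction(nx, ny, px, py, halo=1):
    """Fraction of a tile's horizontal cells that sit in its boundary ring.

    Assumes (hypothetically) an nx-by-ny horizontal grid split evenly into
    px-by-py tiles, with a ring `halo` cells wide exchanged on all four
    sides of an interior tile. Vertical levels are not split, so they do
    not change the ratio.
    """
    tile_x = nx // px
    tile_y = ny // py
    total = tile_x * tile_y
    # Cells that never need to be sent to a neighbour.
    interior = max(tile_x - 2 * halo, 0) * max(tile_y - 2 * halo, 0)
    return (total - interior) / total


if __name__ == "__main__":
    # Hypothetical 512 x 512 horizontal grid on 32 processors (8 x 4 tiles).
    print(f"{boundary_fraction(512, 512, 8, 4):.3f}")
```

With these made-up numbers fewer than 5% of each tile's columns take part in communication, which is the situation where a cheap interconnect is not the limiting factor.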
Ease of use is a bit of a larger issue, however. For convenience's sake I usually end up running at home on the dual Athlon and then doing big runs and batch jobs on the Beowulf.
... grumble, grumble, grumble, mutter, mutter, Millenium... Hand... Shrimp, I tol' 'em, I tol' 'em.
it's quite astonishing that these researchers, who are otherwise well-reputed, have missed the whole point of government sponsorship of super-* facilities: to do what can't be done otherwise. mostly, that means running traditional supercomputer jobs, those that are tightly coupled. people who have loosely-coupled jobs have long ago bailed from the supercomputing arena, and have been building their own clusters. similarly, there's no unique advantage to centralizing data storage, and a huge disadvantage (bottlenecks in and out).
I have to wonder whether Markoff badly munged the intent of the Gray/Bell paper, since the way he presents it is internally inconsistent. that is: the gov should spend huge bucks on massive centralized storage, but computing should be decentralized ala grids. oops, how is all that compute power supposed to move data to/from the three national data repositories? perhaps the central problem here is the fallacy shared by grid-o-philes: that networking is getting dramatically faster. take a look at your own network: if you are lucky enough to have gigabit to the desktop, when did that upgrade happen (probably 100 upgrade happen? what kind of speed did you get on your last big download? I've experienced a speedup of something between 10 and 50x in the past, say, 10 years. that's pathetic, when compared to the speedup we all have experienced in CPU power, memory size/speed, and disk size/speed.
there's no Moore's Law of networking: no n^2 process to keep accelerating (unlike die or disk densities). yes, there are technological improvements, and yes, you can gang cables together to scale bandwidth almost linearly. no such help for latency, though. and technological improvements are neither infinite nor increasing. that means that the network is becoming more of a bottleneck, not less.
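to put the bandwidth-versus-latency point in numbers, here is a toy model (a sketch only: the 100 microsecond latency and the 1 Gbit/s link speed are assumed values, and `transfer_time` is a made-up helper, not anything from the paper). one message costs a fixed latency plus its size divided by the aggregate bandwidth, so ganging links only helps the second term.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s, links=1):
    """Toy cost model: one latency hit plus size over aggregate bandwidth.

    Ganging `links` cables multiplies the bandwidth but leaves the
    latency term untouched.
    """
    return latency_s + size_bytes / (bandwidth_bytes_per_s * links)


if __name__ == "__main__":
    latency = 100e-6   # assumed: 100 microseconds per message
    gigabit = 125e6    # 1 Gbit/s expressed in bytes per second

    for label, size in (("1 KB message", 1e3), ("1 GB transfer", 1e9)):
        one = transfer_time(size, latency, gigabit, links=1)
        four = transfer_time(size, latency, gigabit, links=4)
        print(f"{label}: speedup from 4 ganged links = {one / four:.2f}x")
```

under these assumptions the big transfer gets almost the full 4x, while the small message barely moves because nearly all of its time is latency. tightly coupled jobs live on small messages.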
Exactly. And there are quite a few data intensive applications out there that really need massive (say 20-40 terabytes now, petabytes in 2-5 years.) storage. But nobody cares about it. You pay for the CPU cycles, but the storage, heck that's free, an afterthought in your proposal. Most of the really interesting code I've seen is more into data than needing CPU, but that's just me. The Teragrid is coming to realize the need for lots of storage, too. Grid architecture is what is most difficult right now, because not too many people really know what they want a "grid" to be. Sigh.
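For a sense of scale, a back-of-the-envelope sketch (assuming a sustained 1 Gbit/s link, decimal terabytes, and no protocol overhead; `hours_to_move` is just an illustrative helper) of how long it takes to ship data sets of the sizes mentioned above across a network:

```python
def hours_to_move(terabytes, link_gbit_per_s=1.0):
    """Hours needed to push `terabytes` (10**12 bytes each) through a link
    that sustains `link_gbit_per_s` gigabits per second, ignoring overhead."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    for tb in (20, 40, 1000):  # 1000 TB = 1 petabyte
        print(f"{tb} TB: {hours_to_move(tb):.1f} hours")
```

Even under those generous assumptions 20 TB is close to two days on the wire and a petabyte is about three months, so where the storage sits relative to the compute matters a great deal.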