Ask the Man Behind the NOAA's New Beowulf Cluster
Greg Lindahl sent in this story last September about a massive Alpha Linux cluster that's being built by HPTi for the NOAA's Forecast Systems Laboratories. What Greg forgot to mention when he submitted the original story is that he's the project's chief designer. What with all the Beowulf (and Alpha) interest around here, we figured he'd make a great interview guest, especially now that the project is well under way. Please post your questions below. Answers to 10 - 15 of the highest-moderated ones should appear within the next week.
What do you see as the future for the alpha? Will Compaq let it die a slow and unfortunate death? Will it continue in its current niche of "High Performance Technical Computing", and be out of the reach (pricewise) of mere mortals? Or will Compaq ever market them to a wider audience, and hopefully bring the price down?
Many people I know want an alpha. No one I know thinks they can afford one.
--Bob
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
I have recently become gainfully employed in a capacity which will require me to administer a Beowulf cluster. My question, Mr. Lindahl, is how you feel about the various competing technologies for distribution of computation. In particular, do you feel there is much to be gained from the work of the MOSIX project at The Hebrew University of Jerusalem? Traditionally tasks for Beowulf style supercomputers have required specific programming in MPI or PVM calls. MOSIX endeavors to provide adaptive load-balancing with process migration. Essentially this allows the programmer to forgo the hassle of parallelizing his code. Rather, he can now simply fork() or create SMP threads and the OS will automatically handle distribution of those processes over the cluster. Do you feel that this is a worthwhile avenue to pursue for scientific computation or are there issues which make MPI or PVM still a substantially better choice? Thank you for your time.
--
Brandon D. Valentine
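The fork()-style approach described in the question above can be sketched roughly as follows. This is only an illustration, assuming a Unix-like system: the function names (`partial_sum`, `forked_sum`) and the sum-of-squares workload are hypothetical stand-ins, and nothing here calls MOSIX itself. The point is that the program contains no MPI or PVM calls at all; whether and where the child processes get migrated would be up to the operating system.

```python
import os

def partial_sum(lo, hi):
    """CPU-bound stand-in for real work: sum of squares over [lo, hi)."""
    return sum(i * i for i in range(lo, hi))

def forked_sum(n, workers):
    """Split [0, n) into `workers` ranges and compute each in a fork()ed child."""
    step = (n + workers - 1) // workers
    children = []
    for w in range(workers):
        lo, hi = w * step, min((w + 1) * step, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: do the work, send the result through the pipe, and exit
            # immediately without returning into the parent's code path.
            os.close(read_fd)
            os.write(write_fd, str(partial_sum(lo, hi)).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))
    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            total += int(pipe.read())
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return total
```

Calling `forked_sum(1000, 4)` gives the same answer as the serial `sum(i * i for i in range(1000))`. The only communication is a single pipe write per child, which is the kind of coarse-grained job where transparent process migration has the best chance of paying off.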
The alpha CPU runs circles around any 'cheaper' (read: x86) CPU while doing just that.
There are some chips that are even faster than the alpha (HP has some nice CPUs), but in relation to x86 the Alpha nodes aren't that much more expensive... Not unlike HP-RISC chips, which probably cost a multiple of what the Alphas do.
Another reason could be that the academic users use the alpha a lot to do number crunching, which would be of help in the availability of optimised libs. Not forgetting the Compaq tools and math libraries, which simply rock.
--
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]
This brings to mind a more fundamental and philosophical question - Does your computer (or any one that's possible to build) have enough horsepower to out-calculate that analog computer called reality that we all know and love so very much?
make world, not war
The raw performance of the hardware being used for scientific and parallel programming has improved by leaps and bounds in the past 10-20 years. However, most folks still program these supercomputers much the same way they did in the 80's: Unix, Fortran, explicit message passing, etc.
You have worked in research with Legion and in industry at HPTi. Do you think there is hope for some radical new programming technology that makes clusters easier for scientists to use? If so, what do you think the cluster programming environment of tomorrow might look like?
One of the weaknesses of beowulfs seems to me to be a lack of decent (job) management software. How do you split the cluster's resources? Do you run one large simulation on all the CPUs, or do you run 2 or 3 jobs on 1/2 or 1/3 of the available CPUs?
Is there provision for shifting jobs onto different nodes if one of them dies during a run?
This guy is a gem. Not only is he a noted designer of world-class supercomputers, he has a sense of humor.
/. would interview me, but...
Don't take anything, especially life, too seriously.
BTW, I would have no reservations in taking first post on an interview with myself. Not that
.sig: Now legally binding!
Most IS/IT trade publications and media do not fully comprehend the differences between massively multiprocessor systems with shared memory and those clusters of systems and processors with their own local memory, or supercomputing clusters. This is quite evident in a recent article regarding the TPC-D performance between clustered Compaq Wintel/MSSQL systems and a single, shared memory Sun/Oracle system where the Compaq cluster outperformed the Sun solution in 2 of the 10 standard benchmarks. Basic laws of statistics negate those results because the designs of the two systems were not of the same class -- e.g., to be fair, Microsoft-Compaq should have compared performance to an equivalent cluster of lower-cost Sun systems (let alone a Lintel cluster!).
As you and I already know (and I hope everyone reading this now knows), there are several applications where lower-cost clusters cannot always do the job of more costly shared memory systems as efficiently (low-latency, real-time applications such as real-time simulations come to mind). That is why the Compaq Wintel cluster scored far below the shared memory Sun system in many of the other 8 benchmarks in the aforementioned study.
As such, I am interested in the considerations the NOAA has had to make in evaluating shared memory versus clustered systems. Specifically:
- What are some of the NOAA/NWS programs and software that will not be applicable for execution on this new cluster?
- What [estimated] percentage do these programs make up of the total applications the NOAA uses, both quantity and in time of execution?
- What [assuming] shared memory systems and solutions does the NOAA use for these applications?
Of course, the lower the number in the first two questions, the more advantageous the existence of a supercomputing cluster is to an organization. For example, in the aerospace industry, the quantity of cluster-efficient applications may be small, but the total execution time of a "run" of these select applications can greatly outweigh all others. Again, speaking from my aerospace background, applications such as Monte Carlo, CFD, and 6DOF (six degrees of freedom) runs and simulations are extremely time consuming. Monte Carlo is an ideal application for clustering since each "run" result is completely independent of the others (almost linear performance improvement when distributed in a cluster). CFD is very close to linear (~90% efficient) and 6DOF, I would guess, could be as high as 60 or 70%, if it is written to take advantage of distributed computing systems.
The main reason why these engineering applications are so efficient on clusters is the nature of how they use data. They need little to start crunching, and return little. But during the run, they create and use massive amounts of data, which is all "temporary." This is in stark contrast to databases (such as those targeted by the aforementioned TPC-D benchmarks), where data, not computational results, is the focus of the application. By using supercomputing clusters for computation-driven engineering apps, we can save both money on systems and the time of our engineers waiting on results.
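To make the Monte Carlo point concrete, here is a minimal sketch, not taken from any real aerospace code. It assumes a toy workload (estimating pi from random points) and hypothetical names (`run_batch`, `estimate_pi`, `parallel_efficiency`). The "nodes" are simulated in a single process: the runs are simply dealt out to them, and because each run depends only on its own seed, the combined answer does not depend on how many nodes share the work. Efficiency is taken here as serial time divided by (nodes x parallel time), which is one common way to read a figure like "~90% efficient".

```python
import random

def run_batch(seed, n_samples):
    """One independent Monte Carlo run: count random points inside the unit circle."""
    rng = random.Random(seed)  # private generator, so runs never share state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(seeds, n_samples, n_nodes):
    """Deal the runs out to n_nodes simulated nodes and combine their hit counts."""
    per_node = [seeds[i::n_nodes] for i in range(n_nodes)]  # round-robin assignment
    node_hits = [sum(run_batch(s, n_samples) for s in node) for node in per_node]
    # Each node returns a single integer; that is all the communication needed.
    return 4.0 * sum(node_hits) / (len(seeds) * n_samples)

def parallel_efficiency(t_serial, t_parallel, n_nodes):
    """Fraction of ideal (linear) speedup actually achieved."""
    return t_serial / (n_nodes * t_parallel)
```

With the same list of seeds, `estimate_pi(seeds, 5000, 1)` and `estimate_pi(seeds, 5000, 4)` return the identical value, which is the "completely independent runs" property in miniature. As an illustration of the efficiency figure, a job that takes 100 time units serially and 25 on 4 nodes has `parallel_efficiency(100.0, 25.0, 4)` equal to 1.0, and anything lost to communication pushes that below 1.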
As such, I am interested in the overall increase in efficiency you are seeing after the introduction of supercomputing clusters. Specifically:
[ I now work in the semiconductor design industry, and we are looking at acquiring some Linux supercomputing clusters to speed up the runs of EDA (electronic design automation) tools like those for IC layout and the like. ]
I appreciate your time and wish your organization and yourself the best in our Linux and OSS endeavors.
-- Bryan "TheBS" Smith
Independent Author, Consultant and Trainer
MPI gives you native, transparent, and fast parallel functions at the cost of dramatically increased programmer headache. PVM gives you kinda-portable, relatively obvious parallel functions at the cost of overhead.
There are other systems; AFAPI (dead), MOSIX (SLOW but totally transparent), etc. I've played with them all, and I rather like MPI for dedicated clustering and MOSIX for casual 'I need a fast make World' stuff.
.sig: Now legally binding!
The overall performance will depend on the type of applications you are running. To that end, I am also wondering whether you are planning on running any standard benchmarks and making the results public. I would be particularly interested in seeing the results from the TPC-C benchmark (http://www.tpc.org). I'm not sure if it will even be possible to run this benchmark on your system since I don't know how it is configured, but it would be nice to see how your system compares in terms of enterprise computing solutions.
Frylock: That's not a toy!
Master Shake: You say that about everything you own. You should own toys. They're fun.
You can set up a nice differential equation to find the optimum # of nodes and $ per node.
Did you use any sort of optimization algorithms in designing this system? Not just for the number of nodes, but also for quality vs. price, or any other areas.
--
I don't want to Ask The Man, I want to Stick It To The Man!
-- Ed Avis ed@membled.com
first post?
How do you think the new wave of Beowulf clusters will affect all of supercomputing, not just forecasting?
There are four boxes used in defense of liberty: soap, ballot, jury, ammo. Use in that order.
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
How did you come to be the project's chief designer? I'm curious to know the background of anyone who gets to work on such an interesting project.
Got Rhinos?
Are other government agencies going to duplicate your work? Have they already? If so, for what purposes?
www.alarmist.org
can you give us some information about what exactly is in this cluster? what alphas, etc?
Are you ever tempted to swipe some of that computer time for, say, raytracing (povray clusters well, since tasks can be divided by line or frame, and the input is fairly granular...) or any other sort of task?
Secondly, what kind of cooling do you use to keep all those CPUs happy?
---
Play Six Pack Man. I
I built a Beowulf-style cluster this past semester in college for independent study. One of the biggest hurdles we had was picking out a message passing interface such as MPI or PVM. Configuring across multiple platforms was then even worse (we had a mixture of old Intels, SunSparcs and IBM RS/6000's). What do you see in the future for these interfaces in terms of setup and usage, and will cross-platform clusters become easier to install and configure in the future?
Some people take their .sig way too seriously
It is known that a cluster can be put together for a relatively low cost, at least when compared to supercomputers. Do you see cost as a reason this magnitude of computing power hasn't been available to much of the commercial sector? Do you think that these current successful low-cost implementations will speed development of any commercial applications for tools such as these? And what commercial situation(s), if any, do you see a cluster being applied to?
Ok, a two parter:
As I understood it, weather models are a fairly hard thing to parallelize (how the hell do you spell that?) because of the interdependence of pieces of the model. This would seem to me to make a Beowulf cluster a tough choice, as its inter-CPU bandwidth is pretty low, right? And that's why I thought most weather prediction places chose high end super-computers, because of their custom and expensive inter-CPU I/O?
Second part: Is weather prediction getting any better? Everything I've read about dynamic systems says that prediction past a certain level of detail or timeframe is impossible. Is that true?
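The interdependence raised in the first part of this question can be illustrated with a toy sketch. It is not a weather model: it assumes a 1-D grid where each cell is repeatedly replaced by the average of itself and its two neighbours, and the names (`step`, `run_single`, `run_split`) are hypothetical. When the grid is split across simulated nodes, each node needs its neighbours' edge values on every step, and that per-step exchange is the traffic the interconnect has to carry.

```python
def step(cells, left, right):
    """One smoothing step; `left` and `right` are the ghost values at the edges."""
    padded = [left] + cells + [right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

def run_single(cells, n_steps):
    """Whole grid on one node; the outer boundary just repeats the edge cell."""
    for _ in range(n_steps):
        cells = step(cells, cells[0], cells[-1])
    return cells

def run_split(cells, n_parts, n_steps):
    """Same computation with the grid cut into n_parts equal pieces.

    Returns the final grid and the number of edge values that had to be
    passed between neighbouring pieces.
    """
    if len(cells) % n_parts != 0:
        raise ValueError("grid size must be divisible by n_parts in this sketch")
    size = len(cells) // n_parts
    parts = [cells[i * size:(i + 1) * size] for i in range(n_parts)]
    exchanged = 0
    for _ in range(n_steps):
        new_parts = []
        for p, part in enumerate(parts):
            if p > 0:
                left = parts[p - 1][-1]   # value received from the left neighbour
                exchanged += 1
            else:
                left = part[0]            # outer boundary, nothing to receive
            if p < n_parts - 1:
                right = parts[p + 1][0]   # value received from the right neighbour
                exchanged += 1
            else:
                right = part[-1]
            new_parts.append(step(part, left, right))
        parts = new_parts
    return [v for part in parts for v in part], exchanged
```

The split run reproduces the single-node result exactly, but at the price of `2 * (n_parts - 1)` exchanged values on every step. In a real 3-D model the exchanged data are whole faces of each subdomain rather than single numbers, which is why interconnect bandwidth and latency matter far more here than for independent Monte Carlo style runs.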
Disclaimer: I might be dumb.
Hotnutz.com - Funny
I am curious as to whether (no pun intended...:)) or not you have ever done any testing to see if a distributed.net type environment would be useful for your type of work?
It seems to me that there are more than a few people who are willing to donate spare cpu cycles for various projects. At a minimum, you could concentrate on the client side binaries and not worry as much about hardware issues.
In the immortal words of Socrates, who said; 'I drank what?'
A recent Miami Herald article talked about the use of an IBM RS/6000SP to process weather data. It's close to 30 times as fast as the previous machine. Though I'm curious as to how this machine compares to the NOAA supercomputer, I'm really interested in how much better predictions can get with systems like the one at NOAA. How much (statistical) confidence is in current weather prediction over 1 month? 3 months? 1 year? How much is the NOAA system expected to improve weather forecasting?
BTW, As a Florida resident, accurate forecasting of hurricane paths could save millions of dollars. Thanks for your time. Kwan
Link to Miami Herald article from May 21, 2000
Congratulations on NOAA, BTW. As a former UVA CS student, it's nice to see your work with Legion and beowulf systems continue to succeed. For people outside of the clustering community, take a look at http://legion.virginia.edu .
Recently I have been seeing the beginnings of business adoption of beowulf style systems, as they are finally realizing the benefits which the scientific community has been enjoying for years ;). Up to now, however, most of the tools for beowulf work, such as schedulers, message passing APIs, administrative tools, and file systems have been geared towards scientific problems, often lacking such features as fault tolerance or security. Has there been an anti-business bias within the beowulf community? And, if so, what do you think will be needed to change it?
And, as an unrelated question, if you could see one advance in beowulf technology happen tomorrow, what would it be?
Before deciding on a Beowulf cluster, what different options did you explore (Cray? IBM?), and what motivated you to choose the Beowulf System?
Additionally, to what would you compare the system that you are planning to build, as far as computing power is concerned?
Thanks,
VVulfe
Having built a few small ones, I got to know quite a bit about Linux clusters, and about programming for them. Therefore, this question has nothing to do with clusters.
What was the biggest 'WTF was I thinking' on this project? I'd imagine there was a fair amount of lateral space allowed to the designers, and freedom to design also means freedom to screw up.
.sig: Now legally binding!
Seriously, what was the most challenging maintenance task you had to undertake? Do you anticipate a trade-off point where the number of machines makes maintenance impossible? Do you have any pearls of wisdom for those of us just involved in the initial design of such clusters, so that maintaining them in the future is less painful?
I've noticed that most Beowulf clusters (although admittedly, I don't know about the NOAA one) tend to use standard desktop mini tower cases. Is there any particular reason for doing so, as opposed to going for rackmount servers? I'd expect the latter to provide a much more space efficient system, and in my experience, rack mounted cases tend to have better cooling than their desktop equivalents. This should be particularly noticeable when using a large number of machines in close proximity. Is it purely a cost issue?
"The invisible and the non-existent look very much alike." -- Delos B. McKown
First off, from what I have gathered, it was not clear if your background was weather or not, so I am hoping it is. Here are a couple of questions:
1) Having just graduated with a BS in Atmospheric Sciences, I have had a chance to take numerical weather prediction courses over the last five years. With this new influx of processing power, where do you see numerical models going in the future?
2) Somewhat related to 1), with mesoscale models becoming more popular (MM5 quickly springs to mind), where do you see the balance of processor time going for these models: the ability to get a model out faster, or computing more variables to provide a more accurate forecast at the smaller scale?
3) Not knowing too much about the origins of these models, I was interested to find that a person could get the source to the MM5 and modify it as they see fit. Will models developed in the future follow this same trend? With powerful computers becoming affordable, it would not be that difficult for a university to build one and run a particular model for their area (I believe that Ohio State is doing it, again, with the MM5)?
Thanks!
Bryan R.
The price of freedom is eternal vigilance, or $12.50 as seen on eBay.....
Whatever your answer, I think it's fair to say that there is something about this system, which uses an open-source clustering technology, built on top of an open-source operating system, which made it best for your needs; maybe it was the reliability, or the ability to modify it as needed, or maybe just the lower dollar cost to your department.
My question then, is this: have you given any thought to how you can help advance open source software, to give back to the community that created this tool? Getting the word out that the U.S. Government uses Linux for its cutting-edge weather forecasting tool would be an enormous PR win for the folks that still have trouble convincing their management that OSS software can be trusted for "real work." I'm not suggesting putting a picture of 'Tux' on every weather forecast, (although that would be kinda cute,) but it would be great if NOAA press releases about the project gave at least passing mention to the fact that the project will be benefitting from open source software.
I realize this is not something you would normally do for, say a Cray or IBM, but those are commercial enterprises, with their own PR budgets; they don't need your help to get their word out. OSS needs all the help it can get, so that future projects like yours can continue to reap the benefits.
As I understand it, MPI/PVM (which most [all?] Beowulf clusters consist of) require special programming techniques. That is, you can't just take your number crunching app, put it on the cluster and type "make" for it to work.
So who do you have doing the programming for this thing? Did they take special training, or is it easy to pick up for any programmer?
Finally, given the possible difficulty (and speciality) of using the above, has anyone considered using DIPC?
--
Have Exchange users? Want to run Linux? Can't afford OpenMail?
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
Besides that, best of luck, and I can't wait to see the final product. ;^)
-legolas
i've looked at love from both sides now. from win and lose, and still somehow...
Why did you choose Alpha processors for the individual nodes? Why not something cheaper with more nodes, or something more expensive with fewer nodes? What other configurations did you consider, and why weren't they as good?