NSF awards $500,000 grant for Beowulf Cluster
ragnar! writes "National Science Foundation (NSF) awarded $500,000 to support a new parallel computing facility for Bartol. The "major research infrastructure" (MRI) grant will support a parallel system based on 100 linked processors, each of which will run at speeds up to 600 megahertz, connected by fast Ethernet hardware - very similar to the Avalon-Beowulf Cluster, developed by the Los Alamos Center for Nonlinear Studies and Goddard Space Flight Center. "
I suppose I knew it would be risky making a comment like this on a thread that actually _was_ about beowulf clustering. Just to clarify things, I intended it to be humorous - hence the part about gaining back my lost respect via first post :)
-Denor
I imagine a BSD variant would be best - still open source, but the TCP/IP stack is faster, so you'd probably lose less in inter-processor communication.
If you're running a private gigabit-class network (GigE, Myrinet, Giganet, etc.) and have a separate control network (typically Fast Ethernet), there's no reason to run TCP/IP over the high-speed network. In that case, you could bypass the TCP/IP stack entirely and have the message passing system (typically an MPI implementation) talk directly to the hardware -- the "user space"/"OS bypass" approach. This is what Myricom's GM and the various VIA implementations let you do. Most of the larger Beowulf cluster installations are going with something like this.
I must admit that I find it very surprising that they're going to the trouble of buying fast DEC Alphas and then connecting them with something as pokey as Fast Ethernet. I hope their RMHD and other calculations are pretty close to embarrassingly parallel (i.e. almost no IPC), or the network will definitely end up being a performance bottleneck.
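To put rough numbers on the bottleneck worry above, here is a minimal back-of-the-envelope sketch in Python. It assumes a very simple model in which each compute step is followed by one message exchange and nothing overlaps. The function names, the message size, and the compute time are made up for illustration; only the nominal link speeds (100 Mbit/s for Fast Ethernet, 1000 Mbit/s for gigabit-class links) are real figures.

```python
def comm_seconds(message_bytes, link_mbit_per_s, latency_s=0.0):
    """Time to push one message over a link, ignoring protocol overhead."""
    bits = message_bytes * 8
    return latency_s + bits / (link_mbit_per_s * 1_000_000)


def parallel_efficiency(compute_s, message_bytes, link_mbit_per_s, latency_s=0.0):
    """Fraction of each step spent computing rather than waiting on the network."""
    comm = comm_seconds(message_bytes, link_mbit_per_s, latency_s)
    return compute_s / (compute_s + comm)


if __name__ == "__main__":
    # Hypothetical workload: 80 ms of computation, then a 1 MB exchange.
    for name, mbit in (("Fast Ethernet", 100), ("gigabit-class", 1000)):
        eff = parallel_efficiency(0.08, 1_000_000, mbit)
        print(f"{name}: {eff:.0%} of each step is useful work")
```

Under these assumed numbers the 1 MB exchange takes 0.08 s on Fast Ethernet and 0.008 s on a gigabit link, so the same job spends half its time waiting in the first case and under a tenth in the second. A code that exchanges almost nothing (the embarrassingly parallel case) would barely notice the difference.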
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
If you want to check out more, check out http://www.capsl.udel.edu. The information on the web page isn't too organized, and all the info on the EARTH page is bound to be pretty old, but you might be able to get a decent idea.
Well, I sort of admin a Beowulf at work... it consists of 8 dual PII-450s with 512MB RAM apiece...
It came pre-made, with a slightly-modified version of RH 5.2 installed - basically just an SMP kernel and some utilities and libraries. You don't really need any special software, except for PVM or MPI (we use MPI). The MPI distribution we use is LAM 6.2 - I'm sure you can hunt it down if you look around a bit (try google.com/linux).
I'm going to eventually set up another small cluster for testing and development purposes, once we get the first piece of software into operation (or maybe before, if I don't have a whole lot of stuff to do) - I'm planning on setting it up with Debian and a much more sensible layout (share /usr, instead of each machine with its own /usr - or maybe not, depending on what seems to be the best way to do it).
I wouldn't recommend RedHat, tho - adminning on it is not much fun. (no apt!)
We are a cash-strapped Beowulf group doing legit research (we actually made a little bit of knot (ap math) history). Everyone seems to like us and support what we are doing, but no one is putting their money where their mouths are BALDRIC. We are doing this in the true spirit of Beowulf: taking old surplus hardware from all around a university and putting it to good use. All of the research findings must be public in order for anyone to use it; we have a very open source attitude to the cluster. We currently have 8 nodes up and running with 7 more waiting to get in 'on the action'. But our problem is that we have *no* funding. The biggest support we've gotten is a tiny room (I'm talking 15' x 15' at the most) from our Computer Science department.
My question is: How can we get this kind of support??
The main result of this is that only the Government buys supercomputers, and nowadays they're mostly a boondoggle. SGI is currently trying to sell Cray, with limited success. Even Deep Blue is a cluster, made of stock CPUs on custom boards with additional custom hardware. The era of the classic supercomputer, with its huge mat of hand-wired connections, is over.
LAM can be found at http://www.mpi.nd.edu/lam/. It was originally written at the Ohio Supercomputing Center. It is currently being maintained by the Laboratory for Scientific Computing at the University of Notre Dame. By the way, we just released version 6.3 of LAM. If you're looking for a good way to see how LAM is communicating, check out XMPI, a graphical interface to LAM (as well as SGI's MPI implementation). LAM is available as a tarball, i386 and SRC RPMS, and should be available in the Debian Potato archives. BTW - While you're visiting the LSC's pages, don't forget to see the world famous domecam.
I want a Beowulf cluster of THESE THINGS!!
:-)
Sorry, just couldn't resist... bye-bye karma
"Software is like sex- the best is for free"
-Linus Torvalds
This source of funding isn't that unusual -- the University of Virginia Centurion cluster was funded by two $450,000 MRI grants.
Almost no one uses Linda -- what would you think UDel does?
Most people with systems like this use a batch queue system like PBS and message passing libraries like MPI.
Beowulf-like clusters are becoming popular, and Linux is
often used, but it has to compete with the large,
good old Unix suppliers. Take a look at:
http://www.fysik.dtu.dk/CAMP/valhal.html
Here you'll find a similar project, and even an
explanation of why they didn't choose Linux.
Seems like the commercial unices are running
out of time.
After reading the article, I couldn't help but wonder what type of software they would use to keep the processing happening smoothly. Parallel processing in the large such as this is a whole area of study on its own. I would assume they would implement some sort of process control software that would model the virtual OS Linda, but I don't see any reference in the article as to how they are handling this.
Great! Let's just hope none of them have been listening to the ACs here on slashdot or they'll try to build it out of iMacs running linux or palm pilots....
I'm sorry folks, but I'm just not creative enough to come up with a way to somehow make a beowulf cluster of these. I apologize for not being able to contribute to the obligatory beowulf cluster thread, and hope that I can earn back all of your respect by getting a first post somehow.
-Denor
... these machines ARE massively parallel supercomputers, if you build them big enough and you use the best commodity networking (like myrinet).
People making comments about the amount of hardware/support that can be had for $500,000 should remember the realities of grant funding at a University in this country:
So, a $500k grant is about $250k after ICR. Then say you fund 2 people at $35k/year to help build and run it. Now you're down to just $110k for hardware. Even with a "best case" run of the numbers and cheap people, you're still not going to have more than $150k for hardware in this grant.
Also keep in mind that this grant's funding is spread over 3 years.
100 600MHz PCs is going to run about $100k even before you start buying networking equipment, backup equipment and power supply/protection equipment.
In all likelihood, Bartol is going to need additional funding (possibly x% matching money from the state or other similar grants) to make this a reality.
Just thought people should know that when you get a $500,000 grant, you don't just get a check for $500,000 to blow on hardware.
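For anyone who wants to rerun the numbers above with their own guesses, here is a minimal sketch in Python. The function names are made up for illustration, and two inputs are inferred rather than stated in the comment: an ICR (overhead) rate of 50%, which is what turns $500k into $250k, and two years of salary for the two staff, which is what it takes to land on the $110k figure. The $1,000 per PC comes from the "100 PCs for about $100k" estimate.

```python
def hardware_budget(grant, icr_fraction, staff, salary_per_year, years):
    """Money left for hardware after overhead (ICR) and staff salaries."""
    after_icr = grant * (1 - icr_fraction)
    return after_icr - staff * salary_per_year * years


def nodes_affordable(budget, price_per_node):
    """Whole nodes that fit in the budget, before networking, backup and power."""
    return int(budget // price_per_node)


if __name__ == "__main__":
    # Assumed inputs: 50% ICR and 2 years of salary reproduce the $110k figure.
    left = hardware_budget(500_000, 0.5, staff=2, salary_per_year=35_000, years=2)
    print(f"Left for hardware: ${left:,.0f}")
    print(f"PCs at $1,000 each: {nodes_affordable(left, 1_000)}")
```

With those inputs the hardware line comes out at $110,000, enough for about 110 PCs at $1,000 each and nothing left over for networking, backup, or power protection. Paying the same two people for all three years of the grant would leave far less.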
----
Life if possible, art at any cost.
The article wasn't specific as to hardware, but since they said it was "much like the Avalon cluster" they might well be using Alphas, not Pentia. $5k/box would be a good price if they are using the newer Alpha boxes based on the 21264 chip (which is better than twice as fast, on average, as the 21164's used in Avalon, even at the same MHz).
you can't moderate in any discussion you post in. I suppose you could do that with two accounts. Just use each account to moderate up the other one. The implications are rather interesting actually...
--
grappler
Vidi, Vici, Veni
Because any post associated with Beowulf clusters is normally a troll, the moderators are having a hard time moderating this particular topic...
:)
Their first instinct is, "Oh God, it's a beowulf post - moderate down, moderate down." It must be a hard itch for them not to scratch in this case.
From the article it is hard to tell exactly what this money was for. Was it a $500,000 payment for a Beowulf cluster, for Bartol to run the cluster, or for Bartol to build and run the cluster?
If they are purchasing hardware for that amount, they're getting ripped, because I'm thinking all the needed hardware, including the boxes and the networking equipment, can be had for under $150,000 (they could get a nice bulk order discount).
My figure wouldn't include costs like assembly/setup labour and the OS (heh) but half the work is opening the boxes...
Seriously, once the system is going and the scientists have their apps set up, all you need to do is make sure it doesn't overheat. (We are talking about a massive number of x86 systems, here).
Disclaimer: I really don't know what the hell I'm talking about in this post. If someone could inform us what it costs to maintain a project like this, please post.
Boo!