Is There a Use for a Public Beowulf?
Anonymous Coward asks: "If the average Slashdot reader had access to a Beowulf cluster, what would they use it for? Everyone seems to think that Beowulf clusters are fairly interesting, but does anyone have any particular job they would assign to one? If someone were to create a publicly accessible Beowulf cluster, what would you do with it? Is there even a demand for such a beast?" Now this would be a neat hack, but the logistics behind running such a thing would be immense. And even if something like this isn't needed now, that may not be so in the future. Something like this might be a great tool for the novice astronomer in the neighborhood ... or aspiring mathematicians in high school.
you scored? was she hot?
be sure to kiss linux's ass a lot or else it won't get posted.
Having some experience with a Beowulf: the type of program my company is looking to run is molecular dynamics simulations, which are very CPU-intensive. Beowulf clusters are cheap and, with proper programming, can be fast. But one must not make the naive assumption that a 32-node cluster will make a program go 32 times faster. That is never the case, as Amdahl's law will tell you. Having put together a small 'wulf, I can say the cost is extremely low. All you need is a motherboard, CPU, RAM, NIC, and possibly a HD (not really needed). So for about $400 to $500 you can make a node. For about $600, you can make your "world node". Switches are cheap also. The idea of a public 'wulf is great, but the admin would definitely be a chore. I am sure you could sell shell accounts for a relatively cheap monthly fee. I would tend to think that most anyone with an interest in this type of computing might want to buy a "subscription", or you could sell "time" on the cluster at a price per CPU hour used. It would most likely be a niche market, but it could make someone some money. There is a wonderful book, "How to Build a Beowulf" from MIT Press, that explains very well how to build one and what to run on it [ISBN 0-262-69218-X]. Just my 2 cents..... Mike www.avxm Partnership
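Mike's point about Amdahl's law can be made concrete. If a fraction p of a program's runtime parallelizes perfectly and the rest stays serial, the best possible speedup on N nodes is 1 / ((1 - p) + p/N). The sketch below just evaluates that formula. It assumes perfect scaling of the parallel part and ignores network overhead, so a real cluster will do worse.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` processors when only `parallel_fraction`
    of the runtime can be parallelized (Amdahl's law).
    Ignores communication overhead between nodes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)


if __name__ == "__main__":
    for p in (0.50, 0.90, 0.95, 0.99):
        print(f"{p:.0%} parallel on 32 nodes -> {amdahl_speedup(p, 32):.1f}x")
```

For example, a program that is 95% parallelizable tops out at roughly 12.5x on 32 nodes, not 32x.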
I've been working on building a free, public-access Beowulf cluster for some time now. It's only a 32-node P133 system, but it will be free.
I'll be submitting a post to slashdot as soon as it's up, but you can keep current by looking at www.ultrax.co.uk.
cheers,
Tim
--
tim@ultrax.co.uk
"If the average Slashdot reader had access to a Beowulf cluster..." :)
they'd want to get together with other Slashdotters with Beowulf clusters and make a Beowulf cluster out of them
I see even classic Slashdot is pretty much unusable on dial-up now.
This would be expensive, but it could work out. Figure at least $2500 for a decent node, say 128MB of RAM or more and a PIII-600 or higher, although you could get off cheap by using Celerons (but there's not a whole lot of L2 cache there). So if you want a huge cluster, let's say 300 nodes, that's $750,000. Now add in racks, switches, etc., and we're talking over $1,000,000. Of course, if the programs just run on the cluster and don't interact with the outside world, as little as a T1 could be sufficient. You also have to figure in monthly administration costs.
Okay, so you get funding, and let's say it goes over big and you have 50 paying subscribers (we'll shoot small), each paying $10/mo. That's $500/mo. total. That'll take a long time to pay off the investment. I do think, though, that it could work with investment plus some paying subscribers, who might get priority over non-paying users.
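The arithmetic in the two paragraphs above fits in a small calculator. The $250,000 for racks and switches is an assumed figure, chosen only to reach the "over $1,000,000" estimate, and the sketch ignores bandwidth and administration costs unless you pass them in as monthly_costs.

```python
def cluster_cost(nodes: int, cost_per_node: float, infrastructure: float = 0.0) -> float:
    """Up-front hardware cost: identical nodes plus shared gear (racks, switches)."""
    return nodes * cost_per_node + infrastructure


def payback_months(investment: float, subscribers: int, monthly_fee: float,
                   monthly_costs: float = 0.0) -> float:
    """Months until subscription income covers the investment.
    Returns infinity if income never exceeds running costs."""
    net_per_month = subscribers * monthly_fee - monthly_costs
    if net_per_month <= 0:
        return float("inf")
    return investment / net_per_month


if __name__ == "__main__":
    nodes_only = cluster_cost(300, 2500)                     # 300 nodes at $2500
    total = cluster_cost(300, 2500, infrastructure=250_000)  # assumed racks/switches
    months = payback_months(total, subscribers=50, monthly_fee=10)
    print(f"nodes: ${nodes_only:,.0f}, total: ${total:,.0f}")
    print(f"payback: {months:,.0f} months (~{months / 12:.0f} years)")
```

With 50 subscribers at $10/month, a $1,000,000 cluster takes 2,000 months (about 167 years) to pay off before any running costs are counted.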
If I had a college student make the cases, I could see probably 500MHz, 128MB, ~3GB nodes (100Mbit) for about $400 each.
So a 10-node system would be about $5k
and a 20-node one would be $10k
(upper limits on pricing)
Hmmmm.
Nathan Brazil?
I would probably use it for generating images or something.. I have no clue what the average
Kenny
All Ford, All The Time
FordTalk
This means that it must have computation-intensive segments which need not be run in a sequential fashion. Anything with large amounts of number crunching is usually a candidate.
Second, you must rewrite the application using a parallel computing interface to take advantage of this. I've seen it done, and it's not exactly trivial.
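To illustrate the kind of rewrite meant here, take the classic first parallel program: computing pi by numerically integrating 4/(1+x^2) over [0, 1]. Each "rank" sums an interleaved share of the intervals independently, and only the final addition combines the pieces. The sketch below simulates the ranks one after another in a single Python process. On a real cluster each call to partial_pi would run on its own node, and the final sum would be an MPI reduce (or the PVM equivalent). The function names are made up for this example.

```python
import math


def partial_pi(rank: int, size: int, n: int) -> float:
    """Midpoint-rule contribution to the integral of 4/(1+x^2) on [0, 1]
    from the intervals assigned to `rank` (intervals rank, rank+size, ...)."""
    h = 1.0 / n
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return total * h


def parallel_pi(size: int, n: int = 1_000_000) -> float:
    """Combine every rank's partial result. Each rank's work is independent
    of the others, which is what makes the problem parallelizable; only
    this final sum (the "reduce" step) is serial."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    estimate = parallel_pi(size=16)
    print(f"pi ~ {estimate:.10f} (error {abs(estimate - math.pi):.2e})")
```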
Given all of this, however, I think a public Beowulf could be a wonderful way to introduce high-school students to parallel computing early on. Most high schools could not afford to dedicate even 16 machines to such a task, so having a public cluster available would make it easier. If there were an instructor who understood how to teach it, this could give students interested in scientific computing a great head start.
Having scored at a HS programming contest just yesterday, however, I realize that not all programming instructors are all that clued.
- Mathematical Simulations
- Video/Image processing
- Educational tool
That's all I can really come up with off the top of my head... In my opinion, maybe such a system (or project) shouldn't focus so much on supercomputing, but on computing in general: a system that provides shell accounts with full access to languages, compilers, etc. Perhaps this system could give students easy access to programming facilities, UNIX mail, etc.
The problem is, most of the current desktop machines are powerful enough to eliminate the need for time-sharing computer power. (At least, for what a "free" public computer would be used for)
*shrug* Just my opinion...
I actually do a lot of numerical modeling, and I do some rather cheesy little things to take advantage of as many CPUs as I can find. But because I'm not a CS student, I don't get access to many fast machines. I've been lucky enough that I can break up my problem and fit the data onto a floppy disk, drag it to the computer lab late on a Friday night, beg for 6 machines and let it run a few hours. But to be honest, it'd be very nice to get my processing time down to minutes, not hours. Right now, my data set takes about 20 hours to process on a Pentium III 550. I've profiled it a lot and that's about the best I can do.
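Splitting a data set by hand across 6 lab machines, as described above, comes down to dividing the records into near-equal contiguous chunks. Here is a minimal sketch. It assumes the records are independent, so each chunk can be processed on its own machine, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], k: int) -> List[Sequence[T]]:
    """Split `items` into `k` contiguous chunks whose sizes differ by at most one."""
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(len(items), k)
    chunks: List[Sequence[T]] = []
    start = 0
    for i in range(k):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    records = list(range(20))
    for machine, chunk in enumerate(split_into_chunks(records, 6), start=1):
        print(f"machine {machine}: {len(chunk)} records")
```

With 20 records and 6 machines, the chunk sizes come out as 4, 4, 3, 3, 3, 3. If the 20-hour run divided perfectly across identical machines, each of the 6 would need a bit over 3 hours.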
You're probably right if you go for unrestrained scheduling. But it is true that scheduling for computer time is about that strict on the resources I've been exposed to here at the U. (Mmmm, Cray T3E)
Perhaps if some form of subscription service were implemented to help subsidize the costs and regulate usage, this would be more feasible. Something like "$20/month for x hours of time." And there are plenty of batch scheduler systems out there that can queue your job and run it so you don't have to actually be up at 4am on a Monday morning... You could probably even write one that emails you the results when the job is done, which would be spiffy. I know I wouldn't mind spending that little to have access to that kind of computing power. Of course, a mechanism to get free or subsidized time for those who can't afford the subscription would also be necessary.
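The "$20/month for x hours" idea could be enforced by the queue itself: each subscriber has an allowance of CPU-hours, and a job is accepted only if it fits in what remains. The sketch below is a hypothetical minimal version (plain FIFO, charged at submission time). It is not modeled on any particular batch system, and it leaves out actual job launching and the results e-mail.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class Job:
    user: str
    name: str
    cpu_hours: float  # requested CPU time


@dataclass
class QuotaQueue:
    """FIFO job queue that charges each job against its user's monthly allowance."""
    allowance: Dict[str, float]  # user -> CPU-hours left this month
    queue: Deque[Job] = field(default_factory=deque)

    def submit(self, job: Job) -> bool:
        """Accept the job and deduct its hours, or reject it if the user lacks time."""
        remaining = self.allowance.get(job.user, 0.0)
        if job.cpu_hours > remaining:
            return False  # rejected: not enough subscribed time left
        self.allowance[job.user] = remaining - job.cpu_hours
        self.queue.append(job)
        return True

    def next_job(self) -> Optional[Job]:
        """Pop the oldest accepted job, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Charging at submission keeps a user from queueing more work than they paid for. A real system would probably refund unused time when a job finishes early. Free or subsidized accounts could simply be given an allowance without payment.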
--
News for Geeks in Austin, TX
First up, you need a stock of (let's say 32) reasonably powerful PCs that already earn revenue, or whose cost is already covered by some funding. They would typically be dual-boot WinDoze/Linux boxes: 400MHz, 128MB RAM, 6GB for Win, 12GB for Linux.
Examples I can think of or have seen are:
- those in the basement of a university library where students type end-of-semester papers
- those used for general office/secretarial work in a company
- those in an internet cafe.
You get the idea.
These PCs would be in use from, say, 9h00 to 21h00 running WinDoze. Even a careless or ill-intentioned user would probably be incapable of damaging the Linux partitions from within Billy Boy's "OS". Then, at 21h00, the night shift takes over: the machines are rebooted into Linux and become nodes in a Beowulf cluster.
There's a job-queueing server behind the firewall; users connect via FTP and leave their jobs in the incoming directory at any time. During the night, the job-queueing server submits these jobs to the Beowulf. Users collect their results from the outgoing directory of the job-queueing server.
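Here is a rough sketch of one pass of the night-shift job server. It assumes a job is a single file, and that some function (here a placeholder parameter called run_job, standing in for whatever actually submits work to the cluster) turns a job's bytes into result bytes. The 21h00 to 9h00 window comes from the post. Everything else, including the names, is illustrative.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Callable, List

NIGHT_START = 21  # 21h00: PCs reboot into Linux and join the cluster
NIGHT_END = 9     # 9h00: PCs go back to daytime Windows use


def in_night_shift(now: datetime) -> bool:
    """True while the cluster is available (the window wraps past midnight)."""
    return now.hour >= NIGHT_START or now.hour < NIGHT_END


def drain_incoming(incoming: Path, outgoing: Path,
                   run_job: Callable[[bytes], bytes]) -> List[str]:
    """Run each job file in `incoming`, write its result to `outgoing`
    under the same name, then delete the job from `incoming`."""
    outgoing.mkdir(parents=True, exist_ok=True)
    done: List[str] = []
    for job_file in sorted(incoming.iterdir()):
        if not job_file.is_file():
            continue
        result = run_job(job_file.read_bytes())
        (outgoing / job_file.name).write_bytes(result)
        job_file.unlink()
        done.append(job_file.name)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        incoming, outgoing = Path(root, "incoming"), Path(root, "outgoing")
        incoming.mkdir()
        (incoming / "job1.txt").write_bytes(b"hello cluster")
        if in_night_shift(datetime(2000, 1, 1, 23, 0)):
            # Placeholder for "submit to the Beowulf": just upper-case the input.
            print(drain_incoming(incoming, outgoing, lambda data: data.upper()))
            print((outgoing / "job1.txt").read_bytes())
```

This sketch skips one thing a real server would need. FTP uploads arrive gradually, so the queue should not pick up a file that is still being written. A common convention is to upload under a temporary name and rename the file into place when the upload is complete.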
I'm sure there are many, many establishments with PCs sitting idle most of the time. I work in a company where the desktop PCs are used 8h00 - 20h00 Monday to Friday. That leaves 12 hours per weeknight (60 hours Monday to Friday), plus all 48 hours of the weekend, i.e. 108 hours per week available.
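The 108-hour figure generalizes to any office schedule. Here is a small helper. It assumes the same busy hours on every working day and no use at all on the remaining days of the week.

```python
def idle_hours_per_week(day_start: int, day_end: int, workdays: int = 5) -> int:
    """Hours per week the PCs sit unused, given daily busy hours [day_start, day_end)."""
    busy_per_workday = day_end - day_start
    weekday_idle = workdays * (24 - busy_per_workday)
    weekend_idle = (7 - workdays) * 24
    return weekday_idle + weekend_idle


if __name__ == "__main__":
    idle = idle_hours_per_week(8, 20)  # 8h00 - 20h00, Monday to Friday
    print(f"{idle} idle hours/week ({idle / 168:.0%} of the week)")
```

For the 8h00 - 20h00 office, that gives 108 hours, about 64% of the 168-hour week.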
The point is, these scenarios take PCs that are already paid for and use time that is usually lost. The only extra cost is the time it takes to set up the Beowulf cluster and implement the job-queueing server. There might not be any need to have an operator there during Beowulf operations, just during the boot phase.
You'd need to develop some sort of queueing system so that users could submit their jobs, specifying the number of processors, length of calculation, etc., and then the batch system could run the jobs optimally. At least that's how they do it on the supercomputers I have used.
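As a toy model of what such a batch system does with the processor count and time estimate, the sketch below computes start times under strict first-come-first-served scheduling on a fixed pool of processors. The names are hypothetical, and real schedulers are smarter (see the note after the code).

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BatchJob:
    name: str
    procs: int    # processors requested
    hours: float  # requested wall-clock time


def fifo_schedule(jobs: List[BatchJob], total_procs: int) -> Dict[str, float]:
    """Strict first-come-first-served: a job starts once enough processors are
    free and every job ahead of it has started. Returns hours-from-now start times."""
    free = total_procs
    now = 0.0
    running: List[Tuple[float, int]] = []  # min-heap of (finish_time, procs)
    starts: Dict[str, float] = {}
    for job in jobs:
        if job.procs > total_procs:
            raise ValueError(f"{job.name} asks for more processors than exist")
        # Wait for the earliest-finishing jobs until this one fits.
        while free < job.procs:
            finish, procs = heapq.heappop(running)
            now = max(now, finish)
            free += procs
        starts[job.name] = now
        free -= job.procs
        heapq.heappush(running, (now + job.hours, job.procs))
    return starts


if __name__ == "__main__":
    queue = [BatchJob("A", 16, 4), BatchJob("B", 32, 1), BatchJob("C", 8, 2)]
    print(fifo_schedule(queue, total_procs=32))
```

Take 32 processors and jobs A (16 procs, 4 h), B (32 procs, 1 h) and C (8 procs, 2 h). A starts at 0, B at 4 and C at 5. Notice that C could have run at time 0 next to A and still finished before B needed the whole machine. Production batch systems use "backfilling" to catch exactly that case, which strict FIFO misses.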
What would be neat is if a LUG would team up with a school district and develop a system at a particular high school that would be a resource for the whole district to use for science projects or whatever. The LUG could help develop a curriculum that could be taught in workshops at the different schools.
Scuttlemonkey is a troll
If 10 people all want to use it at the same time, what happens? They're all fighting for resources (RAM/CPU). May as well do it on your desktop. Or am I missing something?
A public-access Beowulf would need either a LOT of nodes or very strict user scheduling (user A gets 2 hours on Tuesday, user B gets 4 hours on Wednesday, etc.)
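The "very strict user scheduling" option is essentially a booking calendar for the whole machine. Here is a minimal sketch. It assumes each reservation takes the entire cluster and that intervals are half-open, so a slot ending at 11h00 doesn't clash with one starting at 11h00. The class name is made up.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


class ReservationBook:
    """Whole-cluster bookings: at most one user holds the cluster at any moment."""

    def __init__(self) -> None:
        self.slots: List[Tuple[datetime, datetime, str]] = []  # (start, end, user)

    def book(self, user: str, start: datetime, hours: float) -> bool:
        """Reserve [start, start + hours) for `user`; refuse if it overlaps a booking."""
        end = start + timedelta(hours=hours)
        for s, e, _ in self.slots:
            if start < e and s < end:  # half-open intervals overlap
                return False
        self.slots.append((start, end, user))
        self.slots.sort()
        return True

    def holder(self, when: datetime) -> Optional[str]:
        """Return the user holding the cluster at `when`, or None if it is free."""
        for s, e, user in self.slots:
            if s <= when < e:
                return user
        return None


if __name__ == "__main__":
    book = ReservationBook()
    tuesday_9am = datetime(2000, 6, 6, 9, 0)
    print(book.book("userA", tuesday_9am, 2))                       # True
    print(book.book("userB", tuesday_9am + timedelta(hours=1), 4))  # False: clashes
    print(book.book("userB", tuesday_9am + timedelta(days=1), 4))   # True
```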
Well, in addition to being a computer geek, I love chemistry. Problem is, most chemical simulation software (as opposed to pretty opengl visualization software) is either very expensive or very memory/cpu cycle hungry (model water on your PC, no problem, model a 40,000 carbon biopolymer, watch your Athlon go up in smoke along with your RAM... ;^) ), or (very frequently) both.
But if some entity (.gov or .edu) had an open access beowulf with things like NAMD, Gaussian, Molpac, Moldy, (etc etc etc) loaded on it, that would allow the chemically-inclined members of the populace to actually get real data right now instead of having to get a PhD in order to have access to a {Beowulf | Cray | whatever}.
Another option immediately presents itself: massively parallelized POV-Ray. :^) For making pretty pictures of the molecule you just spent 5,000 CPU-hours modeling. Or 3D renderings of Natalie Portman's ass covered in hot grits, if you want to skip the chemistry bit...
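Massively parallel POV-Ray mostly comes down to giving each node a horizontal band of the image. POV-Ray's command line accepts start and end rows for partial renders (+SR and +ER, with rows numbered from 1), so the job-splitting step could just generate one command per node. The scene file name, output naming, and PNG output (+FN) are assumptions for this example. Check the flag spellings against your POV-Ray version's documentation. The bands still have to be stitched back together afterwards.

```python
from typing import List


def povray_commands(scene: str, width: int, height: int, nodes: int) -> List[str]:
    """One POV-Ray command line per node, each rendering a horizontal band of
    rows (POV-Ray numbers rows from 1) to its own PNG file."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    base, extra = divmod(height, nodes)
    commands: List[str] = []
    start = 1
    for node in range(nodes):
        rows = base + (1 if node < extra else 0)
        if rows == 0:
            break  # more nodes than image rows: the rest sit idle
        end = start + rows - 1
        commands.append(
            f"povray +I{scene} +W{width} +H{height} "
            f"+SR{start} +ER{end} +FN +Opart{node:03d}.png"
        )
        start = end + 1
    return commands


if __name__ == "__main__":
    for cmd in povray_commands("molecule.pov", 640, 480, nodes=4):
        print(cmd)
```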