Building a Teraflop Donated Beowulf Cluster
A number of people have written in about the new Teraflop Project, aka Project Übermensch. It's an interesting idea: these folks want to get essentially the equivalent of 10757 AMD K6-2 350s and turn it into a monster Beowulf cluster. In exchange for donating a machine to the project, you get a month of full-bore processing power from your old machine, as well as a for-life e-mail address. They've got an address on the site to send machines to, but how often do you think one of these things is gonna break? I'd hate to sys-admin thousands of old boxen.
traceroute www.teraflop.org...
www.teraflop.org = 208.222.100.35
If you open http://208.222.100.35/, you see an offer for free business website hosting. If you open http://www.teraflop.org/, you see their page. Clearly, they're relying on HTTP/1.1 name-based virtual hosting; they're too cheap to use an IP address per virtual host.
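For anyone who hasn't run into it: HTTP/1.1 requires every request to carry a Host: header, so one server on one IP address can serve many sites and pick which one by name. A request for the bare IP carries a Host the server doesn't recognize and falls through to a default site, which is presumably why the IP shows the hosting ad. Here is a minimal Python sketch of that dispatch. The VHOSTS table, the site names and the pick_site function are hypothetical stand-ins, not the real server's configuration:

```python
# Minimal sketch of name-based virtual hosting: choose a site by the
# HTTP/1.1 Host header. VHOSTS and the site names are hypothetical.
VHOSTS = {
    "www.teraflop.org": "teraflop-site",
}
DEFAULT_SITE = "hosting-ad-site"  # hypothetical default (first) virtual host


def pick_site(raw_request: str) -> str:
    """Return the site that would serve this raw HTTP request."""
    # Header lines sit between the request line and the blank line.
    headers = raw_request.split("\r\n\r\n", 1)[0].split("\r\n")[1:]
    for line in headers:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop any ":port" suffix and normalise case before lookup.
            host = value.strip().split(":", 1)[0].lower()
            return VHOSTS.get(host, DEFAULT_SITE)
    # No Host header at all (HTTP/1.0-style request): use the default site.
    return DEFAULT_SITE


by_name = "GET / HTTP/1.1\r\nHost: www.teraflop.org\r\n\r\n"
by_ip = "GET / HTTP/1.1\r\nHost: 208.222.100.35\r\n\r\n"
print(pick_site(by_name))  # teraflop-site
print(pick_site(by_ip))    # hosting-ad-site
```

Real servers do the same thing with a config table instead of a dict; the point is that the Host header, not the IP address, decides which site you get.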
TERAFLOP.ORG was registered 5 days ago. I'd bet they haven't even paid for the domain yet.
As noted, their point of contact (POC) is a Hotmail address.
Uh, guys???? Do you think this could possibly be a um, SCAM???
Sheeeit. Their total outlay is $0. If even one person is stupid enough to send them a computer, that's a pretty hefty profit margin. Why is this on Slashdot?
Having worked on a system with 12,280 CPUs, I can say right now with confidence that hacking together 10000+ odd Intel systems simply won't work.
For reference, I worked on the QCD Teraflop system: http://www.ccd.bnl.gov/RIKEN_BNL/riken.html
It consumes a substantial amount of power, generates a lot of heat, and has a lot of components, so component reliability is a major issue. And we don't have 10000 disk drives, 10000 network cards, 10000 power supplies, 10000 of everything, the way a cluster of PCs would.
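To put rough numbers on the reliability point: if failures are independent and each part fails at a constant rate (exponential lifetimes), the failure rates simply add, so N identical parts fail about N times as often as one. The sketch below assumes that model; the MTBF figures and the system_mtbf_hours helper are illustrative inventions, not measured values for any real drive, card or power supply:

```python
# Back-of-envelope system MTBF, assuming independent parts with constant
# (exponential) failure rates. All MTBF numbers are illustrative guesses.

def system_mtbf_hours(parts: dict[str, tuple[int, float]]) -> float:
    """parts maps name -> (count, per-part MTBF in hours).

    Failure rates (1/MTBF) of independent exponential parts add, so the
    system MTBF is the reciprocal of the summed rates.
    """
    total_rate = sum(count / mtbf for count, mtbf in parts.values())
    return 1.0 / total_rate


nodes = 10_000
cluster = {
    "disk drive":   (nodes, 300_000.0),  # hypothetical MTBF
    "network card": (nodes, 500_000.0),  # hypothetical MTBF
    "power supply": (nodes, 100_000.0),  # hypothetical MTBF
}
hours = system_mtbf_hours(cluster)
print(f"expect a component failure about every {hours:.1f} hours")
```

With these made-up figures, something breaks roughly every 6.5 hours, even though each individual part is rated to last for decades.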
Keeping all the pieces up and running requires careful engineering, checkpointing of results at intermediate steps (the checkpoints can be BIG), and so on.
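Checkpointing means periodically saving enough state that a job can resume after a node dies instead of starting over. The toy Python sketch below shows the pattern. It assumes a hypothetical iterative computation whose entire state fits in a small JSON file. The names run and NodeFailure are made up for illustration, and real checkpoints on a machine like this are far larger and written in parallel:

```python
import json
import os
import tempfile


class NodeFailure(Exception):
    """Stands in for a hardware fault in this toy example."""


def run(total_steps, ckpt_path, every=10, fail_at=None):
    """Sum 0..total_steps-1, checkpointing every `every` steps.

    On start, resume from ckpt_path if a checkpoint exists. `fail_at`
    simulates a crash just before that step runs.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        if state["step"] == fail_at:
            raise NodeFailure(f"node died at step {state['step']}")
        state["acc"] += state["step"]  # the "real work"
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temp file, then rename, so a crash mid-write
            # never leaves a corrupt checkpoint behind.
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, ckpt_path)
    return state["acc"]


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run(100, ckpt, fail_at=57)
except NodeFailure as err:
    print("crashed:", err)
print("result after restart:", run(100, ckpt))  # resumes from step 50
```

The crash at step 57 costs only the seven steps since the last checkpoint. The trade-off is the interval: checkpoint too often and you spend all your time writing BIG files, too rarely and every failure throws away a lot of work.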
It won't be done with el-cheapo PC hardware.
slashdot.com All the news that isn't.
Let's have a show of hands: how many of you have actually seen a Beowulf cluster? Where I work (Caltech's CACR), we have a 114-node system, and it's pretty damn big. You can't just pack all of the nodes densely: you need to be able to get at the backs for networking, power, etc. It takes up a pretty sizeable amount of floor space and reaches up to the (rather high) ceiling.
Here are a couple of pictures. The one up top is just one side of it.
These guys want to make one that's nearly 100 times as big (10757 nodes versus our 114)! Can you imagine the network cabling nightmare? Not to mention the power requirements. Makes you feel sorry for the technician who has to set it up.
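A quick sanity check on the scale, using the node counts from this thread. WATTS_PER_NODE is an assumed round number for a late-1990s desktop, not a measurement, and the power figure ignores cooling:

```python
# Scale comparison using the node counts quoted in this thread.
# WATTS_PER_NODE is an assumption for illustration, not a measured figure.
CACR_NODES = 114
TERAFLOP_NODES = 10_757
WATTS_PER_NODE = 100  # hypothetical average draw per PC

ratio = TERAFLOP_NODES / CACR_NODES
power_kw = TERAFLOP_NODES * WATTS_PER_NODE / 1000

print(f"{ratio:.0f}x the size of the CACR machine")  # prints: 94x the size of the CACR machine
print(f"about {power_kw:.0f} kW before cooling")      # prints: about 1076 kW before cooling
```

Even at that modest per-box guess, you are looking at roughly a megawatt of load, plus whatever it takes to pump the heat back out of the room.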
The other big problem with a large cluster is network latency. You can reduce its effect by sending larger messages, so the fixed per-message cost is paid less often, but there's still a limit you run into. Just because you make something 100 times bigger doesn't mean it'll be 100 times better.
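A common first-order way to model this is the latency-bandwidth ("alpha-beta") cost. Sending n bytes takes roughly alpha + n/beta, where alpha is the per-message latency and beta is the link bandwidth. Bigger messages amortize alpha, but effective bandwidth never exceeds beta. For the "100 times bigger" point, the sketch below also uses Amdahl's law, simplifying by lumping communication overhead into the serial fraction. The latency, bandwidth and 1% serial fraction are assumptions chosen only to show the shape of the curves:

```python
# First-order models; all numeric parameters are illustrative assumptions.

def message_time(n_bytes, latency_s=100e-6, bandwidth_bytes_per_s=10e6):
    """Alpha-beta model: fixed per-message latency plus transfer time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def effective_bandwidth(n_bytes, **kw):
    """Bytes per second actually achieved for one message of n_bytes."""
    return n_bytes / message_time(n_bytes, **kw)


def amdahl_speedup(p, serial_fraction):
    """Amdahl's law: speedup on p processors when serial_fraction of the
    work cannot be parallelized (here it also stands in for communication)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} B message: {effective_bandwidth(n) / 1e6:.2f} MB/s")

for p in (114, 10_757):
    print(f"{p:>6} nodes, 1% serial: speedup {amdahl_speedup(p, 0.01):.0f}")
```

With these made-up parameters, a 100-byte message gets under a tenth of the link's bandwidth. Going from 114 to 10757 nodes only moves the speedup from about 54 to about 99.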
I also think the software configuration would play a major role in the efficiency. I'd rather trust trained scientists (not me; I'm just a student) who have been working with large-scale parallel machines for years to set this up, not some tech guys who thought it'd be a neat idea. But maybe I'm just pessimistic.
Still perfectly happy with my 1-node PII... -ElJefe