Macintosh Clustering
HiredMan writes: "Wired is running an article comparing the set-up and admin of Linux Beowulf clusters versus Mac based clusters. Slant of the article is that the Macs are easier to set-up, maintain and are more flexible. They note that the Linux "how to" manual is 230 pages while the corresponding Apple document is a 1 page PDF file. Dauger Research of former Appleseed fame is mentioned as well, of course. MacSlash is also covering the article. Let the on-topic (for once) Beowulf comments fly..."
-- Initial build costs are much lower (dual Athlon 2000+ right now without graphics hardware is way cheaper than a dual G4 1GHz).
True.
-- Maintenance costs are much, much lower. Anything goes wrong with a PC node, just swap out that part with another commodity part. Mac repair or parts replacement costs will eat you, especially if you start to have many, many nodes.
Wrong. Commodity parts such as memory and hard drives are exactly the same on the Mac. I have bought memory and hard drives at Sam's Club, and they work just fine in my Mac.
Plus you can modify bits of Linux if you need to optimize the behavior of your cluster for the sort of computing you do, which you can't do with Mac OS.
Wrong again. At the level of the OS where you might need some custom tweaks (the kernel), you can customize OS X to your heart's content. See Darwin.
Now this article may have been talking about OS 9 clusters, but there is nothing preventing anyone from using OS X.
... for scientists like myself, this is a very nice thing. Not all of us in the sciences are tech-savvy... I'm probably the one in my 5-person research group who understands the most about *nix. For those of you who don't realize this, many research scientists have to work hard to get their grants and outside money.
So, what does all this mean to us? As an atmospheric scientist, having some serious number crunching power is mighty helpful. Weather modeling is quite the processor intensive task, and then interpreting the results can take years after all the computing is done, including further computations and visualization routines. To put it shortly, we can easily tax our computers.
So, now you know that we need computing power, but money is at a premium for us in many cases, so why shouldn't we just get some cheap Intel boxes and *nix cluster them? Well, we could, but then we'd need to hire a systems admin: someone who is tech-savvy enough to keep everything running decently well for us. In many cases that means another person who REALLY understands what's going on, which is another salary on the payroll. For us, it all ends up balancing in the end. The $5-10K that we save by clustering 8 Intel boxes instead of Macs is eaten up in one year or less by the guy (or woman) who has to set up the whole thing. So, for us, the ease of setup and use is something that can translate into some good savings, and we don't have to worry as much about relying on another person to save us if something goes wrong. That's the benefit of simplicity for us.
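To make the trade-off above concrete, here is a rough back-of-the-envelope sketch in Python. The $5-10K savings range is from the post; the admin cost is a purely hypothetical figure chosen for illustration, so plug in your own number.

```python
def months_to_break_even(hardware_savings, admin_annual_cost):
    """Months of admin cost that the up-front hardware savings would cover."""
    return 12 * hardware_savings / admin_annual_cost

# Savings range quoted in the post for an 8-node cluster, in dollars.
savings_low, savings_high = 5_000, 10_000

# Hypothetical figure, NOT from the post: yearly cost of part-time admin help.
assumed_admin_cost = 10_000

low = months_to_break_even(savings_low, assumed_admin_cost)
high = months_to_break_even(savings_high, assumed_admin_cost)
print(f"Savings cover {low:.0f} to {high:.0f} months of admin time")
```

With a higher assumed admin cost the savings run out sooner, which is the point being made: the cheaper hardware only wins if nobody has to be paid to look after it.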
I agree that it is important to know, as one person said, "The nature of the beast", but that's something that takes time to do, and when you're not being paid to learn about how to cluster computers, but to figure out how the atmosphere works, then things like "The nature of the beast" are just further complications. I would rather have something that I can slap together, know that it works, and get back to my work, without the interference of others if I don't need it.
And that brings me to another rebuttal, about someone mentioning that if you buy the Macs, you're also going to pay for all the extra Superdrives and video cards and all that. I say to that, "Good." That way, if the cluster doesn't need to be used, then I don't have a bunch of mostly useless boxes sitting around... or if a collaborator comes around and needs a computer, I can just remove one of the computers from the cluster and let them use that for as long as they need. The point is that there are advantages and disadvantages to each setup. Now you've heard some advantages and why the scientific community might care about this. Remember, not everyone here can compile their own kernels and not everyone cares about being able to do that. Some of us, thank the deity of your choice, actually want to do something with this power and not care how it works in depth. To each their own.
-Jellisky
If it can be optimized for AltiVec, almost nothing will be faster than a G4.
Just take a look at these RC5 stats (mid-way down the page). G4s smoke everything, because the RC5 client is optimized for AltiVec, thus it can compute four keys in a single clock cycle. By comparison, Athlons do one key per clock cycle, and Pentium 4s do one key every four clock cycles.
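As a rough sketch of what those per-clock figures imply, the Python below works out the clock speed each chip would need to match a 1 GHz G4 (the speed mentioned earlier in the thread). The keys-per-clock values are the ones claimed in this comment, not independent measurements, and the helper names are made up for illustration.

```python
from fractions import Fraction

# Keys tested per clock cycle, as claimed in the comment (not measured here).
KEYS_PER_CLOCK = {
    "G4 (AltiVec)": Fraction(4, 1),
    "Athlon": Fraction(1, 1),
    "Pentium 4": Fraction(1, 4),
}

def keys_per_second(chip, clock_hz):
    """Key rate for a chip at a given clock, using the claimed per-clock rate."""
    return KEYS_PER_CLOCK[chip] * clock_hz

def clock_needed_to_match(chip, target_chip, target_clock_hz):
    """Clock (Hz) at which `chip` equals the key rate of `target_chip`."""
    return keys_per_second(target_chip, target_clock_hz) / KEYS_PER_CLOCK[chip]

g4_clock = 1_000_000_000  # 1 GHz G4
for chip in KEYS_PER_CLOCK:
    needed = clock_needed_to_match(chip, "G4 (AltiVec)", g4_clock)
    print(f"{chip}: {float(needed) / 1e9:.0f} GHz to match a 1 GHz G4")
```

Taken at face value, an Athlon would need 4 GHz and a Pentium 4 would need 16 GHz to keep up, which only holds for workloads that vectorize as well as RC5 does.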
So if you've got an operation that can benefit from the G4's SIMD capabilities, Macs are your best bet.
Free Hans!
The fact that a manual is shorter doesn't mean that it is a better or easier to install program.
I would agree that comparing manual length is not a reliable way to judge the relative complexity of two programs. The one-page doc is even a "quick start guide", not a complete manual. But I still suspect that the writer is correct that Appleseed clusters are easier to set up and maintain than a Beowulf cluster. Reading over the directions myself, it did look pretty brain-dead simple. Most of that one page didn't even have much to do with the actual installation of the program, but with such complicated tasks as connecting your Mac to an ethernet hub ("For each Mac, plug one end of a cable to the Ethernet jack on the Mac and the other end to a port on the (ethernet) switch.") and noting a few system requirements (CarbonLib 1.2 or OS X 10.1). The installation instructions consist of "Double-click the Pooch Installer and select a drive for installation." Instructions on how to use it consist of dragging and dropping the program you want to run in parallel onto the Pooch app and "click Select Nodes..., select the computers you want to run it on, and, in the Job Window, click on Launch Job."
Besides, if you are going to have a cluster, you want cheap, off the shelf machines such as PCs with plenty of spare parts that can be customised to suit your needs: why pay for a good 3D graphics card in every PC if you are going to do number crunching?
This is only the case if the individual PCs are dedicated nodes and not being used for anything else. Most Appleseed clusters are made up of computers that are primarily being used for something else. School Mac computer lab by day; clustered "supercomputer" by night. The cluster that did 233 gigaflops (76 dual G4's, mostly 533's with a few 450's) was simply all of the Macs at UMC working as a cluster over Christmas break. This is where the easy setup, the easy maintenance and the ability to cobble together computers with different processors and even different OS's (some nodes may be running MacOS 9 and some nodes may be running OS X) are an advantage. The Appleseed clusters that are made up of dedicated machines are probably discarded computers they already had kicking around, so cost is not an issue there either.
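For a sense of scale, a quick Python sketch of what that aggregate figure works out to per machine. The 233 gigaflops and 76 dual-processor nodes are from the comment; whether the number is peak or sustained throughput isn't stated, so treat the per-node values as ballpark only.

```python
total_gflops = 233   # aggregate figure quoted in the comment
nodes = 76           # dual-processor G4 machines
cpus_per_node = 2    # every node is a dual

gflops_per_node = total_gflops / nodes
gflops_per_cpu = gflops_per_node / cpus_per_node

print(f"{gflops_per_node:.2f} gigaflops per node")
print(f"{gflops_per_cpu:.2f} gigaflops per processor")
```

That is roughly 3 gigaflops per lab machine, from hardware that was otherwise sitting idle over the break.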
It's not your fault, because you probably didn't know this, but the USC Mac cluster didn't cost anything near $440,000, and it didn't have any 1000 MHz G4s in it.
At the "Macs in Science and Engineering" user conference at Macworld, they gave the general specs of this cluster, and all of the machines were dual processors, but of different hardware generations. Although the fastest machines were dual 800 MHz on a 133 MHz bus, the majority were slower dual 450 and 500 MHz machines with 100 MHz buses.
Given that all were dual, and ignoring depreciation on the older hardware, the cost would be at most $220,000. If you were using dual 1 GHz G4s, it would still be only $220,000. My notes are on my laptop, but I believe that the actual cost of the USC cluster was less than $200,000.
Also, I assume that you think that the 270 uni-processor T-birds will scale performance linearly as well. I doubt it would only cost ~$600 per node as you would have to use Myrinet or some other fast fabric, and with three and a half times as many nodes, the latencies, hardware, and administration cost would be crippling. I have the same cost argument if you use dual Athlons, as the boards are quite rare, and the node count is almost double the Mac node count.
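The node-count argument is easy to check with a few lines of Python. This is only a sketch: the 76-node Mac count comes from earlier in the thread, the 270 nodes and ~$600 per node are the figures being disputed, and the 135-node dual Athlon count is an assumption (the same 270 CPUs packed two per board).

```python
mac_nodes = 76              # dual G4 machines, from earlier in the thread
mac_cost_ceiling = 220_000  # upper bound on the Mac cluster cost, in dollars
tbird_nodes = 270           # uni-processor Athlon nodes in the PC proposal
tbird_node_cost = 600       # the ~$600 per node figure being disputed

# Assumption: a dual Athlon build uses the same 270 CPUs, two per board.
dual_athlon_nodes = tbird_nodes // 2

single_ratio = tbird_nodes / mac_nodes
dual_ratio = dual_athlon_nodes / mac_nodes
mac_cost_per_node = mac_cost_ceiling / mac_nodes
tbird_hardware_total = tbird_nodes * tbird_node_cost

print(f"Uni-processor PC nodes per Mac node: {single_ratio:.2f}")
print(f"Dual Athlon nodes per Mac node:      {dual_ratio:.2f}")
print(f"Mac cost per node (upper bound):     ${mac_cost_per_node:,.0f}")
print(f"PC hardware at $600 per node:        ${tbird_hardware_total:,}")
```

The ratios come out to about 3.55 and 1.78, matching "three and a half times" and "almost double". The $162,000 PC total covers bare nodes only; the interconnect and administration costs raised above are not included.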
Your price/performance assertions don't stand up!
-- Len
Agreed, however if you'd ever actually tried to use the product you'd realise that this is not the case. Let me walk you through exactly how simple it is in just 10 simple steps:
- Grab a bunch of Macs, a switch and a monitor.
- Plug Macs into the power.
- Plug a keyboard and the monitor into the first Mac and turn it on.
- Configure the network through the easy to use Networking Control panel. Or alternatively don't configure it and throw a DHCP server into the mix somewhere.
- Install and run Pooch (drag and drop from the disk image it comes on then double click).
- Repeat for each Mac.
- On the last Mac, pick an application you want to run on the cluster, drag and drop it into Pooch.
- Select which Macs you'd like to help out with running this program.
- Click start.
- There is no step 10.....
Voila! The best bit about this is that I've never even read the Pooch manual, yet I've still managed to set up my own Mac Beowulf cluster. I've looked into Linux Beowulf clustering a number of times and gotten hopelessly lost and confused despite having respectable Linux knowledge. If you've ever set up a Mac Beowulf cluster you'll very quickly realise that there is no comparison in ease of use, and anyone who argues otherwise is clearly uninformed.
Like always, don't bash what you haven't tried...