Macintosh Clustering
HiredMan writes: "Wired is running an article comparing the setup and administration of Linux Beowulf clusters with Mac-based clusters. The article's slant is that Macs are easier to set up and maintain, and are more flexible. They note that the Linux "how to" manual is 230 pages while the corresponding Apple document is a 1 page PDF file. Dauger Research of former Appleseed fame is mentioned as well, of course. MacSlash is also covering the article. Let the on-topic (for once) Beowulf comments fly..."
Having used the old NeXTSTEP APIs (which I believe have been ported to OS X under the name Cocoa), I can say that they are well suited for cluster computing.
I remember Richard Crandall and the Mathematica guy (Wolfram) using Zilla (an old NeXT distributed computing program) to crack the world's largest prime in the mid-nineties...
Anyone know if Zilla is back on OS X?
Also, the Gigabit Ethernet on the motherboard and the large 2MB cache on the PowerPC chips will go a long way toward making these machines a good cluster.
It's been a while since I've done distributed computing (hey, I'm out of academia), but OS X will hopefully make the whole shebang easier...
"The fact that a manual is shorter doesn't mean that it is a better or easier to install program."
While this is true, it's not even to the point. They didn't compare manuals. They took a book written on building a Linux cluster, and compared it to what is basically a step-by-step outline for plugging together a G4 cluster. There are similar outlines out there for Linux clusters, too:
The SCL Cluster Cookbook by the folks at Ameslab is a bit longer than 1 page, but still shorter than 230. (http://www.scl.ameslab.gov/Projects/ClusterCookb
How to Build a Beowulf Cluster -- this is 10 pages long, but goes into such detail as processor, network, RAM, and disk speeds separately for both master and slave nodes. (http://www.mcsr.olemiss.edu/bookshelf/articles/h
But the point is, this article was written by pro-Mac people, so obviously they're going to take a pro-Mac stance. I mean, if these G4 clusters get to be useful, someone is going to write a 230 page book on how to build one of them. Right now, all the documentation that may be out there could be contained in this one page outline. The books come later, if the technology becomes accepted.
... he's fairly uninformed on clustering. He claims that you have to have the exact same kernel version on a Linux Beowulf cluster or it grinds to a halt... this is, of course, bullshit. Our 96-node cluster here uses different kernels.
And that's just a single example of his lack of experience with clustering...
-- Initial build costs are much lower (dual Athlon 2000+ right now without graphics hardware is way cheaper than a dual G4 1GHz).
True.
-- Maintenance costs are much, much lower. Anything goes wrong with a PC node, just swap out that part with another commodity part. Mac repair or parts replacement costs will eat you, especially if you start to have many, many nodes.
Wrong. Commodity parts such as memory and hard drives are exactly the same on the Mac. I have bought memory and hard drives at Sam's Club, and they work just fine in my Mac.
Plus you can modify bits of Linux if you need to optimize the behavior of your cluster for the sort of computing you do, which you can't do with Mac OS.
Wrong again. At the level of the OS where you might need to have some custom tweaks (the kernel), you can customize OS X to your heart's content. See Darwin.
Now this article may have been talking about OS 9 clusters, but there is nothing preventing anyone from using OS X.
So ever wonder why you get so little useful information with your computer? Blame Apple.
Nice flamebait, Reality Master, but I think cost savings had *way* more to do with PC manufacturers ditching the paper manuals than Apple ever did.
As far as that commercial goes, it was true for the average user (the commercial's target) -- the Mac didn't need a ton of manuals to do the equivalent tasks that you needed those manuals for on PCs.
It doesn't mean there shouldn't be better documentation nowadays, of course (apart from keeping David Pogue et al. in business). But let's try to keep our pre-conditioned biases as tenuously connected to reality as possible.
I watched C-beams glitter in the dark near the Tannhauser gate.
Driver problems? Kernel problems with hardware? What are you running? Windows 95 Unix version?
Run a stable kernel; there are no "driver" problems, there are no kernel problems. You don't need to run 2.5.3 for clustering. You don't even need to run 2.4.
And if you want, you can buy PCs with a warranty. Having said that, I built my own computer 4 years ago from different parts from different stores, and the only thing that's failed in that time is my mouse.
You think the P4's price/performance is bad? G4s are insane.
USC Macintosh Cluster Running the AltiVec Fractal Benchmark achieves over 1/5 TeraFlop on 152 G4's and demonstrates excellent scalability.
KLAT2's complete results are: Rmax=64.459 GFLOPS with 64 Athlon 700MHz with 128MB PC100 CAS2 SDRAM
So a 1 TFLOPS Apple machine would cost about $440,000 in hardware for 152 G4 1000 MHz machines, versus 270 T-bird 1400 MHz machines at about $160,000.
The difference, $280,000 could certainly hire someone literate enough to read the long linux manual.
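A quick way to check this kind of estimate is to reduce both clusters to dollars per GFLOPS. The Python sketch below simply takes the poster's prices and the 1 TFLOPS target at face value; it does not verify that either configuration would actually reach that figure.

```python
# Cost per GFLOPS using only the figures quoted above. Both prices and the
# assumption that each machine reaches 1 TFLOPS are the poster's estimates,
# not measurements.
TARGET_GFLOPS = 1000.0  # 1 TFLOPS expressed in GFLOPS


def dollars_per_gflops(total_cost: float, gflops: float = TARGET_GFLOPS) -> float:
    """Hardware dollars spent per GFLOPS of delivered performance."""
    return total_cost / gflops


g4_cost = 440_000.0     # 152 x G4 1000 MHz (poster's estimate)
tbird_cost = 160_000.0  # 270 x Thunderbird 1400 MHz (poster's estimate)

print(f"G4:         ${dollars_per_gflops(g4_cost):,.0f} per GFLOPS")
print(f"T-bird:     ${dollars_per_gflops(tbird_cost):,.0f} per GFLOPS")
print(f"Difference: ${g4_cost - tbird_cost:,.0f}")
```

The whole argument rests on those two input prices and on both machines scaling to the same target, which is exactly what later replies dispute.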
If voting were effective, it would be illegal by now.
It would be a really good idea to make clustering easier, but there is a trade-off between ease and performance. Making the creation of clusters easy ("a few G4 Macs, some Ethernet cables, a hub and the Pooch software.") by only talking about the easy-to-use software and not optimized network topology (correct me if I'm wrong, but the Beowulf handbook probably covers a lot of that) will definitely keep performance quite low.
BTW, on the Wired site it says:
while almost the first sentence in the 1-page PDF says:
... for scientists like myself, this is a very nice thing. Not all of us in the sciences are tech-savvy... I'm probably the one in my 5-person research group who understands the most about *nix. For those of you who don't realize this, many research scientists have to work hard to get their grants and outside money.
So, what does all this mean to us? As an atmospheric scientist, having some serious number-crunching power is mighty helpful. Weather modeling is quite the processor-intensive task, and then interpreting the results can take years after all the computing is done, including further computations and visualization routines. In short, we can easily tax our computers.
So, now you know that we need computing power, but money is at a premium for us in many cases, so why shouldn't we just get some cheap Intel boxes and *nix cluster them? Well, we could, but then we'd need to hire a systems admin: someone who is tech-savvy enough to keep everything running decently well for us. That requires another person who REALLY understands what's going on in many cases, which is another salary on the payroll. For us, it all ends up balancing in the end. The $5-10K that we save in clustering our 8 Intel boxes over the Macs is eaten up in one year or less by the guy (or woman) who has to set up the whole thing. So, for us, the ease of setup and use is something that can translate into some good savings, and we don't have to worry as much about having to rely on another person to save us if something goes wrong. That's the benefit of simplicity for us.
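The argument here is a break-even comparison: a one-time hardware saving set against a recurring salary. A minimal Python sketch, where the salary is a hypothetical placeholder because the post does not give one:

```python
def months_covered(savings: float, admin_salary_per_year: float) -> float:
    """Months of an administrator's salary that a one-time hardware saving pays for."""
    if admin_salary_per_year <= 0:
        raise ValueError("admin_salary_per_year must be positive")
    # Equivalent to savings / (salary / 12), ordered so whole-dollar inputs divide cleanly.
    return savings * 12 / admin_salary_per_year


# HYPOTHETICAL salary: the post gives no figure, so $40,000/year is a placeholder.
ASSUMED_SALARY = 40_000
for saved in (5_000, 10_000):
    print(f"${saved:,} saved covers {months_covered(saved, ASSUMED_SALARY):.1f} months")
```

Under that assumption the $5-10K saving pays for only a few months of an administrator's time, which is consistent with the poster's "one year or less"; plug in a real local salary to redo the comparison.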
I agree that it is important to know, as one person said, "The nature of the beast", but that's something that takes time to do, and when you're not being paid to learn about how to cluster computers, but to figure out how the atmosphere works, then things like "The nature of the beast" are just further complications. I would rather have something that I can slap together, know that it works, and get back to my work, without the interference of others if I don't need it.
And that brings me to another rebuttal, about someone mentioning that if you buy the Macs, you're also going to pay for all the extra Superdrives and video cards and all that. I say to that, "Good." That way, if the cluster doesn't need to be used, then I don't have a bunch of mostly useless boxes sitting around... or if a collaborator comes around and needs a computer, I can just remove one of the computers from the cluster and let them use that for as long as they need. The point is that there are advantages and disadvantages to each setup. Now you've heard some advantages and why the scientific community might care about this. Remember, not everyone here can compile their own kernels and not everyone cares about being able to do that. Some of us, thank the deity of your choice, actually want to do something with this power and not care how it works in depth. To each their own.
-Jellisky
Obviously, you know very little about the Macintosh. You should learn a bit more before you go spouting off flames.
The software used to accomplish the clustering for AppleSeeds is MacMPI, which is based upon the *standard* for parallel computing, MPI. The reason that the PDF doesn't talk about programming MPI is that there is no need for redundant documentation. Go find a book on MPI if you want to learn to program to that API.
And yes, I will get quite far telling you it's easier to upgrade Mac OS X to its latest version. Thanks to Apple's Software Update control panel, this can all take place automatically according to any schedule you desire. Two clicks of a mouse is all it takes to set this up, as opposed to spending quite a lot of time figuring out how to use the incredibly arcane "apt". In fact, AFAIR, Software Update is now set to operate automatically by default.
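MPI programs are usually written in SPMD style: every process (rank) runs the same code on its own share of the work, and a collective such as MPI_Reduce combines the partial results. The Python sketch below only simulates that decomposition serially, using the familiar midpoint-rule estimate of pi as the workload; it is not MacMPI or any real MPI binding, and the function names are hypothetical.

```python
# A serial simulation of the SPMD pattern MPI programs use. This is NOT MPI:
# there is no message passing, only the split-by-rank / reduce structure.
import math


def partial_pi(rank: int, size: int, n: int) -> float:
    """One rank's midpoint-rule share of the integral of 4/(1+x^2) over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    # Cyclic distribution: rank r handles intervals r, r + size, r + 2*size, ...
    for i in range(rank, n, size):
        x = h * (i + 0.5)
        total += 4.0 / (1.0 + x * x)
    return total * h


def simulated_reduce(size: int, n: int) -> float:
    """Stand-in for a sum reduction: add every rank's partial result."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    estimate = simulated_reduce(size=8, n=100_000)
    print(f"pi ~ {estimate:.12f} (error {abs(estimate - math.pi):.2e})")
```

In a real MPI program each `partial_pi` call would run in a separate process on its own node, and the final sum would come from the library's reduce operation rather than a Python loop.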
Gee, I didn't realize that particle physics simulations involving millions of particles weren't *real* applications...
The fact that your comment has been moderated up to four (so far) is simply an empirical demonstration of the lack of knowledge of most Slashdot readers.
>"It took NASA's Jet Propulsion Laboratory two weeks
> to put together a 16-node Linux cluster." he
>added. "I could do the same thing in less than an
>hour."
Then JPL was either building the systems from whitebox components, or is completely incompetent. I built a 20 node cluster in about 1.5 days, including the OS install on all of the nodes.
>Dauger added that Linux clusters are extremely
>fragile: If all the machines in the cluster
>aren't running the same version of the kernel,
>everything grinds to a halt. By contrast, a
>Macintosh cluster can be made from a mix of G3
>and G4 Macs running Mac OS 9 or X.
Excuse me???
My cluster is currently running 2 different Linux kernels (2.4.18, 2.4.9), two different processor architectures (Alpha and x86), and I occasionally throw an SGI O2K into the mix. Sure, the x86, Alpha, and SGI binaries need to be compiled separately, but it hardly "grinds to a halt".
>Dauger said Mac clusters have better bandwidth
>than similarly configured Linux clusters. They
>can transfer bigger chunks of data between nodes
>but their latency is less (The individual bytes
>of data are transferred less rapidly).
Huh??
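The quoted sentence conflates two separate quantities. Latency is the fixed delay before a message starts arriving, and lower is better; bandwidth is how fast the bytes stream once it does. "Individual bytes ... transferred less rapidly" describes lower bandwidth, not lower latency. A common first-order latency-bandwidth model makes the distinction explicit; the two networks below are hypothetical.

```python
def transfer_time(message_bytes: float, latency_s: float, bandwidth_Bps: float) -> float:
    """Latency-bandwidth model: fixed start-up delay plus size divided by bandwidth."""
    return latency_s + message_bytes / bandwidth_Bps


# HYPOTHETICAL networks, chosen only to show the two parameters are independent.
high_bw = {"latency_s": 100e-6, "bandwidth_Bps": 100e6}  # slow to start, fast to stream
low_lat = {"latency_s": 10e-6, "bandwidth_Bps": 50e6}    # quick to start, slower stream

for size in (100, 10_000_000):  # a tiny message and a 10 MB message
    a = transfer_time(size, **high_bw)
    b = transfer_time(size, **low_lat)
    print(f"{size:>10} bytes: high-bandwidth {a * 1e3:.3f} ms, low-latency {b * 1e3:.3f} ms")
```

Small messages are dominated by latency and large ones by bandwidth, which is why cluster interconnects are normally quoted with both numbers.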
And now let's look at the cost.
I can build dual athlon nodes for about $500/cpu
Let's assume his claim of 70% faster is true (I doubt that number, but anyway). Can he build G4 nodes for $700/cpu?
Funny that.
I know of at least one company that uses nothing but Apple hardware to do heavy-duty data mining, on an Apple cluster. Even more unusual is that the company had to write a custom 64-bit filesystem to deal with the massive amounts of data to cross-reference. Oddly enough, the developers did this with assistance from Apple.
The company's website:
http://www.riskwise.com
If it can be optimized for AltiVec, almost nothing will be faster than a G4.
Just take a look at these RC5 stats (mid-way down the page). G4s smoke everything, because the RC5 client is optimized for AltiVec, thus it can compute four keys in a single clock cycle. By comparison, Athlons do one key per clock cycle, and Pentium 4s do one key every four clock cycles.
So if you've got an operation that can benefit from the G4's SIMD capabilities, Macs are your best bet.
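The reason per-clock figures matter is that throughput is simply keys per clock times clock rate, so a 4-wide SIMD unit can outrun a chip with a much higher clock. A minimal Python illustration, using the per-clock numbers claimed in the post and made-up clock speeds:

```python
# Keys per clock cycle are the figures claimed in the post above.
KEYS_PER_CLOCK = {"G4 (AltiVec)": 4.0, "Athlon": 1.0, "Pentium 4": 0.25}

# HYPOTHETICAL clock speeds, picked only to illustrate the arithmetic.
CLOCK_HZ = {"G4 (AltiVec)": 1.0e9, "Athlon": 1.5e9, "Pentium 4": 2.0e9}


def keys_per_second(cpu: str) -> float:
    """Throughput = keys per clock x clock rate."""
    return KEYS_PER_CLOCK[cpu] * CLOCK_HZ[cpu]


for cpu in KEYS_PER_CLOCK:
    print(f"{cpu:>13}: {keys_per_second(cpu) / 1e6:,.0f} million keys/s")
```

The advantage only holds for code that has actually been vectorized, as the RC5 client was; scalar code gets none of it.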
Free Hans!
Accessibility isn't better with Macs. On any platform, clustering software is "just another app" (unless you pay for it, and I don't see how that makes it any more accessible), that's not unique to Macs.
If you have a dozen RedHat boxes, network them, install PVM, put all the hosts in /etc/hosts.equiv, and install your app, you have a Beowulf cluster.
If you have a dozen YellowDog boxes, it's the same procedure.
If you have a dozen OSX boxes, you network them, install the clustering app, and install your app. The only difference is you don't have to make a change to hosts.equiv. Big whoop.
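For the RedHat/PVM route described above, /etc/hosts.equiv is a plain-text list of trusted hosts, one per line, which lets rsh-based tools (PVM traditionally starts its remote daemons over rsh) log in between nodes without a password. A minimal Python sketch that only generates the file contents; the node names are hypothetical.

```python
def hosts_equiv(hostnames: list[str]) -> str:
    """Return /etc/hosts.equiv contents: one trusted hostname per line."""
    return "".join(f"{name}\n" for name in hostnames)


# HYPOTHETICAL hostnames for "a dozen" cluster nodes.
nodes = [f"node{i:02d}" for i in range(1, 13)]
print(hosts_equiv(nodes), end="")
```

Writing the result to /etc/hosts.equiv on every node requires root, and the file grants passwordless rsh/rlogin access, so it belongs only on a private cluster network.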
The fact that a manual is shorter doesn't mean that it is a better or easier to install program.
I would agree that comparing manual length is not a reliable guide to judging the relative complexity of two programs. The one-page doc is even a "quick start guide," not a complete manual. But I still suspect that the writer is correct that Appleseed clusters are easier to set up and maintain than a Beowulf cluster. Reading over the directions myself, it did look pretty brain-dead simple: most of that one page didn't even have much to do with the actual installation of the program, but with such complicated tasks as connecting your Mac to an Ethernet hub ("For each Mac, plug one end of a cable to the Ethernet jack on the Mac and the other end to a port on the (ethernet) switch.") and noting a few system requirements (CarbonLib 1.2 or OS X 10.1). The installation instructions consist of "Double-click the Pooch Installer and select a drive for installation." Instructions on how to use it consist of dragging and dropping the program you want to run in parallel onto the Pooch app, then "click Select Nodes..., select the computers you want to run it on, and, in the Job Window, click on Launch Job."
Besides, if you are going to have a cluster, you want cheap, off-the-shelf machines such as PCs, with plenty of spare parts, that can be customised to suit your needs: why pay for a good 3D graphics card in every PC if you are going to do number crunching!
This is only the case if the individual PCs are dedicated nodes and not being used for anything else. Most Appleseed clusters are made up of computers that are primarily being used for something else: school Mac computer lab by day, clustered "supercomputer" by night. The cluster that did 233 gigaflops (76 dual G4s, mostly 533s with a few 450s) was simply all of the Macs at UMC working as a cluster over Christmas break. This is where easy setup and maintenance, and the ability to cobble together computers with different processors and even different OSes (some nodes may be running Mac OS 9 and some Mac OS X), are an advantage. The Appleseed clusters that are made up of dedicated machines are probably built from discarded computers they already had kicking around, so cost is not an issue there either.
Much of OS X is closed source, but Darwin, its Unix-based core, is not. If doing a Darwin port doesn't float your boat, there are always OpenBSD, NetBSD, or even Linux ports that will run on your IIsi cluster.
Just have all of your OS X clients boot off of a disk image on a Mac OS X Server machine.
http://www.apple.com/education/k12/networking/differ/index.html#macmanager
This reminds me of an old Mac story. :^)
The situation was that Guy Kawasaki (an Apple "evangelist" at the time) challenged some PC folks to a "bake off," to determine which system made some tasks easier.
When the day came, Kawasaki sent out a 10-year-old to go head-to-head with the PC geek.
The full details of the story are at http://www.halcyon.com/kegill/mac/win95/faceoff.html
Maybe we should have a new challenge where a Linux geek and a 10-year-old compete to see who can set up a compute cluster the fastest.
Computers are useless. They can only give you answers. -- Pablo Picasso
They note that the Linux "how to" manual is 230 pages while the corresponding Apple document is a 1 page PDF file.
Yes. Wonderful. This says nothing. This is one of "those" statistics. The Linux "how to" could be 230 pages because it not only tells you how to set it up, but gives you advice on customizing, creating optimized programs, hacking the kernel, and FAQs covering every single problem or question you might have.
The Mac PDF might be an almost blank page that says, "Call tech. support." Furthermore, why mention that it's a PDF at all? Are you saying that it's somehow better to use a proprietary document format (e.g. Proprietary Document Format - PDF, get it?) instead of plain text? Is the information somehow MORE relevant because it's in PDF?
Please. I've seen neither, but all this tells me is that someone wouldn't know a relevant comparison if it widdled on his shoes and stole his wallet.
Jake
Dating: while( 1 ){ call_girl(); get_rejected(); drink_40(); } return 0;
Unibrain.com offers FireWire networking for Macs and PCs that integrates into an Ethernet network. And with FireWire 2.0 offering massive speeds over fibre, I can only imagine.
Getting this beast for OS X was an easy enough choice. The "manual" just pointed out some basic features, like where certain things are and connecting to the Internet through the GUI.
The thing was no more than 20 or so pages of mostly pictures. A+ for simplicity, appearance and brevity. However, you gotta dig and dig on the net or buy a 3rd party manual to figure out how the other stuff works -- like NetInfo. Lucky for me, I got my UNIX experience on a NeXT box - another miracle in an easy-to-use UNIX OS and mine came with nearly a yard of manuals and developer's documentation. It was a relatively painless process to migrate for me.
I tried every decent and legal way I could think of to resolve the issue w/the business before I rented the chicken suit
Pooch won't run on those, however, because it requires MacOS 9 or later. Those versions of MacOS won't run on 68K Macs. A Beowulf might be doable under one of the 68K Linux distros (only one that comes to mind is Debian)...but I've found Linux to be almost unbearably slow on my Quadra 610. (Linux probably has been nowhere near as optimized as MacOS, which has (or had) large amounts of hand-coded 68K assembly in it.)
20 January 2017: the End of an Error.
It's not your fault, because you probably didn't know this, but the USC Mac cluster didn't cost anything near $440,000, and it didn't have any 1000 MHz G4s in it.
At the "Macs in Science and Engineering" user conference at Macworld, they gave the general specs of this cluster: all of the machines were dual-processor, but of different hardware generations. Although the fastest machines were dual 800 MHz on a 133 MHz bus, the majority were slower dual 450 and 500 MHz machines with 100 MHz buses.
Given that all were duals, and ignoring depreciation on the older hardware, the cost would be at most $220,000. If you were using dual 1 GHz G4s, it would still be only $220,000. My notes are on my laptop, but I believe that the actual cost of the USC cluster was less than $200,000.
Also, I assume that you think that the 270 uni-processor T-birds will scale performance linearly as well. I doubt it would only cost ~$600 per node as you would have to use Myrinet or some other fast fabric, and with three and a half times as many nodes, the latencies, hardware, and administration cost would be crippling. I have the same cost argument if you use dual Athlons, as the boards are quite rare, and the node count is almost double the Mac node count.
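The correction is easy to check with the parent post's own figures: pricing 152 processors as 152 machines double-counts, because each box held two. A short Python sketch, reusing the parent's implied per-machine price (that post's estimate, not an actual quote):

```python
# Parent post's figures, corrected for the fact that every USC Mac was a dual.
G4_CPUS = 152
CPUS_PER_MAC = 2
PARENT_ESTIMATE = 440_000  # parent post priced 152 separate machines

macs = G4_CPUS // CPUS_PER_MAC                     # 76 dual-processor machines
implied_price = PARENT_ESTIMATE / G4_CPUS          # about $2,895 per machine
corrected_cost = PARENT_ESTIMATE * macs / G4_CPUS  # upper bound: $220,000

TBIRD_NODES = 270
TBIRD_COST = 160_000
per_tbird_node = TBIRD_COST / TBIRD_NODES          # about $593, the "~$600" above
single_ratio = TBIRD_NODES / macs                  # about 3.55x as many nodes
dual_ratio = (TBIRD_NODES / 2) / macs              # about 1.78x with dual boards

print(f"{macs} Macs at ${implied_price:,.0f} each: ${corrected_cost:,.0f}")
print(f"T-bird node ${per_tbird_node:,.0f}; node count {single_ratio:.2f}x (single), {dual_ratio:.2f}x (dual)")
```

The node-count ratios are what drive the interconnect and administration concerns: every extra node needs a port on the fast fabric and another box to maintain.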
Your price/performance assertions don't stand up!
-- Len
Agreed; however, if you'd ever actually tried to use the product, you'd realise that this is not the case. Let me show you through exactly how simple it is in just 10 simple steps:
- Grab a bunch of Macs, a switch and a monitor.
- Plug Macs into the power.
- Plug a keyboard and the monitor into the first mac and turn it on.
- Configure the network through the easy to use Networking Control panel. Or alternatively don't configure it and throw a DHCP server into the mix somewhere.
- Install and run pooch (drag and drop from the disk image it comes on then double click).
- Repeat for each Mac.
- On the last Mac, pick an application you want to run on the cluster, drag and drop it into pooch.
- Select which Macs you'd like to help out with running this program.
- Click start.
- There is no step 10.....
Voila! The best bit about this is that I've never even read the Pooch manual, yet I've still managed to set up my own Mac Beowulf cluster. I've looked into Linux Beowulf clustering a number of times and gotten hopelessly lost and confused despite having respectable Linux knowledge. If you've ever set up a Mac Beowulf cluster, you'll very quickly realise that there is no comparison in ease of use, and anyone who argues otherwise is clearly uninformed.
Like always, don't bash what you haven't tried...
Beowulf was predated by "Zilla.app", which shipped with NeXTSTEP 2.0. Richard Crandall used Zilla on any workstation that was idle, anywhere on NeXT's network (idle being defined as "the screen saver was running"), to find the 13th Fermat number, among other things.
So, this kind of (relatively) low-cost clustering began on Mac OS X's predecessor.
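The idea Zilla relied on is often called cycle scavenging: hand work only to machines that are currently idle. A minimal Python sketch of that scheduling rule follows; the Workstation class, its screensaver_running flag, and the round-robin assignment are hypothetical illustrations, not Zilla's actual design.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    name: str
    screensaver_running: bool  # the post's definition of "idle"


def idle_hosts(stations: list[Workstation]) -> list[str]:
    """Names of workstations that currently count as idle."""
    return [w.name for w in stations if w.screensaver_running]


def assign_chunks(chunks: list[int], stations: list[Workstation]) -> dict[str, list[int]]:
    """Deal work chunks round-robin across the idle workstations only."""
    hosts = idle_hosts(stations)
    plan: dict[str, list[int]] = {h: [] for h in hosts}
    if not hosts:
        return plan  # nobody is idle, so nothing gets scheduled
    for i, chunk in enumerate(chunks):
        plan[hosts[i % len(hosts)]].append(chunk)
    return plan


lab = [Workstation("alpha", True), Workstation("beta", False), Workstation("gamma", True)]
print(assign_chunks(list(range(5)), lab))
```

A real system would also have to checkpoint or reclaim work when a user returns and the screen saver stops.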
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
These Xeons feature 512K of L2 Cache. Sure there are Xeons with HUGE amounts of L2 cache, but then we are hitting the $10000 price range. These are workstation machines, not server machines.
I can't compare the Apples to the P4s... P4s don't go dual-processor, so the PPC G4 wins here. I can't get a dual-processor P4.
Athlon? None of the vendors I checked have Athlon workstations, so they weren't in consideration.
However, after realizing the lack of Athlons, I remembered that Penguin Computing has a line of Athlon based workstations.
I went to their website, and priced out an Athlon MP system, the Tempest 210MP Workstation.
With 2 Athlon MP 1900+ processors (not really competitive with the new 1 GHz G4s, but close enough for our comparison, and matching your assertion that they are in the same league), 512MB of PC2100 RAM, and an upgrade to the Gigabit Ethernet card (they have one, so I might as well try to be fair), my workstation price is $2707.
Congratulations, we have a winner. An Athlon MP 1900+ system (running at 1.6 GHz) with similar specs to the Apple workstation comes in $300 cheaper. The Apple has some advantages: the better video card and Superdrive are nice features when the machine is recycled as a desktop machine, but for now they are superfluous.
What is the point of my work?
You're all full of shit. Apple's computers are extremely price competitive. They are cheaper than Xeons from the real vendors with similar specs (Xeons had faster RAM, equal L2 cache, no L3 cache, and no gigabit ethernet).
Apple puts out a Unix workstation that is really competitively priced against Linux workstations from major vendors.
Apple puts out really competitively priced consumer machines (iMac/iBook) compared to Wintel machines from major vendors.
You can choose to use an Apple solution or not, but stop spreading the bullshit about Apple being more expensive.
What most of us hate about Apple is that they make it impossible to unhide them, to get into the guts of the thing and change it as we see fit
On any machine running Mac OS X, go to /Applications/Utilities/Terminal, and launch it.
-jcr
To answer the question of what a Mac would be used for, the answer is quite a lot. Most cluster-based stuff is homegrown applications, which can be written for OSX as easily as most OSes. But beyond that, there is actually a huge call for rendering farms for programs such as After Effects and Maya that film companies use to create films (more importantly, the films I actually want to see, the ones where things fling through space and explode, not the ones where things are passed around a coffee table while people discuss important issues of sexual politics).
I know Linux just had a big win with Dreamworks, but Macs are huge in the F/X industry. And if clustering brings new avenues to cheaper special effects, that means more special effects. And that is just good.
As for it being easier than Linux, it probably is. No point in crying about it; let's get a Beowulf-out-of-the-box solution. I agree that Macs aren't customizable enough for my taste, but this doesn't mean there can't be a default configuration of BW that would work immediately and could be tweaked later.
Everything Apple uses is pretty stock and standard:
PCI, AGP, SDRAM (DDR is coming eventually...), USB, Firewire, ATA...
In fact about the only non-standard things on a Mac are the motherboard, CPU, and that new monitor plug thing they introduced with the Cube, which has adaptors available for it.
When was the last time you looked at a Mac?
BlackGriffen
OK, yes, but consider physical space issues as well as heat dissipation. Yes, I know the Macs run cooler, but box size is an issue: you need a lot of floor space.
I can buy a 64-processor 1.6 GHz Athlon cluster for $70,000 from http://www.microway.com/products/clusters/dualathlon.html
This comes with Myrinet, which is much better than Ethernet: lower latency as well as higher bandwidth. This is a preconfigured cluster in one nice rackmount unit; just plug it in and go. I would love to see dual-processor rackmount Macs, as this would give another alternative to either x86 or Sun. Sun Netras are nice looking but slow and expensive (yes, I spent the summer working with one, I know). Intel is very, very, very cheap and configuration costs are minimal: configure one machine, dd the disk image to the others, and have DHCP do the networking. Easy as pie.
For real fun, put MOSIX on and run MPI on top of MOSIX. You get a small performance loss but gain massive reliability.
You can use cluster nodes as part of a desktop lab, but watch what happens when one gets rebooted and MPI gets very confused. For real clustering you want rackmount units with a small footprint as well as low cost.