Project Appleseed Updated
J. FoxGlov writes "UCLA's Project Appleseed has been updated with new benchmarks showing their clusters of Macintosh G3s and G4s running neck and neck with Crays and kicking the snot out of Pentium II clusters, generating fractal clusters in parallel. Includes the recipe for making your own Apple-flavored Beowulf cluster. "
Bad pun there, but couldn't resist.
The crays they compare to are pretty old beasts, and they only tested with a few processors (Cray's SV1, for example, can take advantage of over 1200 CPUs!).
Drop by http://www.sgi.com/sv1/tech_info.html
(or http://www.cray.com/) to see info on the SV1 if you're interested.
Now, don't get me wrong; this is a very nice cluster, but they seem to unfairly compare it to a Cray (the T3E-900 is not even a recent machine!). I'm sure someone else will explain where computers such as Crays and SGIs come into real use (high-throughput work), but for distributed systems requiring less than gigantic amounts of communication bandwidth, Beowulfs do handle many kinds of tasks very well (and cheaply!).
Just didn't want everyone to think a 16-node G4/G3 cluster was faster than a Cray (actually, the SV1 can use CPUs /each/ capable of 4.88GFLOPS).
There's a somewhat humorous portion to the instructions, although I'm not sure it was intended. Check them out:
http://exodus.physics.ucla.edu/appleseed/appleseedrecipe.html
:>
Setting this up is apparently as easy as 1, 2, 3 (despite, well, paying for everything). After a three-step process, they put a little note at the bottom:
"Note: To build a Beowulf, a Linux-based cluster, we think the following 230-page book is an excellent introduction: T. L. Sterling, J. Salmon, D. J. Becker, and D. F. Savarese, How to Build a Beowulf, [MIT Press, Cambridge, MA, USA, 1999]."
A 230-page introduction?
- Jeff A. Campbell
- VelociNews (http://www.velocinews.com)
- Jeff
A beowulf cluster can be assembled with *multiple* network cards to decrease the network distance between each processor. Basically instead of the machines sharing a single network, there are several separate networks to split the traffic. The reason for this is that as traffic on Ethernet rises, it reaches a point where it hits a wall and throughput can really decline fast.
Appleseed is set up using the internal ethernet card (though I would guess you could use a different interface like a fiber optic connection) connected in the usual fashion to a regular switch. The article didn't mention any option to install more network cards and use those.
Now, for most things a shared 100M network will be sufficient. Depending on your applications, I would guess that a beowulf would be more configurable. If I were to make a 1024-node cluster, it would be a beowulf with the nodes arranged into a hypercube. Putting 1024 Macs onto a single beowulf might cause performance problems depending on what you're doing. Usually programs that don't require a lot of communication between nodes run best on beowulf-type clusters, so the problem of having only one network card in a machine might be no big deal after all.
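For anyone wondering what the hypercube arrangement would mean in practice, here is a minimal sketch. It assumes the usual textbook scheme where nodes are numbered 0..1023 and two nodes are wired together when their numbers differ in exactly one bit. The function names are made up for illustration and have nothing to do with the Appleseed software.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Direct neighbors of a node in a dim-dimensional hypercube.

    Flipping each of the dim bits of the node number gives one neighbor,
    so every node needs dim network links.
    """
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(a: int, b: int) -> int:
    """Fewest links a message must cross between nodes a and b.

    This is the number of bits in which the two node numbers differ.
    """
    return bin(a ^ b).count("1")


DIM = 10  # 2**10 = 1024 nodes
print(len(hypercube_neighbors(0, DIM)))  # links needed per node
print(hops(0, 2**DIM - 1))               # worst-case distance in hops
```

Under these assumptions a 1024-node hypercube needs ten links per node, which is the kind of layout that calls for several network cards per machine rather than one built-in Ethernet port.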
If tits were wings it'd be flying around.
What I'd like to see is a Beowulf cluster of iMacs -- one of each color, arranged in a little circle. It may not be as fast as a Cray, but now you've got the world's cutest Beowulf cluster!
Okay. As cool as this whole shebang sounds (and it does sound pretty damn cool), aren't we usually the ones who start yelling "benchmarks are meaningless" whenever the guys in the Microsoft trenches pull off another one from their files? I say, stick with that position. Better a false negative than a false positive. So I'm sorry, but I don't care about these benchmarks any more than I care about any of the Mindcraft series, and that's that.
(Not that I wouldn't like a nice cluster of Macs, mind you. Ummm. Tasty.)
To the editors: your English is as bad as your Perl. Please go back to grade school.
Note: To build a Beowulf, a Linux-based cluster, we think...
;)
The funny part is that the slashdot story-posting perl scripts didn't post this story twice for mentioning both linux and beowulfs.
C'mon, CmdrTaco! Release the source to the story-posting perl scripts, already!
"If one is really a superior person, the fact is likely to leak out without too much assistance" -- John Andrew Holmes
Guess Crays aren't quite what they used to be. Maybe they should make them in grape colors :) Sharkey
http://www.badassmofo.com
I actually had the honor of working with one of these clusters at a famous university plasma physics lab. Several points here. Do not forget that the benchmarks advertised are for UCLA's particular gyrokinetic code, written in F77. Gyrokinetic code usually doesn't require a lot of communication anyway, so for small clusters slow networking is not much of a problem. That is why Beowulf clusters are so suitable for problems where you don't have much inter-node traffic. Crays use very high-bandwidth interconnects, which are expensive and not needed for particle code like this. The difference is in code implementation and the CPU. Also, the Cray T3E uses Alpha processors, and Alphas are very good at FP-intensive code, but they need a lot of code tweaking to squeeze out all of the performance an Alpha can give. Once the code runs great on an Alpha, put it on a different CPU and you have it crawling (and vice versa). The lab that I worked for had Fortran 90 gyrokinetic code which was basically accomplishing the same thing as UCLA's, BUT it ran 3 times faster on a 400 MHz Alpha than on a 350 MHz G3 using AppleSeed. Network-wise, scaling it on G3s was not a big problem (small cluster, not much IP traffic), though I should note that MacOS would be completely unresponsive while the code was running (all of the CPU was devoted to it, and MacOS doesn't do preemptive multitasking). Surprisingly, UCLA's code runs very nicely on a G3, just as fast as or better than it does on an Alpha; that's why Macs are so suitable for them (besides the plug-n-play Beowulf factor). So I think comparing their results to an archaic Cray is a nice way of attracting attention, but when it comes to details it's just another Beowulf. Put a different code on it that does well on a Cray, and it's 2/3 slower per processor. Though I must say that the availability of this kind of software is great, since research facilities have tons of Macs and CPU cycles.
This software also eliminates the need for *nix sysadmin work, but it is costly. Fortran compilers are expensive, and so are Macs. If I were building a cluster and didn't have Macs on hand, I would use cheap PC labor and Linux (though I would still have to pay around a grand per license for an F90 compiler :(). What *nix offers is compatibility, flexibility, and preemptive multitasking, and it lets you run several parallel jobs at a time. Traditionally Beowulf software was written for *nix, so many MPI implementations and other essential software (like FFTW and other math/scientific libraries) are available primarily for *nix, and would have to be ported to MacOS or any other OS.
Don't believe all this hype and go sell your Crays just yet. What many people fail to realize is that the total number of achievable MFLOPS of all the nodes in a parallel machine IS NOT a very meaningful measure of how powerful or useful the machine is. This ignores the nature of the interconnect between the processors, memory, etc., which is *extremely* important in most parallel computations, and is what makes supercomputers so damned expensive. This stuff is not Ethernet! For many types of parallel applications, Ethernet becomes such a bottleneck that no advantages can be realized from parallelizing an application.
The generation of fractal clusters is a classic example of what are known as "embarrassingly parallel" problems in parallel computing circles. As you iterate points in the set, their evolution is independent, so a minimum of message passing is required. (In computer science-ese, "the computational graph is disconnected"). With even the crummiest of interconnects, you can get good results out of parallelizing these fractal cluster generators because the only thing that will really make a difference is the total number of FLOPS achievable by each of the nodes. Fractal set generation is just not a very meaningful benchmark.
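To make the "independent points" argument concrete, here is a small sketch. It assumes a plain Mandelbrot-style escape-time iteration, which is not necessarily what the Appleseed benchmark computes. The grid size, the row-splitting scheme, and all names are hypothetical. Each "node" here is just a loop iteration, but the point is that no chunk ever needs a value from another chunk.

```python
def escape_count(c: complex, max_iter: int = 50) -> int:
    """Iterate z -> z*z + c and report when |z| first exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render_rows(rows, width: int = 16, height: int = 8) -> dict:
    """Compute the requested rows; nothing from any other row is needed."""
    out = {}
    for r in rows:
        y = -1.0 + 2.0 * r / (height - 1)
        out[r] = [
            escape_count(complex(-2.0 + 3.0 * x / (width - 1), y))
            for x in range(width)
        ]
    return out


def split(n_rows: int, n_nodes: int) -> list:
    """Deal rows out round-robin, one chunk per (hypothetical) node."""
    return [range(k, n_rows, n_nodes) for k in range(n_nodes)]


serial = render_rows(range(8))

merged = {}
for chunk in split(8, 4):               # each chunk could go to a different machine
    merged.update(render_rows(chunk))   # the only "message" is the finished result

print(merged == serial)
```

The only traffic in this scheme is handing out row numbers and collecting results, which is why even a slow network barely shows up in this kind of benchmark.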
But consider, say, a finite-element model where every point in your grid is affected by its neighbors. Then you need to do lots of message passing, and the nature of the interconnect becomes orders of magnitude more important. In this case, I guarantee you that a commercial supercomputer is going to beat the pants off of any cluster machine. This is not to say that cluster machines aren't useful, but a real "supercomputer" still has its place.
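A rough back-of-the-envelope model shows how neighbor communication eats into the speedup. Everything in it is an assumption chosen for illustration: an n-by-n grid cut into horizontal strips, 10 flops per grid point, 100 MFLOPS per node, 100 Mbit/s Ethernet at its theoretical 12.5 MB/s, 0.1 ms latency per message, and 8-byte values. None of these numbers come from the Appleseed benchmarks or from any Cray.

```python
def step_time(n, p, flops_per_point=10.0, node_flops=1e8,
              bandwidth_bytes=12.5e6, latency=1e-4):
    """Return (compute_seconds, comm_seconds) for one time step on p nodes.

    Each node updates n*n/p points, then swaps one boundary row of n
    8-byte values with each of its two neighboring strips.
    """
    compute = (n * n / p) * flops_per_point / node_flops
    comm = 0.0 if p == 1 else 2 * (latency + n * 8 / bandwidth_bytes)
    return compute, comm


def speedup(n, p):
    """Speedup over one node when compute and communication do not overlap."""
    return sum(step_time(n, 1)) / sum(step_time(n, p))


for p in (1, 4, 16, 64):
    compute, comm = step_time(1000, p)
    print(p, round(compute * 1e3, 3), round(comm * 1e3, 3), round(speedup(1000, p), 1))
```

In this model the compute time per step shrinks as nodes are added while the exchange time does not, so the speedup falls further and further short of the node count. A faster interconnect attacks exactly the term that stays constant here.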
I attended UCLA as a physics major while this project was still under way, and took a more basic introductory course on computer modelling of plasma systems. The professor doing a lot of the work in this field is John Dawson. Working along with him and, IIRC, more in charge of the computer systems is Victor Decyk.
Decyk taught half of the class, although he was technically a TA. He explained the progression away from high $$ "super computers", such as Crays, and the usefulness of clusters.
I also had the honor of working at JPL, where Decyk was a part-time scientist in the computing/analysis department for the Experimental Measurement Devices group.
If you look up something like "computer plasma modelling" on the 'net, you'll very likely find papers by these two...very interesting high-powered stuff - the mind boggles at just how much the computer is crunching when you realize that a large number of the plasma particles are interrelated spatially.
Q: What do you think about American Culture?
A: I think it's a good idea.
(adapted from Gandhi)