Slashdot Mirror


Choosing the Right Cluster System

ckotso asks: "So I've read here and there about Linux clusters, and I am ready to set about creating one with some help from the educational institute I work for. So far I've found out about Beowulf, SCI and MOSIX. I really hope I can get some help on this, since NT is gradually making its way into the University and I hate to see this. I want to give this place a cheap and robust alternative; I simply have to change their minds!" Interested? There's more information inside.

"My questions are:

  1. Have I missed any other serious competitor in the cluster field?
  2. What are the pros and cons of these systems?
  3. Has anyone tried them all and written any report as to how they compete?
Thanks!"

8 of 106 comments

  1. What are you gonna use it for? by Nicolas+MONNET · · Score: 3
    Your question is a bit like: "So I keep reading about those 'automotive' thingies, and I wonder which one to buy: a 747, a Suzuki motorbike or a cruise liner?"


  2. A few ideas... by Noryungi · · Score: 5
    OK, here is my take on your question. Watch out, though, as I am not a Beowulf expert.

    Here is some information to consider before starting your own cluster:
    • Beowulf clusters have to be useful for the kind of scientific projects your university undertakes. Large science (physics, astronomy) projects, usually coded in Fortran and involving lots of calculations that can be computed in parallel, are ideal applications for them. Other applications may be a lot less interesting. A Beowulf cluster, despite its power, is not always the perfect solution.
    • If your University is short on cash, you may want to investigate the "Stone Soup" cluster -- recycled old Pentiums and 486s can find a second lease on life in a Beowulf cluster. Pros: cheap. Cons: requires a lot of labor and patience, and is less powerful than a Beowulf cluster built from up-to-date CPUs and network connections.
    • To be truly effective, Beowulf clusters require at least a couple of very powerful servers and very advanced network hardware -- be sure to factor this into the total cost.
    • Beowulf clusters are not for the faint of heart. They require quite a lot of skill in network configuration, machine configuration and traffic optimization. It's not surprising the first Beowulfs were born at NASA -- it did require rocket scientists to make them work! =) Once they are up and running, though, their performance is close to, or better than, that of dedicated supercomputers -- for a small fraction of the price.
    • Another good side of Beowulf is fault tolerance and room to grow. If a "node" goes down, the machine does not crash, and that node's share of the task(s) can be assigned by the main server to another machine. If you need a more powerful machine, simply add a dozen new PCs to your mix and watch those MIPS/Gigaflops go up!
    • Finally, never forget the one argument that wins them all: price, price, price, price! Linux is free, Intel PCs are dirt cheap, all you need is a lot of space and a dedicated team to make it work. Oh, and lots of network cards & cables... =)
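To make the "node goes down, its share gets reassigned" point above a bit more concrete, here is a toy sketch in Python. It is not Beowulf software: `run_on_cluster`, the node names and the `down` set are all made up for illustration, and a real cluster would have to detect failures itself (timeouts, errors from the message-passing library) instead of being handed a ready-made set of dead nodes.

```python
from collections import deque
from itertools import cycle


def run_on_cluster(tasks, nodes, down):
    """Toy master loop: hand tasks to nodes round-robin, requeue on failure.

    tasks: list of (task_id, zero-argument callable)   (hypothetical format)
    nodes: list of node names
    down:  set of node names that have failed          (assumed known here)
    Returns {task_id: (node_that_ran_it, result)}.
    """
    if all(node in down for node in nodes):
        raise RuntimeError("every node is down, nothing can run")

    pending = deque(tasks)
    ring = cycle(nodes)          # master walks the node list in a circle
    results = {}

    while pending:
        task_id, work = pending.popleft()
        node = next(ring)
        if node in down:
            # The node never answers: put its share back on the queue
            # so that a later (live) node picks it up.
            pending.append((task_id, work))
            continue
        results[task_id] = (node, work())
    return results


if __name__ == "__main__":
    jobs = [(i, (lambda i=i: i * i)) for i in range(4)]
    for task_id, (node, value) in sorted(run_on_cluster(jobs, ["a", "b", "c"], {"b"}).items()):
        print(task_id, node, value)
```

With node "b" marked as down, all four jobs still finish, and none of them ends up on "b". The whole job only stops when no node is left at all.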

    So, some positive factors, some negative ones. If you want to convince your University, remind them that they can always count on the support of other universities and research centres the world over that are using this technology right now.

    Good luck!
    --
    The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
    1. Re:A few ideas... by Chalst · · Score: 4

      Good post, just one main thing to add. In a cluster system, what you
      can do is very much constrained by the way you glue the individual
      nodes together. The 100 Mbit/s throughput of a Fast Ethernet
      connection may sound as if it gives you all the connectivity you need,
      but if a machine sends each 100-bit packet to a different machine, the
      network will slow to a snail's pace, because it is not very fast at
      this kind of switching task.

      Good routing software can make up for this, as can careful forethought
      about the network geometry. An ATM network is the best of all worlds,
      but very expensive ... actually, what happened to all those claims
      that ATM routers would become as cheap as water? A last point: look
      at the Parallel Processing HOWTO.
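A rough back-of-the-envelope check on the small-packet point, as a Python sketch. It counts only standard Ethernet framing (preamble, header, padding up to the minimum frame, checksum, inter-frame gap). The per-packet switching cost described above comes on top of this, and IP/TCP headers are ignored. The function name `wire_efficiency` is just made up for the example.

```python
import math

# Standard Ethernet framing costs, in bytes.
PREAMBLE_AND_GAP = 8 + 12   # preamble/start delimiter plus inter-frame gap
HEADER_AND_FCS = 14 + 4     # MAC header plus frame check sequence
MIN_PAYLOAD = 46            # shorter payloads are padded up to this
MAX_PAYLOAD = 1500          # largest payload of a standard frame


def wire_efficiency(payload_bits):
    """Fraction of the bits on the wire that are useful payload for one frame."""
    payload_bytes = math.ceil(payload_bits / 8)
    if not 0 < payload_bytes <= MAX_PAYLOAD:
        raise ValueError("payload does not fit in a single standard frame")
    on_wire_bytes = max(payload_bytes, MIN_PAYLOAD) + HEADER_AND_FCS + PREAMBLE_AND_GAP
    return payload_bits / (on_wire_bytes * 8)


if __name__ == "__main__":
    for bits in (100, 1500 * 8):
        share = wire_efficiency(bits)
        print(f"{bits:6d} payload bits: {share:.1%} of a 100 Mbit/s link is useful data")
```

A 100-bit message gets padded into a minimum-size frame, so only about 15% of the link carries data (roughly 15 Mbit/s out of 100). Full 1500-byte frames get about 97.5%. So batching small messages pays off even before any switching delay is counted.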

  3. Depends on your needs. by SEWilco · · Score: 5
    That's quite an assortment. What you want depends on your needs and on the characteristics of the choices. As for NT: having source available for many of these tools will be nice for research activities.
    • Beowulf belongs to the family of clusters programmed through a parallel programming API. Programs must use the API to accomplish parallel programming.
    • SCI is fast hardware with support for distributed shared memory, messaging, and data transfers. Again, programs that don't use the API see no gain.
    • DIPC is distributed System V IPC. Programs which use the IPC API can be converted to DIPC easily, often just by adding the DIPC flag to the IPC call.
    • MOSIX is the most general-purpose. Processes are scattered across a cluster automatically without having to modify the programs. No API needed other than usual Unix-level process use. Allows parallel execution of any program, although full use requires a parallel program design.
  4. How about an Apple cluster by Mononoke · · Score: 3
    The Appleseed Mac cluster at UCLA

    Click here to go directly to the project abstract (more details, less graphics.)


    --
    NetInfo connection failed for server 127.0.0.1/local
  5. You need to define what you need from your cluster by Eivind+Eklund · · Score: 4
    First: You need to define what you want out of your cluster - what kind of applications it is going to run, what sort of environment you want for them, how large a cluster you want to build, whether you want to do 'free cycle stealing', and whether you want high availability. A 'cluster' is much too vague a term for it to be possible to give much advice, or even further references, based on just that.

    Second: SCI is orthogonal to the other two technologies - it is a special hardware network technology (Scalable Coherent Interface), originally made to support distributed shared memory. You may be thinking of the software Dolphin Interconnect Solutions provides with its SCI products, but as far as I know, that doesn't directly enter the same space either. Their web pages certainly do not indicate that it does, and my discussions with (one of?) their Linux developer(s) implied that it contains somewhat more (lock managers etc.), but still not in the same space. A technology that competes with SCI, though proprietary, is Myrinet. It has a longer history than SCI and has been less plagued with problems (though SCI is supposedly quite stable now).

    Third: There are a bunch of other technologies (some cross-platform, some single-platform) that compete in making it easy to build clusters. MOSIX and Beowulf are just two of them. If you give more details of what you want to achieve, I'll dig out references from my collection (made to support the development of FreeBSD-specific clustering improvements, so some types of references may be lacking, but I'll probably be able to come up with at least some starting points for any wanted cluster workload).

    Eivind.

    --
    Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
  6. In Search of Clusters by Zootlewurdle · · Score: 3

    I would wholeheartedly recommend that anybody interested in clustering read Greg Pfister's "In Search of Clusters", published by Prentice Hall - ISBN 0-13-899709-8. It is the seminal work in this area.

    Other good resources include:
    - IEEE Task Force on Cluster Computing
    http://www.dgs.monash.edu.au/~rajkumar/tfcc/index.html
    - Linux-HA http://linux-ha.org/
    - some general links
    http://www.tu-chemnitz.de/informatik/RA/cchp/index.html

    There are more clustering products out there than you can shake a stick at, and everybody seems to have a different take on what they mean by a cluster.

    Does anyone have any information on what the Linux Cluster Cabal are up to?

    Probably the best thought out cluster solutions are OpenVMS Clusters and UnixWare NonStop Clusters.

    Zootlewurdle

    --
    Share and Enjoy
  7. Shared Memory vs. Message-passing by jclip · · Score: 3
    (background: I've done a lot of C coding using MPI on large Beowulf clusters at Caltech. I've implemented the same codes on Sun and SGI shared memory machines as well as DEC and RS/6000 clusters.)

    If you're looking to cluster for high performance, you need to decide which HPC paradigm you're going to go with and choose your clustering based on that.

    Distributed shared memory (DSM) is great on the programmer side. You've got a big steaming chunk of memory shared among processors, and a bunch of parallel threads/processes (depending on OS) acting on that chunk. DSM makes a lot of sense for database servers, and is the prevalent HPC solution among the big server companies (Sun, SGI, etc.) since multithreaded code runs dandy without any modifications. MOSIX implements DSM.

    The downside: vast memory bandwidth required for sharing and high overhead. In an educational environment (and IMHO), DSM is a Very Bad Thing, since programming DSM teaches you nothing about actually using parallelism--it's just like working in any other multithreaded environment.

    Students are better served by learning on a message-passing system, which is what Beowulf clusters are. You have a bunch of computers and a way to make them talk to one another (PVM or MPI)--"now implement some algorithms!" MP machines are [given equal-quality implementations, a big given] generally faster and more scalable than DSM machines, as well as being more "pure". Optimizing DSM programs is much easier if you have MP experience.

    Downside: MP is a pain to program and even more of a pain to debug. But students could use more suffering, right? Language support is a little iffier for MP, too, with Fortran and C being prevalent.
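For anyone who hasn't seen the message-passing style, here is a very small Python sketch of the pattern. It is not PVM or MPI: threads inside one process stand in for cluster nodes, `queue.Queue` stands in for the network, and the names `worker` and `parallel_sum` are made up for the example. What it does share with real MP code is that each worker only touches data it was explicitly sent, and results come back only as messages.

```python
import queue
import threading


def worker(inbox, outbox):
    """One 'node': receive a chunk of numbers, send back its partial sum."""
    chunk = inbox.get()        # blocking receive
    outbox.put(sum(chunk))     # send the result to the master


def parallel_sum(values, n_workers):
    """Master: scatter strided slices to the workers, then gather partial sums."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    outbox = queue.Queue()
    threads = []
    for rank in range(n_workers):
        inbox = queue.Queue()
        thread = threading.Thread(target=worker, args=(inbox, outbox))
        thread.start()
        # Slicing copies the data, so the worker gets its own private chunk.
        inbox.put(values[rank::n_workers])
        threads.append(thread)

    total = sum(outbox.get() for _ in range(n_workers))   # gather
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(101)), 4))
```

Even in a toy like this, the scatter, compute and gather steps have to be written out by hand, which is roughly where both the pain and the learning come from. In a shared-memory version the workers would simply read the same list.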