Choosing the Right Cluster System
ckotso asks: "So I've read here and there about Linux clusters, and I am ready to set about creating one with some help from the educational institute I am working for. So far I've found out about Beowulf, SCI and MOSIX. I really wish I could get some help on this, since NT is gradually making its way into the University and I hate to see this. I want to give a cheap and robust alternative to this place; I simply have to change their minds!"
"My questions are:
- Have I missed any other serious competitor in the cluster field?
- What are the pros and cons of these systems?
- Has anyone tried them all and written any report as to how they compete?"
--
I think we need to start out with this very important question: what do you want to use it for? How you want to use it will determine the style of cluster you need. Each style has its pluses and minuses. Some are better for batch processing; others are better at handling large amounts of interactive work. What do you need?
Here is some information you may want to consider before starting your own cluster:
So, some positive factors, some negative ones. If you want to convince your University, always remind them that they can always count on the support of other universities and research centres the world over that are using this technology right now.
Good luck!
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
If you are looking for the kind of clustering that Windoze NT does, then you want something like TurboCluster Server from TurboLinux; it provides clustering for high availability and high throughput for web servers.
If you need more general load-balancing clustering for enterprise applications, look at Linas Vepstas's Linux Enterprise Computing pages at http://linas.org/linux/; he has a section on clustering there.
If you need supercomputer number-crunching or render-farm type clustering, then the Beowulf approach is what you want. Linas' pages also have a section on Beowulf-type clustering.
--
Second: SCI is orthogonal to the other two technologies. It is a special hardware network technology (Scalable Coherent Interface), originally made to support distributed shared memory. You may be thinking of the software Dolphin Interconnect Solutions provide with their SCI solutions, but as far as I know, that doesn't directly enter into the same space, either. Their web pages certainly do not indicate that it does, and my discussions with (one of?) their Linux developer(s) implied that it contained somewhat more (lock managers etc), but not in the same space. A technology that competes with SCI, though proprietary, is Myrinet. This has a longer history than SCI, and has been less plagued with problems than SCI (though SCI is supposedly quite stable now).
Third: There are a bunch of other technologies (some cross-platform, some single-platform) that compete in making it easy to build clusters. MOSIX and Beowulf are just two of them. If you give more details of what you want to achieve, I'll dig out references from my collection (made to support the development of FreeBSD-specific clustering improvements, so some types of references may be lacking, but I'll probably be able to come up with at least some starting points for any desired cluster workload).
Eivind.
Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
On the other hand, administration/management type people who don't know what a Beowulf cluster does or how it is used are often involved in the decision-making process for computers at universities/companies. Pro-Windows zealots of that type try to throw NT at everything too. And as we all know, Windows NT is not a drop-in replacement for UNIX/Linux for every task either.
What is your problem with someone trying to do a little research into what alternatives are out there? What have the Windows zealots got to hide? If they can't ask questions like this in a forum like Slashdot, where can they ask?
As for making real-world decisions in the future, it is best to have the information on which to make a good decision. You seem to be implying that people should just roll over and go for the 'easy answer'.
One VERY important thing to consider is how the clustering technology is implemented in a given solution. For example, Mosix does not require any rewrites to your code, except maybe making the program fork off more often so that the processes can become more distributed. Parallel Virtual Machine (PVM), which is one of the most popular methods of implementing Beowulf clusters, however, is a code library that must be integrated into each program you want to run on the cluster. So, depending on your (or your users') programming knowledge, you should be careful about what clustering architecture you use.
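To make the "fork off more often" point concrete, here is a minimal sketch of a program structured as several independent processes. It only shows the fork-per-slice structure on a single Unix machine; whether a process-migrating system actually moves the children to other nodes depends on that system and is not shown here. The function names and the sum-of-squares workload are illustrative assumptions, not anything Mosix-specific.

```python
import os


def partial_sum(start, stop):
    """CPU-bound work for one child: sum of squares over [start, stop)."""
    return sum(i * i for i in range(start, stop))


def forked_sum_of_squares(n, workers=4):
    """Split [0, n) across `workers` forked children and add up their results."""
    step = (n + workers - 1) // workers
    children = []
    for w in range(workers):
        start, stop = w * step, min((w + 1) * step, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: compute its slice, report through the pipe, exit.
            try:
                os.close(read_fd)
                os.write(write_fd, str(partial_sum(start, stop)).encode())
            finally:
                os._exit(0)
        # Parent process: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe:
            total += int(pipe.read().decode())
        os.waitpid(pid, 0)
    return total
```

The program itself contains no cluster library calls, which is the contrast with the PVM approach: the parallelism is expressed only as ordinary processes.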
Do you want great clustering? Use VMS; nothing beats it for clustering. The UI sucks and it's a pain to admin, but the clustering gives you tons of power. Most clustering uses custom software, and you can just as easily write that custom software for VMS as you could for UNIX.
I would wholeheartedly recommend that anybody interested in clustering read Greg Pfister's "In Search of Clusters", published by Prentice Hall - ISBN 0-13-899709-8. It is the seminal work in this area.
Other good resources include:
- IEEE Task Force on Cluster Computing
http://www.dgs.monash.edu.au/~rajkumar/tfcc/index
- Linux-HA http://linux-ha.org/
- some general links
http://www.tu-chemnitz.de/informatik/RA/cchp/inde
There are more clustering products out there than you can shake a stick at, and everybody seems to have a different take on what they mean by a cluster.
Does anyone have any information on what the Linux Cluster Cabal are up to?
Probably the best thought out cluster solutions are OpenVMS Clusters and UnixWare NonStop Clusters.
Zootlewurdle
Share and Enjoy
They do clusters, and they'll slap together an entire system for you.
Mind you, I haven't tried their systems (umm, clusters are expensive ya know), but it seems to me that going with VA would get you pretty damn close to a turnkey solution.
Give 'em a call. It couldn't hurt, anyway.
First of all, as some other posts said, don't go off building a Beowulf (that name has been overused IMHO) unless you know what you are building it for: clusters are built around the software you're gonna run on them, not the other way around.
Also, do not plan to rely on NFS: Linux NFS is spotty when stressed by some high-bandwidth processes.
If you plan to port MPP applications from a Cray or an Origin 2K, a Beowulf with an MPI port will most likely do what you want. If you are interested in an HA cluster, then we're really not talking Beowulfs; take a look at TurboLinux's TurboCluster distro.
If you want to throw lotsa CPU power at a problem that's not already MPP'd, a port to Mosix might be worth your while, but investigate cautiously: Mosix does a good job (I am told) of process migration, but it doesn't migrate sockets yet, so it may effectively double your network bandwidth --this may not be a problem if your interprocess comm is minimal, or it might be a show-stopper. Do consider a port to MPI in this case: MPI is an industry standard and it works almost as well on a Cray as it does on a Beowulf.
Network communication is not as big a deal as it used to be: besides SCI, there's Myrinet (with OSS drivers and software too), Gigabit ethernet (also OSS drivers from some companies) and they all more or less work with Linux. Or you can go with the original Beowulf solution and bond Ethernet channels (i.e. make 2 NICs look and feel like 1 to the OS, almost doubling your capacity). It all depends on your application's inter-process communication requirements.
If you do decide on a Beowulf, heed these words: be careful of SMP machines, at least this early in the game. Linux SMP is deficient at best --hopefully 2.4.x will solve it, but I wouldn't hold my breath. If you decide on an SMP machine, stay away from Xeons as the extra cost will be useless right now --because of SMP problems with Linux, you might as well have a regular Pentium in there, or even some Celerons (hey, it'll buy better networking equipment).
I guess the best advice would be: Don't go spending all your NSF money right away. Get 2-3 machines with some Fast Ethernet, set the thing up, port your software, make sure it works as well as you expected it to, THEN go spend the big $$$ on SCI, more nodes, etc. The biggest advantage of Beowulfs is *Freedom* as in flexibility.
As the old maps said: "Beware: Monsters Here"
engineers never lie; we just approximate the truth.
No, Beowulf does work just fine with 2.2.x kernels.
Lobos2 at NIH is a Beowulf cluster with 100 compute nodes that were recently updated to 2.2.13.
-----Transmission Complete----- If you want to email me...Don't
I would definitely agree with this. The Sun Enterprise Cluster software is the way to go if you are looking for 2 to 4 node clusters for HA application failover or for something like a 2 node parallel server for a database. I am fairly certain that it will run on Solaris x86 too, though if you really need clustering (and the inherent concept of "no single points of failure"), you probably should be looking at the more dependable, and redundant, Sparc platforms.
However, if you are looking for a very large parallel processing "machine" reminiscent of the old Sequent machines for distributed parallel applications, it seems to me that Beowulf would be a better solution.
It really does depend on what you are looking for.
-Solaris/Sun Cluster Administrator
they want to use NT but I know Linux is better
Perhaps it is more like "They want to use NT, but I don't want to get stuck supporting it". The other question is, should pointy-haired bosses who won't be the ones using the machines be the ones making the decisions on platforms either? I think the guy asking the question is a sysadmin. From my experience, he should certainly have some input on what platform is chosen, since it will directly affect his job. Contrary to how Microsoft markets it, in my experience NT not only requires more service and administration because it is less reliable, but is also less convenient and more complex to administer, especially as the number of servers increases. GUI-based admin tools really become more of a hindrance than an advantage when you need to perform the same functions on dozens or hundreds of machines, and you start wishing for a nice, fast command line interface and/or scripting alternatives.
If you're looking to cluster for high performance, you need to decide which HPC paradigm you're going to go with and choose your clustering based on that.
Distributed shared memory (DSM) is great on the programmer side. You've got a big steaming chunk of memory shared among processors, and a bunch of parallel threads/processes (depending on OS) acting on that chunk. DSM makes a lot of sense for database servers, and is the prevalent HPC solution among the big server companies (Sun, SGI, etc.) since multithreaded code runs dandy without any modifications. MOSIX implements DSM.
The downside: vast memory bandwidth required for sharing and high overhead. In an educational environment (and IMHO), DSM is a Very Bad Thing, since programming DSM teaches you nothing about actually using parallelism--it's just like working in any other multithreaded environment.
Students are better served by learning on a message-passing system, which is what Beowulf clusters are. You have a bunch of computers and a way to make them talk to one another (PVM or MPI)--"now implement some algorithms!" MP machines are [given equal-quality implementations, a big given] generally faster and more scalable than DSM machines, as well as being more "pure". Optimizing DSM programs is much easier if you have MP experience.
Downside: MP is a pain to program and even more of a pain to debug. But students could use more suffering, right? Language support is a little iffier for MP, too, with Fortran and C being prevalent.
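For readers who have not seen the message-passing style, here is a minimal sketch of its shape. It is not PVM or MPI: threads stand in for cluster nodes and queues stand in for the network, purely as an assumption so the example runs with the Python standard library on one machine. The point is only that workers share no data and every exchange is an explicit send or receive.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """One 'node': receive a chunk, compute on it locally, send the result back."""
    chunk = inbox.get()                 # blocking receive from the master
    outbox.put((rank, sum(chunk)))      # explicit send back to the master


def message_passing_sum(data, n_workers=4):
    """Scatter `data` to the workers, then gather and combine partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: each worker gets its own private slice of the data.
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])

    # Gather: results arrive in whatever order the workers finish.
    total = 0
    for _ in range(n_workers):
        _rank, partial = results.get()
        total += partial

    for t in threads:
        t.join()
    return total
```

Even in this toy form, the programmer decides how the data is split and when results are collected, which is both the pain and the lesson mentioned above.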
You should first read the Beowulf FAQ (there are links in previous posts), but here is my take for the /. readers.
First of all, distributed/parallel computing is not an area in which there is consensus on hardware/software systems. There is, though, an accelerating trend towards building supercomputers out of clusters of commodity components. The fastest computer on Earth, which last time I checked was ASCI Red (and I hear it's gotten an upgrade!), is built this way. That is, vector computers and ultra-expensive parallel computers are being phased out in favor of cluster systems.
A cluster is basically a number of computers connected through a decent interconnection network. On each compute node a traditional OS runs. On top of the OS, a software interface that implements either a message-passing environment or a shared-memory environment sits. In some cases, it may be desirable to modify the OS itself for a better single-system image (such as MOSIX patches). Thus, a cluster is nothing but a number of connected machines. With proper software, it might be possible to run huge clusters on the internet (Check the Globus project!)
In the design of this system, a couple of parameters must obviously be determined.
1) Number of compute nodes
2) Processor/Memory configuration of each compute node.
3) The interconnection network:
a) Network interface of each compute node
b) The switch that is used to connect each machine.
An incorrect estimation of these parameters may give rise to a very sub-optimal hardware configuration. That is, the network must be fast enough to account for the messages being sent, the memory must be JUST large enough to support the granularity of processing, etc. I advise you to read some introductory text on parallel programming before embarking on a cluster effort.
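As a rough way to see how node count and network speed interact, here is a back-of-the-envelope sketch. The model is an assumption for illustration only: compute time divides evenly across nodes, each node then spends a fixed latency plus bytes/bandwidth communicating, and communication does not overlap with computation. Real applications will deviate from this, so treat the output as a sanity check rather than a prediction.

```python
def estimated_speedup(serial_seconds, nodes, bytes_per_node,
                      bandwidth_bytes_per_s, latency_s=0.0):
    """Estimate speedup over a single machine under a simple cost model.

    serial_seconds        -- run time on one machine
    nodes                 -- number of compute nodes
    bytes_per_node        -- data each node must exchange during the run
    bandwidth_bytes_per_s -- usable per-node network bandwidth
    latency_s             -- fixed communication cost per node
    """
    compute = serial_seconds / nodes
    communicate = latency_s + bytes_per_node / bandwidth_bytes_per_s
    return serial_seconds / (compute + communicate)


# Hypothetical numbers: a 100 s job on 10 nodes, each exchanging 12.5 MB
# over Fast Ethernet (about 12.5 MB/s at best).
fast_ethernet = estimated_speedup(100.0, 10, 12.5e6, 12.5e6)
```

With these made-up numbers, one second of communication already costs roughly a tenth of the ideal 10x speedup; the same traffic on 32 nodes would cost proportionally more, which is why the interconnect has to be sized together with the node count.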
Notice that there is no SINGLE piece of software that will magically parallelize and distribute your applications gracefully. You will find that explicitly parallel or distributed applications will run much more efficiently. While you can get a global process implementation, or even shared-mem implementations with some software, the real "speedup" is going to be observed for explicitly parallel programs, for instance linear algebra libraries designed to run on message passing architectures.
Finally, let me give you a list of the components we're in the process of acquiring.
32 PII-450, 128 MByte compute nodes with 3COM Fast Etherlink 100Base-TX
1 master node, a plain PIII-450, Gigabit ethernet and another NIC to connect to internet, some megs of disk
1 devel workstation, plain PIII-450...
3COM SuperStack II 3900 36-port 100Base-TX, 1 1000Base-SX managed switch
32-port (hopefully) multi-port serial board (you use this for diagnostics)
The software will simply be the stable Debian release; not much software will run on the compute nodes. Of course the lam package will be resident, since it is a pretty good MPI implementation. The server will be uplinked to the switch with 1000Base-SX so that it can act as a synchronizing source (you know, Beowulf master node, file server, etc.). The multi-port serial board provides a shell over the serial cables to each compute node, which gives you a good chance of repair when a node goes down.
Keep clustering,
--exa--
Sure you can get Intel boxes for a dime a dozen...
Hardware costs are the same or slightly lower for Linux, because Linux has lower hardware requirements.
sure linux is free.
Linux development tools are also much cheaper than those for Windows, and this is a direct issue because the kind of apps that are run on a Beowulf type machine are typically homegrown.
Sure Windows costs money to buy.
Windows costs a lot more than you might even think by the time you add in all the add-on software and development tools you would need to build a cluster and development environment. We are talking 10's of thousands of dollars difference for a few dozen nodes.
BUT they both require something that costs alot: service.
And they both require it. That is true of any type of computer system. Nothing I've seen would tell me that there is any reason to expect that Windows would cost less for service, support and administration than Linux, in fact from what I've seen, and despite Microsoft's marketing, it is the opposite.
Beowulf clusters are potential security risks if not properly administered and kept patched up to date.
The same thing is true of Windows boxes. The same thing is true of any type of system. Actually most Beowulf boxes are hidden away behind firewalls and not something that is accessible to every Joe random student and outsider, so you are overstating the relative security risk compared to any other computer system.
Installation and maintenance require *good* sysadmins. What's the average salary of a sysadmin?
It's not insignificant, but it is cheaper around most major universities (especially since they have the advantages of indentured servants... err... coop and graduate students and generally depressed markets for tech staff). And from what I've seen in university environments, it is a lot easier to find people with skills in UNIX/Linux administration than it is to find MCSEs. Also, in my experience, UNIX/Linux boxes not only require less administration work because they are more reliable, but a smaller number of admins can also administer a larger number of *nix boxes than Windows boxes.
Look at what you currently have in house for expertise.
It sounds like this guy is one of the in-house sysadmins. Most universities have more in-house expertise in UNIX/Linux for large scale implementations, which is one of the big reasons that all of the big research orgs are using Linux for their Beowulf clusters.
Just throwing out a new Appleseed thread because I think it deserves it. I wrote an article on the clustering project at the UCLA Physics department nearly a year ago. They had achieved spectacular results using 300MHz Beige G3s and 100BaseT. Very simple setup using off-the-shelf hardware. I do not know if they have looked into using Firewire or Gigabit Ethernet and/or the new G4s yet. But I would expect that the performance from an 8- or 16-box cluster of G4s with Gigabit Ethernet would pretty much blow away a Beowulf cluster in both the performance and price categories. As for NT... don't make me laugh. Doctor Viktor Decyk is the project coordinator and would be glad to speak to you, I am sure. The project website has been posted in a previous thread.
Connor W. Anderson
IT Manager
Department of Radiology
University of Chicago
"Being Irish, he possessed an abiding sense of tragedy which sustained him through brief episodes of joy." -W. B.
Depending on what you need the cluster for, you should adapt the network to your clustering technology.
KAOS
PAPERS
Grey (Chris Lusena)
My questions are:
1) Have I missed any other serious competitor in the cluster field?
You will surely want to try the newly released GPL program ALINKA LCM for the management and configuration of Linux Beowulf-type clusters.
Once installed, the software can automatically set up a Beowulf cluster from the network (with or without using the hard disks) within 2 minutes. With this software, it is dead easy to build an "instant Beowulf" cluster...
The current version is 1.1.3 and can be considered a beta release, although some sites use ALINKA LCM v1.1.3 in production. If you wish to know more about ALINKA LCM, you can read the on-line documentation. ALINKA has provided software tools for commodity clusters running Linux since August 1999. Customers of ALINKA include the French CEA (Center of Atomic Energy) and public research laboratories.
The ALINKA company provides commercial support for ALINKA LCM and also sells a GUI for ALINKA LCM, called ALINKA RAISIN, running within a web browser.
You can check http://www.alinka.com for more information on this new killer software!
ATM is not necessarily the best connectivity solution for this particular application, nor is routing. ATM is a cell-based OSI layer 2 technology that breaks traffic down into 53-byte cells. On an OC-3 (155Mbps) one can incur quite a bit of overhead for LAN-based traffic, so you won't necessarily see your full 155Mbps of traffic. ATM works well for native ATM devices that require real QoS and are able to manage the setup and tear down of the various types of circuits that are available in an ATM cloud. IP in all its forms has to be adapted to ATM in one of a couple of ways, LAN emulation being one of the most popular. Setting up permanent point-to-point PVCs is another way to do it.
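The overhead mentioned above can be put into rough numbers. The sketch below assumes the standard ATM cell layout (5 bytes of header in each 53-byte cell, so 48 bytes of payload) and AAL5 framing (an 8-byte trailer, padded to a whole number of cells). Those figures are not from the post above, and the calculation ignores SONET framing and any LAN emulation or encapsulation headers, so real throughput would be somewhat lower still.

```python
import math

CELL_BYTES = 53                               # total size of one ATM cell
HEADER_BYTES = 5                              # per-cell header (assumed standard layout)
PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES     # 48 bytes of payload per cell
AAL5_TRAILER_BYTES = 8                        # AAL5 trailer appended to each packet


def atm_payload_rate(line_rate_mbps):
    """Best-case payload rate once the per-cell header tax is paid."""
    return line_rate_mbps * PAYLOAD_BYTES / CELL_BYTES


def cells_for_packet(packet_bytes):
    """Cells needed to carry one packet over AAL5 (trailer plus padding)."""
    return math.ceil((packet_bytes + AAL5_TRAILER_BYTES) / PAYLOAD_BYTES)


def packet_efficiency(packet_bytes):
    """Fraction of bytes on the wire that are actual packet data."""
    return packet_bytes / (cells_for_packet(packet_bytes) * CELL_BYTES)


oc3_payload = atm_payload_rate(155)       # a little over 140 Mbps
big_packet = packet_efficiency(1500)      # 1500-byte packet in 32 cells
```

Under these assumptions the cell header alone takes roughly a tenth of the line rate, and small packets fare worse because of the padding to a full cell.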
One of the qualitative differences between clustering and MP is that in a clustering environment one has to be able to write applications that can be made parallel and are capable of taking advantage of the massive amounts of CPU time available while not suffering from the relatively small amounts of memory bandwidth available. Most ACs don't understand this, so we get comments like "I want to run quake on a Beowulf". It follows that increasing the amount of bandwidth between machines will make the clustering environment less restrictive from a memory bandwidth point-of-view. One never wants to "route" in a clustered environment. Devices that make forwarding decisions at OSI Layer 3 are all inherently slower than devices that make forwarding decisions at OSI Layer 2. There are L3 switches that forward packets at wire speed, but these are expensive and pointless to use in this type of environment, as it's not needed. Basically, one would want to put their cluster into a single subnet (and vlan) in a completely switched environment and endeavour to minimize broadcast traffic. At a minimum I would recommend a completely switched 100Mbps environment for a low-cost cluster.
It should be noted, though, that not all 100BaseTX switches are non-blocking. I wouldn't consider using anything that isn't. If one requires additional bandwidth for a particular type of application there are a couple of other options. Gigabit ethernet will provide approximately 3 times the bandwidth of fast ethernet in a Linux machine, mainly being limited by the throughput of the stack. One also may want to consider HIPPI if the need is there. HIPPI is very, very expensive and, to the best of my knowledge, only available from a handful of vendors, one of those being Essential/ODS (my previous employer). I believe that there is a driver for Linux for the Essential/ODS HIPPI NIC, though I'm not certain what the throughput is. HIPPI is being used by the big boys (Sandia National Labs, NASA Ames, Lawrence Livermore), mostly in SGI environments.
Beyond HIPPI, there is something called GSN (Gigabyte Switch Network), a 6.4Gbps environment being adopted as the next level of bandwidth by both ODS and SGI. ODS filled the first order for GSN switches sometime in January of 1999, I believe. I'm not even sure there is a NIC available for the type of hardware that's supported by Linux. For info on HIPPI and GSN stuff check out ODS' web site.
I would recommend HIPPI, then Gigabit Ethernet for a high performance cluster. For gigabit ethernet or fast ethernet there are the Lanblazer from ODS and the Cajunswitch 550 (the same switch, one is OEMed from the other). In addition, there are products from Extreme Networks, Fore Systems (Berkeley Systems Gig E stuff), HP and I'm sure there are a few others. Most of the stuff from the top 3 (Cisco, Nortel, 3Com) is not non-blocking; one should do the research before making a purchase.
But I would expect that the performance from an 8- or 16-box cluster of G4s with Gigabit Ethernet would pretty much blow away a Beowulf cluster in both the performance and price categories.
I seriously doubt that. To use the AltiVec part of the G4 (which is what gives its absurdly high peak performance), you need to be either hand-writing PPC/AltiVec assembly code or using a vectorizing PPC/AltiVec compiler, and I have not heard of *any* of the latter. Also, the memory system on the G4 isn't much (if any) better than that on a standard Pentium III, which frankly sucks (~300MB/s). A Beowulf cluster comprised of Alphas with a Myrinet network will likely wipe the walls with a similarly sized G4 cluster with Gigabit Ethernet, and will cost about as much -- large GigE switches are expensive.
8 DS10 1Us (@$3k) + 8 Myrinet cards (@$1.4k) + 1 16-port Myrinet switch (@$4k) = $39.2k
8 G4s (@$2.5k) + 8 Gigabit Ethernet cards (@$0.7k) + 1 8-port Gigabit Ethernet switch (@$15k) = $40.6k
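The two totals above can be checked, and reused for other node counts, with a few lines of arithmetic. This is only a sketch: the function name is made up, the prices are the ones quoted above in thousands of dollars, and the Gigabit Ethernet card price is read as $0.7k.

```python
def cluster_cost_k(nodes, node_k, nic_k, switch_k):
    """Total hardware cost in thousands of dollars: nodes, one NIC each, one switch."""
    return nodes * (node_k + nic_k) + switch_k


# Prices as quoted in the comparison above (thousands of dollars).
alpha_myrinet = cluster_cost_k(8, node_k=3.0, nic_k=1.4, switch_k=4.0)    # about 39.2
g4_gigabit = cluster_cost_k(8, node_k=2.5, nic_k=0.7, switch_k=15.0)      # about 40.6

# Share of each budget that goes to the interconnect (NICs plus switch).
alpha_network_share = (8 * 1.4 + 4.0) / alpha_myrinet
g4_network_share = (8 * 0.7 + 15.0) / g4_gigabit
```

At eight nodes the network takes a larger share of the budget in the Gigabit Ethernet configuration than in the Myrinet one, mostly because of the switch price, which is the point being made about large GigE switches.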
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Novell is also very expensive.
Compared to Linux, perhaps, but not that much worse than NT...