Flattening Out The Linux Cluster Learning Curve

don't you mean... by beckett · 2004-10-31 00:09 · Score: 3, Insightful

Re:don't you mean... by St.+Arbirix · 2004-10-31 04:59 · Score: 4, Funny

They were having problems with too many people learning how to cluster linux. Mentions in various forums about "imagine a Beowulf cluster of these" had reached epidemic proportions so they decided something had to be done.

Thanks to this book the learning curve has been flattened down to something more appreciable and amenable to those who have complained about the problem. The curve has been flattened far enough that it takes two years to learn that clustering "will likely require more than one computer to operate correctly" (Chapter 403 pg. 8729). I count this as a big win for society.

Ignore the anonymous coward who replied before me.

--
Direct away from face when opening.

This is the kind of book we need... by Amiga+Lover · 2004-10-31 00:11 · Score: 5, Funny

Now it's not just geeks, but also IT Managers who can imagine a beowulf cluster!

Re:This is the kind of book we need... by Anonymous Coward · 2004-10-31 00:16 · Score: 2, Informative

Umm dude.. Enterprise cluster != beowulf cluster
Re:This is the kind of book we need... by Green+Salad · 2004-10-31 00:39 · Score: 2, Insightful

Rather than ridicule the level of expertise, I think it's important for IT management types to have their imagination fired up about clusters.
They're the ones that can get funding and support for you to put one together.
Re:This is the kind of book we need... by ocelotbob · 2004-10-31 02:30 · Score: 2, Interesting

*Disclaimer: I am tired. It is 6:30 on a Sunday morning. I have done the one task I gave myself before I allowed myself to sleep, which was to make pawgloves for my Halloween costume. Thus, sanity is overrated right now.
Okay, the classic beowulf cluster is a 4x4 matrix of computers. Now, to have a beowulf of beowulfs, each of those computers on a cluster must be connected to its own 4x4 grid, so you now have a cluster of 256 computers, arranged somewhat suboptimally. Now, in order to communicate with these systems, you are going to need some library functions. Classic beowulfs work well with the industry standard pvm libraries. They can also use openmosix if the application is not natively cluster aware. As we are dealing with clusters of clusters, some applications may not function properly if they were designed to work on just a single cluster. So, most likely, we'll end up needing to use a variety of techniques to beowulf-square an application, such as combining pvm and openmosix.

--
Marxism is the opiate of dumbasses
Re:This is the kind of book we need... by perlchild · 2004-10-31 04:40 · Score: 2, Interesting

You bring an interesting point up, I wish each book on the topic of clusters mentioned which type(s) of clusters it dealt with...

Looking for a good book on High-Availability clusters would be so much simpler

OSTG? by ricotest · 2004-10-31 00:13 · Score: 4, Informative

I must have missed this, and for anyone else who didn't know, OSTG is the new name for the Open Source Development Network (OSDN), which Slashdot is a part of. They're now called the Open Source Technology Group.

Let me get this straight by Timesprout · 2004-10-31 00:19 · Score: 3, Insightful

The guy puts a single 10 node cluster together and this qualifies him to write the 'definitive guidebook called "The Linux Enterprise Cluster"'.

Don't think so.

--
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe

Very nice by a_hofmann · 2004-10-31 00:40 · Score: 4, Informative

Installing and administering the various open source tools can be tedious work, especially without documentation of how to put things together.

A quick Google search though reveals a lot of free papers and manuals on this very topic.

The problem with clustering in Linux... by Xpilot · 2004-10-31 00:41 · Score: 4, Interesting

...is that there are a gazillion ways to do it, and every cluster vendor comes up with their own way, and there is no agreed-upon standard yet to easily deploy these things (AFAIK). While the fact that no single vendor controls how clustering works is a good thing, without a good standard for what a clustered environment will offer the application developer, setting up clusters for different types of applications remains a tedious task.

Lars Marowsky-Brée had a paper in the proceedings of OLS 2002, entitled "The Open Clustering Framework", describing the problem and a suggested solution. I'm not sure how far standardized clustering has come since then. Does anyone have any insight on the matter?

--
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -- Linus Torvalds

Re:The problem with clustering in Linux... by barks · 2004-10-31 01:28 · Score: 4, Interesting

I have yet to meet anyone who's running a homebrew cluster and can tell me which distro they're running... as awesome as they appear to be.

What I gathered from my introductory Operating Systems class was that this was the next frontier and an exciting market to keep an eye on... that, and that creating applications for these setups was not, as you said, standardized yet. Can Linux applications that normally run on a single-box setup migrate relatively easily to a cluster setup yet?

--

Some aim to please, I aim to tease.
Re:The problem with clustering in Linux... by jweage · 2004-10-31 02:10 · Score: 2, Interesting

I've personally set up two clusters, both in the 40 CPU range. I've used various versions of Red Hat, and I'm now using White Box Enterprise Linux.

I use a PXE boot system with Anaconda kickstarts to get the software installed. A post-install script then configures everything else on the machine. When it reboots, the machine appears in the cluster and is ready to use. I use the Torque batch scheduling system.

You don't need the cluster toolkits to set up a cluster! DHCP, TFTP and a configured kickstart file work just fine with Red Hat.
Re:The problem with clustering in Linux... by mindstrm · 2004-10-31 02:13 · Score: 2, Interesting

The answer is "it depends". First, there is no such thing as a generic "cluster". A cluster is just a bunch of machines cooperating to solve a problem (whether that problem is serving a website, computational physics, or the requirement for redundancy).

For some types of applications, it's easy to visualize how to get a dozen or a hundred computers to help with the problem (serving static web pages). For others, it's not (databases).

Logistics gone digital? by egnop · 2004-10-31 01:13 · Score: 2, Interesting

End users would often complain about the system's slow response time.
He says, "Because we couldn't print the forms for the warehouse people to select the products to be put on the truck, we'd have dozens of truck drivers sitting in the break room each day for more than 10 minutes."

I actually don't get it; most logistics operations have had wireless for about a decade now...
and the truck driver has no right for a break...

Re:Logistics gone digital? by teh_winch · 2004-10-31 03:01 · Score: 2, Insightful

If the system is taking a long time to do the work required to print a form how is wireless going to help?

and the truck driver has no right for a break...

What do you propose the driver does? Go drive around the block for 10 min until they are ready to load?

Publication, standardization, multiplication by wombatmobile · 2004-10-31 01:23 · Score: 3, Insightful

Publications like this play an important role in establishing best practices and community, two key enablers of standardization.

These in turn will lead to greater adoption, and more publications. A virtuous cycle.

VMS clusters by Anonymous Coward · 2004-10-31 01:24 · Score: 4, Interesting

Want practice with decades-mature enterprise clusters? Why not get a few old VAX or Alpha systems on eBay, and/or fire up a few instances of the simh emulator, then join the free OpenVMS hobbyist program (I recommend the also-free-to-hobbyists Process Software's Multinet TCP/IP stack and server software).

And please, don't be put off by VMS because DCL (your first exposure to a VMS system) feels more awkward than bash (in many ways, it certainly is!). It's in the underlying architecture of the OS where the fruits of tight engineering are really demonstrated.

Re:VMS clusters by hachete · 2004-10-31 02:46 · Score: 2, Informative

This seems fairly active:

http://gnv.sourceforge.net/

includes a port of bash to VMS. Not sure how good it is.

Having used and programmed DCL, it's not that bad.

h

--
Patriotism is a virtue of the vicious

Unless i read this incorrectly. by thegoogler · 2004-10-31 01:37 · Score: 2, Informative

And he built a ten-node cluster OF ten-node clusters, then this is lame, and he is (most likely) under-qualified to do the book, as most ACTUAL enterprise clusters are at least 20 nodes, possibly more if they're clusters of blade servers.

Mandrake CLIC by bolind · 2004-10-31 02:07 · Score: 4, Informative

I will start by admitting that I am just a dumb university student talking out my ass. I have never set up an enterprise scale cluster.

However, last January we set up a small (six node) cluster with the help of CLIC. Once we realized the link between Mandrake and consecutive dead CD drives, we installed the cluster in little time.

CLIC might focus a little too much on user-friendliness and a little too little on flexibility, but for our purposes it was great. It sports ganglia, gexec, distcc and MPI (and probably more), and administration and deployment of nodes is a breeze.

I heartily recommend CLIC for student/test/proof-of-concept projects.

Beowulf Newbie Question by Phoenix666 · 2004-10-31 02:29 · Score: 2, Interesting

I read about setting up a cluster about six months back, and they said that you can only really run programs that are specifically designed to run on a beowulf cluster. It seems like if you could set up a cluster and be able to run any old app on it without special coding, then you'd have your massive adoption of linux. Plug-n-play supercomputer, using the crappy old boxes gathering dust under the cubicles.

Are there any plans to take beowulf in this direction? Is it already possible, but I was just reading the wrong FAQ?

--
Do what you can, with what you have, where you are.

Re:Beowulf Newbie Question by photon317 · 2004-10-31 03:29 · Score: 3, Informative

Inevitably high-performance clusters require software designed to run on high-performance clusters. It is better not to think of such a cluster as a single system, but rather as a network of individual machines with a tight network connection. Some of the clustering add-ons for linux approach and even achieve certain aspects of a "Single system image" type of configuration, but it's never completely like a single system.

Back in 1997 or so I tried to get as close as I could to a true Single System Image by building off of the beowulf patchsets combined with patches for Distributed SysV IPC/SHM and a globally-shared root filesystem using CNFS (cluster-nfs, so that a few essential configfiles can have unique copies per cluster node). It was very daunting work to get those patches integrated together, and the end result was that without some kind of network-interconnect that was as high-speed and low-latency as a processor's FSB, there was always going to be a big performance hit doing things this way. Of course if an application happens to be perfect for simple HPC clusters (all cpu intensive, very little I/O, and the work is easily divisible without tons of IPC between the workers), then it runs fantastically on such a Single System Image cluster, but then again it would have run fantastically on a simple cluster that doesn't look like a Single System too. So what the Single System concept bought me really was a nice abstraction layer that made everything easy to deploy, configure and manage. But it came at a severe initial cost of human labour. It's not worth the trouble.
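
A minimal sketch of the "easily divisible without tons of IPC" case described above. This is only an illustration: the worker threads stand in for cluster nodes, and the function names (split_range, sum_of_squares, run_divided) are hypothetical, not part of any clustering toolkit. Each worker gets its own chunk, never talks to the others, and the partial results are combined once at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def sum_of_squares(chunk):
    """CPU-only work on one chunk; needs no data from any other worker."""
    lo, hi = chunk
    return sum(n * n for n in range(lo, hi))


def run_divided(start, stop, workers):
    """Farm the chunks out to workers, then combine the partial results."""
    chunks = split_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The only "communication" is this single final reduction.
    return sum(partials)


print(run_divided(0, 1000, 4))
```

Work shaped like this runs well on any kind of cluster, which is the point made above: the Single System Image layer adds convenience, not speed.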

--
11*43+456^2
Re:Beowulf Newbie Question by Junta · 2004-10-31 03:39 · Score: 2, Informative

An openmosix cluster would behave more along the lines of what you are thinking, but ultimately for HPC applications at scale it is generally more efficient to not do openmosix and write the programs explicitly for parallelism mindful of the layout of processing elements (i.e. network topology or balance between SMP connected processing elements and network connections between nodes).

--
XML is like violence. If it doesn't solve the problem, use more.

Please stop misusing the term 'learning curve' by double_h · 2004-10-31 04:45 · Score: 4, Informative

A flat learning curve is a bad thing.

The term "learning curve" was invented by the aerospace industry in the 1930s as a way to quantify improved efficiency from mass production (basically, the more you do a task, the easier it becomes). The term was later adopted by psychology and the social sciences, where most people first encounter it.

In both cases, the horizontal axis of a learning curve represents time or effort, and the vertical axis represents amount learned or productivity. Therefore something that is intuitively obvious in fact has a steep learning curve.
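
For anyone who wants numbers, here is a rough sketch of one common way the manufacturing version is formalized (usually credited to T. P. Wright's 1936 work on airplane costs): every time cumulative output doubles, the cost of a unit falls to a fixed fraction of what it was. The 80% rate and the starting cost of 100 are made-up values for illustration only.

```python
import math


def unit_cost(first_unit_cost, learning_rate, n):
    """Cost of the n-th unit when each doubling of output multiplies
    unit cost by `learning_rate` (e.g. 0.8 for an "80% curve")."""
    exponent = math.log2(learning_rate)  # negative whenever learning_rate < 1
    return first_unit_cost * n ** exponent


# Illustrative assumptions: first unit costs 100, 80% learning rate.
for n in (1, 2, 4, 8):
    cost = unit_cost(100.0, 0.8, n)
    # Productivity is taken here as the inverse of unit cost.
    print(n, round(cost, 2), round(1.0 / cost, 4))
```

A smaller learning_rate makes cost fall (and productivity rise) faster per unit of experience, which is the steep curve.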

"Learning curve" was a technical term with a specific definition for decades before it was ever a (misused) marketing buzzword.

Thank you for your time :)

Unless... by freezin+fat+guy · 2004-10-31 05:19 · Score: 2, Funny

They flatten it vertically. Wohoo! Zero investment, complete knowledge!

Re:little advice by dougnaka · 2004-10-31 14:50 · Score: 2, Informative

The first part, act as one big SMP machine, is what clustering does.

The second part, with shell accounts and home directories, are all problems already solved by NIS/NFS. You could set up a pool of machines that all share the same NIS/NFS info so anywhere the user logged in they'd have the same files/passwords, and load balance it via ipvs or dns.
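
To show the idea behind round-robin balancing (the simplest thing dns-style balancing does), here is a toy sketch. It is not how ipvs or any DNS server is implemented, and the host names and the assign_logins function are hypothetical. It only works because every node shares the same NIS/NFS info, so it doesn't matter where a user lands.

```python
from itertools import cycle


def assign_logins(hosts, login_count):
    """Hand out hosts in strict rotation, one per incoming login."""
    rotation = cycle(hosts)
    return [next(rotation) for _ in range(login_count)]


# Hypothetical pool of identical login nodes.
pool = ["node1.example.com", "node2.example.com", "node3.example.com"]
for host in assign_logins(pool, 5):
    print(host)
```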

AFAIK the current state of clustering works well for custom code situations, where you write your app to run on the cluster, but doesn't transparently make your 4 boxes act like 1 box with 4X the resources for just any program.

I've used distcc with some luck on gentoo, but it only distributes compiling over your nodes.

--
My Linux Command of the Day site : LCOD

apps not designed for cluster with lots of state? by mikefe · 2004-10-31 20:49 · Score: 2, Interesting

I have been looking at network filesystem level clustering and failover and NFS, SMB/CIFS and OpenAFS look like good choices for that. With NFS and CIFS you can have an active/inactive fail-over cluster.

I don't know about NFS, but in the case of CIFS, the protocol spec has provisions for renegotiating locks if a connection is broken, but I don't know if there are bugs in win2k/XP clients with samba 3 servers. OpenAFS can have a sort of active/active setup, but the architecture is such that there is only one server that handles the writes and the rest are read-only. In all of these you can have a semi active/active failover cluster if you move half of the active volumes to the backup server, but this adds a lot to the complexity of your fail-over system.

Those services have a low to moderate amount of state information kept on the server. In the case of a graphical (VNC) terminal server, I don't know of any open source projects that will allow a gnome session to be on one server, have that server go down, have another server take over its ethernet MAC and IP address, and continue processing where it left off on the backup server. The best I can think of is OpenMosix or maybe OpenSSI, which are two single system image type clustering systems. If anyone knows anything, please reply and let me know, thanks.

--
There: Something at a specific location.
Their: Owned by someone.
Please make sure your english compiles.

Comments From the Author by KarlKopper · 2004-11-02 06:36 · Score: 2, Interesting

I'd like to jump in here and make a few comments.

First, about the book being a "definitive" guide. I cannot possibly claim to be an expert on every topic in the book--in fact, no one person can. The book is definitive, however, in that project leaders from each of the open source projects participated in editing and reviewing the material for the book.

It is an overbroad statement to say it is the definitive guide for building any and all types of Linux Clusters. The book describes how to build a cluster that can be used to run mission critical applications to support an enterprise (it has little or nothing to do with working on the "Big Problem" as Pfister would call it).

(The book took four years to write by the way.)

I do hope it helps with the learning curve, but this is one of the advantages of building what I'm calling a Linux Enterprise Cluster--the system administrator can leverage his/her knowledge of Linux and add concepts that will allow them to build a cluster capable of supporting the enterprise.

I did not invent anything new for this book, and you CAN already find just about everything on-line that is in this book. I started work on the book in 2000 because, at the time, I wanted to have a guide book like this one that would hold my hand through the process of building a cluster that could support mission critical applications running GNU/Linux.

Finally, let me just agree with the comments about the number of nodes ("You don't need 20 nodes if 6 can do the job"). This book is not about building clusters for scientific applications where thousands of nodes and sophisticated batch job scheduling systems are required. How many nodes does it take to build the ideal cluster for your environment? I think that will depend on a lot of things including your budget, the impact of the failure of a single node in the cluster, how many instances of your application can run concurrently on a single node, performance bottlenecks from your node hardware, and so on. In my opinion, the ideal number of cluster nodes for an enterprise cluster--from the system administrator's standpoint--is about 10 (in a pinch you can log on to every node fairly quickly).
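
As a back-of-the-envelope illustration of two of the factors listed above (instances per node and the impact of a single node failing), here is a rough sizing sketch. It is not a method from the book; the required_nodes function and the numbers fed to it are hypothetical, and it ignores budget and hardware bottlenecks entirely.

```python
import math


def required_nodes(peak_instances, instances_per_node, spare_nodes=1):
    """Nodes needed to carry the peak load, plus spares so the cluster
    still carries that load after `spare_nodes` nodes have failed."""
    if instances_per_node <= 0:
        raise ValueError("instances_per_node must be positive")
    working = math.ceil(peak_instances / instances_per_node)
    return working + spare_nodes


# Assumed figures: 90 concurrent application instances at peak,
# 10 instances per node, and tolerance for one node failure.
print(required_nodes(90, 10, spare_nodes=1))
```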

The cluster this book was based on has been in production long enough (over 18 months) to have undergone a complete hardware refresh by the way; so the text is based on actual experience (not just theory) and, as I mentioned earlier, it has been reviewed by subject matter experts to ensure its technical accuracy.

Slashdot Mirror
