Maintaining Large Linux Clusters
pompousjerk writes "A paper landed on arXiv.org on Friday titled Installing, Running and Maintaining Large Linux Clusters at CERN [PDF]. The paper discusses the management of the 1000+ Linux nodes, upgrading from Red Hat 6.1 to 7.3, securely installing over the network, and more. They're doing this in preparation for Large Hadron Collider-class computation."
My book on maintaining a cluster of 0-1 nodes will be out next month.
Why on earth would someone need a 1000+ node cluster?
Maybe for a Large Hadron Collider-class computation.
Whenever the offence inspires less horror than the punishment, the rigour of penal law is obliged to give way...
Damn. Back when I was on a high-energy experiment located in the middle of nowhere in Japan (subject of at least two Slashdot articles), our Japanese colleagues used to lease gaggles of Sun workstations at a yearly maintenance cost that exceeded the retail value of the machines themselves!!
A few of us linux-fans used to grumble that we'd be better off buying dozens of cheap linux-boxes, but we weren't making the buying decisions. It seemed to us that the higher-ups didn't think cheap boxes with a free OS could compete on a performance basis with the Suns.
As for me? I just installed CERNlib on my laptop and laughed as it blew the Suns away on a price/performance (+portability) basis.
Just because you don't need it, or can't envision needing it, doesn't mean nobody else needs that kind of power.
Bob
fsck -u
So yeah, I basically designed my own system for a professor in the Political Science Dept at my university, Washington University in St. Louis, that boots entirely over the network and is completely diskless on every node. That was about a year before Knoppix ever started doing that. I did it with openMosix, and it's fully LAM/MPI functional. Bruce of the openMosix list was on me for quite a while to get the docs done, but some really not cool domestic issues came up and I never got them done. If anyone is really interested, send an email to drtdiggers_DONT_SPAM_ME_BASTARDS_@_SUCKYMICROSOFT_ hotmail.com and let me know, I'll finish them up.
(Disclaimer: IANAPP (Particle Physicist))
Hey, *somebody* has to back up the Internet from time to time!
:)
:)
...or to find Cheney.
Either that, or all the pr0n encoding.
Best...Tivo...*ever*!
Host this thing at an Internap location, and you're the Ultimate LPB.
Searching for "First Posters" for the Homeland Security people to "visit."
SETI client!
"Every room has every movie ever made in any language." Who do you think hosts *that*?
ILM, seeing the second LOTR movie, decides an 'upgrade' is in order for the SW:EP3 render farm.
It takes this much computing power to find WMD in Iraq.
MS compiling Longhorn builds.
Calculating the question for the answer 42.
BitTorrent!
I've been looking at ClusterKnoppix mentioned recently on slashdot. It has built in openmosix and also supports thin clients via a terminal service. Just pop it in, and instant cluster. In case you missed the article:
ClusterKnoppix
Where I work, we are developing a clustering system using single system images, where the whole OS is stored on a server and is NFS-mounted by each node. Our current tests show that we can easily run 100 nodes on 100 Mbit Ethernet from a single server... And the coolest thing is that the nodes mount the / of the server, so for "small clusters" (under 100 nodes), we have to do a software upgrade only once and all nodes and the server are upgraded... Btw, this whole thing can be done using an almost unmodified Gentoo Linux distribution.
I'm hoping to convince my boss to let us publish detailed docs. He thinks that if we do, everyone will be able to use it and he will lose sales (we are in the hardware business). Details at our homepage and about an older version (but with more details) at the place where we used to work.
well, i recently interviewed at nvidia, and they have a 3,000+ cluster just for emulating the new graphics/io chips they're working on... they don't manufacture anything, the turn around time to manufacture a prototype for testing would take too long... so all they do is simulate the actual chips and then send the data off for fabrication once they're done. on a cluster of 3,000 machines, some jobs take all weekend, from what i understand.
imagine if they just used one machine.
This reminds me of a paper that was just presented at USENIX:
Fast, Scalable Disk Imaging with Frisbee. Fun talk.
Pretty cool tricks - they use multicast and filesystem specific compression techniques to parallel load the disks on a subset of the disks in the cluster. Very very very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed).
Anyway, just a bit of related cool stuff.
First, as another poster pointed out, these detectors produce a LOT of data. I'm on an experiment slated to take data at about the same time as the LHC experiments, with similar rate requirements.
We plan to use a 2500 node cluster (of year 2007 CPUs) to filter our data in real time. The input rate into this cluster will be about 10 GB/s, output rate about 200 MB/s.
But, each interaction is analyzed (usually) by just one computer. There are so many interactions, though, that you need massive clusters, but not much communication between nodes of the cluster.
That's just for the data filter. You need even larger amounts of computing to analyze what comes out in that 200 MB/s and to simulate what happens in the experiment. Much larger amounts.
Our experiment will ultimately require clusters this size at the laboratory and at something like a dozen other institutions.
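To put the filter numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and an even spread of the input across all nodes; neither assumption is stated in the post, so treat the results as order-of-magnitude only.

```python
# Back-of-the-envelope numbers for the real-time data filter described above.
# Assumptions (not from the post): 1 GB = 1000 MB, input spread evenly over nodes.

NODES = 2500
INPUT_MB_PER_S = 10 * 1000   # about 10 GB/s into the cluster
OUTPUT_MB_PER_S = 200        # about 200 MB/s out of the cluster

per_node_input = INPUT_MB_PER_S / NODES                  # MB/s each node must absorb
reduction_factor = INPUT_MB_PER_S / OUTPUT_MB_PER_S      # how much the filter discards
output_tb_per_day = OUTPUT_MB_PER_S * 86_400 / 1_000_000 # filtered data per day, in TB

print(f"per-node input:  {per_node_input:.1f} MB/s")
print(f"data reduction:  {reduction_factor:.0f}x")
print(f"filtered output: {output_tb_per_day:.2f} TB/day")
```

The per-node rate is modest, which fits the point that each interaction is handled by one machine and the nodes barely need to talk to each other.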
Who in their right mind would have a cluster this size, for this sort of work, on any network where "securely installing over the network" is an issue? I mean, I'd want this as far off of a public network as possible, unless I really want to explain to whoever authorized my grant why my experimental data indicates that:
e = mc^31337
my sig's at the bottom of the page.
...run Windows?
Analogies don't equal equalities, they are merely somewhat analogous.
I'm surprised that nobody has mentioned SystemImager. If you haven't looked at it for maintaining large numbers of Linux boxes, scamper off and take a look now. It is worth your time.
Now, that being said, I recently had the opportunity to evaluate using a number of OpenBSD boxes, but I couldn't find a utility for maintaining a bunch of the boxes in the same manner as SystemImager (i.e. Incrementally update servers from a golden master via rsync).
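For anyone who has not seen the golden-master idea, a minimal sketch of the general technique follows. This is plain rsync driven from Python, not SystemImager's own commands; the host names, the image path and the exclude list are hypothetical, and `--dry-run` is left on so that printing or even running the commands changes nothing.

```python
import shlex

def rsync_update_commands(golden_dir, hosts, excludes=("/proc/", "/sys/", "/tmp/")):
    """Build one rsync command per host that syncs it against a golden image.

    Only builds the argument lists; nothing is executed here.
    """
    commands = []
    for host in hosts:
        cmd = ["rsync", "-a", "--delete", "--dry-run"]
        cmd += [f"--exclude={pattern}" for pattern in excludes]
        # Trailing slash on the source means "copy the contents of the directory".
        cmd += [golden_dir.rstrip("/") + "/", f"root@{host}:/"]
        commands.append(cmd)
    return commands

# Hypothetical image path and node names, for illustration only.
commands = rsync_update_commands("/images/golden", ["node01", "node02"])
for cmd in commands:
    print(shlex.join(cmd))
```

Because rsync only transfers files that differ, repeated runs against the same image are incremental, which is the property being asked about here.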
So, has anyone found anything that does what SystemImager does, but that is cross-platform? Do any SystemImager developers out there want to comment on the potential difficulty in supporting other-than-Linux operating systems in SystemImager?
SystemImager is one of the most useful tools I've ever seen. However, I believe that it would be an enterprise "killer app" if it could do MacOS X, *BSD, Windows etc.
-Peter
. Penguins Surely Ca
One application that benefits from adding nodes (with almost linear scaling in performance) is Monte Carlo radiation transport. For example, in medical physics people try to calculate a dose distribution in a human body for various configurations of treatment accelerators. Monte Carlo simulation software "generates" random initial particles (with appropriate probabilities for the given accelerator) and then tracks each particle as it propagates and interacts with surrounding tissue. Interactions are randomly generated (hence: Monte Carlo), but again the randomness is biased according to the appropriate physics. Each such "history" can be generated independently by a different node, thus making parallelization trivial.
In my lab I have assembled a 24-node cluster, and it takes about 4-8 hr to calculate dose distributions for most cases. With a 1000-node cluster it would be possible to do this sort of calculation routinely in clinics during treatment planning and actual treatment. This would mean that cancer patients have improved survival odds due to more precise targeting of the tumors.
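A toy sketch of why this parallelizes so easily is below. It is not real radiation transport: each "history" is a single exponential free-path draw through a slab, and the slab depth, mean free path, seeds and function names are all made up for illustration. Each simulated node gets its own random stream and the tallies are only summed at the end. The runtime estimate assumes perfectly linear scaling, which is slightly optimistic given that the scaling is only "almost" linear.

```python
import random

def run_histories(seed, n_histories, mean_free_path=1.0, slab_depth=2.0):
    """Toy stand-in for one node's work: count particles that cross a slab.

    Hypothetical one-interaction model: a particle is transmitted if its
    sampled free path is longer than the slab depth.
    """
    rng = random.Random(seed)  # independent random stream per node
    transmitted = 0
    for _ in range(n_histories):
        if rng.expovariate(1.0 / mean_free_path) > slab_depth:
            transmitted += 1
    return transmitted

def simulate(n_nodes, histories_per_node):
    # No communication between "nodes": results are only combined at the end.
    tallies = [run_histories(seed, histories_per_node) for seed in range(n_nodes)]
    return sum(tallies) / (n_nodes * histories_per_node)

def ideal_runtime_hours(hours_on_small, small_nodes, large_nodes):
    # Assumes perfectly linear scaling with node count.
    return hours_on_small * small_nodes / large_nodes

fraction = simulate(n_nodes=24, histories_per_node=2000)
print(f"transmitted fraction: {fraction:.3f} (analytic exp(-2) is about 0.135)")
for hours in (4, 8):
    minutes = ideal_runtime_hours(hours, 24, 1000) * 60
    print(f"{hours} h on 24 nodes -> about {minutes:.1f} min on 1000 nodes")
```

Under that idealized scaling, the 4-8 hr jobs would drop to roughly 6-12 minutes on 1000 nodes, which is what would make in-clinic use plausible.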
Cheers,
Beowulf's root