Maintaining Large Linux Clusters
pompousjerk writes "A paper landed on arXiv.org on Friday titled Installing, Running and Maintaining Large Linux Clusters at CERN [PDF]. The paper discusses the management of the 1000+ Linux nodes, upgrading from Red Hat 6.1 to 7.3, securely installing over the network, and more. They're doing this in preparation for Large Hadron Collider-class computation."
There was a hint of lightning and a slight rumble of thunder in the distance. I turned from the horizon to face Robert, who was standing to my right. We stood before the luminous Wal-Mart banner. A noisy cluster of insects swarmed the fluorescent light nearby, their innumerable buzzing and clicks penetrating the silence.
"There is a storm approaching," I whispered to Robert. "We should prepare the store for an emergency situation."
He was apparently capable of sensing the urgency of my recommendation. "Yes, sir," said Robert, "I'll get right on it."
As he returned to the inner confines of our beloved store, the silence that I had grown accustomed to abruptly ceased. The forest surrendered to the wind. Venerable pine trees swayed in the distance as their gray leaves began to litter the parking lot. Several cracked and fell swiftly to the ground. The lightning crackled overhead, and the air was laden with foreboding.
I stepped inside. Although the prodigious wind had been somewhat deadened by the building, I continued to perceive it. Robert had enabled the auxiliary lighting system. He stood near a corner, watching small hints of rain as they impacted the skylight. An occasional flicker of lightning penetrated the store. Tree limbs struck the roof.
"It's getting bad out there, isn't it?" Robert was concerned.
"I'm afraid so," I said, "but I promise you that we're going to vanquish. We always do."
The lights dimmed for a moment and I was taken aback by our reflection in the skylight.
"Peter," said my colleague, "we didn't bring the fertilizer display inside!"
Robert was correct. Although I was admittedly hesitant to walk outside during a hailstorm, we couldn't permit it to obliterate company property. "You stay here," I said to him. As I approached the front door, I almost felt as though I were being watched. When I looked to the left, especially.
The automatic door opened and I stepped outside. I waded cautiously through the lake of water that covered the parking lot toward the display. The fluorescent lights overhead dimmed, then everything became dark. My face grimaced involuntarily as large hailstones continued to strike me. I glanced in both directions, but, with the exception of the devastating storm, could perceive nothing.
I turned to face the display, but I am unable to describe to you what lay before me. And it was moving closer.
Comment without sacrificing karma.
Fuck this website. FP, nigs.
My book on maintaining a cluster of 0-1 nodes will be out next month.
made you look! already grabbed the pdf!
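/* getopt(3) as posted, re-indented and given ANSI C prototypes. */
#include <stdio.h>   /* EOF, NULL */
#include <string.h>  /* strcmp, strchr, strlen */
#include <unistd.h>  /* write */

/* Print "<argv[0]><s><c>\n" on stderr unless the caller cleared opterr. */
#define ERR(s, c) do { if (opterr) {\
        char errbuf[2];\
        errbuf[0] = (char)(c); errbuf[1] = '\n';\
        (void) write(2, argv[0], strlen(argv[0]));\
        (void) write(2, s, strlen(s));\
        (void) write(2, errbuf, 2);\
    } } while (0)

int opterr = 1;   /* non-zero: report errors on stderr */
int optind = 1;   /* index of the next argv element to scan */
int optopt;       /* last option character examined */
char *optarg;     /* argument of the current option, if any */

int
getopt(int argc, char * const argv[], const char *opts)
{
    static int sp = 1;   /* position inside the current argv element */
    int c;
    const char *cp;

    if (sp == 1) {
        /* Stop at the first non-option argument or a lone "-". */
        if (optind >= argc ||
            argv[optind][0] != '-' || argv[optind][1] == '\0')
            return (EOF);
        else if (strcmp(argv[optind], "--") == 0) {
            /* "--" ends option processing and is itself skipped. */
            optind++;
            return (EOF);
        }
    }
    optopt = c = argv[optind][sp];
    if (c == ':' || (cp = strchr(opts, c)) == NULL) {
        ERR(": illegal option -- ", c);
        if (argv[optind][++sp] == '\0') {
            optind++;
            sp = 1;
        }
        return ('?');
    }
    if (*++cp == ':') {
        /* Option takes an argument: rest of this element, or the next one. */
        if (argv[optind][sp + 1] != '\0')
            optarg = &argv[optind++][sp + 1];
        else if (++optind >= argc) {
            ERR(": option requires an argument -- ", c);
            sp = 1;
            return ('?');
        } else
            optarg = argv[optind++];
        sp = 1;
    } else {
        /* Flag without argument: advance within a cluster such as "-abc". */
        if (argv[optind][++sp] == '\0') {
            sp = 1;
            optind++;
        }
        optarg = NULL;
    }
    return (c);
}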
maybe better?
fp fp first post
"Can you imagine a beowulf cluster of these?" .... RTFA.
..... ditto.
"But does it run Linux?"
"try doing it with a windows cluster"
john
are you a Weapon of Male Destruction? then you need one of our sassy t-shirts
All I Want For Christmas Is My Constitutional Rights
It's time for the Microsoft Conspiracy theories to start. It's also time for shit you people think is funny, like constantly spelling "Microsoft" as "M$" hahaha that's so funny.
This has been a Microsoft Conspiracy Update.
Why on earth would someone need a 1000+ node cluster?
NASA was able to launch a rocket using a low-cost 96-node Red Hat Linux cluster; what else needs this much computing power?
I imagine with the power requirements, and the fans' exhaust from 1000+ computers clustered together you might just be able to build a hovering super computer.
and yes, it does run linux
-----------
Error 407 - No creative sig found
This site is not running on a cluster with their configuration; it's been slashdotted already....
The lunatic is in my head
Damn. Back when I was on a high-energy experiment located in the middle of nowhere in Japan (subject of at least two Slashdot articles), our Japanese colleagues used to lease gaggles of Sun workstations at a yearly maintenance cost that exceeded the retail value of the machines themselves!!
A few of us linux-fans used to grumble that we'd be better off buying dozens of cheap linux-boxes, but we weren't making the buying decisions. It seemed to us that the higher-ups didn't think cheap boxes with a free OS could compete on a performance basis with the Suns.
As for me? I just installed CERNlib on my laptop and laughed as it blew the Suns away on a price/performance (+portability) basis.
So yeah, I basically designed my own system for a professor in the Political Science Dept at my university, Washington University in St. Louis, that boots entirely over the network and is completely diskless on every node. That was about a year before Knoppix ever started doing that. Did it with openMosix, and it's fully LAM/MPI functional. Bruce of the openMosix list was on me for quite a while to get the docs done, but some really not cool domestic issues came up and I never got them done. If anyone is really interested, send an email to drtdiggers_DONT_SPAM_ME_BASTARDS_@_SUCKYMICROSOFT_ hotmail.com and let me know, I'll finish them up.
1. INTRODUCTION
The LHC era is getting closer, and with it the challenge
of installing, running and maintaining thousands of
computers in the CERN Computer Centre.
In preparation, we have streamlined our facilities by
decommissioning most of the RISC hardware, and by
merging the dedicated and slightly different experiment
Linux clusters into two general purpose ones (one
interactive, one batch), as reported at the last CHEP[2].
Quite some progress has been made since then in the
automation and management of clusters. The EU DataGrid
Project (EDG), and in particular the WP4 subtask[3], has
entered its third and final year and we can already benefit
from the software for farm management being delivered
by them. See [4] for further details. In addition, the LHC
Computing Grid project (LCG)[5] has been launched at
CERN to build a practical Grid to address the computing
needs of the LHC experiments, and to build up the
combined LHC Tier 0/Tier 1 center at CERN.
In preparing for the LHC, we are already managing
more than 1000 Linux nodes of diverse hardware types,
the differences arising due to the iterative acquisition
cycles. In dealing with this high number of nodes, and
especially when upgrading from one release version of
Linux to another, we have reached the limits of our old
tools for installation and maintenance. Development of
these tools started more than ten years ago with an initial
focus on unifying the environment presented to both users
and administrators across small scale RISC workstation
clusters from different vendors, each of which used a
different flavour of Unix[6]. These tools have now been
replaced by new tools, taken either from Linux itself, like
the installation tool Kickstart from RedHat Linux or the
RPM package format, or rewritten using the perspective of
the EDG and LCG, to address large scale farms using just
one operating system: Linux.
This paper will describe these tools in more detail and
their contribution to the progress in improving the
installation and manageability of our clusters. In addition,
we will describe improvements in the batch sharing and
scheduling we have made through configuration of our
batch scheduler, LSF from Platform Computing[7].
2. CURRENT STATE
In May last year, the Linux support Team at CERN
certified RedHat Linux 7. This certification involved the
porting of experiment, commercial and administration
software to the new version and verifying their correct
operation. After the certification, we set up test clusters for
interactive and batch computing with this new OS. This
certification process took quite some considerable time,
both for the users and the experiments to prepare for
migration, which had to fit into their data challenges, and
for us to provide a fully tailored RedHat 7.3 environment
as the default in January this year. We took advantage of
this extended migration period to completely rewrite our
installation tools. As mentioned earlier, we have taken this
opportunity to migrate, wherever possible, to the use of
standard Linux tools, like the kickstart installation
mechanism from RedHat and the package manager RPM,
together with its package format, and to the tools that
were, and still are, being developed by the EDG project, in
particular by the WP4 subtask.
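For readers who have not seen Kickstart, here is a minimal sketch of how per-node files might be rendered from a single template. It is Python purely for illustration. The server name, export path, and partition sizes are hypothetical, the fragment leaves out directives a real file needs (root password, timezone, bootloader), and none of it is taken from the CERN setup described in the paper.

```python
from string import Template

# Hypothetical Kickstart fragment: only the installation source and the
# partition table are parameterised. A real file needs more directives.
KICKSTART_TEMPLATE = Template("""\
install
nfs --server=$install_server --dir=$install_dir
lang en_US
keyboard us
network --bootproto dhcp
clearpart --all
part / --fstype ext3 --size=$root_mb
part swap --size=$swap_mb
%packages
@ Base
""")


def render_kickstart(install_server: str, install_dir: str,
                     root_mb: int, swap_mb: int) -> str:
    """Fill in the template for one class of node and return the file text."""
    return KICKSTART_TEMPLATE.substitute(
        install_server=install_server,
        install_dir=install_dir,
        root_mb=root_mb,
        swap_mb=swap_mb,
    )


if __name__ == "__main__":
    # Example values only; none of these come from the paper.
    print(render_kickstart("install.example.org", "/export/redhat73", 4096, 1024))
```

Keeping one template and a small table of per-hardware parameters is one way to cope with the "diverse hardware types" the paper mentions, but that is an assumption about how one could do it, not a description of what CERN did.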
The EDG/WP4 tools for managing computing fabrics
can be divided into four parts: Installation, Configuration,
Monitoring, and Fault Tolerance. In trying to take over
these ideas and tools, we first had to review our whole
infrastructure with this in mind.
2.1. Installation
The installation procedure is divided into two main
parts. The basic installation is done with the kickstart
mechanism from RedHat. This mechanism allows
specification of the main parameters like the partition table
CHEP03, La Jolla California, March 24
Installing, Running and Maintaining Large Linux Clusters at CERN
Vladimir Bahyl, Benjamin Chardi, Jan van Eldik, Ulrich Fuchs, Thorsten Kleinwort, Martin Murth, Tim Smith
CERN, European Laboratory for Particle Physics, Geneva, Switzerland
Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS
upgrades. This paper describes the tools and processes we have implemented, working in close collaboration with the EDG project [1], especially with the WP4 subtask, to improve the manageability of our clusters, in particular in the areas of system
installation, configuration, and monitoring.
In addition to the purely technical issues, providing shared interactive and batch services which can adapt to meet the diverse and changing requirements of our users is a significant challenge. We describe the developments and tuning that we have
introduced on our LSF based systems to maximise both responsiveness to users and overall system utilisation.
Finally, this paper will describe the problems we are facing in enlarging our heterogeneous Linux clusters, the progress we have made in dealing with the current issues and the steps we are taking to 'gridify' the clusters.
Bush is on fire and its not good for my lungs.
What about Linux fecal clusters? Never seen those...
NT
Comment without sacrificing karma.
Just a little too late for the SETI@home project. Kind of a shame, really. If only we had those computers sooner...
Imagine a beowulf cluster of those!
The global economy is a great thing until you feel it locally.
I've been looking at ClusterKnoppix mentioned recently on slashdot. It has built in openmosix and also supports thin clients via a terminal service. Just pop it in, and instant cluster. In case you missed the article:
ClusterKnoppix
Where I work, we are developing a clustering system using single system images, where the whole OS is stored on a server and is NFS-mounted by each node. Our current tests show that we can easily run 100 nodes on 100 Mbit Ethernet from a single server. And the coolest thing is that the nodes mount the / of the server, so for "small clusters" (under 100 nodes), we have to do a software upgrade only once and all nodes and the server are upgraded. Btw, this whole thing can be done using an almost unmodified Gentoo Linux distribution.
I'm hoping to convince my boss to let us publish detailed docs. He thinks that if we do, everyone will be able to use it and he will lose sales (we are in the hardware business). Details at our homepage and about an older version (but with more details) at the place where we used to work.
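To put the 100-nodes-on-100-Mbit figure in perspective, here is a back-of-the-envelope sketch. It assumes the server has a single 100 Mbit link shared by all nodes and ignores protocol overhead and client-side caching; the 50 MB read per node at boot is a made-up number, not something measured on the system described above.

```python
def per_node_bandwidth_mbit(server_link_mbit: float, nodes: int) -> float:
    """Worst-case share of the server link if every node reads at once."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return server_link_mbit / nodes


def boot_storm_seconds(read_mb: float, server_link_mbit: float, nodes: int) -> float:
    """Time for every node to pull read_mb megabytes through one server link."""
    total_megabits = read_mb * 8 * nodes
    return total_megabits / server_link_mbit


if __name__ == "__main__":
    print(per_node_bandwidth_mbit(100, 100))   # share per node, in Mbit/s
    print(boot_storm_seconds(50, 100, 100))    # seconds if all nodes boot together
```

In steady state the nodes mostly hit their own page cache, so the shared link mainly hurts when everything boots or upgrades at the same moment.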
well, i recently interviewed at nvidia, and they have a 3,000+ cluster just for emulating the new graphics/io chips they're working on... they don't manufacture anything, the turn around time to manufacture a prototype for testing would take too long... so all they do is simulate the actual chips and then send the data off for fabrication once they're done. on a cluster of 3,000 machines, some jobs take all weekend, from what i understand.
imagine if they just used one machine.
This reminds me of a paper that was just presented at USENIX:
Fast, Scalable Disk Imaging with Frisbee. Fun talk.
Pretty cool tricks - they use multicast and filesystem specific compression techniques to parallel load the disks on a subset of the disks in the cluster. Very very very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed).
Anyway, just a bit of related cool stuff.
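One plausible reading of "filesystem specific compression" is that an imager which understands the filesystem can skip blocks that are not allocated. The sketch below shows only that idea: it turns an allocation bitmap into the list of block ranges worth sending. The bitmap format and function name are hypothetical and this is not Frisbee's actual code or image format.

```python
from typing import Iterable, List, Tuple


def allocated_ranges(bitmap: Iterable[bool]) -> List[Tuple[int, int]]:
    """Collapse a per-block 'in use' bitmap into (start_block, length) runs."""
    ranges: List[Tuple[int, int]] = []
    start = None   # first block of the run currently being built, if any
    index = -1     # stays -1 when the bitmap is empty
    for index, used in enumerate(bitmap):
        if used and start is None:
            start = index
        elif not used and start is not None:
            ranges.append((start, index - start))
            start = None
    if start is not None:
        # The bitmap ended in the middle of an allocated run.
        ranges.append((start, index + 1 - start))
    return ranges


if __name__ == "__main__":
    # Blocks 0-1 and 4 are in use; blocks 2-3 are free and can be skipped.
    print(allocated_ranges([True, True, False, False, True]))
```

On a mostly empty disk this shrinks the image before any general-purpose compression is applied, and the multicast side then only has to move the runs that matter.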
RH 7.3 reaches its end of life in December of this year. One can only assume (and hope) that they have the in-house people to support it, or it's going to cost them beaucoup $$ for continued RHN support.
Oh Fuck! The stupid Microsoft white-trash trolls have awoken to post their filth on this site.
But then again... Do we give a fuck?
They use Microsoft after all, that's punishment enough!
Large linux clusters maintain YOU!
I nearly passed this article over as I scanned the words: paper, linux, clusters.. large hardon collider. w00t!
Wait, no hadron. Damn.
Large Hardon Collider?
Just curious...
Gotta love Slashdot... the only place where such a disclaimer isn't taken for granted.
kind of misread the title... oops
What the hell are they studying???
So, to all those who are in the know out there... when they have what they want, how many nodes and individual machines could they maintain? What are the constraints? What about data back-ups? Is ephemeral data recorded on a few machines in separate nodes to make sure that one getting knocked out doesn't zap something for good?
Who in their right mind would have a cluster this size, for this sort of work, on any network where "securely installing over the network" is an issue? I mean, I'd want this as far off of a public network as possible, unless I really want to explain to whoever authorized my grant why my experimental data indicates that:
e = mc^31337
my sig's at the bottom of the page.
Why do we bother with this story?
It's not like we haven't got similar, or even larger clusters, right here in the US. Why are Slashdot such french lovers?
Give us a story about the NSA cluster or something like that. Not some lame french wannabe research institution using an american os (yeah, redhat is american).
How small a thought it takes to fill a whole life
The aroma of premium Sam's Choice coffee infiltrated my nose as I sipped a cup delicately. Robert would be arriving momentarily, and I would request a situation report. He and I are, of course, the guardians of a Walmart Supercenter located inside of the prosperous community of Jasper, Texas.
We've encountered Paul Cryer, who initiated litigation after his SUV impacted my elite patrol vehicle, and miscellaneous other members of the nefarious Three Pointed Conspiracy. Our integrity, however, is the origin of our strength. We will never abdicate our store. I digress.
"Peter!" exclaimed Robert, emerging from the door leading to the lawn and garden center.
"Robert," I smiled, "you're fifty seconds late. I will not accept such unpunctual behavior. What if a situation occurred in your absence?"
My protégé said nothing, instead walking to the vending counter and drawing a cup of coffee. I wouldn't punish him this evening. "How was your day?" I asked.
"It was fine," said Robert, "but you need to listen to this. There was a situation today at the restaurant!" You see, Robert is also employed by the prestigious Catfish Diner restaurant, which is regarded by many as the most elegant dining establishment in Jasper! Their all-you-can-eat seafood buffet, although expensive, is quite exquisite. I recommend it. Once again, however, I digress.
I turned to Robert. "What was the situation?"
"Well, this man came in," he said, "a man from Houston. It was strange, though, as the emblem on his car wasn't..." Robert paused. "I mean, he didn't seem to be a member of the Three Pointed Conspiracy. But, as we both know from experience, such terrible people are unable to conceal their true nature!"
"What happened?" I inquired, attempting to suppress my concern. Robert's safety was of an imperative nature.
"Well, it didn't become physical," he said, "so I didn't need to thrash him. People were obviously intimidated, though. I walked to his table.
"'What would you like this evening, sir?' I asked.
"'Look, man, is there any way you can get me some Starbucks coffee?'
"'Starbucks coffee? I don't believe we serve that, sir. Would you like some Community Coffee instead?'
"'I don't drink anything that gives me running diarrhea!' he said, raising his voice as if he were insulted. 'I guess I'll have some Coke. Anything to keep me awake so that I can get out of this anus of a city tonight.'
"I must admit that he was beginning to offend me. However, like an experienced waiter, I suppressed my anger. 'Yes, sir,' I said, and returned with his drink.
"'Look, man, I'm sure that all of your food sucks anyway, and you only serve a buffet. It's probably something that I wouldn't even feed my dogs, but I'm desperate. So, I'll have that.'
"'Yes, sir,' I responded, 'but our food is excellent. In fact, it is superlative. Our seafood is delivered fresh once every week!'
"'That's great,' he said, laughing, 'but tell your master chef not to screw me over too badly, a'ight?'
"I must concede that I ignored the man's request. I returned with a plate and some silverware, which he snatched promptly from my hands. Fifteen minutes later, I returned with his tab. 'Here, sir,' I said, 'thank you for dining with us this evening.'
"'My displeasure,' said the man, handing me a car key, 'this food was terrible, by the way. If this happened at any other restaurant, I would probably shoot the chef. My Lexus is parked outside. Bring it to the front door, please, and don't forget: there are two towels inside of the glove box.
...run Windows?
Analogies don't equal equalities, they are merely somewhat analogous.
Running rpm --rebuilddb must be a real drag.
If you want to scale more, and your nodes have tons of ram, you could likely stuff the whole os into ramdisk and then use the local disk for the scratch space. Once booted, the network impact of nfs goes away.
Of course, you could use System installer Suite (http://www.sisuite.org/) which is *similar* to the rsync method mentioned by the other poster, but you get to skip the redhat install step in favor of SiS's tools.
XML is like violence. If it doesn't solve the problem, use more.
I LOVE LINUX!!!!!!!!!!!!
I'm surprised that nobody has mentioned SystemImager. If you haven't looked at it for maintaining large numbers of Linux boxes, scamper off and take a look now. It is worth your time.
Now, that being said, I recently had the opportunity to evaluate using a number of OpenBSD boxes, but I couldn't find a utility for maintaining a bunch of the boxes in the same manner as SystemImager (i.e. Incrementally update servers from a golden master via rsync).
So, has anyone found anything that does what SystemImager does, but that is cross-platform? Do any SystemImager developers out there want to comment on the potential difficulty in supporting other-than-Linux operating systems in SystemImager?
SystemImager is one of the most useful tools I've ever seen, however, I believe that it would be an enterprise "killer app" if it could do MacOS X, *BSD, Windows etc.
-Peter
. Penguins Surely Ca
Anyone else read it as "Large Hardon Collider"? I blew coffee threw my nose. Damn disexlia...
YOU ARE THE FAILURE!! You fucking silly ass piece of bitchtit shit....no fp for you, you fucking illiterate sack of knappy bitchtits!!!!
Fuck your mother, you fucking little bitch!!!
Lick my ass and like it, fucker.
How applicable is this to FreeBSD? Now that linux is under this legal cloud of doom, I'm switching all my clusters over to FreeBSD.
Imagine a beow... oh wait...
From my experience, NFS is by far the worst choice in networked filesystems.
Since all the boxen are Linux, I strongly suggest Samba or NCP.
Think of it this way: In the old days we had
- NFS to share files with other old Unices (like SCO, Slowlaris, HPUX, AIX).
- SMB to share files with windowz
- NCP to share files with Novell network fs.
IMHO, NCP is the best. SMB is pretty good too.
Has anyone tried LinuxBIOS (http://www.linuxbios.org/) to replace the standard BIOS? It results in a diskless, faster boot, and is used in this cluster architecture: http://www.clustermatic.org/
i have a 1'000'000 node cluster! ...
...
it crawls around, eats flies and likes light
and sometimes it replicates in my
fruit-loops!
it can accurately predict ( >95% ) the weather
two days ahead!
too bad it doesn't have any interface
that is compatible with me
it's what we scientists call a "passive-cluster"!
There's always an anti-NFS troll out there just waiting to spout "the truth". Get over it. NFS works great, particularly when all you are running is Linux.
Hmm... Not quite there yet. The collection of command line tools could probably be rolled into something that automates system management the way SystemImager does. But even then, radmind rather unintelligently seems to recopy entire files.
Also, how is partitioning taken care of?
No, I'm still looking for something like SystemImager that handles multiple Operating Systems. Perhaps extending SystemImager to support others will be the easiest way.
As a side note, Frisbee, which was mentioned in a previous thread, is the killer app for LAN-based system imaging. Wow!
-Peter
. Penguins Surely Ca
Sorry, not a big SystemImager expert. I see that it just uses rsync, hence your comment about recopying entire files. I'd point out that for binary files, rsync tends to copy the entire file anyway, on a version change. radmind's nice in this case because it can tell that a file needs to be updated with no network traffic.
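To make the "knows what needs updating before copying anything" idea concrete, here is a rough sketch of checksum-manifest comparison. It assumes the golden master's manifest has already been fetched once as a path-to-digest mapping; the function names and manifest layout are made up for illustration and are not radmind's transcript format or SystemImager's mechanism. It only handles regular, readable files.

```python
import hashlib
import os
from typing import Dict, List


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def files_to_update(master: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Paths that are missing locally or whose digest differs from the master."""
    return sorted(path for path, digest in master.items()
                  if local.get(path) != digest)


def files_to_remove(master: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Paths present locally that the master does not have."""
    return sorted(path for path in local if path not in master)
```

Everything after the one manifest download is local work, so only the files named by `files_to_update` ever cross the network.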
:w
how is partitioning taken care of
Depends on the system. For Mac OS X, we pretty much need to use Apple's tools. For Solaris, we use Jumpstart. Kickstart on Linux. Partitioning is very OS specific. radmind is very portable.