Maintaining Large Linux Clusters

Episode 11.1: Blindsided by Walmart+Security · 2003-06-14 09:02 · Score: -1

There was a hint of lightning and a slight rumble of thunder in the distance. I turned from the horizon to face Robert, who was standing to my right. We stood before the luminous Wal-Mart banner. A noisy cluster of insects swarmed the fluorescent light nearby, their innumerable buzzing and clicks penetrating the silence.

"There is a storm approaching," I whispered to Robert. "We should prepare the store for an emergency situation."

He was apparently capable of sensing the urgency of my recommendation. "Yes, sir," said Robert, "I'll get right on it."

As he returned to the inner confines of our beloved store, the silence that I had grown accustomed to abruptly ceased. The forest surrendered to the wind. Venerable pine trees swayed in the distance as their gray leaves began to litter the parking lot. Several cracked and fell swiftly to the ground. The lightning crackled overhead, and the air was laden with foreboding.

I stepped inside. Although the prodigious wind had been somewhat deadened by the building, I continued to perceive it. Robert had enabled the auxiliary lighting system. He stood near a corner, watching small hints of rain as they impacted the skylight. An occasional flicker of lightning penetrated the store. Tree limbs struck the roof.

"It's getting bad out there, isn't it?" Robert was concerned.

"I'm afraid so," I said, "but I promise you that we're going to vanquish. We always do."

The lights dimmed for a moment and I was taken aback by our reflection in the skylight.

"Peter," said my colleague, "we didn't bring the fertilizer display inside!"

Robert was correct. Although I was admittedly hesitant to walk outside during a hailstorm, we couldn't permit it to obliterate company property. "You stay here," I said to him. As I approached the front door, I almost felt as though I were being watched. When I looked to the left, especially.

The automatic door opened and I stepped outside. I waded cautiously through the lake of water that covered the parking lot toward the display. The fluorescent lights overhead dimmed, then everything became dark. My face grimaced involuntarily as large hailstones continued to strike me. I glanced in both directions, but, with the exception of the devastating storm, could perceive nothing.

I turned to face the display, but I am unable to describe to you what lay before me. And it was moving closer.

--

Comment without sacrificing karma.

Re:Episode 11.1: Blindsided by Anonymous Coward · 2003-06-14 09:20 · Score: -1, Offtopic

I give you a D+. I've seen better writing from middle schoolers.
Re:Episode 11.1: Blindsided by JrTcoNrd · 2003-06-14 09:40 · Score: 0

I write better stuff than that... and I am a middle schooler, going into high school. Work on, I dunno, a plot?

--
Do you ever find yourself humming the MacGuyver theme song? Then you my friend, are a true nerd.
Re:Episode 11.1: Blindsided by serrasalmus · 2003-06-14 10:00 · Score: 1

Have they not taught you the meaning of the word "satire" in middle school yet? That is unfortunate, as I was taught what "satire" was at the tender age of five. Allow me to initiate your uninitiated minds:

Main Entry: sat·ire
Pronunciation: 'sa-"tIr
Function: noun
Etymology: Middle French or Latin; Middle French, from Latin satura, satira, perhaps from (lanx) satura dish of mixed ingredients, from feminine of satur well-fed; akin to Latin satis enough -- more at SAD
Date: 1501
1 : a literary work holding up human vices and follies to ridicule or scorn
2 : trenchant wit, irony, or sarcasm used to expose and discredit vice or folly
Re:Episode 11.1: Blindsided by JrTcoNrd · 2003-06-14 10:37 · Score: 0

well, whatever that was supposed to be... i got it, but it lacked effectiveness. Sorry, but when you post a rebuttal, make sure that it's partially coherent, at least!

--
Do you ever find yourself humming the MacGuyver theme song? Then you my friend, are a true nerd.
Re:Episode 11.1: Blindsided by serrasalmus · 2003-06-14 10:45 · Score: 1

Apparently you didn't, because you didn't grasp any of my points whatsoever. A dense mind will never pick up on any cues...... no matter what..... and you sir, appear to be quite dense.
Re:Episode 11.1: Blindsided by JrTcoNrd · 2003-06-14 11:54 · Score: 0

I pikced up on it, buts being a snob doesn't make you classy. Der!

--
Do you ever find yourself humming the MacGuyver theme song? Then you my friend, are a true nerd.
Re:Episode 11.1: Blindsided by Uber+Banker · 2003-11-09 08:52 · Score: 1

"I picked up on it, buts being a snob doesn't make you classy. Der!"

Hmmmmm

I am a cluster of one by Java+Geeeek · 2003-06-14 09:06 · Score: 3, Funny

My book on maintaining a cluster of 0-1 nodes will be out next month.

Re:I am a cluster of one by JDWTopGuy · 2003-06-14 12:28 · Score: 1

I am going to sue you with a beowulf cluster of lawyers because you stole my book idea!

--
Ron Paul 2012

mirror here by Anonymous Coward · 2003-06-14 09:06 · Score: -1

made you look! already grabbed the pdf!

The public domain implementation of getopt()!!!!! by Anonymous Coward · 2003-06-14 09:07 · Score: -1, Offtopic

#include <string.h>   /* strlen, strcmp, strchr, NULL */
#include <unistd.h>   /* write */

#define EOF (-1)

/* Write "<argv[0]><s><c>\n" to stderr unless the caller has cleared opterr. */
#define ERR(s, c) if(opterr){\
    char errbuf[2];\
    errbuf[0] = c; errbuf[1] = '\n';\
    (void) write(2, argv[0], strlen(argv[0]));\
    (void) write(2, s, strlen(s));\
    (void) write(2, errbuf, 2);}

int opterr = 1;     /* nonzero: print error messages */
int optind = 1;     /* index of the next argv element to scan */
int optopt;         /* the option character just examined */
char *optarg;       /* argument of the current option, if any */

int
getopt(int argc, char *const argv[], const char *opts)
{
    static int sp = 1;  /* position of the next option letter in argv[optind] */
    int c;
    const char *cp;

    if (sp == 1) {
        if (optind >= argc ||
            argv[optind][0] != '-' || argv[optind][1] == '\0')
            return (EOF);               /* no more options */
        else if (strcmp(argv[optind], "--") == 0) {
            optind++;                   /* skip the "--" terminator */
            return (EOF);
        }
    }
    optopt = c = argv[optind][sp];
    if (c == ':' || (cp = strchr(opts, c)) == NULL) {
        ERR(": illegal option -- ", c);
        if (argv[optind][++sp] == '\0') {
            optind++;
            sp = 1;
        }
        return ('?');
    }
    if (*++cp == ':') {                 /* this option takes an argument */
        if (argv[optind][sp+1] != '\0')
            optarg = &argv[optind++][sp+1];     /* form: -ovalue */
        else if (++optind >= argc) {
            ERR(": option requires an argument -- ", c);
            sp = 1;
            return ('?');
        } else
            optarg = argv[optind++];            /* form: -o value */
        sp = 1;
    } else {
        if (argv[optind][++sp] == '\0') {
            sp = 1;
            optind++;
        }
        optarg = NULL;
    }
    return (c);
}

7th post! by Anonymous Coward · 2003-06-14 09:07 · Score: -1, Offtopic

maybe better?

fp by Anonymous Coward · 2003-06-14 09:07 · Score: -1, Offtopic

fp fp first post

Obligatory Posts... by DarkBlackFox · 2003-06-14 09:07 · Score: -1, Redundant

"Can you imagine a beowulf cluster of these?" .... RTFA.

"But does it run Linux?" ..... ditto.

Re:Obligatory Posts... by nano2nd · 2003-06-14 09:17 · Score: 0, Offtopic

I'm not interested till it can do Ogg Vorbis!!

--
G4 Hackintosh
Re:Obligatory Posts... by Anonymous Coward · 2003-06-14 09:43 · Score: -1, Redundant

I hope someone makes this article available on Bit Torrent.

but... by freedommatters · 2003-06-14 09:09 · Score: 0

bottom line should be...

"try doing it with a windows cluster"

john
are you a Weapon of Male Destruction? then you need one of our sassy t-shirts

--
All I Want For Christmas Is My Constitutional Rights

Microsoft by Anonymous Coward · 2003-06-14 09:10 · Score: -1, Offtopic

It's time for the Microsoft Conspiracy theories to start. It's also time for shit you people think is funny, like constantly spelling "Microsoft" as "M$" hahaha that's so funny.

This has been a Microsoft Conspiracy Update.

"But why?" asked Little Johnny. by agent+dero · 2003-06-14 09:17 · Score: -1, Flamebait

Why on earth would someone need a 1000+ node cluster?
NASA was able to launch a rocket using a low-cost 96-node Red Hat Linux cluster; what else needs this much computing power?
I imagine with the power requirements, and the fans' exhaust from 1000+ computers clustered together you might just be able to build a hovering super computer.

and yes, it does run linux
-----------

--
Error 407 - No creative sig found

Re:"But why?" asked Little Johnny. by toft · 2003-06-14 09:19 · Score: 2, Insightful

"Why on earth would someone need a 1000+ node cluster?"

Look at google? :-)
Re:"But why?" asked Little Johnny. by heli0 · 2003-06-14 09:23 · Score: 4, Funny

Why on earth would someone need a 1000+ node cluster?

Maybe for a Large Hadron Collider-class computation.

--
Whenever the offence inspires less horror than the punishment, the rigour of penal law is obliged to give way...
Re:"But why?" asked Little Johnny. by Bob+Wehadababyitsabo · 2003-06-14 09:28 · Score: 5, Interesting

Where I work, there is a 500 node Linux cluster for cladistic tree generation, which takes a lot of brute force and specialized tools to make happen. It is arguably much more complex than launching a rocket.
Just because you don't need it, or can't envision needing it, doesn't mean nobody else needs that kind of power.
Bob

--
fsck -u
Re:"But why?" asked Little Johnny. by StealthSock · 2003-06-14 09:28 · Score: 1

Maybe if you wanted to watch Finding Nemo rendered in realtime instead of frame by frame? I don't know.
Re:"But why?" asked Little Johnny. by hak+hak · 2003-06-14 09:30 · Score: 4, Informative

Because of the computations required to analyze the enormous amount of data a particle collider outputs. The scattered particles go through all sorts of detectors which measure their energy and direction and send them to the cluster, which has to search for particles significantly smaller than a needle in a haystack of measurements.

(Disclaimer: IANAPP (Particle Physicist))
Re:"But why?" asked Little Johnny. by Tumbleweed · 2003-06-14 09:33 · Score: 3, Funny

Hey, *somebody* has to back up the Internet from time to time!

Either that, or all the pr0n encoding. :)

Best...Tivo...*ever*!

Host this thing at an Internap location, and you're the Ultimate LPB.

Searching for "First Posters" for the Homeland Security people to "visit." :)

SETI client!

"Every room has every movie ever made in any language." Who do you think hosts *that*?

ILM, seeing the second LOTR movie, decides an 'upgrade' is in order for the SW:EP3 render farm.

It takes this much computing power to find WMD in Iraq. ...or to find Cheney.

MS compiling Longhorn builds.

Calculating the question for the answer 42.

BitTorrent!
Re:"But why?" asked Little Johnny. by Anonymous Coward · 2003-06-14 09:37 · Score: 0

Shouldn't that be a Large Hard-on Collider-class computation?
Re:"But why?" asked Little Johnny. by digidave · 2003-06-14 09:45 · Score: 1, Funny

They must be trying to play Doom 3.

--
The global economy is a great thing until you feel it locally.
Re:"But why?" asked Little Johnny. by Anonymous Coward · 2003-06-14 09:46 · Score: 0

[[ Just because you don't need it, or can't envision needing it, doesn't mean nobody else needs that kind of power. ]]

This is exactly why he asked the question, genius. He wanted to know who would need this type of computational power and if it was more cost/performance effective than just buying a supercomputer.

These typical Slashdot dumbshits need to get off their "I'm smarter than you" pedestals and realize that they're no better than anyone else. It would also be nice bonus if they learned how to read.
Re:"But why?" asked Little Johnny. by Anonymous Coward · 2003-06-14 09:49 · Score: 0

The rule of Marc: Whenever commenting on someone else's stupidity, you will always indirectly comment on your own.

Damn you, grammer!
Re:"But why?" asked Little Johnny. by CowBovNeal · 2003-06-14 09:56 · Score: 2, Funny

So that they can survive a slashdotting? ;)

--
Bush is on fire and its not good for my lungs.
Re:"But why?" asked Little Johnny. by Alinabi · 2003-06-14 10:04 · Score: 1

Pretty much everything that has to do with solving evolution equations for complex systems. Even weather forecasts require way more computing power than NASA's 96 node cluster can provide. Rocket science is not at all "rocket science".

--
"You can't allow somebody to commit the crime before you detain them." [Condoleezza Rice]
Re:"But why?" asked Little Johnny. by bpfinn · 2003-06-14 10:12 · Score: 1

Why on earth would someone need a 1000+ node cluster?

The Atlas Project at CERN, when it comes online, is supposed to produce a petabyte of data every year. I doubt one 1000 node cluster would be enough to process that data quickly.
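For a sense of scale, the petabyte-per-year figure can be turned into an average rate. This is only a rough sketch: it assumes decimal units (1 PB = 10**15 bytes) and a 365-day year, neither of which the comment states, and an average says nothing about peak rates.

```python
# Average data rate implied by "a petabyte of data every year".
# Assumptions (not from the comment): 1 PB = 10**15 bytes, 365-day year.
PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(bytes_per_year: int) -> float:
    """Return the sustained rate in MB/s needed to absorb bytes_per_year."""
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


rate = average_rate_mb_per_s(PETABYTE)
print(f"{rate:.1f} MB/s sustained")
```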
Re:"But why?" asked Little Johnny. by vondo · 2003-06-14 10:15 · Score: 5, Interesting

Disclaimer: IAAPP (I am a particle physicist).
First, as another poster pointed out, these detectors produce a LOT of data. I'm on an experiment slated to take data at about the same time as the LHC experiments, with similar rate requirements.
We plan to use a 2500 node cluster (of year 2007 CPUs) to filter our data in real time. The input rate into this cluster will be about 10 GB/s, output rate about 200 MB/s.
But, each interaction is analyzed (usually) by just one computer. There are so many interactions, though, that you need massive clusters, but not much communication between nodes of the cluster.
That's just for the data filter. You need even larger amounts of computing to analyze what comes out in that 200 MB/s and to simulate what happens in the experiment. Much larger amounts.
Our experiment will ultimately require clusters this size at the laboratory and at something like a dozen other institutions.
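The filter numbers above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the comment (2500 nodes, 10 GB/s in, 200 MB/s out); the even split of input across nodes and the decimal units are assumptions made for illustration.

```python
# Figures quoted in the comment; the even split across nodes is an assumption.
NODES = 2500
INPUT_BYTES_PER_S = 10 * 10**9     # 10 GB/s into the filter cluster
OUTPUT_BYTES_PER_S = 200 * 10**6   # 200 MB/s kept after filtering

# Each interaction is handled by a single node, so input divides across nodes.
per_node_input_mb = INPUT_BYTES_PER_S / NODES / 10**6
reduction_factor = INPUT_BYTES_PER_S / OUTPUT_BYTES_PER_S
kept_fraction = OUTPUT_BYTES_PER_S / INPUT_BYTES_PER_S

print(f"per-node input: {per_node_input_mb:.0f} MB/s")
print(f"reduction: {reduction_factor:.0f}x ({kept_fraction:.0%} of bytes kept)")
```

Under those assumptions each node sees about 4 MB/s and the filter keeps roughly one byte in fifty, which fits the point that the work is many independent events rather than heavy traffic between nodes.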
Re:"But why?" asked Little Johnny. by Anonymous Coward · 2003-06-14 10:28 · Score: 0

You might be able to analyze your data on a 0-node cluster if the Tevatron doesn't start working better soon...

-a mildly disgruntled CDF postdoc
Re:"But why?" asked Little Johnny. by huraxprax · 2003-06-14 10:30 · Score: 1

Good explanation. The main reason the LHC needs so much processing power is that the higher the energy, the more scattered particles ("jets") you have, and they all arrive instantly at the detectors, and the LHC will have higher energies than its predecessors. The "size" of particles is meaningless, but the interesting events where a new particle can be detected are very rare. I can't remember the numbers anymore and would have to look them up. They are also working on custom hardware which will do some calculations before sending the data to the cluster.
Re:"But why?" asked Little Johnny. by Anonymous Coward · 2003-06-14 11:45 · Score: 0

NASA have the machine at number 18 in the top500 list; it's got 384 nodes (1392 cpus, 4 cpus per node).
Re:"But why?" asked Little Johnny. by C32 · 2003-06-14 11:48 · Score: 1

Funny coincidence, I was just reading an article about the planned CERN Large Hadron Collider which will be ready in 2007; it'll put out 1250 Mbyte/sec.
This is stored to tape though (~50 StorageTek 9940B drives at 30 Mbyte/sec each, in parallel), not realtime.
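A quick check that the quoted tape setup can keep up, using only the numbers in the comment (1250 Mbyte/sec, about 50 drives, 30 Mbyte/sec per drive). Treating every drive as streaming at its full rate all the time is an assumption.

```python
import math

# Figures from the comment; sustained full-rate streaming is an assumption.
DETECTOR_RATE_MB_S = 1250
DRIVE_RATE_MB_S = 30
DRIVES = 50

aggregate_mb_s = DRIVES * DRIVE_RATE_MB_S
min_drives = math.ceil(DETECTOR_RATE_MB_S / DRIVE_RATE_MB_S)
headroom_mb_s = aggregate_mb_s - DETECTOR_RATE_MB_S

print(f"aggregate tape bandwidth: {aggregate_mb_s} MB/s")
print(f"minimum drives needed: {min_drives}")
print(f"headroom with {DRIVES} drives: {headroom_mb_s} MB/s")
```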
Re:"But why?" asked Little Johnny. by ch-chuck · 2003-06-14 13:26 · Score: 1

I need a cluster to do the rendering of my blender animation experiments. My 100 frame movie at 640x480 takes several minutes to finish on a single 1.3GHz box, especially when you like environment maps at high res (for mirrored surfaces).

Next Q: Why would anyone want to make 3d animations in Blender? A. Because I want to!

--
try { do() || do_not(); } catch (JediException err) { yoda(err); }
Re:"But why?" asked Little Johnny. by Anonymous Coward · 2003-06-14 13:55 · Score: 1, Interesting

I am a high energy physicist.
You will need this much computing power if you are trying to filter and analyze on the order of a petabyte of data yearly. Some collisions at the LHC will produce 1000s of particles, a large fraction of which will be detected in multiple detectors as they fly away from the collision point (nucleus on nucleus collisions). Thousands of these collisions will happen every second. The information in the various detectors then must be collected back so that all the signals a particular particle made can be associated with each other. Then many graduate students must write code to search through all these particles for exciting physics. A lot of computing power is essential for exploiting the potential of the collider and detector.
Re:"But why?" asked Little Johnny. by pdp11e · 2003-06-14 15:10 · Score: 3, Insightful

One application that benefits from adding nodes (with almost linear scaling in performance) is Monte Carlo radiation transport. For example, in medical physics people try to calculate a dose distribution in a human body for the various configurations of treatment accelerators. Monte Carlo simulation software "generates" random initial particles (with appropriate probabilities for the given accelerator) and then tracks each particle as it propagates and interacts with surrounding tissue. Interactions are randomly generated (hence: Monte Carlo) but again randomness is biased according to the appropriate physics. Each such "history" can be independently generated by a different node, thus making parallelization trivial.
In my lab I have assembled a 24-node cluster and it takes about 4-8 hr to calculate dose distributions for most cases. With a 1000 node cluster it would be possible to do this sort of calculation routinely in clinics during treatment planning and actual treatment. This will mean that cancer patients will have improved survivability odds due to the more precise targeting of the tumors.
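The "independent histories" idea can be sketched in a few lines. This is a toy, not real transport physics: the exponential free path, the 1-D slab, the bin sizes, and the names simulate_chunk and merge are all assumptions made for illustration. The point is only that each chunk needs nothing but its own seed, so chunks can run on separate nodes and be summed afterwards.

```python
import random


def simulate_chunk(seed: int, n_histories: int, mean_free_path_cm: float,
                   n_bins: int, bin_width_cm: float) -> list[int]:
    """Run independent toy histories and tally the depth of first interaction.

    Toy model (an assumption): each particle travels an exponentially
    distributed distance into a slab and deposits one unit of "dose" in
    the depth bin where it first interacts.
    """
    rng = random.Random(seed)           # independent random stream per chunk
    tally = [0] * n_bins
    for _ in range(n_histories):
        depth = rng.expovariate(1.0 / mean_free_path_cm)
        bin_index = int(depth / bin_width_cm)
        if bin_index < n_bins:          # particles leaving the slab are dropped
            tally[bin_index] += 1
    return tally


def merge(tallies: list[list[int]]) -> list[int]:
    """Combine per-node tallies by summing bin by bin."""
    return [sum(column) for column in zip(*tallies)]


# Each call could run on a different node; here they simply run in sequence.
chunks = [simulate_chunk(seed, 10_000, 10.0, 20, 2.0) for seed in range(4)]
dose = merge(chunks)

# Ideal linear scaling of the quoted 24-node timings to 1000 nodes.
for hours in (4, 8):
    minutes = hours * 24 / 1000 * 60
    print(f"{hours} h on 24 nodes -> about {minutes:.1f} min on 1000 nodes")
```

If scaling really were linear, the 4-8 hr runs would shrink to roughly 6-12 minutes on 1000 nodes; real clusters lose some of that to startup and result collection.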

Cheers,
Beowulf's root

I hope by floydman · 2003-06-14 09:22 · Score: -1, Redundant

this site is not running on a cluster with their configuration; it's been slashdotted already....

--
The lunatic is in my head

Lucky bastards by Professor+D · 2003-06-14 09:25 · Score: 5, Interesting

#include "back-in-my-day-rant"

Damn. Back when I was on a high-energy experiment located in the middle-of-nowhere in Japan (subject of at least two slashdot articles), our Japanese colleagues used to lease gaggles of Sun workstations at a yearly maintenance cost that exceeded the retail value of the machines themselves!!

A few of us linux-fans used to grumble that we'd be better off buying dozens of cheap linux-boxes, but we weren't making the buying decisions. It seemed to us that the higher-ups didn't think cheap boxes with a free OS could compete on a performance basis with the Suns.

As for me? I just installed CERNlib on my laptop and just laughed as it blew the Suns away on a price/performance(+portability) basis.

Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-14 09:29 · Score: 3, Interesting

So yeah, I basically designed my own system for a professor in the Political Science Dept at my universidad Washington University in St. Louis that completely boots over the network and is completely diskless for every node. About a year before Knoppix ever started doing that. Did it with openMosix and it's fully LAM/MPI functional. Bruce of the openMosix list was on me for quite a while to get the docs done, but some really not cool domestic issues came up and I never got them done. If anyone is really interested, send an email to drtdiggers_DONT_SPAM_ME_BASTARDS_@_SUCKYMICROSOFT_ hotmail.com and let me know, I'll finish them up.

Re:Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-14 10:49 · Score: -1, Troll

Ooo, ooo, can I kiss your ass first. You're so damn cool I can't stand it. You're even cooler than Bruce.
Re:Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-14 11:03 · Score: 0

Wow, I'm shocked by the sheer maturity of your comment.

Man, what an ass.

I just saw it as an opportunity to give my little project some airtime. It's fully ontopic and fully relevant. Perhaps some more constructive criticism?
Re:Autoassimilating Diskless Linux Clusters by mprinkey · 2003-06-14 11:36 · Score: 1

I built a 60-node cluster about two years ago that originally had boot drives in each node. Since the gubment bought the boxes from the lowest bidder, we got very poor quality hardware to work with and about 75% of the hard drives died within a few months. To get things working again, I had to do the diskless boot thing. It was not at all fun. I can't see any advantage to that approach at all and I certainly wouldn't do it again by choice.
Re:Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-14 11:49 · Score: 0

Really? What problems did you run into? The largest I ran into were getting a mature kernel on the boxes; I had to use PXE to do it. I was thinking of porting it to SYSLINUX. I just make RAM drives for /dev, /var, /tmp, and /etc so it has a bit of a footprint, but in dual proc boxes with .5 G --> 1 G and higher RAM, the footprint was irrelevant.
Re:Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-14 11:51 · Score: 0

OH! And the point that you may have missed! It's AUTOMATIC, it's plug and play clustering, just have to tell the BIOS to do Netboot. So there is no trouble on the node end...
Re:Autoassimilating Diskless Linux Clusters by mprinkey · 2003-06-14 12:41 · Score: 1

These boxes had one of the extra-buggy versions of PXE that just would not work at all. Ended up using an Etherboot floppy in each system to get it up. I built shared, read-only /usr, /sbin, /bin, etc and /var, /etc, and /tmp came off of independent nfs shares. Maybe using ram drives for these would have helped.

The tftpboot thing was a bit of a mess to set up. Then Redhat simply did not want to let the normal boot process work, so I had to rewrite rc.sysinit basically from scratch.

Plus, I had to hack a sleep command into the Etherboot floppy to keep all of the systems from booting simultaneously. Doing so thrashed the NFS server and deadlocked the whole startup.
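One way to picture the staggered start is a delay computed from each node's position. The comment does not say how the sleep was chosen, so everything here is hypothetical: the function name boot_delay_seconds, the batch size of 10, and the 30-second gap are illustration values, applied to a 60-node cluster like the one described earlier.

```python
def boot_delay_seconds(node_index: int, batch_size: int, gap_seconds: int) -> int:
    """Return how long a node waits before booting.

    Nodes are grouped into batches of batch_size; each batch starts
    gap_seconds after the previous one, so the NFS server only has to
    serve one batch of boots at a time. All values are hypothetical.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (node_index // batch_size) * gap_seconds


# 60 nodes, 10 per batch, 30 s between batches (illustrative numbers only).
delays = [boot_delay_seconds(i, batch_size=10, gap_seconds=30) for i in range(60)]
print(f"last batch starts after {max(delays)} s")
```

The trade-off is a longer total startup in exchange for a bounded number of simultaneous boots hitting the server.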

All together, I probably spent two full weeks getting that working correctly. Compare that to a netcat-based disk-cloning installation on an 80-node cluster that took me about 3 hours.

There are no doubt canned boot installs now, but when I did this two and a half years ago, there were none that would work with the software we use on the cluster. I got to see the "inside" of how all of this works and with 500+ nodes set up over 5 years, I can say with some certainty that I never want to do it again.
Re:Autoassimilating Diskless Linux Clusters by mprinkey · 2003-06-14 12:50 · Score: 1

It is automatic? Try it sometime with four or five dozen systems and a looming deadline. Try flashing a new BIOS onto 60 motherboards because there is a bug in the PXE code that shipped. The list of potential problems is pretty long.

Boot hard drives are cheap (~$50) and use the boot mechanism that is most stable and well validated. People building ~1000 CPU clusters may well justify the cost savings and the additional setup work to get all of the kinks out of setup. People building "normal" clusters (10 < CPU_COUNT < 100) probably can't.
Re:Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-14 13:41 · Score: 0

Good point. I didn't really have the resources to try it out on a bunch of different systems. I guess a better way would be to describe it as a Homogenous Autoassimilating Cluster setup :) Since the nodes were all the exact same hardware cfg, it was relatively easy. I had to rewrite rc.sysinit too, though I enjoyed learning how a *nix system actually boots up, I never knew (needed to know) before. But I can see the difference when you have a deadline looming over you, my timetable was pretty low stress, though interestingly enough, it took me about 2 weeks too. I'm curious, what were the specs on these machines you got in bulk?
Re:Autoassimilating Diskless Linux Clusters by mprinkey · 2003-06-14 14:49 · Score: 1

Dual P3 motherboards with onboard eepro100. Serverworks chipset. I think they were Supermicro 370DLE. Those were the only dual chipsets with decent memory bandwidth for 933 MHz chips. The BIOS that came on them was exceptionally buggy. I wasted almost an entire week trying to get PXE to work...it makes me angry just thinking about it. But that is good...it just reminds me of important lessons learned: No More Diskless Clusters and Tyan instead of Supermicro.
Re:Autoassimilating Diskless Linux Clusters by Daengbo · 2003-06-15 00:08 · Score: 2, Interesting

I don't claim to know more about your situation than you do, but several distros, including k12ltsp.org , support Open Mosix straight from the install, and work with either PXE (which you couldn't have used) or Etherboot. I'm not trying to change your mind. I'm just pointing out that there are a lot of folks who prefer and even swear by diskless clusters.

--
Put identity in the browser.
Re:Autoassimilating Diskless Linux Clusters by Anonymous Coward · 2003-06-15 04:02 · Score: 0

drtdiggers_DONT_SPAM_ME_BASTARDS_@_SUCKYMICROSOFT_ hotmail.com

I tried that address but it didn't work. :-(

the text by Anonymous Coward · 2003-06-14 09:31 · Score: -1, Redundant

1. INTRODUCTION
The LHC era is getting closer, and with it the challenge
of installing, running and maintaining thousands of
computers in the CERN Computer Centre.
In preparation, we have streamlined our facilities by
decommissioning most of the RISC hardware, and by
merging the dedicated and slightly different experiment
Linux clusters into two general purpose ones (one
interactive, one batch), as reported at the last CHEP[2].
Quite some progress has been made since then in the
automation and management of clusters. The EU DataGrid
Project (EDG), and in particular the WP4 subtask[3], has
entered its third and final year and we can already benefit
from the software for farm management being delivered
by them. See [4] for further details. In addition, the LHC
Computing Grid project (LCG)[5] has been launched at
CERN to build a practical Grid to address the computing
needs of the LHC experiments, and to build up the
combined LHC Tier 0/Tier 1 center at CERN.
In preparing for the LHC, we are already managing
more than 1000 Linux nodes of diverse hardware types,
the differences arising due to the iterative acquisition
cycles. In dealing with this high number of nodes, and
especially when upgrading from one release version of
Linux to another, we have reached the limits of our old
tools for installation and maintenance. Development of
these tools started more than ten years ago with an initial
focus on unifying the environment presented to both users
and administrators across small scale RISC workstation
clusters from different vendors, each of which used a
different flavour of Unix[6]. These tools have now been
replaced by new tools, taken either from Linux itself, like
the installation tool Kickstart from RedHat Linux or the
RPM package format, or rewritten using the perspective of
the EDG and LCG, to address large scale farms using just
one operating system: Linux.
This paper will describe these tools in more detail and
their contribution to the progress in improving the
installation and manageability of our clusters. In addition,
we will describe improvements in the batch sharing and
scheduling we have made through configuration of our
batch scheduler, LSF from Platform Computing[7].
2. CURRENT STATE
In May last year, the Linux support Team at CERN
certified RedHat Linux 7. This certification involved the
porting of experiment, commercial and administration
software to the new version and verifying their correct
operation. After the certification, we set up test clusters for
interactive and batch computing with this new OS. This
certification process took quite some considerable time,
both for the users and the experiments to prepare for
migration, which had to fit into their data challenges, and
for us to provide a fully tailored RedHat 7.3 environment
as the default in January this year. We took advantage of
this extended migration period to completely rewrite our
installation tools. As mentioned earlier, we have taken this
opportunity to migrate, wherever possible, to the use of
standard Linux tools, like the kickstart installation
mechanism from RedHat and the package manager RPM,
together with its package format, and to the tools that
were, and still are, being developed by the EDG project, in
particular by the WP4 subtask.
The EDG/WP4 tools for managing computing fabrics
can be divided into four parts: Installation, Configuration,
Monitoring, and Fault Tolerance. In trying to take over
these ideas and tools, we first had to review our whole
infrastructure with this in mind.
2.1. Installation
The installation procedure is divided into two main
parts. The basic installation is done with the kickstart
mechanism from RedHat. This mechanism allows
specification of the main parameters like the partition table
CHEP03, La Jolla California, March 24

pdf -- text by CowBovNeal · 2003-06-14 09:36 · Score: 1

Installing, Running and Maintaining Large Linux Clusters at CERN

Vladimir Bahyl, Benjamin Chardi, Jan van Eldik, Ulrich Fuchs, Thorsten Kleinwort, Martin Murth, Tim
Smith CERN, European Laboratory for Particle Physics, Geneva, Switzerland

Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS
upgrades. This paper describes the tools and processes we have implemented, working in close collaboration with the EDG project [1], especially with the WP4 subtask, to improve the manageability of our clusters, in particular in the areas of system
installation, configuration, and monitoring.
In addition to the purely technical issues, providing shared interactive and batch services which can adapt to meet the diverse and changing requirements of our users is a significant challenge. We describe the developments and tuning that we have
introduced on our LSF based systems to maximise both responsiveness to users and overall system utilisation.
Finally, this paper will describe the problems we are facing in enlarging our heterogeneous Linux clusters, the progress we have made in dealing with the current issues and the steps we are taking to 'gridify' the clusters
1. INTRODUCTION
The LHC era is getting closer, and with it the challenge of installing, running and maintaining thousands of
computers in the CERN Computer Centre. In preparation, we have streamlined our facilities by
decommissioning most of the RISC hardware, and by merging the dedicated and slightly different experiment
Linux clusters into two general purpose ones (one interactive, one batch), as reported at the last CHEP[ 2].
Quite some progress has been made since then in the automation and management of clusters. The EU DataGrid
Project (EDG), and in particular the WP4 subtask[ 3], has entered its third and final year and we can already benefit
from the software for farm management being delivered by them. See [4] for further details. In addition, the LHC
Computing Grid project (LCG)[ 5] has been launched at CERN to build a practical Grid to address the computing
needs of the LHC experiments, and to build up the combined LHC Tier 0/ Tier 1 center at CERN.
In preparing for the LHC, we are already managing more than 1000 Linux nodes of diverse hardware types,
the differences arising due to the iterative acquisition cycles. In dealing with this high number of nodes, and
especially when upgrading from one release version of Linux to another, we have reached the limits of our old
tools for installation and maintenance. Development of these tools started more than ten years ago with an initial
focus on unifying the environment presented to both users and administrators across small scale RISC workstation
clusters from different vendors, each of which used a different flavour of Unix[ 6]. These tools have now been
replaced by new tools, taken either from Linux itself, like the installation tool Kickstart from RedHat Linux or the
RPM package format, or rewritten using the perspective of the EDG and LCG, to address large scale farms using just
one operating system: Linux.
This paper will describe these tools in more detail and their contribution to the progress in improving the
installation and manageability of our clusters. In addition, we will describe improvements in the batch sharing and
scheduling we have made through configuration of our batch scheduler, LSF from Platform Computing [7].
2. CURRENT STATE
In May last year, the Linux support Team at CERN certified RedHat Linux 7. This certification involved the
porting of experiment, commercial and administration software to the new version and verifying their correct
operation. After the certification, we set up test clusters for interactive and batch computing with this new OS. This
certification process took quite some consid

--
Bush is on fire and its not good for my lungs.

Re:pdf -- text by Anonymous Coward · 2003-06-14 09:43 · Score: 0

OMG, if the picture at that site is really CowboyNeal then we should all chip in and get him a gym membership.

Linux Fecal Clusters by Anonymous Coward · 2003-06-14 09:43 · Score: -1, Flamebait

What about Linux fecal clusters? Never seen those...

YHBT. by Walmart+Security · 2003-06-14 09:46 · Score: -1

NT

--

Comment without sacrificing karma.

Too little, too late. by fuchsiawonder · 2003-06-14 09:46 · Score: 2, Funny

Just a little too late for the SETI@home project. Kind of a shame, really. If only we had those computers sooner...

Re:Too little, too late. by AndroidCat · 2003-06-14 10:11 · Score: 1

There are other projects that could use a lot of spare CPU time. If the humanitarian ones don't excite, how about pissing off Bill Gates? (Click on my sig.)

--
One line blog. I hear that they're called Twitters now.
Re:Too little, too late. by samhalliday · 2003-06-14 10:26 · Score: 1

i assume by "those computers" you mean the 1000 at CERN (not that they'd actually waste their time processing SETI data with the shit load of aliroot stuff going on these days...) well, CERN have been running GNU/Linux clusters for a looong time now, so this is no new thing. In fact, my friend actually had one of the older dual intel 500Mhz machines as his desktop machine, ripped out from the last generation cluster. they basically led him into the buzzing cluster room and said "grab one and follow me"... :-D
Re:Too little, too late. by Anonymous Coward · 2003-06-14 10:32 · Score: 0

Why is it a shame? I heard that SETI had too little data for too many machines anyway, so they were sending out duplicate blocks. How would more machines have helped anything?
Re:Too little, too late. by aled · 2003-06-14 13:08 · Score: 1

if we had lots of these cheap processors before... when they were expensive...

--

"I think this line is mostly filler"

Wow! by digidave · 2003-06-14 09:48 · Score: -1, Redundant

Imagine a beowulf cluster of those!

--
The global economy is a great thing until you feel it locally.

ClusterKnoppix - OpenMosix by Anonymous Coward · 2003-06-14 09:49 · Score: 5, Interesting

I've been looking at ClusterKnoppix mentioned recently on slashdot. It has built in openmosix and also supports thin clients via a terminal service. Just pop it in, and instant cluster. In case you missed the article:

ClusterKnoppix

Re:ClusterKnoppix - OpenMosix by paradxum · 2003-06-14 12:40 · Score: 1

I've been working on a redistro of ClusterKnoppix designed for video encoding... and it's coming along.... Just a few more deps to rebuild. It makes DivX encoding bearable.

I was building my own setup similar to Knoppix until I discovered ClusterKnoppix.... I love when someone else does my work for me :)

Single system image by Tester · 2003-06-14 09:56 · Score: 5, Informative

Where I work, we are developing a clustering system using single system images, where the whole OS is stored on a server and is NFS-mounted by each node. Our current tests show that we can easily run 100 nodes on 100 Mbit Ethernet from a single server... And the coolest thing is that the nodes mount the / of the server, so for "small clusters" (under 100 nodes), we have to do a software upgrade only once and all nodes and the server are upgraded... Btw, this whole thing can be done using an almost unmodified Gentoo Linux distribution.

I'm hoping to convince my boss to let us publish detailed docs.. he thinks that if we do, everyone will be able to use it and he will lose sales (we are in the hardware business..). Details at our homepage and about an older version (but with more details) at the place where we used to work.

Re:Single system image by Anonymous Coward · 2003-06-14 10:40 · Score: 0

We run over 200 diskless Linux boxes from a single server (dual 2.8 Xeons) with a 100 Mbit NFS/admin network. (The computation network is fully bisectional Myrinet.)
Re:Single system image by gregsv · 2003-06-14 11:05 · Score: 1

Another way (which happens to be the way we do it where I work) is to make a master OS image, store it on a central server, and rsync it down to / on every node. Updates are made to the master OS image and then get automatically propagated down to every node. When new or replacement nodes are deployed, we use RedHat's KickStart system to install a base OS on them, then rsync down the master image. We maintain over 700 nodes this way.
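
For anyone who wants to picture the rsync step above, here is a minimal Python sketch of the command each node might run to pull the master image. The server name, image path and exclude list are hypothetical placeholders, not details from the setup described here; only the rsync flags themselves (-a, --delete, --dry-run, --exclude) are standard.

```python
import shlex

# Hypothetical values for illustration only; none of these come from the post.
MASTER_IMAGE = "imageserver:/images/master/"  # trailing slash: copy the contents
EXCLUDES = ["/proc/*", "/tmp/*", "/etc/sysconfig/network"]  # per-node state to keep


def build_rsync_command(master=MASTER_IMAGE, dest="/", dry_run=True):
    """Return the argv list a node could run to sync itself to the master image."""
    cmd = ["rsync", "-a", "--delete"]  # archive mode, remove files not in the image
    if dry_run:
        cmd.append("--dry-run")        # report what would change without touching disk
    for pattern in EXCLUDES:
        cmd.append(f"--exclude={pattern}")
    cmd += [master, dest]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(build_rsync_command()))
```

The sketch defaults to a dry run on purpose: syncing / with --delete is destructive, so the exclude list needs checking before the flag comes off.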
Re:Single system image by Anonymous Coward · 2003-06-14 11:35 · Score: 0

If you're interested in this sort of thing, take a look at SystemImager. It can multicast images out to hundreds (maybe thousands???) of nodes, which means the installation time is approximately O(1) in the number of nodes (sometimes data gets corrupted and individual nodes require the image to be retransmitted). If the software setup on a node gets screwed up, it only takes a few minutes to reinstall the node. Of course, producing the images isn't quite so easy.
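
A rough back-of-the-envelope model of why multicast is roughly constant in the node count, as a Python sketch. The image size, link speed and node count below are made-up numbers, and the model ignores protocol overhead; it only treats each node needing a retransmit as costing one extra full pass.

```python
def transfer_seconds(image_mb, link_mbit):
    """Time to push one image across the link once, ignoring protocol overhead."""
    return image_mb * 8 / link_mbit


def unicast_seconds(image_mb, link_mbit, nodes):
    """Server sends the image to every node in turn: grows linearly with nodes."""
    return nodes * transfer_seconds(image_mb, link_mbit)


def multicast_seconds(image_mb, link_mbit, failed_nodes=0):
    """One multicast pass serves all nodes; each failed node costs one extra pass."""
    return (1 + failed_nodes) * transfer_seconds(image_mb, link_mbit)


if __name__ == "__main__":
    # Assumed figures: 2000 MB image, 100 Mbit/s link, 700 nodes.
    print(unicast_seconds(2000, 100, 700))               # linear in node count
    print(multicast_seconds(2000, 100, failed_nodes=3))  # independent of node count
```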
Re:Single system image by Anonymous Coward · 2003-06-14 12:07 · Score: 0

Yeah, this sounds very close to what I set up! (See post on Autoassimilating Linux Cluster) Cool! I didn't have full distros on the nodes though, nowhere near. Since they were just openMosixing at first, it was not necessary. LAM/MPI usage might require it, but then I was just thinking of /usr over NFS. Well, I can release my info and will be, I hope you can too, it'll be neat to see what you did.

theodiggers

why such a huge cluster? by Anonymous Coward · 2003-06-14 09:57 · Score: 5, Interesting

well, i recently interviewed at nvidia, and they have a 3,000+ cluster just for emulating the new graphics/io chips they're working on... they don't manufacture anything, the turn around time to manufacture a prototype for testing would take too long... so all they do is simulate the actual chips and then send the data off for fabrication once they're done. on a cluster of 3,000 machines, some jobs take all weekend, from what i understand.

imagine if they just used one machine.

Re:why such a huge cluster? by Ziviyr · 2003-06-14 14:04 · Score: 2, Funny

Why imagine? I got a calculator...

16 years, 156 days, 3 hours

Athlons would be putting out better graphics on their own that far into the future. :-)
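
For the curious, the figure above can be reproduced with a few lines of Python. The assumptions are mine, not stated in the post: a "weekend" is 48 hours, the job scales perfectly across 3,000 machines, and a year is a mean tropical year of 365.2422 days.

```python
MACHINES = 3000
WEEKEND_HOURS = 48         # assumption: Saturday plus Sunday
DAYS_PER_YEAR = 365.2422   # assumption: mean tropical year


def single_machine_time(machines=MACHINES, hours=WEEKEND_HOURS):
    """Convert machine-hours into (years, days, hours) on a single machine."""
    total_days = machines * hours / 24
    years = int(total_days // DAYS_PER_YEAR)
    remainder = total_days - years * DAYS_PER_YEAR
    days = int(remainder)
    leftover_hours = round((remainder - days) * 24)
    return years, days, leftover_hours


if __name__ == "__main__":
    print(single_machine_time())  # (16, 156, 3)
```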

--

Someone set us up the bomb, so shine we are!
Re:why such a huge cluster? by FLoWCTRL · 2003-06-14 19:02 · Score: 1

Oooh.. all weekend. I recently attended a talk in applied math where a researcher presented results of a simulation that ran on 47 CPUs for two years to reveal many hitherto unknown facts about hydrogen bonding...

--
http://oss.netmojo.ca

Related project: Loading disk images for clusters by angio · 2003-06-14 09:57 · Score: 4, Informative

This reminds me of a paper that was just presented at USENIX:
Fast, Scalable Disk Imaging with Frisbee. Fun talk.

Pretty cool tricks - they use multicast and filesystem specific compression techniques to parallel load the disks on a subset of the disks in the cluster. Very very very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed).

Anyway, just a bit of related cool stuff.

Red Hat 7.3 by Spoticus · 2003-06-14 10:05 · Score: 2, Informative

RH 7.3 reaches its end of life in December of this year. One can only assume (and hope) that they have the in-house people to support it, or it's going to cost them beaucoup $$ for continued RHN support.

Re:Red Hat 7.3 by vondo · 2003-06-14 10:22 · Score: 2, Informative

I'm sure they are firewalled/NATed off, so why would they need (or even want) to upgrade that often?
Re:Red Hat 7.3 by samhalliday · 2003-06-14 10:58 · Score: 1

youve gotta be kiddin me... most of the GNU/Linux operating system is written in-house at CERN. the only reason they use redhat is so they can tell other institutions which distro to install in order to be binary compatible and sure of sources compiling successfully. im actually surprised they haven't made their own distro... i remember hearing the arguments against it once, but the memory has faded.
Re:Red Hat 7.3 by Dave114 · 2003-06-14 12:57 · Score: 1

Actually they do have their own distro... CERN Linux. It's essentially Redhat with a few modifications.
Re:Red Hat 7.3 by samhalliday · 2003-06-15 01:50 · Score: 1

interesting, i know a few people at CERN and none of them use this
maybe it's so close to redhat that they burn the CDs with "redhat" written on them.
anyway, thanks for the link, if i had mod points i'd give you +1 informative

Oh no! Trolls everywhere! by Anonymous Coward · 2003-06-14 10:08 · Score: -1, Troll

Oh Fuck! The stupid Microsoft white-trash trolls have awoken to post their filth on this site.
But then again... Do we give a fuck?
They use Microsoft after all, that's punishment enough!

Re:Oh no! Trolls everywhere! by Anonymous Coward · 2003-06-14 10:11 · Score: -1, Troll

Hi, I'm Bill Gates and I've come for your soul.
Re:Oh no! Trolls everywhere! by Anonymous Coward · 2003-06-14 14:13 · Score: -1, Flamebait

Hi, I'm Linus Torvalds and I've come for your cock.

IN SOVIET RUSSIA....... by Anonymous Coward · 2003-06-14 10:09 · Score: 0

Large linux clusters maintain YOU!

Dsylexia? by Anonymous Coward · 2003-06-14 10:10 · Score: -1, Troll

I nearly passed this article over as I scanned the words: paper, linux, clusters.. large hardon collider. w00t!

Wait, no hadron. Damn.

Did anyone else read that as... by XaXXon · 2003-06-14 10:13 · Score: -1, Troll

Large Hardon Collider?

Just curious...

Only on Slashdot... by pi_rules · 2003-06-14 10:19 · Score: 1

(Disclaimer: IANAPP (Particle Physicist))

Gotta love Slashdot... the only place where such a disclaimer isn't taken for granted.

Large _Hardon_ Collider? by Anonymous Coward · 2003-06-14 10:20 · Score: 0

kind of misread the title... oops :-o

What the hell are they studying???

Re:Large _Hardon_ Collider? by Second_Derivative · 2003-06-14 10:45 · Score: 1

Oh my god...

OK I just laughed so hard at that the people around me gave me weird looks. Rare that you see something funny on slashdot these days as opposed to "rofl tacos spelling sux"
Re:Large _Hardon_ Collider? by Anonymous Coward · 2003-06-14 21:37 · Score: 0

go fuck a donkey

question from a psuedo-geek... by joebeone · 2003-06-14 10:24 · Score: 1

So, to all those who are in the know out there... when they have what they want, how many nodes and individual machines could they maintain? What are the constraints? What about data back-ups? Is ephemeral data recorded on a few machines in separate nodes to make sure that one getting knocked out doesn't zap something for good?

Re:question from a psuedo-geek... by vondo · 2003-06-14 10:48 · Score: 1

Well, in particle physics, the typical use is that data isn't stored on these systems longer than it takes to analyse it (and since data is constantly being accumulated, you don't worry about small losses).
But there are people looking into parallel, redundant filesystems and the like so that you can keep more on disk. For instance, 1000 x 60 GB = 60 TB is a sizable amount of free space on these clusters, but the output data rate from these experiments is a petabyte/year or so.
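
To put those two numbers side by side, here is a quick Python sketch. It assumes decimal units (1 PB = 1000 TB), a 365-day year and a steady data rate, none of which is stated above.

```python
def days_until_full(nodes=1000, gb_per_node=60, pb_per_year=1.0):
    """Return (capacity in TB, days of data that capacity can hold)."""
    capacity_tb = nodes * gb_per_node / 1000   # 1000 x 60 GB = 60 TB
    tb_per_day = pb_per_year * 1000 / 365      # about 2.74 TB/day at 1 PB/year
    return capacity_tb, capacity_tb / tb_per_day


if __name__ == "__main__":
    capacity, days = days_until_full()
    print(f"{capacity:.0f} TB holds about {days:.1f} days of data")
```

Under these assumptions the whole cluster's local disk holds roughly three weeks of output, which is why nobody treats it as long-term storage.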
Re:question from a psuedo-geek... by Anonymous Coward · 2003-06-14 11:38 · Score: 0

In many clusters you don't use local disks to store data but remote NFS (in smaller systems) or data on a cluster file system like Lustre or Global FS. To do a backup you only have to back up the data stored on your storage nodes.

"securely installing over the network" by ameoba · 2003-06-14 10:25 · Score: 4, Interesting

Who in their right mind would have a cluster this size, for this sort of work, on any network where "securely installing over the network" is an issue? I mean, I'd want this as far off of a public network as possible, unless I really want to explain to whoever authorized my grant why my experimental data indicates that:

e = mc^31337

--
my sig's at the bottom of the page.

Re:"securely installing over the network" by samhalliday · 2003-06-14 10:36 · Score: 3, Interesting

if you read the paper (which OK is not as bad as not reading the article), you would realise that this is not a project which is being performed only at CERN; when LHC (and others, eg ALICE) become active in a few years, the data is going to be piped to literally hundreds of participating institutions (this is the current list for one of the smaller experiments) for data analysis. so, no, this is not enough processing power, and yes they need it to be publicly available. i also know people who are (or were?) working on the security implementations. believe me, at CERN, they think it through; it's run by lots of really smart people who know what they are at, not politicians. the distributed processing that comes out of these projects will hopefully pave the way forward for the next generation of the internet (the grid).
Re:"securely installing over the network" by Anonymous Coward · 2003-06-14 10:36 · Score: 0

You need to think in terms of security in depth.

System installation over the network is a very useful mechanism for supporting survivable systems practices. System installation, of whatever kind, is also an attractive target for exploits, if steps are not taken to secure the installation process. Therefore, competent system architects think about secure installation as a matter of course.

None of this implies that the systems in question would be exposed on a public network. The installation mechanisms may be secure, but there's no value in asking for trouble!
Re:"securely installing over the network" by FLoWCTRL · 2003-06-14 18:57 · Score: 2, Insightful

Perhaps the nodes are not all physically located in the same building, or are otherwise vulnerable to physical man-in-the-middle intrusions. If one adopts secure practices as a matter of principle, it saves having to go back and implement security as an afterthought someday when the situation changes in an unanticipated way.

--
http://oss.netmojo.ca

Why? by CausticWindow · 2003-06-14 10:33 · Score: -1, Troll

Why do we bother with this story?

It's not like we haven't got similar, or even larger clusters, right here in the US. Why are Slashdot such french lovers?

Give us a story about the NSA cluster or something like that. Not some lame french wannabe research institution using an american os (yeah, redhat is american).

--
How small a thought it takes to fill a whole life

Re:Why? by Anonymous Coward · 2003-06-14 10:42 · Score: 0

Is it that hard for you to admit that the US is falling behind the curve?
Re:Why? by samhalliday · 2003-06-14 10:54 · Score: 0, Flamebait

CERN is only half in France, the other half being in Switzerland (not even in the European Union). but, being American it must be hard for you to understand geography beyond your own backyard; my deepest regrets :-/
Re:Why? by hasse · 2003-06-14 11:02 · Score: 1

Well, Frenchie La Frencherson, last time I checked (right now as a matter of fact), Switzerland was located smack in the damn middle of Europe and the EU. How dumb do you think us americans are?
Re:Why? by samhalliday · 2003-06-14 11:11 · Score: 1

Europe is a way of talking about a geographical region... the EU is a political collection of countries, which (ever so neutral) Switzerland is not a member of. so, Switzerland is in Europe, but NOT in the EU. i never said otherwise :-P
you actually checked? hehe...
Re:Why? by Anonymous Coward · 2003-06-14 11:12 · Score: 0

Pretty dumb, since you cannot grasp the fact that geographical location has nothing to do with membership in a political organisation. To make it less abstract, and hopefully easier to understand for you, think of how West Berlin was not part of the Soviet bloc, in spite of the fact that it was located smack in the middle of East Germany. But maybe I'm asking too much of you.
Re:Why? by Anonymous Coward · 2003-06-14 11:16 · Score: 0

How dumb do you think us americans are?

Two more answers like that and you'll make me believe that the US is inhabited by amoebas that can type on a keyboard.
Re:Why? by Hank+Chinaski · 2003-06-14 11:23 · Score: 1

obvious troll. and not funny. cern is in switzerland.

--
IAAL
Re:Why? by Aardpig · 2003-06-14 11:41 · Score: -1, Troll

Yeah, screw them cheese-eating surrender monkeys. What good have they ever done us? Like, who apart from total lusers would uses their buttfuck-useless HTML or HTTP?

--
Tubal-Cain smokes the white owl.
Re:Why? by Anonymous Coward · 2003-06-14 12:44 · Score: 0

It's understandable why a person would have to check. It's akin to not being able to provide exact coordinates for a specific planet/asteroid orbiting a planet a million light years away: nobody cares, and by nobody, I mean those of us who matter, namely Americans.
Re:Why? by Anonymous Coward · 2003-06-14 14:35 · Score: 0

Actually no. Part of CERN is in France, part is in Switzerland.

Episode 10: Situation Report [Walmart Security] by Anonymous Coward · 2003-06-14 10:48 · Score: -1, Offtopic

The aroma of premium Sam's Choice coffee infiltrated my nose as I sipped a cup delicately. Robert would be arriving momentarily, and I would request a situation report. He and I are, of course, the guardians of a Walmart Supercenter located inside of the prosperous community of Jasper, Texas.

We've encountered Paul Cryer, who initiated litigation after his SUV impacted my elite patrol vehicle, and miscellaneous other members of the nefarious Three Pointed Conspiracy. Our integrity, however, is the origin of our strength. We will never abdicate our store. I digress.

"Peter!" exclaimed Robert, emerging from the door leading to the lawn and garden center.

"Robert," I smiled, "you're fifty seconds late. I will not accept such unpunctual behavior. What if a situation occurred in your absence?"

My protégé said nothing, instead walking to the vending counter and drawing a cup of coffee. I wouldn't punish him this evening. "How was your day?" I asked.

"It was fine," said Robert, "but you need to listen to this. There was a situation today at the restaurant!" You see, Robert is also employed by the prestigious Catfish Diner restaurant, which is regarded by many as the most elegant dining establishment in Jasper! Their all-you-can-eat seafood buffet, although expensive, is quite exquisite. I recommend it. Once again, however, I digress.

I turned to Robert. "What was the situation?"

"Well, this man came in," he said, "a man from Houston. It was strange, though, as the emblem on his car wasn't" -- Robert paused -- "I mean, he didn't seem to be a member of the Three Pointed Conspiracy. But, as we both know from experience, such terrible people are unable to conceal their true nature!"

"What happened?" I inquired, attempting to suppress my concern. Robert's safety was of an imperative nature.

"Well, it didn't become physical," he said, "so I didn't need to thrash him. People were obviously intimidated, though. I walked to his table.

"'What would you like this evening, sir?' I asked.

"'Look, man, is there any way you can get me some Starbucks coffee?'

"'Starbucks coffee? I don't believe we serve that, sir. Would you like some Community Coffee instead?'

"'I don't drink anything that gives me running diarrhea!' he said, raising his voice as if he were insulted. 'I guess I'll have some Coke. Anything to keep me awake so that I can get out of this anus of a city tonight.'

"I must admit that he was beginning to offend me. However, like an experienced waiter, I suppressed my anger. 'Yes, sir,' I said, and returned with his drink.

"'Look, man, I'm sure that all of your food sucks anyway, and you only serve a buffet. It's probably something that I wouldn't even feed my dogs, but I'm desperate. So, I'll have that.'

"'Yes, sir,' I responded, 'but our food is excellent. In fact, it is superlative. Our seafood is delivered fresh once every week!'

"'That's great,' he said, laughing, 'but tell your master chef not to screw me over too badly, a'ight?'

"I must concede that I ignored the man's request. I returned with a plate and some silverware, which he snatched promptly from my hands. Fifteen minutes later, I returned with his tab. 'Here, sir,' I said, 'thank you for dining with us this evening.'

"'My displeasure,' said the man, handing me a car key, 'this food was terrible, by the way. If this happened at any other restaurant, I would probably shoot the chef. My Lexus is parked outside. Bring it to the front door, please, and don't forget: there are two towels inside of the glove box.

But does it... by arose · 2003-06-14 11:05 · Score: 4, Funny

...run Windows?

--
Analogies don't equal equalities, they are merely somewhat analogous.

Re:But does it... by blibbleblobble · 2003-06-15 00:28 · Score: 1

"But does it run windows?"

I can just see the purchase-request now... 1000 copies of Windows at $250 each. ... and a lot more keyboards, mice, and monitors.
Re:But does it... by arose · 2003-06-15 02:25 · Score: 2, Funny

You forgot the trained monkeys.

--
Analogies don't equal equalities, they are merely somewhat analogous.

1000+ cluster? by jabbadabbadoo · 2003-06-14 11:19 · Score: 1

Running rpm --rebuilddb must be a real drag.

Another approach... by Junta · 2003-06-14 11:39 · Score: 2, Informative

If you want to scale more, and your nodes have tons of ram, you could likely stuff the whole os into ramdisk and then use the local disk for the scratch space. Once booted, the network impact of nfs goes away.

Of course, you could use System Installer Suite (http://www.sisuite.org/), which is *similar* to the rsync method mentioned by the other poster, but you get to skip the Red Hat install step in favor of SIS's tools.

--
XML is like violence. If it doesn't solve the problem, use more.

i'd just like everyone to know... by Anonymous Coward · 2003-06-14 11:55 · Score: 0

I LOVE LINUX!!!!!!!!!!!!

SystemImager-like update mechanism for non-Linux? by pschmied · 2003-06-14 12:28 · Score: 5, Interesting

I'm surprised that nobody has mentioned SystemImager. If you haven't looked at it for maintaining large numbers of Linux boxes, scamper off and take a look now. It is worth your time.

Now, that being said, I recently had the opportunity to evaluate using a number of OpenBSD boxes, but I couldn't find a utility for maintaining a bunch of the boxes in the same manner as SystemImager (i.e. Incrementally update servers from a golden master via rsync).

So, has anyone found anything that does what SystemImager does, but that is cross-platform? Do any SystemImager developers out there want to comment on the potential difficulty in supporting other-than-Linux operating systems in SystemImager?

SystemImager is one of the most useful tools I've ever seen, however, I believe that it would be an enterprise "killer app" if it could do MacOS X, *BSD, Windows etc.

-Peter

--
. Penguins Surely Ca

Huh? by soloport · 2003-06-14 13:04 · Score: 1

Anyone else read it as "Large Hardon Collider"? I blew coffee through my nose. Damn disexlia...

Re:Huh? by pompousjerk · 2003-06-14 13:20 · Score: 1

Let me say this:

Holy crap, that would have been embarrassing...
Re:Huh? by Anonymous Coward · 2003-06-14 15:17 · Score: 0

In Soviet Russia...

Large hardons collide YOU!!

YOU ARE THE FAILURE!!!! by Anonymous Coward · 2003-06-14 13:27 · Score: -1, Flamebait

YOU ARE THE FAILURE!! You fucking silly ass piece of bitchtit shit....no fp for you, you fucking illiterate sack of knappy bitchtits!!!!

Fuck your mother, you fucking little bitch!!!

Lick my ass and like it, fucker.

FreeBSD cluster by Anonymous Coward · 2003-06-14 17:55 · Score: 0

How applicable is this to FreeBSD? Now that linux is under this legal cloud of doom, I'm switching all my clusters over to FreeBSD.

Wow... by Anonymous Coward · 2003-06-14 18:06 · Score: 0

Imagine a beow... oh wait...

NFS is a bad choice. by Anonymous Coward · 2003-06-14 22:15 · Score: 0

From my experience, NFS is by far the worst choice in networked filesystems.

Since all the boxen are Linux, I strongly suggest Samba or NCP.

Think of it this way: In the old days we had
- NFS to share files with other old Unices (like SCO, Slowlaris, HPUX, AIX).
- SMB to share files with windowz
- NCP to share files with Novell network fs.

IMHO, NCP is the best. SMB is pretty good too.

linuxbios, anyone? by nafrikhi · 2003-06-14 23:56 · Score: 2, Informative

Has anyone tried LinuxBIOS (http://www.linuxbios.org/) to replace the standard BIOS? It results in a diskless, faster boot. It is used in this cluster architecture: http://www.clustermatic.org/

Re:linuxbios, anyone? by pe1chl · 2003-06-15 00:34 · Score: 1

In a network, this seems to be largely redundant.
Use PXE when you want a diskless boot. May take more than 3 seconds, but is supported on many, many more systems!

1'000'000 node cluster by Anonymous Coward · 2003-06-15 01:03 · Score: 0

i have a 1'000'000 node cluster!
it crawls around, eats flies and likes light ...
and sometimes it replicates in my
fruit-loops!
it can accurately predict ( >95% ) the weather
two days ahead!

too bad it doesn't have any interface
that is compatible with me ...

it's what we scientists call a "passive-cluster"!

NFS gets the job done well by Anonymous Coward · 2003-06-15 08:09 · Score: 0

There's always an anti-NFS troll out there just waiting to spout "the truth". Get over it. NFS works great, particularly when all you are running is Linux.

Re:SystemImager-like update mechanism for non-Linu by More+Trouble · 2003-06-15 16:10 · Score: 1

SystemImager is one of the most useful tools I've ever seen, however, I believe that it would be an enterprise "killer app" if it could do MacOS X, *BSD, Windows etc.

You should check out radmind. It does in fact "do" Mac OS X, *BSD, and Linux.

:w

Re:SystemImager-like update mechanism for non-Linu by pschmied · 2003-06-16 03:35 · Score: 1

Hmm... Not quite there yet. The collection of command line tools could probably be rolled into something that automates system management the way SystemImager does. But even then, radmind rather unintelligently seems to recopy entire files.

Also, how is partitioning taken care of?

No, I'm still looking for something like SystemImager that handles multiple Operating Systems. Perhaps extending SystemImager to support others will be the easiest way.

As a side note, Frisbee, which was mentioned in a previous thread, is the killer app for LAN-based system imaging. Wow!

-Peter

--
. Penguins Surely Ca

Re:SystemImager-like update mechanism for non-Linu by More+Trouble · 2003-06-16 04:16 · Score: 1

Sorry, not a big SystemImager expert. I see that it just uses rsync, hence your comment about recopying entire files. I'd point out that for binary files, rsync tends to copy the entire file anyway, on a version change. radmind's nice in this case because it can tell that a file needs to be updated with no network traffic.

how is partitioning taken care of

Depends on the system. For Mac OS X, we pretty much need to use Apple's tools. For Solaris, we use Jumpstart. Kickstart on Linux. Partitioning is very OS specific. radmind is very portable.

:w

ar98sarf s87aeh87aw4h by jamie · 2003-07-13 12:43 · Score: 1

dsriugadniaw34r sareh98fase fasef
