Slashdot Mirror


Maintaining Large Linux Clusters

pompousjerk writes "A paper landed on arXiv.org on Friday titled Installing, Running and Maintaining Large Linux Clusters at CERN [PDF]. The paper discusses the management of the 1000+ Linux nodes, upgrading from Red Hat 6.1 to 7.3, securely installing over the network, and more. They're doing this in preparation for Large Hadron Collider-class computation."

5 of 134 comments

  1. Obligatory Posts... by DarkBlackFox · · Score: -1, Redundant

    "Can you imagine a beowulf cluster of these?" .... RTFA.

    "But does it run Linux?" ..... ditto.

    1. Re:Obligatory Posts... by Anonymous Coward · · Score: -1, Redundant

      I hope someone makes this article available on Bit Torrent.

  2. I hope by floydman · · Score: -1, Redundant

    this site is not running on a cluster with their configuration; it's been slashdotted already....

    --
    The lunatic is in my head
  3. the text by Anonymous Coward · · Score: -1, Redundant

    1. INTRODUCTION
    The LHC era is getting closer, and with it the challenge
    of installing, running and maintaining thousands of
    computers in the CERN Computer Centre.
    In preparation, we have streamlined our facilities by
    decommissioning most of the RISC hardware, and by
    merging the dedicated and slightly different experiment
    Linux clusters into two general purpose ones (one
    interactive, one batch), as reported at the last CHEP[2].
    Quite some progress has been made since then in the
    automation and management of clusters. The EU DataGrid
    Project (EDG), and in particular the WP4 subtask[3], has
    entered its third and final year and we can already benefit
    from the software for farm management being delivered
    by them. See [4] for further details. In addition, the LHC
    Computing Grid project (LCG)[5] has been launched at
    CERN to build a practical Grid to address the computing
    needs of the LHC experiments, and to build up the
    combined LHC Tier 0/Tier 1 center at CERN.
    In preparing for the LHC, we are already managing
    more than 1000 Linux nodes of diverse hardware types,
    the differences arising due to the iterative acquisition
    cycles. In dealing with this high number of nodes, and
    especially when upgrading from one release version of
    Linux to another, we have reached the limits of our old
    tools for installation and maintenance. Development of
    these tools started more than ten years ago with an initial
    focus on unifying the environment presented to both users
    and administrators across small scale RISC workstation
    clusters from different vendors, each of which used a
    different flavour of Unix[6]. These tools have now been
    replaced by new tools, taken either from Linux itself, like
    the installation tool Kickstart from RedHat Linux or the
    RPM package format, or rewritten using the perspective of
    the EDG and LCG, to address large scale farms using just
    one operating system: Linux.
    This paper will describe these tools in more detail, and
    their contribution to the progress in improving the
    installation and manageability of our clusters. In addition,
    we will describe improvements in the batch sharing and
    scheduling we have made through configuration of our
    batch scheduler, LSF from Platform Computing[7].
    2. CURRENT STATE
    In May last year, the Linux support Team at CERN
    certified RedHat Linux 7. This certification involved the
    porting of experiment, commercial and administration
    software to the new version and verifying their correct
    operation. After the certification, we set up test clusters for
    interactive and batch computing with this new OS. This
    certification process took considerable time,
    both for the users and the experiments to prepare for
    migration, which had to fit into their data challenges, and
    for us to provide a fully tailored RedHat 7.3 environment
    as the default in January this year. We took advantage of
    this extended migration period to completely rewrite our
    installation tools. As mentioned earlier, we have taken this
    opportunity to migrate, wherever possible, to the use of
    standard Linux tools, like the kickstart installation
    mechanism from RedHat and the package manager RPM,
    together with its package format, and to the tools that
    were, and still are, being developed by the EDG project, in
    particular by the WP4 subtask.
    The EDG/WP4 tools for managing computing fabrics
    can be divided into four parts: Installation, Configuration,
    Monitoring, and Fault Tolerance. In trying to take over
    these ideas and tools, we first had to review our whole
    infrastructure with this in mind.
    2.1. Installation
    The installation procedure is divided into two main
    parts. The basic installation is done with the kickstart
    mechanism from RedHat. This mechanism allows
    specification of the main parameters like the partition table
    CHEP03, La Jolla California, March 24

  4. Wow! by digidave · · Score: -1, Redundant

    Imagine a beowulf cluster of those!

    --
    The global economy is a great thing until you feel it locally.