Slashdot Mirror


Ask Slashdot: Best Linux Distro For Computational Cluster?

DrKnark writes "I am not an IT professional, but I am one of the more knowledgeable people in such matters at my department. We are now planning to build a new cluster (smallish, ~128 cores). The old cluster (built before my time) used Redhat Fedora, and this is also used in the larger centralized clusters around here. As such, most people here have some experience using that. My question is, are there better choices? Why are they better? What would be recommended if we need it to be fairly user friendly? It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."

17 of 264 comments

  1. Scientific Linux by stox · · Score: 5, Informative

    Built for that very purpose.

    --
    "To those who are overly cautious, everything is impossible."
    1. Re:Scientific Linux by boristhespider · · Score: 5, Informative

      Being in academia and spending time in a lot of departments I can at least confirm that a large number of departments are running Scientific. I've worked in Britain, the USA, Canada, Norway and Germany and while Germany (predictably enough) has a hankering for SuSE, the others have a tendency to run Scientific.

      I did type in a long and boring anecdote about my experiences administering things running SGI Irix and Solaris back in the day, but wiped it when it began to look a bit incriminating and for all I know my ex-boss reads Slashdot. So I'll summarise as "don't administer SGI Irix or Solaris if you can avoid it". I'm no computer scientist, so maybe people who are better at it have no problems, but as a vaguely-competent scientist with an interest in computers but little more (like the original poster) I didn't get on with either of them. Red Hat was fine, and we hung Fedora machines off our central network and that was OK even though it was Fedora Core 1 with all its teething problems. And Scientific is very widely used in academia on big networks.

  2. NPACI Rocks by rmassa · · Score: 5, Informative

    NPACI Rocks is probably your best bet. http://rocksclusters.org/

    1. Re:NPACI Rocks by daemonc · · Score: 5, Interesting

      Seconded. I used Rocks to build clusters for the university for which I worked, and it made my life much, much easier.

      If you are already familiar with Redhat administration, you'll be happy to know Rocks can use either Redhat or CentOS as its base OS.

      It uses meta-packages called "rolls", which completely automate the installation and configuration of your computing nodes. There are rolls that include most of the commonly used commercial and Open Source HPC software out there, or you can "roll" your own. Basically you just configure your head node, and then adding a compute node is as simple as setting the BIOS to boot over PXE and plugging it in, and you're done.

      Rocks, well, rocks.

      --
      All that we see or seem is but a dream within a dream.
  3. Scientific Linux by Skapare · · Score: 5, Informative

    How about Scientific Linux?

    --
    now we need to go OSS in diesel cars
  4. Distro isn't the biggie, it's the scheduler by javanree · · Score: 5, Interesting

    I've worked with various clusters over the past year.
    The distro doesn't really matter, mostly it's what you feel most comfortable with. I'd slightly favor RedHat Enterprise or a respin of it, since it's easiest in terms of drivers for commercial cluster hardware and commercial software support, but Debian would be just as fine. I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable). You don't want to have to update every week, since this usually requires quite some work (making new images and rebooting all nodes).

    What I found out matters a lot more is the scheduler you will use: Sun Grid Engine, PBS, Torque or slurm, to name a few. Every scheduler comes with its strong and weak points, so be sure to look at what matters most to you.

    If you are unfamiliar with all of these things, pick a complete bundle like Rocks (it's based on RedHat Enterprise Linux), which makes setting up a cluster quite easy and still allows you to choose which components you want. That'll greatly improve your chance of success. But be warned: building and especially configuring a cluster still has a steep learning curve. Most of the time is spent tuning queuing parameters to maximize the performance of your cluster.
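    To illustrate why the scheduling policy matters, here is a toy sketch in Python. It is not how Sun Grid Engine, PBS, Torque or Slurm actually work; the function name `fifo_schedule` and the job mix are made up for illustration, and the 128-core size is taken from the original question. It models strict first-come, first-served scheduling, where a big job at the head of the queue blocks everything behind it.

```python
import heapq


def fifo_schedule(jobs, total_cores):
    """Return each job's start time under strict first-come, first-served.

    jobs: list of (cores, runtime) tuples, all submitted at time 0,
          in queue order. This is a toy model, not a real scheduler.
    """
    free = total_cores
    now = 0.0
    running = []  # min-heap of (end_time, cores) for jobs still running
    starts = []
    for cores, runtime in jobs:
        if cores > total_cores:
            raise ValueError("job requests more cores than the cluster has")
        # Strict FIFO: nothing behind the head of the queue may start
        # until the head job fits, so wait for running jobs to finish.
        while free < cores:
            end, released = heapq.heappop(running)
            now = max(now, end)
            free += released
        starts.append(now)
        free -= cores
        heapq.heappush(running, (now + runtime, cores))
    return starts


# Hypothetical job mix on a 128-core cluster: (cores, runtime in hours).
starts = fifo_schedule([(64, 10), (96, 5), (32, 1)], total_cores=128)
print(starts)
```

    With this job mix the small 32-core job does not start until hour 10, even though 64 cores sit idle from hour 0. A scheduler with backfilling could have slotted it in earlier, which is the kind of strong or weak point worth comparing.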

  5. Re:RHEL by pavon · · Score: 5, Insightful

    If you need something cheaper (no licenses), you can always go CentOS.

    If you want something compatible with Red Hat but cheaper, you should go with Scientific Linux, which is the same sort of idea as CentOS, but has more timely releases, and is used by other major clusters, like the ones at Fermilab and CERN.

  6. Red Hat for support by guruevi · · Score: 3, Insightful

    RH support is phenomenal and that's why a lot of businesses use it. If you want it on the cheap, go with what you're comfortable with and have your specific calculation packages built in (Debian if you like apt and open source packages, RPM if you use a lot of commercial packages). If you're looking for performance and specific hardware enhancements, go Gentoo or one of its brethren. Go with something that you can easily re-image if you're looking for lots of changes in software lineups or conflicts.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  7. CentOS, Scientific Linux, Ubuntu, Debian by MetricT · · Score: 4, Informative

    I've got 10+ years' experience managing a large (2000 core, 1+ PB storage) compute cluster. If you're using one of those annoying commercial apps that assume Linux = Red Hat Linux (Matlab, Oracle, GPFS, etc.), then CentOS or Scientific Linux are the way to go.

    If you don't have that constraint, consider Ubuntu or Debian. apt-get is my single favorite feature in the history of Unix-dom. Plus, there are often pre-built packages for several common cluster programs (Torque, Globus, Atlas, Lapack, FFTW, etc.) which can get you up and running a lot faster than if you had to build them yourselves.

  8. Building Clusters by Nite_Hawk · · Score: 5, Informative

    Hi,

    I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them. We run SLES on most of our systems, but CentOS and Redhat are fine, and I'm using Ubuntu successfully for an Openstack cloud. Rocks is popular, though it ties you to certain ways of doing things which may or may not be your cup of tea. Certainly it offers you a lot of common cluster software prepackaged, which may be what you are looking for.

    More important than the OS are the things that surround it. What does your network look like? How are you going to install nodes, and how are you going to manage software? Personally, I'm a fan of using dhcp3 and tftpboot along with kickstart to network boot the nodes and launch installs, then network boot with a pass-through to the local disk when they run. Once the initial install is done I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes). It becomes extremely easy to replace nodes by just registering their MAC address and booting them into an install. This is just one way of doing it though. You could use cobbler to tie everything together, or use FAI. XCAT is popular on big systems, or you could use system imager, or replace puppet with chef or cfengine... Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
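    As a rough sketch of the "register the MAC address" step, the Python below renders a host entry in ISC dhcpd configuration syntax that pins a node to an address and points it at a TFTP server for network boot. Everything specific here is an assumption for illustration: the helper name `dhcp_host_entry`, the hostname, the addresses, and the `pxelinux.0` boot file. Your own layout will likely differ.

```python
import re

# Six colon-separated hex octets, e.g. 00:1a:2b:3c:4d:5e
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def dhcp_host_entry(hostname, mac, ip, tftp_server, bootfile="pxelinux.0"):
    """Render one dhcpd 'host' block for a node that should PXE boot.

    All argument values are site-specific; the defaults are only examples.
    """
    # Accept either 00-1A-... or 00:1a:... and normalise to lower-case colons.
    mac = mac.strip().lower().replace("-", ":")
    if not MAC_RE.match(mac):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"    next-server {tftp_server};\n"
        f'    filename "{bootfile}";\n'
        f"}}\n"
    )


# Hypothetical replacement node; append the result to your dhcpd config.
entry = dhcp_host_entry("compute-01", "00-1A-2B-3C-4D-5E", "10.1.0.11", "10.1.0.1")
print(entry)
```

    Replacing a dead node then comes down to generating a new entry with the new MAC address, reloading the DHCP server, and letting kickstart and Puppet do the rest.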

    Or if you are only talking about like 8-16 nodes, you could just manually install ubuntu on the nodes, pdsh apt-get update, and make people schedule their jobs on google calendar. ;) For the size of cluster you are talking about and what I assume is probably a very limited administration budget, that might be the best way to go. Even with something like Rocks you are going to need to know what's going on when things break and it can get really complicated really fast.
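    For the low-budget route, a minimal sketch of the "pdsh apt-get update" step might look like the Python below. It only builds the command line and does not run anything. The node naming scheme (`node01` to `node16`) and the helper name `pdsh_command` are assumptions; `-w` is pdsh's option for the list of target hosts.

```python
import shlex


def pdsh_command(prefix, first, last, remote_cmd, width=2):
    """Build a pdsh argument list that runs remote_cmd on a range of nodes.

    The prefix and numbering scheme are hypothetical; adapt to your hostnames.
    """
    if first > last:
        raise ValueError("first node number must not exceed last")
    # pdsh accepts bracketed ranges in its host list, e.g. node[01-16]
    hostlist = f"{prefix}[{first:0{width}d}-{last:0{width}d}]"
    return ["pdsh", "-w", hostlist, remote_cmd]


argv = pdsh_command("node", 1, 16, "apt-get update")
# Print a shell-quoted version for review before running it for real,
# e.g. with subprocess.run(argv, check=True) on a machine that has pdsh.
print(shlex.join(argv))
```

    Keeping the command as a list rather than a single string avoids shell quoting surprises if you later pass it to `subprocess.run`.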

  9. Re:RHEL by b30w0lf · · Score: 5, Informative

    Agreed.

    A primary component of my job is the design and maintenance of high performance compute clusters, previously in computational physics, presently in biomedical computing. Over the last few years I have had the privilege of working with multiple Top500 clusters. Almost every cluster I have ever touched has run some RHEL-like platform, and every cluster I deploy does as well (usually CentOS).

    Why? Unfortunately, the real reasons are not terribly exciting. While it's entirely true that many distros will give you a lot more up-to-date software with many more bells and whistles, at the end of the day what you really want is a stable system that works. Now, I'm not going to jump into a holy war by claiming RedHat is more stable than much of anything, but what it is is tried and true in the HPC sector. The vast majority of compute clusters in existence run some RHEL variant. Chances are, if any distro is going to have hit and resolved a bug that surfaces when you have thousands of compute cores talking to each other, or manipulating large amounts of data, or running CPU/RAM intensive jobs, or making zillions of NFS (or whatever you choose) network filesystem calls at once, or using that latest QDR InfiniBand fabric with OpenMPI version 1.5.whatever, it's going to be RHEL. That kind of exposure tends to pay off.

    Additionally, you're probably going to be running some software on this cluster, and there's a good chance that software is going to be supplied by someone else. That kind of software tends to fall into one of two camps: 1) commercial (and commercially supported) software, and 2) open source, small community research software. Both of these benefit from the prevalence of RHEL (though #1 more than #2). If you're going to be running a lot of #1, you probably just don't have an option. There's a very good chance that the vendor is just not going to support anything other than RHEL, and when it comes down to it, if your analysis isn't getting run and you call the vendor for support, the last thing you want to hear is "sorry, we don't support that platform." If you run a lot of #2, you'll generally benefit from the fact that there's a very high probability that the systems that the open community software has primarily been tested on are RHEL-like systems.

    Finally, since so many compute clusters have been deployed with RHEL-like distros, there is oodles of documentation out there on how to do it. This can be a pretty big help, especially if you're not used to the process. Chances are your deployment will be complicated enough without trying to reinvent the wheel.

  10. Re:Ubuntu 10.04 LTS by Hatta · · Score: 3, Insightful

    Why? I know Ubuntu is the standard recommendation for grandma these days, but what makes you think it's particularly appropriate for a computational cluster? For instance, do you really need GNOME on a high performance cluster?

    --
    Give me Classic Slashdot or give me death!
  11. Re:Which editor should he use? by ebuck · · Score: 4, Funny

    vi

    Clearly the best choice. It is so heavily optimized that even its name takes up only 40% of the characters required by the second best contender, emacs.

  12. Re:RHEL by Anonymous Coward · · Score: 5, Informative

    I used to be on the CMS/LHC team at Fermilab. We used Scientific Linux on the 5500 Linux workers used for collider event reconstruction. SL is built with computing clusters in mind. I highly recommend it.

  13. Re:RHEL by Have+Brain+Will+Rent · · Score: 3, Funny

    Just for a moment there I saw 5500 students/programmers/supportfolk/etc. all sitting in little cubicles slaving away at some problem after having had "Scientific Linux", aka the whip, "used" on them....

    Then I finished parsing the sentence.

    --
    The tyrant will always find a pretext for his tyranny - Aesop
  14. Re:RHEL by greg1104 · · Score: 4, Informative

    Not just currently. Today's organizational turmoil within CentOS is nothing compared to when they lost access to much of the infrastructure a few years ago. I just wrote a blog entry on the rise and fall of CentOS; the theme is why it's important to build an open community, not a tight clique, if you want an open-source project to scale.

  15. Re:None of them by cashman73 · · Score: 3, Interesting

    Actually, in the realm of biomedical supercomputing, "none of the above" has already been done. Check out the Anton supercomputer designed and built by D.E. Shaw Research. The entire supercomputer, right down to all of the processor cores themselves, was specially designed and built specifically for molecular dynamics research. The system has no operating system and, as such, no overhead. Every processor cycle goes straight into the calculations. It is capable of churning out simulations of 150,000+ atom protein complexes on the order of several microseconds long, using wallclock CPU time of a few days.