Ask Slashdot: Best Linux Distro For Computational Cluster?
DrKnark writes "I am not an IT professional, even so I am one of the more knowledgeable in such matters at my department. We are now planning to build a new cluster (smallish, ~128 cores). The old cluster (built before my time) used Redhat Fedora, and this is also used in the larger centralized clusters around here. As such, most people here have some experience using that. My question is, are there better choices? Why are they better? What would be recommended if we need it to be fairly user friendly? It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."
Redhat Enterprise Linux.
If you need something cheaper (no licenses), you can always go CentOS. Or you can mix both, having some RHEL and some CentOS machines.
morcego
Built for that very purpose.
"To those who are overly cautious, everything is impossible. "
NPACI Rocks is probably your best bet. http://rocksclusters.org/
How about Scientific Linux?
now we need to go OSS in diesel cars
Scientific Linux. http://www.scientificlinux.org/ Has the benefit of RHEL: a stable OS environment without some of the headaches of CentOS. If you have money (you probably don't) RHEL is good.
--I hate people when they're not polite -"Psycho Killer", Talking Heads
Fedora has components to help manage large deployments. https://fedorahosted.org/spacewalk/ It also has FreeIPA to help with a secure and scalable means of managing authentication/authorization/resources within the cluster. http://freeipa.org/page/Main_Page
A modified CentOS is the base OS for the Rocks cluster distribution.
http://www.rocksclusters.org/wordpress/
I've worked with various clusters over the past year.
The distro doesn't really matter, mostly it's what you feel most comfortable with. I'd slightly favor RedHat Enterprise or a respin of it, since it's easiest in terms of drivers for commercial cluster hardware and commercial software support, but Debian would be just as fine. I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable). You don't want to have to update every week since this usually requires quite some work (making new images and rebooting all nodes).
What I found out matters a lot more is the scheduler you will use; Sun Grid Engine, PBS, Torque or slurm to name a few. Every scheduler comes with its strong and weak points, be sure to look at what matters most to you.
If you are unfamiliar with all of these things, pick a complete bundle like Rocks (it's based on RedHat Enterprise Linux), which makes setting up a cluster quite easy and still allows you to choose which components you want. That'll greatly improve your chance of success. But be warned; it's still a steep learning curve building and especially configuring a cluster. The most time is spent tuning queuing parameters to maximize the performance of your cluster.
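To make the point about scheduler policy concrete, here is a toy simulation (not how SGE, PBS, Torque or slurm actually work internally, just a sketch under simple assumptions). The function name simulate, the job tuples and the allow_skip flag are all made up for illustration. It compares a strict first-in-first-out queue with one that lets smaller jobs jump ahead when the head of the queue doesn't fit.

```python
def simulate(jobs, total_cores, allow_skip=False):
    """Toy scheduler: return {job_name: start_time}.

    jobs is a list of (name, cores, runtime) tuples in submission order.
    With allow_skip=False the job at the head of the queue blocks
    everything behind it (strict FIFO). With allow_skip=True any queued
    job that fits in the free cores may start (simple first-fit).
    These names and policies are illustrative assumptions only.
    """
    pending = list(jobs)
    running = []            # (end_time, cores) for jobs in progress
    free = total_cores
    now = 0
    starts = {}
    while pending:
        candidates = pending if allow_skip else pending[:1]
        fit = next((j for j in candidates if j[1] <= free), None)
        if fit is not None:
            pending.remove(fit)
            name, cores, runtime = fit
            free -= cores
            running.append((now + runtime, cores))
            starts[name] = now
        else:
            if not running:
                raise ValueError("a job needs more cores than the cluster has")
            # Nothing fits: advance the clock to the next job completion.
            running.sort()
            end_time, cores = running.pop(0)
            now = end_time
            free += cores
    return starts


jobs = [("big", 96, 10), ("medium", 64, 5), ("small", 32, 2)]
print(simulate(jobs, 128))
print(simulate(jobs, 128, allow_skip=True))
```

On a 128-core cluster the small job waits behind the medium one under strict FIFO, but starts immediately when skipping is allowed. Real schedulers add priorities, fair share and reservations on top of this, which is where the tuning time goes.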
Imagine a Beowulf cluster of BeOS, beotch.
My comprehension of this question is roughly 'please have a flamewar about the different flavours of Linux.'
Is 1563649 a prime number?
If you are OK to go with RHEL, you also can look at SLES: SUSE Linux Enterprise Server. They also have SUSE Studio where you can make your own appliances. If you are a large enterprise, they will even give you a SUSE Studio appliance to be hosted in-house in your company for your own needs. They also have SUSE Manager, which is the same as Spacewalk but has more features in it (and is backward compatible with Spacewalk).
RH support is phenomenal and that's why a lot of businesses use it. If you want it on the cheap, go with what you're comfortable with and what has your specific calculation packages built in (Debian if you like apt and open source packages, RPM if you use a lot of commercial packages). If you're looking for performance and specific hardware enhancements, go Gentoo or one of its brethren. Go with something that you can easily re-image if you're looking for lots of changes in software lineups or conflicts.
Custom electronics and digital signage for your business: www.evcircuits.com
Scientific Linux 6.0 is built on Redhat Enterprise Edition 6 which is highly tested and tuned for server throughput, power management, and stability compared to a stock vanilla kernel. The performance will be much better than a stock debian stable kernel or Ubuntu for example. Redhat has a bunch of hackers. Scientific Linux includes apps used by scientists, which may be your target market if you are a university too. If your old cluster has scripts and tools optimized for Redhat and RPMs then it makes sense to use a Redhat distribution base.
If the scientific apps with Scientific Linux are not being utilized then just buy a license for RedHat Enterprise Edition 6.1. The licensing fees are affordable if you have the budget for a large cluster and switches. With Redhat Enterprise edition you have support too if something goes down.
Remember, going free to save a few bucks is silly in an expensive project like this.
http://saveie6.com/
NPACI Rocks without a doubt. Red Hat centric, you need to put in some work to understand how it ticks, but once you do so and set up your cluster properly, it is very solid and reliable.
Are you sure that you know? You run a local X server on your Windows machine when you use X programs.
I've got 10+ years experience managing a large (2000 core, 1+ PB storage) compute cluster. If you're using one of those annoying commercial apps that assume Linux = Red Hat Linux (Matlab, Oracle, GPFS, etc.), then CentOS or Scientific Linux are the way to go.
If you don't have that constraint, consider Ubuntu or Debian. apt-get is my single favorite feature in the history of Unix-dom. Plus, there are often pre-built packages for several common cluster programs (Torque, Globus, Atlas, Lapack, FFTW, etc.) which can get you up and running a lot faster than if you had to build them yourselves.
we run our 320 core cluster on debian squeeze. infiniband support out of the box. the gridengine is a matter of apt-get install. comes with tons of scientific software.
Scientific Linux is totally awesome, but a project of this size, especially with the IT knowledge on hand, needs the support and first-rate product which RedHat provides.
Hi,
I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them. We run SLES on most of our systems, but CentOS and Redhat are fine, and I'm using Ubuntu successfully for an Openstack cloud. Rocks is popular though ties you to certain ways of doing things which may or may not be your cup of tea. Certainly it offers you a lot of common cluster software prepackaged which may be what you are looking for.
More important than the OS are the things that surround it. What does your network look like? How are you going to install nodes, and how are you going to manage software? Personally, I'm a fan of using dhcp3 and tftpboot along with kickstart to network boot the nodes and launch installs, then network boot with a pass-through to the local disk when they run. Once the initial install is done I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes). It becomes extremely easy to replace nodes by just registering their mac address and booting them into an install. This is just one way of doing it though. You could use cobbler to tie everything together, or use FAI. xCAT is popular on big systems, or you could use SystemImager, or replace puppet with chef or cfengine... Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
Or if you are only talking about 8-16 nodes, you could just manually install ubuntu on the nodes, pdsh apt-get update, and make people schedule their jobs on google calendar. ;) For the size of cluster you are talking about and what I assume is probably a very limited administration budget, that might be the best way to go. Even with something like Rocks you are going to need to know what's going on when things break and it can get really complicated really fast.
A lot of this depends on what you're doing with your cluster and what apps you're running. However, Scientific Linux is used by quite a few large clusters and all of the US ATLAS and CMS clusters run on it. As others have mentioned, you probably want to be more interested in how the cluster is managed and how nodes are set up and kept up to date. I'd recommend something like cobbler and puppet or some other change management system so that you can set up profiles and have them propagated to the various nodes automatically. This is preferable and easier than going through and making the same configuration changes on 5-10 machines.
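For anyone wondering what "registering their mac address" amounts to in practice, here is a minimal sketch that renders a host stanza for an ISC dhcpd config. The helper name dhcp_host_entry, the hostname, the addresses and the default boot file are assumptions for illustration; your own setup (or cobbler, if you use it) will differ.

```python
def dhcp_host_entry(hostname, mac, ip, boot_file="pxelinux.0", next_server=None):
    """Render one ISC dhcpd 'host' stanza for a node that should PXE boot.

    Hypothetical helper: the default boot file and the argument names
    are assumptions, not taken from any particular cluster toolkit.
    """
    # Normalise the MAC to lower-case, colon-separated form.
    mac = mac.strip().lower().replace("-", ":")
    parts = mac.split(":")
    hex_digits = "0123456789abcdef"
    if len(parts) != 6 or any(
        len(p) != 2 or any(ch not in hex_digits for ch in p) for p in parts
    ):
        raise ValueError(f"not a valid MAC address: {mac!r}")

    lines = [
        f"host {hostname} {{",
        f"  hardware ethernet {mac};",
        f"  fixed-address {ip};",
    ]
    if next_server:
        # Address of the TFTP server, if it is not the DHCP server itself.
        lines.append(f"  next-server {next_server};")
    lines.append(f'  filename "{boot_file}";')
    lines.append("}")
    return "\n".join(lines)


print(dhcp_host_entry("compute-0-1", "00-1A-2B-3C-4D-5E", "10.1.0.11",
                      next_server="10.1.0.1"))
```

Append the output to the dhcpd config, restart the daemon, and the new node picks up its address and boot file on the next network boot. Replacing a dead node is then just a matter of swapping the MAC in its stanza.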
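If you don't have pdsh handy, the "run the same command on every node" idea is easy to sketch yourself. This is only a rough stand-in for what pdsh does, assuming passwordless ssh keys are already set up; the function names and node names are made up for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(node, command):
    """Run command on node over ssh; return (exit_code, stdout).

    Assumes key-based login works; BatchMode makes ssh fail instead of
    prompting for a password.
    """
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def run_everywhere(nodes, command, runner=ssh_run, max_parallel=8):
    """Run command on all nodes in parallel; return {node: (exit_code, stdout)}.

    runner is injectable so the fan-out logic can be tried without ssh.
    """
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(lambda node: runner(node, command), nodes)
        return dict(zip(nodes, results))


if __name__ == "__main__":
    # Hypothetical node names; replace with your own.
    for node, (code, out) in run_everywhere(["node01", "node02"], "uptime").items():
        print(node, code, out.strip())
```

Checking the exit codes afterwards tells you which nodes failed, which is the first thing you want to know when an update goes sideways on one box out of sixteen.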
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
Wrong too. Use the distro you work better with.
Am I eval()? - http://www.monst3r.com.br
Why? I know Ubuntu is the standard recommendation for grandma these days, but what makes you think it's particularly appropriate for a computational cluster? For instance, do you really need GNOME on a high performance cluster?
Give me Classic Slashdot or give me death!
vi
Clearly the best choice. It is so heavily optimized that even its name takes up only 40% of the characters required by the second best contender, emacs.
Correction: the X11 server runs on your glass, i.e. your Windows system. All you need then are X11 clients on the Linux cluster nodes.
So yeah, you'll need the libs and other support files for X11, but not the server itself. You'll save a bit on disk space by not installing the server. If it's just a single X11 client you need to run, then you can figure out exactly what it needs and not have a bunch of other crap (fonts, *GL, window managers, libs you're not using...) installed. Plus, you won't have a daemon running that takes resources despite being idle, and is an attack vector since it manages user logins.
None because you don't run interactive processes on a cluster but instead submit a package to the job scheduler.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
I'm not really a Ubuntu fan, but with the cluster I manage (120 physical cores, 960GB RAM) we've ended up going with Ubuntu 10.04 running Sun Grid Engine for a couple of reasons.
- The LTS support: by the time the support period ends we should be replacing the hardware anyway.
- It provides a Grid Engine package by default (might not be the latest but it's good enough) for distributing the workloads.
- A lot of people are already familiar with Ubuntu.
- Most third party apps provide support for it.
- It's very stable.
- It's free.
Note: if your users are heavy R users, have a look at installing the Revolution R package from the third party repositories. It can provide some massive speed ups for matrix work and a number of other jobs.
With some chance of being modded down, I suggest Gentoo Linux. With Gentoo you can compile your kernel and everything else which might give you some arguable performance increase. Because Gentoo is a source-based distribution, it might help you with scientific development because all the library (boost, itpp, lapack, etc.) headers (and source) are immediately available. There is support for scientific libraries like atlas, ACML, etc., and you can easily change the default library for blas/lapack using a simple command line. You can also find up to date scientific software in the official Gentoo repository.
I don't know about you but I find it very useful to be able to inspect the code of core libraries and patch it for my needs, if needed.
Just my 2 cents.
I have to disagree. Ubuntu has a nasty habit of letting non-mainstream, non-desktop related bugs pass through several release cycles. We've just this last week spent 3 full days trying to figure out why my perfectly working NFS boot over PXE cluster broke when we did a safe upgrade. Turns out there's been a bug in portmap since lucid, which still exists in natty, which causes the NFS rootfs mount to fail. We had to recreate the filesystem from scratch and install lucid without updates, then hold portmap back manually (after much trial and error to find out which package was breaking). I've had other issues with Ubuntu server too, so to me this is not an isolated incident. I wouldn't recommend Ubuntu for any scientific work, and especially not something as 'unusual' (read -> not desktop oriented) as a cluster.
The problem with that suggestion is that the people maintaining the code don't have a clue what QA means. And before people whine - I used Gentoo as my primary distro for around three years. The emerge system is great - but the data inside is crap.
If you want to build your stuff from source and actually have a working system, look at the Debian-based distros. There's this nifty "apt-build" thing that lets you build software with whatever compile options you want (so you can still do -O3 -funroll-loops on everything if you really hate memory), just like Gentoo does. And there are packages for just about everything; partially because Debian's been around forever, and partially because "just about everyone" uses Ubuntu now. Gentoo does have a few "hacking" apps which are hard to find on other systems, but that's irrelevant to this discussion (and BackTrack is the way to go for that stuff anyway, IMHO). The primary difference is that you can build with source code that will actually work, and probably won't blow your system up when you just do a routine update. Whereas with Gentoo, some random kid who's too 'leet for testing might just promote to stable a new version of Xorg or Apache (both real examples from experience) which works fine on his system but breaks everyone else's in the world. And by "might" I mean "will". :)
I'm posting that mostly because quite a few Gentoo users think that only Gentoo (and maybe some of the BSDs) can easily rebuild a system from source, so they put up with atrocious quality assurance (which is admittedly extremely difficult given the Gentoo user base, and supposedly has gotten better) because they don't know that there are quite usable alternatives that are also more mainstream.
While I'd probably still recommend RHEL/CentOS/Rocks/whatever, to answer this specific question...
Ubuntu is an easy-to-use polished layer on top of Debian's unbeatable history of Doing Shit Right. Yes, there are some mistakes in their history like everyone else, so skip the "but in 1996 Debian did some obscure thing wrong" and "one time some boob screwed up the random number generator in ssh" - but overall, Debian is an incredible base for just about everything. Ubuntu takes Debian's inherent coolness and then makes releases more than once a decade. :) With the Ubuntu route, you get a number of different kernels which would benefit HPC applications - like a 32 bit kernel with the large memory support enabled, a kernel with RTC support, etc. You get the ability to install the ubuntu-minimal version (use the alternate installer) which is smaller than the minimal version offered by the other popular distros, and then install the packages you want. You get the QA benefits of using a distribution that has a *lot* of eyes upon it. And you get apt-build so you can recompile and fairly easily package things up.
Someone supporting HPC clusters shouldn't just pop the graphical install disk in and take what's installed; there is a fair amount of customization which should be done (ideally through Cfengine). So, while Ubuntu does have a nice, really easy install process for Grandma, there's an incredibly powerful and configurable architecture underneath that unassuming front end. If you have the deep knowledge required to understand why one distro really is better than others, it's actually worth taking the time to read through the documentation in the Ubuntu wiki, learning how all the different Debian things work together, and generally spending the time it takes to seriously use and inspect Ubuntu behind-the-scenes. It's nicely architected because the Debian people - weird as they may be - have spent decades building a very well designed platform that people like Canonical can extend.
It's really not as bad a choice as one might think if all they know about Ubuntu is that it's easy to install and use out of the box. :)
Actually, in the realm of biomedical supercomputing, "none of the above" has already been done. Check out the Anton supercomputer designed and built by D.E. Shaw Research. The entire supercomputer, right down to all of the processor cores themselves, was designed and built specifically for molecular dynamics research. The system has no operating system and, as such, no overhead. Every processor cycle goes straight into the calculations. It is capable of churning out simulations of 150,000+ atom protein complexes on the order of several microseconds long, using wallclock CPU time of a few days.
The distro does matter, often in ways not particular to being a cluster, but perhaps in ways making it easy to manage in general. For example, I'm moving away from Ubuntu (server) because it is too hard to selectively upgrade a single package or group of packages without imposing an upgrade on other packages. This is where "hand holding" has turned into "wrist crushing". So I'm moving to Slackware (which is getting a lot more capability through the SlackBuilds community).