Ask Slashdot: Best Linux Distro For Computational Cluster?
DrKnark writes "I am not an IT professional, but even so I am one of the more knowledgeable in such matters at my department. We are now planning to build a new cluster (smallish, ~128 cores). The old cluster (built before my time) used Redhat Fedora, and this is also used in the larger centralized clusters around here. As such, most people here have some experience using that. My question is, are there better choices? Why are they better? What would be recommended if we need it to be fairly user friendly? It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."
Yellow Dog Linux ftw!
Redhat Enterprise Linux.
If you need something cheaper (no licenses), you can always go CentOS. Or you can mix both, having some RHEL and some CentOS machines.
morcego
http://pareto.uab.es/mcreel/PelicanHPC/
Just imagine a Beowulf cluster of bullcrap!
In order to form an immaculate member of a flock of sheep one must, above all, be a sheep.
Built for that very purpose.
"To those who are overly cautious, everything is impossible. "
Your in-depth analysis is invaluable to the discussion.
If it was my choice, I'd go with Debian. I used RedHat for a few years in the '90s, then went to Debian after getting tired of dependency hell, and have not gone back since. Have worked in CentOS recently - not as bad as it used to be, but I still prefer Debian. Debian and its derivatives (like Ubuntu) were reported as being the "most important" Linux distro a couple months ago.
It isn't about the OS, it is about the tools to manage it. Rocks is based on CentOS, and helps you run the cluster.
http://www.rocksclusters.org/
NPACI Rocks is probably your best bet. http://rocksclusters.org/
How about Scientific Linux?
now we need to go OSS in diesel cars
Ubuntu 10.04 LTS, accept no substitute. Maybe use a Ubuntu 10.10.2 desktop to manage them, it's easy to use. (11.04 is still unstable, IMO.)
It actually all depends on what packages you plan on running. Then cross reference that against what your options are, I think you'll run out of options quickly, TBH.
And you just need putty on the windows side. But, if you have to, all of them can run x-windows generally these days. I just find Ubuntu to be the easiest, plus their packaging system is the best, being Debian under the covers.
Please note, these are opinions, and I'm entitled to my informed opinion.
Gonzo Granzeau
"Nothing the god of biomechanics wouldn't let you into heaven for.." -Roy Batty
I run a cluster built using the RocksClusters.org distribution, which is based on CentOS. The previous admin had us running OSCAR, but I found its management style too clunky. That was pre version 6, though, and since moving to Rocks we've never looked back.
I haven't used it, but Rocks also has a visualization roll that should probably include the X Window stuff. Also check out VNC or NoMachine, as X forwarding is a chatty protocol and gets a lot of lag once you are off the local network (i.e. from home or abroad).
"From the 'what-will-you-be-computationalizing?' dept"
That's a good question.
Scientific Linux. http://www.scientificlinux.org/ Has the benefit of RHEL: a stable OS environment without some of the headaches of CentOS. If you have money (you probably don't) RHEL is good.
--I hate people when they're not polite -"Psycho Killer", Talking Heads
Fedora has components to help manage large deployments. https://fedorahosted.org/spacewalk/ It also has FreeIPA to help with a secure and scalable means of managing authentication/authorization/resources within the cluster. http://freeipa.org/page/Main_Page
I think an important question here is why Red Hat was chosen for the other clusters. Your requirements aren't very specific; there are hundreds of distros that could meet your criteria.
In the X Window model, the computer with the display is the server. Client programs connect to the server in order to display on its screen.
MAC! wait, what?
Now we have such a clear winner on the choice of distro, perhaps we can discuss which would be the best editor on the cluster?
CentOS is modified to be the base OS for the ROCKS Cluster.
http://www.rocksclusters.org/wordpress/
1) You don't need X server. You only need ssh. And if you forward windows via ssh to the windows (or another linux) workstation, you don't need X server running on the server side.
For stability use Slackware (a pain to set up as the package management is weak, but it stays incorruptible for a lifetime). If you seek good package management, no-bullshit straightforward configuration and a custom layout (e.g. X server, lightweight or no window manager, network and monitoring daemons), use Arch Linux. Just avoid consumer-oriented distros; they usually come with fancy and gigantic desktop environments and confusing graphical configuration wizards.
I've worked with various clusters over the past year.
The distro doesn't really matter; mostly it's what you feel most comfortable with. I'd slightly favor RedHat Enterprise or a respin of it, since it's easiest in terms of drivers for commercial cluster hardware and commercial software support, but Debian would be just as fine. I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable). You don't want to have to update every week, since this usually requires quite some work (making new images and rebooting all nodes).
What I found out matters a lot more is the scheduler you will use: Sun Grid Engine, PBS, Torque or SLURM, to name a few. Every scheduler comes with its strong and weak points, so be sure to look at what matters most to you.
If you are unfamiliar with all of these things, pick a complete bundle like Rocks (it's based on RedHat Enterprise Linux), which makes setting up a cluster quite easy and still allows you to choose which components you want. That'll greatly improve your chance of success. But be warned: building and especially configuring a cluster is still a steep learning curve. Most of the time is spent tuning queuing parameters to maximize the performance of your cluster.
Imagine a Beowulf cluster of BeOS, beotch.
Most of the seismic clusters I work with use CentOS or Fedora.
My comprehension of this question is roughly 'please have a flamewar about the different flavours of Linux.'
Is 1563649 a prime number?
"It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."
So what? On one hand, in order to run Linux graphic apps on Windows you need an X-Window server... on the Windows machine, not the Linux one. On the other hand, how is it that you *must* use GUI-based apps? Are there *really* no operational alternatives? (I've been administrating Linux and Unix systems for almost two decades and I never needed, as in "must", GUI-based apps for that.)
"The one you're (or your team/support staff is) most familiar with."
If none qualify under that criterion -- you probably want to go for popularity... because that leads to ease of finding support from people familiar with it. In that case, Ubuntu or possibly Fedora or CentOS.
If you are OK to go with RHEL, you can also look at SLES: SUSE Linux Enterprise Server. They also have SUSE Studio, where you can make your own appliances. If you are a large enterprise, they will even give you a SUSE Studio appliance to be hosted in-house in your company for your own needs. They also have SUSE Manager: same as Spacewalk, but with more features in it (and backward compatible with Spacewalk).
It's like asking "which screwdriver bit is the best to use for driving screws"... You probably have an idea what software you intend to run on the cluster. If any of those are commercial packages, they probably have their support and installation processes setup for particular distributions. It'd be nice if they'd support anything, but in practice, they don't. Consult the vendors of any commercial product you plan to use with the cluster and bias yourself towards whatever they say they will support. It's easier that way.
Second, if that's not an overriding concern, then pick something that's reasonably popular or at least sticks to mainstream layouts and package availability. The reason is mostly that whatever has good mind-share is easier to find documentation and support for, not to mention contractors with experience.
You are lucky in that, at its core, Linux provides a stable platform with a lot of features that are largely consistent between platforms. There aren't going to be wild inconsistencies in performance or hardware support (there are some, but not many) and software is rarely distribution-dependent (though some may require some configuration/modification if the developer makes too many assumptions about the environment).
RH support is phenomenal and that's why a lot of businesses use it. If you want it on the cheap, go with what you're comfortable with and that has your specific calculation packages built in (Debian if you like apt and open source packages, RPM if you use a lot of commercial packages). If you're looking for performance and specific hardware enhancements, go Gentoo or one of its brethren. Go with something that you can easily re-image if you're looking for lots of changes in software lineups or conflicts.
Custom electronics and digital signage for your business: www.evcircuits.com
Scientific Linux 6.0 is built on Redhat Enterprise Edition 6, which is highly tested and tuned for server throughput, power management, and stability compared to a stock vanilla kernel. The performance will be much better than a stock Debian stable kernel or Ubuntu, for example. Redhat has a bunch of hackers. Scientific Linux includes apps used by scientists, which may be your target market if you are a university too. If your old cluster has scripts and tools optimized for Redhat and RPMs, then it makes sense to use a Redhat distribution base.
If the scientific apps with Scientific Linux are not being utilized, then just buy a license for RedHat Enterprise Edition 6.1. The licensing fees are affordable if you have the budget for a large cluster and switches. With Redhat Enterprise Edition you have support too if something goes down.
Remember, going free to save a few bucks is silly in an expensive project like this.
http://saveie6.com/
NPACI Rocks without a doubt. It is Red Hat centric and you need to put in some work to understand how it ticks, but once you do so and set up your cluster properly, it is very solid and reliable.
If a theme hasn't come up yet, go with an RPM or RedHat based distro. I would stay away from Fedora because of how fast development is on it, and go with either Scientific or CentOS. The skill sets that people around you have for Fedora will translate over just fine since they are all based on RPM. If you want to pay money and get support, then go with RedHat; otherwise CentOS is just RedHat EL without the RedHat name.
I work in the industry, and know from personal experiences that Linux Distro selection is often almost religious.
But what you should be doing is check with your ISVs what they support, and do the same with your HW vendor. Try to make a matrix and you may find that the only option is one of the Enterprise distributions.
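The matrix suggested above can be sketched in a few lines of Python. Everything in it is a placeholder: the vendor names and the distros each one supports are made up for illustration and are not real support statements, so substitute whatever your ISVs and hardware vendor actually tell you.

```python
# Hypothetical support matrix: vendor/application -> distros it says it supports.
# None of these entries are real support statements; fill in your own.
SUPPORT_MATRIX = {
    "solver_vendor_a": {"RHEL", "SLES"},
    "solver_vendor_b": {"RHEL", "SLES", "Ubuntu LTS"},
    "hardware_vendor": {"RHEL", "SLES", "Debian"},
}


def common_distros(matrix):
    """Return the distros that every row of the matrix supports, sorted."""
    rows = list(matrix.values())
    if not rows:
        return []
    return sorted(set.intersection(*rows))


if __name__ == "__main__":
    print(common_distros(SUPPORT_MATRIX))
```

If the result comes back empty, that is the signal to go back to the vendors before buying anything.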
If you're in the rare situation that you have source code for everything, I suggest you look at Scientific Linux or CentOS.
Microsoft Windows Server has an HPC (High-Performance-Computing) Edition. Several of the Universities and research companies use this product, and it now can use Windows Azure as on-demand worker nodes. Essentially you could set up only one "head" node and then just pay for computation when needed.
http://blogs.msdn.com/b/ignitionshowcase/archive/2011/04/20/windows-high-performance-computing-bursting-into-windows-azure.aspx?wa=wsignin1.0
If you're interested specifically in Linux, I recommend a Beowulf Cluster, which is what we used when I worked at the Space Center in Florida several years ago.
http://beowulf.org/
I was just pricing 2U database servers that had 32 cores each. A 128 core cluster is now just four small off-the-shelf servers in a rack for less than a hundred grand.
Period.
Are you sure that you know? You run local x window server on your windows machine when you use x window programs.
Seriously, the distribution is not the issue. You need a good concept (LDAP/SSO, automated installation, patch management, network/clustered filesystems, ...) to be able to manage that beast in the long term. Have someone capable do it for you initially.
Back to topic, I'm working with all the enterprise crap money can buy. Don't let anyone bullshit you into buying enterprise distributions because of certifications. You don't need that. Debian is a far better foundation both technologically and financially.
Redhat or fedora
good forum support
http://www.returninfinity.com/baremetal.html
I've got 10+ years experience managing a large (2000 core, 1+ PB storage) compute cluster. If you're using one of those annoying commercial apps that assume Linux = Red Hat Linux (Matlab, Oracle, GPFS,etc.), then CentOS or Scientific Linux are the way to go.
If you don't have that constraint, consider Ubuntu or Debian. apt-get is my single favorite feature in the history of Unix-dom. Plus, there are often pre-built packages for several common cluster programs (Torque, Globus, Atlas, Lapack, FFTW, etc.) which can get you up and running a lot faster than if you had to build them yourselves.
Debian -- easy to manage, easy to create new packages for, least amount of nonstandard, distribution-specific stuff (except configuration files management, but that is a result of having to keep individual packages' configuration tied to packages).
Contrary to the popular belief, there indeed is no God.
1. What types of computation is the cluster going to be used for? MD, CFD, ???
2. What software will be used on the nodes? CHARMM, GAMESS, LAMMPS, NWChem, etc.
3. Do you have a preference for a Linux distro? If not, it really doesn't matter that much if you are rolling your own cluster and software stack. It will just determine what things are used for package management and what services in the distro you might want to turn off in order to get the most memory for apps and not the base OS.
4. You should be using SSH as the main interface for the actual compute nodes and maybe (big maybe) have an X server on the login/compile head nodes, but NOT the compute nodes. You want the compute nodes to be as bare as possible to conserve as much RAM and scratch disk space for apps as possible.
Having said all that, CentOS, Fedora, SuSE and RHEL are probably the most popular on distributed memory clusters today. You will also want to make sure that whatever compilers you are using are compatible with the Linux distro you want to use, unless you are relying completely on gcc or binary applications. I have built many clusters from scratch and can be a point of contact should you have additional questions.
We run our 320 core cluster on Debian squeeze. Infiniband support out of the box. The gridengine is a matter of apt-get install. Comes with tons of scientific software.
Scientific Linux is totally awesome, but a project of this size, especially with the IT knowledge on hand, needs the support and first-rate product which RedHat provides.
If your cluster is going to be closed circuit (no internet access) I would recommend RHEL5, as finding and installing RPMs is generally easier when you're not able to use the default distro package utility. If your cluster will have access to the internet (so you'll be able to use the distro package app) I'd recommend Ubuntu 10.04, as the distro repository is up to date and constantly growing.
For building and maintaining a small cluster, especially to anyone whose main job is not going to be maintaining the cluster, you should take a look at Rocks. It actually builds on top of a regular Linux distro, although only certain distros work. Redhat Enterprise, CentOS, and Scientific Linux are mentioned in the documentation as being compatible.
What Rocks does is add a bunch of cluster-specific tools to the underlying distro. It helps take care of networking and setting up the compute nodes for easy maintenance and configuration. You basically configure your front end, and then it is extremely simple to manage the compute nodes (including installation; installation by default is done over the network between the compute node and the head node). I have also found the Rocks mailing list to be extremely helpful even to folks who are new to building clusters.
I work for an HPC vendor, thus the anonymous posting.
If you’re going to be running commercial codes/solvers/etc., you need to stick with one of the RPM-based distros. It’s all they test against. If you’re going to be running a fancy interconnect (Infiniband, etc.), that may further restrict your choices.
If you’re running homegrown code, go with your gut.
Hi,
I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them. We run SLES on most of our systems, but CentOS and Redhat are fine, and I'm using Ubuntu successfully for an Openstack cloud. Rocks is popular though ties you to certain ways of doing things which may or may not be your cup of tea. Certainly it offers you a lot of common cluster software prepackaged which may be what you are looking for.
More important than the OS are the things that surround it. What does your network look like? How are you going to install nodes, and how are you going to manage software? Personally, I'm a fan of using dhcp3 and tftpboot along with kickstart to network boot the nodes and launch installs, then network boot with a pass-through to the local disk when they run. Once the initial install is done I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes). It becomes extremely easy to replace nodes by just registering their MAC address and booting them into an install. This is just one way of doing it though. You could use Cobbler to tie everything together, or use FAI. xCAT is popular on big systems, or you could use SystemImager, or replace Puppet with Chef or Cfengine... Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
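Registering a node by MAC address, as described above, mostly comes down to generating one host stanza per node for the DHCP server. Here is a minimal sketch, assuming ISC dhcpd's host declaration syntax. The node names, MAC addresses, IP addresses and the pxelinux.0 boot file name are invented for illustration, and a real setup would still need the subnet declaration, next-server, and the TFTP/kickstart side.

```python
# Sketch: turn a node inventory into ISC dhcpd "host" stanzas for PXE booting.
# All names, MACs and IPs below are made-up examples.
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def host_stanza(name, mac, ip, bootfile="pxelinux.0"):
    """Render one dhcpd host declaration, validating the MAC address format."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"bad MAC address for {name}: {mac!r}")
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f'    filename "{bootfile}";\n'
        f"}}\n"
    )


def render_hosts(nodes):
    """Render stanzas for an iterable of (name, mac, ip) tuples."""
    return "\n".join(host_stanza(name, mac, ip) for name, mac, ip in nodes)


if __name__ == "__main__":
    nodes = [
        ("compute01", "00:11:22:33:44:01", "10.0.0.11"),
        ("compute02", "00:11:22:33:44:02", "10.0.0.12"),
    ]
    print(render_hosts(nodes))
```

Replacing a dead node then means editing one line of the inventory and regenerating the file, which is roughly what tools like Cobbler automate for you.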
Or if you are only talking about 8-16 nodes, you could just manually install Ubuntu on the nodes, pdsh apt-get update, and make people schedule their jobs on Google Calendar. ;) For the size of cluster you are talking about and what I assume is probably a very limited administration budget, that might be the best way to go. Even with something like Rocks you are going to need to know what's going on when things break, and it can get really complicated really fast.
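The "pdsh apt-get update" approach above amounts to running the same command on every node over ssh. Here is a rough sketch of that idea in Python, for when pdsh itself isn't installed. The node names are hypothetical, it assumes key-based root ssh logins to each node already work, and it is no replacement for a real configuration management tool.

```python
# Sketch: run one command on many nodes over ssh, a poor man's pdsh.
# Node names are hypothetical; assumes key-based ssh logins already work.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(node, command):
    """Build the argv for running `command` on `node` without prompting."""
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_on_node(node, command, timeout=300):
    """Run the command on one node; return (node, exit code, combined output)."""
    try:
        proc = subprocess.run(
            ssh_argv(node, command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
        return node, proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return node, None, "timed out"


def run_everywhere(nodes, command, workers=16):
    """Run the command on all nodes in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: run_on_node(n, command), nodes))


if __name__ == "__main__":
    # Assumed naming scheme: root@node01 .. root@node08.
    nodes = [f"root@node{i:02d}" for i in range(1, 9)]
    for node, code, output in run_everywhere(nodes, "apt-get update"):
        print(f"{node}: exit={code}")
```

BatchMode=yes makes ssh fail instead of hanging on a password prompt, so one misconfigured node doesn't stall the whole run.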
Doesn't gain you a thing. Drivers are loaded on demand as needed for local hardware. Unused drivers are not loaded at all, and do not impact performance or memory usage.
All custom kernels can do at most is to reduce the size of the initrd file used during boot.
The initrd is a compressed cpio file containing the contents of a memory resident root filesystem used during hardware initialization. Once hardware is identified, then required drivers (disk/video/keyboard/mouse) are loaded and the real root filesystem is used (using the driver from the initrd).
Once the real root is mounted additional drivers (if any) may be loaded as directed by configuration.
The only gain in a custom kernel is reducing the time to compile a kernel...
I built one with Debian Lenny, plus I developed a scheduling system in Perl using kernel containers + cgroups. That way the research team thinks they own a "real" Linux box and I can share resources in a better way. I also used perl-cgi to make a container design, so no one touches my real OS. The research leader just uses my container design to create and deploy new containers; the new container is then booted on a testing machine so the person can install whatever they need, and it is cloned later to deploy on the cluster (400 cores + 1.8TB RAM).
First, you need to ask this question in a place where folks are familiar with HPC, not a general forum where anyone can tote the flag for the Linux Flavor of the Week(tm) - since the term 'cluster' means failover to most. Second, you need to consider which distributions are supported by your solvers and schedulers. Third, why do you need to run X at all? You should probably look into an X server on your Windows desktops instead and save the computational cycles for, well, computation.
but that is ok! rather than asking quickie questions and expecting quickie answers, you can start by learning the difference between a system administrator and IT professionals.
"user friendliness" is proportional to sysadmin's abilities or proportional to $$$ for commercial tech support
1. A (stale) link to get you going on hpc clusters: http://www.hpccommunity.org/section/kusu-45/
2. http://www.platform.com/ - Dell's/Redhat official hpc cluster (at least a couple of years ago) which was based on kusu (see previous link). In other words, RH was(is?) using a third party for their RH HPC - correction needed if things have changed. - a great yo-yo system (DellRedHatPlatform) in case you have issues.
3. http://www.caoslinux.org/
A lot of this depends on what you're doing with your cluster and what apps you're running. However, Scientific Linux is used by quite a few large clusters, and all of the US ATLAS and CMS clusters run on it. As others have mentioned, you probably want to be more interested in how the cluster is managed and how nodes are set up and kept up to date. I'd recommend something like Cobbler and Puppet or some other change management system so that you can set up profiles and have them propagated to the various nodes automatically. This is preferable to, and easier than, going through and making the same configuration changes on 5-10 machines.
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
Wrong too. Use the distro you work better with.
Am I eval()? - http://www.monst3r.com.br
I'd have to agree with the Debian/Ubuntu route if you want user friendliness. I've always found Debianesque systems much more manageable than other distros. If I have to provide most of the IT myself, I prefer Debian/Ubuntu. There are some science Debian distros as well (and repositories).
Scientific Linux would likely be faster overall for computationally heavy tasks but it really depends on what you are planning on doing. Debian wouldn't be slow, just not quite as fast as Scientific Linux; but again, that might not matter very much in the big picture.
I run a 730-core cluster on debian/gridengine. We're a debian shop, and keeping the cluster platform the same as our desktops is an advantage. Configuring gridengine takes some effort, but so far we're pleased with the result.
I work in a national supercomputing centre and I can only recommend that you use Scientific Linux, with Quattor to manage configuration and installation. There's nothing as powerful as Quattor and it's used by many computing centres (IN2P3, CERN, etc).
As scheduler and batch system I recommend you use Slurm or the combination of Torque/Moab. Run away from Maui!!!! And LSF is so damn expensive!
We run Cray's SLES and Scientific Linux with both Slurm and Torque/Maui and we're very happy with our clusters.
In addition, I recommend Lustre as a shared FS. Relatively easy to use and install.
Good luck with your setup!!!
M
Simple question. The OP asked WHY you feel that is a solution for large-cluster HPC.
It looks like so far your only reason is "i liek it!" - I personally have no opinion or experience with HPC clusters, but so far nearly all of those who do are recommending something that is either RHEL or RHEL-based (Rocks or Scientific Linux), if only because it allows you to leverage commonality with the big cluster operators with installations in the Top500.
Disclaimer: I'm an Ubuntu user, and I greatly enjoy it, but I have not seen many examples of actual scientific clusters running it.
retrorocket.o not found, launch anyway?
Disclaimer: I worked on the product for a number of years. I now work at a different Linux company...
Scyld is built on top of Red Hat EL, can also run with CentOS, but uses a custom Kernel. It has a lightweight provisioning mechanism that makes maintenance of compute nodes very easy, and the single system image approach makes job management significantly easier than a traditional Beowulf cluster. I don't know if they test it out with Scientific Linux these days.
Open Source Identity Management: FreeIPA.org
CentOS 5 is currently the favorite among the HPC sysadmins I work with, and it's rather similar to RHEL. That said, Ubuntu has a much nicer package manager (read: functional) and so could be a lot easier. It really comes down to what you are most comfortable with.
More important is the system software you run on top of the cluster.
We use and like for a very similarly sized cluster:
Red Hat Kickstart for install
Bcfg for system configuration
Nagios + Ganglia for monitoring
Torque + Maui for batch scheduling
We use RHEL/Scientific Linux & Perceus (http://www.perceus.org/). It is solid and easy to add new nodes.
http://bccd.net/faq - Debian based Bootable Cluster CD. 'nuf said.
--You're BOTH right. It's a floor wax AND a dessert topping!
CentOS + cluster management + common libraries needed for HPC. We had a couple of clusters at my last job and they worked well. You could set up the nodes to net boot, change the configuration in one place, reboot the nodes, and voila, everything is upgraded.
It's not said what the cluster is for. If it's going to be a private cloud, Ubuntu JeOS (just enough OS) VMs are far quicker to configure than RHEL/CentOS 5.x VMs, if your taste runs to setting up VMs from scratch rather than just cloning them. Don't know what RH 6 offers there (and Scientific Linux, which unlike CentOS offers a RH 6 variant). You also haven't specified your file system's layers. Ubuntu support by most file system projects is equal to that of RHEL, in terms of easily installed packages being available. The exception is with some parts of RH's own preferred cluster stack, which I haven't used. DRBD, for instance, is perhaps better supported on Ubuntu than RH. GlusterFS is better tested on RH, but runs as well, and is packaged as well, for Ubuntu (and is still a bit immature in either case, but worth considering down the line).
The main advantage of Ubuntu is you get the exact same product as support is sold for, without being required to buy the support. RH forces you to go to CentOS or Scientific Linux to come close to that, and there are differences in those - mostly in the package managers. And RH, if you do license it, pays out part of your money to patent trolls, whom RH essentially supports so that they may more productively attack its competition - other open source projects.
"with their freedom lost all virtue lose" - Milton
Correction: the X11 server runs on your glass; eg: your Windows system. All you need then are X11 clients on the Linux cluster nodes.
So yeah, you'll need the libs and other support files for X11, but not the server itself. You'll save a bit on disk space by not installing the server. If it's just a single X11 client you need to run, then you can figure out exactly what it needs and not have a bunch of other crap (fonts, *GL, window managers, libs you're not using...) installed. Plus, you won't have a daemon running that takes resources despite being idle, and is an attack vector since it manages user logins.
I built a small (32 node) Beowulf cluster for an informatics group at the University of Bonn. We started off with SuSE, discovered that it was hard to get some drivers compiled, then went to Debian, discovered that some of the boot up scripts were a bit troublesome for keeping up high availability, then went to Gentoo (http://www.gentoo.org/) and were quite pleased with how *everything*, including rebuilding a node up from the boot loader, could be scripted. Of course every situation has its unique hazards, but if you want a tight system with everything under your control, it's hard to beat Gentoo (however, good ol' Debian came very close).
I believe you can successfully build a computational cluster from any linux distribution. I am sure you could go wild and use slackware if you want.
But I guess the question is who will administrate the cluster? From what you say, I feel like you will, and you say yourself you don't know much about that. So I would recommend keeping the distribution installed by the vendor, because they will probably give you software support. If you change it, they probably won't.
The important things have already been said. But in summary, the question is what you are going to do with the cluster. What applications are going to run on it? Are you going to develop applications to run on it, or are you using premade applications? If you are developing on it, you probably want more up to date software. If you are using premade applications, you probably want the best compatibility...
I'm not really a Ubuntu fan, but with the cluster I manage (120 physical cores, 960GB RAM) we've ended up going with Ubuntu 10.04 running Sun Grid Engine for a couple of reasons.
- The LTS support: by the time the support period ends we should be replacing the hardware anyway.
- It provides a Grid Engine package by default (might not be the latest, but it's good enough) for distributing the workloads.
- A lot of people are already familiar with Ubuntu.
- Most third party apps provide support for it.
- It's very stable.
- It's free.
Note: if your users are heavy R users, have a look at installing the Revolution R package from the third party repositories. It can provide some massive speed-ups for matrix work and a number of other jobs.
With some chance of being modded down, I suggest Gentoo Linux. With Gentoo you can compile your kernel and everything else, which might give you some arguable performance increase. Because Gentoo is a source-based distribution, it might help you with scientific development because all the library (boost, itpp, lapack, etc) headers (and source) are immediately available. There is support for scientific libraries like atlas, ACML, etc., and you can easily change the default library for blas/lapack using a simple command line. You can also find up to date scientific software in the official Gentoo repository.
I don't know about you but I find very useful being able to inspect the code of core libraries and patch it for my needs, if needed.
Just my 2 cents.
You are building a Beowulf cluster? Pretty neat. Can you imagine a Beowulf cluster of those?
Just to clear out a misconception that arises from time to time: you do not need an X server on a server exactly in the same way you don't need a web browser on your HTTP server. To understand that, you can think of an X server as a "browser" for the X protocol. On the server you just need some support libraries (which help applications in talking the X protocol).
I've read through the suggestions, and many good ones have been posted.
But, we live in the cloud age. I'll suggest taking a look around to see if your compute requirements could be met by using an available resource in the cloud. The opportunities there are exploding, and info is hard to gather as it's fast moving, but *if* your compute can be made to fit in and around something like Cloud Foundry, Azure, or perhaps Hadoop or other number crunching cloud ops, the advantage is they only charge for what you use. So you can ramp your compute power up or down (at least this is the theory) to a greater degree and with more flex than you can by building all your own nodes.
(And I know, the question poser asked about Linux, but with compute and cloud sometimes it's good to move away from platforms and focus on the actual compute needed. You care about the calc, and not about which flavour it crunches on.)
Just my tuppence in this complex question..
We're all equal
I'll leave the clustering distro advice to others, but if I understand your needs regarding X-windows, what you need is an X server running on your Windows (or other) client machine so that the program running on the cluster can display on your desktop/laptop. The X programs may need appropriate libraries, but you don't need an X server running on the cluster.
See Xming for a good, free, open source X server for windows. There are other options available, but that's what I use, and find it to be stable and reliable. (For a Windows program... )
Then use putty to SSH to your cluster, with X11 forwarding to your locally running X server.
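For anyone on a client with a command-line OpenSSH program rather than PuTTY, the same idea can be sketched in Python. This is only an illustration under assumptions: the user name, host name and the xclock test program are made-up placeholders, and an X server (Xming, for example) is assumed to be running locally already. The -X flag is OpenSSH's X11 forwarding option and -Y is its trusted variant.

```python
import shlex


def x11_ssh_command(user, host, remote_program, trusted=False):
    """Build an OpenSSH command line that runs remote_program with X11 forwarding.

    The program executes on the cluster; its windows are drawn by the
    X server on the machine where ssh was started.
    """
    flag = "-Y" if trusted else "-X"  # -X: X11 forwarding, -Y: trusted X11 forwarding
    return ["ssh", flag, f"{user}@{host}", remote_program]


# Hypothetical account and host, used only for illustration.
cmd = x11_ssh_command("alice", "cluster.example.org", "xclock")
print(shlex.join(cmd))
```

The sketch only prints the command; passing the list to subprocess.run would actually open the session.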
WALSTIB!
you will be much more comfortable with debian squeeze, RPM sux
It's a FAQ there, but you really should be asking this on the beowulf list, after skimming the list archives for any of the eight and a half million answers (in gory detail) that have been posted there in response over the years. Slashdot has plenty of nerds and I'm sure a lot of cluster geeks (who are likely on the beowulf list) but the beowulf list is sort of distilled cluster geekery/wisdom.
http://www.beowulf.org/mailman/listinfo/beowulf
rgb (Google "rgb duke beowulf" if you like -- I used to help answer this question once a month a few years ago on list, although I'm too busy and less active now.)
Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
The smallest glibc distro I know. Doesn't come pre-configured with cluster tools, doesn't even have prebuilt packages for them. But, it'll easily compile most of the software you require (C++ is one exception, I had to rebuild the compiler), and, most importantly, has a build system you can use to put together your own .iso which can be installed in under 5 minutes, probably even less.
Has recent 2.6 kernel and latest glibc, which means it'll also run executables built in other equivalent distros. I've run the Sun (oh, Oracle....) JVM with it, no modifications required.
I'll preface this by saying that I'm an HPC admin for a major national lab, and I've also contributed to and been part of numerous HPC-related software development projects. I've even created and managed a distribution a time or two.
There are two important questions that should determine what you run. The first is: What software applications/programs are you expecting the cluster to run? While some software is written to be portable to any particular platform or distribution, scientists tend to want to focus more on science than on code portability, so not all code works on all distributions or OS flavors. Small clusters like yours often focus on a few particular pieces of scientific code. If that's the case for you, figure out what the scientists who wrote it use, and lean strongly toward using that.
The second question is, who will run it? Many small, one-off clusters are run by grad students and postdocs who work for their respective PI(s) for some number of years and then leave. In this scenario, it's important to make sure things are as well-documented and industry-standard as possible to ease the transition from one set of student admins to the next. (And yes, PI-owned clusters have a surprisingly long lifespan. Usually no less than 5 years, often longer.) To that end, I strongly recommend RedHat or Scientific Linux.
We, and most large-scale computational systems groups, use one of two things: RHEL and derivatives, or vendor-provided (e.g., AIX, Cray). We run CentOS but are moving away from it ASAP. The Tri-Labs (Livermore, Sandia, and Los Alamos) use TOSS, which is based on CHAOS (https://computing.llnl.gov/linux/projects.html), which is based on RHEL. Many other sites use Scientific or CentOS. Older versions of Scientific deviated more from upstream, which caused sites like us to use CentOS instead. That's no longer true with SL6, and since CentOS 6 doesn't even exist yet (and RHEL6.1 is already out!), there are strong incentives to move to SL6.
Let me address some other points while I'm at it:
Why RHEL? If you can run RHEL itself, do so. RHEL isn't built with the same compilers it ships with; the binaries are highly optimized. Back when we were working on Caos Linux, we did some benchmarks that showed RHEL (and Caos, FWIW) to be as much as twice as fast as CentOS running the exact same code. So if performance is a consideration, and you can afford a few licenses, it's definitely worth considering. The support can be handy as well, particularly if this is a student-run cluster.
Why Scientific Linux? If you need a free alternative to RHEL or are running at a scale that makes RHEL licensing prohibitive, SL is the way to go, without a doubt. It's maintained professionally by a team at Fermilab whose fulltime job is to do exactly that. They know their stuff, and they're paid for it by the DOE. Other rebuild projects suffer from staffing problems, personality problems, and lack-of-time problems that SL simply doesn't have.
Why not Fedora? Stability and reliability are critically important. Fedora is essentially a continuous beta of RHEL. It lacks both the life-cycle and life-span of a long-term, production-quality product.
Why not Gentoo? Pretty much the same answer. The target audience for Gentoo is not the enterprise/production server customer. Source-based distributions do not provide the consistency or reproducibility required for a scale-out computational platform. You'll also have a hard time getting scientific code targeted at Gentoo or other 2nd-tier distributions.
Why not Ubuntu or Debian? Ubuntu is a desktop platform, not a server platform. Again, it boils down to their target market. There's really no value-add in the server space with Ubuntu, so why not just run Debian? If Debian's what your admins know best, it's worth considering, but keep in mind that very, very few computational resources run Debian, so you may have to do a lot more fending for yourself if you go that route.
Why not SLES? Mostly a pers
Michael Jennings | HPC Systems Engineer, Lawrence Berkeley National Lab | Author, Eterm (eterm.org)
I will shout out some random distro and not say anything else to back it up!
I would strongly recommend that you base the choice of distribution on the scope of that server (apps server, SOX server, DR server, etc.) and, from that standpoint, start screening the different distributions across the board. Do not base your decision on which distribution is "user friendly".
You should keep in mind distribution support, community, knowledge database, maintenance, etc.
Red Hat is quite a good choice.
Just my two cents
having designed and built supercomputers that have been in the top500: go for a rhel based derivative like "rocks clusters" or scientific linux.
128 cores is a bit small and probably doesn't justify setting up a "proper" cluster with submit, compute, data, etc nodes. But i would recommend rocks cluster highly as you won't have to mess around with all the sysadmin stuff too much.
i recommend a rhel derivative primarily because most (paid) packages come in this format and are fully supported. you don't want to waste time diagnosing weird nuances - it is a waste of time, effort and performance.
Recommending ubuntu - seriously, I am a major ubuntu fan but for a cluster? no way. the problems in a cluster are completely different from a server install or a desktop install - it will cause so much grief in the end you'll regret the choice.
The basic principles of most cluster distributions are managing jobs, managing resources, and that nodes are disposable (as in, if one fails you can rebuild it entirely with a single command). cluster distributions allow this by centrally managing every aspect of it. to rebuild a node you simply power it on; it generally pxeboots and bootstraps a partition on the hd. if there is no partition, it downloads an image for that specific node, installs it and configures it: no hassle, no fuss, no thought. the only thing you need do is specify its mac address in a file.
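The "nodes are disposable" idea above can be sketched as a tiny decision function. To be clear, this is a hypothetical illustration and not how Rocks or any other cluster distro actually implements provisioning: the node names, image name and MAC addresses are invented, and the dictionary stands in for the file of MAC addresses mentioned above.

```python
# Hypothetical registry standing in for the file of known MAC addresses.
NODES = {
    "00:11:22:33:44:01": {"name": "compute-0-0", "image": "compute"},
    "00:11:22:33:44:02": {"name": "compute-0-1", "image": "compute"},
}


def normalize_mac(mac):
    """Lower-case the address and use ':' separators so lookups are consistent."""
    return mac.strip().lower().replace("-", ":")


def boot_action(mac, has_valid_partition, nodes=NODES):
    """Decide what the provisioning server tells a PXE-booting node to do."""
    node = nodes.get(normalize_mac(mac))
    if node is None:
        # Not listed: the machine is not part of the cluster.
        return "ignore: unknown MAC"
    if has_valid_partition:
        # Already installed: carry on from the local disk.
        return f"boot {node['name']} from local disk"
    # Blank or wiped disk: reinstall from the image assigned to this node.
    return f"install image '{node['image']}' on {node['name']}, then configure"


print(boot_action("00-11-22-33-44-01", has_valid_partition=False))
```

The point of the design is that a failed node needs no manual repair: wiping its disk and powering it on is enough to send it down the reinstall branch.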
managing software is such a pain, you want something simple and easy to deploy packages. this is where a rhel based distro works well, as most commercial software (such as the portland compiler) comes in a nice easy rpm with vendor support. the packages you will want or need are already set up and ready in those distributions - you won't need to jump through hoops to get it working.
resource management and job scheduling: go for something big - i'd recommend sge/oge, and try to avoid pbs based ones (such as open pbs and maui/torque). Times have changed and the free pbs ones are grotty. even paid versions aren't too great.
simply put, go for "rocks clusters" or scientific linux using sge. don't try and go at it alone like some people here are suggesting - it's too much hassle and can quickly grow beyond what you'd be able to manage without help.
a bit of background: ive been doing this on and off since 2001 and done it in to the hundreds of thousands of cores scale. ive designed and built almost every configuration.
The problem with that suggestion is that the people maintaining the code don't have a clue what QA means. And before people whine - I used Gentoo as my primary distro for around three years. The emerge system is great - but the data inside is crap.
If you want to build your stuff from source and actually have a working system, look at the Debian-based distros. There's this nifty "apt-build" thing that lets you build software with whatever compile options you want (so you can still do -O3 -funroll-loops on everything if you really hate memory), just like Gentoo does. And there are packages for just about everything; partially because Debian's been around forever, and partially because "just about everyone" uses Ubuntu now. Gentoo does have a few "hacking" apps which are hard to find on other systems, but that's irrelevant to this discussion (and BackTrack is the way to go for that stuff anyway, IMHO). The primary difference is that you can build with source code that will actually work, and probably won't blow your system up when you just do a routine update. Whereas with Gentoo, some random kid who's too 'leet for testing might just promote to stable a new version of Xorg or Apache (both real examples from experience) which works fine on his system but breaks everyone else's in the world. And by "might" I mean "will". :)
I'm posting that mostly because quite a few Gentoo users think that only Gentoo (and maybe some of the BSDs) can easily rebuild a system from source, so they put up with atrocious quality assurance (which is admittedly extremely difficult given the Gentoo user base, and supposedly has gotten better) because they don't know that there are quite usable alternatives that are also more mainstream.
SuSE Linux Enterprise Server is a proven HPC solution if management kicks up a stink and demands commercial support.
I've worked with SLES and have been fairly happy with it.
Scientific Linux is probably the optimal choice though ;)
128 cores isn't enough to worry about - just install a distro you like and feel comfortable maintaining. although 128 cores isn't many, you should probably think about the style of install you want. lots of people seem to like diskful installs - afaikt purely because it's familiar. most significant clustering sites use diskless (NFS root) though, because it's so much easier to maintain. there's never any question of nodes getting out of sync. traffic due to NFS root is trivial. another best-practice is to configure 1-2 admin nodes (no users, provides NFS, scheduling and monitoring services), one or more dedicated login nodes, and discourage users from touching the compute nodes directly (among other things, give them non-routable addresses.) get or make a ticket system to keep track of user and system issues. monitor the heck out of your systems.
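The suggestion above to give compute nodes non-routable addresses can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module; the 10.1.0.0/24 range, the node count and the "node" naming scheme are assumptions chosen purely for illustration.

```python
import ipaddress
from itertools import islice


def plan_compute_addresses(network, node_count, prefix="node"):
    """Assign the first usable addresses of a private network to compute nodes."""
    net = ipaddress.ip_network(network)
    if not net.is_private:
        # Refuse publicly routable ranges: users should reach compute
        # nodes only through the login nodes.
        raise ValueError(f"{net} is publicly routable")
    addresses = list(islice(net.hosts(), node_count))
    if len(addresses) < node_count:
        raise ValueError(f"{net} is too small for {node_count} nodes")
    return {f"{prefix}{i:02d}": str(addr) for i, addr in enumerate(addresses)}


# Hypothetical layout: four compute nodes on a private /24.
plan = plan_compute_addresses("10.1.0.0/24", 4)
for name, address in plan.items():
    print(name, address)
```

A table like this could feed the DHCP and hosts configuration on the admin node, though that step is not shown here.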
I'm an HPC center admin and system programmer, 10+ years. I think we've been in the top 50 several times.
Quantian live cd has backported Openmosix kernel and beaucoup science and math goodies.
http://dirk.eddelbuettel.com/quantian.html or if website is down just google " quantian" and check out the cached page.
I found it not hard to use, when I last used it a few years ago, which was a nice feature when Openmosix was in progress.
Frankly I miss it.
*Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
Just imagine a Beowulf cluster of bullcrap!
Can you imagine the licensing costs of Windows bullcrap(tm)? At least with Linux it's free...
There's no place like
That goes on your remote hosts. The servers will be at run level 3, and this cluster should not be outward facing anyway.
As the most knowledgeable IT person on the team, you should know this.
I maintain multiple ~100 core clusters and made extensive use of a few other clusters each with 10k-100k cores. RedHat, CentOS, Debian, Suse, Cray Linux, etc. They all use something slightly different.
What it really boils down to is this:
(a) Is it Linux?
(b) Are you, as the primary maintainer, comfortable with it?
If your distro of choice answers "Yes" to both of those, then you've made a decent choice.
If you end up going to tens of thousands of cores, the choice will make more of a difference. But at your scale, it's really just what you're comfortable with.
Actually, in the realm of biomedical supercomputing, "none of the above" has already been done. Check out the Anton supercomputer designed and built by D.E. Shaw Research. The entire supercomputer, right down to all of the processor cores themselves, was specially designed and built specifically for molecular dynamics research. The system has no operating system and, as such, no overhead. Every processor cycle goes straight into the calculations. It is capable of churning out simulations of 150,000+ atom protein complexes on the order of several microseconds long, using wallclock CPU time of a few days.
I've had some experience setting up very small computing clusters, but it's given me some insight into how things work on bigger clusters as well. There *are* some specialized cluster-oriented distros. My experience with them, though, is that they're flaky and out-of-date and lacking in documentation. Actually, you will have trouble finding documentation in general; there's some good docs on how to choose hardware and a very broad view of cluster design and administration. There's some cluster schedulers and management software that has good highly-technical documentation available with the sources of some clustering software, but there's very little documentation that covers the middle ground of choosing what software to use to administer your cluster and how to integrate it together.
My advice is: go with a general purpose operating system you are reasonably familiar with. Give up on user friendliness. Your clustering software will not be user friendly. It was not designed that way. At best, it was designed to be easy to administer or flexible in its application by users; at worst, it's just an arcane mess of utilities that keep getting used because once you know how to use them, they work, and nobody has sufficient time or determination to replace them with something better. The applications you run on your cluster aren't particularly likely to be user friendly either. That said, the extent to which red hat or debian are user-hostile is not-so-apparent to the end user, only the system administrator.
Getting a distro with a stable base is a must; Debian stable, or a RHEL derivative is probably your best choice, at least if you're only considering a Linux-based OS. You will almost certainly end up having to compile and install your own software and the smaller the changes in the base packages on your system, the less extra effort it is for you to maintain your custom-packaged software.
To the extent possible, you also want to avoid relying on esoteric software for critical parts of your cluster infrastructure. For example: AFS has a bunch of nice features as a clustered file system, but unless you *really* need them you will be greatly increasing your sysadmin's stress level by making him try to get everything else you choose to use integrated with AFS; NFS sucks in a lot of ways, but it's the "standard choice" for networked storage on unix, so there's less software that will integrate poorly with it and lots more documentation explaining how to set it up and administer it. It's a fact of life that you will *have* to run esoteric software on your cluster, but you don't want to do it unless you really need to.
Lastly, if you want one knee-jerk explanation of what to choose for an HPC cluster in terms of the software stack:
torque - cluster scheduler
- use the integrated job scheduler, forget about building the tcl gui, it's *much* harder to use than the CLI tools
centos - operating system, maybe scientific linux or debian instead
nagios - use this to monitor your cluster
nfs - to share files between cluster nodes
samba - to share files with windows, if you can't get by with ftp/sftp/scp
ldap/nis - if you can completely secure your network, nis is easier to use than ldap, even though it *is* completely insecure
nx/freenx - for remote graphical login; on a LAN you can get away with VNC but nx performs much better over a DSL connection
If you are in a Debian shop, use Debian.
If you are in a RedHat shop, use RedHat.
The main reason is that if you already have other large clusters running $distro then someone already figured out deployment, maintenance, package management, drivers and a hardware vendor. Picking up a new distro throws aside major piles of work that have already been done and gives you the pleasure of re-inventing the wheel.
I'm guessing you'd rather get your cluster up and running, doing Real Work, rather than spending a bunch of time getting user authentication working correctly. Especially when somebody already did that.
Also, keep in mind that you do things like go on vacation and get sick, so the other people who are intimately familiar with whatever you already have can help out (and you can help them, too)
As an admin of a cluster that has evolved and changed over the last 7 years, I think I can help a bit. That being said, you truly failed at defining your real needs. I understand liking to stay with what people know, but from an end user point of view with interactions with a cluster, the only interface that they use is the only thing that needs to stay similar, and you can almost certainly use the same interface on any other linux distribution. So that said, what are you currently using to submit, queue, and schedule jobs? There are a few proprietary solutions out there and several open ones. There is Grid Engine (or Sun/Oracle Grid Engine), PBS, Maui, Torque, and several others out there. That should be the only real interface that an end user should have to the cluster.
Now comes the second question I have. Why do they need to be running X? I can understand having the X server installed and all the libraries, but you absolutely should only be running your servers at run level 3 (i.e. command line). You can still run applications if you set your display to a remote X server as the output device, in this case one that you run on windows desktop like Cygwin, or Xming. All you do by running X.org on the cluster nodes is waste about 1 gig of memory and 5-10% CPU resources, which could be utilized by your end users' jobs/applications.
Third, what kind of applications are your end users running? Are they real parallel environment applications using some sort of MPI (LAM/MPI, OpenMPI), or off-the-shelf products like Clustered Matlab (which actually uses MPI, but it is built into the product already; you just need to configure it properly)? Or are you really just running lots of batch jobs which may or may not be multithreaded applications, but do not do any inter-node communication?
Fourthly, how are you monitoring your existing cluster? Are you using something like Ganglia?
Finally, what kinds of third-party software do you need to be able to run/use? Is there anything commercial which may have support limited to specific linux distributions?
All of those things are questions that you need to really answer in order to recommend a distro.
All things being equal, personally, I would deploy a cluster using the "Rocks Cluster" distro. It is designed from the ground up to be an easy-to-maintain and easy-to-deploy cluster distribution. There are plenty of HPC-specific packages/applications/libraries available to be deployed on the nodes. "Rolls" are available, which basically contain a group of packages/applications/tools which are typically used together, or otherwise easily configure/install software that is required on each system, possibly with some complex interactions (for instance there is a "Ganglia" roll, which easily installs the Ganglia cluster monitoring software and automatically sets it up based on your Rocks installation. There is a "BIO" roll, which contains many open source tools and libraries which are useful on biological research clusters, like ClustalW, Glimmer, NCBI BLAST, just to name a few. Then there is the HPC roll, which is just some basic things like MPICH, MPICH2, OpenMPI, iozone, iperf. There is also a roll for PVFS for setting up a quick Parallel Virtual File System cluster).
It is designed from the ground up to be a cluster, not just a bunch of nodes running linux with high speed interconnects. It has management utilities to deploy applications across all nodes at once and to quickly install the OS on all your cluster nodes by PXE booting the compute nodes. It can also flash/upgrade the BIOS of compute nodes remotely via PXE boot. Basically it is designed to be managed and maintained as a cluster, not "x" number of individual systems. Seriously consider something like it.
http://www.rocksclusters.org/wordpress/
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
Admittedly, my experience is a few years out of date, but it used to be that the immediate answer to this question was Scyld, the direct descendant of the original "Beowulf" cluster created at Goddard Space Flight Center by Donald Becker. We used it for 3d rendering and video processing and it was really slick, and being based on RHEL it was easy to get people who knew how to work on it/software updates/support in forums, etc.
I've only seen one comment in support of Scyld here, has it fallen out of favor for some reason?
If you are planning a cluster for numerical or combinatorial computations, the best option is using an nVidia card. You will have about 300 cores for at most $3,000. And there are no special cooling or space requirements.
There are commercial nVidia drivers for Linux but Red Hat may be the best option due to its support.
I really think the cluster epoch is over; GPU computing is now the most economical and efficient way to do massive computation.
You can also get AMD cards but nVidia has better technology (CUDA)
I'm a first timer. This is going to be useful for me too. Thanks for the comments.
Have downloaded CentOS 5.6 and am intending to put OSCAR on it.
But everyone is mentioning Rocks. What about OSCAR? Should I stay away from it?
The problem with that suggestion is that the people maintaining the code don't have a clue what QA means.
Gentoo developers don't maintain code, they maintain software packages. That means our main objective is to distribute the software to the users with minimal modifications, so that you get pretty much what upstream developers distribute. The only exceptions are when a patch that fixes a bug, security vulnerability or even a build problem is available; then we would try to integrate it earlier than upstream. We also ensure the build system works, and we have documented policies on how to do that. Besides the regular stabilization process, there is also a QA team responsible for checking minimal ebuild (the portage recipes on how to compile software) quality. I had some commits reviewed by them, so believe me when I say they are quite picky:
http://www.gentoo.org/proj/en/glep/glep-0048.html
And before people whine - I used Gentoo as my primary distro for around three years. The emerge system is great - but the data inside is crap.
For someone using Gentoo for around three years, that doesn't seem like a very insightful answer, does it? What is the "data inside"? Are you referring to ebuilds? :)
If you want to build your stuff from source and actually have a working system, look at the Debian-based distros. There's this nifty "apt-build" thing that lets you build software with whatever compile options you want (so you can still do -O3 -funroll-loops on everything if you really hate memory), just like Gentoo does
It is not the same thing. I was told by some people that they needed to be constantly asking the sysadmins to install the development packages of scientific libraries in Suse Linux (same applies to debian), depending on the use case, that can take weeks.
And there are packages for just about everything; partially because Debian's been around forever, and partially because "just about everyone" uses Ubuntu now.
Debian is not very famous for up-to-date software, is it? Sure, you can add alternate repositories, but you don't need to do that on Gentoo.
The primary difference is that you can build with source code that will actually work, and probably won't blow your system up when you just do a routine update. Whereas with Gentoo, some random kid who's too 'leet for testing might just promote to stable a new version of Xorg or Apache (both real examples from experience) which works fine on his system but breaks everyone else's in the world. And by "might" I mean "will". :)
Obviously the Gentoo's policy on package stabilization can't catch every single package problem out there. The policy was somewhat made to allow a fair trade-off between stability and availability. We could increase stabilization times but software would be available in a less timely manner...
Major breakages are not only Gentoo developer's fault. Sure sometimes a Gentoo developer messes up and makes its users rebuild the entire installed software, but most of the times are either bad decisions from upstream developers or because a major change (which breaks stuff) is really needed. If you understand how library linking and versioning works, I don't think I have to explain further..
Oh and by the way, latest versions of Portage have a nice feature called "preserve-libs" which prevents breakage if the API of a library changes..
I'm posting that mostly because quite a few Gentoo users think that only Gentoo (and maybe some of the BSDs) can easily rebuild a system from source, so they put up with atrocious quality assurance (which is admittedly extremely difficult given the Gentoo user base, and supposedly has gotten better) because they don't know that there are quite usable alternatives that are also mor
... because then you can take advantage of the cluster. It would take seconds(or minutes if installing KDE) instead of days and weeks for the compilation... don't forget MAKEOPTS="-j129" as the manual states ...
Hello,
Thank you all for the informative replies, this will help us in deciding what to use.
It seems that Redhat or a variant thereof is what most of you agree is good, so we will probably go with one of those. Especially since that is what we have used in the past.
The reason for having X is that we work in X; some of the software we use needs that for various reasons such as plotting. This will only be used on one node. Since this will be a small cluster (probably 4 boxes with 32 cores each) we do not intend to build a separate box for running X. We might use one of the old boxes for X, but I think we still would want the same dist on all of them for simplicity. (Oh, and to those who asked: these will be in racks and not used for desktops)
Answer to another question that came up: This is for use at a university, we will be using it mainly for (nuclear physics) simulations/calculations based on Monte Carlo methods.
Again, many thanks!
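Monte Carlo workloads like the one described above usually split into independent tasks that never need to talk to each other, which is why they fit a small batch-scheduled cluster well. The sketch below is an assumption-laden toy, not the poster's physics code: it estimates pi as a stand-in calculation, and each call to run_task plays the role of one batch job with its own seed.

```python
import random


def run_task(seed, samples):
    """One independent batch job: count random points falling inside the unit circle."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(seeds, samples_per_task):
    """Combine per-task results; the order in which tasks finish does not matter."""
    seeds = list(seeds)
    total_hits = sum(run_task(seed, samples_per_task) for seed in seeds)
    return 4.0 * total_hits / (len(seeds) * samples_per_task)


# Eight tasks run serially here; on a cluster each could be a separate job.
estimate = estimate_pi(range(8), 10_000)
print(estimate)
```

Because every task depends only on its seed and sample count, rerunning a failed job reproduces exactly the same partial result, and only the final sum needs to be collected in one place.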
Currently supports RHEL, spotty support for Fedora, Scientific Linux, CentOS, SuSE Linux Enterprise 11, Windows 2008 and up, and ESXi 4 and up.
Debian and Ubuntu have made appearances in trunk, but I haven't tried it out personally yet.
XML is like violence. If it doesn't solve the problem, use more.
After 7 years working at CERN on the Grid project for the LHC, I would recommend Scientific Linux. The target of this distro is to run the largest grid in the world (gLite) on a free Red Hat. You also have, in the EPEL repository, many tools for large-scale computing maintained by the grid teams at CERN and Fermilab.
Can you imagine the licensing costs of Windows bullcrap(tm)? At least with Linux it's free...
Until you need technical support or drivers for new devices.
Pigskin-Referee
Linux: Yesterday's technology, tomorrow
If you do not have cutting-edge devices on your system, FreeBSD might be a good choice. It is quite stable, although the number of devices it supports is somewhat limited. It also offers a fairly good support system.
Pigskin-Referee
Linux: Yesterday's technology, tomorrow
The distro does matter, often in ways not particular to being a cluster, but perhaps in ways making it easy to manage in general. For example, I'm moving away from Ubuntu (server) because it is too hard to selectively upgrade a single package or group of packages without imposing an upgrade on other packages. This is where "hand holding" has turned into "wrist crushing". So I'm moving to Slackware (which is getting a lot more capability through the SlackBuilds community).
now we need to go OSS in diesel cars
Bah.
Any statements about 'up to date' software immediately show a glaring lack of comprehension about code stability.
Debian is only behind, if you like to use beta quality software. People with server farms, managing large quantities of data, don't WANT the latest and greatest, they want STABILITY. Stability is thousands, yes thousands of times more important than new features in code.
Gentoo has its place. However, that place is not anywhere near a data center, not anywhere near a corporate office, not anywhere near a server farm. Anyone with any competence in the real world won't use Gentoo for serious work, for the reasons listed above. Frankly, if you show me a resume with the word "Gentoo" on it, you're not going to get hired.
Gentoo's very nature ensures that it will *ALWAYS* be a BETA or even ALPHA quality build product. That's not because it's compiled from source, that's because of the way Gentoo manages packages, and because of a dozen different things that are in other projects to work towards stability. Gentoo seems to think that nothing is more important than the latest and greatest.. and as a result....
Well... instability is what you get.
(before people get all silly about this, that doesn't mean Gentoo doesn't have its place. However, stop trying to tell me that a home-built car should be deemed street worthy -- without even having to abide by the current legislation for street-worthy cars!)
(Lastly -- comments from the above post, such as "We could increase stabilization times but software would be available in a less timely manner..." and "Gentoo developers don't maintain code, they maintain software packages." and "but most of the times are either bad decisions from upstream developers or because a major change (which breaks stuff) is really needed." show how stability is the last thing on a Gentoo package maintainer's mind...)
Competitive market pricing for technical support or drivers (hardware vendors often provide) for new devices is available for Linux, BSD....
MS, Apple, Oracle... and hardware vendors will at their discretion provide the same as L/FOSS at a higher non-competitive price and bug-fixes or crap-design current WinOffice toolbar when/if they want.
Closed-crap software is never competitive, but is customer-hostage focused for gross-profits and low MOTSSS overhead.
IOW: If you want pfuck yourself, but don't phuck US, EU, or others.
Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
If you have experience with a particular distribution, go with that. I set up a 512 core cluster using Debian about five years ago. If you go that route, I suggest using FAI for installation. That way you can re-image your systems on reboot and easily keep things up-to-date and make config changes system-wide by just rebooting your nodes. Many software packages, both commercial and open source, are RedHat focused. I had to create my own deb packages for a lot of software. This trend is not as strong as it was before, but RedHat still dominates the software world. Take that into consideration and know what you're using it for. As for building a RedHat cluster, I can't comment on that like all of the others who have never built one. I don't have enough experience to give any thoughts towards it.
I am sorry but your reply is sliding a little into a flame war and possibly out of context, so I'm going to stop right here.
You're specifying a solution before you seem to have really articulated your requirements. For example, you have identified the following:
1) ...new cluster (smallish, ~128 cores)
2) It has to have an X-windows server
3) Implied use of a Linux distribution
These are all different aspects of the solution. What you should first do is identify and document some use cases and some performance requirements. The closest you come to this is:
A) User familiarity
B) Remote access
But these alone are insufficient to justify any expense implementing a new system. Therefore, I suggest you don't upgrade and instead use the current system.
Rocks is awesome. The bittorrent based install system scales beautifully across standard installs. To push out software updates you just update the package description and then re-install the whole cluster. The whole process took about 12 minutes for 40-something blades and should take a similar amount of time for many many more due to the peer-to-peer install process. *highly* recommended.
That wouldn't be true for one app running a few very taxing queries, but in our case it's getting hammered by dozens of apps running hundreds of smaller queries per second, which parallelizes rather well.
I'm sure I'll get hate for pointing this out but it's true: Linux is free if your time is worthless. I've looked into offering Linux as an alternative OS in my little retail shop for years, and every year I find nothing has changed. Until Torvalds either retires or someone fires his irritating ass so that Linux can FINALLY, after everyone else (Solaris, BSD, OSX and Windows, hell even OS/2) has had them for over a decade, get a stable kernel level hardware ABI so drivers don't shit themselves and die every time Linus gets an itch to fuck shit up in the kernel, then Linux will remain a black hole of time wasting where you have to spend days or even a week or more every six months doing the "forum hunts" trying to find "fixes" for the multitude of drivers Torvalds breaks constantly.
Which makes sense if anyone would look at it from an engineering instead of religious dogma perspective, as you are talking about literally tens of thousands of drivers nearly all of which have to interact with a kernel that Torvalds treats as his personal plaything and with little regard to the thousands of man hours he is pissing away, not only by all the developers that have to go in and fix what he has broken but in all the hours users waste with forum hunts.
Not to mention how many Linux users it ultimately ends up costing because mom&pop retailers like me, the kind of guys YOU NEED to get on board Linux, as we have NO real ties or support from MSFT and in our position could really help Linux with sales and after sale support, stay away from your OS because with all the man hours forum hunts suck up it makes Linux literally MORE expensive than Windows. My time is a minimum $35 an hour; at that rate it only takes 2.5 hours to make Linux more expensive than Windows 7 HP and I can easily waste half a day on a forum hunt, what with searching for, tweaking, and multiple attempts to get said fix working.
So until the day comes I can sell a box with Linux on it and be confident that the drivers will continue working for at LEAST three years minimum, preferably five, then "Linux is free if your time is worthless" is simply the truth for all those that do this as something other than a hobby. Home users aren't gonna learn CLI to apply fixes, neither are SMBs and SOHO, and they sure as hell aren't gonna sit around with a list of make/model/rev of every piece of hardware they have in order to do forum hunts.
I want Linux to succeed in the desktop and retail markets, I really really do. I grew up in the days of GEM and Commodore and having lots of choices, and I believe lots of choice makes for a healthy and vibrant ecosystem. But someone is gonna have to face the fact that Torvalds is a douchebag. It is all well and good he invented the kernel, but it ain't 1991 anymore and Linux isn't just some plaything for Torvalds to futz with and share his changes over IRC. The kernel is the heart of a multi-billion dollar OS, being counted on by millions, yet Torvalds treats it NO differently than he did at the beginning.
And before anyone says "LTS" let me say LTS is a bad joke. As long as much software is tied to which kernel you are using LTS is a codeword for "run out of date and possibly insecure software" and it is just ridiculous. It ain't 1991 folks, having drivers shit themselves and die is simply unacceptable in this day and age, especially when your competition gives on average a decade of support for their OS. Frankly this problem would be trivial to fix with a stable ABI for drivers, but Torvalds and his ego won't admit he made a mistake. The current way was fine when it was a hobbyist OS, it simply isn't anymore. Now you either shell out for expensive enterprise gear (which negates any savings by going Linux) where a team of developers have to constantly fix drivers for the life of the contract, or you are SOL, because you'll be wasting time on forum hunts. Sorry but that is just unacceptable.
ACs don't waste your time replying, your posts are never seen by me.
Until you need technical support or drivers for new devices.
I know math is hard--but let's see if I can break it down for you. Microsoft is $259 per incident. Ubuntu is $320 per year.
I know the Ubuntu number looks bigger until you realize that you can call 50 times for that same price. Calling 50 times for Windows would cost you just under $13,000.
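For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes only the two prices quoted in this post ($259 per incident and a flat $320 per year); the constant and function names are made up for illustration and the prices are not checked against any current price list.

```python
# Prices as quoted in the post (assumptions, not verified price-list values).
MS_PER_INCIDENT = 259   # dollars per support incident
UBUNTU_PER_YEAR = 320   # dollars per year, flat fee


def per_incident_cost(incidents: int) -> int:
    """Total yearly cost under per-incident pricing."""
    return MS_PER_INCIDENT * incidents


def break_even_incidents() -> int:
    """Smallest number of incidents per year at which per-incident
    pricing costs more than the flat yearly fee."""
    n = 0
    while per_incident_cost(n) <= UBUNTU_PER_YEAR:
        n += 1
    return n


if __name__ == "__main__":
    print(per_incident_cost(50))      # cost of 50 incidents at $259 each
    print(break_even_incidents())     # incidents needed before the flat fee wins
```

With these numbers, 50 incidents come to $12,950, and the flat fee is the cheaper option from the second incident in a year onward.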
But who calls for tech support for Linux anyways? I build and maintain ubuntu-based firewalls, spam filters, mail servers, virtual servers, and VoIP servers. I've never had to call for support.
There's no place like
I'm sure I'll get hate for pointing this out but it's true: Linux is free if your time is worthless.
You misunderstand time then. It costs me nothing but time to set up a Linux workstation for my wife. I spend under an hour, and she has a clean, non-virus-infected netbook. If I went the Windows route (because as you say, my time isn't worthless), I have to go shell out ~$150ish for Windows for her netbook...and I have to go work for almost a full day in order to pay for that. So not only do I waste an hour of time installing it for her, I waste a day working on my day job to pay for it. No thank you.
As for the business world, would you rather pay someone $1,000 for a mail server install or $5,000 for a mail server install? In case you're confused, the $1,000 install is entirely paying for my time to set up whatever mail options you want. In the case of the $5,000 install, $3,500 is for Microsoft software licensing and $1,500 is for my install time with the options Microsoft lets you have.
But whatever--keep telling yourself that linux sucks because of ABI breakage instead of the real reason: you want to keep your drivers closed source so you can lock users in, and you're too slow or stupid to recompile your drivers.
My wife has been running Ubuntu for 4 years now, the only time she called me was when her SSD died. She did the upgrades too. So where's the problem?
First, thanks for a rational response to a topic you probably find marginally offensive, given your implied role with the Gentoo project.
I reject the idea that packagers are only responsible for making a package compile with minimal changes. Someone needs to be looking at the whole picture instead of focusing on their small slice of the world, and packagers are in the best place to do that (or at least play a huge role in that). I see that constantly in my day job (enterprise security) where every business area only cares about their piece of the pie, completely ignoring (or just not understanding) how their slice fits in to the whole picture. It's frustrating there, and frustrating in my OS. :)
I thought it somewhat contextually obvious that the "data inside" referred to the primary data source for portage, but yes, "bundles of source code and supporting files known as ebuilds". :)
I don't know what you're talking about with the reference to Scientific Linux (RedHat-based) and SuSE (which is, I suppose, SuSE-based); neither of those are debian-derived; both are RPM-based distros that I dislike. :) Debian and Ubuntu are common Debian distros. Here's the first useful Google result related to apt-build - https://nigibox.wordpress.com/2009/10/01/apt-build-%E2%80%94-optimize-your-debian/ - I'd suggest reading about it more, and about apt in general. At a high level, you can pin package versions from multiple repositories with apt, and you can rebuild everything from just one package and its dependencies up to the whole darned system with apt-build. Portage is a cool system, but if you look in-depth, the apt/dpkg world has a very comparable feature set. It does not suck nearly as much as rpm (even with yum/yast wrapped around it), or other package systems like pkgtool, or whatever it was that Stampede used (it's been a while since an i686-native distro was a novel idea), or HP's POS swtool, or AIX's lpp format, or...
Debian Stable isn't known for up-to-date code, as that branch's goal is somewhat obviously "stability". You can use "unstable" and get very up-to-date code, or you can use a derivative like Ubuntu for a pretty good compromise in between. :)
I will wholeheartedly embrace the idea that many (in fact, most) Gentoo problems are user problems. But there are still way more problems than acceptable which are issues which maintainers should have caught. I'm willing to grant that it's way hard to catch the problems I find unacceptable - between upstream changes and downstream Stupid Users(R), there are just too many variables for anyone to manage. Ultimately it comes down to the distro user's personal level of tolerance; my tolerance is pretty low, but just slightly higher than Gentoo could previously reach. Other people have different tolerance levels, and I don't think they're stupid for using Gentoo. Heck, I support RHEL during my day job, and I *hate* the way RedHat does things (both in the distro and as a business) - but I don't for a moment think my employer is stupid for wanting to use RHEL. I endorse the variety of distros and people's choice to use the distro which best suits their needs. I do think that Gentoo fills a pretty narrow niche, though, and that it's a poor choice for environments where stability or reliability are the top priorities. Based on previous experience which may no longer be completely valid - Gentoo only fills a stability need well through the use of a mostly-binary install, and at that point, Gentoo's primary benefits are very much diminished.
I do like the Gentoo philosophy, though, and I've heard that things have turned around after the initial turmoil after Daniel Robbins left. But honestly, Gentoo offers me zero benefits over Ubuntu at this point. I get acceptable stability, and a very flexible build environment in the rare case that I need that. The only decen
Whatever you do, you will have to work to learn more about your Linux system, or subcontract someone (or buy support) to run your cluster and help your users. The point is then: do you have a budget (time and money) for that? Are you interested in learning more about Linux systems yourself (and hence spending less time on numerical codes or science)? If not, you'll need to pay someone to do the work. If yes, you need to learn a lot.
And here comes the religious dogma I was talking about! Isn't it funny that the argument against having stable functioning drivers always comes down to IDEOLOGY, with the rant most people link to going so far as to call those that refuse to hand over source "leeches" and hope the kernel futzing breaks their drivers?
I mean WTF is it to you if some do and some don't? Is that ANY different than right now? Nope, as you still have companies like Nvidia that makes binary blobs, only now you get to watch them break every six months. Does having open drivers keep Linux from breaking? Nope again as the open drivers break just as often thanks to Linus and his kernel fucking, because if you could look at it logically instead of a faith based perspective you'd see that there are only so many devs, and there are fewer of them than drivers to fix so drivers will ALWAYS be broken when Linus gets a wild hair up his ass, every. single. time!
And allow me to say that if your way "worked" in any kind of reasonable fashion retailers wouldn't avoid your OS like the clap which I can assure you we most certainly do. Not just all the thousands of mom&pop shops, dotting the entire country, but big names like Best Buy, Staples, Walmart, do you think they avoid your OS because of its "quality construction" or a secret conspiracy? NO! It is because they take the box home, run updates when the little icon tells them to and get a broken machine like it is 1993 all over again, and promptly take that broke ass shit back! And since we retailers can't sell used as new that means we take a hit on every return making Linux even MORE expensive!
As a final word allow me to give you proof, undeniable proof like a slap to the face your current way is broke ass shit. Now I'm sure you'll find some excuse, like "Use Distro X" or "You should buy hardware Y" but in the end all you will have is excuses because this proof should make even YOU take note! Now you and your fellow converts think we retailers are just full of it, that it should "just work" right? well when one of the biggest OEMs on the planet has to DISABLE the repos and spend considerable money and man hours keeping a badly out of date "corporate repo" just for their customers because if they don't the drivers WILL break then i'm sure you can see why both little guys like me and big guys like Walmart and OEMs like ASUS have washed their hands of your OS. I mean when fricking netbooks, a class of machine built around Linux strengths and which started out more than 30% Linux ends up completely obliterated by a decade old Windows OS it is high time to ask yourself "What are we doing wrong?" and I'd say basing your OS on religion instead of sound design practices and trusting your customers to make purchases that will benefit them (such as choosing FOSS drivers where possible so they have LTS) is a good example of why Linux is so far behind everyone else, and why even free you are getting hammered by an OS with a $100 barrier to entry.
Until you need technical support or drivers for new devices.
I know math is hard--but let's see if I can break it down for you. Microsoft is $259 per incident. Ubuntu is $320 per year.
I know the Ubuntu number looks bigger until you realize that you can call 50 times for that same price. Calling 50 times for Windows would cost you just under $13,000.
But who calls for tech support for Linux anyways? I build and maintain ubuntu-based firewalls, spam filters, mail servers, virtual servers, and VoIP servers. I've never had to call for support.
Actually, support starts at $195 per incident and there are several different plans.
There is no charge if Microsoft is unable to rectify the complainant's problem.
You have conveniently failed to address what I was stating in my original post; i.e., the "free" factor evaporates once support is required. In effect, for both Microsoft and any allegedly "free" OS, the cost of support is the same as long as it is not required.
In 20 years I have never called MS for any technical support. I am able to read and comprehend technical manuals, etcetera rather well. Plus, when a new device is released I do not have to wait for months (years, never) for support for that device with Microsoft. I have customers that demand high quality service.
Now, I do have a FreeBSD server at home. It is a nice hobbyist toy and I do enjoy playing around with it from time to time. However, I would never use it in a mission critical environment.
Pigskin-Referee
Linux: Yesterday's technology, tomorrow
Scientific Linux still hasn't put out the 5.6 release, they instead went for the 6.0, while RedHat is at 6.1 because 6.0 is so very buggy.
And here comes the religious dogma I was talking about!
Yes, your ability to foresee that someone might disagree with you makes you correct.
Isn't it funny that the argument against having stable functioning drivers always comes down to IDEOLOGY,
Funny--I remember some bitching about ABI, but there was a whole ton of other crap you put in there about your ideology. I remember you bitching about having to hunt through forums (ever had to wade through a forum full of Windows noobs? "Uh, I rebooted and it fixed everything."), bitching about your time being oh so important, your retail woes, and other things completely irrelevant.
with the rant most people link to going so far as to call those that refuse to hand over source "leeches" and hope the kernel futzing breaks their drivers?
Nope--I don't call you a leech. I think you have a chosen business model (not to release the source because you want to lock people in), and that's fine. Just don't keep bitching about your inability to keep up. You seem like the kind of person who would have a business model around sending morse code via telegraph and then bitch that the internet costs way too much because the protocols keep changing every few decades (IPv4/IPv6) and you have to upgrade your router and switch your thinking from morse code to SMTP...all the while the telegraph becomes more and more obsolete.
I mean WTF is it to you if some do and some don't? Is that ANY different than right now? Nope, as you still have companies like Nvidia that makes binary blobs, only now you get to watch them break every six months.
Yup--and I don't buy their crap. The machines that I do work with that have nvidia work well though because they are using the open source driver made by the community. And while it has occasional issues too, it gets fixed faster than the Nvidia blob.
Does having open drivers keep Linux from breaking? Nope again as the open drivers break just as often thanks to Linus and his kernel fucking, because if you could look at it logically instead of a faith based perspective you'd see that there are only so many devs, and there are fewer of them than drivers to fix so drivers will ALWAYS be broken when Linus gets a wild hair up his ass, every. single. time!
And allow me to say that if your way "worked" in any kind of reasonable fashion retailers wouldn't avoid your OS like the clap which I can assure you we most certainly do.
Yeah--Amazon really *hates* linux. It's constantly fscking up their retail business... </sarcasm>
Not just all the thousands of mom&pop shops, dotting the entire country, but big names like Best Buy, Staples, Walmart, do you think they avoid your OS because of its "quality construction" or a secret conspiracy? NO! It is because they take the box home, run updates when the little icon tells them to and get a broken machine like it is 1993 all over again, and promptly take that broke ass shit back!
And since we retailers can't sell used as new that means
Actually, support starts at $195 per incident and there are several different plans.
I went to microsoft.com/support and clicked on server support. It starts at $259 everywhere I looked. The same server support for Ubuntu is more expensive for your first incident--but if you have two incidents, you're ahead of Microsoft.
There is no charge if Microsoft is unable to rectify the complainant's problem.
You have conveniently failed to address what I was stating in my original post; i.e., the "free" factor evaporates once support is required. In effect, for both Microsoft and any allegedly "free" OS, the cost of support is the same as long as it is not required.
The free factor doesn't evaporate. Something like Ubuntu still costs $0 while Microsoft's offerings do not start at $0.
In 20 years I have never called MS for any technical support. I am able to read and comprehend technical manuals, etcetera rather well. Plus, when a new device is released I do not have to wait for months (years, never) for support for that device with Microsoft. I have customers that demand high quality service.
Really? You've been able to fix your own bugs with Windows ME, Windows Vista, Exchange 5.5, Sharepoint, etc? You're telling me in 20 years, you've never run into a developer-created bug or that you've magically prayed to the Ballmer and it wasn't an issue? I haven't even done that in Linux. The difference is I can fix most of my own Linux bugs. With Microsoft, you must call them--even if they end up acknowledging it and reversing the support charge at the end.
Which new devices have you used in Windows that weren't already available in Linux?
USB was supported first in Linux.
IPv6 was supported first in Linux.
I know wireless sucks in Linux, but that's because communication with a wireless card isn't a standard like IPv6 or USB is a standard.
Plugging crap into my windows box generates endless popups and disk thrashing while it searches for drivers and usually fails. In Linux, I plug it in and by the time I look back up at the screen, I have a camera or USB drive mounted, or even a bluetooth device ready to use...
Now, I do have a FreeBSD server at home. It is a nice hobbyist toy and I do enjoy playing around with it from time to time. However, I would never use it in a mission critical environment.
Funny--I talked with a guy yesterday who runs a site used heavily by the insurance business. He said when he launched the site he chose BSD because the Microsoft option was prohibitively expensive. He had 5 servers and two load balancers. He said other than the hardware, it cost him nothing. Contrast that with whatever server version of Windows does clustering and load balancing. I remember running it back in the 2000-era (iirc) and it was tens of thousands of dollars.
Do you even hear yourself? The amount of logical hop jumping and plain denial is just astounding! I guess denial isn't just a river in Egypt huh? Because the simple fact that you honestly believe that I should tell customers to learn to recompile their own drivers which BTW won't do SHIT when it comes to some of Linus's serious kernel fucking, is simply beyond ridiculous. How can you stand here with a straight face and claim your OS is ready for the masses if they need to compile their own drivers?
And WTF do Windows forums have to do with shit? Or Lowes? You NEVER need Windows forums after simply updating the OS whereas you BETTER be ready to spend an assload of time at your distro's forums with make/model/rev thanks to updates breaking shit left and right, which again I linked to. This is a classic case of "moving the goalposts" as you refuse to acknowledge, even though I provided links rubbing your nose in it, that even Dell can't keep the drivers working which is beyond insanity! And who gives a shit what some enterprise, which BTW has these things called "admins" that get paid big bucks to deal with broken shit like drivers and has about as much to do with retail PC sales as a car does with an F-16, has to do with this discussion? Did I MENTION anywhere enterprise? Or say in any place that we were talking about, in no particular order, enterprise deployments, servers, routers, cell phones, or any other damned thing that isn't a retail Linux sale? Nope don't think so.
In the end the numbers don't lie. No retail B&M store will touch your OS, and after 20 years Linux is so far behind /. has an article congratulating Linux on reaching a whole 1%! Woo Hoo, and it only took 20 damned years! If the "community" continues like you with elitism, refusing to see problems and correct them, refusing to make things easy for the user, and most importantly refusing to keep Linus from constantly breaking shit, then don't be surprised that it takes Linux another 20 damned years to reach that magical 2%. The simple fact is it isn't 1993, and users aren't gonna jump through flaming hoops simply for "free as in freedom, fight teh power!" bullshit. You gotta be better or at the very least as good, and frankly with the kernel futzing Linux doesn't even rank as high as Windows 98 in my book, MAYBE Win 3.1. Because with Windows 98 I could actually take a RTM and update it to the last patch and the drivers still worked whereas the Linux update notifier may as well be a "Break Linux NOW!" button, for all the broken drivers. It is pretty God damned sad when you can't even run updates without your OS shitting itself, and if the choices are 1.-Give them a broken OS and telling them "RTFM Noob LOL!" 2.-Turning off ALL updates and leaving them as vulnerable as any other unpatched OS, or 3.-Installing Windows and at least having it run until EOL without having broken drivers? Well at $35 an hour it really only takes a single forum hunt to make your OS more expensive than Windows. But you pretend it is all a conspiracy, that we "just don't understand" your OS. It reminds me of that old joke "I have no friends, Linux has no friends, maybe I can be Linux's friend!" because the public sure as hell ain't touching it!
Haha
That was an accurate, yet ballsy post to make on Slashdot of all places. I got flamed and modded down when stating the obvious with Linux myself, yet I still like it as a Server OS. Linux usage has gone down according to statcounter from nearly 1% to .7%. A very big drop thanks to Windows 7. I used to use Linux but finally gave up on it for the reasons you described. I am contemplating installing it today in a VM so I can run a LAMP stack with PostgreSQL as well as Joomla. But I am in the small small minority of users.
Windows has its weaknesses but being a consumer OS for business and home users is certainly not one of them. Compiling a kernel is ridiculous. I did PC support as a contractor on the side and only mentioned Linux to a tech shop because the user needed a server for 10 users and didn't want to pay for Windows 2003 Server Small Business Edition and only needed a file server, domain controller, and a simple email and internet site. Linux fit the bill and hosted all 4 nicely, but that was supported server hardware and not for John in the Office to play his games or run Office on his Toshiba laptop with strange/cheap hardware. Dell does not even make good drivers for Windows in my experience and I hate them with a passion. However, since Michael Dell returned the quality has improved tremendously.
I read your posts and you know your stuff. I used to charge $75/hr when I lived in Alaska for the rates and it sounds like the $35/hr might be a little low for your expertise. I never heard of that app you mentioned that kills adware infected with Flash. I will give it a try since I use music on youtube and prefer not to live without it.
http://saveie6.com/
Dell and Asus once sold Linux briefly at BestBuy before they pulled it. Walmart did too. Why?
Because Joe Six Pack became furious as to why MS Office wouldn't work or why his resume created with OpenOffice looked like crap when a potential employer opened it with Word. Or why little Timmy's PC games with DirectX couldn't run on them, etc.
Not to mention BestBuy realized that consumers buying these cheap Linux netbooks would not provide any profit margins by buying anti-virus software and printers. They lose money on every machine sold and only make it back by spamming you with accessories and software. Bad for retail ...
This is why Windows is here to stay. If you hate it, save up for a Mac. That is a consumer OS as well, expensive but higher quality than Linux or Windows. Or get one of those tablets running Android. There are options.
Way, way back in ye olden days, when I built my first cluster with my boss, we looked at all the distros available (the only major change in large distros since then, arguably, is that Mandrake is gone and Ubuntu now fills the slot that Debian did), and Redhat had some major things going for it (not trying to start a holy war, or be flamebait, so please don't start anything but an actual discussion):
1) Automated installation - Kickstart was there when no other distro of the time (this was when Redhat Linux - not RHEL or Fedora, but good old Redhat Linux - 6.2 was just being released) had anything remotely equivalent. It's still the best-of-breed for automated, programmable Linux installations, and great now with PXE and TFTP. The only thing I could ever complain about is that the package selection macros could be improved/expanded upon.
2) Package Management - At the time, we decided that RPM and DPKG were roughly equivalent, but RPM was slightly ahead, looked to be moving more in the direction we wanted, and won out in terms of availability of packages. (Look at where RPM and YUM are nowadays. I actually wrote something almost identical to YUM before it was around, for use by our desktop workstations. It was a lot cruder, but it operated on the same principles of building repositories and automated versus hands-on updating. There was no need to run it on our clusters, since once the nodes were installed, packages were very rarely changed, and when it was done it was a fairly significant event, best done by pushing the RPM out to the /tmp directory and then running a parallel shell command to upgrade or install the package and then erase it if successful. In the rare event of an OS upgrade, we simply marked the node as using the new OS in our cluster management config files, then set it for installation since the local disk was just for scratch space and OS, rebooted the nodes N at a time [so we didn't overload our NFS server(s)], let them re-install, and then we were up and running again.) That last point, package availability, has really proven true over time - most companies, if they bother putting something into a package manager at all, almost always pick RPM. Or if they have a plain old-fashioned tarball and installation script (or Makefile, or whatever) you can wrap that up in an RPM. Or just write your own within the spec file, using the nifty RPM macros to help you.
3) Package Availability/Compatibility - Almost all of the major packages that we needed were already available either as part of the distribution or a download away. I keenly remember that I only had to build 4 RPMs for *all* of our proprietary, in-house software to run (and out of those 4 RPMs, 2 were already available as part of the distro. but we needed a different version for library compatibility).
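The "reboot the nodes N at a time" trick from point 2 can be sketched roughly like this. This is only an illustration, not the tooling we actually used: the node names, the batch size, and the reinstall callback are all hypothetical, and a real setup would start each batch in parallel (via a parallel shell or similar) and wait for it to finish before moving on.

```python
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], n: int) -> Iterator[List[str]]:
    """Yield the node list in groups of at most n nodes."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    for i in range(0, len(nodes), n):
        yield list(nodes[i:i + n])


def rolling_reinstall(nodes: Sequence[str], n: int,
                      reinstall: Callable[[str], bool]) -> List[str]:
    """Reinstall nodes one batch at a time so the install server only
    ever serves n nodes at once. Returns the nodes whose reinstall
    callback reported failure.

    `reinstall` is a hypothetical hook: it should trigger the reboot and
    re-install for one node and return True on success.
    """
    failed: List[str] = []
    for group in batches(nodes, n):
        # Simplification: nodes in a group are handled one after another
        # here; in practice the whole group would run concurrently.
        for node in group:
            if not reinstall(node):
                failed.append(node)
    return failed


if __name__ == "__main__":
    names = [f"node{i:02d}" for i in range(1, 6)]   # hypothetical node names
    for group in batches(names, 2):
        print(group)
```

The only design point here is the cap on concurrency: the batch size is whatever your NFS or install server can feed without falling over.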
And basically, those core truths still hold true to this day. Admittedly, both SuSE and Debian-based distros (Ubuntu being the most prominent, of course) have improved dramatically, but SuSE was more-or-less based off of Redhat, just with some different design goals at the time (mostly being more user-friendly, I believe, though I never talked to the developers so I can only guess from experience with the various versions of SuSE). These days, RHEL (or CentOS, which is a freely available version of the same; it's RHEL w/out the tags...including the price tag!) is nice if you can afford it (I've worked places where the RHEL subscriptions were donated by the same vendor that donated the hardware, and I've also worked places where we just ran RHEL on the I/O and management/login nodes and then used CentOS on the compute nodes [but this can be irritating because RHEL and CentOS are almost, but not quite, identical - keep that in mind], and I've also worked at places where we used CentOS on the I/O and management/login nodes and Fedora on the compute nodes) because of the stability (but it's also a drag if you have users that want/need the latest and greatest features - I think you might want to look at the EPEL,