Slashdot Mirror


Ask Slashdot: Best Linux Distro For Computational Cluster?

DrKnark writes "I am not an IT professional, but even so I am one of the more knowledgeable in such matters at my department. We are now planning to build a new cluster (smallish, ~128 cores). The old cluster (built before my time) used Redhat Fedora, and this is also used in the larger centralized clusters around here. As such, most people here have some experience using that. My question is, are there better choices? Why are they better? What would be recommended if we need it to be fairly user friendly? It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."

264 comments

  1. YDL by metalmaster · · Score: 1

    Yellow Dog Linux ftw!

  2. RHEL by morcego · · Score: 2

    Redhat Enterprise Linux.

    If you need something cheaper (no licenses), you can always go CentOS. Or you can mix both, having some RHEL and some CentOS machines.

    --
    morcego
    1. Re:RHEL by pavon · · Score: 5, Insightful

      If you need something cheaper (no licenses), you can always go CentOS.

      If you want something compatible with Red Hat but cheaper, you should go with Scientific Linux, which is the same sort of idea as CentOS, but has more timely releases, and is used by other major clusters, like the ones at Fermilab and CERN.

    2. Re:RHEL by Anonymous Coward · · Score: 0

      Why bother recommending a distro without stating why it's useful for a research cluster? RH is for business, and its package handling is a joke even today. The main reason why people use RHES is because it has associated costs, and that gives the PHBs a fuzzy glow.

    3. Re:RHEL by Anonymous Coward · · Score: 0

      RHEL is fine, CentOS is just awful, and anytime someone offers up CentOS as a substitute for RHEL, I wonder if they've ever used CentOS. Watch for circular dependencies and lots of unavailable packages. You'll be installing most things from dag.wieers.com, and I feel bad for the guy paying for so much bandwidth for hosting rpms for this tripe of a Linux distribution. If you don't want to pay for support (and I mean RHEL, because YaST's finicky handling of configuration files makes SLES un-sysadmin-friendly IMO) then go with Ubuntu server. Everything you could need is an apt-get away, rather than google-the-wget away with CentOS and dag. I know my situation isn't a cluster, but we're running 20 Ubuntu servers in 15 colos currently, and our experience has been by far the best with Ubuntu.

    4. Re:RHEL by slmdmd · · Score: 1

      If performance is the number one priority then I would say you should compile your own kernel. The standard distros have a lot of fat in the kernel. I have not tried many distros, so let other slashdotters pick a distro. Take the base distro and then begin kernel compilation, cutting out all the drivers for which you don't have the hardware: for example, the CD writer driver, tape drive, network card drivers except for the one you use, and many, many other things. It will take a few iterations to get the desired kernel. Enterprise versions are outdated by years; for example, compare the RHEL kernel with the Fedora 15 kernel version. From my experience "enterprise" means a crippled distro that is nevertheless 10+ times as expensive.

    5. Re:RHEL by morgan_greywolf · · Score: 2

      RHEL is fine, CentOS is just awful, and anytime someone offers up CentOS as a substitute for RHEL, I wonder if they've ever used CentOS. Watch for circular dependencies and lots of unavailable packages.

      I've never seen that problem with CentOS.

      Everything you could need is an apt-get away, rather than google-the-wget away with CentOS and dag. I know my situation isn't a cluster, but we're running 20 Ubuntu servers in 15 colos currently, and our experience has been by far the best with Ubuntu.

      The problem with Ubuntu for scientific computing is that many commercial scientific computing packages have runtime dependencies on old, outdated libraries found in Red Hat-based distros, but aren't available on Ubuntu without compiling from source. I used to admin 2 large compute clusters for a Fortune 100 NASA contractor, so I actually know what I'm talking about.

    6. Re:RHEL by b30w0lf · · Score: 5, Informative

      Agreed.

      A primary component of my job is the design and maintenance of high performance compute clusters, previously in computational physics, presently in biomedical computing. Over the last few years I have had the privilege of working with multiple Top500 clusters. Almost every cluster I have ever touched has run some RHEL-like platform, and every cluster I deploy does as well (usually CentOS).

      Why? Unfortunately, the real reasons are not terribly exciting. While it's entirely true that many distros will give you a lot more up-to-date software with many more bells and whistles, at the end of the day what you really want is a stable system that works. Now, I'm not going to jump into a holy war by claiming RedHat is more stable than much of anything, but what it is is tried and true in the HPC sector. The vast majority of compute clusters in existence run some RHEL variant. Chances are, if any distro is going to have hit and resolved a bug that surfaces when you have thousands of compute cores talking to each other, or manipulating large amounts of data, or running CPU/RAM intensive jobs, or making zillions of NFS (or whatever you choose) network filesystem calls at once, or using that latest QDR InfiniBand fabric with OpenMPI version 1.5.whatever, it's going to be RHEL. That kind of exposure tends to pay off.

      Additionally, you're probably going to be running some software on this cluster, and there's a good chance that software is going to be supplied by someone else. That kind of software tends to fall into one of two camps: 1) commercial (and commercially supported) software, and; 2) open source, small community research software. Both of these benefit from the prevalence of RHEL (though, #1 more than #2). If you're going to be running a lot of #1, you probably just don't have an option. There's a very good chance that the vendor is just not going to support anything other than RHEL, and when it comes down to it, if your analysis isn't getting run and you call the vendor for support the last thing you want to hear is "sorry, we don't support that platform." If you run a lot of #2, you'll generally benefit from the fact that there's a very high probability that the systems that the open community software has primarily been tested on are RHEL-like systems.

      Finally, since so many compute clusters have been deployed with RHEL-like distros, there is oodles of documentation out there on how to do it. This can be a pretty big help, especially if you're not used to the process. Chances are your deployment will be complicated enough without trying to reinvent the wheel.

    7. Re:RHEL by Anonymous Coward · · Score: 0

      CentOS (and presumably RHEL) are great choices if you want to limit the kinds of applications you can run. The repos are kept quite out of date, even if you add EPEL, so you may not be able to run recent scientific packages. Building everything from scratch is the way to go anyway right?

    8. Re:RHEL by nukem996 · · Score: 1

      I second this mainly because it sounds like you don't have much experience in setting up a cluster. By using RHEL you get tech support which may help when you're stuck. If your company doesn't want to pay for it CentOS is good because I believe you can just pay for RHEL support and Redhat will support it.

    9. Re:RHEL by Anonymous Coward · · Score: 5, Informative

      I used to be on the CMS/LHC team at Fermilab. We used Scientific Linux on the 5500 Linux workers used for collider event reconstruction. SL is built with computing clusters in mind. I highly recommend it.

    10. Re:RHEL by morcego · · Score: 1

      However, if performance is your number TWO priority, and stability is your number ONE, you should use the stock kernel. If you are using an enterprise distribution, that is.

      --
      morcego
    11. Re:RHEL by futurekill · · Score: 2

      I'd go with SL as well...CentOS is currently experiencing some organizational turmoil that really makes me doubt its future.

      --
      The gates in my computer are AND, OR and NOT; they are not Bill.
    12. Re:RHEL by Have+Brain+Will+Rent · · Score: 3, Funny

      Just for a moment there I saw 5500 students/programmers/supportfolk/etc. all sitting in little cubicles slaving away at some problem after having had "Scientific Linux", aka the whip, "used" on them....

      Then I finished parsing the sentence.

      --
      The tyrant will always find a pretext for his tyranny - Aesop
    13. Re:RHEL by WuphonsReach · · Score: 1

      RHEL is fine, CentOS is just awful, and anytime someone offers up CentOS as a substitute for RHEL, I wonder if they've ever used CentOS. Watch for circular dependencies and lots of unavailable packages.

      The only way you get into dependency hell in CentOS over RHEL is if you don't know what you're doing and how to control pulling packages from non-standard repositories.

      And from the sounds of things, you're the type who adds a 3rd party repository and pulls everything in, instead of using the "includepkgs=" line to *only* pull in things from the 3rd party repository that you absolutely need.
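
      A minimal sketch of what such a restricted repository entry could look like, generated from Python for illustration. The repo id, URL and package names are made up, and the assumption that "includepkgs" takes a space-separated list should be checked against the yum.conf man page for your release.

```python
import configparser
import io


def third_party_repo_stanza(repo_id, baseurl, packages):
    """Render a yum .repo stanza that only exposes the listed packages."""
    # interpolation=None so values such as $basearch are written untouched
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg[repo_id] = {
        "name": repo_id + " (restricted)",
        "baseurl": baseurl,
        "enabled": "1",
        # Assumption: space-separated package list, as for "exclude"
        "includepkgs": " ".join(packages),
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Hypothetical repository and packages, for illustration only
stanza = third_party_repo_stanza(
    "example-thirdparty",
    "http://repo.example.org/el5/$basearch/",
    ["hdf5", "hdf5-devel"],
)
print(stanza)
```

      The resulting text would go in a file under /etc/yum.repos.d/; anything not named on the includepkgs line is then invisible to yum from that repository.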

      --
      Wolde you bothe eate your cake, and have your cake?
    14. Re:RHEL by greg1104 · · Score: 4, Informative

      Not just currently. Today's organizational turmoil within CentOS is nothing compared to when they lost access to much of the infrastructure a few years ago. I just wrote a blog entry on the rise of and fall of CentOS; the theme is why it's important to build an open community, not a tight clique, if you want an open-source project to scale.

    15. Re:RHEL by cloudmaster · · Score: 2

      I don't build or support HPC clusters in my current job, but in my previous job working for the people who probably made the CPU in your cluster, I did. :) We specifically did performance testing on the hardware before it was released to the public. We did the testing with RHEL and SLES because that's what pretty much everyone who builds clusters uses. Now, "everyone does it" doesn't mean it's the best, but just like nukem996 said below me, it does mean it'll be best supported. If you have a problem and search teh intarwebs, you'll probably find a solution geared towards RHEL. Ubuntu is getting pretty popular, and you're also apt to find solutions for problems you run into on a Debian-derived platform as well. IMHO, it's easier to rebuild components on Ubuntu than RHEL, but that's mostly because apt > rpm. :) But really, pretty much everyone uses RHEL, so CentOS FTW.

      Or just stay on Fedora, since it works and is pretty much the same as RHEL except for being more current. :)

    16. Re:RHEL by shinzawai · · Score: 0

      You are an idiot.

    17. Re:RHEL by dbIII · · Score: 1

      I think the main reason for that is that some of the expensive commercial software that runs on clusters will only run properly on RHEL or similar. One maddening example required an old version of RHEL only because the horrible Macrovision flexlm licence manager (designed to punish only the honest) would not run on anything newer - the actual software suite that we had paid for would run on any version of Linux on x86 or x86_64. RHEL has plenty of stuff built in to be compatible with old binaries, which is not something the other distros have had to worry about.
      In short, if the applications you are running have a long history and are bundled with things that are effectively abandonware then RHEL and CentOS save a lot of messing about and looking for old versions of libraries. There are still some portions of the 2011 release of some very expensive software used where I work which will only work on 8 bit colour displays! That's how much cruft is in these things.

    18. Re:RHEL by hawkinspeter · · Score: 1

      If you need to be running a closed-source package, then it would make sense to run the OS that the package was written for.

      If you're not tied to an awkward closed-source package, then I'd recommend Ubuntu server - you don't need to worry about license fees and yet you can still get all the security patches (if you want them; I'd imagine that a compute cluster wouldn't be internet facing).

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    19. Re:RHEL by RogerWilco · · Score: 2

      If you run a lot of #2, you'll generally benefit from the fact that there's a very high probability that the systems that the open community software have primarily been tested on are RHEL-like systems.

      We used to run SLES and OpenSuse on our old cluster, and switched to Ubuntu for our new one. We had several reasons for that:
      1) We found that most developers are on a flavour of Ubuntu. It's really become the nr1 desktop distro.
      2) We made an inventory of what was available through rpm and through apt, and a lot of packages that our users needed were available through apt but not rpm.
      3) CentOS and Scientific Linux were also considered, but seemed to be lagging behind Ubuntu in what versions of packages were supported.

      The problem we seem to have, which is apparently less of an issue in some other research projects, is that the developers are using a lot of relatively new packages and libraries, and therefore need quite up-to-date distros. Reason 2 meant that if we went with an rpm based distro, we would need to build many more packages from source, which would mean a lot of work for the support department.
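
      The kind of inventory described in reason 2 can be sketched as a simple set comparison. The package names below are invented for illustration and are not the actual lists from that cluster; in practice the two "available" sets would come from the repositories you are evaluating.

```python
def compare_availability(needed, rpm_available, apt_available):
    """Split the packages users need by where they can be installed from."""
    needed = set(needed)
    rpm_available = set(rpm_available)
    apt_available = set(apt_available)
    return {
        "both": sorted(needed & rpm_available & apt_available),
        "apt_only": sorted(needed & (apt_available - rpm_available)),
        "rpm_only": sorted(needed & (rpm_available - apt_available)),
        # Packages in neither set would have to be built from source
        "build_from_source": sorted(needed - rpm_available - apt_available),
    }


# Hypothetical inventory, for illustration only
report = compare_availability(
    needed=["fftw", "hdf5", "casacore", "boost"],
    rpm_available=["fftw", "boost"],
    apt_available=["fftw", "hdf5", "boost"],
)
for category, packages in report.items():
    print(category, packages)
```

      The longer the "apt_only" and "build_from_source" lists get for a given distro family, the more work lands on whoever supports the cluster.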

      We've only had one real problem with Ubuntu: GRID doesn't support it, so it took a lot of time to get that running.

      Our cluster is ~3200 cores, so not really big, but not tiny either.

      --
      RogerWilco the Adventurous Janitor
    20. Re:RHEL by Builder · · Score: 1

      And me without mod points... bah!

    21. Re:RHEL by Anonymous Coward · · Score: 0

      If you want something compatible with Red Hat but cheaper, you should go with Scientific Linux, which is the same sort of idea as CentOS, but has more timely releases, and is used by other major clusters, like the ones at Fermilab and CERN.

      In fact, if you are distro-neutral and just want to minimize babysitting time, going forward with RHEL and its derivatives will pay you back in free time.
      The reasons are explained by other posters but, in short, it's the most stressed standard distro in HPC *AND* Grid environments; seeing is believing:
      http://gstat.egi.eu/gstat/summary/GRID/EGEE/ (yes, these are the sites crunching the bunch of LHC data that machinery at CERN is spitting).

      If you have to choose between RHEL, SL and CentOS (==commercial, institutional, community),
      I would recommend going with SL, which suits most people setting up cluster services in a non-profit environment.
      With CentOS I have seen minor issues (e.g. certain rpms may not sit on it due to the original vendor strings having been modified) but your mileage may vary.

      If you happen to prefer pbs, I'm the author of this tool and I'll be more than happy to assist you through, just because I can:
      http://fotis.web.cern.ch/fotis/QTOP

      cheers,
      Fotis

    22. Re:RHEL by Anonymous Coward · · Score: 0

      CentOS is, effectively, dead, much like Whitebox before it. There is not even a sign of a CentOS 6.0 release, and RHEL 6.1 is already out. Dag Wieers has publicly switched to Scientific Linux.

      Note that if you want actual computational clustering, RHEL does include very good tools as part of their commercial support for overall integration and in particular hardware support for blades, fiber channel support, driver support, and commercial grade failover tools. You will not get this with the "free" distributions except by hiring your own engineering team, and few self-taught engineers have experience in scalable, reliable solutions. I've seen far too many companies spend hundreds or thousands of man-hours of engineering time supporting peripheral utilities like monitoring systems or file servers that could have been done much faster and cheaper with a plug-in commercial solution, _without_ all the fancy add-ons of clients who didn't realize they don't benefit from those features.

    23. Re:RHEL by slmdmd · · Score: 1

      I second this mainly because it sounds like you don't have much experience in setting up a cluster. By using RHEL you get tech support which may help when you're stuck. If your company doesn't want to pay for it CentOS is good because I believe you can just pay for RHEL support and Redhat will support it.

      It is open source. If you are not able to support yourself then I am afraid you are not adventurous enough to use Linux. Stay in the proprietary OS world.

    24. Re:RHEL by Anonymous Coward · · Score: 0

      Agree with Scientific Linux. CentOS is getting slower and slower at getting security patches out. SL6 is based on RHEL6, is quite stable, and works well with the major parallelization packages out there.

    25. Re:RHEL by nukem996 · · Score: 1

      RHEL support is good when a server goes down and you don't have time to google it. You just want to know how to fix it.

    26. Re:RHEL by klazek · · Score: 1

      You just described my years in grad school.

    27. Re:RHEL by rubycodez · · Score: 1

      those that make clusters have contributed to RedHat primarily. While not my favorite distribution, RedHat is for more than just business, it can be tailored to everything from little appliances to mainframes to supercomputing clusters

    28. Re:RHEL by rubycodez · · Score: 1

      the repos are kept full of properly debugged and time-tested software. basing a project on bleeding-edge shaky things is fine for the hobbyist, but stability is more important when spending millions of dollars on something that will take over a year to build anyway. Newest isn't always best.

  3. PelicanHPC GNU Linux (formerly ParallelKnoppix) by Anonymous Coward · · Score: 0

    http://pareto.uab.es/mcreel/PelicanHPC/

  4. Re:None of them by el_jake · · Score: 0

    Just imagine a Beowulf cluster of bullcrap!

    --
    In order to form an immaculate member of a flock of sheep one must, above all, be a sheep.
  5. Scientific Linux by stox · · Score: 5, Informative

    Built for that very purpose.

    --
    "To those who are overly cautious, everything is impossible. "
    1. Re:Scientific Linux by Browzer · · Score: 1

      care to provide a link to that "informative" claim, and please don't say OpenAFS.

      thanks

    2. Re:Scientific Linux by boristhespider · · Score: 5, Informative

      Being in academia and spending time in a lot of departments I can at least confirm that a large number of departments are running Scientific. I've worked in Britain, the USA, Canada, Norway and Germany and while Germany (predictably enough) has a hankering for SuSE, the others have a tendency to run Scientific.

      I did type in a long and boring anecdote about my experiences administering things running SGI Irix and Solaris back in the day, but wiped it when it began to look a bit incriminating and for all I know my ex-boss reads Slashdot. So I'll summarise as "don't administer SGI Irix or Solaris if you can avoid it". I'm no computer scientist, so maybe people who are better at it have no problems, but as a vaguely-competent scientist with an interest in computers but little more (like the original poster) I didn't get on with either of them. Red Hat was fine, and we hung Fedora machines off our central network and that was OK even though it was Fedora Core 1 with all its teething problems. And Scientific is very widely used in academia on big networks.

    3. Re:Scientific Linux by afidel · · Score: 1

      Hmm, Solaris 9 and Redhat 6 were so close that I could often take config files from one and make minimal changes and run them on the other. Of course our Solaris install was running the optional gnu utilities so that helped =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    4. Re:Scientific Linux by jd · · Score: 2

      Rocks is another good distro for this. It's designed specifically for cluster use, with packages pre-built with that in mind.

      It also depends somewhat on what clustering system you're using. If you want to use MOSIX or Kerrighed, then pick a distro on which the one you choose is well tested. Kernel patch conflicts can otherwise make things very difficult.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    5. Re:Scientific Linux by DriedClexler · · Score: 0

      Gee, thanks. I must have missed the ten other mentions of Scientific Linux in this discussion.

      Skim the fucking discussion before you post.

      --
      Information theory is life. The rest is just the KL divergence.
    6. Re:Scientific Linux by slaughts · · Score: 1

      Of course Rocks uses either Red Hat or CentOS...

    7. Re:Scientific Linux by jd · · Score: 1

      Yes but that doesn't mean everything was compiled with RHEL's standard compile flags or that different patches weren't included/excluded. I haven't looked at the specifics of Rocks' packages for a while, but I can trust they're tried-and-tested in cluster environments. RHEL may be 100% identical in all respects, but then it might not be. All I'm really sure of is that Red Hat's QA won't have put clustering as high on their list of things to test against.

      That's no criticism of Red Hat. They've a finite number of testers in-house and a finite time before a release. It's natural that they test as much as they can but slant it towards those buying RHEL. Generally, academics and scientists building moderately-coupled clusters on a shoestring budget won't be RHEL's biggest customers. CentOS - much the same applies, in that biggest user groups are going to take priority and smaller (albeit highly interesting) groups will get some support but not as much.

      I'm fairly certain MOSIX compiled against RHEL is available, so that should be just fine from that standpoint if MOSIX is the way the person wants to go. I just don't know what distros have tested against kernel patches for clustering, or which ones they have tried if they have. There are so many and there's a high risk that some will conflict with patches the distro does use. (It's what made the FOLK project such a bugger to maintain. Conflicts are rife on patches that aren't even in the popular patchsets.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    8. Re:Scientific Linux by Anonymous Coward · · Score: 0

      Didn't your grandmother ever tell you "If you don't have anything nice to say, then SHUT THE FUCK UP"

    9. Re:Scientific Linux by stox · · Score: 1

      Actually, I run a 500+ seat callcenter on OpenAFS. Makes my life much easier than the Samba implementation it replaced.

      --
      "To those who are overly cautious, everything is impossible. "
    10. Re:Scientific Linux by swv3752 · · Score: 1

      Get a damn clue. The Grandparent posted at almost the same time as the one earlier post for Scientific Linux, and much earlier than anyone else. Perhaps idiots like yourself should first learn how to read threaded discussions before posting.

      --
      Just a Tuna in the Sea of Life
    11. Re:Scientific Linux by DriedClexler · · Score: 0

      No, I scrolled past about five mentions before I saw the GP's. Should have just voted redundant I guess.

      --
      Information theory is life. The rest is just the KL divergence.
    12. Re:Scientific Linux by Anonymous Coward · · Score: 0

      How 'bout you use me for your next cluster? Huh?!

    13. Re:Scientific Linux by stenWolf · · Score: 1

      Rocks is no distro, it's a toolchain built on a rhel-like distro (usually CentOS), with quite horrible documentation.
      As long as you use the vanilla deployment you're fine, everything nice and easy. Try doing anything not covered in the open docs and you're stuck knee deep in "user-list only" documentation for some truly weird design choices. Oscar used to be better, but the lack of development for a long time really made a dent.
      Personally I like xCAT2, but am thinking of trying out quattor as well. Both primarily support rhel or rhel-like distros.

    14. Re:Scientific Linux by boristhespider · · Score: 1

      In which case either Red Hat evolved significantly from Red Hat 6, or I'm a bigger idiot than I thought... :) (Or our Solaris wasn't running the GNU utilities. Given that I had no clue where anything was I suspect this to be so...)

    15. Re:Scientific Linux by Skapare · · Score: 1

      How about I abuse you, instead.

      --
      now we need to go OSS in diesel cars
  6. Re:None of them by Anonymous Coward · · Score: 0

    Your in-depth analysis is invaluable to the discussion.

  7. What do your sysadmins know? by Anonymous Coward · · Score: 0

    If it was my choice, I'd go with Debian. I used RedHat for a few years in the '90s, then went to Debian after getting tired of dependency hell, and have not gone back since. Have worked in CentOS recently - not as bad as it used to be, but I still prefer Debian. Debian and its derivatives (like Ubuntu) were reported as being the "most important" Linux distro a couple months ago.

    1. Re:What do your sysadmins know? by rubycodez · · Score: 1

      please provide a list of all the supercomputer clusters running Debian in this world. I know the weather services in the Philippines and Germany operate a Debian cluster for forecasting, but are there any others? The cluster sysadmins of the world seem to go with RedHat or a derivative

  8. Look at Rocks by Anonymous Coward · · Score: 1

    It isn't about the OS, it is about the tools to manage it. Rocks is based on CentOS, and helps you run the cluster.

    http://www.rocksclusters.org/

  9. NPACI Rocks by rmassa · · Score: 5, Informative

    NPACI Rocks is probably your best bet. http://rocksclusters.org/

    1. Re:NPACI Rocks by FromageTheDog · · Score: 1

      This. Rocks makes it so ridiculously easy to set up a cluster that administration literally can be a single-person job.

    2. Re:NPACI Rocks by esten · · Score: 1

      Yep. Currently as a grad student I am managing a 128 proc cluster with 20 nodes which is shared between several groups. I knew a little bit of Linux when I started and nothing about clusters, and I have really had very few problems with the cluster. Rocks makes it really simple.

    3. Re:NPACI Rocks by daemonc · · Score: 5, Interesting

      Seconded. I used Rocks to build clusters for the university for which I worked, and it made my life much, much easier.

      If you are already familiar with Redhat administration, you'll be happy to know Rocks can use either Redhat or CentOS as its base OS.

      It uses meta-packages called "rolls", which completely automate the installation and configuration of your computing nodes. There are rolls that include most of the commonly used commercial and Open Source HPC software out there, or you can "roll" your own. Basically you just configure your head node, and then adding a compute node is as simple as setting the BIOS to boot over PXE, plug it in, and done.

      Rocks, well, rocks.

      --
      All that we see or seem is but a dream within a dream.
    4. Re:NPACI Rocks by sirsnork · · Score: 1

      This, ignore any other suggestion and use Rocks. You're doing yourself a disservice if you don't at least look at it.

      --

      Normal people worry me!
    5. Re:NPACI Rocks by Anonymous Coward · · Score: 0

      Also NPACI Rocks opens an easy door to start using OpenCL or CUDA (like it or not, GPUs give a huge boost to some GPU-ready apps).

    6. Re:NPACI Rocks by TooMuchToDo · · Score: 1

      Rocks only goes so far though. If your cluster is small, it works like a champ. If you want any heavy level of customization, it quickly becomes unwieldy. Once you go above 1000 nodes, want to have your nodes on public network addresses instead of an RFC 1918 block, etc., you might as well put TFTP/Kickstart/etc. together yourself.
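
      Assuming the private ranges meant here are the three RFC 1918 blocks, a quick way to check which side of that line a node address falls on is a small Python sketch like this. The sample addresses are arbitrary.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_rfc1918(addr):
    """Return True if addr lies inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


# Arbitrary sample addresses, for illustration only
for node_addr in ("10.1.255.254", "172.31.0.1", "172.32.0.1", "192.0.2.10"):
    print(node_addr, "private" if in_rfc1918(node_addr) else "not RFC 1918")
```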

    7. Re:NPACI Rocks by Anonymous Coward · · Score: 0

      I can't say that I'm too familiar with it (only a summer intern), but when I was doing aircraft modeling and simulation work for the USAF, we used Rocks as well. Even got to set it up on an old cluster.

    8. Re:NPACI Rocks by Junta · · Score: 1

      I'd advise xCAT at that scale. Particularly nowadays it obviates the need to tftp even the kernel and initrd, which speeds things up and makes netboot a lot more reliable. You can put together the same stuff manually with iPXE added to your equations, but xCAT2 represents a codebase embodying the experiences of a lot of people over time and wrapping it up under convenience commands. rinstall takes your kickstart file, makes it unique for the node, puts everything in place for PXE and the installer to function, and runs.
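
      The "make it unique for the node" step can be pictured roughly as template substitution. This is only a sketch of the idea, not how xCAT implements it; the template text, node names and addresses are all hypothetical.

```python
from string import Template

# Hypothetical shared kickstart fragment with per-node placeholders
KICKSTART_TEMPLATE = Template(
    "# generated for $hostname\n"
    "network --bootproto=static --ip=$ip --netmask=$netmask --hostname=$hostname\n"
)


def render_for_node(node):
    """Fill the shared template with one node's settings."""
    # substitute() raises KeyError if a placeholder has no value
    return KICKSTART_TEMPLATE.substitute(node)


# Made-up node table, for illustration only
nodes = [
    {"hostname": "compute-0-0", "ip": "10.1.0.10", "netmask": "255.255.0.0"},
    {"hostname": "compute-0-1", "ip": "10.1.0.11", "netmask": "255.255.0.0"},
]
rendered = {node["hostname"]: render_for_node(node) for node in nodes}
print(rendered["compute-0-1"])
```

      A real tool also has to put the result where the PXE-booted installer can fetch it, which is the part the convenience commands wrap up.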

      --
      XML is like violence. If it doesn't solve the problem, use more.
    9. Re:NPACI Rocks by Anonymous Coward · · Score: 0

      Rocks works especially well if you need a very standardized system with a lot of cool tools and a pretty darned good e-mail support system. The guys behind it are knowledgeable and clueful, and a lot of the users are very helpful as well. Last time I looked, Rocks uses CentOS or RHEL as its underlying distro. However, there are also some limitations in Rocks. If you want to do something they didn't imagine, getting a Roll that supports it (Rolls are the easiest way to add a feature, utility or operation in Rocks) means writing your own, or finding a third party one that works. There are some third-party sources that are support-based, and if you're like we are, you have a cadre of linux-savvy admins but not much spare change to buy a roll.

      Short form: We started with Rocks because I felt it'd make our support of the cluster easier and standardize things better, but migrated away from it when some of the things that make it so useful conspired to make it hard for us to do things as new software (e.g. OFED, kernels, etc.) came out.

      We have stabilized, like so many here, on CentOS. We want stability, and do not usually need, and certainly don't want, the bleeding edge updates. Did I mention that we, and our users, value stability?

      I'm aware of at least one Teragrid site that did essentially what we did: Started with Rocks and kept some of their utilities that made life significantly easier. We, and they, then developed our own utilities for PXEboot reloads and new install of nodes. I'll admit ours isn't as nice and simple as the Rocks version, but it works for us, and allows us to customize it more readily.

      My recommendation: New to HPC, building your first cluster and short of experienced manpower (and your boss wants it up as soon as the hardware is on the loading dock, without contemplating getting you help to rack), I'd go with Rocks in a heartbeat. In fact, I've got one cluster I manage that's still on Rocks because it has such straightforward uses and users, it doesn't need customization outside the norm.

    10. Re:NPACI Rocks by Anonymous Coward · · Score: 0

      As much as Rocks is very simple after a fashion, there's something about its design philosophy that just, well, bugs me (no pun intended). I really hated it when it was Redhat-only and the OS was much more rigidly tied to the management, now I just feel vaguely uncomfortable using it, like if my (now ex) girlfriend packed my bag for a trip. Everything appears to be there and appears to be okay, but there's a certain nagging feeling there nonetheless...

      I'd take RHEL/CentOS/Fedora using xCAT (funny how they changed the name from eXtreme Cluster to eXtreme Cloud Administration Tool - what is it with everyone having to throw the word "cloud" into everything - we're mostly educated people, we can look at software and see whether it will work for a cluster that's being used to provide cloud services or not!) over a Rocks install any day.

  10. Scientific Linux by Skapare · · Score: 5, Informative

    How about Scientific Linux?

    --
    now we need to go OSS in diesel cars
  11. Ubuntu 10.04 LTS by GoNINzo · · Score: 0

    Ubuntu 10.04 LTS, accept no substitute. Maybe use a Ubuntu 10.10.2 desktop to manage them, it's easy to use. (11.04 is still unstable, IMO.)

    It actually all depends on what packages you plan on running. Then cross reference that against what your options are, I think you'll run out of options quickly, TBH.

    And you just need putty on the windows side. But, if you have to, all of them can run x-windows generally these days. I just find Ubuntu to be the easiest, plus their packaging system is the best, being Debian under the covers.

    Please note, these are opinions, and I'm entitled to my informed opinion.

    --
    Gonzo Granzeau
    "Nothing the god of biomechanics wouldn't let you into heaven for.." -Roy Batty
    1. Re:Ubuntu 10.04 LTS by tokul · · Score: 1

      Ubuntu 10.04 LTS, accept no substitute.

      So what should the user do? Use a substitute for Debian, or accept no substitute?

    2. Re:Ubuntu 10.04 LTS by Anonymous Coward · · Score: 0

      I don't think you know what the word "substitute" means.

      Heh . . . Captcha: "Purity"

    3. Re:Ubuntu 10.04 LTS by Hatta · · Score: 3, Insightful

      Why? I know Ubuntu is the standard recommendation for grandma these days, but what makes you think it's particularly appropriate for a computational cluster? For instance, do you really need GNOME on a high performance cluster?

      --
      Give me Classic Slashdot or give me death!
    4. Re:Ubuntu 10.04 LTS by sirlark · · Score: 2

      I have to disagree. Ubuntu has a nasty habit of letting non-mainstream, non-desktop related bugs pass through several release cycles. We've just this last week spent 3 full days trying to figure out why my perfectly working NFS boot over PXE cluster broke when we did a safe upgrade. Turns out there's been a bug in portmap since lucid, which still exists in natty which causes the NFS rootfs mount to fail. We had to recreate the filesystem from scratch and install lucid without updates, then hold portmap back manually (after much trial and error to find out which package was breaking). I've had other issues with Ubuntu server too, so to me this is not an isolated incident. I wouldn't recommend Ubuntu for any scientific work, and especially not something as 'unusual' (read -> not desktop oriented) as a cluster.

    5. Re:Ubuntu 10.04 LTS by cloudmaster · · Score: 2

      While I'd probably still recommend RHEL/CentOS/Rocks/whatever, to answer this specific question...

      Ubuntu is an easy-to-use polished layer on top of Debian's unbeatable history of Doing Shit Right. Yes, there are some mistakes in their history like everyone else, so skip the "but in 1996 Debian did some obscure thing wrong" and "one time some boob screwed up the random number generator in ssh" - but overall, Debian is an incredible base for just about everything. Ubuntu takes Debian's inherent coolness and then makes releases more than once a decade. :) With the Ubuntu route, you get a number of different kernels which would benefit HPC applications - like a 32 bit kernel with the large memory support enabled, a kernel with RTC support, etc. You get the ability to install the ubuntu-minimal version (use the alternate installer) which is smaller than the minimal version offered by the other popular distros, and then install the packages you want. You get the QA benefits of using a distribution that has a *lot* of eyes upon it. And you get apt-build so you can recompile and fairly easily package things up.

      Someone supporting HPC clusters shouldn't just pop the graphical install disk in and take what's installed; there is a fair amount of customization which should be done (ideally through Cfengine). So, while Ubuntu does have a nice, really easy install process for Grandma, there's an incredibly powerful and configurable architecture underneath that unassuming front end. If you have the deep knowledge required to understand why one distro really is better than others, it's actually worth taking the time to read through the documentation in the Ubuntu wiki, learning how all the different Debian things work together, and generally spending the time it takes to seriously use and inspect Ubuntu behind-the-scenes. It's nicely architected because the Debian people - weird as they may be - have spent decades building a very well designed platform that people like Canonical can extend.

      It's really not as bad a choice as one might think if all they know about Ubuntu is that it's easy to install and use out of the box. :)

    6. Re:Ubuntu 10.04 LTS by cloudmaster · · Score: 1

      So, there's been a bug for years, but you just hit it recently? Sounds like a new bug. ;)

      (I'll pretend I haven't seen all sorts of problems with NFS root on different Ubuntu releases for the last several years; the bug seems to relate to the way the mounting and detecting-of-mounting works; my name's probably in a few of the bugtracker threads)

    7. Re:Ubuntu 10.04 LTS by sirlark · · Score: 1

      not years, months... The same bug has existed in portmap since maverick and was, I assume, back ported to lucid, which would be why my lucid update broke. The system hadn't been updated since January, which is why I only picked it up now.

  12. CentOS+RocksClusters.org by Anonymous Coward · · Score: 0

    I run a cluster built using the RocksClusters.org distribution, which is based off of CentOS. In the past, the previous admin had us running Oscar, but I found its management style too clunky (that was pre version 6). With Rocks we've never looked back.

    I haven't used it, but Rocks also has a visualization roll, that should probably include XWindows stuff. Also check out vnc or nomachine as XForwarding is a chatty protocol and gets a lot of lag once you are off of the local network (i.e. from home or abroad).

  13. what-will-you-be-computationalizing? by Anonymous Coward · · Score: 0

    "From the 'what-will-you-be-computationalizing?' dept"

    That's a good question.

  14. Scientific Linux by Ether · · Score: 2

    Scientific Linux. http://www.scientificlinux.org/ Has the benefit of RHEL: a stable OS environment without some of the headaches of CentOS. If you have money (you probably don't) RHEL is good.

    --
    --I hate people when they're not polite -"Psycho Killer", Talking Heads
  15. Fedora by tanawts · · Score: 2

    Fedora has components to help manage large deployments. https://fedorahosted.org/spacewalk/ It also has FreeIPA to help with a secure and scalable means of managing authentication/authorization/resources within the cluster. http://freeipa.org/page/Main_Page

    1. Re:Fedora by Fjandr · · Score: 1

      Fedora goes to the other extreme from CentOS. The update cycle is too short, which means you have increased worry about instability. Stuff just breaks sometimes, even though it's a good distro on the whole for many purposes. I'd assume stability is a top priority for someone putting together a cluster.

  16. Requirements by hawkbat05 · · Score: 1, Insightful

    I think an important question here is why was Red Hat chosen for the other clusters? Your requirements aren't very specific, there are hundreds of distros that could meet your criteria.

    1. Re:Requirements by Skapare · · Score: 1

      Red Hat gets chosen by people wearing suits and ties because there are people working at Red Hat who wear suits and ties.

      --
      now we need to go OSS in diesel cars
  17. Not an X Window Server by Anonymous Coward · · Score: 0

    In the X Window model, the computer with the display is the server. Client programs connect to the server in order to display on its screen.

  18. Mac! by EonsWrath · · Score: 1

    MAC! wait, what?

  19. Which editor should he use? by Albanach · · Score: 1, Funny

    Now we have such a clear winner on the choice of distro, perhaps we can discuss which would be the best editor on the cluster?

    1. Re:Which editor should he use? by Kamiza+Ikioi · · Score: 1

      vi

      --
      I8-D
    2. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      nano. hands down.

    3. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      emacs

    4. Re:Which editor should he use? by gilleain · · Score: 1

      Now we have such a clear winner on the choice of distro, perhaps we can discuss which would be the best editor on the cluster?

      Sounds good - and finish up with a reasoned, polite exchange of views on which programming languages to use on the new cluster?

    5. Re:Which editor should he use? by Anonymous Coward · · Score: 1

      Nothing that belongs to the 80's.

    6. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      Pfft!

      If you can't do it in Emacs, it ain't worth doin'

    7. Re:Which editor should he use? by Ender+Wiggin+77 · · Score: 1

      Editor should be SlickEdit. Programming language should be JavaScript execution of Java code converted with GWT. That's a science experiment right there.

    8. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      C# and x86 assembly.

    9. Re:Which editor should he use? by betterunixthanunix · · Score: 1

      The POSIX standard editor of course.

      --
      Palm trees and 8
    10. Re:Which editor should he use? by ebuck · · Score: 4, Funny

      vi

      Clearly the best choice. It is so heavily optimized that even its name takes up only 40% of the required characters of the second best contender, emacs.

    11. Re:Which editor should he use? by afidel · · Score: 2

      None because you don't run interactive processes on a cluster but instead submit a package to the job scheduler.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    12. Re:Which editor should he use? by rgbatduke · · Score: 1

      Now you're just teasing. After the distro wars, the editor wars, you want to start the compiler wars? Oh, wait, half of the respondents are replying with interpreted languages (demonstrating a pretty profound ignorance of 99% of scientific computing) so clearly the question was serious...;-)

      Personally I vote for slackware for a distro, jove as an editor, and do languages other than C (and somewhat regrettably, fortran) exist? They do not. Maybe perl or one of them new-fangled interpreted languages to augment plain old bash/awk/sed for the occasional scripty sort of task. PVM and/or MPI for parallel program support. A nice fast network, plenty of ram tuned to the number of processors and cores that makes sense for your particular parallel/distributed application(s), storage similarly tuned, and there you go.

      I was just kidding about the slackware, BTW, but honestly you can build a decent cluster for most purposes on top of any flavor of Linux, BSD, Solaris, and yeah, probably even MacOS. The primary differences are going to be what apps you have prebuilt and instantly deliverable in the distro's package management repos and scheme, and how easy they are to strip down to a decent, automatically installed cluster node image. Although I did use slackware for my very first linux clusters (after using SunOS and Irix and miscellaneous Unices for some years before). Any distro named after the church of the subgenius can't be all bad...:-)

      One last serious note. Most of the replies above don't pay enough attention to the importance of automatable installation and update methodologies in the cluster distro. I'm a pretty big fan of kickstart and yum (since yum was developed by my colleagues and coworkers in the Duke physics department) as one can use the former and latter together to build very scalable cluster installations and fully automate a whole lot of critical aspects of management. While I do generally think well of Debian, I don't think it can really compete in this one arena, and it is a show-stoppingly important arena (although Debian is so huge that maybe somebody has finally made it as simple and transparent for mass-market installs). For that reason alone I think that any of the RH-derived, yum/kickstart supporting distros are very reasonable choices for a cluster, and that which flavor in particular one might choose depends mightily on the particular kind of cluster you are building and its purpose. I've built clusters on top of fedora with complete satisfaction -- cluster nodes run a stripped down linux anyway and the "rapidly varying" problem is more or less irrelevant to cluster stability as the packages you install aren't, actually, that rapidly varying.

      One advantage of it is that a few tools and libraries you want to be up to date, sometimes -- the GSL comes to mind -- and historically things like SL, Rocks, and RHEL/Centos have actually lagged so that the GSL and certain other tools are out of date by too much, to the point where useful stuff in the current builds isn't there (and is a pain to build yourself). In other words, there can be particular trade-offs in terms of things like device support, library support, and tool support that might affect specific people or clusters built for certain specific purposes, and this needs to be thought about before building on top of e.g. MacOSX (where a lot less prebuilt stuff is likely to be available) or the smaller distros (ditto) or distros that aren't aggressively enough maintained. Anything missing or out of date is more work for you.

      But the standard litany in cluster design is: YMMV. Prototype. If you've got a few cluster nodes to play with, try popping Debian on them, and see how hard it is to install them en masse. Repeat for a few other distros and tools. See what has resources that easily support your target application(s). See which distros have the drivers for the devices and hardware you will be using without headaches or kernel builds on your part (if any). Then pick one not because somebody says "use this, it's the best" but rather because it actually wins against the others in ease of use and completeness for your cluster.
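
      As a rough illustration of the kickstart-style automation mentioned above, here is a minimal sketch that renders one kickstart file per node from a template. The directives are ordinary kickstart syntax, but the mirror URL, node naming scheme and package list are hypothetical placeholders, and the file is deliberately incomplete (no root password or authentication settings), so treat it as a starting point rather than a working install profile.

```python
from string import Template

# Minimal kickstart template. The directives are standard kickstart syntax;
# the mirror URL, hostnames and extra packages are illustrative assumptions.
KICKSTART_TEMPLATE = Template("""\
url --url=$mirror
lang en_US.UTF-8
keyboard us
network --bootproto=dhcp --hostname=$hostname
timezone --utc UTC
bootloader --location=mbr
clearpart --all --initlabel
autopart
reboot

%packages
@core
$packages
%end
""")


def node_names(prefix, count):
    """Return hypothetical node names such as compute001, compute002, ..."""
    return ["%s%03d" % (prefix, i) for i in range(1, count + 1)]


def render_kickstart(hostname, mirror, packages):
    """Fill in the template for a single node and return the file contents."""
    return KICKSTART_TEMPLATE.substitute(
        hostname=hostname,
        mirror=mirror,
        packages="\n".join(packages),
    )


if __name__ == "__main__":
    for name in node_names("compute", 2):
        print(render_kickstart(name, "http://mirror.example.org/os", ["openmpi", "gsl"]))
```

      The same loop could write each result to a file served over HTTP for PXE installs; how that is wired up depends on the site and is not shown here.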

      rgb

      --
      Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
    13. Re:Which editor should he use? by greg1104 · · Score: 1

      ed is also two characters, and they are adjacent on the keyboard. If you are a touch-typist, it takes 50% of the hands and 50% of the fingers necessary to start that inefficient vi editor.

    14. Re:Which editor should he use? by cloudmaster · · Score: 1

      No, the second contender would be vim. The emacs option doesn't come up until you've dismissed both vi and vim. :)

    15. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      Well, we all know emacs takes 2.5 times as long to type commands as vi, so that's only natural.

    16. Re:Which editor should he use? by TeknoHog · · Score: 1

      ed is also two characters, and they are adjacent on the keyboard. If you are a touch-typist, it takes 50% of the hands and 50% of the fingers necessary to start that inefficient vi editor.

      In the typing standard I've learned, E and D are on the same finger. It is much faster to use two fingers for two keys, so clearly vi wins. Not that either would matter, since I already have Emacs running.

      --
      Escher was the first MC and Giger invented the HR department.
    17. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      I'm always disappointed at how much mileage the old vi vs. emacs trope gets. Honestly, I use emacs, and I can totally relate to people who prefer vi... I find emacs meshes better with how I naturally work and think, so I use it. I can see that some people might have different instinctive patterns that benefit from a different design philosophy.

      My big problem is with all the people trying to get me to switch to some clunky awful gui based ide, like eclipse.

    18. Re:Which editor should he use? by Anonymous Coward · · Score: 0

      and for only an extra 20% you get the improved version, for FREE!

    19. Re:Which editor should he use? by rrohbeck · · Score: 1

      None because you don't run interactive processes on a cluster but instead submit a package to the job scheduler.

      You can always submit an ed script.

    20. Re:Which editor should he use? by SuperQ · · Score: 1

      As someone who sysadmins for HPC clusters that rank highly in the top500, vi is very much the best editor.

  20. Rocks Cluster uses a modified Centos by w3rdna · · Score: 2

    Centos is modified to be the base OS for the ROCKS Cluster.
    http://www.rocksclusters.org/wordpress/

    1. Re:Rocks Cluster uses a modified Centos by erikscott · · Score: 2

      The Rocks approach is nice for quickly regenerating a failed node. And it's Centos under the covers, as noted, so it's RHEL in disguise. If you're running 16 boxes with dual quad-cores, you'll lose the occasional disk drive. If you run 64 cheap desktops with single-socket dual-cores, you'll lose a disk drive every week or two.
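
      For anyone who wants to put numbers on the drive-loss estimate, a back-of-the-envelope sketch follows. It assumes one drive per box and a constant annualized failure rate (AFR); the 3% figure is only an illustrative assumption, not a measured value for any particular hardware.

```python
def expected_failures_per_year(n_drives, afr):
    """Expected number of drive failures per year for a given AFR (0..1)."""
    return n_drives * afr


def implied_afr(n_drives, weeks_between_failures):
    """AFR that would produce one failure every `weeks_between_failures` weeks."""
    failures_per_year = 52.0 / weeks_between_failures
    return failures_per_year / n_drives


if __name__ == "__main__":
    # Assumed AFR of 3% per drive, purely for illustration.
    for n in (16, 64):
        print(n, "drives:", expected_failures_per_year(n, 0.03), "failures/year")
    # What per-drive AFR would one failure every two weeks imply for 64 drives?
    print("implied AFR:", implied_afr(64, 2))
```

      Plugging in your own drive count and a failure rate from your vendor or your own logs gives a quick sanity check on how often nodes will need to be rebuilt.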

    2. Re:Rocks Cluster uses a modified Centos by 0racle · · Score: 2

      Ok, none of that is true. Even as a troll, that's pretty pathetic.

      --
      "I use a Mac because I'm just better than you are."
    3. Re:Rocks Cluster uses a modified Centos by KainX · · Score: 1

      The Rocks approach is nice for quickly regenerating a failed node. And it's Centos under the covers, as noted, so it's RHEL in disguise. If you're running 16 boxes with dual quad-cores, you'll lose the occasional disk drive. If you run 64 cheap desktops with single-socket dual-cores, you'll lose a disk drive every week or two.

      Of course, if you're using a modern (read: stateless) provisioning system, "regenerating a failed node" simply requires a power-cycle. And you lose far fewer disk drives since they're not used for the OS. And replacing a dead node with a new one is a single command and a power button.

      Systems like ROCKS only seem great if you haven't used anything else. :-)

      --
      Michael Jennings | HPC Systems Engineer, Lawrence Berkeley National Lab | Author, Eterm (eterm.org)
  21. Choose wisely by Anonymous Coward · · Score: 0

    1) You don't need X server. You only need ssh. And if you forward windows via ssh to the windows (or another linux) workstation, you don't need X server running on the server side.

    For stability use Slackware (pain setting up as the package management is weak, but it stays incorruptible for a lifetime). If you seek good package management, no-bullshit straight forward configuration and custom layout (e.g. X server, lightweight or no window manager, network and monitoring daemons), use Arch linux. Just avoid consumer-oriented distros, they usually come with fancy and gigantic desktop environment and confusing graphical configuration wizards.

  22. Distro isn't the biggie, it's the scheduler by javanree · · Score: 5, Interesting

    I've worked with various clusters over the past year.
    The distro doesn't really matter, mostly it's what you feel most comfortable with. I'd slightly favor RedHat Enterprise or a respin of it, since it's easiest in terms of drivers for commercial cluster hardware and commercial software support, but Debian would be just as fine. I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable). You don't want to have to update every week since this usually requires quite some work (making new images and rebooting all nodes).

    What I found out matters a lot more is the scheduler you will use; Sun Grid Engine, PBS, Torque or slurm to name a few. Every scheduler comes with its strong and weak points, be sure to look at what matters most to you.

    If you are unfamiliar with all of these things, pick a complete bundle like Rocks (it's based on RedHat Enterprise Linux), which makes setting up a cluster quite easy and still allows you to choose which components you want. That'll greatly improve your chance of success. But be warned; it's still a steep learning curve building and especially configuring a cluster. The most time is spent tuning queuing parameters to maximize the performance of your cluster.
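
    To give a feel for what users actually hand to the scheduler, here is a minimal sketch that builds a Torque/PBS-style batch script. The `#PBS` directives shown (`-N`, `-l nodes=...:ppn=...`, `-l walltime=...`, `-q`) are the common ones, but the queue name, resource sizes and the `mpirun` command line are made-up examples; other schedulers (SGE, slurm) use different directive syntax.

```python
def pbs_script(job_name, nodes, ppn, walltime, queue, command):
    """Return the text of a Torque/PBS-style job script.

    All argument values are supplied by the caller; nothing here talks to a
    real scheduler, it only formats the script.
    """
    lines = [
        "#!/bin/bash",
        "#PBS -N %s" % job_name,
        "#PBS -l nodes=%d:ppn=%d" % (nodes, ppn),
        "#PBS -l walltime=%s" % walltime,
        "#PBS -q %s" % queue,
        # Torque sets PBS_O_WORKDIR to the directory qsub was run from.
        "cd $PBS_O_WORKDIR",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 nodes x 8 cores, one hour, queue named "batch".
    print(pbs_script("demo", 2, 8, "01:00:00", "batch", "mpirun -np 16 ./my_app"))
```

    The resulting file would normally be submitted with `qsub`; the queue limits it runs under (walltime caps, per-user core limits and so on) are exactly the parameters that take the tuning time.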

    1. Re:Distro isn't the biggie, it's the scheduler by RichM · · Score: 1

      I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable

      That would depend on if you mean stable = "ancient" or stable = "modern but secure".
      Ubuntu LTS is in the latter category from my experience.

    2. Re:Distro isn't the biggie, it's the scheduler by javanree · · Score: 1

      We're not talking about the average desktop, where apparently loads of bling is 'needed' these days. A cluster is usually for processing loads and loads of data.
      Most in-house software usually compiles just fine on CentOS/RHEL/Suse, getting the compiler suites to work might even be easier than under Ubuntu (10.04 with Intel V11.1 libstdc++5 anybody?), much professional software isn't even supported under Ubuntu (and especially with scientific software such things DO matter)

      Trying to claim Ubuntu is more secure because it's more recent shows nothing but ignorance.

      What people want in an HPC environment is that it processes data; month after month, always with 100% predictable result and without downtime. A proper enterprise distro gives you that. The 0.01% performance gain in a HPC environment of a newer kernel doesn't make sense compared to the pains of having to roll your own updates because the distro maker stopped the updates. Just look at Ubuntu 8.04 LTS right now to see what I mean, compared to RHEL5 (which is older, but sees much more maintenance) And remember; you don't 'just upgrade' a cluster's distro after a year or two. The OS that you install after it's rolled in usually lasts for the lifetime of the cluster, or has to last for at least 2-3 years.

  23. Re:None of them by blair1q · · Score: 2

    Imagine a Beowulf cluster of BeOS, beotch.

  24. In Houston... by Anonymous Coward · · Score: 0

    Most of the seismic clusters I work with use Centos or Fedora.

  25. This Question by SleazyRidr · · Score: 2

    My comprehension of this question is roughly 'please have a flamewar about the different flavours of Linux.'

    1. Re:This Question by blair1q · · Score: 1

      ..."and whoever is left standing and doesn't have too much shit on him in the end shall be king!"

      Pretty much how all reviews go, minus the fun for the spectators.

  26. X window by turbidostato · · Score: 1

    "It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."

    So what? On one hand in order to run Linux graphic apps on Windows you need an X-Window server... on the Windows machine, not the Linux one. On the other hand, how is it that you *must* use GUI-based apps? There's *really* no operational alternatives? (I've been administrating Linux and Unix systems for almost two decades and I never needed -as in "must", GUI-based apps for that).

    1. Re:X window by grimsweep · · Score: 1

      Good chance that the GUI request deals primarily with user-friendly aspects of using the cluster. There are always alternatives to GUI-based apps, but there are plenty of times where using one will save you time and effort. Have you ever tried substituting Gimp with Image Magick? You can't beat the latter for batch image processing, but I wouldn't ask anyone to design a logo with it.

    2. Re:X window by bugi · · Score: 1

      He was asking for X libraries, not an X server. He's covered. No non-embedded distro will ship without such.

      In X-land, the server is what talks to your display device on behalf of your other programs. The server manages the scarce resource. The clients bribe the server for access.

    3. Re:X window by Eponymous+Bastard · · Score: 1

      I've been against this wall before. There are a few things to consider:
      - In a university environment the "compute cluster" is not going to be in a data center far away, but rather in "lab" (read office) with 16 8-core machines, so the machines might actually be used locally either with a monitor for each grad student or a KVM switch for the single student/admin. For newbie admins it's easier to flip the KVM switch and click their way through the admin guis.
      - In a mixed Win/Linux environment, you're right, all you need is a XServer on the windows side, but the only freely available ones are (as of a year ago or so):
        - An old version of MingW that hangs on the current Ubuntu and Debian desktops.
        - Cygwin/X, which is a pain to set up.
      - You can also set up VNC, but the split VNC/unix passwords, painful setup to start x sessions upon a vnc connection, and IIRC having to give everyone a VNC number was a PITA. Not to mention performance differences. Then again, given the requirement, they're probably thinking of something like this. Bring up the X server with a default session and people can connect remotely, or something
      - Not all newbie admins know how to install the desktop environment (gedit, apps, etc) without also installing the X server. Even for those who do, they might just disable starting gdm on boot, so that X is still available if you need it with a simple startx. It won't take up much resources when it's down, and 1GB of disk space isn't much nowadays

      Personally I access servers from either Linux or Cygwin, via ssh and X forwarding, but it's kind of hard to get it into windows people's heads that you use a server remotely without opening a "session" with desktop and start menu remotely. Also, even if the desktop environment isn't installed I usually install gedit for quick edits (I'm an emacs guy, but gedit is easier for the rest of the team, and for small one-line edits easier even for me).

      Though a pet peeve of mine is that there isn't a quick util to bring up the machine's application menu from the command line. Basically, pop up Gnome's or KDE's app/start menu. Why do I have to wade through hundreds of .desktop files to figure out how to run the SAPGui app in my desktop remotely from my laptop? The data is there in a standard format, there's code for it, just not available from the command line.

      Also keep in mind these aren't admins accessing the servers. For example some might want to bring up R's graphical UI and do some work on it. Some of the programs/plugins might make use of the resources on the other machines, but some are just normal desktop programs. If their workstations are windows, they might end up connecting remotely and bringing up an IDE for developing their calculation/analysis routines anyway, serving dual purpose as compute cluster and simple workstations.

      The start menu is a good way to see what is available and isn't in each remote machine. Whether the rest of the "session" framework is useful is another story.

    4. Re:X window by Anonymous Coward · · Score: 0

      I would, but then I'm an asshole.

    5. Re:X window by Dogers · · Score: 1

      For a Windows XServer, try http://sourceforge.net/projects/xming/

      Works great when I've needed it!

      --
      I am a viral sig. Please copy me and help me spread. Thank you.
    6. Re:X window by Doug+Neal · · Score: 1

      NX is also an option.

    7. Re:X window by cloudmaster · · Score: 1

      FYI:

      You can launch vnc via [x]inetd and have it connect to localhost via XDMCP, using PAM for authentication at the chooser you ultimately get. Don't bother with VNC passwords, and that frees you from having to give people specific port assignments. You (generally, depending on how you set it up) lose the disconnected support - but a setup like that is a real nice way to get around the difficulty (and cost) in setting up Windows X Servers; the VNC clients are all comparatively simple to set up and secure. :) And VNC generally uses less bandwidth than X11, at the expense of more processes running on the Unix side.

    8. Re:X window by turbidostato · · Score: 1

      "Good chance that the GUI request deals primarily with user-friendly aspects of using the cluster."

      Probably but then again, there's no exclusive need to use Linux GUI apps. Maybe he can find native Windows apps or browser-based.

      "Have you ever tried substituting Gimp with Image Magick?"

      Surely not, but I've used Image Magick when I needed to manage lots of images. In this context, anyway, I'd favour using a Windows native app for image processing, after all, a PNG is a PNG.

    9. Re:X window by Anonymous Coward · · Score: 0

      Maybe you didn't need it for administration, but a lot of scientific codes use GUIs to view the data afterwards or even for interactive data processing.

  27. Familiarity matters by Anonymous Coward · · Score: 0

    "The one you're (or your team/support staff is) most familiar with."

    If none qualify under that criteria -- you probably want to go for popularity... because that leads to ease of finding support from people familiar with it. In that case, Ubuntu or possibly Fedora or CentOS.

  28. SUSE Linux or SUSE Linux Enterprise by hotfireball · · Score: 2

    If you are OK to go with RHEL, you also can look at SLES: SUSE Linux Enterprise Server. They also have SUSE Studio where you can make your own appliances. If you are a large enterprise, they will even give you a SUSE Studio appliance to be hosted in-house in your company for your own needs. They also have SUSE Manager, which is the same as Spacewalk but has more features in it (and is backward compatible with Spacewalk).

  29. That depends on what you are trying to do... by Anonymous Coward · · Score: 0

    It's like asking "which screwdriver bit is the best to use for driving screws"... You probably have an idea what software you intend to run on the cluster. If any of those are commercial packages, they probably have their support and installation processes setup for particular distributions. It'd be nice if they'd support anything, but in practice, they don't. Consult the vendors of any commercial product you plan to use with the cluster and bias yourself towards whatever they say they will support. It's easier that way.

    Second, if that's not an overriding concern, then pick something that's reasonably popular or at least sticks to mainstream layouts and package availability. The reason is mostly because whatever has good mind-share is easier to find documentation and support for, not to mention contractors with experience.

    You are lucky in that, at its core, Linux provides a stable platform with a lot of features that are largely consistent between platforms. There's not going to be wild inconsistencies in performance or hardware support (there's some, but not much) and software is rarely distribution-dependent (though, some may require some configuration/modification if the developer makes too many assumptions about the environment).

  30. Red Hat for support by guruevi · · Score: 3, Insightful

    RH support is phenomenal and that's why a lot of businesses use it. If you want it on the cheap, go with what you're comfortable and have your specific calculation packages built in (Debian if you like apt and open source packages, RPM if you use a lot of commercial packages). If you're looking for performance and specific hardware enhancements, go Gentoo or one of its brethren. Go with something that you can easily re-image if you're looking for lots of changes in software lineups or conflicts.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
    1. Re:Red Hat for support by Anonymous Coward · · Score: 0

      Thanks, I needed a laugh.

  31. Scientific Linux 6.0 or RedHat Enterprise 6.1 by Billly+Gates · · Score: 2

    Scientific Linux 6.0 is built on Redhat Enterprise Edition 6 which is highly tested and tuned for server throughput, power management, and stability compared to a stock vanilla kernel. The performance will be much better than a stock Debian stable kernel or Ubuntu, for example. Redhat has a bunch of hackers. Scientific Linux also includes apps used by scientists, which may be your target market if you are a university. If your old cluster has scripts and tools optimized for Redhat and RPMs, then it makes sense to use a Redhat distribution base.

    If the scientific apps with Scientific Linux are not being utilized then just buy a license for RedHat Enterprise Edition 6.1. The licensing fees are affordable if you have the budget for a large cluster and switches. With Redhat Enterprise edition you have support too if something goes down.

    Remember, going free to save a few bucks is silly in an expensive project like this.

    1. Re:Scientific Linux 6.0 or RedHat Enterprise 6.1 by blair1q · · Score: 1

      Scientific Linux 6.0 is built on Redhat Enterprise Edition 6

      So is Scientific Linux 6.0 free?

    2. Re:Scientific Linux 6.0 or RedHat Enterprise 6.1 by blair1q · · Score: 1

      eh, nemmind. my brain wandered while my eyes worked over your other two paragraphs.

    3. Re:Scientific Linux 6.0 or RedHat Enterprise 6.1 by tokul · · Score: 1

      Redhat Enterprise Edition 6 which is highly tested

      Yet they fail to fix their own broken patches applied to some packages for years. Security updates = ok. Fixing things we fubared = no way, need major release for that.

  32. NPACI Rocks by jfp51 · · Score: 2

    NPACI Rocks without a doubt. It is Red Hat centric and you need to put in some work to understand how it ticks, but once you do so and set up your cluster properly, it is very solid and reliable.

  33. RPM based distro by Anonymous Coward · · Score: 0

    If a theme hasn't come up yet, go with an RPM or RedHat based distro. I would stay away from Fedora because of how fast development is on it, and go with either Scientific or CentOS. The skill sets that people around you have for Fedora will translate over just fine since they are all based on RPM. If you want to pay money and get support, then go with RedHat; otherwise CentOS is just RedHat EL without the RedHat name.

  34. Check your ISV and HW vendor by Anonymous Coward · · Score: 0

    I work in the industry, and know from personal experiences that Linux Distro selection is often almost religious.

    But what you should be doing is check with your ISVs what they support, and do the same with your HW vendor. Try to make a matrix and you may find that the only option is one of the Enterprise distributions.

    If you're in the rare situation that you have source code for everything, I suggest you look at Scientific Linux or CentOS.
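
    A rough way to build the matrix suggested above is to intersect the distro lists each vendor gives you. This is only a sketch in Python: the product names and the distros listed for them are made-up placeholders, not real vendor support statements.

```python
# Hypothetical support data: every product name and distro list here is a
# placeholder, not a statement about what any real vendor supports.
isv_support = {
    "solver_a": {"RHEL 6", "SLES 11"},
    "solver_b": {"RHEL 6", "SLES 11", "Ubuntu 10.04"},
}
hw_support = {
    "interconnect_driver": {"RHEL 6", "SLES 11", "Debian 6"},
}


def common_distros(*matrices):
    """Return the distros listed by every product in every matrix."""
    rows = [distros for matrix in matrices for distros in matrix.values()]
    if not rows:
        return set()
    return set.intersection(*rows)


candidates = sorted(common_distros(isv_support, hw_support))
print(candidates)
```

    With the placeholder data only the two enterprise distros survive the intersection; fill in what your own vendors actually tell you.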

  35. Heresy, I know, but... by Anonymous Coward · · Score: 0

    Microsoft Windows Server has an HPC (High-Performance-Computing) Edition. Several of the Universities and research companies use this product, and it now can use Windows Azure as on-demand worker nodes. Essentially you could set up only one "head" node and then just pay for computation when needed.
    http://blogs.msdn.com/b/ignitionshowcase/archive/2011/04/20/windows-high-performance-computing-bursting-into-windows-azure.aspx?wa=wsignin1.0

    If you're interested specifically in Linux, I recommend a Beowulf Cluster, which is what we used when I worked at the Space Center in Florida several years ago.
    http://beowulf.org/

  36. Funny how 128 cores used to seem like a lot by Quila · · Score: 1

    I was just pricing 2U database servers that had 32 cores each. A 128 core cluster is now just four small off-the-shelf servers in a rack for less than a hundred grand.

    1. Re:Funny how 128 cores used to seem like a lot by Nite_Hawk · · Score: 1

      I know, it's insane. Once the 16-core Interlagos chips are out you could do the entire thing in a fully populated Dell 6145 2U enclosure (2 nodes).

    2. Re:Funny how 128 cores used to seem like a lot by Bill_the_Engineer · · Score: 1

      Out of curiosity, what did you pick for environmental control (ie. heat) ?

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    3. Re:Funny how 128 cores used to seem like a lot by Quila · · Score: 1

      They're going into one of the dozens of racks we have. We have more of an issue of power right now than heat.

    4. Re:Funny how 128 cores used to seem like a lot by Skapare · · Score: 1

      More cores is not necessarily better. Some software, and even some algorithms, can do poorly with that much contention on memory access. OTOH, others can do much better. You have to understand the software you intend to run on it very well. Is it embarrassingly parallel?

      --
      now we need to go OSS in diesel cars
  37. Red Hat Enterprise Server by jschmitz · · Score: 1

    Period.

  38. x-window server? by tokul · · Score: 2

    It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations.

    Are you sure that you know? You run local x window server on your windows machine when you use x window programs.

  39. Go ask someone who knows what she's doing. by Anonymous Coward · · Score: 0

    Seriously, the distribution is not the issue. You need a good concept (LDAP/SSO, automated installation, patch management, network/clustered filesystems, ...) to be able to manage that beast in the long term. Have someone capable do it for you initially.

    Back to topic, i'm working with all the enterprise crap money can buy. Don't let anyone bullshit you into buying enterprise distributions because of certifications. You don't need that. Debian is a far better foundation both technologically and financially.

  40. Redhat or fedora by Anonymous Coward · · Score: 0

    Redhat or fedora
    good forum support

  41. baremetal by Anonymous Coward · · Score: 1

    http://www.returninfinity.com/baremetal.html

  42. CentOS, Scientific Linux, Ubuntu, Debian by MetricT · · Score: 4, Informative

    I've got 10+ years experience managing a large (2000 core, 1+ PB storage) compute cluster. If you're using one of those annoying commercial apps that assume Linux = Red Hat Linux (Matlab, Oracle, GPFS,etc.), then CentOS or Scientific Linux are the way to go.

    If you don't have that constraint, consider Ubuntu or Debian. apt-get is my single favorite feature in the history of Unix-dom. Plus, there are often pre-built packages for several common cluster programs (Torque, Globus, Atlas, Lapack, FFTW, etc.) which can get you up and running a lot faster than if you had to build them yourselves.

    1. Re:CentOS, Scientific Linux, Ubuntu, Debian by Anonymous Coward · · Score: 2, Informative

      I run matlab instances here on my debian vms - no problems. All in all, we have about 800 machines here over several clusters, and everything runs on debian.

    2. Re:CentOS, Scientific Linux, Ubuntu, Debian by enjar · · Score: 1

      MATLAB does not really care about your Linux distro. I run it on Debian all day long. So do all the other MATLAB users where I work, and there are a whole pile of them.

      PBS cares a lot. So much so that it's annoying. They cling to RH and SuSe like a retard to a popsicle.
      LSF is much better. Works fine pretty much wherever. Including Debian.
      Torque works wherever.

      I agree that apt-get rules.

    3. Re:CentOS, Scientific Linux, Ubuntu, Debian by m50d · · Score: 1

      Sure, but when you do have a problem, can you get support?

      --
      I am trolling
  43. Debian by Alex+Belits · · Score: 1

    Debian -- easy to manage, easy to create new packages for, least amount of nonstandard, distribution-specific stuff (except configuration files management, but that is a result of having to keep individual packages' configuration tied to packages).

    --
    Contrary to the popular belief, there indeed is no God.
    1. Re:debian by fak3r · · Score: 1

      while likely seen as a flame by most, I have to wholeheartedly agree with this statement. Also, I built, and am currently running, a 100TB cluster running Debian Squeeze and the GlusterFS distributed filesystem, so I'm not all talk!

  44. Response & questions by multimediavt · · Score: 1

    1. What types of computation is the cluster going to be used for? MD, CFD, ???

    2. What software will be used on the nodes? CHARMM, GAMESS, LAMMPS, NWChem, etc.

    3. Do you have a preference for a Linux distro? If not, it really doesn't matter that much if you are rolling your own cluster and software stack. It will just determine what things are used for package management and what services in the distro you might want to turn off in order to get the most memory for apps and not the base OS.

    4. You should be using SSH as the main interface for the actual compute nodes and maybe (big maybe) have an X server on the login/compile head nodes, but NOT the compute nodes. You want the compute nodes to be as bare as possible to conserve as much RAM and scratch disk space for apps as possible.

    Having said all that, CentOS, Fedora, SuSE and RHEL are probably the most popular on distributed memory clusters today. You will also want to make sure that whatever compilers you are using are compatible with the Linux distro you want to use, unless you are relying completely on gcc or binary applications. I have built many clusters from scratch and can be a point of contact should you have additional questions.

  45. debian squeeze by dermond · · Score: 2

    we run our 320 core cluster on debian squeeze. infiniband support out of the box. the gridengine is a matter of apt-get install. comes with tons of scientific software.

    1. Re:debian squeeze by Anonymous Coward · · Score: 0

      I built a 624 core cluster using IBM HS22 blades and Mellanox Infiniband hardware with Red Hat as the base OS and Concurrent Red Hawk Linux as the RTOS layer. Let's say that the Infiniband stuff was easy to get working in Red Hat (once I figured out what the hell everything was), but Red Hawk was very, very difficult. Everything had to be custom compiled from the Open Fabrics Alliance source, stripping out EVERYTHING IB related from the Red Hat and Red Hawk distros.

  46. Swing the license for RHEL by Zemplar · · Score: 2

    Scientific Linux is totally awesome, but a project of this size, especially with the IT knowledge on hand, needs the support and first-rate product which RedHat provides.

  47. RHEL5 or Ubuntu10.04 by digitaldebris · · Score: 1

    If your cluster is going to be closed circuit (no internet access) I would recommend RHEL5, as finding and installing RPMs is generally easier when you're not able to use the default distro package utility. If your cluster will have access to the internet (you'll be able to use the distro package app) I'd recommend Ubuntu 10.04, as the distro repository is up to date and constantly growing.

  48. Rocks built on CentOS or Scientific Linux by Anonymous Coward · · Score: 1

    For building and maintaining a small cluster, especially to anyone whose main job is not going to be maintaining the cluster, you should take a look at Rocks. It actually builds on top of a regular Linux distro, although only certain distros work. Redhat Enterprise, CentOS, and Scientific Linux are mentioned in the documentation as being compatible.

    What Rocks does is add a bunch of cluster-specific tools to the underlying distro. It helps take care of networking and setting up the compute nodes for easy maintenance and configuration. You basically configure your front end, and then it is extremely simple to manage the compute nodes (including installation; installation by default is done over the network between the compute node and the head node). I have also found the Rocks mailing list to be extremely helpful even to folks who are new to building clusters.

  49. RHEL or SLES will prevent ulcers by Anonymous Coward · · Score: 0

    I work for an HPC vendor, thus the anonymous posting.

    If you’re going to be running commercial codes/solvers/etc., you need to stick with one of the RPM-based distros. It’s all they test against. If you’re going to be running a fancy interconnect (Infiniband, etc.), that may further restrict your choices.

    If you’re running homegrown code, go with your gut.

  50. Building Clusters by Nite_Hawk · · Score: 5, Informative

    Hi,

    I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them. We run SLES on most of our systems, but CentOS and Redhat are fine, and I'm using Ubuntu successfully for an Openstack cloud. Rocks is popular though ties you to certain ways of doing things which may or may not be your cup of tea. Certainly it offers you a lot of common cluster software prepackaged which may be what you are looking for.

    More important than the OS are the things that surround it. What does your network look like? How are you going to install nodes, and how are you going to manage software? Personally, I'm a fan of using dhcp3 and tftpboot along with kickstart to network boot the nodes and launch installs, then network boot with a pass-through to the local disk when they run. Once the initial install is done I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes). It becomes extremely easy to replace nodes by just registering their mac address and booting them into an install. This is just one way of doing it though. You could use cobbler to tie everything together, or use FAI. xCAT is popular on big systems, or you could use system imager, or replace puppet with chef or cfengine... Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
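
    To illustrate the "register the mac address and boot into an install" step, here is a small Python sketch that renders ISC dhcpd host declarations for network-booted nodes. The node names, MAC addresses, IP addresses and the pxelinux.0 boot file name are assumptions for illustration only; a real setup along the lines described above may look different.

```python
def dhcp_host_entry(name, mac, ip, boot_file="pxelinux.0"):
    """Render one ISC dhcpd host declaration for a network-booted node."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac.lower()};\n"
        f"  fixed-address {ip};\n"
        f'  filename "{boot_file}";\n'
        f"}}\n"
    )


# Hypothetical nodes: names, MAC addresses and IPs are made up.
nodes = [
    ("node01", "00:16:3E:00:00:01", "10.0.0.11"),
    ("node02", "00:16:3E:00:00:02", "10.0.0.12"),
]

config = "".join(dhcp_host_entry(*node) for node in nodes)
print(config)
```

    Replacing a dead node then comes down to changing one MAC address in the node list and regenerating the file.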

    Or if you are only talking about like 8-16 nodes, you could just manually install ubuntu on the nodes, pdsh apt-get update, and make people schedule their jobs on google calendar. ;) For the size of cluster you are talking about and what I assume is probably a very limited administration budget, that might be the best way to go. Even with something like Rocks you are going to need to know what's going on when things break and it can get really complicated really fast.

    1. Re:Building Clusters by clutch110 · · Score: 2

      This post is full of good information. I have been managing HPC for seismic companies for the past 8 years now. I regularly use xCAT as I find that after a few nodes automation is the way to go.

      You will find that most clusters run RedHat or a variant of the OS. Most places run CentOS on the nodes and have a machine with RedHat stashed around somewhere in case a problem occurs and they need to reproduce it on a "supported" OS.

      Why is there a requirement for a full blown X install? Are these machines desktop boxes or are they racked? Typically you have a thin client software installed at the cluster gateway. We use both NX and ThinAnywhere today.

    2. Re:Building Clusters by Anonymous Coward · · Score: 0

      Even if you're just building a small cluster, the exercise of configuring DHCP, tftpboot, and an automated installer like kickstart or AutoYast is well worth doing. You'll learn lots of things there that are reusable in just about any systems deployment context. Also, pay attention to hardware drivers for whatever high-speed networking system your cluster will be using, and whatever stacks are layered on top of that (for example, OFED or NFS/RDMA). Getting those set up correctly will save you from hours of painful debugging and lost cluster time.

    3. Re:Building Clusters by Junta · · Score: 1

      Incidentally, I'm an xCAT developer and am always interested in ways to make it scale down a bit better as well as it scales up. Historically, it has been worth it at large scale, but a bit too much configuration for small systems. A lot of settings support autodetect now and if you don't care much about the nitty-gritty, applying the default templates provides a serviceable set of groups that can drive configuration instead of micromanaging all sorts of details.

      In terms of DHCP, it generally allows and uses nice features of ISC DHCP without requiring more complex config than dnsmasq, which is probably one of the nicer things it does. Notably if you are doing iSCSI, deploying Windows, want a nfs or ramroot system, or other fancy stuff xCAT does a good job of taking care of the requisite details (I'm a bit biased).

      I'm always interested in other ways to make it easier or criticisms on how it's deficient. Preferably on the xcat-user mailing list.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    4. Re:Building Clusters by gatkinso · · Score: 1

      A decent post, but I find your sig lacking.

      Please append one of these sounds: http://www.allaboutbirds.org/guide/Red-tailed_hawk/sounds

      --
      I am very small, utmostly microscopic.
    5. Re:Building Clusters by sash · · Score: 1

      I second the comments of Nite_Hawk. I should add that the other big question you'll face is monitoring and alerting; look at Ganglia and Nagios/Icinga.

      Our team manages the Online/DAQ system of the ATLAS experiment at CERN, about 2000 PCs. Central configuration is of course a must; we use Scientific Linux CERN 5 and netbooted, single image for most "worker" nodes, with specialisation done after boot by an in-house scripting system; servers and other special purpose systems are managed via Quattor or Puppet. Quattor is used a lot at CERN, I find that it is less flexible and requires more work than Puppet but its (currently) stronger package management gives better guarantees of uniformity. Netbooting a single image is excellent to guarantee that all systems are the same and you just need to reboot to be like on a new clean install; but it's a lot of work to setup and maintain, so it's only justified over large numbers.

      Personally I also manage the grid cluster at University of Johannesburg (~30 nodes) and there I'm using Cobbler and Puppet, they do a nice job and don't take too much effort.

      PS shmux is your friend :-)

    6. Re:Building Clusters by GPSguy · · Score: 1

      There is also the occasional need for something like VNC when you absolutely, positively have to have that remote desktop look for your visualization software.

      --
      Never ascribe to malice that which can adequately be explained by tenure.
    7. Re:Building Clusters by Anonymous Coward · · Score: 0

      I agree on xCAT. In a past life, I was responsible for several clusters of 60 compute nodes (each with four quad core processors), in different environments, and the best way to manage them all was to have xCAT and Warewulf installed on a centralized server. Each system got a standardized boot image via tftpboot, and if we wanted to make a change to the system, we just had to edit a file in one place, run two commands, and every server had the file.

  51. Custom kernel doesn't gain you a thing by Anonymous Coward · · Score: 1

    Doesn't gain you a thing. Drivers are loaded on demand as needed for local hardware. Unused drivers are not loaded at all, and do not impact performance or memory usage.

    All custom kernels can do at most is to reduce the size of the initrd file used during boot.

    The initrd is a compressed cpio file containing the contents of a memory resident root filesystem used during hardware initialization. Once hardware is identified, then required drivers (disk/video/keyboard/mouse) are loaded and the real root filesystem is used (using the driver from the initrd).

    Once the real root is mounted additional drivers (if any) may be loaded as directed by configuration.

    The only gain in a custom kernel is reducing the time to compile a kernel...
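
    If you want to check this on your own nodes, the list of currently loaded modules is in /proc/modules. A minimal Python sketch that parses it, assuming the usual whitespace-separated layout with the module name first and its size in bytes second (the sample lines are made up):

```python
import os


def parse_proc_modules(text):
    """Map each loaded module name to its size in bytes."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            modules[fields[0]] = int(fields[1])
    return modules


# Illustrative sample in /proc/modules layout; the values are made up.
sample = (
    "ext4 372736 1 - Live 0x0000000000000000\n"
    "mbcache 16384 1 ext4, Live 0x0000000000000000\n"
)
loaded = parse_proc_modules(sample)
print(sorted(loaded))

# On a Linux host, the same parser can read the live list.
if os.path.exists("/proc/modules"):
    with open("/proc/modules") as handle:
        live = parse_proc_modules(handle.read())
    print(len(live), "modules loaded,", sum(live.values()), "bytes")
```

    Only modules that show up in that list are actually resident; anything else shipped by the distro kernel is just a file on disk.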

    1. Re:Custom kernel doesn't gain you a thing by sumdumgai · · Score: 1

      And what about things like processor optimization and memory handling? How do you turn on or off hyperthread scheduling or pre-emptive multitasking with a module? These things are important to HPC.

      So, I completely disagree with your assertion.

      --
      "In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
  52. Debian by wirelesslayers · · Score: 1

    I built one with Debian Lenny, plus I developed a scheduling system in Perl using kernel containers + cgroups. So the research team would think they own a "real" Linux and I can share resources in a better way. Also I used perl-cgi to make a containers design, so that no one touches my real OS. The research leader just uses my container design to create and deploy new containers; the new container is then booted on a testing machine so the person can install whatever he needs, and it is cloned later for deployment on the cluster (400 cores + 1.8TB RAM).

  53. Ask HPC people by Anonymous Coward · · Score: 0

    First, you need to ask this question in a place where folks are familiar with HPC, not a general forum where anyone can tote the flag for the Linux Flavor of the Week(tm) - since the term 'cluster' means failover to most. Second, you need to consider which distributions are supported by your solvers and schedulers. Third, why do you need to run X at all? You should probably look into an X server on your Windows desktops instead and save the computational cycles for, well, computation.

  54. you have no idea what you are talking about!!! by Anonymous Coward · · Score: 1

    but that is ok! rather than asking quickie questions and expecting quickie answers, you can start by learning the difference between a system administrator and IT professionals.

    "user friendliness" is proportional to sysadmin's abilities or proportional to $$$ for commercial tech support

    1. A (stale) link to get you going on hpc clusters: http://www.hpccommunity.org/section/kusu-45/

    2. http://www.platform.com/ - Dell's/Redhat official hpc cluster (at least a couple of years ago) which was based on kusu (see previous link). In other words, RH was(is?) using a third party for their RH HPC - correction needed if things have changed. - a great yo-yo system (DellRedHatPlatform) in case you have issues.

    3. http://www.caoslinux.org/

  55. Scientific Linux by scheme · · Score: 2

    A lot of this depends on what you're doing with your cluster and what apps you're running. However, Scientific Linux is used by quite a few large clusters, and all of the US ATLAS and CMS clusters run on it. As others have mentioned, you probably want to be more interested in how the cluster is managed and how nodes are set up and kept up to date. I'd recommend something like cobbler and puppet or some other change management system so that you can set up profiles and have them propagated to the various nodes automatically. This is preferable and easier than going through and making the same configuration changes on 5-10 machines.

    --
    "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  56. Re:None of them by MikeDirnt69 · · Score: 2

    Wrong too. Use the distro you work better with.

    --
    Am I eval()? - http://www.monst3r.com.br
  57. Debian/Ubuntu by dogmatixpsych · · Score: 1

    I'd have to agree with the Debian/Ubuntu route if you want user friendliness. I've always found Debianesque systems much more manageable than other distros. If I have to provide most of the IT myself, I prefer Debian/Ubuntu. There are some science Debian distros as well (and repositories).

    Scientific Linux would likely be faster overall for computationally heavy tasks but it really depends on what you are planning on doing. Debian wouldn't be slow, just not quite as fast as Scientific Linux; but again, that might not matter very much in the big picture.

  58. Debian/Gridengine by jbazik · · Score: 1

    I run a 730-core cluster on debian/gridengine. We're a debian shop, and keeping the cluster platform the same as our desktops is an advantage. Configuring gridengine takes some effort, but so far we're pleased with the result.

  59. Scientific Linux is your choice by Anonymous Coward · · Score: 0

    I work in a national supercomputing centre and I can only recommend you to use Scientific Linux and Quattor to manage configuration and installation. There's nothing as powerful as Quattor and it's used by many computing centres (IN2P3, CERN, etc)

    As scheduler and batch system I recommend you use Slurm or the combination of Torque/Moab. Run away from Maui!!!! And LSF is so damn expensive!

    We run cray's SLES and scientific Linux with both Slurm and torque/Maui and we're very happy with our clusters.

    In addition, I recommend Lustre as the shared FS. Relatively easy to use and install.

    Good luck with your setup!!!

    M

  60. Re:Ubuntu 10.04 LTS - Why? by Andy+Dodd · · Score: 1

    Simple question. The OP asked WHY you feel that is a solution for large-cluster HPC.

    It looks like so far your only reason is "i liek it!" - I personally have no opinion or experience with HPC clusters, but so far nearly all of those who do are recommending something that is either RHEL or RHEL-based (Rocks or Scientific Linux), if only because it allows you to leverage commonality with the big cluster operators with installations in the Top500.

    Disclaimer: I'm an Ubuntu user, and I greatly enjoy it, but I have not seen many examples of actual scientific clusters running it.

    --
    retrorocket.o not found, launch anyway?
  61. Scyld Beowulf From Penguincomputing by adamy · · Score: 1

    Disclaimer: I worked on the product for a number of years. I now work at a different Linux company...

    Scyld is built on top of Red Hat EL, can also run with CentOS, but uses a custom Kernel. It has a lightweight provisioning mechanism that makes maintenance of compute nodes very easy, and the single system image approach makes job management significantly easier than a traditional Beowulf cluster. I don't know if they test it out with Scientific Linux these days.

    --
    Open Source Identity Management: FreeIPA.org
  62. Centos 5 or Ubuntu by Anonymous Coward · · Score: 0

    CentOS 5 is currently the favorite among the HPC sysadmins I work with, and it's rather similar to RHEL. That said, Ubuntu has a much nicer package manager (read: functional) and so could be a lot easier. It really comes down to what you are most comfortable with.

    More important is the system software you run on top of the cluster.

    We use and like for a very similarly sized cluster:
    Red Hat Kickstart for install
    Bcfg for system configuration
    Nagios + Ganglia for monitoring
    Torque + Maui for batch scheduling

  63. RHEL/Scientific Linux & Perceus by Kludge · · Score: 1

    We use RHEL/Scientific Linux & Perceus (http://www.perceus.org/). It is solid and easy to add new nodes.

  64. BCCD by pu'u_bear · · Score: 1

    http://bccd.net/faq - Debian based Bootable Cluster CD. 'nuf said.

    --
    --You're BOTH right. It's a floor wax AND a desert topping!
  65. Rocks by Anonymous Coward · · Score: 0

    CentOS + cluster management + common libraries needed for HPC. We had a couple of clusters at my last job and they worked well. You can set up the nodes to net boot, change the configuration in one place, reboot the nodes and voilà, everything is upgraded.

  66. Re:Ubuntu 10.04 LTS - Why? by wytcld · · Score: 0

    It's not said what the cluster is for. If it's going to be a private cloud, Ubuntu jeos (just enough OS) VMs are far quicker to configure than RHEL/CentOS 5.x VMs, if your taste runs to setting up VMs from scratch rather than just cloning them. Don't know what RH 6 offers there (and Scientific Linux, which unlike CentOS offers a RH 6 variant). You also haven't specified your file system's layers. Ubuntu support by most file system projects is equal to that of RHEL, in terms of easily installed packages being available. The exception is with some parts of RH's own preferred cluster stack, which I haven't used. DRBD, for instance, is perhaps better supported on Ubuntu than RH. GlusterFS is better tested in RH, but runs as well, and is packaged as well, for Ubuntu (and is still a bit immature in either case, but worth considering down the line).

    The main advantage of Ubuntu is you get the exact same product as support is sold for, without being required to buy the support. RH forces you to go to CentOS or Scientific Linux to come close to that, and there are differences in those - mostly in the package managers. And RH, if you do license it, pays out part of your money to patent trolls, whom RH essentially supports so that they may more productively attack its competition - other open source projects.

    --
    "with their freedom lost all virtue lose" - Milton
  67. X11 ...server? by Fishbulb · · Score: 2

    Correction: the X11 server runs on your glass; eg: your Windows system. All you need then are X11 clients on the Linux cluster nodes.

    So yeah, you'll need the libs and other support files for X11, but not the server itself. You'll save a bit on disk space by not installing the server. If it's just a single X11 client you need to run, then you can figure out exactly what it needs and not have a bunch of other crap (fonts, *GL, window managers, libs you're not using...) installed. Plus, you won't have a daemon running that takes resources despite being idle, and is an attack vector since it manages user logins.
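
    To see where a client on the cluster will draw, look at its DISPLAY variable, which has the form host:display.screen. A hedged Python sketch that splits it apart; the example values are assumptions (a forwarded SSH session typically shows something like localhost:10.0), and the 6000 + display port rule only applies when the X server is reached over TCP:

```python
X11_BASE_PORT = 6000  # conventional TCP base port for X servers


def parse_display(value):
    """Split an X11 DISPLAY string into (host, display, screen)."""
    host, sep, rest = value.rpartition(":")
    if not sep:
        raise ValueError(f"not a DISPLAY value: {value!r}")
    display, _, screen = rest.partition(".")
    return host, int(display), int(screen) if screen else 0


def tcp_port(display):
    """TCP port an X server for this display number would listen on."""
    return X11_BASE_PORT + display


# Example values only; "ws42" is a hypothetical workstation name.
for example in ("localhost:10.0", "ws42:0", ":0"):
    host, display, screen = parse_display(example)
    print(example, "->", host or "(local)", display, screen, tcp_port(display))
```

    An empty host part means the client talks to an X server on the same machine, which is exactly what you do not need on the compute nodes.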

    1. Re:X11 ...server? by WuphonsReach · · Score: 1

      Correction: the X11 server runs on your glass; eg: your Windows system. All you need then are X11 clients on the Linux cluster nodes.

      Ah, the joys of X11 terminology.

      I can understand why they flip-flop the server/client locations and understand that it's technically correct, but it confuses the hell out of folks more often than not.

      --
      Wolde you bothe eate your cake, and have your cake?
    2. Re:X11 ...server? by Anonymous Coward · · Score: 0

      They probably do it the ass backwards and completely redundant way everyone seems to like to do, run X11 client and server on the server box itself and use a VNC server + client to connect to it (makes sense, it's just like windows right!)

      Besides the fact that most clustering software I've used can be controlled from the shell, so ssh would work just fine (unless the software he's using does something retarded like require the full GUI to run on each server)....

    3. Re:X11 ...server? by GPSguy · · Score: 1

      Generally, your head node will need an X client, but NOT the compute nodes. You won't have to log into them per se, but the head node, where you submit your jobs, does have to get to them. In general, the compute nodes in an HPC environment are hidden away on a private network, and don't see the outside world, And, for that matter, shouldn't (let's not talk about OSG requirements, or things that ATLAS and CMS are promulgating).

      Another consideration is cluster-local visualization: As datasets grow, it becomes less practical to bring whole datasets back to your desk, and then process them for a quick look at results. Instead, initial, and perhaps all, analysis should be considered on the cluster. This argues in favor of an X installation, and GPU acceleration hardware on at least the head node, a dedicated graphics/analysis node(s), or perhaps the whole cluster.

      And, so far, no one has spoken of favorite compilers. gcc's not bad but not stellar for a lot of HPC uses. Portland Group and Intel have done good things, IMNSHO, in the compiler world, and PGI is starting to incorporate nVidia GPU compatibility in their stuff.

      --
      Never ascribe to malice that which can adequately be explained by tenure.
  68. a vote for Gentoo by smoothnorman · · Score: 1

    I built a small (32 node) Beowulf cluster for an informatics group at the University of Bonn. We started off with SuSE, discovered that it was hard to get some drivers compiled, then went to Debian, discovered that some of the boot up scripts were a bit troublesome for keeping up high availability, then went to Gentoo (http://www.gentoo.org/) and were quite pleased how *everything*, including rebuilding a node up from the boot loader, could be scripted. Of course every situation has its unique hazards, but if you want a tight system with everything under your control, it's hard to beat Gentoo (however, good ol' Debian came very close).

  69. Who will administrate it ? by godrik · · Score: 1

    I believe you can successfully build a computational cluster from any linux distribution. I am sure you could go wild and use slackware if you want.

    But I guess the question is who will administrate the cluster? From what you say, I feel like you will, and you say yourself you don't know much about that. Then I would recommend keeping the distribution installed by the vendor because they will probably give you software support. But if you change it, they probably won't.

    Important things have already been said. But in summary the question is what you are going to do with the cluster. What applications are going to run on it? Are you going to develop applications to run on it, or are you using premade applications? If you are developing on it, you probably want more up-to-date software. If you are using some premade applications, you probably want the best compatibility...

    1. Re:Who will administrate it ? by dbIII · · Score: 1

      who will administrate the cluster?

      That can actually be an easy job so long as everything is consistent. If a new person comes in cold and can't find any docs they can make their mistakes learning about the system on a node nobody is using. IMHO it's easier than looking after desktop computers or file servers. For one thing if a node has a hardware problem you can afford to wait a few days for a part to be shipped in instead of screaming for expensive 24hr on site support. I've got 19 nodes turned off at the moment just to save power and quite a few of the ones running are idle. If things get busy or somebody wants another node they can be up in a couple of minutes.

      distribution installed by the vendor

      They probably know a lot less about what is required than whoever has to set up the cluster. Once you put the software you bought the nodes for on there and set up queues etc it's different enough from stock RHEL or centos or whatever that the hardware vendors will be reluctant to support it.

  70. Ubuntu 10.04 LTS with Sun Gridengine by Falcdragon · · Score: 2

    I'm not really an Ubuntu fan, but with the cluster I manage (120 physical cores, 960GB RAM) we've ended up going with Ubuntu 10.04 running Sun Grid Engine for a couple of reasons:
    - The LTS support: by the time the support period ends we should be replacing the hardware anyway.
    - It provides a Grid Engine package by default (might not be the latest, but it's good enough) for distributing the workloads.
    - A lot of people are already familiar with Ubuntu.
    - Most third-party apps provide support for it.
    - It's very stable.
    - It's free.

    Note: if your users are heavy R users, have a look at installing the Revolution R package from the third-party repositories. It can provide some massive speed-ups for matrix work and a number of other jobs.
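
    For anyone wondering what "distributing the workloads" with Grid Engine looks like in practice, here is a minimal sketch in Python that writes an array-job script. The job name, the ./simulate command and the task count are made-up placeholders; the "#$" directives (-N, -cwd, -t) and the SGE_TASK_ID variable are standard Grid Engine, but check them against the version your distro packages.

```python
def sge_array_script(job_name, command, n_tasks):
    """Return the text of a Grid Engine array-job script.

    job_name, command and n_tasks are illustrative placeholders,
    not anything taken from a real cluster.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/sh",
        f"#$ -N {job_name}",   # job name shown by qstat
        "#$ -cwd",             # run in the directory the job was submitted from
        f"#$ -t 1-{n_tasks}",  # array job: one task per index 1..n_tasks
        # Grid Engine sets SGE_TASK_ID to the index of the running task
        f'{command} "$SGE_TASK_ID"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sge_array_script("sweep", "./simulate", 120), end="")
```

    Saving the output to a file and handing it to qsub should queue 120 independent tasks, which the scheduler spreads over whatever cores are free.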

  71. Re:Gentoo by miknix · · Score: 2, Insightful

    With some chance of being modded down, I suggest Gentoo Linux. With Gentoo you can compile your kernel and everything else which might give you some arguable performance increase. Because Gentoo is a source-based distribution, it might help you with scientific development because all the library (boost, itpp, lapack, etc) headers (and source) are immediately available. There is support for scientific libraries like atlas, ACML, etc.. and you can easily change the default library for blas/lapack using a simple command line. You can also find up to date scientific software in the official Gentoo repository.
    I don't know about you but I find very useful being able to inspect the code of core libraries and patch it for my needs, if needed.
    Just my 2 cents.

  72. Can you imagine... by Anonymous Coward · · Score: 0

    You are building a Beowulf cluster? Pretty neat. Can you imagine a Beowulf cluster of those?

    1. Re:Can you imagine... by GPSguy · · Score: 1

      Yes. Next question?

      --
      Never ascribe to malice that which can adequately be explained by tenure.
  73. No X servers required on servers... by ianezz · · Score: 1

    It has to have an X-windows server since we use that remotely from our Windows

    Just to clear out a misconception that arises from time to time: you do not need an X server on a server exactly in the same way you don't need a web browser on your HTTP server. To understand that, you can think of an X server as a "browser" for the X protocol. On the server you just need some support libraries (which help applications in talking the X protocol).

  74. Change of Tack by AdmV0rl0n · · Score: 1

    I've read through the suggestions, and many good ones have been posted.

    But, we live in the cloud age. I'll suggest taking a look around to see if your compute requirements could be met by using an available resource in the cloud. The opportunities there are exploding, and hard to gather info on as it's fast moving, but *if* your compute can be made to fit in and around something like Cloud Foundry, Azure, or perhaps Hadoop or other number crunching cloud ops, the advantage is they only charge for what you use. So you can ramp up or down (at least this is the theory) your compute power to a greater degree and with more flex than you can by building all your own nodes.

    (And I know, the question poser asked about Linux, but with compute and cloud sometimes it's good to move away from platforms, and focus on the actual compute needed. You care about the calc, and not about which flavour it crunches on.)

    Just my tuppence in this complex question..

    --
    We`re all equal .. Just some of us are less equal than others.
    1. Re:Change of Tack by GPSguy · · Score: 1

      A lot of the monte carlo schemes, and especially the nuclear physics ones I've encountered recently, need a LOT of memory. Most of the cloud options are sort of memory-lite. And, map-reduce (Hadoop) isn't the answer to everything, even if my boss thought so at one time.

      --
      Never ascribe to malice that which can adequately be explained by tenure.
  75. X server by iceaxe · · Score: 1

    I'll leave the clustering distro advice to others, but if I understand your needs regarding X-windows, what you need is an X server running on your windows (or other ) client machine so that the program running on the cluster can display on your desktop/laptop. The X programs may need appropriate libraries, but you don't need an X server running on the cluster.

    See Xming for a good, free, open source X server for windows. There are other options available, but that's what I use, and find it to be stable and reliable. (For a Windows program... )

    Then use putty to SSH to your cluster, with X11 forwarding to your locally running X server.
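
    If you want to check from the cluster side which X server a program is going to draw on, the DISPLAY variable tells you. Below is a rough Python sketch that splits DISPLAY into host, display number and screen. The "display number of 10 or more on localhost means SSH forwarding" rule is only a heuristic based on OpenSSH's default X11DisplayOffset of 10, and the function names are my own.

```python
import os


def parse_display(value):
    """Split an X11 DISPLAY string of the form [host]:display[.screen]."""
    host, sep, rest = value.rpartition(":")
    if not sep or not rest:
        raise ValueError(f"not a DISPLAY value: {value!r}")
    number, _, screen = rest.partition(".")
    return host, int(number), int(screen) if screen else 0


def describe_display(value):
    """Guess where the X server behind a DISPLAY value lives."""
    host, number, _ = parse_display(value)
    if host in ("", "unix"):
        return "local X server on this machine"
    # Heuristic: sshd normally hands out localhost:10, localhost:11, ...
    if host in ("localhost", "127.0.0.1") and number >= 10:
        return "probably an SSH-forwarded display"
    return f"remote X server on {host}"


if __name__ == "__main__":
    value = os.environ.get("DISPLAY")
    print(describe_display(value) if value else "DISPLAY is not set")
```

    Run over an SSH session with X11 forwarding enabled, this should report a forwarded display, meaning the windows end up on the X server running on your desktop.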

    --
    WALSTIB!
  76. debian by Syobon · · Score: 1

    you will be much more comfortable with debian squeeze, RPM sux

  77. Asking in the wrong place... by rgbatduke · · Score: 1

    It's a FAQ there, but you really should be asking this on the beowulf list, after skimming the list archives for any of the eight and a half million answers (in gory detail) that have been posted there in response over the years. Slashdot has plenty of nerds and I'm sure a lot of cluster geeks (who are likely on the beowulf list) but the beowulf list is sort of distilled cluster geekery/wisdom.

    http://www.beowulf.org/mailman/listinfo/beowulf

    rgb (Google "rgb duke beowulf" if you like -- I used to help answer this question once a month a few years ago on list, although I'm too busy and less active now.)

    --
    Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
  78. http://ttylinux.net by taoboy · · Score: 1

    The smallest glibc distro I know. Doesn't come pre-configured with cluster tools, doesn't even have prebuilt packages for them. But, it'll easily compile most of the software you require (C++ is one exception, I had to rebuild the compiler), and, most importantly, has a build system you can use to put together your own .iso which can be installed in under 5 minutes, probably even less. Has recent 2.6 kernel and latest glibc, which means it'll also run executables built in other equivalent distros. I've run the Sun (oh, Oracle....) JVM with it, no modifications required.

  79. What Will You Run, and Who Will Run It? by KainX · · Score: 1

    I'll preface this by saying that I'm an HPC admin for a major national lab, and I've also contributed to and been part of numerous HPC-related software development projects. I've even created and managed a distribution a time or two.

    There are two important questions that should determine what you run. The first is: What software applications/programs are you expecting the cluster to run? While some software is written to be portable to any particular platform or distribution, scientists tend to want to focus more on science than on code portability, so not all code works on all distributions or OS flavors. Small clusters like yours often focus on a few particular pieces of scientific code. If that's the case for you, figure out what the scientists who wrote it use, and lean strongly toward using that.

    The second question is, who will run it? Many small, one-off clusters are run by grad students and postdocs who work for their respective PI(s) for some number of years and then leave. In this scenario, it's important to make sure things are as well-documented and industry-standard as possible to ease the transition from one set of student admins to the next. (And yes, PI-owned clusters have a surprisingly long lifespan. Usually no less than 5 years, often longer.) To that end, I strongly recommend RedHat or Scientific Linux.

    We, and most large-scale computational systems groups, use one of two things: RHEL and derivatives, or vendor-provided (e.g., AIX, Cray). We run CentOS but are moving away from it ASAP. The Tri-Labs (Livermore, Sandia, and Los Alamos) use TOSS, which is based on CHAOS (https://computing.llnl.gov/linux/projects.html), which is based on RHEL. Many other sites use Scientific or CentOS. Older versions of Scientific deviated more from upstream, which caused sites like us to use CentOS instead. That's no longer true with SL6, and since CentOS 6 doesn't even exist yet (and RHEL6.1 is already out!), there are strong incentives to move to SL6.

    Let me address some other points while I'm at it:

    Why RHEL? If you can run RHEL itself, do so. RHEL isn't built with the same compilers it ships with; the binaries are highly optimized. Back when we were working on Caos Linux, we did some benchmarks that showed RHEL (and Caos, FWIW) to be as much as twice as fast as CentOS running the exact same code. So if performance is a consideration, and you can afford a few licenses, it's definitely worth considering. The support can be handy as well, particularly if this is a student-run cluster.

    Why Scientific Linux? If you need a free alternative to RHEL or are running at a scale that makes RHEL licensing prohibitive, SL is the way to go, without a doubt. It's maintained professionally by a team at Fermilab whose fulltime job is to do exactly that. They know their stuff, and they're paid for it by the DOE. Other rebuild projects suffer from staffing problems, personality problems, and lack-of-time problems that SL simply doesn't have.

    Why not Fedora? Stability and reliability are critically important. Fedora is essentially a continuous beta of RHEL. It lacks both the life-cycle and life-span of a long-term, production-quality product.

    Why not Gentoo? Pretty much the same answer. The target audience for Gentoo is not the enterprise/production server customer. Source-based distributions do not provide the consistency or reproducibility required for a scale-out computational platform. You'll also have a hard time getting scientific code targeted at Gentoo or other 2nd-tier distributions.

    Why not Ubuntu or Debian? Ubuntu is a desktop platform, not a server platform. Again, it boils down to their target market. There's really no value-add in the server space with Ubuntu, so why not just run Debian? If Debian's what your admins know best, it's worth considering, but keep in mind that very, very few computational resources run Debian, so you may have to do a lot more fending for yourself if you go that route.

    Why not SLES? Mostly a pers

    --
    Michael Jennings | HPC Systems Engineer, Lawrence Berkeley National Lab | Author, Eterm (eterm.org)
    1. Re:What Will You Run, and Who Will Run It? by javanree · · Score: 1

      Bright Cluster Manager is quite nice, but still lacking loads of things. The shell is powerful but very cryptic, the graphical interface doesn't allow certain operations to be done on a range of selected nodes for instance...

      Also the 'integration' with for instance Sun Grid Engine (which is supported) is not very thorough (especially with regard to setting up queues, something the Sun tools already suck at). I still need Bright Cluster Manager + the SGE tools + Sun ARCo to get almost everything done, and at times it feels like a lot of duplicated effort and some good old *NIX handiwork is still required to really get things moving.

      However the development team is very receptive to constructive feedback and much has changed over the past 6 months.

    2. Re:What Will You Run, and Who Will Run It? by Anonymous Coward · · Score: 0

      Bright Cluster Manager is quite nice, but still lacking loads of things. The shell is powerful but very cryptic, the graphical interface doesn't allow certain operations to be done on a range of selected nodes for instance...

      Also the 'integration' with for instance Sun Grid Engine (which is supported) is not very thorough (especially with regard to setting up queues, something the Sun tools already suck at). I still need Bright Cluster Manager + the SGE tools + Sun ARCo to get almost everything done, and at times it feels like a lot of duplicated effort and some good old *NIX handiwork is still required to really get things moving.

      However the development team is very receptive to constructive feedback and much has changed over the past 6 months.

      Javanree: you must have a fairly old version of Bright Cluster Manager, because I can select ranges of nodes, groups, categories, etc. What is especially cool is that I can export configurations from one cluster to another cluster through the GUI. Also, on my cluster the integration with SGE and Slurm work very nicely.

    3. Re:What Will You Run, and Who Will Run It? by javanree · · Score: 1

      Interesting... are you on 5.1 then? We're 'stuck' on 5.0 but with the latest release (at least no newer version is available through yum). The export to another cluster sounds awesome! Just had another look, it's not there in our version.

      As for integration with SGE : I can see the queues; I just can't modify most of the queue properties.

  80. ME ME! by Osgeld · · Score: 1

    I will shout out some random distro and not say anything else to back it up!

  81. Linux Distro by Anonymous Coward · · Score: 0

    I would strongly recommend that you base the choice of distribution on the scope of that server (apps server, SOX server, DR server, etc.) and from that standpoint start screening the different distributions across the board, but do not base your decision on a distribution being "user friendly".
    You should keep in mind distribution support, community, knowledge database, maintenance, etc.
    Red Hat is quite a good choice.

    Just my two cents

  82. easy, go for "rocks clusters" or scientific linux by Anonymous Coward · · Score: 0

    having designed and built super computers that have been in the top500 go for a rhel based derivative like "rocks clusters" or scientific linux.

    128 cores is a bit small and probably doesnt justify setting up a "proper" cluster with submit, compute, data, etc nodes. But i would recommend rocks cluster highly as you won't have to mess around with all the sysadmin stuff too much.

    I recommend a rhel derivative primarily because most (paid) packages come in this format and are fully supported. you don't want to waste time diagnosing weird nuances - it is a waste of time, effort and performance.

    Recommending ubuntu - seriously, I am a major ubuntu fan but for a cluster? no way. the problems in a cluster are completely different from a server install or a desktop install - it will cause so much grief in the end youll regret the choice.

    The basic principles of most cluster distributions are managing jobs, managing resources, and that nodes are disposable (as in, if one fails you can rebuild it entirely with a single command). cluster distributions allow this by centrally managing every aspect of it. to rebuild a node you simply power it on; it generally pxeboots and bootstraps a partition on the hd. if there is no partition it downloads an image for that specific node, installs it and configures it, no hassle no fuss no thought. the only thing you need do is specify its mac address in a file.

    managing software is such a pain, you want something simple and easy to deploy packages. this is where a rhel based distro works well as most commercial software (such as portland compiler) comes in a nice easy rpm with vendor support. the packages you will want or need are already set up and ready in those distributions - you won't need to jump through hoops to get it working.
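
    To make the "mac address in a file" step a bit more concrete: a plain PXELINUX setup looks for a per-node config file named after the node's MAC address ("01-" for Ethernet, then the MAC in lowercase hex with dashes). A small Python sketch of that mapping follows. The "<node-name> <mac>" file format is invented for this example, and this is not a description of how rocks handles it internally.

```python
import re


def pxelinux_config_name(mac):
    """Return the per-node config file name PXELINUX looks up for an Ethernet MAC."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    octets = [digits[i:i + 2].lower() for i in range(0, 12, 2)]
    # "01" is the ARP hardware type for Ethernet; the MAC follows in lowercase hex
    return "01-" + "-".join(octets)


def load_nodes(text):
    """Parse lines of '<node-name> <mac>' (a made-up format), skipping blanks and # comments."""
    nodes = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, mac = line.split()
        nodes[name] = pxelinux_config_name(mac)
    return nodes


if __name__ == "__main__":
    example = "compute-0-0 00:11:22:33:44:55  # rack 1\ncompute-0-1 00:11:22:33:44:56\n"
    for name, cfg in load_nodes(example).items():
        print(f"{name}: pxelinux.cfg/{cfg}")
```

    Adding a node then comes down to one new line in the list; everything else is generated from it.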

    resource management and job scheduling: go for something big - i'd recommend sge/oge, and try to avoid pbs based ones (such as open pbs and maui/torque). Times have changed and the free pbs ones are grotty. even paid versions aren't too great.

    simply put, go for "rocks clusters" or scientific linux using sge. don't try and go at it alone like some people here are suggesting - it's too much hassle and can quickly grow beyond what you'd be able to manage without help.

    a bit of background: ive been doing this on and off since 2001 and done it in to the hundreds of thousands of cores scale. ive designed and built almost every configuration.

  83. Re:Gentoo by cloudmaster · · Score: 2

    The problem with that suggestion is that the people maintaining the code don't have a clue what QA means. And before people whine - I used Gentoo as my primary distro for around three years. The emerge system is great - but the data inside is crap.

    If you want to build your stuff from source and actually have a working system, look at the Debian-based distros. There's this nifty "apt-build" thing that lets you build software with whatever compile options you want (so you can still do -O3 -funroll-loops on everything if you really hate memory), just like Gentoo does. And there are packages for just about everything; partially because Debian's been around forever, and partially because "just about everyone" uses Ubuntu now. Gentoo does have a few "hacking" apps which are hard to find on other systems, but that's irrelevant to this discussion (and BackTrack is the way to go for that stuff anyway, IMHO). The primary difference is that you can build with source code that will actually work, and probably won't blow your system up when you just do a routine update. Whereas with Gentoo, some random kid who's too 'leet for testing might just promote to stable a new version of Xorg or Apache (both real examples from experience) which works fine on his system but breaks everyone else's in the world. And by "might" I mean "will". :)

    I'm posting that mostly because quite a few Gentoo users think that only Gentoo (and maybe some of the BSDs) can easily rebuild a system from source, so they put up with atrocious quality assurance (which is admittedly extremely difficult given the Gentoo user base, and supposedly has gotten better) because they don't know that there are quite usable alternatives that are also more mainstream.

  84. SLES by thatkid_2002 · · Score: 1

    SuSE Linux Enterprise Server is a proven HPC solution if management kicks up a stink and demands commercial support.

    I've worked with SLES and have been fairly happy with it.

    Scientific Linux is probably the optimal choice though ;)

  85. use what you're comfortable with. by markhahn · · Score: 1

    128 cores isn't enough to worry about - just install a distro you like and feel comfortable maintaining. although 128 cores isn't many, you should probably think about the style of install you want. lots of people seem to like diskful installs - afaict purely because it's familiar. most significant clustering sites use diskless (NFS root) though, because it's so much easier to maintain. there's never any question of nodes getting out of sync. traffic due to NFS root is trivial. another best-practice is to configure 1-2 admin nodes (no users, provides NFS, scheduling and monitoring services), one or more dedicated login nodes, and discourage users from touching the compute nodes directly (among other things, give them non-routable addresses.) get or make a ticket system to keep track of user and system issues. monitor the heck out of your systems.
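
    a quick way to sanity-check the "non-routable addresses" part is python's standard ipaddress module. this is only a sketch, and the addresses in it are made up. note that is_private covers more than the RFC 1918 ranges (loopback and link-local count too), so treat it as a rough filter.

```python
import ipaddress


def publicly_routable(addresses):
    """Return the addresses that do NOT fall in a private/reserved range."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_private]


if __name__ == "__main__":
    # Hypothetical compute-node addresses; the last one should not be there.
    compute_nodes = ["10.1.0.5", "10.1.0.6", "192.168.1.20", "8.8.8.8"]
    exposed = publicly_routable(compute_nodes)
    if exposed:
        print("compute nodes with routable addresses:", ", ".join(exposed))
    else:
        print("all compute nodes are on private addresses")
```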

    I'm an HPC center admin and system programmer, 10+ years. I think we've been in the top 50 several times.

  86. Quantian by flyneye · · Score: 1

    Quantian live cd has backported Openmosix kernel and beaucoup science and math goodies.
    http://dirk.eddelbuettel.com/quantian.html or if website is down just google " quantian" and check out the cached page.
    I found it not hard to use, when I last used it a few years ago, which was a nice feature when Openmosix was in progress.
    Frankly I miss it.

    --
    *Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
  87. Re:None of them by darkpixel2k · · Score: 1

    Just imagine a Beowulf cluster of bullcrap!

    Can you imagine the licensing costs of Windows bullcrap(tm)? At least with Linux it's free...

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  88. You don't need an X Server on the cluster... by Anonymous Coward · · Score: 0

    That goes on your remote hosts. The servers will be at run level 3, and this cluster should not be outward facing anyway.

    As the most knowledgeable IT person on the team, you should know this.

  89. Doesn't really matter by afitz0 · · Score: 1

    I maintain multiple ~100 core clusters and made extensive use of a few other clusters each with 10k-100k cores. RedHat, CentOS, Debian, Suse, Cray Linux, etc. They all use something slightly different.

    What it really boils down to is this:
    (a) Is it Linux?
    (b) Are you, as the primary maintainer, comfortable with it?

    If your distro of choice answers "Yes" to both of those, then you've made a decent choice.

    If you end up going to tens of thousands of cores, the choice will make more of a difference. But at your scale, it's really just what you're comfortable with.

  90. Re:None of them by cashman73 · · Score: 3, Interesting

    Actually, in the realm of biomedical supercomputing, "none of the above" has already been done. Check out the Anton supercomputer designed and built by D.E. Shaw Research. The entire supercomputer, right down to all of the processor cores themselves, were specially designed and built specifically for molecular dynamics research. The system has no operating system and, as such, no overhead. Every processor cycle goes straight into the calculations. It is capable of churning out simulations of 150,000+ atom protein complexes on the order of several microseconds long, using wallclock CPU time of a few days.

  91. Avoid Esoteric Choices When Possible by Anonymous Coward · · Score: 0

    I've had some experience setting up very small computing clusters, but it's given me some insight into how things work on bigger clusters as well. There *are* some specialized cluster-oriented distros. My experience with them, though, is that they're flaky and out-of-date and lacking in documentation. Actually, you will have trouble finding documentation in general; there's some good docs on how to choose hardware and a very broad view of cluster design and administration. There's some cluster schedulers and management software that has good highly-technical documentation available with the sources of some clustering software, but there's very little documentation that covers the middle ground of choosing what software to use to administer your cluster and how to integrate it together.

    My advice is: go with a general purpose operating system you are reasonably familiar with. Give up on user friendliness. Your clustering software will not be user friendly. It was not designed that way. At best, it was designed to be easy to administer or flexible in its application by users; at worst, it's just an arcane mess of utilities that keep getting used because once you know how to use them, they work, and nobody has sufficient time or determination to replace them with something better. The applications you run on your cluster aren't particularly likely to be user friendly either. That said, the extent to which red hat or debian are user-hostile is not-so-apparent to the end user, only the system administrator.

    Getting a distro with a stable base is a must; Debian stable, or a RHEL derivative is probably your best choice, at least if you're only considering a Linux-based OS. You will almost certainly end up having to compile and install your own software and the smaller the changes in the base packages on your system, the less extra effort it is for you to maintain your custom-packaged software.

    To the extent possible, you also want to avoid relying on esoteric software for critical parts of your cluster infrastructure. For example: AFS has a bunch of nice features as a clustered file system, but unless you *really* need them you will be greatly increasing your sysadmin's stress-level by making him try and get everything else you choose to use integrated with AFS; NFS sucks in a lot of ways, but it's the "standard choice" for networked storage on unix, so there's less software that will integrate poorly with it and lots more documentation explaining how to set it up and administer it. It's a fact of life that you will *have* to run esoteric software on your cluster, but you don't want to do it unless you really need to.

    Lastly, if you want one knee-jerk explanation of what to choose for an HPC cluster in terms of the software stack:

    torque - cluster scheduler
        - use the integrated job scheduler, forget about building the tcl gui, it's *much* harder to use than the CLI tools
    centos - operating system, maybe scientific linux or debian instead
    nagios - use this to monitor your cluster
    nfs - to share files between cluster nodes
    samba - to share files with windows, if you can't get by with ftp/sftp/scp
    ldap/nis - if you can completely secure your network, nis is easier to use than ldap, even though it *is* completely insecure
    nx/freenx - for remote graphical login; on a LAN you can get away with VNC but nx performs much better over a DSL connection

  92. Re-use existing resources as much as possible by enjar · · Score: 1

    If you are in a Debian shop, use Debian.
    If you are in a RedHat shop, use RedHat.

    The main reason is that if you already have other large clusters running $distro then someone already figured out deployment, maintenance, package management, drivers and a hardware vendor. Picking up a new distro throws aside major piles of work that have already been done and gives you the pleasure of re-inventing the wheel.

    I'm guessing you'd rather get your cluster up and running, doing Real Work, rather than spending a bunch of time getting user authentication working correctly. Especially when somebody already did that.

    Also, keep in mind that you do things like go on vacation and get sick, so the other people who are intimately familiar with whatever you already have can help out (and you can help them, too)

  93. As a cluster admin.... by Fallen+Kell · · Score: 1

    As an admin of a cluster that has evolved and changed over the last 7 years, I think I can help a bit. That being said, you truly failed at defining your real needs. I understand liking to stay with what people know, but from an end user point of view with interactions with a cluster, the only interface that they use is the only thing that needs to stay similar, and you can almost certainly use the same interface on any other linux distribution. So that said, what are you currently using to submit, queue, and schedule jobs? There are a few proprietary solutions out there and several open ones. There is Grid Engine (or Sun/Oracle Grid Engine), PBS, Maui, Torque, and several others out there. That should be the only real interface that an end user should have to the cluster.

    Now comes the second question I have. Why do they need to be running X? I can understand having the X server installed and all the libraries, but you absolutely should only be running your servers at run level 3 (i.e. command line). You can still run applications if you set your display to a remote X server as the output device, in this case one that you run on your Windows desktop, like Cygwin/X or Xming. All you do by running X.org on the cluster nodes is waste about 1 gig of memory and 5-10% CPU resources, which could be utilized by your end users' jobs/applications.

    Third, what kind of applications are your end users running? Are they real parallel environment applications using some sort of MPI (LAM/MPI, OpenMPI), or off the shelf products like Clustered Matlab (which actually uses MPI, but it is built into the product already, you just need to configure it properly)? Or are you really just running lots of batch jobs which may or may not be multithreaded applications, but do not do any inter-node communication?

    Fourthly, how are you monitoring your existing cluster? Are you using something like Ganglia?

    Finally, what kinds of third party software do you need to be able to run/use? Is there anything commercial which may have support limited to specific linux distributions?

    All of those things are questions that you need to really answer in order to recommend a distro.

    All things being equal, personally, I would deploy a cluster using the "Rocks Cluster" distro. It is designed from the ground up to be an easy to maintain and deploy cluster distribution. There are plenty of HPC specific packages/applications/libraries available to be deployed on the nodes. "Rolls" are available, which basically contain a group of packages/applications/tools which are typically used together, or otherwise easily configure/install software that is required on each system, possibly with some complex interactions (for instance there is a "Ganglia" roll, which easily installs the Ganglia cluster monitoring software and automatically sets it up based on your Rocks installation. There is a "BIO" roll, which contains many open source tools and libraries which are useful in biological research clusters, like ClustalW, Glimmer, NCBI BLAST, just to name a few. Then there is the HPC roll, which is just some basic things like MPICH, MPICH2, OpenMPI, iozone, iperf. There is also a roll for PVFS for setting up a quick Parallel Virtual File System cluster).

    It is designed from the ground up to be a cluster, not just a bunch of nodes running linux with high speed interconnects. It has management utilities to deploy applications across all nodes at once, and to quickly install the OS on all your cluster nodes via PXE booting the compute nodes. You can flash/upgrade the BIOS of compute nodes remotely via PXE boot. Basically it is designed to be managed and maintained as a cluster, not "x" number of individual systems. Seriously consider something like it.

    http://www.rocksclusters.org/wordpress/

    --
    We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
  94. Scyld by Disoculated · · Score: 1

    Admittedly, my experience is a few years out of date, but it used to be that the immediate answer to this question was Scyld, the direct descendant of the original "Beowulf" cluster created at Goddard Space Flight Center by Donald Becker. We used it for 3d rendering and video processing and it was really slick, and being based on RHEL it was easy to get people who knew how to work on it/software updates/support in forums, etc.

    I've only seen one comment in support of Scyld here. Has it fallen out of favor for some reason?

  95. CUDA by Anonymous Coward · · Score: 1

    If you are planning a cluster for numerical or combinatorial computations, the best option is using an nVidia card. You will have about 300 cores for at most $3,000. And there are no special cooling and space requirements.
    There are commercial nVidia drivers for Linux, but Red Hat may be the best option due to its support.
    I really think the cluster epoch is over; GPU computing is now the most economical and efficient way to do massive computation.

    You can also get AMD cards but nVidia has better technology (CUDA)

  96. But Rocks or OSCAR? by Anonymous Coward · · Score: 0

    I'm a first timer. This is going to be useful for me too. Thanks for the comments.
    I have downloaded CentOS 5.6 and intend to put OSCAR on it.
    But everyone is mentioning Rocks. What about OSCAR? Should I stay away from it?

  97. Re:Gentoo by miknix · · Score: 1

    The problem with that suggestion is that the people maintaining the code don't have a clue what QA means.

    Gentoo developers don't maintain code, they maintain software packages. That means our main objective is to distribute the software to the users with minimal modifications, so that you get pretty much what upstream developers distribute. The only exceptions are when a patch that fixes a bug, security vulnerability or even a build problem is available; then we try to integrate it earlier than upstream. We also ensure the build system works, and we have documented policies on how to do that. Besides the regular stabilization process, there is also a QA team responsible for checking minimal ebuild (the portage recipes on how to compile software) quality. I had some commits reviewed by them so believe me when I say they are quite picky:
    http://www.gentoo.org/proj/en/glep/glep-0048.html

    And before people whine - I used Gentoo as my primary distro for around three years. The emerge system is great - but the data inside is crap.

    For someone using Gentoo for around three years, that doesn't seem like a very insightful answer, does it? What is the "data inside"? Are you referring to ebuilds? :)

    If you want to build your stuff from source and actually have a working system, look at the Debian-based distros. There's this nifty "apt-build" thing that lets you build software with whatever compile options you want (so you can still do -O3 -funroll-loops on everything if you really hate memory), just like Gentoo does

    It is not the same thing. I was told by some people that they had to keep asking the sysadmins to install the development packages of scientific libraries on SuSE Linux (the same applies to Debian); depending on the use case, that can take weeks.

    And there are packages for just about everything; partially because Debian's been around forever, and partially because "just about everyone" uses Ubuntu now.

    Debian is not very famous for up to date software, or is it? Sure, you can add alternate repositories, but you don't need to do that on Gentoo..

    The primary difference is that you can build with source code that will actually work, and probably won't blow your system up when you just do a routine update. Whereas with Gentoo, some random kid who's too 'leet for testing might just promote to stable a new version of Xorg or Apache (both real examples from experience) which works fine on his system but breaks everyone else's in the world. And by "might" I mean "will". :)

    Obviously, Gentoo's policy on package stabilization can't catch every single package problem out there. The policy was made to allow a fair trade-off between stability and availability. We could increase stabilization times, but software would be available in a less timely manner...

    Major breakages are not only the Gentoo developers' fault. Sure, sometimes a Gentoo developer messes up and makes users rebuild all of their installed software, but most of the time the cause is either bad decisions by upstream developers or a major change (which breaks stuff) that is really needed. If you understand how library linking and versioning works, I don't think I have to explain further..

    Oh and by the way, latest versions of Portage have a nice feature called "preserve-libs" which prevents breakage if the API of a library changes..

    I'm posting that mostly because quite a few Gentoo users think that only Gentoo (and maybe some of the BSDs) can easily rebuild a system from source, so they put up with atrocious quality assurance (which is admittedly extremely difficult given the Gentoo user base, and supposedly has gotten better) because they don't know that there are quite usable alternatives that are also mor

  98. Of course you install Gentoo... by oomkiller · · Score: 1

    ... because then you can take advantage of the cluster. It would take seconds (or minutes if installing KDE) instead of days and weeks for the compilation... don't forget MAKEOPTS="-j129" as the manual states ...

  99. Submitter here by DrKnark · · Score: 1

    Hello,

    Thank you all for the informative replies, this will help us in deciding what to use.

    It seems that Redhat or a variant thereof is what most of you agree is good, so we will probably go with one of those. Especially since that is what we have used in the past.

    The reason for having X is that we work in X; some of the software we use needs it for various reasons such as plotting. This will only be used on one node. Since this will be a small cluster (probably 4 boxes with 32 cores each) we do not intend to build a separate box for running X. We might use one of the old boxes for X, but I think we still would want the same distro on all of them for simplicity. (Oh, and to those who asked: these will be in racks and not used for desktops)

    Answer to another question that came up: This is for use at a university, we will be using it mainly for (nuclear physics) simulations/calculations based on Monte Carlo methods.

    Again, many thanks!

    1. Re:Submitter here by GPSguy · · Score: 1

      As pointed out earlier in the thread, you're not defining requirements well.

      I think you're going to want to consider setting up a cluster front-end. You generally do not want to run X on all the nodes: Let them run the Monte Carlo sims and don't waste memory or resources allowing users to hammer each node. Or, allow it now, and regret it later when performance plummets.

      Consider GPGPU (nVidia Tesla, realizing that AMD/ATI have GPGPU options, but I am not versed in them yet) for improved performance in calculations.

      Have you looked around your university? Is there anyone else running clusters with whom you could partner? My group does exactly that: We run a cluster and while I also am a numerical modeler, we provision and operate a cluster that serves users in agriculture, nuclear engineering, petroleum engineering, atmospheric sciences, HEP, chemistry and the social sciences. Your questions suggest, to me, that your time is better spent as a researcher and not as a system administrator.

      And while we're here... One of my pet peeves is when a professor takes a grad student who came into a program to get their degree in, say, nuclear physics, and turns them into a system administrator and user support girl for the group. Either instead of, or in addition to, their scientific career, they have to manage the computing resources and learn how all the software works. In my experience, if they're good graduate students and conscientious, they will do a great job, but will not get the education they came for. They may get the degree, but they are likely doomed to supporting other users who got a better education. They're still good, in fact, indispensable, to a research program, but they were sacrificed with little input to their future. Better, if that's what you need, to actively recruit for someone who wants to learn the field to better become a computational expert with a discipline track in your field, nurture them, and if they are deserving, provide said terminal degree. I really don't like sacrificing an unsuspecting graduate student to the HPC gods for a faculty member's benefit.

      --
      Never ascribe to malice that which can adequately be explained by tenure.
    2. Re:Submitter here by DrKnark · · Score: 1

      Hi,

      As I already said, no we will not be running X on all the nodes. One of them will run it, with a few cores reserved for the purpose (this is the way we do it today and we have no significant issues with the arrangement). But like I say, we may very well decide to use one of the boxes from the old cluster for this task, seems like a good idea. But I still think we'd like the same distro to make the administration bit easier.

      One of our most used software packages is practically closed source (to get access to the source you have to pass a series of security screenings as well as motivate why you really REALLY need the source, it's almost impossible for non-US citizens). This software does not implement GPGPU functionality at this point, and whether or when it will be supported is unknown to us. Therefore, getting GPUs at this point would probably not be the best idea. It has been discussed, and we might look at getting boxes that can be upgraded with GPUs in the future.

      As for being an administrator, I volunteered for it so I'm not "unsuspecting" :) I have my reasons for wanting to do it, won't go into them here. But it will not count towards the time I have for research, that is already reserved specifically for research. We will be at least 3 people sharing the load on this too, I don't really see a problem with it.

      Yes, we have looked at the other clusters around here. The problem with them is that we have to pay for them per unit of processing time per specific project. These will be used for larger jobs that our local cluster cannot handle, the local one will mainly be used for developing software to run on the larger ones as well as running smaller simulations and verifying algorithms. This is very useful to us, since it would be a pain to plan small things like that ahead of time. And we do need more processing power than we have today, even smaller jobs of the type we do require some power.
      Our current setup works fine for this and it has been of great use, but it is getting old and is beginning to be a limitation. So now that we're considering upgrading we wanted to do some research first, hence the question.

      Thanks very much for your advice, I will keep your comments in mind. Maybe we can find someone else who wants to run a cluster for a similar purpose. Though we really do like having one reserved just for us as well. This is a general opinion in the department, and it would probably be hard to convince everyone to do it any other way. Any additional comments are of course welcome!

  100. xCAT2 by Junta · · Score: 1

    Currently supports RHEL, spotty support for Fedora, Scientific Linux, CentOS, SuSE Linux Enterprise 11, Windows 2008 and up, and ESXi 4 and up.

    Debian and Ubuntu have made appearances in trunk, but I haven't tried it out personally yet.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  101. Scientific Linux by cyfer · · Score: 1

    After 7 years working at CERN on the Grid project for the LHC, I would recommend Scientific Linux. The target of this distro is to run the largest grid in the world (gLite) on a free Red Hat. The EPEL repository also has many tools for large scale computing maintained by the grid teams at CERN and Fermilab.

  102. Re:None of them by Pigskin-Referee · · Score: 1

    Can you imagine the licensing costs of Windows bullcrap(tm)? At least with Linux it's free...

    Until you need technical support or drivers for new devices.

    --
    Pigskin-Referee
    Linux: Yesterday's technology, tomorrow ...
  103. FreeBSD might prove beneficial by Pigskin-Referee · · Score: 1

    If you do not have cutting edge devices on your system, FreeBSD might be a good choice. It is quite stable, although the number of devices it supports is somewhat limited. It also offers a fairly good support system.

    --
    Pigskin-Referee
    Linux: Yesterday's technology, tomorrow ...
    1. Re:FreeBSD might prove beneficial by yosephi · · Score: 1

      FreeBSD has worked very well on our cluster.

  104. The distro matters in many ways by Skapare · · Score: 2

    The distro does matter, often in ways not particular to being a cluster, but perhaps in ways making it easy to manage in general. For example, I'm moving away from Ubuntu (server) because it is too hard to selectively upgrade a single package or group of packages without imposing an upgrade on other packages. This is where "hand holding" has turned into "wrist crushing". So I'm moving to Slackware (which is getting a lot more capability through the SlackBuilds community).

    --
    now we need to go OSS in diesel cars
  105. Re:Gentoo by Blymie · · Score: 1

    Bah.

    Any statements about 'up to date' software immediately show a glaring lack of comprehension about code stability.

    Debian is only behind, if you like to use beta quality software. People with server farms, managing large quantities of data, don't WANT the latest and greatest, they want STABILITY. Stability is thousands, yes thousands of times more important than new features in code.

    Gentoo has its place. However, that place is not anywhere near a data center, not anywhere near a corporate office, not anywhere near a server farm. Anyone with any competence in the real world won't use Gentoo for serious work, for the reasons listed above. Frankly, if you show me a resume with the word "Gentoo" on it, you're not going to get hired.

    Gentoo's very nature ensures that it will *ALWAYS* be a BETA or even ALPHA quality build product. That's not because it's compiled from source, that's because of the way Gentoo manages packages, and because of a dozen different things that are in other projects to work towards stability. Gentoo seems to think that nothing is more important than the latest and greatest.. and as a result....

    Well... instability is what you get.

    (Before people get all silly about this, that doesn't mean Gentoo doesn't have its place. However, stop trying to tell me that a home-built car should be deemed street worthy -- without even having to abide by the current legislation for street worthy cars!)

    (Lastly -- comments from the above post, such as "We could increase stabilization times but software would be available in a less timely manner..." and "Gentoo developers don't maintain code, they maintain software packages." and "but most of the times are either bad decisions from upstream developers or because a major change (which breaks stuff) is really needed." show how stability is the last thing on a Gentoo package maintainer's mind...)

  106. Re:Same as MS, Oracle, Apple.... by OldHawk777 · · Score: 1

    Competitive market pricing for technical support or drivers (hardware vendors often provide) for new devices is available for Linux, BSD....

    MS, Apple, Oracle... and hardware vendors will at their discretion provide the same as L/FOSS at a higher non-competitive price and bug-fixes or crap-design current WinOffice toolbar when/if they want.

    Closed-crap software is never competitive, but is customer-hostage focused for gross-profits and low MOTSSS overhead.

    IOW: If you want pfuck yourself, but don't phuck US, EU, or others.

    --
    Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
  107. Debian = a lot of work by cbroad · · Score: 1

    If you have experience with a particular distribution, go with that. I set up a 512 core cluster using Debian about five years ago. If you go that route, I suggest using FAI for installation. That way you can re-image your systems on reboot, easily keep things up-to-date, and make config changes system-wide by just rebooting your nodes. Many software packages, both commercial and open source, are RedHat focused. I had to create my own deb packages for many of them. This trend is not as strong as it was before, but RedHat still dominates the software world. Take that into consideration and know what you're using it for. As for building a RedHat cluster, I can't comment on that like all of the others who have never built one. I don't have enough experience to give any thoughts towards it.

  108. Re:Gentoo by miknix · · Score: 1

    I am sorry but your reply is sliding a little into a flame war and possibly out of context, so I'm going to stop right here.

  109. Understand your requirements by hargrand · · Score: 1

    You're specifying a solution before you seem to have really articulated your requirements. For example, you have identified the following:

    1) ...new cluster (smallish, ~128 cores)
    2) It has to have an X-windows server
    3) Implied use of a Linux distribution

    These are all different aspects of the solution. What you should first do is identify and document some use cases and some performance requirements. The closest you come to this is:

    A) User familiarity
    B) Remote access

    But these alone are insufficient to justify any expense implementing a new system. Therefore, I suggest you don't upgrade and instead use the current system.

  110. Thirded... by Anonymous Coward · · Score: 0

    Rocks is awesome. The bittorrent based install system scales beautifully across standard installs. To push out software updates you just update the package description and then re-install the whole cluster. The whole process took about 12 minutes for 40-something blades and should take a similar amount of time for many many more due to the peer-to-peer install process. *highly* recommended.

  111. The more the better for database by Quila · · Score: 1

    That wouldn't be true for one app running a few very taxing queries, but in our case it's getting hammered by dozens of apps running hundreds of smaller queries per second, which parallelizes rather well.

  112. Re:None of them by hairyfeet · · Score: 1

    I'm sure I'll get hate for pointing this out but it's true: Linux is free if your time is worthless. I have looked into offering Linux as an alternative OS in my little retail shop for years, and every year I find nothing has changed. Until Torvalds either retires or someone fires his irritating ass so that Linux can FINALLY, after everyone else (Solaris, BSD, OSX and Windows, hell even OS/2) has had them for over a decade, get a stable kernel level hardware ABI so drivers don't shit themselves and die every time Linus gets an itch to fuck shit up in the kernel, then Linux will remain a black hole of time wasting where you have to spend days or even a week or more every six months doing the "forum hunts" trying to find "fixes" for the multitude of drivers Torvalds breaks constantly.

    Which makes sense if anyone would look at it from an engineering instead of religious dogma perspective, as you are talking about literally tens of thousands of drivers nearly all of which have to interact with a kernel that Torvalds treats as his personal plaything and with little regard to the thousands of man hours he is pissing away, not only by all the developers that have to go in and fix what he has broken but in all the hours users waste with forum hunts.

    Not to mention how many Linux users it ultimately ends up costing because mom&pop retailers like me, the kind of guys YOU NEED to get on board Linux, as we have NO real ties or support from MSFT and in our position could really help Linux with sales and after sale support who stay away from your OS because with all the man hours forum hunts suck up it makes Linux literally MORE expensive than Windows. My time is a minimum $35 an hour, at the rate it only takes 2.5 hours to make Linux more expensive than Windows 7 HP and I can easily waste half a day on a forum hunt, what with searching for, tweaking, and multiple attempts to get said fix working.
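
    For anyone who wants to check the break-even claim above, here is the arithmetic as a short Python sketch. The $35/hour rate and the 2.5 hours come from the comment; the $87.50 license price is not stated there and is only what those two numbers imply, so treat it as an assumption.

```python
def troubleshooting_cost(hourly_rate, hours):
    """Labor cost of time spent hunting for fixes."""
    return hourly_rate * hours


def break_even_hours(license_price, hourly_rate):
    """Hours of labor after which the 'free' OS costs as much as the license."""
    return license_price / hourly_rate


RATE = 35.0             # dollars per hour, from the comment
IMPLIED_LICENSE = 87.5  # assumption: 2.5 hours at $35/hour

print(troubleshooting_cost(RATE, 2.5))          # 87.5
print(break_even_hours(IMPLIED_LICENSE, RATE))  # 2.5
print(troubleshooting_cost(RATE, 4))            # 140.0, roughly "half a day"
```

    The conclusion depends entirely on how many hours you actually lose, which is the part the thread disagrees about.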

    So until the day comes I can sell a box with Linux on it and be confident that the drivers will continue working for at LEAST three years minimum, preferably five, then "Linux is free if your time is worthless" is simply the truth for all those that do this as something other than a hobby. Home users aren't gonna learn CLI to apply fixes, neither are SMBs and SOHO, and they sure as hell aren't gonna sit around with a list of make/model/rev of every piece of hardware they have in order to do forum hunts.

    I want Linux to succeed in the desktop and retail markets, I really really do. I grew up in the days of GEM and Commodore and having lots of choices, and I believe lots of choice makes for a healthy and vibrant ecosystem. but someone is gonna have to face the fact that Torvalds is a douchebag. It is all well and good he invented the kernel, but it ain't 1991 anymore and Linux isn't just some plaything for Torvalds to futz with and share his changes over IRQ. The kernel is the heart of a multi-billion dollar OS, being counted on by millions, yet Torvalds treats it NO differently than he did at the beginning.

    And before anyone says "LTS" let me say LTS is a bad joke. As long as much software is tied to which kernel you are using LTS is a codeword for "run out of date and possibly insecure software" and it is just ridiculous. It ain't 1991 folks, having drivers shit themselves and die is simply unacceptable in this day and age, especially when your competition gives on average a decade of support for their OS. Frankly this problem would be trivial to fix with a stable ABI for drivers, but Torvalds and his ego won't admit he made a mistake. The current way was fine when it was a hobbyist OS, it simply isn't anymore. Now you either shell out for expensive enterprise gear (which negates any savings by going Linux) where a team of developers have to constantly fix drivers for the life of the contract, or you are SOL, because you'll be wasting time on forum hunts. Sorry but that is just unacceptable.

    --
    ACs don't waste your time replying, your posts are never seen by me.
  113. Re:None of them by darkpixel2k · · Score: 1

    Until you need technical support or drivers for new devices.

    I know math is hard--but let's see if I can break it down for you. Microsoft is $259 per incident. Ubuntu is $320 per year.

    I know the Ubuntu number looks bigger until you realize that you can call 50 times for that same price. Calling 50 times for Windows would cost you just under $13,000.
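
    The comparison above works out as follows in a quick Python sketch. The two prices are the ones quoted in the comment and are not checked against any current price list; the function name is made up for the example.

```python
PER_INCIDENT = 259  # dollars per support incident, as quoted above
FLAT_PER_YEAR = 320  # dollars per year for the flat plan, as quoted above


def per_incident_total(calls, price=PER_INCIDENT):
    """Total cost when every support call is billed separately."""
    return calls * price


print(per_incident_total(50))  # 12950, i.e. "just under $13,000"

# Smallest number of calls per year at which the flat plan is cheaper.
break_even_calls = FLAT_PER_YEAR // PER_INCIDENT + 1
print(break_even_calls)  # 2
```

    So on these figures the flat plan wins as soon as you make a second call in a year.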

    But who calls for tech support for Linux anyways? I build and maintain ubuntu-based firewalls, spam filters, mail servers, virtual servers, and VoIP servers. I've never had to call for support.

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  114. Re:None of them by darkpixel2k · · Score: 1

    I'm sure I'll get hate for pointing this out but its true: Linux is free if your time is worthless.

    You misunderstand time then. It costs me nothing but time to set up a Linux workstation for my wife. I spend under an hour, and she has a clean, non-virus-infected netbook. If I went the Windows route (because as you say, my time isn't worthless), I have to go shell out ~$150ish for Windows for her netbook...and I have to go work for almost a full day in order to pay for that. So not only do I waste an hour of time installing it for her, I waste a day working at my day job to pay for it. No thank you.

    As for the business world, would you rather pay someone $1,000 for a mail server install or $5,000 for a mail server install? In case you're confused, the $1,000 install is entirely paying for my time to set up whatever mail options you want. In the case of the $5,000 install, $3,500 is for Microsoft software licensing and $1,500 is for my install time with the options Microsoft lets you have.

    But whatever--keep telling yourself that linux sucks because of ABI breakage instead of the real reason: you want to keep your drivers closed source so you can lock users in, and you're too slow or stupid to recompile your drivers.

    My wife has been running Ubuntu for 4 years now, the only time she called me was when her SSD died. She did the upgrades too. So where's the problem?

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  115. Re:Gentoo by cloudmaster · · Score: 1

    First, thanks for a rational response to a topic you probably find marginally offensive, given your implied role with the Gentoo project.

    I reject the idea that packagers are only responsible for making a package compile with minimal changes. Someone needs to be looking at the whole picture instead of focusing on their small slice of the world, and packagers are in the best place to do that (or at least play a huge role in that). I see that constantly in my day job (enterprise security) where every business area only cares about their piece of the pie, completely ignoring (or just not understanding) how their slice fits in to the whole picture. It's frustrating there, and frustrating in my OS. :)

    I thought it somewhat contextually obvious that the "data inside" referred to the primary data source for portage, but yes, "bundles of source code and supporting files known as ebuilds". :)

    I don't know what you're talking about with the reference to Scientific Linux (RedHat-based) and SuSE (which is, I suppose, SuSE-based); neither of those are debian-derived; both are RPM-based distros that I dislike. :) Debian and Ubuntu are common Debian distros. Here's the first useful Google result related to apt-build - https://nigibox.wordpress.com/2009/10/01/apt-build-%E2%80%94-optimize-your-debian/ - I'd suggest reading about it more, and about apt in general. At a high level, you can pin package versions from multiple repositories with apt, and you can rebuild everything from just one package and its dependencies up to the whole darned system with apt-build. Portage is a cool system, but if you look in-depth, the apt/dpkg world has a very comparable feature set. It does not suck nearly as much as rpm (even with yum/yast wrapped around it), or other package systems like pkgtool, or whatever it was that Stampede used (it's been a while since an i686-native distro was a novel idea), or HP's POS swtool, or AIX's lpp format, or...

    Debian Stable isn't known for up-to-date code, as that branch's goal is somewhat obviously "stability". You can use "unstable" and get very up-to-date code, or you can use a derivative like Ubuntu for a pretty good compromise in between. :)

    I will wholeheartedly embrace the idea that many (in fact, most) Gentoo problems are user problems. But there are still way more problems than acceptable which are issues which maintainers should have caught. I'm willing to grant that it's way hard to catch the problems I find unacceptable - between upstream changes and downstream Stupid Users(R), there are just too many variables for anyone to manage. Ultimately it comes down to the distro user's personal level of tolerance; my tolerance is pretty low, but just slightly higher than Gentoo could previously reach. Other people have different tolerance levels, and I don't think they're stupid for using Gentoo. Heck, I support RHEL during my day job, and I *hate* the way RedHat does things (both in the distro and as a business) - but I don't for a moment think my employer is stupid for wanting to use RHEL. I endorse the variety of distros and people's choice to use the distro which best suits their needs. I do think that Gentoo fills a pretty narrow niche, though, and that it's a poor choice for environments where stability or reliability are the top priorities. Based on previous experience which may no longer be completely valid - Gentoo only fills a stability need well through the use of a mostly-binary install, and at that point, Gentoo's primary benefits are very much diminished.

    I do like the Gentoo philosophy, though, and I've heard that things have turned around after the initial turmoil after Daniel Robbins left. But honestly, Gentoo offers me zero benefits over Ubuntu at this point. I get acceptable stability, and a very flexible build environment in the rare case that I need that. The only decen

  116. you'll need to work (you or someone else) by basiles · · Score: 1

    Whatever you do, you will have to work to learn more about your Linux system, or subcontract someone (or buy support) to run your cluster and help your users. The point is then: do you have a budget (time and money) for that? Are you interested yourself in learning more about Linux systems (and hence spending less time on numerical codes or science)? If not, you'll need to pay someone to do the work. If yes, you need to learn a lot.

  117. Re:None of them by hairyfeet · · Score: 1

    And here comes the religious dogma I was talking about! Isn't it funny that the argument against having stable functioning drivers always comes down to IDEOLOGY, with the rant most people link to going so far as to call those that refuse to hand over source "leeches" and hope the kernel futzing breaks their drivers?

    I mean WTF is it to you if some do and some don't? Is that ANY different than right now? Nope, as you still have companies like Nvidia that makes binary blobs, only now you get to watch them break every six months. Does having open drivers keep Linux from breaking? Nope again as the open drivers break just as often thanks to Linus and his kernel fucking, because if you could look at it logically instead of a faith based perspective you'd see that there are only so many devs, and there are fewer of them than drivers to fix so drivers will ALWAYS be broken when Linus gets a wild hair up his ass, every. single. time!

    And allow me to say that if your way "worked" in any kind of reasonable fashion retailers wouldn't avoid your OS like the clap which I can assure you we most certainly do. Not just all the thousands of mom&pop shops, dotting the entire country, but big names like Best Buy, Staples, Walmart, do you think they avoid your OS because of its "quality construction" or a secret conspiracy? NO! It is because they take the box home, run updates when the little icon tells them to and get a broken machine like it is 1993 all over again, and promptly take that broke ass shit back! And since we retailers can't sell used as new that means we take a hit on every return making Linux even MORE expensive!

    As a final word allow me to give you proof, undeniable proof like a slap to the face your current way is broke ass shit. Now I'm sure you'll find some excuse, like "Use Distro X" or "You should buy hardware Y" but in the end all you will have is excuses because this proof should make even YOU take note! Now you and your fellow converts think we retailers are just full of it, that it should "just work" right? well when one of the biggest OEMs on the planet has to DISABLE the repos and spend considerable money and man hours keeping a badly out of date "corporate repo" just for their customers because if they don't the drivers WILL break then i'm sure you can see why both little guys like me and big guys like Walmart and OEMs like ASUS have washed their hands of your OS. I mean when fricking netbooks, a class of machine built around Linux strengths and which started out more than 30% Linux ends up completely obliterated by a decade old Windows OS it is high time to ask yourself "What are we doing wrong?" and I'd say basing your OS on religion instead of sound design practices and trusting your customers to make purchases that will benefit them (such as choosing FOSS drivers where possible so they have LTS) is a good example of why Linux is so far behind everyone else, and why even free you are getting hammered by an OS with a $100 barrier to entry.

    --
    ACs don't waste your time replying, your posts are never seen by me.
  118. Re:None of them by Pigskin-Referee · · Score: 1

    Until you need technical support or drivers for new devices.

    I know math is hard--but let's see if I can break it down for you. Microsoft is $259 per incident. Ubuntu is $320 per year.

    I know the Ubuntu number looks bigger until you realize that you can call 50 times for that same price. Calling 50 times for Windows would cost you just under $13,000.

    But who calls for tech support for Linux anyways? I build and maintain ubuntu-based firewalls, spam filters, mail servers, virtual servers, and VoIP servers. I've never had to call for support.

    Actually, support starts at $195 per incident and there are several different plans.

    There is no charge if Microsoft is unable to rectify the complainant's problem.

    You have conveniently failed to address what I was stating in my original post; i.e., the "free" factor evaporates once support is required. In effect, for both Microsoft and any allegedly "free" OS, the cost of support is the same as long as it is not required.

    In 20 years I have never called MS for any technical support. I am able to read and comprehend technical manuals, etcetera rather well. Plus, when a new device is released I do not have to wait for months (years, never) for support for that device with Microsoft. I have customers that demand high quality service.

    Now, I do have a FreeBSD server at home. It is a nice hobbyist toy and I do enjoy playing around with it from time to time. However, I would never use it in a mission critical environment.

    --
    Pigskin-Referee
    Linux: Yesterday's technology, tomorrow ...
  119. Scientific Linux not timely at all by rubycodez · · Score: 1

    Scientific Linux still hasn't put out the 5.6 release; they went for 6.0 instead, while RedHat is already at 6.1 because 6.0 is so very buggy.

  120. Re:None of them by darkpixel2k · · Score: 1

    And here comes the religious dogma I was talking about!

    Yes, your ability to foresee that someone might disagree with you makes you correct.

    Isn't it funny that the argument against having stable functioning drivers always comes down to IDEOLOGY,

    Funny--I remember some bitching about ABI, but there was a whole ton of other crap you put in there about your ideology. I remember you bitching about having to hunt through forums (ever had to wade through a forum full of Windows noobs? "Uh, I rebooted and it fixed everything."), bitching about your time being oh so important, your retail woes, and other things completely irrelevant.

    with the rant most people link to going so far as to call those that refuse to hand over source "leeches" and hope the kernel futzing breaks their drivers?

    Nope--I don't call you a leech. I think you have a chosen business model (not to release the source because you want to lock people in), and that's fine. Just don't keep bitching about your inability to keep up. You seem like the kind of person who would have a business model around sending morse code via telegraph and then bitch that the internet costs way too much because the protocols keep changing every few decades (IPv4/IPv6) and you have to upgrade your router and switch your thinking from morse code to SMTP...all the while the telegraph becomes more and more obsolete.

    I mean WTF is it to you if some do and some don't? Is that ANY different than right now? Nope, as you still have companies like Nvidia that makes binary blobs, only now you get to watch them break every six months.

    Yup--and I don't buy their crap. The machines that I do work with that have nvidia work well though because they are using the open source driver made by the community. And while it has occasional issues too, it gets fixed faster than the Nvidia blob.

    Does having open drivers keep Linux from breaking? Nope again as the open drivers break just as often thanks to Linus and his kernel fucking, because if you could look at it logically instead of a faith based perspective you'd see that there are only so many devs, and there are fewer of them than drivers to fix so drivers will ALWAYS be broken when Linus gets a wild hair up his ass, every. single. time!

    ...except that I can easily recompile the driver if I found myself in that situation. And with DKMS, the system recompiles it for me. So I really never find myself in the situation you are talking about. But if I did, I could always pay Canonical a small support fee to support my workstation for the next 365 days (not per-incident like with Microsoft...)

    And allow me to say that if your way "worked" in any kind of reasonable fashion retailers wouldn't avoid your OS like the clap which I can assure you we most certainly do.

    Yeah--Amazon really *hates* linux. It's constantly fscking up their retail business... </sarcasm>

    Not just all the thousands of mom&pop shops, dotting the entire country, but big names like Best Buy, Staples, Walmart, do you think they avoid your OS because of its "quality construction" or a secret conspiracy? NO! It is because they take the box home, run updates when the little icon tells them to and get a broken machine like it is 1993 all over again, and promptly take that broke ass shit back!

    ...I think ACE Hardware runs on a Linux platform too. And then there's Lowes. They appear to run a customized version of Gnome on their staff kiosks. I work for a mom-and-pop shop that maintains Linux computers. We spend 99% of our time reinstalling windows due to virus infections and almost never see Linux boxes come back in.

    And since we retailers can't sell used as new that means

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  121. Re:None of them by darkpixel2k · · Score: 1

    Actually, support starts at $195 per incident and there are several different plans.

    I went to microsoft.com/support and clicked on server support. It starts at $259 everywhere I looked. The same server support for Ubuntu is more expensive for your first incident--but if you have two incidents, you're ahead of Microsoft.
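    For anyone who wants to check the arithmetic, here is a rough sketch of the comparison. It assumes the two figures quoted in this thread ($259 per incident, $320 flat per year) and nothing else; the function names are made up for illustration and real support plans have more tiers than this.

```python
import math

# Figures quoted in the thread; treat them as assumptions, not current pricing.
PER_INCIDENT = 259.0   # per-incident server support price
FLAT_YEARLY = 320.0    # flat yearly support price


def yearly_cost_per_incident(per_incident: float, incidents: int) -> float:
    """Total yearly cost when every support call is billed separately."""
    return per_incident * incidents


def break_even_incidents(per_incident: float, flat_yearly: float) -> int:
    """Smallest number of incidents per year at which the flat plan is strictly cheaper."""
    return math.floor(flat_yearly / per_incident) + 1


if __name__ == "__main__":
    for n in (1, 2, 50):
        print(n, "incidents:", yearly_cost_per_incident(PER_INCIDENT, n), "vs flat", FLAT_YEARLY)
    print("flat plan wins from", break_even_incidents(PER_INCIDENT, FLAT_YEARLY), "incidents per year")
```

    With those numbers the flat plan is behind at one incident and ahead from the second one on, and 50 incidents billed separately come to just under $13,000.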

    There is no charge if Microsoft is unable to rectify the complainant's problem.

    You have conveniently failed to address what I was stating in my original post; i.e., the "free" factor evaporates once support is required. In effect, for both Microsoft and any allegedly "free" OS, the cost of support is the same as long as it is not required.

    The free factor doesn't evaporate. Something like Ubuntu still costs $0 while Microsoft's offerings do not start at $0.

    In 20 years I have never called MS for any technical support. I am able to read and comprehend technical manuals, etcetera rather well. Plus, when a new device is released I do not have to wait for months (years, never) for support for that device with Microsoft. I have customers that demand high quality service.

    Really? You've been able to fix your own bugs with Windows ME, Windows Vista, Exchange 5.5, Sharepoint, etc? You're telling me in 20 years, you've never run into a developer-created bug or that you've magically prayed to the Ballmer and it wasn't an issue? I haven't even done that in Linux. The difference is I can fix most of my own Linux bugs. With Microsoft, you must call them--even if they end up acknowledging it and reversing the support charge at the end.

    Which new devices have you used in Windows that weren't already available in Linux?
    USB was supported first in Linux.
    IPv6 was supported first in Linux.
    I know wireless sucks in Linux, but that's because communication with a wireless card isn't a standard like IPv6 or USB is a standard.
    Plugging crap into my windows box generates endless popups and disk thrashing while it searches for drivers and usually fails. In Linux, I plug it in and by the time I look back up at the screen, I have a camera or USB drive mounted, or even a bluetooth device ready to use...

    Now, I do have a FreeBSD server at home. It is a nice hobbyist toy and I do enjoy playing around with it from time to time. However, I would never use it in a mission critical environment.

    Funny--I talked with a guy yesterday who runs a site used heavily by the insurance business. He said when he launched the site he chose BSD because the Microsoft option was prohibitively expensive. He had 5 servers and two load balancers. He said other than the hardware, it cost him nothing. Contrast that with whatever server version of Windows does clustering and load balancing. I remember running it back in the 2000-era (iirc) and it was tens of thousands of dollars.

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  122. Re:None of them by hairyfeet · · Score: 1

    Do you even hear yourself? The amount of logical hoop jumping and plain denial is just astounding! I guess denial isn't just a river in Egypt huh? Because the simple fact that you honestly believe that I should tell customers to learn to recompile their own drivers, which BTW won't do SHIT when it comes to some of Linus's serious kernel fucking, is simply beyond ridiculous. How can you stand here with a straight face and claim your OS is ready for the masses if they need to compile their own drivers?

    And WTF do Windows forums have to do with shit? Or Lowes? You NEVER need Windows forums after simply updating the OS whereas you BETTER be ready to spend an assload of time at your distro's forums with make/model/rev thanks to updates breaking shit left and right, which again I linked to. This is a classic case of "moving the goalposts" as you refuse to acknowledge, even though I provided links rubbing your nose in it, that even Dell can't keep the drivers working which is beyond insanity! And who gives a shit what some enterprise, which BTW has these things called "admins" that get paid big bucks to deal with broken shit like drivers and has about as much to do with retail PC sales as a car does with an F-16, has to do with this discussion? Did I MENTION enterprise anywhere? Or say in any place that we were talking about, in no particular order, enterprise deployments, servers, routers, cell phones, or any other damned thing that isn't a retail Linux sale? Nope, don't think so.

    In the end the numbers don't lie. No retail B&M store will touch your OS, and after 20 years Linux is so far behind /. has an article congratulating Linux on reaching a whole 1%! Woo Hoo, and it only took 20 damned years! If the "community" continues like you with elitism, refusing to see problems and correct them, refusing to make things easy for the user, and most importantly refusing to keep Linus from constantly breaking shit, then don't be surprised that it takes Linux another 20 damned years to reach that magical 2%. The simple fact is it isn't 1993, and users aren't gonna jump through flaming hoops simply for "free as in freedom, fight teh power!" bullshit. You gotta be better or at the very least as good, and frankly with the kernel futzing Linux doesn't even rank as high as Windows 98 in my book, MAYBE Win 3.1. Because with Windows 98 I could actually take a RTM and update it to the last patch and the drivers still worked whereas the Linux update notifier may as well be a "Break Linux NOW!" button, for all the broken drivers. It is pretty God damned sad when you can't even run updates without your OS shitting itself, and if the choices are 1.-Giving them a broken OS and telling them "RTFM Noob LOL!" 2.-Turning off ALL updates and leaving them as vulnerable as any other unpatched OS, or 3.-Installing Windows and at least having it run until EOL without having broken drivers? Well at $35 an hour it really only takes a single forum hunt to make your OS more expensive than Windows. But you pretend it is all a conspiracy, that we "just don't understand" your OS. It reminds me of that old joke "I have no friends, Linux has no friends, maybe I can be Linux's friend!" because the public sure as hell ain't touching it!

    --
    ACs don't waste your time replying, your posts are never seen by me.
  123. Re:None of them by Billly+Gates · · Score: 1

    Haha

    That was an accurate, yet ballsy post to make on Slashdot of all places. I got flamed and modded down when stating the obvious about Linux myself, yet I still like it as a server OS. Linux usage has gone down according to StatCounter from nearly 1% to 0.7%. A very big drop thanks to Windows 7. I used to use Linux but finally gave up on it for the reasons you described. I am contemplating installing it today in a VM so I can run a LAMP stack with PostgreSQL as well as Joomla. But I am in the small small minority of users.

    Windows has its weaknesses but being a consumer OS for business and home users is certainly not one of them. Compiling a kernel is ridiculous. I did PC support as a contractor on the side and only mentioned Linux to a tech shop because the user needed a server for 10 users and didn't want to pay for Windows 2003 Server Small Business Edition and only needed a file server, domain controller, and a simple email and internet site. Linux fit the bill and hosted all 4 nicely, but that was supported server hardware and not for John in the Office to play his games or run Office on his Toshiba laptop with strange/cheap hardware. Dell does not even make good drivers for Windows in my experience and I hate them with a passion. However, since Michael Dell returned the quality has improved tremendously.

    I read your posts and you know your stuff. I used to charge $75/hr when I lived in Alaska, given the rates there, and it sounds like $35/hr might be a little low for your expertise. I never heard of that app you mentioned that kills adware infected with Flash. I will give it a try since I listen to music on YouTube and prefer not to live without it.

  124. Re:None of them by Billly+Gates · · Score: 1

    Dell and Asus once sold Linux briefly at BestBuy before they pulled it. Walmart did too. Why?

    Because Joe Six Pack became furious as to why MS Office wouldn't work or why his resume created with OpenOffice looked like crap when a potential employer opened it with Word. Or why little Timmy's PC games with DirectX couldn't run on them. Etc.

    Not to mention BestBuy realized that consumers buying these cheap Linux netbooks would not provide any profit margins by buying antivirus software and printers. They lose money on every machine sold and only make it back by spamming you with accessories and software. Bad for retail ...

    This is why Windows is here to stay. If you hate it, save up for a Mac. That is a consumer OS as well, expensive but higher quality than Linux or Windows. Or get one of those tablets running Android. There are options.

  126. Anything Redhat-based... by Anonymous Coward · · Score: 0

    Way, way back in ye olden days, when I built my first cluster with my boss, we looked at all the distros available (the only major change in large distros since then, arguably, is that Mandrake is gone and Ubuntu now fills the slot that Debian did), and Redhat had some major things going for it (not trying to start a holy war, or be flamebait, so please don't start anything but an actual discussion):

    1) Automated installation - Kickstart was there when no other distro of the time (this was when Redhat Linux - not RHEL or Fedora, but good old Redhat Linux - 6.2 was just being released) had anything remotely equivalent. It's still the best-of-breed for automated, programmable Linux installations, and great now with PXE and TFTP. The only thing I could ever complain about is that the package selection macros could be improved/expanded upon.

    2) Package Management - At the time, we decided that RPM and DPKG were roughly equivalent, but RPM was slightly ahead, looked to be moving more in the direction we wanted (look at where RPM and YUM are nowadays - I actually wrote something almost identical to YUM before it was around, for use by our desktop workstations...it was a lot cruder, but it operated on the same principles of building repositories and automated versus hands-on updating; there was no need to run it on our clusters, since once the nodes were installed, packages were very rarely changed, and when it was done it was a fairly significant event, best done by pushing the RPM out to the /tmp directory and then running a parallel shell command to upgrade or install the package and then erase it if successful; in the rare event of an OS upgrade, we simply marked the node as using the new OS in our cluster management config files, then set it for installation since the local disk was just for scratch space and OS, rebooted the nodes N at a time [so we didn't overload our NFS server(s)], let them re-install, and then we were up and running again), and won out in terms of availability of packages. That last one has really proven true over time - most companies, if they bother putting something into a package manager at all, almost always pick RPM. Or if they have a plain old-fashioned tarball and installation script (or Makefile, or whatever) you can wrap that up in an RPM. Or just write your own within the spec file, using the nifty RPM macros to help you.

    3) Package Availability/Compatibility - Almost all of the major packages that we needed were already available either as part of the distribution or a download away. I keenly remember that I only had to build 4 RPMs for *all* of our proprietary, in-house software to run (and out of those 4 RPMs, 2 were already available as part of the distro, but we needed a different version for library compatibility).
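    The "reboot the nodes N at a time" step from point 2 can be sketched roughly like this. This is not the tooling we actually used, just a minimal illustration; the node names and the reinstall callback are hypothetical stand-ins for whatever your cluster management scripts do to mark a node for installation and reboot it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], n: int) -> Iterator[List[str]]:
    """Yield the node list in consecutive groups of at most n nodes."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), n):
        yield list(nodes[start:start + n])


def rolling_reinstall(nodes: Sequence[str], n: int,
                      reinstall: Callable[[str], bool]) -> List[str]:
    """Reinstall nodes n at a time and return the ones that failed.

    `reinstall` is a hypothetical callback: it should trigger the
    reinstall of one node, block until it finishes, and return True
    on success. Only n nodes hit the install server at any moment.
    """
    failed: List[str] = []
    for group in batches(nodes, n):
        # Run one batch in parallel, then wait for it before starting the next.
        with ThreadPoolExecutor(max_workers=n) as pool:
            results = list(pool.map(reinstall, group))
        failed.extend(node for node, ok in zip(group, results) if not ok)
    return failed


if __name__ == "__main__":
    demo_nodes = [f"node{i:02d}" for i in range(1, 8)]  # made-up node names
    # Dummy callback that pretends node05 fails to come back up.
    print(rolling_reinstall(demo_nodes, 3, lambda node: node != "node05"))
```

    The batch size is the knob that keeps the NFS or install server from being overloaded; pick it based on what that server can actually feed.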

    And basically, those core truths still hold true to this day. Admittedly, both SuSE and Debian-based distros (Ubuntu being the most prominent, of course) have improved dramatically, but SuSE was more-or-less based off of Redhat, just with some different design goals at the time (mostly being more user-friendly, I believe, though I never talked to the developers so I can only guess from experience with the various versions of SuSE). These days, RHEL (or CentOS, which is a freely available version of the same; it's RHEL w/out the tags...including the price tag!) is nice if you can afford it (I've worked places where the RHEL subscriptions were donated by the same vendor that donated the hardware, and I've also worked places where we just ran RHEL on the I/O and management/login nodes and then used CentOS on the compute nodes [but this can be irritating because RHEL and CentOS are almost, but not quite, identical - keep that in mind], and I've also worked at places where we used CentOS on the I/O and management/login nodes and Fedora on the compute nodes) because of the stability (but it's also a drag if you have users that want/need the latest and greatest features - I think you might want to look at the EPEL,