Slashdot Mirror


High-Performance Linux Clustering

An anonymous reader writes "High Performance Computing (HPC) has become easier, and two reasons are the adoption of open source software concepts and the introduction and refinement of clustering technology. This first of two articles discusses the types of clusters available, uses for those clusters, reasons clusters have become popular for HPC, some fundamentals of HPC, and the role of Linux in HPC."

27 of 129 comments

  1. Imagine by commodoresloat · · Score: 4, Funny
    Single-processor implementations of this!

    *ducks*

  2. Geek by mysqlrocks · · Score: 4, Interesting

    With Linux and other freely available open source software components for clustering and improvements in commodity hardware, the situation now is quite different. You can build powerful clusters with a very small budget and keep adding extra nodes based on need.

    Yea, I'd like to build one but I'm not sure what I'd use it for. Does that mean I'm a geek?

    1. Re:Geek by donkeyoverlord · · Score: 3, Interesting

      Use it to crack some passwords with Cisilia.

    2. Re:Geek by burnin1965 · · Score: 4, Informative

      Do you watch DVDs? Do you dream of squeezing all your DVDs onto a harddrive and streaming them to a media PC attached to your TV?

      You could copy the DVDs at ~8GB each to some large hard drives, or you could transcode them to much smaller formats with all the garbage removed and go from ~8GB/movie to less than 4GB/movie. But to do this you need lots of processing power. A cluster works very well for this, and the software is already there for you:

      http://www.exit1.org/dvdrip/doc/cluster.cipp

      For the cost of some overpriced Dell crap video editing PC you could build a decent diskless cluster. Who needs harddrives, monitors, video cards, keyboards, mice, etc. At least more than one set. ;)

      burnin

  3. Advice: Don't use Itaniums for Linux cluster by Work+Account · · Score: 5, Interesting

    We spent $849,000 on an Itanium cluster and have recently found ourselves SOL since it's a dying architecture.

    You can't even run Java on them.

    --

    If you "get" pointers add me as a friend (116)!
  4. A Thought by Crusader7 · · Score: 3, Interesting

    Okay, so I'd really enjoy trying something like the clustered model, just for academic kicks, but a relevant question comes to mind, at least for me.

    Where do people get the commodity systems cheap enough to be able to play around with this? I hardly want to spend two thousand bucks on some old P2s just to play around. Anyone have some hot tips where you can find real cheap (dare I dream... free) commodity systems to build a low-end cluster for kicks?

    Also, I'm a Windows guy by trade. Will making a Linux cluster make me instantly cool? :)

    1. Re:A Thought by burnin1965 · · Score: 2, Interesting

      linux cluster on the cheap:

      Go with a diskless cluster.
      Buy all in one motherboards, video, ethernet.
      Cases are pretty cheap, but you can save by creating a custom rack solution.
      Spend a little extra on 80% efficiency power supplies ( http://www.seasonic.com/co/index.jsp ).

      with that route you could build a decent little cluster for under $2k (USD).

      Will it make you cool? Doubt it, but the path to the solution will teach you many lessons.

      burnin

    2. Re:A Thought by Procyon101 · · Score: 3, Interesting

      I make it known to friends and relatives I will set up their new computers in exchange for taking their old ones off their hands... I transfer the data over, make sure it's configured, etc... then take my new cluster node home. :) I get some pretty nice systems this way, since running XP on less than 1 GHz/512 MB RAM is pretty painful nowadays, so people upgrade in droves.

  5. This hasn't been my experience by composer777 · · Score: 4, Interesting

    From everything I've seen, MOSIX is having some issues right now. Unfortunately, MOSIX is one of the easiest, most flexible ways to set up an HPC, and ever since they forked, development has been slow. I did research about 2 months ago to look into setting up a small MOSIX cluster with a few computers. My main goal was to get my feet wet in setting up a cluster using a few desktop and laptop computers. I figured that setting up a cluster with my Athlon 64 x2, Athlon 64 3500+, and a few laptops would speed up compile times by quite a bit. But, it appears that the 2.6 version of MOSIX is still beta and won't support the kernel I need for my Athlon 64 x2 (versions before 2.6.9 don't support powernow with the x2, and also tend to be flaky). So, I have the choice of running a cluster with slower PC's, or waiting for better support. If you look at the year on some of those whitepapers, only one was written this year, and I'd be willing to bet they are describing how to use MOSIX with the 2.4 kernel, not 2.6. I finally gave up on the idea, as running the latest kernel is more important to me.

    1. Re:This hasn't been my experience by photon317 · · Score: 2, Informative


      MOSIX is really more of a halfway-point between a traditional cluster and a "single system image" sort of cluster. Unfortunately, some aspects of clustered computing are still extremely difficult to abstract away into an SSI type of implementation. I had hoped over the years that the MOSIX work would get folded in with mainstream Linux's NUMA scheduling and memory allocation, essentially treating non-local CPU and memory resources (other nodes) like a second layer of NUMA with even less connectivity than local NUMA nodes have. Throw in a truly distributed redundant filesystem (à la the Google File System that we don't know all the details about), and we could really begin to approach the concept of turning large-scale local and even distributed clusters into truer single system images. But the kind of fundamental work that needs to be done to get these things rolling hasn't even started, so I don't expect it to happen anytime soon. Aside from all that clustering and kernel work, there would have to be some evolutionary changes in how we write code, and in the languages we write it in, in order to smoothly take advantage of the dynamic availability and locality of resources.

      For now, your best bet is to construct your HPC Linux cluster as a high-speed network of interconnected but independent Linux machines, using a network topology that suits the class of problems you face and how easily the problem can be broken into loosely coupled pieces, and then code the "clustering" aspect into your application code itself, usually working off of libraries like MPI.
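To make "broken into loosely coupled pieces" concrete, here is a minimal sketch in plain Python. It is not MPI: the ranks are simulated in a loop inside one process, and the function names (`chunk_bounds`, `partial_sum_of_squares`, `simulate_cluster_sum`) are made up for illustration. On a real cluster each rank would be a separate process on its own node, and the final sum would be a reduce operation in the message-passing library.

```python
def chunk_bounds(n_items, n_ranks, rank):
    """Return the [start, stop) range of items owned by one rank.

    The first (n_items % n_ranks) ranks get one extra item, so the
    pieces differ in size by at most one.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum_of_squares(start, stop):
    """Work done independently by one rank; needs no data from other ranks."""
    return sum(i * i for i in range(start, stop))


def simulate_cluster_sum(n_items, n_ranks):
    """Stand-in for scatter + compute + reduce, run serially in one process."""
    partials = []
    for rank in range(n_ranks):
        start, stop = chunk_bounds(n_items, n_ranks, rank)
        partials.append(partial_sum_of_squares(start, stop))
    # In a message-passing program this final step would be the reduce.
    return sum(partials)
```

The only communication is the final sum, which is what makes this kind of problem a good fit for a cluster of independent machines. Problems whose pieces need to exchange data at every step depend far more on the network topology.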

      --
      11*43+456^2
  6. Re:Advice: Don't use Itaniums for Linux cluster by sho222 · · Score: 3, Informative

    You can't even run Java on them.

    What do you mean? I thought 1.4.2 and up had support for Itanium. Check this white paper (search for Itanium). Are their claims false, or are you running an older version of the JRE?

  7. Don't forget this low power hardware. by Erris · · Score: 3, Informative
    Cool, IBM on software. Add that to this hardware from a year ago and you are off to the races. Of course, you could just build the system as designed. Performance does not have to suck electricity and heat your home.

    I'm wanting to build one of these, but I really don't need it. Time may change that.

    --
    DMCA, Hollings, Palladium. What might have sounded like paranoia is now common sense.
  8. Re:Imagine! by maswan · · Score: 4, Informative
    Beowulf is a specific project/software for doing clusters. In reality, it is not that popular. There are lots of different "whole clustering solutions", and Beowulf is one of those. Even more common in the HPC world are probably homegrown solutions, based on common components.

    /Mattias Wadenstein - HPC sysadmin during weekdays

  9. Aggregate.org by PAPPP · · Score: 5, Informative

    For some very good information on F/OSS based clustering, check out aggregate.org. They have really neat ideas that are reasonably well documented and freely implementable/usable. I built a little cluster (AFAPI on a WAPERS switch) with them for my high school senior project, and it was a great experience.

  10. article sort of misleading on mpp/cluster by flaming-opus · · Score: 5, Informative

    Though MPPs are kind of like clusters, and the boundary between the two is vague, I think there's definitely a distinction. In many MPPs, nodes share access to memory, just at a performance penalty. Often the scientific binary is written using a message-passing tool like MPI, but the OS is often run with direct memory access. Definitely from a systems-administration point of view, an MPP is different from a cluster. In an MPP you don't have 4000 root hard drives and 4000 power supplies to replace when they break. An MPP may be like a (fast) cluster from the programmer's point of view, but it is a lot simpler to deploy and manage. (Blue Gene, XT3, Altix)

    I also contest some of the distinctions drawn about vector processor systems. The two vector systems currently on the market, the Cray X1 and the NEC SX-8, are clusters. Each node just happens to be a vector SMP. The Earth Simulator is a 640-node cluster of 8-way SMP boxes, where each of the processors in the SMP is a vector CPU. However, the predominant programming method even on these boxes is explicit message passing like MPI. Co-array Fortran and Unified Parallel C are faster, but slow to catch on.

    Good summary of the common case though.

  11. Imagine by temojen · · Score: 4, Interesting

    Zero CPU implementations of this.

  12. Not to pick at Big Blue by Frumious+Wombat · · Score: 4, Interesting

    But their links could at least have mentioned OSCAR http://oscar.openclustergroup.org/ or my personal favorite, ROCKS http://www.rocksclusters.org/, as these are more prevalent than xCat systems.

    Personally, I like Rocks, as I ran three parallel architectures (i386/AMD64/IA64) on the same base distribution, just with each tuned to its particular processor. It comes with SGE and Myrinet support out of the box, and there are Rolls, i.e. custom software assemblages, for OpenPBS, for those who prefer it, as well as PVFS. It's easy to set up, and easy to administer, as the nodes are presumed to be interchangeable and disposable. When you reboot a node, it's obliterated and a fresh OS and supplementary package repository are laid down on a clean disk. No questions about version skew.

    They now have a custom roll to help you build a visualization wall, but I never had a chance to try that one. (try convincing your boss that you want 4 digital projectors and a big room to play with)

    The downside to the above distributions is that they presume batch-queue environments, which is appropriate for most of my work, but less so for many people trying to simulate owning an SMP, without paying SMP prices.

    Other people assure me that the current version of OSCAR is solid as well, but they seem to lag in the multiple architecture support area (Itanium is always behind), and don't currently support AMD64 natively. On the other hand, they build on top of several RedHatish linuces, as opposed to Rocks where you get CentOS (RHEL), period.

    --
    the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
  13. Re:Beowulf Clusters by Hosiah · · Score: 2, Funny
    What else can we learn from Slashdot moderation, I wonder?

    Economics? I know every time I get mod points, it's like I'm racing the clock. "Oh, God, four left to spend in two hours! Uh, OK, this thread should need some modding. This post...fudge, it's maxed out! OK, this post...is this worth moderation? Isn't it? Is it? OK, *this* post...it's a troll, I'll...no, wait, somebody got it already...!" I end up blowing them away on frivolous mods or wasting them and letting them expire, I don't know which is worse. All I know is, I never have them when I see something really crying out for moderation.

    Kind of like the value of dollars in an inflationary spiral.

  14. Rocks Clusters by lheal · · Score: 4, Informative

    Rocks has a great system for making high-performance clusters from similar machines. A Rocks cluster consists of a front-end ("master") node and a bunch of compute nodes (and I think special-purpose nodes).

    The master gets a full Linux (RedHat-based) install. It's an NFS/DHCP/Kickstart server for the compute nodes, and runs whatever other services you want the compute nodes to use. The master has two network cards and acts as a firewall (NAT optional).

    The compute nodes boot via DHCP and Kickstart, downloading their kernel and whatever other OS files you want to their local disk. You decide how much NFS or local disk to use.

    Job queueing is handled by, e.g., Sun Grid Engine (an Open Source queueing package) or some other queueing software.

    Here's the neat thing: to make a change to a compute node setup, you change the Kickstart config and reboot all the compute nodes (as they finish whatever queued work they're doing, or immediately if you want). That makes the sysadmin's life easy, while still maintaining the speed of having the OS on the local disk.

    --
    Raise your children as if you were teaching them to raise your grandchildren, because you are.
  15. Re:Proof by BiggerIsBetter · · Score: 2, Informative

    That's a bugger, but still, web-browser applets != applications. If they offer the J2SE for Itanium, you should be good to go with anything other than browser applets. Java applications should run just fine, and even stand-alone applets should be runnable with the Java appletviewer.

    --
    Forget thrust, drag, lift and weight. Airplanes fly because of money.
  16. Re:Imagine! by burnin1965 · · Score: 2, Interesting

    There are other cluster solutions, i.e. http://warewulf.lbl.gov/pmwiki/

    But you can also roll your own. I did mine with Fedora by taking a fresh Fedora install, duplicating the common parts into a common NFS share, duplicating the distinct parts into a template and subsequent node NFS shares, compiling a custom NFS-root Fedora kernel, then setting up a DHCP and TFTP server for the diskless nodes to PXE boot from.
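As a rough illustration of the last step, the sketch below generates the per-node boot config a PXELINUX setup on the TFTP server would read. This is an assumption about tooling, since the post does not say which PXE loader was used. The kernel file name, server address, and export paths are made-up examples. What is standard is the `01-<mac-with-dashes>` per-node file name under `pxelinux.cfg/` and the kernel parameters `root=/dev/nfs`, `nfsroot=` and `ip=dhcp`.

```python
def pxelinux_filename(mac):
    """PXELINUX looks for a per-node config named 01-<mac>, lowercase, dash-separated.

    The leading 01 is the ARP hardware type for Ethernet.
    """
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_entry(kernel, nfs_server, nfs_root):
    """Build a config that boots `kernel` with its root filesystem on NFS.

    `kernel` is a path relative to the TFTP root. No `rw` option is added
    here, so whether the root is mounted read-write is left to the node's
    own setup.
    """
    lines = [
        "DEFAULT node",
        "LABEL node",
        "  KERNEL " + kernel,
        "  APPEND root=/dev/nfs nfsroot={}:{} ip=dhcp".format(nfs_server, nfs_root),
    ]
    return "\n".join(lines) + "\n"


def node_configs(nodes, kernel, nfs_server, export_base):
    """Map config file name -> file contents for a dict of {hostname: mac}."""
    return {
        pxelinux_filename(mac): pxelinux_entry(
            kernel, nfs_server, export_base + "/" + host
        )
        for host, mac in nodes.items()
    }
```

Calling `node_configs({"node01": "00:1A:2B:3C:4D:5E"}, "vmlinuz-nfsroot", "192.168.1.1", "/export/nodes")` gives one entry keyed `01-00-1a-2b-3c-4d-5e`. Writing those files into the TFTP server's `pxelinux.cfg/` directory is left out on purpose.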

    burnin

  17. Re:Proof by sho222 · · Score: 3, Funny

    Yeah, ok, I'm confused. I naturally thought you were talking about running java application servers or the like on Itaniums. It didn't cross my mind that you bought nearly a million dollars in hardware to run applets in your browser.

  18. You are absolutely clueless by Work+Account · · Score: 2, Informative

    I would wager you've never used Java. I say that not as an insult, but because you simply have yet to realize that a change made back in 1996 made the average piece of Java code run as fast as the average piece of C/C++ code.

    ---

    Five composite benchmarks listed below show that modern Java has acceptable performance, being nearly equal to (and in many cases faster than) C/C++ across a number of benchmarks.

    1. Numerical Kernels

    Benchmarking Java against C and Fortran for Scientific Applications
    Mark Bull, Lorna Smith, Lindsay Pottage, Robin Freeman,
    EPCC, University of Edinburgh (2001).

    The authors test some real numerical codes (FFT, matrix factorization, SOR, fluid solver, N-body) on several architectures and compilers. On Intel they found that the Java performance was very reasonable compared to C (e.g., 20% slower), and that Java was faster than at least one C compiler (KAI compiler on Linux).

    The authors conclude, "On Intel Pentium hardware, especially with Linux, the performance gap is small enough to be of little or no concern to programmers."
    2. More numerical methods: SciMark2 scores

    R. F. Boisvert, J. Moreira, M. Philippsen, R. Pozo,
    Java and Numerical Computing,
    Computing in Science & Engineering, 3(2):18-24, Mar.-Apr., 2001.

    SciMark includes a number of numerical codes. On a PIII/500, MFlops (higher is better):

    ibm jdk 1.3.0               84.5
    linux2.2 gcc (2.9x) -O6     87.1
    3. Still more numerical methods
    From the book Object-Oriented Implementations of Numerical Methods by Didier Besset (Morgan Kaufmann, 2001):

    Operation                          Units  C    Smalltalk  Java
    Polynomial 10th degree             msec.  1.1  27.7       9.0
    Neville Interpolation (20 points)  msec.  0.9  11.0       0.8
    LUP matrix inversion (100 x 100)   sec.   3.9  22.9       1.0

    4. Microbenchmarks (cache effects considered)

    Several years ago these benchmarks showed java performance at the time to be somewhere in the middle of C compiler performance - faster than the worst C compilers, slower than the best. These are "microbenchmarks", but they do have the advantage that they were run across a number of different problem sizes and thus the results are not reflecting a lucky cache interaction (see more details on this issue in the next section).

    These benchmarks were updated with a more recent java(1.4) and gcc(3.2), using full optimization (gcc -O3 -mcpu=pentiumpro -fexpensive-optimizations -fschedule-insns2...). This time java is faster than C in the majority of the tests, by a factor of more than 2 in some cases... ... suggesting that java performance is catching up to or even pulling ahead of gcc at least.

    These tests were mostly integer (except for an FFT).

    5. Microbenchmarks (cache effects not considered)
    In January 2004 OSNews.com posted an article, Nine Language Performance Round-up: Benchmarking Math & File I-O. These

    --

    If you "get" pointers add me as a friend (116)!
  19. Why SMP & Vector Processing Aren't Options (No by cmholm · · Score: 2, Informative
    I had originally posted this 'way back during the Ask Donald Becker call for comments. AFAIK, we never got a Donald Becker Replies, but life goes on. I should note that my shop is loaded to the gunwales with IBM clusters, some of the nodes for which are 32 cpu SMPs.

    [Donald's] work in making the "piles of PCs" approach to high performance computing a reality with Beowulf has been responsible for vastly expanding the construction and use of massively parallel systems. Now, virtually any high school - never mind college - can afford to construct a system on which students can learn and apply advanced numerical methods.
    In retrospect, however, it would seem that the obvious cost benefits of Beowulf very nearly killed the development and use of large SMP and vector processing systems in the US. My understanding of the situation is this:

    * Before Beowulf, academics had a very hard time getting time on hideously expensive HPC systems.

    * When Beowulf started to prove itself, particularly with embarrassingly parallel problems using MPI, those academics who happened to sit on DARPA review panels pushed hard to choke off funding for other HPC architectures, promising that they could make distributed memory parallel systems all singing, all dancing, and cheap(er).

    * They couldn't really deliver, but in the meantime, Federal dollars for large shared memory and vector processing systems vanished, and the product lines and/or vendors with it.... at least in the US.

    * Eight years later, only Fujitsu and NEC make truly advanced vector systems [top500.org], and Cray is only now crawling back out of the muck to deliver a new product. Evidently someone near the Beltway needs a better vector machine, and Congress ain't paying for anything made across the pond.


    Cutting to the chase, did [Donald Becker] advance a "political" stand among [his] peers within the public-funded HPC community, or [was he] just trying to get some work done with the budget available at NASA?

    --
    Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
  20. Re:Imagine! by maswan · · Score: 2, Informative
    Well, other than Beowulf, there is NPACI(sp?) Rocks and a few others like that. I don't have personal experience with those though, so I've probably missed a lot. Then you have the turn-key type of cluster, bought ready-made from a vendor. There you pay IBM or Penguin Computing or whoever to do all this for you before startup; of course, the maintenance is then up to you.


    By components I mean software, since hardware is basically just a bunch of servers (or desktops), with optionally faster than commodity network and some stuff like that. The optional parts depend on what kind of applications that you run.


    The most important cluster components are a base operating system and a batch scheduler like torque or slurm. There are also communications libraries (MPI and friends) and optimised math routines (matrix calculations, FFTs, etc) for some application types.


    Then we have the administrative side, where it isn't that specific to HPC clusters, but a general matter for anyone that is handling a large number of similar machines. You want to have an automatic installation method, not answer 25 questions on the console every time you need to reinstall or add a node. You want to have a convenient way of synchronising configuration and settings. You want a distributed shell to run one command on all/many/several nodes without lots of arrow up and command line editing.


    This should be familiar to both cluster admins and admins of server farms or large deployments of desktops too. Automate repetitive tasks, choose tools that reduce the maintenance burden, etc.
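A "distributed shell" of the kind described above can be sketched in a few lines of Python on top of plain `ssh`. This is only an illustration, not any particular tool: it assumes key-based login to every node already works, and the function names and the injectable `runner` parameter are made up for the example (the parameter exists so the logic can be exercised without touching a real machine).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_node(node, command, runner=subprocess.run):
    """Run one shell command on one node over ssh.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so a misconfigured node cannot hang the whole sweep on a prompt.
    Returns (node, exit status, trimmed stdout).
    """
    result = runner(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return node, result.returncode, result.stdout.strip()


def run_everywhere(nodes, command, runner=subprocess.run, max_workers=16):
    """Run the same command on all nodes in parallel, results in node order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda n: run_on_node(n, command, runner), nodes))
```

Something like `run_everywhere(["node01", "node02"], "uptime")` would then return one `(node, status, output)` tuple per node. Threads are enough here because each worker just waits on an `ssh` child process.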

    /Mattias Wadenstein

  21. Smokin Crack.. by tempest69 · · Score: 2, Interesting
    We spent $849,000 on an Itanium cluster and have recently found ourselves SOL since it's a dying architecture.

    You can't even run Java on them.

    ok, Where to begin...

    First, spending a million bucks on a machine that doesn't meet your needs. I hope there is an accountant ready to spank someone over this.

    Second, using Java in a massively parallel fashion.. Last I knew there wasn't an MPI or PVM port that used Java, plus it kinda defeats the purpose of having big hardware running a slower language (yes, I know compiled Java can be fast, but nowhere near bare-metal C).

    Third, giving up on a hellfire machine.... Really, don't boo-hoo that you can't make it run Java; open a book, code some C++ and make the thing work. If you have a problem for it to solve, then solve it. Heck, use a cross compiler to translate your Java core over to C/C++ if you feel the urge. Itanium is a brutally fast architecture, and until its MIPS/watt ratio drops well below the norm, your business case for scrapping it is going to be tough.

    Storm

  22. my experience by netjiro · · Score: 3, Interesting

    I have deployed several clusters throughout the years, mainly for research in academic environments and small companies, and I can say that clustering makes a lot of things so much easier.

    Diskless SSI clustering makes maintenance a breeze, and ensures that all systems are always in sync and up to date. All nodes can run the same system image, whether they are servers, dedicated compute nodes, or regular desktop machines.
    Of course you can still have local hard disks if you want, and for some apps it is recommended, but the system boots from the servers nonetheless.

    OpenMosix dynamic distribution makes it possible to use heterogeneous hardware, and handles highly dynamic computational load quite well. The applications just wander off to whatever physical machine will run them the fastest.
    This also makes simple parallel implementations of code a lot simpler, just fork and forget, and you will pay a small overhead for the benefit of having good load-balancing automagically.
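The "fork and forget" style looks roughly like the sketch below: the program just forks one ordinary process per task and collects the answers over pipes. Nothing in the code is cluster-aware, and the helper name `fork_workers` is made up for illustration. On a stock Linux box every child simply runs locally. The claim above is that under openMosix the kernel could migrate such processes to other nodes without the program changing.

```python
import os


def fork_workers(tasks, work):
    """Fork one child per task, run work(task) in it, return results as strings.

    Each child writes str(work(task)) to its own pipe and exits. The
    parent reads the pipes in task order and reaps every child.
    """
    children = []
    for task in tasks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute, report, and always exit without returning
            # into the caller's code, even if work() raises.
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "w") as out:
                    out.write(str(work(task)))
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end, so EOF arrives when the child exits.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as src:
            results.append(src.read())
        os.waitpid(pid, 0)
    return results
```

For example, `fork_workers([1, 2, 3], lambda x: x * x)` returns the three squares as strings. A child that fails leaves an empty string in its slot, which is about as much error handling as "fork and forget" implies.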

    Dynamic distribution also makes it possible to use regular desktops as cluster nodes along with the dedicated compute nodes.
    Need Windows dualboot on some nodes? No problem, when you shut them down to boot Windows, the processes that used to run on those machines just migrate to another node. When you go back to Linux, processes come back.

    Need explicit parallelism? no probs, MPI / PVM etc works fine together with the dynamic distribution and complements it for applications that are already well parallelized.

    Scaling? This has never been an issue as long as the network infrastructure is up to speed. A decent 100mb or gigabit system has proven to be good enough for just about everything I've seen.

    High availability? How about having several servers that can run hot or cold spare for each other, and which can function as compute nodes as well... Nice when a server MB catches fire (yes, I've had that, and lost as much as a few minutes of work time, (the time for someone to walk to the server room, unplug the smoking machine and restart a running (cold spare) backup server). Most of the people at the lab didn't even notice the hiccup.)

    Batch/job queues? no probs, use sun grid engine, write your own, or whatever. simple as cake.

    I have mainly used Gentoo Linux for the flexibility and ease of maintenance and I can highly recommend it. It is all fairly simple to implement on Gentoo. Just read up on Gentoo system administration, pxelinux, tftp, openmosix, and whatever you feel you need to use it for.

    The main problem right now is the lack of good openmosix support for 2.6 series of kernels. But I'm sure that some or all of this can be built with any or all of the other dynamic distribution systems out there.

    If you have off-list questions please contact me at my nick at gmail.com.