Slashdot Mirror


How to get 1.5 TeraFlops from Linux

Oak Ridge National Lab has purchased from SGI an Altix 3000 (flash movie). This article claims that: SGI Altix 3000 is recognized as the first Linux cluster that scales up to 64 processors within each node and the first cluster ever to allow global shared memory access across nodes. There is more here, here, and here.

55 of 280 comments

  1. Look Out! by TWX · · Score: 4, Funny

    "SGI Altix 3000 is recognized as the first Linux cluster that scales up to 64 processors"

    SCO will be all over your ass now!

    --
    Do not look into laser with remaining eye.
    1. Re:Look Out! by monkey_jam · · Score: 4, Funny

      ..maybe you should learn to wipe better...

  2. Beowulf cluster jokes... by Kiriwas · · Score: 5, Interesting

    After all the beowulf cluster jokes, I am still incredibly curious about them. My goal is to build a small 5-6 node cluster by the end of the summer. The thing is, I still know very little about them. Everyone jokes about them, but no one posts any useful information. Are there specific languages one must program in to take advantage of the multiple processors? Or does the OS take care of that? How much speed can you actually get out of them? Is it pure processing power? Or is there more? I'm very curious and want to know.

    1. Re:Beowulf cluster jokes... by SkArcher · · Score: 2, Informative

      here and here are probably good places to look.

      --

      An infinite number of monkeys will eventually come up with the complete works of /.
    2. Re:Beowulf cluster jokes... by gladbach · · Score: 5, Informative

      just download clusterknoppix and knock yourself out. ; )

      http://bofh.be/clusterknoppix/

      --
      "Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms,
    3. Re:Beowulf cluster jokes... by The_ForeignEye · · Score: 5, Informative

      Back in my days of parallel programming (read: 1998) on Beowulf clusters I used Fortran and C. The trick to making your program "parallel" is to use special programming libraries that will spawn instances of your program across the cluster and let them communicate with each other. The libraries I used were PVM and MPI.

      At that time they were working on a Java implementation, but I don't know what happened with that.

    4. Re:Beowulf cluster jokes... by 3141 · · Score: 2, Informative

      Another poster mentioned MOSIX, but openMosix is probably a better bet. It's released under the GPL, and is a combination of kernel-patch and user-space tools. Once you get these installed on each node, and connected via ethernet (all with networking set up of course... IP addresses etc) you should have yourself a cluster.

    5. Re:Beowulf cluster jokes... by RussianBeard · · Score: 2, Informative

      Take a look at OSCAR. We built a nine node cluster out of IBM e-servers using it. It was really quite straightforward.

      As far as languages go, you'll need an MPI library (like MPICH or LAM/MPI, which is also a runtime environment), but the actual code used is usually C, C++, or Fortran. BTW, OSCAR comes with MPICH and LAM/MPI.

    6. Re:Beowulf cluster jokes... by Helmut+Kool · · Score: 2, Funny

      You don't seem to remember the joke well enough. You're supposed to IMAGINE the beowulf cluster, not actually build it.

    7. Re:Beowulf cluster jokes... by oudzeeman · · Score: 2, Informative

      This SGI isn't a beowulf cluster. Traditionally beowulf clusters refer to clusters that use COTS hardware, don't have global shared memory, etc. Lots of people in the cluster community won't even call clusters of workstations beowulf clusters if they have some high speed network like Myrinet. We just call ours a Linux cluster, a cluster, a distributed memory supercomputer...

      You can program your beowulf cluster in C or Fortran using a free MPI (Message Passing Interface) implementation called MPICH. I have even seen a scaled-down version of MPI for Python (which requires MPICH to use). So start learning MPI. MPI-1 has 129 functions, but you can write most programs using a small subset of these calls. If you don't want to pay much money I suggest using C, because g77 sucks and there are no free Fortran 90 compilers. We use the Portland Group Fortran and C compilers as well as the Intel Fortran Compiler. I think we are going to switch completely to Intel Fortran and C.

      Why do you want to use a beowulf cluster if you have no clue about them or parallel programming in general? Just because they are 'cool'? A beowulf cluster is very useful for modeling or data mining, but unless you are running models that take days/weeks/months on your workstation you won't need the processing power of a cluster. Right now we have someone running a model on 76 processors that takes about 9 hours to finish a 1 year cycle in the model. They want to run the model for a total of 50 years. This is a model of the Pacific Ocean where they introduce carbon into the ocean, and then they see what effect that has on temperature change, etc. After they get their 50 year results for the Pacific they want to do a global simulation. This is the real use of beowulf clusters. They aren't for load balancing web servers, playing Quake, or any of the other things people post about every time there is an article about supercomputers/beowulf clusters.

      The speedup you will get really depends on your application. The more communication is necessary, the smaller the speedup will be. If you have a 5 node cluster, with 2 processors per node, the theoretical maximum speedup is 10, but you will never achieve that because of parallel overhead (MPI calls, communication time, etc). If you want more information on parallel programming and cluster computing send me a private message telling me what you hope to do with your cluster.

    8. Re:Beowulf cluster jokes... by battjt · · Score: 2, Interesting

      Bull.

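      int a;                       /* shared by every thread */
      int doSomethingElse(int x);  /* defined elsewhere */
      void doSomeCalculations(int i) {
          a = doSomethingElse(a + i);  /* unsynchronized read-modify-write of a */
      }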

      Would fail (multiple threaded access to a). It is extremely difficult to detect side effects in C. I've never seen a "smart" compiler as you put it, though there are systems where the programmers can explicitly parallelize a loop.

      --
      Joe Batt Solid Design
  3. Apple by zzzmarcus · · Score: 5, Funny

    Oh great... I can see Jobs wringing his hands already.

    "Now how am I going to make the G5's look faster than THIS?"

    1. Re:Apple by Trigun · · Score: 2, Funny

      He just has to tell the apple fanboys that a G5 is, and they'll believe it.

      For his next trick, Jobs is going to walk on water.

    2. Re:Apple by Umrick · · Score: 2, Interesting

      Rendezvous will be used in 10.3 with Xcode to discover resources and distribute software builds across available 10.3 machines. If there's a perceived benefit to Apple, do you honestly think there's anything preventing the next version 10.4 from having distributed capabilities?

      You can already compile programs with LAM-MPI support, so in reality there is nada stopping you from building a Beowulf cluster of XServes. There may even be a compelling reason to use XServes over x86 boxes after XServes are updated to G5s.

      Rumor was the original XServes were built to spec for a distributed cluster for a Blast! genome search engine.

      People get hung up on Beowulf = Linux, and that isn't necessarily the case if you take Beowulf to mean a cluster of inexpensive machines.

      OS 10.2+ with Rendezvous autodiscovery using LAM-MPI for communicating could just be a killer configuration. Lord knows cluster management/monitoring would be outstanding, though perhaps the setup would not be as simple initially.

  4. kernel sources? by gladbach · · Score: 5, Interesting

    Are they going to release their kernel that allows them to globally share memory? Or is it more of a hardware thing than software?

    --
    "Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms,
    1. Re:kernel sources? by Anonymous Coward · · Score: 2, Informative

      I'm 100% sure it's very much a hardware thing. SGI has a long history of building very large hardware shared memory machines (e.g., the Origin line - O2000 and O3000) based on proprietary MIPS processors. They still make those machines, but market pressures forced them to also develop and sell Intel-based shared memory machines. I'll be curious to see how much of SGI's extensive work on IRIX to let it scale to 1000's of processors efficiently will bubble out to their Linux systems.

  5. Better than Beowulf for normal use... by TWX · · Score: 5, Informative

    You're better off using mosix. It'll allow for more normal (ie, not beowulf specific) applications to thread across computers. I'd imagine that an open-mosix setup (like the ones using the knoppix boot CDs tailored to it) could probably make for a fairly powerful computing cluster very easily.

    --
    Do not look into laser with remaining eye.
    1. Re:Better than Beowulf for normal use... by yuvtob · · Score: 3, Informative

      While you are probably right that for most cases mosix will do just fine (I used it for a ~50 PC cluster at nights for DSP calcs), these machines are for super-computer calculations that require a lot of memory. Even if you could run a 2GB process on mosix, it would be slowed down by the network, and these beasts can run 100GB processes over a 2GB/s interconnect!

    2. Re:Better than Beowulf for normal use... by ERJ · · Score: 5, Informative

      Mosix is nice, because it treats the cluster like a single, large, multi-cpu box by simply allocating threads to different boxes. The nice thing about this is that any multi-threaded program can take advantage (as stated in the parent post).

      However, this also can cause problems. Most threaded programs are written assuming that all the threads have high speed (i.e. system bus / cpu cache) access to shared information. When we introduce the latency incurred by a network, this can cause programs to run a lot slower than they would if they simply had all the threads on a single box. Obviously, it all depends on how the program was written, and what it does.

      If you are writing a program specifically for a cluster, I would suggest instead looking at something like LAM-MPI. This allows for a much more controlled approach. It is more work (you have to decide how the work will be split) but it allows for much better control of where and what is being done and how to optimize it.

    3. Re:Better than Beowulf for normal use... by battjt · · Score: 2, Informative

      Threads can't be migrated. Only processes can be migrated.

      http://howto.ipng.be/openMosixWiki/index.php/Applications%20using%20pthreads

      You have to write your application as a bunch of processes to take advantage of a mosix cluster.

      Joe

      --
      Joe Batt Solid Design
  6. Rocket Haid... by poptones · · Score: 4, Funny
    Now those obsessed geniuses have even more reason to forget to change the oil in their cars...

    (Inside joke for my ol' friends at ORNL...)

  7. SCO and Microsoft reactions? by mao+che+minh · · Score: 5, Interesting

    I wonder what kind of FUD Microsoft and SCO will cook up to try to thwart this new display of raw power. McNealy seems intent on not only winning the Asshat award, but outright retiring it in his honor.

    It's funny that Microsoft always tries to downplay Linux's enterprise capabilities, when Linux has been scaled to far more power than Microsoft's best offering for years now. Windows 2003 is a clumsy, bloated, closed source chunk of green crap.

    1. Re:SCO and Microsoft reactions? by cgb8176 · · Score: 2, Interesting

      It's funny that Microsoft always tries to downplay Linux's enterprise capabilities, when Linux has been scaled to far more power than Microsoft's best offering for years now.

      RTFA. They are using this machine for research in the "sciences, clean energy management and production, environmental protection, and homeland security."

      It's not a web server, and it isn't demonstrating "enterprise capabilities." Windows has never been intended for, or used for, scientific computing on a large scale.

  8. Best of both worlds... by goats_in_boats · · Score: 2, Funny

    ...now you get obscene frame rates on quake III while searching for those pesky pockets of natural gas!

  9. Yanking from my journal entry of 6/30/03 by anzha · · Score: 4, Informative

    HPC Wire had an article that I referenced in my journal on 6/30.

    It's an interesting machine. I'd love to get one to play with. I'm sure our benchmarkers will have some even more interesting comments once they're done. Expect teething problems, folks. Systems of this size and complexity take time to break in.

    --
    Do you know why the road less traveled by is littered with the bones of the unwary?
  10. lites by NetMagi · · Score: 3, Funny

    makes me just wanna turn off the lights and look at all those LED's blinkin!

    1. Re:lites by CoolVibe · · Score: 5, Interesting

      I've experienced it the other way around once. At some previous $workplace, we had this humongous SGI Origin 3800 cluster. Due to a city-wide brownout, and due to the fact that we were just installing the diesel-powered generators, the thing had to survive for a couple of hours on the nobreak (the UPS). Sure, all the lights in the building were out, but the behemoth was still churning. We (the venerable sysadmins) were trying to decouple a partition so we could hook up a console to it to bring down the thing gracefully. Of course, that wasn't that easy.

      Suddenly the nobreak was all out, and the billion dollar machine went *poof* - down. Damage? A couple of SCSI disks, but of course everything was mirrored and had parity so even with the damaged disks, there was no data loss.

      Then (after a few hours) the power failure ended, the lights went back on in the building, but the lights on the big cluster were still off. The other way round than you'd like to see. Although, when the building power was out, and the nobreak for the machine was active, it sure was a pretty sight. Although, with the impending doom, I didn't really have time to appreciate it.

    2. Re:lites by Leebert · · Score: 4, Informative

      the billion dollar machine

      What the hell kind of Origin 3800 do YOU have? ISTR ours (512-proc) was roughly $10M.

    3. Re:lites by green+pizza · · Score: 3, Informative

      SGI Origin 3800 cluster

      Just to nitpick... most Origins are not clusters but rather one large single machine. It is possible to partition the machine in firmware and have each partition talk to others over the existing (and now unused) numalink interconnects... but it's much faster (even for plain MPI code) to just run the beast as one large single machine.

    4. Re:lites by Anonymous Coward · · Score: 3, Informative

      The machine has 1024 procs

      There are two 1,024-processor Origin 3000's in the world. One is in Eagan, Minnesota. The other is at NASA. The NASA machine is called chapman. It has 256 GB of RAM. Not terabytes.

      How do I know this? Because I'm sitting here looking at lomax right now.

      You're a... whaddya call it. Liar.

    5. Re:lites by CoolVibe · · Score: 2, Informative

      Oh, I found a little page on the SARA website where it is clarified (can't get onto the intranet anymore, else I'd have mirrored some better specs). Anyway, more about TERAS here.

  11. Obligatory: Mods on Crack! by panda · · Score: 5, Funny

    OK, so the moderators are on crack today. What's with all these obviously "funny" posts getting moderated as "insightful?"

    Guess it's time to meta-moderate!

    --
    Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.
    1. Re:Obligatory: Mods on Crack! by darkov · · Score: 5, Funny

      OK, so the moderators are on crack today.

      Only today?

  12. Oops (RTFA) by Anonymous Coward · · Score: 5, Informative

    The machine has 256 processors for 1.5 teraflops, not 64.

  13. So... by Mipsalawishus · · Score: 3, Funny

    How hard would it be to /. one of these things??

    1. Re:So... by ocelotbob · · Score: 3, Insightful

      Depends on the bandwidth into the machine more than anything else. Most /.ings, unless the database explodes into a shower of sparks, are limited by the bandwidth of the machine more than anything else. It'll be quite easy to /. it if it's only got a T1 or so. If it's got a 10Gb connection or two, I'd imagine that the system load wouldn't even be noticed.

      --

      Marxism is the opiate of dumbasses

  14. Conversion scale? by jackDuhRipper · · Score: 5, Funny

    What's that in bogomips?

  15. SGI: Unsung corporate heroes? by Anonymous Coward · · Score: 5, Interesting

    Perhaps this will finally get SGI's Open Source Software efforts in the spotlight. So far every other major hardware vendor has jumped on the bandwagon making a lot of noise, and trying to get free publicity. SGI however has always quietly contributed large amounts of knowledge but always in a modest or even shy way (sometimes even publicly denying involvement, but working in secret :) ).
    In the meantime their additions have contributed quite a bit to open and free thinking in software, take OpenGL and Open Inventor, or even to the kernel directly as with the XFS filesystem.
    I always liked this approach more than the hyping others have done with Linux, but unfortunately this has kept them unadorned within the community. With the Altix cluster (as with their GNU/Linux workstations, which unfortunately failed) I think they have shown that they put their money where their mouth isn't.

    I think it's only fair that when we are talking about the large corporate players in the OSS field SGI at least deserves a footnote for their efforts instead of just hammering exclusively on IBM, Sun, etc. as the great backers.

    I know, I know. It's a corporation, so they inherently put money over freedom, it's just something I noticed because of the lack of their name in any high-profile discussions, which I think is unfair.

    1. Re:SGI: Unsung corporate heroes? by jd · · Score: 2, Interesting

      One thing I've considered for some time is a "league table" of companies involved in "Open Source Software".

      The table would record the number of packages released, the number of patches, and the licenses used for each. Originally, I was going to make a four-way split - open-source packages or patches, and packages/patches for open-source OSes.

      From this, you could create some kind of scoring system, and thus compare the "open-sourceness" of companies. (From the above, it should be obvious that I consider promotion of an Open Source OS to be important, whether the company actually releases any Open Source code itself or not.)

      In every league table I've drawn up - I've just not had the time to complete or maintain such a table - IBM ranks first, and SGI is second. The gap is surprisingly close between these two. No other major corporation even comes close.

      If you want to understand why SGI ranks so highly, look at their oss.sgi.com site, under projects and also under propack. (Propack is the collection of a lot of their Linux-specific code, including XFS for Linux.)

      Between Propack, OpenGL, GLX, Open Inventor, Coin, their OB1 code dump, their Apache 1.3.x acceleration patches (which apparently resulted in a political war between the Apache group and SGI), Rhino, their patches for Mozilla, plus all of their Open Source code for IRIX, there can be no serious question that SGI has done a lot.

      (Patches SGI used to support, but dropped, include AIO - Asynchronous I/O, and Scheduled Transfer Protocol. Both were for Linux.)

      In comparison, Compaq has a patch and a package for clustering, HP has a plug-in scheduler system, and The Open Group now provides a version of Motif for Linux.

      Hmmm. Yeah. Even with HP's takeover of Compaq, SGI are still so far ahead of the game that it's not funny.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  16. And just how did they accomplish this... by Kevinv · · Score: 4, Funny

    without SCO's help?

  17. uh huh by notque · · Score: 2, Insightful

    Mod parent +1 Funny!

    --
    http://use.perl.org
  18. Re:Hey, at least it's not running IRIX by Chicane-UK · · Score: 4, Informative

    Um..

    I always liked Irix, and everyone I ever talked to who used Irix liked it. The GUI is about 500x more usable than the horrors of OpenWindows or CDE on Solaris.. bleugh.

    --
    "Hey! Unless this is a nude love-in, get the hell off my property!!"
  19. Re:64 processors = 1.5 Cells by AmishSlayer · · Score: 2, Informative

    What I find amazing is that the Cell is supposed to run up to a TeraFlop when it reaches production. That compared to a 64 processor Linux cluster.

    That's 64 processors per node.

  20. Setting one up now by jimshep · · Score: 5, Informative

    We just got ours installed yesterday. I'm still installing software and am starting benchmarks. It's only the deskside version (12 cpus, 24GB RAM, 1TB disk), but still more powerful than the 4-cpu SGI Origins that we have been using.

    It is the first one that the regional SGI reps had actually installed, but since it is almost exactly the same as the MIPS-based Origin 3000 servers (with the exception of the obviously different Itanium 2 cpus and supporting chipsets), they ran into almost no problems getting it online. I have also been surprised at how many commercial codes have already been ported to the platform.

    The main reasons we purchased this machine are the ease of parallelizing code and the floating point performance of the Itanium 2 cpus. We're computational materials engineers, and the less time we have to spend optimizing codes so that the nodes of a cluster are always kept busy and minimizing I/O bottlenecks, the more time we have to concentrate on the theoretical issues.

    It runs RedHat 7.2 with some tweaks by SGI called SGI ProPack. The ProPack modifications come on separate CDs, with the proprietary software on separate CDs from the open source software. So far, from the command line, everything works just like my PC. It's kind of strange running Linux on a >$100K machine, but it sure beats dealing with the annoying differences between IRIX and Linux. Now to see if it performs as well as we expect...

    1. Re:Setting one up now by Jaeger- · · Score: 2, Funny

      Don't you know you can't follow "only" with the words "12cpu, 24gb ram, 1tb disk"...

      --
      E V E R Y T H I N G I W R I T E I S F A L S E
  21. Or, Try Quantix, which comes with some apps by coyote1 · · Score: 2, Informative

    or, try Quantix, which is derived from cluster knoppix. A self-booting ISO with data analysis software, based on Knoppix. This is geared more for scientific apps; it doesn't come with open office, etc, which cluster knoppix does.

    --
    Eat Lamb, 1 million coyotes can't be wrong
  22. Benchmarks I can Understand? by Greyfox · · Score: 2, Funny
    How about the Quake 3 framerate?
    Kernel compile time with make -j?

    Hmm, what are some other good ones?

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  23. Mosix... by wowbagger · · Score: 2, Informative

    The thing about Mosix is the costs of process migration.

    First, you have to understand process migration. In a mosix cluster, a running process can be moved, lock, stock, and barrel, from one CPU to another. All that is left behind is a "stub" process that forwards all file I/O across the network to the new location. So, if the program was a 3D raytracer that had the source description file and the output file open, after migration all file accesses to those files would be forwarded over the network to the stub (since you cannot guarantee that the remote machine can access those files in the same way.)

    Now, this is great for programs that do little file I/O but lots of computing (for example the ray tracer I just described.)

    However, the process must be set up on the local node first, then migrated. If the process has a 3 G core image (is taking up 3G of memory), then 3G of stuff has to be shoved across the wire, while the program is frozen. Thus, migrating a process is expensive.

    Now, if you have a bunch of long-running compute bound processes this is a net win (for example, rendering a movie might benefit). But something like building the Linux kernel won't benefit, since what you have is a bunch of short running, high I/O jobs.

    We have a Mosix cluster at work. I tried using it as a compile farm, and the results were disappointing. Not surprising - I was NOT using it for what it was designed for.

    However, if we can ever get the FPGA synthesis tools running natively under Linux, the hardware types are going to be quite happy....

  24. They must be running their web server on it. by twoslice · · Score: 2, Funny

    'Cause it is surviving a /.ing with a Flash intro even!

    --

    From excellent karma to terrible karma with a single +5 funny post...
  25. Re:Hey, at least it's not running IRIX by platypus · · Score: 3, Funny

    I hated it, if it helps.

  26. Re:Hey, at least it's not running IRIX by the+gnat · · Score: 2, Interesting

    I always liked Irix, and everyone I ever talked to who used Irix liked it. The GUI is about 500x more usable than the horrors of OpenWindows or CDE on Solaris.. bleugh.

    I vastly prefer 4DWM to GNOME or KDE as well. I'm helping a coworker set up a Dell Inspiron 7500 (P3-700) with Linux, and he immediately complained that KDE was far too slow. I switched to WindowMaker, and he immediately noticed the difference. This is a three-year-old machine, with tons of memory and a reasonable processor, and it crawls with KDE3. Pathetic.

    Meanwhile, you can run the latest version of Irix on a seven-year-old SGI box (and even older) and it'll still be smooth. My Indy at home feels just as responsive as any PC I've ever used. I wouldn't call it *fast* by any stretch of the imagination, but the OS alone does not cripple the computer. I'm a huge Linux fan, but there are tons of examples like this where it just hasn't caught up to the more polished offerings.

  27. I'm offering a Bounty to all posters by hellfire · · Score: 3, Funny

    $500 for the scalp of anyone who says the words "Beowulf" and "cluster" in the same post in response to this article.

    --

    "All great wisdom is contained in .signature files"

  28. Re:distributed shared memory by green+pizza · · Score: 2, Informative

    You can find a list here. For most computations and most hardware, you are probably still better off with MPI or PVM rather than shared memory.

    Note also that there are several high speed interconnects for Linux clusters available from many different vendors, including InfiniBand, Gigabit Ethernet, FireWire, and Myrinet.


    SGI systems (Origin and Altix) have massive interconnects that hold together the single-system architecture. They're fast for shmem-type shared memory apps, but also for MPI. In fact, SGI keeps tweaking their MPI implementation with every release of IRIX and the Linux ProPack, even though MPI is not the "best" way to run apps on their systems.

    The interconnects in most Origins and Altix systems are 3.2 gigaBYTE per second with extremely low latency. I don't know about Infiniband, but I do know that GigE is only 125 MB/sec with really high latency... FireWire 800 is 100 MB/sec with better latency.... and I think the best version of Myrinet is 500 MB/sec (4 gigabit) with about 5x the latency of SGI's 'numalink'.

    The smaller Altix systems (and supposedly, future Altix and Origin systems this fall) can be double cabled or can run at a higher speed... for 6.4 gbyte/sec per interconnect.

    Also, the Altix can handle up to 64 processors per single machine / single node (or 128 with a very beta set of patches). The cluster in the article is actually four Altix systems, each with 64 processors. The Origin 3800/3900 can handle 512 processors per node (or 1024 with a special "XXL" IRIX kernel).

    Great stuff for I/O intensive tasks, but massive overkill for 3d rendering or calculating pi.

  29. How to get 2+ TeraFlops from Linux by tobiashm · · Score: 2, Informative

    This does not seem to have been mentioned before:
    Niflheim at Danish University of Technology