Slashdot Mirror


Linux Supercomputer Wins Weather Bid

Greg Lindahl writes "The Forecast Systems Laboratory, a division of NOAA, selected HPTi, a Linux cluster integrator, to provide a $15 million supercomputing system during the next 5 years. The computational core of this system is a cluster of Compaq Alphas running Linux, using Myrinet interconnect. Check out www.hpti.com for information on the company."

22 of 115 comments

  1. Linux Not Useful For All Superclustering Tasks by LHOOQtius_ov_Borg · · Score: 2

    I work at a company that is working on a very complex artificial intelligence architecture, and for a variety of reasons it is written in Java (since the other most popular AI languages use VMs or are directly interpreted, expect the AI community at large to want good interpreters on Linux).

    We looked at putting together a Beowulf Linux cluster to run our software, which is very memory and processor intensive, but Linux could not do the job because JVMs on Linux are absolutely terrible. We wound up on WinNT (we couldn't afford Suns, but plan to upgrade when we can) because the JVMs were the best.

    Because people making large software systems are fed up with reengineering for new hardware, expect other people to start choosing Java for large, intensive applications that were previously written in C, Fortran, C++, etc.
    If Linux can't compete with other OSes for running large Java programs, these projects will not be able to consider Linux as their OS of choice (which we all WANTED to do here, we were very upset to go to NT).

    Right now the fastest Java environment we've found is Java 2 with HotSpot, running on NT (we're testing Solaris now, as we might be able to afford Suns soon). Can the Linux community do any better, or even as well? So far, no.

    --
    o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
    1. Re:Linux Not Useful For All Superclustering Tasks by G-Man · · Score: 2

      Gotta admit I'm a little confused here. If you have a computationally intensive task, why would you ever want to run it through any VM or interpreter? Granted, LISP may run through an interpreter during the development phase, but you can always compile it to get better speed in the final product.

      Can you elaborate without giving away the company secrets?

    2. Re:Linux Not Useful For All Superclustering Tasks by Anonymous Coward · · Score: 2
  2. attn: moderator - follow the link! by LocalYokel · · Score: 2

    I'm crying foul on the moderations I've been given on this story. It's true that the government finds ways to mess things up, e.g. crypto laws, software patents, etc.

    M2 has seemed to make moderations a bit more accurate, but I don't see it working out for me here. Unless somebody actually goes to the page and sees what I'm talking about -- "Alpha" in ten hours, and the EV series are cranking out units faster than LensCrafters...

    I didn't make up those "CPU's". They are actually listed on the page! Please follow the link and see for yourself.


    --
    E2 IN2 IE?

  3. Re: Solving PDEs by coyote-san · · Score: 2

    Did I mention another of my graduate classes was chaotic dynamics? :-)

    The very definition of "chaos" is high sensitivity to changes in the initial conditions. If a weather front appears in the same place (within the resolution of the data grid) on all 120-hour forecasts despite a reasonable variation in the initial conditions, you can be pretty sure it isn't in a chaotic realm and your forecasts will be fairly accurate.

    On the other hand, if a modest amount of variation in the initial conditions results in wildly different predictions, the system is obviously in a chaotic realm and you can't make decent predictions.

    As odd as it sounds, for something as large as a planetary atmosphere it's quite reasonable for parts of the system to be chaotic while other parts are boringly predictable. That's why they were starting to compare the predictions from different models, the same models with slightly different initial conditions, etc. That might give the appropriate officials enough information to decide to evacuate a coastline (at $1M/mile), or to hold off another 6 hours since the computers predict the storm will turn away.

    P.S., the models do make mistakes, but fewer than you might expect. It's been years since I've thought about it, but as I recall most models work in "isentropic" coordinates and are mapped to the coordinates that humans care about at the last step. The biggest problem has been the resolution of the grids; when I left I think the RUC model was just dropping to 60km; by now it's probably 40 or 30km. To get good mesoscale forecasts (which cover extended metro areas, and should be able to predict localized flooding) you probably need a grid with 5 or 10 km resolution.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  4. Why Alpha's??? by Stink+Juice · · Score: 2

    I am curious as to what the determining factors were for selecting Alphas over Pentium-based systems.

    I've installed Linux once on an Alpha box and the BIOS is truly impressive, much better than PCs. But what are some of the other reasons? Wider data/cpu buses? Larger memory configurations?

    Anyone who actually uses Linux on Alphas is encouraged to reply.

    1. Re:Why Alpha's??? by Panaflex · · Score: 2

      The 21264 is just a better architecture all around. First of all, everything is 64 bit. Secondly, the FP is 10 times faster than the current P3. Thirdly, Compaq has recently released compilers for Linux that provide an optimal 30% speed increase.

      Probably the best thing is that engineers like alphas, and they like linux.

      Pan

      --
      I said no... but I missed and it came out yes.
    2. Re:Why Alpha's??? by Panaflex · · Score: 3

      Oops!! Sorry! 10 times faster is wrong. The true specs are (from www.spec.org):

      (UP2000 21264 667MHz -Alpha Processor Inc)
      53.7 SPECfp95
      32.1 SPECint95

      The P3 is

      (SE440BX2 MB/550MHz P3 -intel)
      15.1 SPECfp95
      22.3 SPECint95
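      As a rough check on the corrected figures, the ratios can be worked out directly. This is only a sketch using the four SPEC95 numbers quoted above; the dictionary and function names are made up for illustration, and the two chips run at different clock speeds, so this is not a per-MHz comparison.

```python
# SPEC95 figures as quoted in the comment above.
alpha = {"SPECfp95": 53.7, "SPECint95": 32.1}  # UP2000, 21264 at 667 MHz
p3 = {"SPECfp95": 15.1, "SPECint95": 22.3}     # SE440BX2, Pentium III at 550 MHz


def speedup(fast, slow):
    """Return the per-benchmark ratio fast/slow (hypothetical helper)."""
    return {name: fast[name] / slow[name] for name in fast}


ratios = speedup(alpha, p3)
for name, ratio in ratios.items():
    print(f"{name}: Alpha score is {ratio:.2f}x the P3 score")
```

      On these numbers the floating-point gap is roughly 3.5x rather than 10x, and the integer gap is well under 2x.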

      --
      I said no... but I missed and it came out yes.
  5. Re:Ok, here's your chance... by marbike · · Score: 2

    This kind of system would be great in helping to minimize the damage to life and property in the tornado ravaged areas of the Midwest. Having recently witnessed a tornado for the first time (in downtown SLC, no less), I have a new interest in tech like this.

    --
    it is better to light a flame thrower than curse the darkness. -Terry Pratchett Men at Arms
  6. Why Alpha's? Screaming FP performance, that's why by Troy+Baer · · Score: 3

    I've installed Linux once on an Alpha box and the BIOS is truly impressive, much better than PCs. But what are some of the other reasons? Wider data/cpu buses? Larger memory configurations?

    The big thing about the Alpha for people like NOAA (who run big custom number-crunching apps written in FORTRAN) is its stellar FP performance. A 500MHz 21264 Alpha peaks at 1 GFLOPS and can sustain 25-40% of that, because of the memory bandwidth available. A Pentium III Xeon at the same clock rate peaks at 500MFLOPS and can sustain 20-30% of that.

    That doesn't fly for everybody, though. Where I work, we have a huge hodgepodge of message-passed, shared-memory, and vector scientific codes, plus needs for some canned applications that aren't available on the Alpha. We picked quad Xeons for our cluster and bought the Portland Group's compiler suite to try to get some extra performance out of the Intel chips.
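    To put the peak and sustained figures for the two chips side by side, here is a small sketch. It only multiplies out the percentages quoted above; the function name is hypothetical and nothing here is a measurement.

```python
def sustained_range(peak_mflops, low_fraction, high_fraction):
    """Sustained MFLOPS range implied by a peak rate and a fraction range."""
    return peak_mflops * low_fraction, peak_mflops * high_fraction


# 500 MHz Alpha 21264: 1 GFLOPS peak, 25-40% sustained (figures from the comment).
alpha_low, alpha_high = sustained_range(1000, 0.25, 0.40)

# Pentium III Xeon at the same clock: 500 MFLOPS peak, 20-30% sustained.
xeon_low, xeon_high = sustained_range(500, 0.20, 0.30)

print(f"Alpha: {alpha_low:.0f}-{alpha_high:.0f} MFLOPS sustained")
print(f"Xeon:  {xeon_low:.0f}-{xeon_high:.0f} MFLOPS sustained")
```

    Even the low end of the Alpha range sits above the high end of the Xeon range on these numbers.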

    --Troy
    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  7. Go to this link by aheitner · · Score: 3

    General Processor Info.

    Compare the SPECfp scores of high-end Intel and Alpha offerings. Take a look at a 600MHz PIII Xeon and a 667MHz Alpha 21264.

    The reason to choose Alpha should be obvious.

  8. Re: Solving PDEs by coyote-san · · Score: 4

    I worked at FSL for several years, although on a different project. I knew people working on the weather models, and I took a class on parallel processing from the CU professor who shared the old Paragon supercomputer with NOAA. I even had an account on the Paragon briefly (for that class) after leaving NOAA.

    NOAA needs to solve partial differential equations (PDEs). A *lot* of PDEs. My class spent a lot of time on numerical methods for solving them, and my entire undergraduate class in the early 80's was covered in the first lecture of my graduate class a few years ago. My Palm Pilot, running multigrid analysis, could beat the pants off a Cray X-MP running the best known algorithm from 15 years ago.

    AI programs may not scale well, but the type of work done at NOAA *does*. Furthermore the hot topic a few years ago was applying some ideas from chaos theory to weather forecasts - take a dozen systems, insert just a little bit of noise into the initial data (essentially, instrument noise in your observations), then let them all run. If all models show the same weather phenomena, you can be pretty sure that it will occur. If the models show wildly different results (e.g., Hurricane Floyd slams into Key West in one run, but NYC in the other) you know that you can't make any firm predictions. As an educated layman's guess, I expect that the reason the hurricane forecasts are so much better than just a few years ago is precisely this type of variational analysis.
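    The perturbed-ensemble idea can be tried on a toy system. The sketch below is not a weather model: it swaps in the logistic map as a stand-in, runs a dozen members with slightly noisy initial values, and compares how far apart they end up. The function names, noise level, and parameter values are all assumptions chosen for illustration.

```python
import random


def logistic_run(x0, r, steps):
    """Iterate the logistic map x -> r*x*(1-x) and return the final value."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def ensemble_spread(x0, r, members=12, noise=1e-4, steps=100, seed=0):
    """Run `members` copies with small noise on x0; return max - min of the results."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    finals = [
        logistic_run(x0 + rng.uniform(-noise, noise), r, steps)
        for _ in range(members)
    ]
    return max(finals) - min(finals)


# r = 2.8 settles onto a fixed point; r = 3.9 is in the chaotic regime.
stable_spread = ensemble_spread(0.3, r=2.8)
chaotic_spread = ensemble_spread(0.3, r=3.9)

print(f"spread with r=2.8: {stable_spread:.2e}")
print(f"spread with r=3.9: {chaotic_spread:.2e}")
```

    A tiny spread means every member agrees, so the outcome can be trusted; a large spread is the signal that no firm prediction is possible.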

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  9. Re:Why Alpha's? Screaming FP performance, that's w by Troy+Baer · · Score: 3

    If the G4 can sustain >1gflops, then why not build a cluster of G4s running LinuxPPC?

    I'm not convinced the G4 can sustain 1 GFLOP/s in any kind of real calculation -- it simply doesn't have enough memory bandwidth. The G4 uses the standard PC100 memory bus, AFAIK. That's 64 bits wide running at 100MHz = 800MB/s peak. So without help from the caches, the absolute best you can do on *any* PC100 based system is 200 MFLOP/s using 32-bit FP or 100 MFLOP/s using 64-bit FP. In practice you can only sustain about 300-350 MB/s out of the PC100 memory bus, so things get even worse. The caches will help quite a bit (maybe a factor of 2-4), but I have trouble imagining the G4 being able to sustain over 500 MFLOP/s even on something small like Linpack 100x100 because of the limited bandwidth and latency of the PC100 bus. Other processors that have similar peak FP ratings have much higher memory bandwidths; we've benchmarked an Alpha 21264 (1 GFLOP/s peak, ~400 MFLOP/s sustained) at about 1 GB/s memory bandwidth (that's measured, not peak), and a Cray T90 CPU (1.8 GFLOP/s peak, ~700 MFLOP/s sustained) at 11-13 GB/s (again, measured not peak).
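    The bandwidth ceiling above is simple division, and it can be written out as a sketch. The assumption (implied by the 200 and 100 MFLOP/s figures) is that every floating-point operation has to stream one operand from main memory with no cache reuse; the function and variable names are made up for illustration.

```python
def bandwidth_bound_mflops(bandwidth_mb_s, bytes_per_operand, operands_per_flop=1):
    """Upper bound on MFLOP/s when each flop must fetch operands from memory."""
    return bandwidth_mb_s / (bytes_per_operand * operands_per_flop)


# PC100 bus: 64 bits wide at 100 MHz.
bus_width_bytes = 64 // 8
clock_mhz = 100
peak_mb_s = bus_width_bytes * clock_mhz

peak_fp32 = bandwidth_bound_mflops(peak_mb_s, 4)
peak_fp64 = bandwidth_bound_mflops(peak_mb_s, 8)

# Using the 300-350 MB/s that the bus is said to sustain in practice.
sustained_fp64 = [bandwidth_bound_mflops(bw, 8) for bw in (300, 350)]

print(f"peak bus bandwidth: {peak_mb_s} MB/s")
print(f"32-bit bound: {peak_fp32:.0f} MFLOP/s, 64-bit bound: {peak_fp64:.0f} MFLOP/s")
print(f"64-bit bound at sustained bandwidth: {sustained_fp64} MFLOP/s")
```

    Raising `operands_per_flop` shows how quickly the bound drops for kernels that touch more data per operation.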

    There's also the question of compilers. You have to have a compiler that recognizes vectorizable loops and generates the appropriate machine code to use the vector unit. Unless Motorola's feeling *really* magnanimous, I don't see that kind of technology making it into gcc (and g77, more importantly for scientific codes) any time soon. Otherwise, you're at the mercy of a commercial Fortran compiler vendor like Portland Group or Absoft. PGI hasn't shown any interest in PowerPC to this point, and Absoft currently does PPC compilers only for MacOS 8, not OSX or LinuxPPC.

    I'd love to be proven wrong on this, but based on my experience I don't see how you could do it.

    --Troy
    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  10. Great. I still wonder about the compiler though by winnt386 · · Score: 2

    A friend of mine tried Linux on Alpha and the performance was quite bad. After posting to a newsgroup she found out that the gcc compiler is not optimized for Alpha. She had to buy an expensive C/C++ one for the Alpha box, and then after a recompile the performance was great. I wonder how hard it was to get the cluster going with the compiler issue. I would hate to make 80 diskettes for all the machines because of licensing issues with the compiler. I heard Alpha Linux lacked some features of the standard Intel one. Is this true, or was it referring to the unoptimized compiler that comes with Alpha Red Hat Linux?

    --
    "Never stick an electrical appliance down your pants." -Tim Allen
  11. Toy Story: The Beowulf Cluster by WillAffleck · · Score: 2

    AC said: "Either way, you could make Toy Story in about 10 minutes on this thing once it's up."

    Yes, but what kind of plot? Would it be Woody and Mr. Potatohead lost in a hurricane with a large number of penguins?

    Raw power is cool, but art takes a bit more than that.

    --
    Will in Seattle
  12. Linux Supercomputer Wins Weather Bid by dkh2 · · Score: 2
    Looks like the folks at NOAA are shooting to confirm what we already know, and what Microsoft is hoping to learn if they can ever get Windows ported to a 64bit system. A 64bit capable OS (Linux) on 64bit iron (Alpha) absolutely SCREAMS next to the identically clocked, same-in-all-other-respects 32bit system running the 32bit version of said OS.

    Remember those vast performance diffs between the 80386SX-16 and the 80386DX-16? That's what we got here.

    <Note-to-Microsoft> Nanny-nanny-nah-nah, our OS runs on IA-64 and yours won't.</Note-to-Microsoft>

    D. Keith Higgs
    CWRU. Kelvin Smith Library

    --
    My office has been taken over by iPod people.
  13. Infrastructure costs (Re:BEOWOLF!) by Troy+Baer · · Score: 2

    For $15,000,000 to buy an Alpha Beowulf, it sounds like they might have 2,500 nodes with a 'decent' Alpha system. But if they go really high end, they'll have about 750 nodes (For the 'killer' $20,000 Alpha machines).

    That doesn't include the cost of the Myrinet cards and switches, racks, 3rd party software, support people, power, cooling, etc. Believe me, if you're paying $15M for a machine, part of it better be going for support personnel and infrastructure. The configuration's probably more like 250-500 nodes with a corresponding number of Myrinet cards and switch ports, 30-75 racks (8 nodes/rack if you're lucky), a *buttload* of power and air conditioning, and 2-5 onsite support people working in it full time.
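    For anyone who wants to check the sums, here is a back-of-the-envelope sketch. It uses only the budget, node prices, node counts, and 8-nodes-per-rack figure mentioned above; the variable names are hypothetical, and the rack estimate is a best case since it assumes every rack is full.

```python
import math

budget = 15_000_000  # total contract value in dollars

# Hardware-only view from the quoted post: spend everything on nodes.
high_end_nodes = budget // 20_000        # 'killer' $20,000 machines
implied_decent_price = budget / 2_500    # price per node if 2,500 nodes fit

# More realistic node counts suggested in the reply, at 8 nodes per rack.
nodes_per_rack = 8
racks = {n: math.ceil(n / nodes_per_rack) for n in (250, 500)}

print(f"high-end nodes if hardware only: {high_end_nodes}")
print(f"implied price of a 'decent' node: ${implied_decent_price:,.0f}")
for n, r in racks.items():
    print(f"{n} nodes -> at least {r} racks")
```

    Fewer than 8 nodes per rack pushes the rack count up, which is where the upper end of the 30-75 estimate comes from.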

    --Troy
    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  14. The right tool for the right task ... by LL · · Score: 3

    Buying the hardware is only 15-30% of the total cost. Also, in a production environment, you should not be fixated by the CPU. The question should be, within the capital budget, what is the best combination of resources that maximises the effectiveness of achieving your mission.

    To give you some real-world experience, a group I'm working with is looking at continental-scale simulation at a 5km resolution with the aim of going down to 100m. Now despite what most people think, the bottleneck (in this example) is in fact the I/O, with estimated total requirements of 30 TBytes. Doing the sums shows that to keep up with the CPU (say hypothetically 1 run/24 hours), you would need average throughput of 350 MByte/sec. Hardware that supports both this volume and capacity is NOT cheap. We would joke that we paid x million for the I/O and SGI would throw in the Cray for free :-).
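    The 350 MByte/sec figure follows from the two numbers given. A minimal sketch of the sum, assuming decimal units (1 TByte = 10^12 bytes, 1 MByte = 10^6 bytes) and that all 30 TBytes move once per 24-hour run; the variable names are made up for illustration.

```python
total_bytes = 30e12            # 30 TBytes of I/O per run (decimal units assumed)
seconds_per_run = 24 * 3600    # one run every 24 hours

required_mb_s = total_bytes / seconds_per_run / 1e6

print(f"average throughput needed: {required_mb_s:.1f} MByte/sec")
```

    That is an average; any burstiness in the I/O pattern means the hardware has to peak higher.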

    Now as for how an Alpha cluster could be used, it would fit very nicely into the dedicated batch box category. It has a very high CPU rate and some decent compiler optimisation. As such it would augment whatever existing environment exists, reducing the workload of the more expensive machines for development which generally have better tools (just you try debugging a multi-gigabyte core dump). The biggest problem nowadays is not the algorithms, but managing the data traffic to the CPUs and this is where Linux clusters are weak with relatively slow interconnects, unbalanced memory hierarchies, and cheaper but higher latency memory. You have to accept the disadvantages and shift jobs which are not suited for this architecture off. A bit of smarts goes a long way in stretching the budget.

    LL

  15. Re:will it live up to expectations? by quade]CnM[ · · Score: 3

    This is not true of massively parallel systems such as Beowulf. The problem with Linux and scalability is more of a hardware problem than a software problem. While you aren't going to put Linux on a Sun E10k anytime soon, it was never meant to be on such a large SMP machine. The Intel SMP architecture is flawed in design: all processors share the same bus. So if one processor can sustain 300M/sec of transfer and you have 4 processors, that 800M/sec bus is going to slow things down, and now your processors are only 2/3 as efficient as they are in a single-processor system. But you are probably going to be slower than this, because most RAM has a sustained transfer rate of only 150-200M/sec, so you only use 1/2 of the processor.


    To fix this, you use 2 processor buses and 2 memory buses. You fill these up, and you get 4 processor buses and 4 memory buses. Now you need to connect these bus segments. You have several options. First, connect them within the same machine. This is what NUMA is. The other route is to put each bus in a separate machine, each machine running a copy of the kernel locally, and connect the boxes together with a fast network. This is what Beowulf is.

    To give you an example, think of a highway system. If you have a lot of traffic switching lanes (buses) constantly, then it would be best to build one big 20-lane highway (NUMA). But if all the traffic basically keeps in its own lane, without much need to switch lanes (inter-process communication), then it may be more economical to build ten 2-lane highways (Beowulf).
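    The shared-bus arithmetic from the start of this comment can be sketched in a few lines. This is a deliberately crude model: it assumes every processor asks for the same bandwidth all the time and that the bus is split evenly, and the function name is hypothetical.

```python
def shared_bus_efficiency(n_cpus, demand_mb_s, bus_mb_s):
    """Fraction of its demanded bandwidth each CPU gets on one shared bus."""
    total_demand = n_cpus * demand_mb_s
    return min(1.0, bus_mb_s / total_demand)


# One CPU wanting 300 MB/s on an 800 MB/s bus is not limited by the bus.
single = shared_bus_efficiency(1, 300, 800)

# Four such CPUs on the same bus want 1200 MB/s in total.
quad = shared_bus_efficiency(4, 300, 800)

# If RAM only sustains 150-200 MB/s, even a single CPU is starved.
ram_limited = [min(1.0, ram / 300) for ram in (150, 200)]

print(f"1 CPU on shared bus: {single:.2f}")
print(f"4 CPUs on shared bus: {quad:.2f}")
print(f"RAM-limited fraction: {[round(f, 2) for f in ram_limited]}")
```

    Giving each processor its own bus, as in a cluster of separate boxes, keeps the first case for every node, at the price of slower communication between nodes.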


    In fact, isn't a Cray T3E more of a Beowulf-type cluster of closely knit machines than a NUMA? I think each node on a T3E runs a local copy of the micro-kernel.

  16. Why digital??? by Anonymous Coward · · Score: 2

    Heck, my research center has a vacuum tube computer that is faster than ASCI Red + all the flavors of Blue (9000 PPro + 6000 MIPS + 2000 Power3). You see, the trick is in the implementation. If you take 1 wavelength of an analog signal, there could easily be 100,000,000 discrete levels (especially with a 10 kV plate voltage). Fine tuning of the voltage differentiation amplifier would probably quadruple the speed even more. Now we only have to upgrade the holographic scanner for the punchcard readers.

    Forget about any of these digital OSes; we even implemented our own ANALinux, which uses OS technology that was originally implemented for the quantum computers that are slow to come about. Except for the fact that the probability wave algorithm in the kernel was reimplemented with the electron wave method (more discrete).

    We can't open source it yet, since the whole kernel runs via negative feedback, so it is constantly being upgraded. We could take a snapshot of the loaded kernel image by detaching all the ferrule doughnuts at the same time, but the source would all be in analog stream and useless unless you have another valve box.

    It easily interfaces with outside systems even though it is 100% analog inside due to the (ported) quantum kernel's interface, which utilizes the duality of the wave and sends discrete signals to outside the box. The only problem is the primitiveness of current technology. Since petabit networking has not been implemented, we basically watch the tube's change in brightness as I/O. Current internet access by outsiders is via our webcam pointed at the tubes.

    This OS is totally unhackable since nobody knows how to hack it. Input is via variosistors instead of toggle switches, so all the script gramps who hacked their way into Univacs would not know how to break in.

    So all you digiphiles, put your toys down and use the computer that works the way humans do.

  17. Re:will it live up to expectations? by Chalst · · Score: 3

    When you talk of Linux's problems with multiple processors, I think that you are referring to its limited SMP capacity.

    SMP (Symmetric Multi-Processing) is fundamentally different to clustering, as all of the processors in an SMP configuration share the same memory bus, whilst in a cluster the machine architectures are distinct, and we use a high-speed network to exploit parallelism.

    See the Linux Parallel Processing HOWTO for more information.

  18. Re:Why Alpha's??? Now this is better by Anonymous Coward · · Score: 2

    Well, 10x could be true for the code these guys may be running. (SPEC is not everything; this is very important for memory-intensive code.) Take a look at STREAM (a memory bandwidth benchmark):

    PIII ~ 300MB/s
    Alpha DS20 ~ 1300MB/s

    And since these systems use EV6 "buses", each processor gets all that bandwidth to itself in multiprocessor systems. But back to SPEC, here are some more numbers.

    Published results at www.specbench.org:

    (Compaq XP1000 667 MHz)
    65.5 SPECfp95
    37.5 SPECint95

    (Compaq GS140 700 MHz)
    68.1 SPECfp95
    39.1 SPECint95

    Informal results (www.novaglobal.com.sg) (these systems have better memory systems than those above):

    (AlphaServer DS20 667 MHz)
    72 SPECfp95
    38 SPECint95

    And you can get a well equipped system (DS10) from www.dcginc.com for only $3500.