Slashdot Mirror


Linux Supercomputer Wins Weather Bid

Greg Lindahl writes "The Forecast Systems Laboratory, a division of NOAA, selected HPTi, a Linux cluster integrator, to provide a $15 million supercomputing system during the next 5 years. The computational core of this system is a cluster of Compaq Alphas running Linux, using Myrinet interconnect. Check out www.hpti.com for information on the company."

115 comments

  1. Linux Not Useful For All Superclustering Tasks by LHOOQtius_ov_Borg · · Score: 2

    I work at a company that is working on a very complex artificial intelligence architecture, and for a variety of reasons it is written in Java (since the other most popular AI languages use VMs or are directly interpreted, expect the AI community at large to want good interpreters on Linux).

    We looked at putting together a Beowulf Linux cluster to run our software, which is very memory and processor intensive, but Linux could not do the job because JVMs on Linux are absolutely terrible. We wound up on WinNT (we couldn't afford Suns, but plan to upgrade when we can) because the JVMs were the best.

    Because people making large software systems are fed up with reengineering for new hardware, expect other people to start choosing Java for large, intensive applications that were previously written in C, Fortran, C++, etc.
    If Linux can't compete with other OSes for running large Java programs, these projects will not be able to consider Linux as their OS of choice (which we all WANTED to do here, we were very upset to go to NT).

    Right now the fastest Java environment we've found is Java 2 with HotSpot, running on NT (we're testing Solaris now, as we might be able to afford Suns soon). Can the Linux community do any better, or even as well? So far, no.

    --
    o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
    1. Re:Linux Not Useful For All Superclustering Tasks by Anonymous Coward · · Score: 0

      That is insane! You need beowulf clusters to
      run *JAVA*! Why not help out the kids writing a java compiler? You'll get a speedup factor of 20 from that!

      -avi

    2. Re:Linux Not Useful For All Superclustering Tasks by William+Tanksley · · Score: 1

      Sometimes the task may be complicated enough to require the resources of the entire VM, so compiling wouldn't bring any improvement.

      However, native code is always an improvement (instead of bytecode).

      Makes me wish Juice had been more successful... (Juice was/is the platform-independent binary format used for Oberon; its loader translated it quickly to native code). The current equivalent of Juice is ANDF.

      -Billy

    3. Re:Linux Not Useful For All Superclustering Tasks by G-Man · · Score: 2

      Gotta admit I'm a little confused here. If you have a computationally intensive task, why would you ever want to run it through any VM or interpreter? Granted, LISP may run through an interpreter during the development phase, but you can always compile it to get better speed in the final product.

      Can you elaborate without giving away the company secrets?

    4. Re:Linux Not Useful For All Superclustering Tasks by Anonymous Coward · · Score: 0

      Solaris runs on x86 too, did you look into that?

    5. Re:Linux Not Useful For All Superclustering Tasks by Anonymous Coward · · Score: 2
    6. Re:Linux Not Useful For All Superclustering Tasks by C.Lee · · Score: 1

      >Gotta admit I'm a little confused here. If you have a computationally >intensive task, why would you ever want to run it through any VM or >interpreter?

      The answer is you *DON'T*. This is basically crap from the JAVA crowd trying to pretend that JAVA is actually something you'll actually want to use in the real world. The Amiga Arexx crowd used to run around pulling the same kind of stunts too. I wouldn't be too surprised to discover that a large number of the JAVA advocates posting here also ran around advocating the use of AREXX for *everything* on the Amiga, no matter how silly it was.

    7. Re:Linux Not Useful For All Superclustering Tasks by mistabobdobalina · · Score: 1

      yeah, the anti-java bigots here ALWAYS forget about towerj

      --
      -- your knees hurt, don't they?
    8. Re:Linux Not Useful For All Superclustering Tasks by LHOOQtius_ov_Borg · · Score: 1

      Though a good number of the people who responded to this are obvious flamebaiters, I'll take a minute to follow-up anyway.

      1) We're not using Java to gain in performance, obviously, we're trying to optimize performance
      of a system already written in Java.

      2) Solaris x86 JVMs also sucked. In fact, when we made the NT decision, JVMs on Solaris SPARC AND Solaris x86 were slower than on NT. Extensive benchmarking was done, using both our software, and simple benchmark tests.

      3) Only one person suggested that maybe Linux does need a better JVM. It's ironic that the response is to attack our software (which you know nothing about), Java, and our intelligence, rather than to suggest that writing a good JVM would be useful... R&D folks are taking a liking to Java, and without a good JVM Linux will be unusable by a fair portion of the R&D community.

      4) Actually, one of our people is writing a better JVM, though obviously it will be of little use to any of you...

      5) Um, we don't need Beowulf "to run Java", we need a cluster or supercomputer to run the very complicated software we've written in Java.

      It's funny, rather than being interested in how to expand the horizons of Linux and maybe try to understand why someone would want to use a VM based language like Java, people just get all uppity. Your computing paradigm is challenged, time to get defensive...

      Whatever.

      We're doing fine without Linux, actually, I just thought maybe some other Linux folks would be interested in writing a decent JVM, but we'll do it ourselves...

      --
      o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
    9. Re:Linux Not Useful For All Superclustering Tasks by LHOOQtius_ov_Borg · · Score: 1

      You're obviously too biased and ignorant to understand, but actually VM based systems are very useful for some real world issues such as system portability (Java runs on lots of stuff, few portability issues except with AWT UI stuff), easier verification of program correctness (pointers screw that right up), possibility of supercompilation (can't do that properly with pointers, either), etc. There is also a development time issue for very large systems, as we did not have to write our own memory management schemes, and the issue of this version of the system having been written primarily by scientists first, not programmers, making Java a good choice for ease of use.

      Java compilers (and supercompilers, which would run prior to a compiler, actually) are being developed, and while the compilers may not speed things up much, supercompilers will.

      So, if the JVMs don't totally suck, Java is about as good as C++, and only 2-3 times slower than C.
      With JNI we could rewrite very computationally intensive parts of the program in C, as well. As things like TowerJ and HotSpot are ported to Linux and other platforms, speed-ups occur there, as well.

      All in all, if you're working in C++ you can get roughly the same performance from Java... (it will require a lot of tricks to get C level performance... maybe even the Java chip... but so what? most of the system doesn't need it... many systems don't...)

      --
      o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
  2. http://www.haveland.com/povbench/ by Anonymous Coward · · Score: 0

    Check out these pages to see what 15,000,000 USD could really do... It is quite interesting to see an el cheapo based on 96 PIII-500 wiping 48 21264-450... Both systems were running Linux...

    1. Re:http://www.haveland.com/povbench/ by Anonymous Coward · · Score: 0

      What kind of bogus test is this? Having results come out in 1 or 2 seconds does almost nothing to stress any system. It's like those old PC benchmarks where everything is posted w/results like .17, .43 seconds. This is just the inverse of that. All benchmarks should run continuously (or at least for many hours) and a dataset of the results over a given timeframe should be used to measure performance.

    2. Re:http://www.haveland.com/povbench/ by Anonymous Coward · · Score: 0

      48x 21264 ??? EV5 == 21164 !!!

  3. Re:Motorola gcc optimisation by Anonymous Coward · · Score: 0

    I think the major problem is getting GCC and PPC/Linux to show up on their radar screen. For example, Motorola *could* be turning out specs for G3/G4-based motherboards and encourage Abit and Tyan to rollout consumer-level boards that bolt into ATX boxes and use SDRAM and EIDE drives. But they don't. Getting cheap PPC hardware and optimizations for GCC are pretty much hitting the same wall...Motorola is out-to-lunch.

  4. You are just creating crap. by Anonymous Coward · · Score: 0

    That's bull, there have been no improvements to any fields of mathematics since Pythagoras's postulations.

  5. Re:Not a beowulf? by Anonymous Coward · · Score: 0

    I believe they are planning to use the Legion system developed at the U. of Virginia. http://legion.virginia.edu

  6. Re:Why Alpha's? Screaming FP performance, that's w by craw · · Score: 1
    Regardless of what the G4 can do, it is important to remember that this is what a cluster of alphas can do today. The decision by any government agency takes time to make. Since this is a mission critical piece of equipment, I would have to believe that this is not vaporware.

    Perhaps in the future a cluster of G4's will be used. The gcc compiler should/may be generating more efficient code in the future as improvements are being made. IIRC, Apple is using gcc in the development of the forthcoming MacOS-X.

    Nonetheless, it is nice to see the federal government go this route.

  7. The link also shows the k7 as faster by winnt386 · · Score: 1

    Something is up with that link you gave. I know the k7 is superior to the p3 but if you compare the k7 vs the alpha you will find the k7 is twice as fast. hmmmm

    I am also looking for speed in powerpc g3 for a new powerpc linux box and the standard p3 was almost twice as fast. I am a former mac guy and I used to regard anything from apple in benchmarking as fact but either apple is really lying (probably are) or this test is biased. I think you should find out what this test was trying to prove. Something is really screwed up.

    --
    "Never stick an electrical appliance down your pants." -Tim Allen
  8. Re: Solving PDEs by craw · · Score: 1
    Nice answer. I would like to add to what you said. There are many ways to solve PDEs; finite difference and finite element are the two major ones. Both can take major advantage of parallel processing systems. Essentially (or is this simplistically), one has to "solve" the PDE that describes connecting nodes. After computing the solution for all the nodes, one then iterates and iterates. For finite difference, one has to compute until a stable solution is achieved for a particular configuration. For time dependent models, one then starts all over again for the next time increment.
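
    To make the iterate-until-stable loop concrete, here is a minimal sketch of a finite-difference relaxation (Jacobi iteration) for the steady-state Laplace equation on a small square grid. The grid size, boundary values, tolerance, and function names are all illustrative assumptions, not anything taken from a real weather code.

```python
def make_grid(n, top=100.0):
    """Square n x n grid: top edge held at `top`, other edges at 0 (assumed boundary values)."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


def jacobi_laplace(grid, tol=1e-6, max_iters=10_000):
    """Replace every interior node by the mean of its four neighbours until nothing changes by more than tol."""
    rows, cols = len(grid), len(grid[0])
    for iteration in range(1, max_iters + 1):
        new = [row[:] for row in grid]  # boundary nodes are copied and never updated
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
                max_change = max(max_change, abs(new[i][j] - grid[i][j]))
        grid = new
        if max_change < tol:  # "stable solution" reached
            return grid, iteration
    return grid, max_iters


solution, iterations = jacobi_laplace(make_grid(12))
print(f"converged after {iterations} sweeps")
```

    Each sweep only reads a node's immediate neighbours, which is why the grid can be split across processors with only the edge values exchanged between sweeps.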

    The major controlling factor is the model. For fluid dynamics, approximations are made to make the problem solvable. Stuff like, a... Of course, the input parameters/data can play a major role. If the problem is chaotic, one has to run a whole bunch of scenarios to obtain a statistical model.

    My only dispute with what you said is that if the model is wrong, the results may be wrong. Running three models with limitations that yield the same result may not give you the right answer. Additionally, chaotic effects can lead to bad results.

    And to the idiot who commented about no advances in math, I would like to say that while the math (e.g., 1+1=2) may remain the same, the physical model may be different.

  9. Re:Why Alpha's? Because they are fastest by Anonymous Coward · · Score: 1

    FSL's proposal stated that Computational Performance was the primary evaluation measure.

    Scientific, vector processor tuned codes are known to run fastest on the Alpha 21264 + Tsunami memory chipset, so it is the only choice for a no-compromise, fastest computer in the world solution.

    Take a look at the benchmark numbers (albeit limited) on http://www.hpti.com/clusterweb/ for some initial results.

    Now, on the choice of Myrinet... This is a more interesting question.

    Any takers?

    No_Target

  10. How about 400 MB/s Sustained? by Anonymous Coward · · Score: 1

    Your point is well taken in that there is a need for I/O balance in all supercomputing systems due to the need to save the results, particularly in those calculations that involve dynamic phenomena, like weather. The faster the computer, the faster results come out of it.

    An enabler for cluster effectiveness is the Fibre Channel Storage Area Network, a technology that allows multiple hosts to read _and_ write to the same file at the same time at very high bandwidth.

    In fact, the I/O bandwidth of a cluster in this context is still limited by the speed of the PCI busses on one node if you are serializing the I/O to that one node. If this is the case, the XP1000 will sustain about 250+ MB/s with three-four Fibre Channel Host Bus Adapters on its two independent PCI busses. If your software can distribute the I/O to multiple nodes, like FSL's parallel weather forecasting API can (SMS), then your I/O bandwidth is essentially limited by your budget for RAID systems, Fibre Channel Switches and HBAs.

    No_Target

    1. Re:How about 400 MB/s Sustained? by Greg+Lindahl · · Score: 1

      SMS doesn't distribute I/O to multiple nodes for a single job. But the bandwidth of a single I/O node is sufficient for FSL's needs.

  11. Re:Intelligent? by Anonymous Coward · · Score: 0

    If you are using Java for performance reasons, I would suspect that your intelligence is somewhat
    artificial.

    Reimplement it in C and and save yourself a cluster.

    I have seen these waste of time AI projects before. Lots of "good ideas" implemented in a
    stupid way.

  12. Living without a clue by Anonymous Coward · · Score: 0

    After reading through all these comments, I have come to the conclusion that rather than posting
    clueless messages on slashdot, some reading may be in order. Take a look at the Linux Parallel Processing/Beowulf Howto's, there is also a Beowulf FAQ, A "How to Build a Beowulf" book, and much more.

    One thing about the computer business is that it
    is full of people who "do not know they do not know". RTFM

  13. Re:Linux Supercomputer Wins Weather Bid by Anonymous Coward · · Score: 0

    What does that have to do with it being a 64 bit processor? MMX does some 64 bit arithmetic, and I truly don't think it makes x86 machines 64 bit.

  14. Re:Why Alpha's??? by Anonymous Coward · · Score: 0

    The BIOS is impressive?
    Linux does not use the BIOS for much more than booting the system and collecting some configuration information...

  15. A few words about WRF by coats · · Score: 1
    For the Weather Research and Forecasting Model, you might want a look at these links:
    http://www-unix.mcs.anl.gov/~michalak/ecmwf98/final.html, Design of a Next-Generation Regional Weather Research and Forecast Model

    http://nic.fb4.noaa.gov:8000/research/wrf.98july17.html, Dynamical framework of a semi-Lagrangian contender for the WRF model

    The design is for a hybrid-parallel design, in which the model domain is a rectangular grid split up into tiles, with each tile assigned to a (potentially shared-memory-parallel) node with either message passing or HPF parallelism between tiles; each tile is then broken up into patches, with OpenMP-style parallelism on the node. The WRF is targeting resolutions better than 10 km in the horizontal and 10 mb in the vertical -- so a regional forecast can expect grid sizes on the order of 300x300 horizontal x 100 vertical x 30 sec temporal, with research applications an order of magnitude finer yet. Note that computational intensity scales with the fourth power of the resolution (because of the dt-scaling issue), whereas memory usage scales with the cube. So high resolution forecasts are very compute-intensive, and improving the resolution to what we really want can chew up all available compute capacity for the foreseeable future.
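
    The scaling argument can be put in numbers with a short sketch. It assumes the grid spacing is refined by the same factor in all three spatial dimensions and that the time step has to shrink by that factor as well; the function name is hypothetical.

```python
def refinement_cost(factor):
    """Growth in memory and compute when the grid spacing is refined by `factor`."""
    memory_growth = factor ** 3    # more grid points in x, y and z
    compute_growth = factor ** 4   # same points, times proportionally more time steps
    return memory_growth, compute_growth


# Grid size quoted above for a regional forecast: 300 x 300 horizontal, 100 vertical.
regional_grid_points = 300 * 300 * 100
print(f"regional grid: {regional_grid_points:,} points")

for factor in (2, 10):
    memory_growth, compute_growth = refinement_cost(factor)
    print(f"{factor}x finer: {memory_growth}x memory, {compute_growth}x compute")
```

    Under these assumptions the order-of-magnitude finer research grids would need about 1,000 times the memory and 10,000 times the arithmetic.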

    A few other thoughts:

    1. Not only are the Alpha 21264s unmatched in terms of both floating point performance and memory bandwidth (although the next-generation PPC is very good in that regard also), they are also among the best at dealing with the data-dependencies and access-latencies which occur in real scientific codes.
    2. DEC^H^H^HCompaq probably has the best compiler technology of anybody out there commercially (IBM are also very good technically, but as Toon Moene of the Netherlands Met Office put it, "XLF was the first compiler I ever encountered that made you write a short novel on the command line in order to get decent performance.")
    3. Note for AC #68: State-of-the-art weather models are not spectral models. Spectral models are appropriate only for very coarse scales at which cloud effects are only crudely parameterized (and to some extent are only appropriate on vector-style machines (and not current microprocessor/parallel) because of the way they generate humongous vector-lengths). At the WRF scales, the flow is not weakly compressible! Note that the global data motion implied by the FFTs in hybrid spectral/explicit models is a way to absolutely kill scalability for massively parallel systems. Finally, spectral models do not support air quality forecasting, such as we are doing (see http://envpro.ncsc.org/projects/NAQP/).
    4. Weather modeling is a problem which has exponentially-growing divergence of solutions (two "nearby" initial conditions lead to different solutions that diverge exponentially in time), so as coyote-san suggests, there is a tendency to run multiple "ensemble" forecasts, each of which is itself a computationally-intense problem. So far, I haven't managed to get the funding to develop a stochastic alternative (which will be a fairly massive undertaking -- any volunteers?) This means weather modeling can soak up all available CPU power for the (foreseeable)^2 future. At least the individual runs in ensemble forecasts are embarrassingly parallel.
    An aside to LHOOQtius ov Borg: have you tried the GNU java compiler (now a part of the gcc system)? For the intensive apps, generating native machine code is much faster.

    Hi, Greg! Didn't know you were here!

    --
    "My opinions are my own, and I've got *lots* of them!"
  16. Not! by Anonymous Coward · · Score: 0

    EV5 = 21164 and at 450, it's going to be the old style ones. Try again. That chip came out around the same time as the PPRO 200

  17. benchmarks by Sascha+Schumann · · Score: 1
    The system is a Ruffian (21164a) at 633MHz w/ 256MB RAM, the installation is based on Red Hat 6.

    gcc is gcc version 2.95.1 19990816 (release). Compile time options: -O9 -mcpu=ev56

    ccc is Compaq C T6.2-001 on Linux 2.2.13pre6 alpha. Compile time options: -fast -noifo -arch ev56

    The benchmark consisted of running two scripts through the CGI version of PHP4. We compare user times as measured by time(1). The tests were run three times, the shown results are mean values. The scripts are available from the Zend homepage. PHP was configured with --disable-debug.


    Quicksort (script ran 50 times)

    ccc version: 27s
    gcc version: 30s

    Mandelbrot (script ran 50 times)

    ccc version: 35s
    gcc version: 39s


    The test shows that the code ccc produced was about 10% faster than gcc's. Other conclusions are left as an exercise to the reader.
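
    The "about 10%" figure can be checked directly from the timings above. This is just the arithmetic, expressed as the share of gcc's run time that ccc saves; the dictionary layout and function name are illustrative.

```python
# User times in seconds, copied from the results above.
timings = {
    "Quicksort": {"ccc": 27, "gcc": 30},
    "Mandelbrot": {"ccc": 35, "gcc": 39},
}


def time_saved_percent(fast, slow):
    """Share of the slower run time that the faster build saves, in percent."""
    return 100.0 * (slow - fast) / slow


for name, t in timings.items():
    saved = time_saved_percent(t["ccc"], t["gcc"])
    print(f"{name}: ccc saves {saved:.1f}% of gcc's run time")
```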
    1. Re:benchmarks by coats · · Score: 1
      Quicksort...
      Mandelbrot...
      Neither of which bears the slightest resemblance to the kinds of code presently found in weather models such as MM5, nor planned for the WRF.

      If you want to benchmark, then do a meaningful benchmark!

      --
      "My opinions are my own, and I've got *lots* of them!"
    2. Re:benchmarks by Sascha+Schumann · · Score: 1

      They have absolutely no need for sorting data or for doing calculations?

  18. The Real reason they chose linux by Sorklin · · Score: 1

    Too bad Windows 2000 can't handle bad weather, otherwise it would have been the logical choice. ;)

  19. Re:Not a beowulf? by Anonymous Coward · · Score: 0

    Beowulf clusters are a concept, not a physical item...

  20. attn: moderator - follow the link! by LocalYokel · · Score: 2

    I'm crying foul on the moderations I've been given on this story. It's true that the government finds ways to mess things up, e.g. crypto laws, software patents, etc.

    M2 has seemed to make moderations a bit more accurate, but I don't see it working out for me here. Unless somebody actually goes to the page and sees what I'm talking about -- "Alpha" in ten hours, and the EV series are cranking out units faster than LensCrafters...

    I didn't make up those "CPU's". They are actually listed on the page! Please follow the link and see for yourself.


    --
    E2 IN2 IE?

  21. Re:An interesting observation by villoks · · Score: 1

    Linux-clusters are also getting to business solutions!
    Check out: http://linuxtoday.com/stories/10157.html

    .signature not found

  22. One more thing... by Anonymous Coward · · Score: 0

    Last I heard the fastest available JVM on Intel platforms is the one made by IBM for... get ready... OS/2! Yes, I believe it is from 15 to 25 percent faster than any JVM on NT.

    1. Re:One more thing... by TummyX · · Score: 1

      Wrong, it's actually Microsoft's JVM, as a java programmer, I can vouch for that too.

      MS also won the award for best VM at JavaOne.

  23. Re:Why digital??? by Anonymous Coward · · Score: 0

    Hm; I need to write a perl script to generate this type of junk.

  24. Re: Solving PDEs by coyote-san · · Score: 2

    Did I mention another of my graduate classes was chaotic dynamics? :-)

    The very definition of "chaos" is high sensitivity to changes in the initial conditions. If a weather front appears in the same place (within the resolution of the data grid) on all 120-hour forecasts despite a reasonable variation in the initial conditions, you can be pretty sure it isn't in a chaotic realm and your forecasts will be fairly accurate.

    On the other hand, if a modest amount of variation in the initial conditions result in wildly different predictions, the system is obviously in a chaotic realm and you can't make decent predictions.
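
    The ensemble idea in the two paragraphs above can be illustrated with a toy system. The sketch below uses the logistic map as a stand-in (it is not a weather model): the same tiny spread of initial conditions is run forward in a stable regime and in a chaotic one, and the spread of the final states is compared. The parameter values and function names are assumptions chosen for illustration.

```python
def logistic_trajectory(x0, r, steps):
    """Iterate the logistic map x -> r * x * (1 - x) and return the final state."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def ensemble_spread(r, base=0.3, perturbation=1e-6, members=5, steps=50):
    """Run slightly perturbed initial conditions and report how far apart they end up."""
    finals = [logistic_trajectory(base + k * perturbation, r, steps)
              for k in range(members)]
    return max(finals) - min(finals)


stable_spread = ensemble_spread(r=2.5)   # settles onto a fixed point
chaotic_spread = ensemble_spread(r=4.0)  # chaotic regime
print(f"stable regime spread:  {stable_spread:.3e}")
print(f"chaotic regime spread: {chaotic_spread:.3e}")
```

    A small spread means the ensemble members agree and the prediction can be trusted; a large spread flags the chaotic realm.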

    As odd as it sounds, for something as large as a planetary atmosphere it's quite reasonable for parts of the system to be chaotic while other parts are boringly predictable. That's why they were starting to compare the predictions from different models, the same models with slightly different initial conditions, etc. That might give the appropriate officials enough information to decide to evacuate a coastline (at $1M/mile), or to hold off another 6 hours since the computers predict the storm will turn away.

    P.S., the models do make mistakes, but fewer than you might expect. It's been years since I've thought about it, but as I recall most models work in "isentropic" coordinates and are mapped to the coordinates that humans care about at the last step. The biggest problem has been the resolution of the grids; when I left I think the RUC model was just dropping to 60km; by now it's probably 40 or 30km. To get good mesoscale forecasts (which cover extended metro areas, and should be able to predict localized flooding) you probably need a grid with 5 or 10 km resolution.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  25. Re:Software? by scrytch · · Score: 1

    > I don't know about now, but five years ago, a state-of-the-art code for weather forecasting used spectral approximations (Fourier or Chebychev expansion functions) in the X- and Y-directions (Latitude and Longitude, say) and some high-order ...


    Dude... I think you just compressed an entire episode of Star Trek into six sentences. :)

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
  26. Re:Why digital??? by Anonymous Coward · · Score: 0

    I assume this is a pun and goof, not proof that you haven't heard of the company Digital, which invented the Alpha and was bought by Compaq...

  27. Re:Great. I still wonder about the compilier thoug by norton_I · · Score: 1

    Usually, when one is investing in the kind of high end networking hardware necessary to make a clustered supercomputer, one uses FTP or NFS instead of floppies... Only an idiot would compile a program individually on each of 100 nodes of a cluster anyway.

  28. Re:Great. I still wonder about the compilier thoug by Anonymous Coward · · Score: 0

    www.unix.digital.com/linux/software.htm it's free now...

  29. Re:Why digital??? by Anonymous Coward · · Score: 0

    Digital? Have you seen their power supply? I don't see any digital in that, they should give credit where it is due.

    As far as their product, you sure are right when you say that they are in the alpha stage. How many electrons can they pump thru their chip in 1 second? Heck, our light house sized vacuum tubes could suck in electrons like it's a black hole. Don't forget, in the pure analog world, thruput=power=speed, with each discrete electron acting as a piece of information. This is what real VLIW is supposed to be.

  30. Re:Not a beowulf? by forest · · Score: 1

    Absolutely. This is NOT a Beowulf cluster.

    Beowulf refers to the tools created at NASA Goddard CESDIS

    This cluster uses MPI and tools developed by the University of Virginia's Legion Project

    Beowulf has become, to some, a generic term for a Linux cluster, like Kleenex to tissues.

    Mark Vernon HPTi

  31. Re: Solving PDEs by Greg+Lindahl · · Score: 1

    FSL runs their RUC model globally with a 40km resolution today. They expect to run RUC globally with a 10km resolution on the new system. However, there is a lot of weather that wants even finer resolutions.

  32. NOAA doesn't care what's in the machine by forest · · Score: 1

    The fact that NOAA doesn't mention Linux in the press release means that NOAA doesn't care what the box is, if it meets the performance requirements.

    If SGI or IBM (the two other leading competitors) had won, the press release wouldn't have mentioned Irix or AIX either.

    HPTi could deliver 10,000 trained monkeys in a box if it met the performance requirements.

    The fact that a Linux solution could exceed the performance of an SGI or IBM supercomputing solution is important to the Linux community, but not directly to NOAA.

    Mark Vernon
    HPTi

  33. I have Compaq's compiler, and it kicks ass. by Tony+Hammitt · · Score: 1

    To say that I'm favorably impressed by the performance of the Compaq ccc compiler would be a major understatement. IMHO, with the release of this compiler, they have just overcome the Intel price/performance issue.

    I've seen 280% speedups over gcc's best effort, more than justifying the 100% price premium of the hardware over (for instance) dual PIII boxen.

    If I was going to put in a number crunching cluster (and I may) AlphaLinux would be the best way for me to go, cutting 40% from my TCO over IntelLinux.

    Thanks Compaq!

  34. Re:You got it all mistaken dude. by quade]CnM[ · · Score: 1

    >You build your NUMA box that has 1 fat highway, and it turns up like the subway systems in the metropolitan areas. The whole purpose of hypercube or 5-D torus is to have a shortest path to as many places as possible, instead of hopping onto that megapipe and making a stop at every node to see who wants to get off.



    Technically you are correct. What I wanted to illustrate though is that in big NUMA boxes, you have one copy of the kernel running all processors. With a Beowulf system, and a Cray T3E I believe, you have a local copy of the kernel on each node of one or two processors. This negates the SMP problems of Linux on multi-CPU machines.

  35. its not really free, but WHY? by pixel+fairy · · Score: 1

    it's beta. if they were giving it away for free
    there would be no reason not to just make their
    own back end to gcc for the alpha.

    i still don't know why compaq's doing this...

    1. Re:its not really free, but WHY? by Greg+Lindahl · · Score: 1

      Compaq already has a compiler. It's very inexpensive to port it to a new OS; they even already had ELF from another project. It would be much more expensive for them to play with the gcc back-end.

  36. Re:Linux Supercomputer Wins Weather Bid by dkh2 · · Score: 1
    Yes, but on your 32bit system it's using twice the clicks to do it relative to the 64 bit system. This is because it has to address two values, perform the operation, then recombine. With a 64bit system you would cut your number of addresses in half for those 64bit ops that are currently being split and multichannelled.

    Granted, it's not a 2 to (1 + 1) performance ratio in the truest sense but the concept is valid if not the accuracy of my description.
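
    The "address two values, perform the operation, then recombine" step can be sketched for a 64-bit addition carried out with 32-bit pieces only. This is an illustration of the general idea, not a description of any particular CPU's instruction sequence; the function and constant names are made up for the example.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits, as a 32-bit register would


def add64_on_32bit(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    # Split each operand into low and high 32-bit halves.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First operation: add the low halves and note any carry out.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Second operation: add the high halves plus the carry.
    hi = (a_hi + b_hi + carry) & MASK32

    # Recombine the two halves into one 64-bit result.
    return (hi << 32) | lo


print(hex(add64_on_32bit(0x00000001FFFFFFFF, 0x0000000000000001)))
```

    A native 64-bit machine does the same work in a single add, which is the saving described above.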

    On top of that, the previous post said nothing about running on 32bit. Alpha and several other currently available systems are running 64bit today (and for the past several years). True, x86 is not 64bit. IA-64 is not really an x86 processor but the next generation from Intel. IA-64 will bring Intel more in line with what other chip manufacturers have been doing for extreme high end systems for years and will bring it to prominence on the desktop.

    D. Keith Higgs
    CWRU. Kelvin Smith Library

    --
    My office has been taken over by iPod people.
  37. How would this work without an OS by winnt386 · · Score: 1

    You would need each node to boot an OS image and I would prefer the optomized one. either way you need a diskette in each node to boot or use special ethernet cards that boot from a central server. This would be bad because it would hurt performance on the bottlneck of the super computer which would be the speed of the ehternet. The floppies would also need to contain the special messaging software. Again the ethernet would clog everything if its from a central server. Besides you only boot once and after its booted the diskette is no longer used. The other method is to install beowolf on each harddriver. This would take too long to install.

    --
    "Never stick an electrical appliance down your pants." -Tim Allen
  38. oops by winnt386 · · Score: 1

    I made a few spelling errors. I also meant hard drive on the second to last sentence. Sorry

    --
    "Never stick an electrical appliance down your pants." -Tim Allen
  39. Ok, here's your chance... by Chuck+Milam · · Score: 0

    Ok everyone, here's your chance to talk about that bitchin' Beowulf cluster...

    1. Re:Ok, here's your chance... by marbike · · Score: 2

      This kind of system would be great in helping to minimize the damage to life and property in the tornado ravaged areas of the Midwest. Having recently witnessed a tornado for the first time (in downtown SLC no less) I have a new interest in tech like this.

      --
      it is better to light a flame thrower than curse the darkness. -Terry Pratchett Men at Arms
  40. Why Alpha's??? by Stink+Juice · · Score: 2

    I am curious as to the what the determining factors were for selecting Alphas over Pentium-based systems.

    I've installed Linux once on an Alpha box and the BIOS is truly impressive, much better than PCs. But what are some of the other reasons? Wider data/cpu buses? Larger memory configurations?

    Anyone who actually uses Linux on Alphas is encouraged to reply.

    1. Re:Why Alpha's??? by jshare · · Score: 0

      From what I understand, Alphas have better floating point performance. That, in conjunction with wanting to keep the number of nodes down, would tend to push your decision toward Alphas.

    2. Re:Why Alpha's??? by Panaflex · · Score: 2

      The 21264 is just a better architecture all around. First of all, everything is 64 bit. Secondly, the FP is 10 times faster than the current P3. Thirdly, Compaq has recently released compilers for Linux that provide up to a 30% speed increase.

      Probably the best thing is that engineers like alphas, and they like linux.

      Pan

      --
      I said no... but I missed and it came out yes.
    3. Re:Why Alpha's??? by Panaflex · · Score: 3

      Oops!! Sorry! 10 times faster is wrong. True specs are: (from www.spec.org)

      (UP2000 21264 667MHz -Alpha Processor Inc)
      53.7 SPECfp95
      32.1 SPECint95

      The P3 is

      (SE440BX2 MB/550MHz P3 -intel)
      15.1 SPECfp95
      22.3 SPECint95
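
      To put the corrected figures side by side, here is a minimal sketch that just divides the SPEC95 scores quoted above. The dictionary and function names are made up for illustration; only the four scores come from the post.

```python
# SPEC95 scores as quoted in the post above (source given there: www.spec.org).
alpha_21264_667 = {"SPECfp95": 53.7, "SPECint95": 32.1}
pentium3_550 = {"SPECfp95": 15.1, "SPECint95": 22.3}


def speedup(alpha, p3):
    """Return the Alpha/P3 score ratio for each benchmark."""
    return {name: alpha[name] / p3[name] for name in alpha}


ratios = speedup(alpha_21264_667, pentium3_550)
for name, ratio in ratios.items():
    print(f"{name}: Alpha scores {ratio:.2f}x the P3")
```

      So the gap is roughly 3.6x on floating point and 1.4x on integer, not 10x, and part of that comes from the higher clock (667MHz against 550MHz).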

      --
      I said no... but I missed and it came out yes.
    4. Re:Why Alpha's??? by rwhaley · · Score: 1

      Alphas have several rather large advantages over Intel boxes. First, the floating point performance has a theoretical peak of twice that of a similarly clocked Intel. Secondly, the memory bandwidth is significantly better. Also, Alphas have 64 bit PCI slots; I have never seen an Intel motherboard with 64 bit PCI slots, though it seems to me they should exist. Anyway, the peak bandwidth of Myrinet is greater than 32 bit PCI can support, so your NIC becomes a message passing bottleneck without 64 bit PCI.
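
      The PCI point can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes a 33 MHz PCI clock and a Myrinet link rate of 1.28 Gbit/s, neither of which is stated in the post, and it ignores protocol overhead.

```python
def pci_peak_mb_per_s(width_bits, clock_mhz=33.0):
    """Theoretical peak of a PCI bus: one transfer of width_bits per clock."""
    return width_bits / 8 * clock_mhz


# Assumption, not from the post: 1.28 Gbit/s per Myrinet link direction.
ASSUMED_MYRINET_MB_PER_S = 1280 / 8

for width in (32, 64):
    peak = pci_peak_mb_per_s(width)
    verdict = "bottleneck" if peak < ASSUMED_MYRINET_MB_PER_S else "enough headroom"
    print(f"{width} bit PCI: {peak:.0f} MB/s peak -> {verdict}")
```

      Under those assumptions the 32 bit slot tops out below the link rate and the 64 bit slot does not, which matches the argument above.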

      There are various types of alphas available. As has already been mentioned, the 21264 (ev6) is the latest and greatest. Price/performance wise, however, you simply can't beat its older cousin, the 21164 (ev56). Volume sales have driven the cost of the 21164 down to right around the same cost as a similarly clocked Intel box.

      Someone mentioned the K7, or AMD Athlon, as being faster than an Alpha. Not true. It has exactly the same floating point peak, and has the same bus as the ev6. However, due to its x86 instruction set, software has access to only 8 floating point registers, which means achievable peak is going to be quite a bit lower for the Athlon than for the ev6 (you wind up continually reloading stuff from L1 that you can keep in registers on the ev6).

    5. Re:Why Alpha's??? by CormacJ · · Score: 1

      We use alphas for the following reasons:
      1) They scale very easily
      2) They process very quickly
      3) They are totally modular, so if something breaks it's very easily replaced.
      4) Pentium based servers haven't quite got the architecture to allow for multiprocessing and multiuser processes.

      It's good to see this happening, especially after Microsoft stopped NT on Alphas. This would have traditionally been their area. If this sort of thing continues, Linux would get a lot of kudos and respectability, which can only be good.

      I keep thinking back to the Coca-cola/Pepsi war, and the moment Coke changed their formula. Maybe Microsoft have just done the same thing and lost a lot of the battle.

      IA64 is good, but it will be a long time before it gets the stability and respect that Alpha processors currently have.

    6. Re:Why Alpha's??? by CormacJ · · Score: 1

      An oddity of the alpha design is that with each new evolution of the chip, the clock speed is actually dropping, and the processing power is increasing. This design means that they don't have to spend so much on working out all the cooling required for the board and concentrate on actually making the bus go fast.

  41. Are they afraid? by stoney · · Score: 0

    Ugh, no mention of Linux in the press release. Fear of the penguin?

  42. Beowulf Cluster! by Mooset · · Score: 1

    That may kick ass, but imagine a Beowulf cluster made out of... oh wait, it already IS a cluster. :) I guess they need to get to work on that internet tunnelling massive computing surface initiative if they want to make this computer part of a Beowulf cluster...

  43. Good to see this sort of thing by joshv · · Score: 1

    Good to see that using Linux as a tool, a company can provide a commercial grade super computer at what appears to be a very attractive cost/performance ratio.

    Along with the use of Linux in digital VCRs and other Internet appliances this goes a long way to validating Linux as a viable, and very flexible commercial platform.

    -josh

  44. BEOWOLF! by Anonymous Coward · · Score: 0
    For $15,000,000 to buy an Alpha Beowolf, it sounds like they might have 2,500 nodes with a 'decent' Alpha system. But if they go really high end, they'll have about 750 nodes (For the 'killer' $20,000 Alpha machines).

    Either way, you could make Toy Story in about 10 minutes on this thing once it's up.

  45. Wow by Kythe · · Score: 0

    $15 million will buy a lot of beowulf. Anyone see how many nodes? I didn't find the number listed.

    Kythe
    (Remove "x"'s from

    --

    Kythe
  46. Why Alpha's? Screaming FP performance, that's why by Troy+Baer · · Score: 3

    I've installed Linux once on an Alpha box and the BIOS is truly impressive, much better than PCs. But what are some of the other reasons? Wider data/cpu buses? Larger memory configurations?

    The big thing about the Alpha for people like NOAA (who run big custom number-crunching apps written in FORTRAN) is its stellar FP performance. A 500MHz 21264 Alpha peaks at 1 GFLOPS and can sustain 25-40% of that, because of the memory bandwidth available. A Pentium III Xeon at the same clock rate peaks at 500MFLOPS and can sustain 20-30% of that.

    That doesn't fly for everybody, though. Where I work, we have a huge hodgepodge of message-passed, shared-memory, and vector scientific codes, plus needs for some canned applications that aren't available on the Alpha. We picked quad Xeons for our cluster and bought the Portland Group's compiler suite to try to get some extra performance out of the Intel chips.

    --Troy
    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  47. An interesting observation by Kismet · · Score: 1

    Although HPTi may believe in Linux as a clustering solution, it would appear that they have trusted their web page to IIS 4.0. It also seems that their web authoring tool is MS based, judging from the occurrence of "?" where normal punctuation would be found.

    This is good news, but it only affirms the role of Linux in niche markets. It will be some time before it is accepted widely as a general purpose business or desktop solution.

    1. Re:An interesting observation by Anonymous Coward · · Score: 0
      Although HPTi may believe in Linux as a clustering solution, it would appear that they have trusted their web page to IIS 4.0.

      Ahh, com'on! What makes you think that Engineering and Product Development usually controls the corporate webpage? Do you think the handyman that changes the company light bulbs also plugs the CPUs into their slots? Everyone knows that most web servers are set up by some intern who is an Art major from the local junior college. Most people in engineering would never run across those guys in the course of a day's work. Well, maybe in the cafeteria we do gawk at the Art major intern decked out in black spandex and leather. You know, the one with the black lipstick and purple fingernails.

    2. Re:An interesting observation by Anonymous Coward · · Score: 0
      1. Although HPTi may believe in Linux as a clustering solution, it would appear that they have trusted their web page to IIS 4.0. It also seems that their web authoring tool is MS based, judging from the occurrence of "?" where normal punctuation would be found.
      Notice the following results from Netcraft: Some parts of the page are IIS, some are using Apache. You can bet that someone has a clue, and is not happy running IIS at all. If they have any brains, they'll be trying to figure out how to drop IIS completely.
  48. Re:Why Alpha's? Screaming FP performance, that's w by Anonymous Coward · · Score: 0

    If the G4 can sustain >1gflops, then why not build a cluster of G4s running LinuxPPC? Jeremy Fincher

  49. Go to this link by aheitner · · Score: 3

    General Processor Info.

    Compare the SPECfp scores of high-end Intel and Alpha offerings. Take a look at a 600MHz PIII Xeon and a 667MHz Alpha 21264.

    The reason to choose Alpha should be obvious.

  50. Re:Linux Supercomputer Wins Weather Bid by Anonymous Coward · · Score: 0

    Total and utter BS. If you don't need an address range larger than 2 gigs in your application, 64 bit can only slow it down.

  51. will it live up to expectations? by swonkdog · · Score: 1

    i hear all of the great tales of lore about beowulf clusters and their amazing speed, yet i am forced to ask if it will perform as advertised. as i understand it, (and i may be way off here, so please correct me) beowulf clusters do not completely overcome the problems that linux has with multiple processors. of course this is something hoped to be fixed in later kernel releases, but does the noaa really have the time to bring down a system such as this for kernel recompiles? a very fast machine? yes. but will it ever live up to its full potential? i hope it does, but i still have to wonder.

    1. Re:will it live up to expectations? by Anonymous Coward · · Score: 0
      Linux's SMP scaling issues are completely related to the kind of application running on the box. Ever since kernel 2.0 user land computational code scales almost perfectly with extra CPUs. This is likely the kind of thing that would be running on a cluster.

      Now nothing says you have to use SMP boxes for a cluster, but everyone does, and for $15 million there's pretty much no choice.

    2. Re:will it live up to expectations? by quade]CnM[ · · Score: 3

      This is not true of Massively Parallel Systems such as Beowulf. The problem with Linux and scalability is more of a hardware problem than a software problem. While you aren't going to put Linux on a Sun E10k anytime soon, it was never meant to be on such a large SMP machine. The Intel SMP architecture is flawed in design. All processors share the same bus. Therefore, if one processor can sustain 300M/sec of transfer and you have 4 processors, that 800M/sec bus is going to slow down. Now your processors are only 2/3 as efficient as they are in a single system. But you are probably going to be slower than this, because most RAM has a sustained transfer rate of only 150-200M/sec, so you only use 1/2 of the processor.
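
      The shared-bus arithmetic above can be written down as a tiny model. This is a deliberately crude sketch: it assumes every processor demands its full transfer rate all the time and that the bus is split evenly, and the function name is made up for illustration.

```python
def bus_efficiency(n_cpus, per_cpu_demand, shared_bandwidth):
    """Fraction of its demanded transfer rate each CPU actually gets.

    All rates are in the same unit (M/sec in the post). The value is capped
    at 1.0 because a CPU cannot use more bandwidth than it asks for.
    """
    total_demand = n_cpus * per_cpu_demand
    return min(1.0, shared_bandwidth / total_demand)


# Four processors wanting 300M/sec each on one 800M/sec bus.
print(f"4 CPUs on a shared bus: {bus_efficiency(4, 300, 800):.2f}")

# One processor wanting 300M/sec, fed by RAM that sustains only 150M/sec.
print(f"1 CPU limited by RAM:   {bus_efficiency(1, 300, 150):.2f}")
```

      The first case gives the 2/3 figure and the second the 1/2 figure; real systems have caches in between, so treat both as worst-case bounds.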


      To fix this, you use 2 processor buses and 2 memory buses. You fill these up, and you get 4 processor buses and 4 memory buses. Now you need to connect these bus segments. You have several options. First, connect them within the same machine. This is what NUMA is. The other route is to put each bus in a separate machine, each machine running a copy of the kernel locally, and connect each box together with a fast network. This is what Beowulf is.

      To give you an example, think of a highway system. If you have a lot of traffic switching lanes (buses) constantly, then it would be best to build one big 20 lane highway (NUMA). But if all the traffic basically keeps in its own lane, without much need to switch lanes (Inter Process Communication), then it may be more economical to build 10 2-lane highways (Beowulf).


      In fact, isn't a Cray T3E more of a Beowulf type cluster of closely knit machines than a NUMA? I think each node on a T3E runs a local copy of the micro-kernel.

    3. Re:will it live up to expectations? by Chalst · · Score: 3

      When you talk of linux's problems with multiple processors, I think that you are referring to its limited SMP capacity.

      SMP (Symmetric Multi- Processing) is fundamentally different to clustering, as all of the processors in an SMP configuration share the same memory bus, whilst in a cluster the machine architectures are distinct, and we use a high-speed network to exploit parallelism.

      See the Linux Parallel Processing HOWTO for more information.

  52. Re: Solving PDEs by coyote-san · · Score: 4

    I worked at FSL for several years, although on a different project. I knew people working on the weather models, and I took a class on parallel processing from the CU professor who shared the old Paragon supercomputer with NOAA. I even had an account on the Paragon briefly (for that class) after leaving NOAA.

    NOAA needs to solve partial differential equations (PDEs). A *lot* of PDEs. My class spent a lot of time on solving numerical methods, and my entire undergraduate class in the early 80's was covered in the first lecture of my graduate class a few years ago. My Palm Pilot, running multigrid analysis, could beat the pants off a Cray-XMP running the best known algorithm from 15 years ago.

    AI programs may not scale well, but the type of work done at NOAA *does*. Furthermore the hot topic a few years ago was applying some ideas from chaos theory to weather forecasts - take a dozen systems, insert just a little bit of noise into the initial data (essentially, instrument noise in your observations), then let them all run. If all models show the same weather phenomena, you can be pretty sure that it will occur. If the models show wildly different results (e.g., Hurricane Floyd slams into Key West in one run, but NYC in the other) you know that you can't make any firm predictions. As an educated layman's guess, I expect that the reason the hurricane forecasts are so much better than just a few years ago is precisely this type of variational analysis.
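
    The "dozen systems with a little noise" idea can be sketched in a few lines. To be clear about the assumptions: the logistic map below is only a stand-in for a chaotic system and has nothing to do with the real weather models, and the noise level, member count, and function names are all invented for illustration.

```python
import random


def toy_model(x0, steps, r=3.9):
    """Iterate the logistic map, a simple chaotic system standing in for a forecast model."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def ensemble_spread(x0, steps, members=12, noise=1e-6, seed=0):
    """Run `members` copies from slightly perturbed starts and return max - min of the results."""
    rng = random.Random(seed)
    finals = [toy_model(x0 + rng.uniform(-noise, noise), steps) for _ in range(members)]
    return max(finals) - min(finals)


short_range = ensemble_spread(0.4, steps=5)
long_range = ensemble_spread(0.4, steps=200)
print(f"spread after 5 steps:   {short_range:.6f}")
print(f"spread after 200 steps: {long_range:.6f}")
```

    A small spread means the members agree and the forecast can be trusted; a large spread is the Key West versus NYC situation, where no firm prediction is possible.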

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  53. Re:Why Alpha's? Screaming FP performance, that's w by Troy+Baer · · Score: 3

    If the G4 can sustain >1gflops, then why not build a cluster of G4s running LinuxPPC?

    I'm not convinced the G4 can sustain 1 GFLOP/s in any kind of real calculation -- it simply doesn't have enough memory bandwidth. The G4 uses the standard PC100 memory bus, AFAIK. That's 64 bits wide running at 100MHz = 800MB/s peak. So without help from the caches, the absolute best you can do on *any* PC100 based system is 200 MFLOP/s using 32-bit FP or 100 MFLOP/s using 64-bit FP. In practice you can only sustain about 300-350 MB/s out of the PC100 memory bus, so things get even worse. The caches will help quite a bit (maybe a factor of 2-4), but I have trouble imagining the G4 being able to sustain over 500 MFLOP/s even on something small like Linpack 100x100 because of the limited bandwidth and latency of the PC100 bus. Other processors that have similar peak FP ratings have much higher memory bandwidths; we've benchmarked an Alpha 21264 (1 GFLOP/s peak, ~400 MFLOP/s sustained) at about 1 GB/s memory bandwidth (that's measured, not peak), and a Cray T90 CPU (1.8 GFLOP/s peak, ~700 MFLOP/s sustained) at 11-13 GB/s (again, measured not peak).
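
    For anyone who wants to redo the sums, here is a minimal sketch of the bandwidth bound used above. It assumes the simplest possible model, one operand fetched from RAM per floating point operation with no cache reuse, and the function names are made up for illustration.

```python
def peak_bus_mb_per_s(width_bits, clock_mhz):
    """Peak transfer rate of a memory bus in MB/s (one transfer per clock)."""
    return width_bits / 8 * clock_mhz


def bandwidth_bound_mflops(bus_mb_per_s, bytes_per_operand):
    """Upper bound on MFLOP/s if every flop needs one operand from RAM."""
    return bus_mb_per_s / bytes_per_operand


pc100_peak = peak_bus_mb_per_s(64, 100)

for label, mb_per_s in [("peak", pc100_peak), ("sustained, low", 300), ("sustained, high", 350)]:
    single = bandwidth_bound_mflops(mb_per_s, 4)  # 32-bit FP
    double = bandwidth_bound_mflops(mb_per_s, 8)  # 64-bit FP
    print(f"{label:16s} {mb_per_s:6.0f} MB/s -> {single:6.1f} MFLOP/s (32-bit), {double:6.1f} MFLOP/s (64-bit)")
```

    Multiply the sustained rows by the cache factor of 2-4 and you still land well short of 1 GFLOP/s, which is the whole point.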

    There's also the question of compilers. You have to have a compiler that recognizes vectorizable loops and generates the appropriate machine code to use the vector unit. Unless Motorola's feeling *really* magnanimous, I don't see that kind of technology making it into gcc (and g77, more importantly for scientific codes) any time soon. Otherwise, you're at the mercy of a commercial Fortran compiler vendor like Portland Group or Absoft. PGI hasn't shown any interest in PowerPC to this point, and Absoft currently does PPC compilers only for MacOS 8, not OSX or LinuxPPC.

    I'd love to be proven wrong on this, but based on my experience I don't see how you could do it.

    --Troy
    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  54. Embarrassing parallellism? weather@home by Anonymous Coward · · Score: 0
    I did a little of this work as an undergraduate, and I've heard that story too: that large scale simulations usually run over a large number of input sets with variations in initial conditions and the random number generator seeded differently.

    Doesn't this make the problem embarrassingly parallel? If I have to run 30 times on slightly perturbed input data, can't I get a near-30x speedup this way, without a lot of painful parallel PDE programming?

    Just imagine: weather@home (yeah I know it's probably not *that* embarrassingly parallel).

  55. Re:Linux Supercomputer Wins Weather Bid by Tal_Greywolf · · Score: 1

    I would suspect that there's a number of reasons why NOAA went with the solution they did, and not just merely because it's a fast set of machines running a fast operating system.

    Every six hours the National Weather Service sends out to all of its forecast offices around the country a series of models to help in local forecasting. Each model is based on a massive amount of information that comes in to their central office, and that information is used in preparing the next set of forecasts. Now, you would want a) a system that is capable of processing all of this information rapidly and reliably, with b) redundancy built in so that if a part of the system goes down, you're still able to digest and transmit those models. Using a cluster of systems gives you that backup redundancy, and using a stable operating system gives you that speed and reliability to churn out models reliably.

    The people at NOAA likely could care less about advocacy in this respect. What they want is a system that they can use, provide them the reliability and performance that is demanded, for a reasonable cost. $15 million for a distributed cluster that gives them a lot more bang for the buck is definitely money well spent. And remember, this IS your tax dollars at work, one of the few times you will ever see it spent for a truly worthwhile cause.

    -Tal Greywolf

  56. Re:Linux Supercomputer Wins Weather Bid by Anonymous Coward · · Score: 0
    Not true. Our in house app will perform optimally in 2GB of address space, but is significantly faster in 64 bit. We do lots of 64 bit arithmetic (addition, subtraction and bit shifting) and the native instructions make an enormous difference.

  57. Great. I still wonder about the compiler though by winnt386 · · Score: 2

    A friend of mine tried linux alpha and the performance was quite bad. After a posting at a newsgroup she found out that the gcc compiler is not optimized for alpha. She had to buy an expensive c/c++ one for the alpha box and then after a recompile the performance was great. I wonder how hard it was to get the cluster going with the compiler issue. I would hate to make 80 diskettes for all the machines because of licensing issues with the compiler. I heard alpha linux lacked some features of the standard intel one. Is this true or was it referring to the unoptimized compiler that comes with alpha redhat linux?

    --
    "Never stick an electrical appliance down your pants." -Tim Allen
  58. Java is junk by Anonymous Coward · · Score: 0

    Give me a buzz when Java gets Design By Contract or even the C language's simple "assert()".

  59. Re:Toy Story: The Beowolf Cluster by mistabobdobalina · · Score: 1

    come on now! we all know that plot is irrelevant these days!!! it's all about explosions and breasts.

    --
    -- your knees hurt, don't they?
  60. Re:The right tool for the right task ... by coats · · Score: 1
    ...Now despite what most people think, the bottleneck (in this example) is in fact the I/O...
    LL, tell me about the analysis of computational complexity of your problem... or have you even analyzed it?? To model a particular domain for a particular time period, assuming a fixed archival output frequency (e.g., "We are saving snapshots every 15 minutes for analysis and archiving"), your I/O requirements vary inversely with the cube of your spatial resolution, whereas your computational intensity varies inversely with the fourth power. If you have a system with both performing satisfactorily at 5 KM, then at 100 M, you need 50^3 times the I/O but 50^4 times the CPU. In other words, if you bring in a new system in which you've scaled everything up by the same factor, and you think you have enough I/O, then you're way underpowered in the CPU department (you need 50 times more than you've got!!).
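
    The scaling argument is easy to check numerically. A minimal sketch, taking the exponents at face value from the paragraph above (I/O grows with the cube of the refinement, CPU with the fourth power); the function name and defaults are made up for illustration.

```python
def scaling_factors(old_res_m, new_res_m, io_exponent=3, cpu_exponent=4):
    """Return (refinement, I/O growth, CPU growth) when the grid spacing shrinks."""
    refinement = old_res_m / new_res_m
    return refinement, refinement ** io_exponent, refinement ** cpu_exponent


refinement, io_growth, cpu_growth = scaling_factors(5000, 100)
print(f"refinement:  {refinement:.0f}x")
print(f"I/O growth:  {io_growth:,.0f}x")
print(f"CPU growth:  {cpu_growth:,.0f}x")
print(f"CPU shortfall if everything is scaled by the I/O factor: {cpu_growth / io_growth:.0f}x")
```

    The last line is the factor of 50 mentioned above: scale the whole machine to match the I/O and the CPUs are still 50 times too slow.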

    --
    "My opinions are my own, and I've got *lots* of them!"
  61. supercomputing by mistabobdobalina · · Score: 1

    what are the areas that typically require heavy-duty processing power? all i know of is weather modeling and graphics rendering...

    --
    -- your knees hurt, don't they?
  62. PA-RISC definitely worth a look. by RallyDriver · · Score: 1

    Don't know about the latest Alpha based systems, but by the terms of some supercomputer apps, such as matrix algebra stuff (FE, CFD, etc.) the bigger DEC servers were nothing to write home about around 18-24 months ago when I was doing a lot of benchmarking.

    The peak total memory bandwidth available then was 2.4Gb/sec in the AlphaServer 8400, and it really had an impact on big calcs - can't speak for SPECfp, but for a big matrix algebra calc you need (asymptotically approaching) 4 bytes/sec per "flop", and these systems just didn't cut it.

    I won't even speak about 32-bit Intel boxes - the 100MHz cache bus sucks enormous rocks, and the 4Gb memory limit (3Gb with NT, less with Linux IIRC) cuts it out of the big job league anyway. This is maybe OK if it's a node in a large MPP system, but these days you want to be able to bring 64Gb or more of RAM to bear on a single problem.

    The question we used to hear from our engineering staff was along the lines of: "Hey, my desktop PC is n-zillion MHz, and it runs this tiny test calc almost as fast as the big machine, why don't we just get a lot of big twin Xeon PC's with XYZ graphics cards?". Or occasionally, the same thing in favour of SGI workstations - engineers love toys just like the rest of us.

    This is the classic misconception caused by benchmarks in the FE industry; a lot of test calcs will fit in the cache on a Xeon PC or an R10k or UltraSparc workstation, and show pretty acceptable performance, but the dropoff when you move to a larger problem size and start hitting RAM is sudden and dramatic.

    By comparison, if you look at real supercomputers, like the high end Crays or NEC SX series, memory bandwidths of 2 to 4 Gb/sec *per processor* are the norm.

    The machine we ended up buying to replace a low-end vector Cray was - an HP V-Class.

    The PA-RISC has excellent scoreboarding and memory bus, and the Convex architecture keeps it well fed. We tested on the Convex S-Class hardware running at 180MHz with SPP-UX, and HP guaranteed that the delivered system running HP-UX would meet the clock over clock speedup ratio, which it did with room to spare. We saw well over 700 MFlops *sustained* per CPU on a 200MHz PA-8200 using rather nondescript FORTRAN, against a theoretical peak of 800.

    The picture with the newer PA-8500 machines is not so rosy, as the memory bandwidth does not seem to have been scaled up with the capabilities of the new CPUs, especially with double the number of CPUs per board. Nevertheless, as the previous posters' figures would indicate, I believe the sustained throughput still exceeds that of the latest Alpha based systems for certain types of job, and the price/performance is very good.

    Of course, for the rabidly religious, Linux is still not well supported on PA-RISC, and doesn't handle the high end hardware.

  63. Re:The right tool for the right task ... by LL · · Score: 1

    All your points are valid and I'll briefly explain the nitty gritty:

    1) global circulation models are actually done by people in the US, downscaling via nested regional models are limited to this part of the world and if and when the system becomes operationalised, is expected to be distributed. Think cooperating groups around the world sharing the CPU burden

    2) the 100m models are interfaced to streamflow and catchment models which are only a comparatively small region set within with the wider desert (rather uninteresting). Think sparse multi-resolutional hierarchy.

    3) further submodels are inherently linear in space/time, while the climate fields are calculated once, the bulk of the operational landscape runs the scientists are interested in are multiple ensembles which require lots of memory, hence some rather painful use of staging and compression. Think conversion to streaming media rather than static files.

    If you're interested in more details, send me your email and I'll point you to some of my papers.

    Regards,
    LL

  64. 4 TFlops?! by cetan · · Score: 1

    Wow, that is one fast machine. All for just weather! Sure, it's not the fastest machine out there, but 4 TFlops for finding out if it's going to rain on Saturday? heh. just joking. The mathematical models used in weather forecasting, and the complexities of understanding even a single supercell (which produces thunderstorms and tornadic activity), are mind-boggling.

    --
    In Soviet Russia...michael would be rotting in Siberia!
  65. Congrats by Anonymous Coward · · Score: 0

    Greg - and the team at Digital^H^H^H^H^H^HCompaq responsible for the compiler involved: Congratulations ! This is a first, but it won't be the last ... Toon Moene.

  66. Toy Story: The Beowolf Cluster by WillAffleck · · Score: 2

    AC said: "Either way, you could make Toy Story in about 10 minutes on this thing once it's up."

    Yes, but what kind of plot? Would it be Woody and Mr. Potatohead lost in a hurricane with a large number of penguins?

    Raw power is cool, but art takes a bit more than that.

    --
    Will in Seattle
  67. Linux Supercomputer Wins Weather Bid by dkh2 · · Score: 2
    Looks like the folks at NOAA are shooting to confirm what we already know, and what Microsoft is hoping to learn if they can ever get Windows ported to a 64bit system. A 64bit capable OS (Linux) on 64bit iron (Alpha) absolutely SCREAMS next to the identically clocked, sameinallotherrespects 32bit system running the 32bit version of said OS.

    Remember those vast performance diffs between the 80386SX-16 and the 80386DX-16? That's what we got here.

    <Note-to-Microsoft> Nanny-nanny-nah-nah, our OS runs on IA-64 and yours won't.</Note-to-Microsoft>

    D. Keith Higgs
    CWRU. Kelvin Smith Library

    --
    My office has been taken over by iPod people.
  68. Infrastructure costs (Re:BEOWOLF!) by Troy+Baer · · Score: 2

    For $15,000,000 to buy an Alpha Beowolf, it sounds like they might have 2,500 nodes with a 'decent' Alpha system. But if they go really high end, they'll have about 750 nodes (For the 'killer' $20,000 Alpha machines).

    That doesn't include the cost of the Myrinet cards and switches, racks, 3rd party software, support people, power, cooling, etc. Believe me, if you're paying $15M for a machine, part of it better be going for support personnel and infrastructure. The configuration's probably more like 250-500 nodes with a corresponding number of Myrinet cards and switch ports, 30-75 racks (8 nodes/rack if you're lucky), a *buttload* of power and air conditioning, and 2-5 onsite support people working in it full time.
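
    Here is a rough sketch of the two estimates side by side. The $6,000 price for a 'decent' node is not stated anywhere; it is simply what 2,500 nodes for $15M implies. The 8 nodes/rack figure is from the paragraph above, and the function names are made up for illustration.

```python
import math

BUDGET = 15_000_000


def nodes_for_budget(budget, price_per_node):
    """Nodes you could buy if the whole budget went on hardware alone."""
    return budget // price_per_node


def racks_needed(nodes, nodes_per_rack=8):
    """Racks required, rounding up to whole racks."""
    return math.ceil(nodes / nodes_per_rack)


# Hardware-only estimate from the quoted post ($6,000 is the implied 'decent' price).
for price in (6_000, 20_000):
    print(f"${price:,}/node -> {nodes_for_budget(BUDGET, price):,} nodes")

# Estimate with infrastructure and support included.
for nodes in (250, 500):
    print(f"{nodes} nodes -> {racks_needed(nodes)} racks, ${BUDGET // nodes:,} all-in per node")
```

    The all-in cost per node comes out several times the sticker price of the box, which is the share going to interconnect, facilities, and people.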

    --Troy
    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  69. unimpressed. by LocalYokel · · Score: 0

    Just imagine a recursive $15M cluster of multiple Beowulf clusters... how's that?

    It wouldn't surprise me one bit -- U.S. government agencies seem to find ways of being excessive, duplicitous, overly redundant, and do things in an excessively superfluous manner.

    Maybe Rob can use some of the quick IPO cash from Bendover and put it into this site -- or maybe they've already gotten advances, and that's why Slashdot's been up and working for a change this afternoon?

    I'm no troll -- in fact, I pretty much stay away from bridges altogether...


    --
    E2 IN2 IE?

  70. Not a beowulf? by mwillis · · Score: 1

    Hold on folks, this isn't necessarily a beowulf. I could not find the word "Beowulf" on the HPTi page. (Maybe I didn't look hard enough though).

    Not every Linux cluster is a Beowulf. The fastest alpha Linux cluster in existence is not a Beowulf.

    Anyone know what they plan to use?

  71. No one said it was. by Anonymous Coward · · Score: 1

    All that is being said is that Linux is being used for one type of supercomputing task: weather forecasting models. Some people are joyful about that. But that does not imply they think Linux is a solution for everything. It is reasonable to infer, though, that since you are making an incorrect logical inference, your logic may be flawed in other areas of reasoning. I can't say for sure whether your Java/NT solution is the best solution for your application. But since we have already established that you are a person of flawed logic, I wouldn't place a lot of confidence in your decision to use NT.

  72. A real life demonstration... by LocalYokel · · Score: 1

    visit the SETI@home CPU type statistics page. -- Alpha EV6 and EV67's are rockin' ass^H^H^H, if not as much as the "Intel Puntium" or "PowderPC" chips...

    --
    E2 IN2 IE?

  73. The right tool for the right task ... by LL · · Score: 3

    Buying the hardware is only 15-30% of the total cost. Also, in a production environment, you should not be fixated by the CPU. The question should be, within the capital budget, what is the best combination of resources that maximises the effectiveness of achieving your mission.

    To give you some real-world experience, a group I'm working with is looking at continental-scale simulation at a 5km resolution with the aim of going down to 100m. Now despite what most people think, the bottleneck (in this example) is in fact the I/O, with estimated total requirements of 30 TBytes. Doing the sums shows that to keep up with the CPU (say hypothetically 1 run/24 hours), you would need average throughput of 350 MByte/sec. Hardware that supports both this volume and capacity is NOT cheap. We would joke that we paid x million for the I/O and SGI would throw in the Cray for free :-).
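
    The throughput figure follows directly from the two numbers given. A minimal sketch, assuming decimal units (1 TByte = 10^12 bytes, 1 MByte = 10^6 bytes) and that the full 30 TBytes moves once per run; the function name is made up for illustration.

```python
def required_mb_per_s(total_tbytes, hours):
    """Average throughput in MByte/sec needed to move total_tbytes within the given hours."""
    total_bytes = total_tbytes * 10**12
    seconds = hours * 3600
    return total_bytes / seconds / 10**6


rate = required_mb_per_s(30, 24)
print(f"30 TBytes in 24 hours needs about {rate:.0f} MByte/sec on average")
```

    That is an average over the whole day, so the peak rate the hardware has to sustain would be higher still.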

    Now as for how an Alpha cluster could be used, it would fit very nicely into the dedicated batch box category. It has a very high CPU rate and some decent compiler optimisation. As such it would augment whatever existing environment exists, reducing the workload of the more expensive machines for development which generally have better tools (just you try debugging a multi-gigabyte core dump). The biggest problem nowadays is not the algorithms, but managing the data traffic to the CPUs and this is where Linux clusters are weak with relatively slow interconnects, unbalanced memory hierarchies, and cheaper but higher latency memory. You have to accept the disadvantages and shift jobs which are not suited for this architecture off. A bit of smarts goes a long way in stretching the budget.

    LL

  74. Linux NOT mentioned in the official release by MarcoAtWork · · Score: 1

    Anybody else noticed that Linux is not mentioned anywhere in the NOAA press release while it's prominently displayed in the integrator's ?

    Is the NOAA afraid to say that they are basing a 15 million dollars investment on free software rather than on something from Microsoft/Sun/IBM/whatever ?

    --
    -- the cake is a lie
    1. Re:Linux NOT mentioned in the official release by BradyB · · Score: 1

      Not mentioned by the NOAA official release. Why would they need to mention it? Since it's HPTi that is going to be doing the work, it wouldn't be their place to say how HPTi would do what they are going to be contracted to do. Linux is mentioned at least 5 times in the HPTi press release.

      --

      Good is never enough, when you dream of being great!
    2. Re:Linux NOT mentioned in the official release by Anonymous Coward · · Score: 1

      This is going to count as flamebait.. but has anyone had any experience porting MPI code from Linux/Solaris clusters to NT? i.e. same hardware? And assuming the same compilers etc.. Part of the problem is that until recently there have been (allegedly) really nice compilers for NT that have not been available for Linux. Also, for ages Intel's native BLAS routines were available only for NT. I think they have been ported over now.

      From my understanding, for most work it makes absolutely no difference, as the overhead of the OS should be negligible. In my experience with single-processor, large-memory jobs (say > 500 megs), Solaris tends to run smoother.


      I ask this because I am moving to a school soon that got bought out by Microsoft, and they have ported all their code to just this: NT clusters using MPI (from Microsoft grant money that is being dumped on all the schools. p.s. we got it here.. we just umm formatted the drives ;)). I am *really* *really* not looking forward to coding on NT, but it could be a learning experience (of sorts)... so I'd be interested in hearing what I should probably expect. (shortcomings, advantages etc??)

  75. Re:Why Alpha's? Screaming FP performance, that's w by Jeff+DeMaagd · · Score: 1

    I think another (better?) answer is that gcc/egcs doesn't have much in the way of DSP type stuff, where you do parallel computations. Alphas get performance inherently, as its FPUs are very good, and it does not have to d!ck with SIMD instructions - something that many compilers don't do well anyways - usually you have to call hand coded assembly to get good performance out of SIMD (= single instruction multiple data, where one instruction is executed on multiple sets of data - like MMX, KNI (SSE), AltiVec, etc)

    And the raw bandwidth of even the unreleased G4s trails that of three year old Alpha designs anyways, and now there's the switch-matrix arch that gets well beyond the new G4's theoretical bandwidth (EV6 500 -> ~2.6 GB/s, G4 (7400) -> ~0.8 GB/s). This is the 'theoretical' figure; Alphas still get 1.3 GB/s in sustained throughput, over 50% more than the G4's theoretical.

  76. Re: Solaris runs on Intel by Anonymous Coward · · Score: 0

    Did you even bother to benchmark how well your
    stuff runs under solaris x86, on the same hardware
    you have now?

    "Affording suns" has nothing to do with anything.
    Solaris x86 is cheaper than an NT license, even.

  77. Motorola gcc optimisation by Mawbid · · Score: 1

    You seem to know what you're talking about so I'm worried I may be missing something, but I don't see why Motorola needs to feel magnanimous to contribute optimisations for their chips to gcc. Wouldn't they just need good business sense? Anything that increases the value of their processors must be a good thing for them. Or is vectorizing loops so hard a problem that they'd spend more than they'd gain?
    --

    --
    Fuck the system? Nah, you might catch something.
  78. Why digital??? by Anonymous Coward · · Score: 2

    Heck, my research center has a vacuum tube computer that is faster than ASCI Red + all the flavors of Blue (9000 PPro + 6000 MIPS + 2000 Power3). You see, the trick is in the implementation. If you take 1 wavelength of an analog signal, there could easily be 100,000,000 discrete levels (especially with a 10 kV plate voltage). Fine tuning of the voltage differentiation amplifier would probably quadruple the speed on top of that. Now we only have to upgrade the holographic scanner for the punchcard readers.

    Forget about any of these digital OSes, we even implemented our own ANALinux, which uses OS technology that was originally implemented for the quantum computers that are slow to come about. Except for the fact that the probability wave algorithm in the kernel was reimplemented with the electron wave method (more discrete).

    We can't open source it yet, since the whole kernel runs via negative feedback, so it is constantly being upgraded. We could take a snapshot of the loaded kernel image by detaching all the ferrule doughnuts at the same time, but the source would all be in analog stream and useless unless you have another valve box.

    It easily interfaces with outside systems even though it is 100% analog inside, due to the (ported) quantum kernel's interface, which utilizes the duality of the wave and sends discrete signals to outside the box. The only problem is the primitiveness of current technology. Since petabit networking has not been implemented, we basically watch the tubes' change in brightness as I/O. Current internet access by outsiders is via our webcam pointed at the tubes.

    This OS is totally unhackable since nobody knows how to hack it. Input is via variosistors instead of toggle switches, so all the script gramps who hacked their way into Univacs would not know how to break in.

    So all you digiphiles, put your toys down and use the computer that works the way humans do.

  79. Re:Software? by Anonymous Coward · · Score: 0

    I don't know about now, but five years ago, a state-of-the-art code for weather forecasting used spectral approximations (Fourier or Chebychev expansion functions) in the X- and Y-directions (Latitude and Longitude, say) and some high-order (compact) finite difference method in the Z-direction (Altitude). Incompressible (or weakly compressible) fluid flow, extra scalar transport equations for humidity, etc. Fractional step time integration method with the pressure correction equation being solved with a combination of Fast Fourier Transforms (in the X- and Y-directions) and a line-implicit solver in the Z-direction. No idea what they use for turbulence models...an anisotropic Reynolds Stress model maybe or a dynamic subgrid model if they are doing (Very) Large-Eddy Simulations. Things may have changed too...may have dumped the spectral approximations to get more flexibility in modelling surface contours (mountains and such). Spectral elements maybe? Who knows? I do automobiles and combustors for a living.
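    To give a feel for the Fourier part of that pressure solve, here is a toy sketch, definitely not anyone's production weather code. It solves a 1-D periodic Poisson equation p'' = f by dividing each Fourier mode by -k^2. The names `dft` and `solve_periodic_poisson` are made up for illustration, and a naive O(N^2) DFT stands in for a real FFT to keep it dependency-free.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * j / n)
        out.append(total / n if inverse else total)
    return out

def solve_periodic_poisson(f):
    """Solve p'' = f on a periodic [0, 2*pi) grid, picking the zero-mean p."""
    n = len(f)
    f_hat = dft(f)
    p_hat = [0j] * n  # mode 0 stays zero: the mean of p is arbitrary
    for k in range(1, n):
        wave = k if k <= n // 2 else k - n  # signed integer wavenumber
        # Differentiating twice multiplies mode k by -(wave**2).
        p_hat[k] = -f_hat[k] / (wave * wave)
    return [z.real for z in dft(p_hat, inverse=True)]

# Check against a known solution: p = sin(x) gives p'' = -sin(x).
n = 16
x = [2 * math.pi * j / n for j in range(n)]
f = [-math.sin(xj) for xj in x]
p = solve_periodic_poisson(f)
max_error = max(abs(pj - math.sin(xj)) for pj, xj in zip(p, x))
print(max_error)  # should be down at floating-point rounding level
```

    The appeal is that the solve is a cheap per-mode division once you are in Fourier space; in a real 3-D code the transforms would cover X and Y, with the Z-direction handled by the line-implicit solver mentioned above.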

  80. You got it all mistaken dude. by Anonymous Coward · · Score: 0

    You build your NUMA box that has 1 fat highway, and it turns up like the subway systems in the metropolitan areas. The whole purpose of hypercube or 5-D torus is to have a shortest path to as many places as possible, instead of hopping onto that megapipe and making a stop at every node to see who wants to get off.

    And who is preventing the 10-D cube from having a 100-lane highway? The only limitation is that you end up like the traveling salesman, with too many routes to follow (but with enough routes, there is a very good chance your destination is only 1 hop away).
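    For anyone who wants numbers on the hop counts, here is a rough sketch. It assumes the usual binary hypercube model (nodes labelled 0 .. 2^d - 1, linked when their labels differ in exactly one bit), so the minimal hop count is just the Hamming distance. The function names are made up for illustration, and the ring figure is only there as a stand-in for the "stop at every node" layout.

```python
def hops(a: int, b: int) -> int:
    """Minimal hops between two hypercube nodes = Hamming distance of labels."""
    return bin(a ^ b).count("1")

def neighbours(node: int, dims: int) -> list:
    """Nodes one hop away: flip each of the dims label bits in turn."""
    return [node ^ (1 << bit) for bit in range(dims)]

def hypercube_stats(dims: int) -> dict:
    """Size, worst-case and average hop counts for a dims-dimensional cube."""
    nodes = 1 << dims
    # The cube is symmetric, so distances measured from node 0 are typical.
    distances = [hops(0, other) for other in range(1, nodes)]
    return {
        "nodes": nodes,
        "links_per_node": len(neighbours(0, dims)),
        "diameter": max(distances),
        "mean_hops": sum(distances) / len(distances),
        "ring_diameter": nodes // 2,  # same node count wired as a simple ring
    }

stats = hypercube_stats(10)
for name, value in stats.items():
    print(name, value)
```

    For the 10-D case this gives 1024 nodes with no pair more than 10 hops apart, against 512 hops worst case for the same nodes on a ring.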

  81. Re:Why Alpha's??? Only the Facts Ma'am! by Anonymous Coward · · Score: 1

    Well, 10x could be true for the code these guys may be running. (SPEC is not everything; this is very important for memory-intensive code.) Take a look at STREAM (memory bandwidth bench):

    PIII ~ 300MB/s
    Alpha DS20 ~ 1300MB/s

    And since these systems use EV6 "buses", each processor gets all that bandwidth to itself in multiprocessor systems. But back to SPEC, here are some more numbers.

    Published results at www.specbench.org:
    (Compaq XP1000 667 MHz) 65.5 SPECfp95 37.5 SPECint95
    (Compaq GS140 700 MHz) 68.1 SPECfp95 39.1 SPECint95

    Informal results (www.novaglobal.com.sg) (these systems have better memory systems than those above):
    (AlphaServer DS20 667 MHz) 72 SPECfp95 38 SPECint95

    And you can get a well equipped system (DS10) from www.dcginc.com for only $3500.

  82. Software? by Pulsar · · Score: 1

    What sort of software is it running? What exactly uses all this power (I know how fiendishly complex weather predictions are...I'm just curious what kind of software exists/is being developed for it...)?

  83. Re:Alphas because Intel is trashheap quality hardw by Anonymous Coward · · Score: 0

    Well, hmmmm, while I am not completely convinced that you aren't trolling, that is a shorter version of the arguments that I have used to get Alphas here at work. The bandwidth issues alone make such a huge difference that for really large data sets you can get close to mainframe class throughput with a UNIX platform and tools and a radically smaller price tag and better performance. Not a mainframe, but able to deal with big, fat pipes really well. Now, if we can get the PC hardware to make 133 or 266MHz 64 bit PCI buses so common that 6 channel LVD or FC-AL (or SSA, if the 320MB/s stuff ever gets released) can really keep the pipes full and we get two more buses on the duals, we could have some nice, Cray-type performance with the bandwidth as well. Now that would be cool.

  84. Re:Why Alpha's??? Now this is better by Anonymous Coward · · Score: 2

    Well, 10x could be true for the code these guys may be running. (SPEC is not everything; this is very important for memory-intensive code.) Take a look at STREAM (memory bandwidth bench):

    PIII ~ 300MB/s
    Alpha DS20 ~ 1300MB/s

    And since these systems use EV6 "buses", each processor gets all that bandwidth to itself in multiprocessor systems. But back to SPEC, here are some more numbers.

    Published results at www.specbench.org:
    (Compaq XP1000 667 MHz) 65.5 SPECfp95 37.5 SPECint95
    (Compaq GS140 700 MHz) 68.1 SPECfp95 39.1 SPECint95

    Informal results (www.novaglobal.com.sg) (these systems have better memory systems than those above):
    (AlphaServer DS20 667 MHz) 72 SPECfp95 38 SPECint95

    And you can get a well equipped system (DS10) from www.dcginc.com for only $3500.