Slashdot Mirror


Supercomputer Breaks the $100/GFLOPS Barrier

Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It also is the new record holder for POV-Ray 3.5 render speed. The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students, which, while being a source of cheap labor, is also a good way to get a lot of students involved in a great project.
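The headline price-performance figure can be checked directly from the two numbers in the summary. A minimal Python sketch follows; the function name is illustrative and not taken from the KASY0 site, and because the cost is given only as "less than $39,500", the result is an upper bound.

```python
def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Return price-performance in dollars per GFLOPS."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return cost_usd / gflops


# Figures from the summary: under $39,500 and 471 GFLOPS on 32-bit HPL.
kasy0 = dollars_per_gflops(39_500, 471)
print(f"KASY0: at most ${kasy0:.2f}/GFLOPS")
```

That works out to roughly $84/GFLOPS, which is under the $100 mark, though only for the 32-bit HPL result quoted above.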

11 of 281 comments

  1. Playstation2 at 5.5GFLOPS costs only $199 $40/GFL by gorim · · Score: 4, Insightful

    And it was introduced to consumers just a couple years
    ago. Sorry, the AMD beowulf cluster at $100/GFLOP just
    isn't that impressive.

  2. Re:The burning question by HanzoSan · · Score: 3, Insightful



    People don't share mp3s anymore; if they do, the FBI, NSA, Secret Service, CIA, and Homeland Security Dept. will swarm them and put them in the bay.

    I mean, I wish we could crack down like this on organized crime or on domestic terrorists. I'm surprised we are so aggressive at arresting teenagers who download music, while the KKK and neo-Nazis can collect a million guns and spread their crazy hate speech, and it's protected by freedom of speech.

    I'd think that hate speech does more harm than copyright infringement.

    --
    If you use Linux, please help development of Autopac
  3. Is that a real number or a marketing number? by Sycraft-fu · · Score: 5, Insightful

    I'm guessing the latter. You see all sorts of BSified numbers from marketing departments on processors, but they have little to do with reality. The number for this AMD cluster is a real, actual, measured-using-a-real-world-app number. To give you some idea of BS console numbers, the Xbox has a PIII 733 processor in it (ok, technically it's a little different, but it's a P3 core). Now the Gflop claim is 2.93. Out of a P3 733? Ya right, on paper perhaps but never in the real world, much less on a real app.

    Then, of course, there is the issue of specialised chips vs normal chips. A GeForce 4 4400 can claim, roughly, 80 Gflops peak. That sure beats the hell out of any single CPU I've ever heard of, including the Power4. Thing is, the GeForce 4 is a graphics DSP; it isn't a general-purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter, I'm not even sure that it is Turing complete).

    So don't take any hype on a console to equate to real performance in a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2Gflops.

    1. Re:Is that a real number or a marketing number? by drinkypoo · · Score: 2, Insightful

      Of course, assuming it's only half the parent comment's assertion, thus 2.25 GFlops, at $180 it's still cheaper than $100/GFlop. However, as others (should?) have pointed out by now, it's useless as a supercomputing node for all but the smallest tasks, since it has no local storage and extremely limited main memory. You will have to spend another $200 for a Linux kit to get storage and networking, bringing it up to $380 for the system. If it were actually 5.5 GFlops in the real world, then that would still be cheaper than $100/GFlop, but of course neither you nor I believe it is that fast while doing general-purpose processing.
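The console arithmetic in this sub-thread is easy to tabulate. The sketch below uses only prices and GFLOPS figures quoted in the comments above; the labels are made up for illustration, and the last row is a combination the comment does not work out itself.

```python
# (label, price in USD, assumed GFLOPS), all figures quoted in the comments above
scenarios = [
    ("PS2, claimed peak", 199, 5.5),
    ("PS2, derated", 180, 2.25),
    ("PS2 + Linux kit, claimed peak", 380, 5.5),
    ("PS2 + Linux kit, derated", 380, 2.25),  # combination not worked out in the comment
]


def dollars_per_gflops(price_usd, gflops):
    """Price-performance in dollars per GFLOPS."""
    return price_usd / gflops


results = {label: dollars_per_gflops(price, gflops) for label, price, gflops in scenarios}

for label, value in results.items():
    verdict = "under" if value < 100 else "over"
    print(f"{label:32s} ${value:7.2f}/GFLOPS ({verdict} $100)")
```

With these inputs, only the derated console with the Linux kit lands above $100/GFLOPS, and none of the rows says anything about whether such a node is usable for real cluster work.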

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  4. Nice wiring! by nate.sammons · · Score: 2, Insightful

    Looks like most of the wiring jobs I've seen done by students: kasy0core.jpg.

    God forbid they use cable gutters ;-)

    Other than that, kick ass job guys!

    -nate

  5. University students by SuperBanana · · Score: 5, Insightful
    Because this was a university project, KASY0 was assembled entirely by university students, which, while being a source of cheap labor, is also a good way to get a lot of students involved in a great project.

    At the risk of being flamebait: no. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do. It's a common myth that students 'learn' by doing grunt work. I should know: I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them).

    Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.

    (from the site): Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better.

    Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.

  6. In other news... by rmdyer · · Score: 1, Insightful

    Now that the university students have graduated and moved on, there isn't any documentation, nor do they know how to use the darn thing...

    -1

  7. There is Flop and Flop by Tiosman · · Score: 4, Insightful

    It's not the first time that these folks in KY have worked around the definition of the acronym "Flop". A Flop is a floating point operation on 64 bits, not 32 bits. All entries in the Top500 use results from 64-bit HPL; nobody else in the world is running HPL on 32 bits. So claiming the moon on 32 bits is easy, useless for the sake of comparison, and almost unethical. I cannot believe that Dr. Dietz does not know the difference by now.

    The same machine would yield average results on 64 bits. Difficult to draw attention without headline numbers...

  8. overclocking by snooo53 · · Score: 2, Insightful
    Looking at the specs, I'm curious if anyone thought of overclocking the machines to get an even bigger performance increase. It seems that with most Athlons you can get at least a good 100 MHz of extra speed, even with a stock cooler, by increasing the FSB/multiplier and not even touching the voltage. Even a modest increase like that would yield an extra 12.8 GHz of aggregate clock across the 128 nodes, dropping that price figure even further. Depending on what type of computing they're doing, increasing the FSB might have an even bigger effect than more GHz.

    Granted there might be some heat problems, but judging by their setup, I'm guessing the room is well-cooled.
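A rough sketch of that overclocking arithmetic, in Python. Two assumptions are not in the thread: that HPL throughput scales linearly with clock speed (optimistic, since memory and network do not speed up), and the 2000 MHz base clock, which is only a placeholder and not the actual clock of the Athlon XP 2600+.

```python
NODES = 128            # compute nodes, from the summary
EXTRA_MHZ = 100        # per-node overclock suggested in the comment
COST_USD = 39_500      # upper bound on cost, from the summary
MEASURED_GFLOPS = 471  # 32-bit HPL result, from the summary


def aggregate_extra_ghz(nodes, extra_mhz):
    """Total added clock across all nodes, in GHz."""
    return nodes * extra_mhz / 1000


def overclocked_price(base_clock_mhz):
    """Dollars per GFLOPS after overclocking.

    Assumes GFLOPS scales linearly with clock, which is optimistic.
    """
    scale = (base_clock_mhz + EXTRA_MHZ) / base_clock_mhz
    return COST_USD / (MEASURED_GFLOPS * scale)


print(aggregate_extra_ghz(NODES, EXTRA_MHZ))  # 12.8
# 2000 MHz is a placeholder base clock, not a measured value.
print(round(overclocked_price(2000), 2))
```

Under those assumptions a 100 MHz bump is about a 5% gain, so the price figure would move by a few dollars per GFLOPS at best.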

    --
    The sending of this message pretty much inconveniences everyone involved.
  9. Re:Wrong by sjames · · Score: 3, Insightful

    Really, it's a spectrum. On one end you have fully commodity Beowulf, in the middle you see things like Dolphin and Myrinet, and on the high end you see fully custom backplanes and sometimes RAM and I/O controllers as well. Purpose-built CPUs are becoming less common now, but not unheard of.

    Each step up the spectrum widens the domain of problems that the machine can work on efficiently, and raises the price for the machine. In many cases, a 'real' supercomputer is more or less a cluster with a specialized network and OS and mounted in a single cabinet so it doesn't look like a cluster.

    In general when a lower end machine can efficiently run your program, there is no benefit to using a more expensive machine.

    As server hardware improves and 'exotic' hardware becomes more mainstream, the gap between the low and the high end narrows. There will probably always be a small but existent set of problems that call for the 'real' supercomputer, but that set is shrinking.

    There are other considerations as well. If the Beowulf in your lab can solve the problem in 1 week and is available now, while the 'real' supercomputer on the other campus can solve it in 4 hours and will have a timeslot available in 2 weeks, the Beowulf is 'faster' from your point of view.
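The point about availability can be put as turnaround time, that is, queue wait plus run time. A minimal Python sketch using the numbers from the example above (the function name is illustrative):

```python
from datetime import timedelta


def time_to_result(queue_wait: timedelta, run_time: timedelta) -> timedelta:
    """Wall-clock time from submitting a job to having the answer."""
    return queue_wait + run_time


# Local Beowulf: available now, takes a week to run.
beowulf = time_to_result(timedelta(0), timedelta(weeks=1))
# Remote supercomputer: two-week wait for a slot, then a four-hour run.
supercomputer = time_to_result(timedelta(weeks=2), timedelta(hours=4))

print(beowulf)        # 7 days, 0:00:00
print(supercomputer)  # 14 days, 4:00:00
```

On those figures the slower machine delivers the answer a week and four hours sooner.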

  10. Re:this is nice by stdarg · · Score: 2, Insightful

    Wait until computers start shipping with a few FPGA units. Then you can flash a new image onto the FPGAs for each specialized application you use the cluster for.