Slashdot Mirror


Supercomputer Breaks the $100/GFLOPS Barrier

Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It also is the new record holder for POV-Ray 3.5 render speed. The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students; besides being a source of cheap labor, that is also a good way to get a lot of students involved in a great project.
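The headline ratio is plain division of system cost by sustained HPL rate. A minimal sketch using only the two figures quoted in the summary (the helper name `dollars_per_gflops` is ours, not the project's):

```python
def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Price/performance: dollars spent per sustained GFLOPS."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return cost_usd / gflops

# Figures quoted above: cost "less than $39,500", 471 GFLOPS on 32-bit HPL.
kasy0 = dollars_per_gflops(39_500, 471)
print(f"KASY0: ${kasy0:.2f}/GFLOPS")  # prints "KASY0: $83.86/GFLOPS"
print("under $100/GFLOPS:", kasy0 < 100)
```

Because the cost is an upper bound, the true ratio is at or below this value; and since 471 GFLOPS is a 32-bit result, it is not directly comparable with 64-bit HPL numbers.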

18 of 281 comments

  1. Wow! by fryguy451 · · Score: 5, Funny

    Imagine a Beowu... errr... Oh..

  2. Let the Beowulf cluster jokes begin! by Anonymous Coward · · Score: 5, Funny

    Note to moderators: Beowulf cluster jokes CANNOT be offtopic.

    Imagine a Beowulf cluster of Beowulf cluster jokes!

  3. Also I wonder by HanzoSan · · Score: 5, Interesting

    How much electricity will these supercomputers use up?

    With all those wires, it looks like it takes a lot of juice.

    --
    If you use Linux, please help development of Autopac
  4. Let's not get too excited.... by CGP314 · · Score: 5, Funny

    Supercomputer Breaks the $100/GFLOPS Barrier

    Not after you factor in the SCO license fees.

  5. Asymmetric Sparse Flat Neighborhood Network by FreeLinux · · Score: 5, Interesting

    Obviously, I don't get it. This doesn't look any different from redundant backbones or what is frequently done with VLANs. All I see is multiple paths between hosts. How is this "new"?

    1. Re:Asymmetric Sparse Flat Neighborhood Network by flymolo · · Score: 5, Informative

      Due to "creative" (computed) wiring, if all switches are functioning, no node is more than one hop from any other node. This requires a routing table written for each PC. The wiring could be used for redundancy, but here it is being used to minimize latency and collisions, which are both killers in clusters.
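      A toy sketch of the property described above, assuming the usual flat-neighborhood idea: each PC has several NICs, each plugged into a different switch, and two nodes are one hop apart when they share a switch. The wiring, switch names, and port limit below are hypothetical, not KASY0's, and the optional `pairs` argument (check only the pairs an application actually talks over) is our loose reading of "sparse". The per-node table records which shared switch to use for each destination.

```python
from itertools import combinations

def build_routing_tables(wiring, port_limit, pairs=None):
    """Return {node: {dest: switch}} if every required pair shares a switch.

    wiring: {node: set of switches that node's NICs are plugged into}
    pairs:  iterable of (a, b) node pairs that must be one hop apart;
            None means every pair (a full flat neighborhood).
    """
    # Each switch has a fixed number of ports.
    usage = {}
    for switches in wiring.values():
        for sw in switches:
            usage[sw] = usage.get(sw, 0) + 1
    for sw, count in usage.items():
        if count > port_limit:
            raise ValueError(f"switch {sw} needs {count} ports, has {port_limit}")

    if pairs is None:
        pairs = combinations(sorted(wiring), 2)

    tables = {node: {} for node in wiring}
    for a, b in pairs:
        shared = wiring[a] & wiring[b]
        if not shared:
            raise ValueError(f"nodes {a} and {b} share no switch (more than one hop)")
        # Deterministic pick; a real design would spread traffic across switches.
        sw = min(shared)
        tables[a][b] = sw
        tables[b][a] = sw
    return tables

# Hypothetical 6-node wiring: 2 NICs per node, 3 four-port switches.
WIRING = {
    0: {"A", "B"}, 1: {"A", "C"}, 2: {"B", "C"},
    3: {"A", "B"}, 4: {"A", "C"}, 5: {"B", "C"},
}
tables = build_routing_tables(WIRING, port_limit=4)
print(tables[0])  # {1: 'A', 2: 'B', 3: 'A', 4: 'A', 5: 'B'}
```

      Choosing the wiring itself (which NIC goes to which switch) is the hard search problem; this sketch only verifies a given wiring and derives the tables from it.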

      --
      "Sometimes it's hard to tell the dancer from the dance." --Corwin Of Amber in CoC
  6. Students as Slave Labor by gremlin_591002 · · Score: 5, Funny

    Ponders why there are no pictures of university students in the National Geographic article on slavery....

  7. Playstation2 at 5.5GFLOPS costs only $199 $40/GFL by gorim · · Score: 4, Insightful

    And it was introduced to consumers just a couple of years
    ago. Sorry, the AMD Beowulf cluster at $100/GFLOP just
    isn't that impressive.
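    To put the PS2 on the same footing, here is a quick sketch of the ratio using the $199 price and 5.5 GFLOPS quoted here, plus the 6.2 GFLOPS Emotion Engine figure cited in a later comment. Both are vendor peak numbers rather than measured HPL results, so they are not like-for-like with the cluster's 471 GFLOPS:

```python
def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Price/performance: dollars per GFLOPS."""
    return cost_usd / gflops

PS2_PRICE = 199  # USD, as quoted above
for label, gflops in [("5.5 GFLOPS claim", 5.5), ("6.2 GFLOPS claim", 6.2)]:
    print(f"PS2 ({label}): ${dollars_per_gflops(PS2_PRICE, gflops):.2f}/GFLOPS")
# PS2 (5.5 GFLOPS claim): $36.18/GFLOPS
# PS2 (6.2 GFLOPS claim): $32.10/GFLOPS
```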

  8. So much power... by krahd · · Score: 5, Funny

    and it still can't run Doom III at a decent rate.

    --krahd

    mod me up, scottie!

  9. Cooling by bengoerz · · Score: 4, Informative

    I toured the previous cluster these guys built (KLAT2) and was very impressed. However, since they used AMD Athlon Thunderbirds last time, it did get quite hot. I remember standing by the cluster looking at all the wiring and being bombarded by an overhead cooling vent. I'm also assuming that these cooling issues are the reason each case has two blow-holes. I'd also like to see these guys post in-depth specs of each machine. Being a hardware nut, I'd like to see how they got so many machines so cheaply, and maybe even which vendor they used. As I remember, they worked REALLY hard on their last cluster to keep costs to an absolute minimum.

  10. Is that a real number or a marketing number? by Sycraft-fu · · Score: 5, Insightful

    I'm guessing the latter. You see all sorts of BSified numbers from marketing departments on processors, but they have little to do with reality. The number for this AMD cluster is a real, actual, measured-using-a-real-world-app number. To give you some idea of BS console numbers, the Xbox has a PIII 733 processor in it (OK, technically it's a little different, but it's a P3 core). Now, the GFLOPS claim is 2.93. Out of a P3 733? Yeah, right; on paper perhaps, but never in the real world, much less in a real app.
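    A "paper" peak is just clock rate times the floating-point operations the core can issue per cycle. Assuming the Xbox figure counts 4 single-precision SSE operations per clock (our assumption about how it was derived, though it reproduces the number exactly), a 733 MHz core gives the quoted 2.93:

```python
def peak_gflops(clock_mhz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak: every FP unit busy on every cycle, no memory stalls."""
    return clock_mhz * flops_per_cycle * cores / 1000.0

xbox_cpu = peak_gflops(733, 4)  # assumed: 4 single-precision ops per cycle (SSE)
print(f"{xbox_cpu:.2f} GFLOPS peak")  # 2.93 GFLOPS peak
```

    Real applications rarely keep every unit fed on every cycle, which is why sustained results such as HPL land well below this bound.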

    Then, of course, there is the issue of specialised chips vs. normal chips. A GeForce4 Ti 4400 can claim, roughly, 80 GFLOPS peak. That sure beats the hell out of any single CPU I've ever heard of, including the POWER4. The thing is, the GeForce4 is a graphics DSP, not a general-purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter, I'm not even sure it is Turing complete).

    So don't take any console hype as a measure of real performance on a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2 GFLOPS.

  11. Here is the bill! by borgdows · · Score: 5, Funny

    Dear customer,

    At the cheap introductory price of 699$ for 80 lines of code in the Linux kernel, it will cost you 8,737,500$ per kernel, since we have discovered that in fact 1,000,000 lines of SCO IP were copied into Linux.

    Designation .. Price .. Qty .. Total
    Linux kernel .. 8,737,500$ .. 128 .. 1,118,400,000$

    So you must pay us only 1,118,400,000$, and in my almighty kindness I will offer you a discount of 118,400,000$, so you only have to pay ONE BILLION DOLLARS if you pay before tomorrow!

    Please send your credit card number to darl@sco.com

    Sincerely yours,

    -- Darl Mac Bride
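    The joke invoice's arithmetic, checked: 1,000,000 copied lines in 80-line blocks at 699$ each gives the per-kernel price, times the 128 kernels on the invoice. A small sketch using only the letter's own numbers:

```python
LINES_COPIED = 1_000_000   # claimed in the letter
LINES_PER_FEE = 80         # "699$ for 80 lines"
FEE = 699
KERNELS = 128              # quantity on the invoice

per_kernel = LINES_COPIED // LINES_PER_FEE * FEE
total = per_kernel * KERNELS
print(f"{per_kernel:,}$ per kernel, {total:,}$ total")
# 8,737,500$ per kernel, 1,118,400,000$ total
print(f"after discount: {total - 118_400_000:,}$")  # after discount: 1,000,000,000$
```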

  12. How many university have larger clusters? by SilverSun · · Score: 5, Interesting

    I wonder which universities/institutes have larger and maybe cheaper clusters, but just don't bother running benchmarks. I, for one, am sitting next to a tiny cluster with 40 dual-CPU nodes, which is connected (Grid-like) to a cluster of 340 dual-CPU nodes in a nearby town. None of us high-energy physicists bothers running any benchmarks on our clusters other than our own applications. I wonder how many "Linux-cluster supercomputers" are out there that would easily make it into the Top500, but that no one has ever heard of....

    Cheers.

    --

    KdenLive/PIAVE - non-linear video editing

  13. University students by SuperBanana · · Score: 5, Insightful
    Because this was a university project, KASY0 was assembled entirely by university students; besides being a source of cheap labor, that is also a good way to get a lot of students involved in a great project.

    At the risk of being flamebait: no. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do; it's a common myth that students 'learn' by doing grunt work. I should know: I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them).

    Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.

    (from the site): Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better.

    Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.

    1. Re:University students by panda · · Score: 4, Informative

      Having worked there, and knowing what Hank Dietz and his students are doing, I can tell you that it is different from just slapping PCs together, stringing wire between them and installing clustering software.

      Dietz specializes in networking, and all the wiring that you see in the photos is charted out by custom software that he's written just for this purpose.

      He works in the realm of optimizing communications among the nodes to avoid network latency and so on. If you read the POV-Ray benchmarks, you'll notice that the author comments that several clusters' CPUs spend most of their time idle due to network latency. Dietz is researching the best ways to eliminate much of that latency so that the CPUs in the cluster can spend more of their time crunching data rather than just throwing off heat. To my knowledge, he is succeeding at this, and doing better than most other researchers in the field.

      As for what his students learned from this, I don't know exactly which students helped him on this. For KLAT2, there were several undergrad volunteers who helped with wiring and assembly, mostly from the campus Linux Users' Group. I know his grad students and research assistants are learning a lot about how clustering and network tech works, and a couple are doing their Ph.D. dissertations in this very subfield of E.E.

      --
      Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.
  14. why not DSP? by mike_g · · Score: 4, Interesting
    Why aren't DSPs used in configurations such as this? The TI 67xx series can perform about 1 GFLOP/s running at only 150 MHz, and costs only about $40 per chip.

    This price/performance ratio seems to make them very attractive compared to general purpose CPUs. According to the NASA G5 Study, the P4 2.66 GHz is only able to achieve 255 MFLOP/s. And the P4 costs about 4x the price of the 6711 DSP.

    It seems that DSPs should be the clear winner in supercomputer applications. What are their disadvantages, and why are they not used? Granted, there is a lack of mass-produced hardware such as motherboards for DSPs, but that alone should not exclude them from the supercomputer realm.
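    Putting the comment's own figures side by side: the P4 price below is inferred from "about 4x" a $40 DSP, so $160 is an assumption. Note the mismatch an earlier comment warns about: the DSP figure is a vendor peak, while the 255 MFLOP/s for the P4 appears to be a measured benchmark result, so this overstates the DSP's advantage.

```python
def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Price/performance: dollars per GFLOPS (chip price only)."""
    return cost_usd / gflops

DSP_PRICE = 40.0               # "about $40 per chip"
DSP_GFLOPS = 1.0               # "about 1 GFLOP/s" (peak)
P4_PRICE = 4 * DSP_PRICE       # assumed from "about 4x the price"
P4_GFLOPS = 0.255              # 255 MFLOP/s from the NASA study

dsp = dollars_per_gflops(DSP_PRICE, DSP_GFLOPS)
p4 = dollars_per_gflops(P4_PRICE, P4_GFLOPS)
print(f"DSP: ${dsp:.2f}/GFLOPS, P4: ${p4:.2f}/GFLOPS, ratio {p4 / dsp:.1f}x")
# DSP: $40.00/GFLOPS, P4: $627.45/GFLOPS, ratio 15.7x
```

    This counts chip prices only; boards, memory, and interconnect for each DSP are left out, and those are exactly the parts that mass-market PCs make cheap.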

  15. Mckenzie Cluster, faster, cheaper per TFlop by prof_bart · · Score: 5, Interesting
    Hmmm...

    Nice machine, but this January, CITA and the astro department at the University of Toronto brought a 256-node dual-Xeon system online: "1.2 trillion floating point mathematical operations per second (Tflops) on the standard LINPACK linear algebra benchmark." Total cost: CDN$900K (including tax) (at January prices, that's $600K U.S., or about $500 USD/GFlop, i.e. $0.50/MFlop). It's being used for some very cool astro simulations...

    See http://www.cita.utoronto.ca/webpages/mckenzie
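    Checking the per-GFLOPS figure with the numbers given (US$600K for 1.2 TFLOPS on LINPACK), next to KASY0's headline ratio. McKenzie's result is the standard 64-bit LINPACK, while KASY0's 471 GFLOPS is 32-bit HPL, so the two are not strictly comparable:

```python
def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Price/performance: dollars per GFLOPS."""
    return cost_usd / gflops

mckenzie = dollars_per_gflops(600_000, 1_200)  # US$600K, 1.2 TFLOPS = 1,200 GFLOPS
kasy0 = dollars_per_gflops(39_500, 471)        # from the story summary
print(f"McKenzie: ${mckenzie:.2f}/GFLOPS")      # McKenzie: $500.00/GFLOPS
print(f"KASY0:    ${kasy0:.2f}/GFLOPS")         # KASY0:    $83.86/GFLOPS
```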

  16. There is Flop and Flop by Tiosman · · Score: 4, Insightful

    It's not the first time these folks in KY have worked around the definition of the acronym "Flop". A Flop is a floating-point operation on 64 bits, not 32 bits. All entries in the Top500 used results from 64-bit HPL; nobody else in the world is running HPL in 32 bits. So claiming the moon with 32 bits is easy, useless for the sake of comparison, and almost unethical. I cannot believe that Dr. Dietz does not know the difference by now.

    The same machine would yield average results in 64 bits. Difficult to draw attention without headline numbers...
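    Part of why the precision matters: a 32-bit float carries roughly 7 significant decimal digits versus roughly 16 for a 64-bit double, so rounding error builds up much faster. A standard-library-only sketch that rounds every partial sum to float32 via `struct` to emulate single precision; it illustrates single vs. double rounding in general, not how HPL works internally.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Machine epsilon: gap between 1.0 and the next representable value.
print(2.0 ** -23, 2.0 ** -52)  # float32 vs float64

# 1 + 2**-24 is representable in float64 but rounds back to 1.0 in float32.
print(to_f32(1.0 + 2.0 ** -24) == 1.0, (1.0 + 2.0 ** -24) == 1.0)  # True False

def accumulate(value: float, n: int, single: bool) -> float:
    """Add `value` n times, optionally storing each partial sum as float32."""
    total = 0.0
    for _ in range(n):
        total = total + value
        if single:
            total = to_f32(total)  # emulate a float32 accumulator
    return total

n = 100_000
exact = n / 10  # mathematically, 100000 * 0.1 = 10000
err32 = abs(accumulate(to_f32(0.1), n, single=True) - exact)
err64 = abs(accumulate(0.1, n, single=False) - exact)
print(f"float32 error: {err32:.3g}, float64 error: {err64:.3g}")
```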