Slashdot Mirror


Supercomputer Breaks the $100/GFLOPS Barrier

Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It is also the new record holder for POV-Ray 3.5 render speed. The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students, which, while being a source of cheap labor, is also a good way to get a lot of students involved in a great project.
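
For readers who want to check the headline figure, here is a minimal sketch of the arithmetic, using only the two numbers quoted in the summary (the function and constant names are made up for illustration):

```python
# Figures quoted in the summary: cost is stated as an upper bound,
# performance is the 32-bit HPL result.
COST_USD = 39_500
HPL_GFLOPS = 471

def usd_per_gflops(cost_usd: float, gflops: float) -> float:
    """Dollars spent per GFLOPS of measured performance."""
    return cost_usd / gflops

kasy0_ratio = usd_per_gflops(COST_USD, HPL_GFLOPS)

# Performance needed at this cost to land exactly on the $100/GFLOPS line.
breakeven_gflops = COST_USD / 100

print(f"KASY0: ${kasy0_ratio:.2f}/GFLOPS")
print(f"Break-even at this cost: {breakeven_gflops:.0f} GFLOPS")
```

Since the cost is given as "less than $39,500", the ratio computed here is an upper bound, and it applies to the 32-bit HPL number only.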

27 of 281 comments

  1. Wow! by fryguy451 · · Score: 5, Funny

    Imagine a Beowu... errr... Oh..

  2. Let the Beowulf cluster jokes begin! by Anonymous Coward · · Score: 5, Funny

    Note to moderators, Beowulf cluster jokes CANNOT be offtopic.

    Imagine a Beowulf cluster of Beowulf cluster jokes!

  3. Also I wonder by HanzoSan · · Score: 5, Interesting



    How much electricity will these supercomputers use up?

    All those wires, it looks like it takes up a lot of juice.

    --
    If you use Linux, please help development of Autopac
  4. Let's not get too excited.... by CGP314 · · Score: 5, Funny

    Supercomputer Breaks the $100/GFLOPS Barrier

    Not after you factor in the SCO license fees.

  5. It's a university project by Anonymous Coward · · Score: 3, Funny

    Remember, everyone, this was a university project. *BSD was also a university project originally, and now *BSD is dying. So obviously university projects are not of very high quality.

  6. Asymmetric Sparse Flat Neighborhood Network by FreeLinux · · Score: 5, Interesting

    Obviously, I don't get it. This doesn't look any different from redundant backbones or what is frequently done with VLANs. Multiple paths between hosts are what I see. How is this "new"?

    1. Re:Asymmetric Sparse Flat Neighborhood Network by flymolo · · Score: 5, Informative

      Due to "creative" (computed) wiring, if all switches are functioning, no node is more than one switch hop from any other node. This requires a routing table written for each PC. It could be used for redundancy, but it is being used to minimize latency and collisions, which are both killers in clusters.

      --
      "Sometimes it's hard to tell the dancer from the dance." --Corwin Of Amber in CoC
    2. Re:Asymmetric Sparse Flat Neighborhood Network by Rich+Dougherty · · Score: 3, Informative

      Here's a quote from the site:

      Does The World Need Yet Another Network Topology?

      One would think (well, we did ;-) that the latest round of Gb/s network hardware would have made the design of a high-bandwidth cluster network a trivial exercise. However, that isn't the case when the prices are considered:

      • When we invented FNNs in 2000, the cheapest of the Gb/s NICs available were PCI Ethernet cards priced under $300 each; now they are $50-$100. Prices have continued to drop. Prices on custom high-performance NICs (e.g., Myrinet) start at close to $1000 and have not been going down.
      • In late 2002, 48-port 100Mb/s Fast Ethernet switches have dropped to less than $25/port. Gigabit Ethernet switches are starting to follow the same trend, with $100/port pricing in sight for switches up to about 48 ports. Wider switches with the needed performance are unlikely to become cheap in the near future. Thus, it would be necessary to build a hierarchical switch fabric using multiple layers of switches, yielding higher cost, higher latency, and significantly lower bisection bandwidth (unless you use a "fat tree" or other scheme, which adds still more expense -- especially because cheap layer 2 Ethernet switches don't support those topologies).

      In summary, the cost of the "obvious" Gb/s network for KLAT2's 66 single-processor nodes was OVER 30 TIMES the cost of the network we built for KLAT2. In fact, to match KLAT2's bisection bandwidth, a network built using Gb/s hardware would have cost even more. Gigabit Ethernet is getting cheaper, but obvious topologies just are not competitive with FNN performance. So, if you've got tons of money that you have to spend immediately, you can impress your friends by buying expensive custom network hardware that can use an obvious topology and still be competitive with FNN performance. Otherwise, read on.... ;-)
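
The one-hop property discussed in this thread (every pair of nodes shares at least one switch, and each PC gets its own routing table) can be sketched as a small check. This is only a toy illustration under stated assumptions: the 6-node, 3-switch wiring below is hypothetical and is not KASY0's actual layout, and the function names are invented for this example.

```python
from itertools import combinations

# Hypothetical wiring: 6 nodes with 2 NICs each, three 4-port switches.
# Each switch name maps to the set of nodes plugged into it.
WIRING = {
    "A": {0, 1, 2, 3},
    "B": {2, 3, 4, 5},
    "C": {0, 1, 4, 5},
}
NODES = range(6)

def uncovered_pairs(wiring, nodes):
    """Return node pairs that share no switch (empty list = one hop everywhere)."""
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if not any(a in ports and b in ports for ports in wiring.values())
    ]

def routing_table(wiring, node, nodes):
    """Map each destination to the alphabetically first switch shared with `node`."""
    table = {}
    for dest in sorted(nodes):
        if dest == node:
            continue
        shared = sorted(
            name for name, ports in wiring.items()
            if node in ports and dest in ports
        )
        if shared:
            table[dest] = shared[0]
    return table

print(uncovered_pairs(WIRING, NODES))
print(routing_table(WIRING, 0, NODES))
```

In this toy layout no switch sees all six nodes, yet every pair still has a single-switch path, which is why each machine needs its own table saying which NIC (switch) to use per destination.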

  7. Students as Slave Labor by gremlin_591002 · · Score: 5, Funny

    Ponders why there are no university students pictured in the National Geographic article on slavery....

  8. Playstation2 at 5.5GFLOPS costs only $199 $40/GFL by gorim · · Score: 4, Insightful

    And it was introduced to consumers just a couple years
    ago. Sorry, the AMD beowulf cluster at $100/GFLOP just
    isn't that impressive.

  9. So much power... by krahd · · Score: 5, Funny

    and it still can't run Doom III at a decent rate.

    --krahd

    mod me up, scottie!

    --
    mod me up scottie!
  10. Re:The burning question by HanzoSan · · Score: 3, Insightful



    People don't share mp3s anymore; if they do, the FBI, NSA, Secret Service, CIA, and Homeland Security Dept. will swarm them and put them in the bay.

    I mean, I wish we could crack down like this on organized crime, or on domestic terrorists. I'm surprised we are so aggressive at arresting teenagers who download music, but the KKK and Neo-Nazis can collect a million guns and spread their crazy hate speech and it's protected by freedom of speech.

    I'd think that hate speech does more harm than copyright infringement.

    --
    If you use Linux, please help development of Autopac
  11. cable management by HBI · · Score: 3, Interesting

    What a mess of cables! I understand they were hitting a price point, but would it have killed them to spring $500 or so for a cable management system?

    There's something professional looking about having the cables look neat. On the other hand, maybe I'm just anal about things.

    --
    HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
  12. Cooling by bengoerz · · Score: 4, Informative

    I toured the previous cluster these guys did (KLAT2) and was very impressed. However, using AMD Athlon Thunderbirds last time, it did get quite hot. I remember standing by the cluster looking at all the wiring and being bombarded by an overhead cooling vent. I'm also assuming that these cooling issues are the reason that each case has two blow-holes. I'd also like to see these guys post in-depth specs of each machine. Being a hardware nut, I'd like to see how they got so many machines so cheap, and maybe even what vendor they used. As I remember, they worked REALLY hard on their last cluster to keep costs to an absolute minimum.

  13. Is that a real number or a marketing number? by Sycraft-fu · · Score: 5, Insightful

    I'm guessing the latter. You see all sorts of BSified numbers from marketing departments on processors, but they have little to do with reality. The number for this AMD cluster is a real, actual, measured-using-a-real-world-app number. To give you some idea of BS console numbers, the Xbox has a PIII 733 processor in it (ok, technically it's a little different, but it's a P3 core). Now the Gflop claim is 2.93. Out of a P3 733? Ya right, on paper perhaps but never in the real world, much less on a real app.

    Then, of course, there is the issue of specialised chips vs normal chips. A GeForce 4 4400 can claim, roughly, 80 Gflops peak. That sure beats the hell out of any single CPU I've ever heard of, including the Power4. Thing is, the GeForce 4 is a graphics DSP, it isn't a general purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter, I'm not even sure that it is Turing complete).

    So don't take any hype on a console to equate to real performance in a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2Gflops.

  14. Here is the bill! by borgdows · · Score: 5, Funny

    Dear customer,

    At the cheap introductory price of 699$ for 80 lines of code in the Linux kernel, it will cost you 8,737,500$ per kernel since we have discovered that in fact 1000000 lines of SCO IP were copied into Linux.

    Designation .. Price .. Qty .. Total
    Linux kernel .. 8,737,500$ .. 128 .. 1,118,400,000$

    So you must pay us only 1,118,400,000$, and in my kind almighty I will offer you a discount of 118,400,000$ so you only have to pay ONE BILLION DOLLAR if you pay before tomorrow!

    Please send your credit card number to darl@sco.com

    Sincerely yours,

    -- Darl Mac Bride

  15. Wrong by imsabbel · · Score: 3, Informative

    In reality, Beowulf clusters are good for only a subset of supercomputing tasks and the "real" supercomputers are still best at general purpose supercomputing.

    If you can parallelize your application well enough, Beowulf rules, but if you need a lot of node-to-node communication, the network cost quickly surpasses the CPU cost of the system.

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    1. Re:Wrong by sjames · · Score: 3, Insightful

      Really, it's a spectrum. On one end you have fully commodity Beowulf, in the middle you see things like Dolphin and Myrinet, and on the high end you see fully custom backplanes and sometimes RAM and I/O controllers as well. Purpose-built CPUs are becoming less common now, but not unheard of.

      Each step up the spectrum widens the domain of problems that the machine can work on efficiently, and raises the price for the machine. In many cases, a 'real' supercomputer is more or less a cluster with a specialized network and OS and mounted in a single cabinet so it doesn't look like a cluster.

      In general when a lower end machine can efficiently run your program, there is no benefit to using a more expensive machine.

      As server hardware improves and 'exotic' hardware becomes more mainstream, the gap between the low and the high end narrows. There will probably always be a small but existent set of problems that call for the 'real' supercomputer, but that set is shrinking.

      There are other considerations as well. If the Beowulf in your lab can solve the problem in 1 week and is available now, while the 'real' supercomputer on the other campus can solve it in 4 hours and will have a timeslot available in 2 weeks, the Beowulf is 'faster' from your point of view.
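
A quick sketch of that last point, using the numbers from the example above (1 week on the Beowulf with no wait, versus 4 hours on the 'real' supercomputer after a 2-week wait). The function name is hypothetical and the figures are the illustrative ones from this comment, not measurements:

```python
HOURS_PER_DAY = 24

def turnaround_hours(wait_hours: float, run_hours: float) -> float:
    """Time from 'I want an answer' to 'I have an answer'."""
    return wait_hours + run_hours

# Beowulf in your lab: available now, takes one week to run.
beowulf = turnaround_hours(wait_hours=0, run_hours=7 * HOURS_PER_DAY)

# 'Real' supercomputer: 4-hour run, but the next timeslot is 2 weeks away.
supercomputer = turnaround_hours(wait_hours=14 * HOURS_PER_DAY, run_hours=4)

print(f"Beowulf:       {beowulf} h")
print(f"Supercomputer: {supercomputer} h")
print("Beowulf finishes first" if beowulf < supercomputer else "Supercomputer finishes first")
```

The comparison flips as soon as the queue wait drops below roughly a week, so it depends entirely on local scheduling, not on the hardware.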

  16. How many university have larger clusters? by SilverSun · · Score: 5, Interesting

    I wonder which universities/institutes have larger and maybe cheaper clusters, but just don't bother with running benchmarks. I, for one, am sitting next to a tiny cluster with 40 dual-CPU nodes, which is connected (GRID-like) to a 340-node dual-CPU cluster in a nearby town. None of us high-energy physicists bothers with running any benchmarks on our clusters, other than our own applications. I wonder how many "linux-cluster-supercomputers" are out there which would easily make it into the top 500, but no one has ever heard of....

    Cheers.

    --

    KdenLive/PIAVE - non-linear video editing

    1. Re:How many university have larger clusters? by Chilles · · Score: 3, Funny

      I wonder how many "linux-cluster-supercomputers" are out there which would easily make it into the top 500, but no one has ever heard of....

      Well... probably more than one, definitely no more than 500.

  17. University students by SuperBanana · · Score: 5, Insightful
    Because this was a university project, KASY0 was assembled entirely by university students, which, while being a source of cheap labor, is also a good way to get a lot of students involved in a great project.

    At the risk of being flamebait: No. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do. It's a common myth that students 'learn' by doing grunt work. I should know: I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them).

    Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.

    (from the site): Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better.

    Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.

    1. Re:University students by panda · · Score: 4, Informative

      Having worked there, and knowing what Hank Dietz and his students are doing, I can tell you that it is different from just slapping PCs together, stringing wire between them and installing clustering software.

      Dietz specializes in networking and all the wiring that you see in the photos is charted out by custom software that he's written just for this purpose.

      He works in the realm of optimizing communications among the nodes to avoid network latency and so on. If you read the POV-Ray benchmarks, you'll notice that the author comments that several clusters' CPUs spend most of their time idle due to network latency. Dietz is researching the best ways to eliminate much of that latency so that the CPUs in the cluster can spend more of their time crunching data rather than just throwing off heat. To my knowledge, he is succeeding at this, and better than most other researchers in the field.

      As for what his students learned from this, I don't know exactly which students helped him on this. For KLAT2, there were several undergrad volunteers who helped with wiring and assembly, mostly from the campus Linux Users' Group. I know his grad students and research assistants are learning a lot about how clustering and network tech works, and a couple are doing their Ph.D. dissertations in this very subfield of E.E.

      --
      Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.
  18. why not DSP? by mike_g · · Score: 4, Interesting
    Why are DSPs not used in configurations such as this? The TI 67xx series are able to perform about 1 GFLOP/s running at only 150 MHz and cost only about $40 per chip.

    This price/performance ratio seems to make them very attractive compared to general purpose CPUs. According to the NASA G5 Study, the P4 2.66 GHz is only able to achieve 255 MFLOP/s. And the P4 costs about 4x the price of the 6711 DSP.

    It seems that DSPs should be the clear winner in supercomputer applications, what are their disadvantages and why are they not used? Granted there is a lack of mass produced hardware such as motherboards for DSPs, but that alone should not exclude them from the supercomputer realm.
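
To put the comparison above in the same $/GFLOPS terms as the story, here is a rough sketch using only the figures quoted in this comment. It is chip price only: boards, memory, and the network are ignored, and the DSP figure is the quoted "about 1 GFLOP/s" rather than a measured benchmark. Names are invented for illustration.

```python
def usd_per_gflops(chip_price_usd: float, gflops: float) -> float:
    """Chip price divided by quoted floating-point rate."""
    return chip_price_usd / gflops

# Figures as quoted in the comment above.
DSP_PRICE_USD = 40            # TI 6711, "about $40 per chip"
DSP_GFLOPS = 1.0              # "about 1 GFLOP/s"
P4_PRICE_USD = 4 * DSP_PRICE_USD   # "about 4x the price of the 6711"
P4_GFLOPS = 0.255             # 255 MFLOP/s from the cited study

dsp_ratio = usd_per_gflops(DSP_PRICE_USD, DSP_GFLOPS)
p4_ratio = usd_per_gflops(P4_PRICE_USD, P4_GFLOPS)

print(f"DSP: ${dsp_ratio:.2f}/GFLOPS (chip only)")
print(f"P4:  ${p4_ratio:.2f}/GFLOPS (chip only)")
print(f"P4 costs {p4_ratio / dsp_ratio:.1f}x more per GFLOPS on these numbers")
```

On these numbers the chip-level gap is large, which is why the question comes down to everything outside the chip: supporting hardware and interprocessor communication.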

    1. Re:why not DSP? by SmackCrackandPot · · Score: 3, Informative

      Actually, they are, but they are referred to as vector processors rather than DSPs. Probably the most famous and the first was the Cray supercomputer. And there was also the INMOS "Transputer".

      DSPs are optimised to handle streamed data of a particular maximum size (e.g. 4-element floating-point variables). Useful for image processing (red, green, blue, alpha) and 3D graphics (XYZW), but if you're modelling something like ocean currents or global weather, every data element is more than likely going to have more than four variables (e.g. temperature, humidity, velocity, pressure, salinity, ground temperature), so you may not get full optimisation.

      Plus, you also need a means of getting all these processors to talk to each other. DSPs are nearly always optimised to operate in single pipelines, so don't need much communication support (e.g. Sony Playstation 2). However, if you're designing a supercomputer system, the major bottleneck is the communication between processors (network topology). Some applications might only need adjacent processors to talk to each other (global weather simulation usually represents the atmosphere as a single large block of air, with sub-blocks assigned to separate processors). Other applications might assign individual processors to different tasks, which complete at different rates (e.g. the Mandelbrot set). A configurable network architecture allows the system to be used for many more different applications.

  19. Mckenzie Cluster, faster, cheaper per TFlop by prof_bart · · Score: 5, Interesting
    Hmmm...

    Nice machine, but this January, CITA and the astro department at the University of Toronto brought a 256-node dual Xeon system online: "1.2 trillion floating point mathematical operations per second (Tflops) on the standard LINPACK linear algebra benchmark." Total cost: CDN$900K (including tax). In January prices, that's $600K U.S., or $500 USD/GFLOPS ($0.50 USD/MFLOPS). It's being used for some very cool astro simulations...

    See http://www.cita.utoronto.ca/webpages/mckenzie

  20. There is Flop and Flop by Tiosman · · Score: 4, Insightful

    It's not the first time that these folks in KY have worked around the definition of the acronym "Flop". A Flop is a floating point operation on 64 bits, not 32 bits. All entries in the Top500 use results from 64-bit HPL; nobody else in the world is running HPL on 32 bits. So claiming the moon on 32 bits is easy, useless for the sake of comparison, and almost unethical. I cannot believe that Dr. Dietz does not know the difference by now.

    The same machine would yield average results on 64 bits. Difficult to draw attention without headline numbers...

  21. What about $170K by Axe · · Score: 3, Funny
    That they owe to SCO, those damn commies? Did they at least acknowledge using stolen property?

    What a shame. Freeloaders. They would never be able to achieve such performance if not for the fruits of labour of SCO .. eeeh.. lawyers?

    --
    <^>_<(ô ô)>_<^>