Supercomputer Breaks the $100/GFLOPS Barrier
Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It is also the new record holder for POV-Ray 3.5 render speed.
The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students, which, besides being a source of cheap labor, is also a good way to get a lot of students involved in a great project.
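The headline figure is easy to check with quick arithmetic. Here is a minimal Python sketch using only the two numbers quoted in the summary; the function name is made up for illustration, and the cost is taken at its stated upper bound of $39,500, so the real ratio would be slightly lower.

```python
def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Return the cost in US dollars per measured GFLOPS."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return cost_usd / gflops


# Figures from the summary: under $39,500 and 471 GFLOPS on 32-bit HPL.
kasy0 = dollars_per_gflops(39_500, 471)
print(f"KASY0: about ${kasy0:.2f}/GFLOPS")  # prints "KASY0: about $83.86/GFLOPS"
```

Note that this is the 32-bit HPL number; the ratio says nothing about what the same machine would score on a 64-bit run.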
And it was introduced to consumers just a couple years ago. Sorry, the AMD Beowulf cluster at $100/GFLOPS just isn't that impressive.
People don't share MP3s anymore; if they do, the FBI, NSA, Secret Service, CIA, and Department of Homeland Security will swarm them and put them in the bay.
I mean, I wish we could crack down like this on organized crime or on domestic terrorists. I'm surprised we are so aggressive at arresting teenagers who download music, while the KKK and neo-Nazis can collect a million guns and spread their crazy hate speech, and it's protected by freedom of speech.
I'd think that hate speech does more harm than copyright infringement.
I'm guessing the latter. You see all sorts of BSified numbers from marketing departments on processors, but they have little to do with reality. The number for this AMD cluster is a real, actual, measured-using-a-real-world-app number. To give you some idea of BS console numbers, the Xbox has a PIII 733 processor in it (ok, technically it's a little different, but it's a P3 core). The GFLOPS claim for it is 2.93. Out of a P3 733? Yeah, right. On paper perhaps, but never in the real world, much less on a real app.
Then, of course, there is the issue of specialised chips vs. normal chips. A GeForce 4 4400 can claim, roughly, 80 GFLOPS peak. That sure beats the hell out of any single CPU I've ever heard of, including the Power4. Thing is, the GeForce 4 is a graphics DSP; it isn't a general-purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter, I'm not even sure that it is Turing complete).
So don't take any hype on a console to equate to real performance in a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2 GFLOPS.
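For what it's worth, paper numbers like these usually come from a simple formula: clock rate times the floating-point operations the chip could retire per cycle if every unit were busy all the time. The sketch below assumes the Xbox figure counts 4 single-precision operations per cycle (one 4-wide SSE instruction per clock). That factor is my assumption, not something from a spec sheet, and the function name is hypothetical.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: assumes every FP unit is busy on every cycle."""
    return clock_ghz * flops_per_cycle


# Assumption: 733 MHz clock and 4 single-precision flops per cycle.
xbox_peak = peak_gflops(0.733, 4)
print(f"Xbox paper peak: {xbox_peak:.2f} GFLOPS")  # prints "Xbox paper peak: 2.93 GFLOPS"
```

That reproduces the 2.93 claim, which is the point: it is an upper bound you get from multiplication, not something anyone measured running an application.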
Looks like most of the wiring jobs I've seen done by students: kasy0core.jpg.
;-)
God forbid they use cable gutters
Other than that, kick ass job guys!
-nate
At the risk of being flamebait: no. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do. It's a common myth that students 'learn' by doing grunt work. I should know: I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them).
Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.
(from the site): "Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better."
Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.
Now that the university students have graduated and moved on, there isn't any documentation, nor do they know how to use the darn thing...
It's not the first time that these folks in KY have worked around the definition of the acronym "Flop". A Flop is a floating-point operation on 64 bits, not 32 bits. All entries in the Top500 use results from 64-bit HPL; nobody else in the world is running HPL on 32 bits. So claiming the moon on 32 bits is easy, useless for the sake of comparison, and almost unethical. I cannot believe that Dr. Dietz does not know the difference by now.
The same machine would yield average results on 64 bits. Difficult to draw attention without headline numbers...
Granted there might be some heat problems, but judging by their setup, I'm guessing the room is well-cooled.
Really, it's a spectrum. On one end you have fully commodity Beowulf; in the middle, you see things like Dolphin and Myrinet; and on the high end you see fully custom backplanes and sometimes RAM and I/O controllers as well. Purpose-built CPUs are becoming less common now, but not unheard of.
Each step up the spectrum widens the domain of problems that the machine can work on efficiently, and raises the price for the machine. In many cases, a 'real' supercomputer is more or less a cluster with a specialized network and OS and mounted in a single cabinet so it doesn't look like a cluster.
In general, when a lower-end machine can efficiently run your program, there is no benefit to using a more expensive machine.
As server hardware improves and 'exotic' hardware becomes more mainstream, the gap between the low and the high end narrows. There will probably always be a small set of problems that call for the 'real' supercomputer, but that set is shrinking.
There are other considerations as well. If the Beowulf in your lab can solve the problem in 1 week and is available now, while the 'real' supercomputer on the other campus can solve it in 4 hours and will have a timeslot available in 2 weeks, the Beowulf is 'faster' from your point of view.
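To put numbers on that example: what matters to you is queue wait plus run time, not run time alone. A minimal Python sketch with the figures from the scenario above (the function and constant names are made up for illustration, and the Beowulf is assumed to have no queue since it is "available now"):

```python
HOURS_PER_WEEK = 7 * 24


def turnaround_hours(queue_wait_hours: float, run_hours: float) -> float:
    """Time from submitting the job to having the answer in hand."""
    return queue_wait_hours + run_hours


# Local Beowulf: available now, takes 1 week to run.
beowulf = turnaround_hours(0, 1 * HOURS_PER_WEEK)
# 'Real' supercomputer: 2-week wait for a timeslot, then a 4-hour run.
supercomputer = turnaround_hours(2 * HOURS_PER_WEEK, 4)

print(f"Beowulf: {beowulf} h, supercomputer: {supercomputer} h")
```

So the Beowulf delivers in 168 hours against 340 for the big machine, even though its raw run time is 42 times longer.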
Wait until computers start shipping with a few FPGA units. Then you can flash a new image onto the FPGAs for each specialized application you use the cluster for.