Supercomputer Breaks the $100/GFLOPS Barrier
Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It also is the new record holder for POV-Ray 3.5 render speed.
The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students, which, besides being a source of cheap labor, is also a good way to get a lot of students involved in a great project.
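The headline figure follows directly from the two numbers in the summary. A minimal sketch of the arithmetic, assuming the quoted "less than $39,500" is taken as an upper bound on cost (the function name `cost_per_gflops` is made up for illustration):

```python
def cost_per_gflops(total_cost_usd: float, measured_gflops: float) -> float:
    """Dollars spent per GFLOPS of measured benchmark throughput."""
    if measured_gflops <= 0:
        raise ValueError("measured_gflops must be positive")
    return total_cost_usd / measured_gflops


# Figures from the story: under $39,500 and 471 GFLOPS on 32-bit HPL.
# Because $39,500 is an upper bound on cost, the result is an upper bound too.
kasy0 = cost_per_gflops(39_500, 471)
print(f"KASY0: at most ${kasy0:.2f}/GFLOPS")
```

The same function works for any other cluster, as long as cost and benchmark result are converted to the same units (US dollars and GFLOPS) first.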
Imagine a Beowu... errr... Oh..
Note to moderators, Beowulf cluster jokes CANNOT be offtopic.
Imagine a Beowulf cluster of Beowulf cluster jokes!
How much electricity will these super computers use up?
All those wires, it looks like it takes up a lot of juice.
If you use Linux, please help development of Autopac
gigaflop
As a measure of computer speed, a gigaflop is a billion floating-point operations per second (FLOPS).
you may find the Higgs in this signature.
Supercomputer Breaks the $100/GFLOPS Barrier
Not after you factor in the SCO license fees.
Remember, everyone, this was a university project. *BSD was also a university project originally, and now *BSD is dying. So obviously university projects are not of very high quality.
Obviously, I don't get it. This doesn't look any different than redundant backbones or what is frequently done with VLANs. Multiple paths between hosts is what I see. How is this "new"?
Ponders why there are no pictures of university students in the National Geographic article on slavery....
but supercomputers as in giant iron are becoming more specialized, and as such would whoop the pants off a Beowulf cluster when competing in their specialty.
of course, if you just need a lot of general purpose super computing, it is obvious that you cannot compete with this.
I am the Alpha and the Omega-3
And it was introduced to consumers just a couple years
ago. Sorry, the AMD beowulf cluster at $100/GFLOP just
isn't that impressive.
though is how many mp3's are these students sharing on this monster ?
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
each node has two side case fans! that's gotta be the most dedicated case modding job i've ever seen! 132 pc's with 2 fans! too bad they didn't put fan guards ... or interior lights.. or blue led's... but i guess all that junk about a supercomputer makes up for it...
and it still can't run Doom III at a decent rate.
--krahd
mod me up, scottie!
mod me up scottie!
What a mess of cables! I understand they were hitting a price point, but would it have killed them to spring $500 or so for a cable management system?
There's something professional looking about having the cables look neat. On the other hand, maybe I'm just anal about things.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
I toured the previous cluster these guys did (KLAT2) and was very impressed. However, using AMD Athlon Thunderbirds last time, it did get quite hot. I remember standing by the cluster looking at all the wiring and being bombarded by an overhead cooling vent. I'm also assuming that these cooling issues are the reason that each case has two blow-holes. I'd also like to see these guys post in-depth specs of each machine. Being a hardware nut, I'd like to see how they got so many machines so cheap, and maybe even what vendor they used. As I remember, they worked REALLY hard on their last cluster to keep costs to an absolute minimum.
I'm guessing the latter. You see all sorts of BSified numbers from marketing departments on processors, but they have little to do with reality. The number for this AMD cluster is a real, actual, measured-using-a-real-world-app number. To give you some idea of BS console numbers, the Xbox has a PIII 733 processor in it (ok, technically it's a little different, but it's a P3 core). Now the Gflop claim is 2.93. Out of a P3 733? Ya right, on paper perhaps but never in the real world, much less on a real app.
Then, of course, there is the issue of specialised chips vs normal chips. A GeForce 4 4400 can claim, roughly, 80 Gflops peak. That sure beats the hell out of any single CPU I've ever heard of, including the Power4. Thing is the GeForce 4 is a graphics DSP, it isn't a general purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter I'm not even sure that it is Turing complete).
So don't take any hype on a console to equate to real performance in a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2Gflops.
Dear customer,
At the cheap introductory price of 699$ for 80 lines of code in the Linux kernel, it will cost you 8,737,500$ per kernel, since we have discovered that in fact 1000000 lines of SCO IP were copied into Linux.
Designation .. Price .. Qty .. Total
Linux kernel .. 8,737,500$ .. 128 .. 1,118,400,000$
So you must pay us only 1,118,400,000$, and in my almighty kindness I will offer you a discount of 118,400,000$, so you only have to pay ONE BILLION DOLLARS if you pay before tomorrow!
Please send your credit card number to darl@sco.com
Sincerely yours,
-- Darl Mac Bride
A playstation2 costs $199. That information is in your local newspaper. Actually, sales peg it at $179 lately, my mistake. The playstation2, with 2 vector processing units, each with 4-float-wide registers (128-bit), capable of doing a multiply-add operation per clock cycle on whole registers, at 300 MHz independent of the main CPU, which still has its own scalar floating point coproc, handily does 5.5GFLOPS, and is well documented as such if you google around. Check out http://playstation2-linux.com/
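For what it's worth, the on-paper figure can be rebuilt from the numbers in that comment. A rough sketch, assuming the common convention that one multiply-add counts as 2 floating-point operations (the comment doesn't say how it counts them); `paper_peak_gflops` is a made-up helper name:

```python
def paper_peak_gflops(units: int, lanes: int, flops_per_lane_per_cycle: int,
                      clock_hz: float) -> float:
    """Theoretical peak: every lane of every unit busy on every clock cycle."""
    return units * lanes * flops_per_lane_per_cycle * clock_hz / 1e9


# From the comment: 2 vector units, 4 floats per register, 300 MHz.
# Assumption: one multiply-add per lane per cycle is counted as 2 FLOPs.
vu_peak = paper_peak_gflops(units=2, lanes=4, flops_per_lane_per_cycle=2,
                            clock_hz=300e6)
print(f"Vector units alone, on paper: {vu_peak:.1f} GFLOPS")

# Paper price/performance using the comment's own $179 and 5.5 GFLOPS.
print(f"Claimed: ${179 / 5.5:.2f}/GFLOPS (peak, not a measured benchmark)")
```

Under that assumption the two vector units account for 4.8 of the claimed 5.5 GFLOPS. The rest would have to come from the other floating-point hardware. Either way this is a theoretical peak, which is not comparable to a measured HPL result.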
Looks like most of the wiring jobs I've seen done by students: kasy0core.jpg.
;-)
God forbid they use cable gutters
Other than that, kick ass job guys!
-nate
These numbers for microprocessors etc mean nothing because they are usually referring to operations on data in cache.. you'll find that real life performance is 10-20x slower because that's how much slower accessing main memory is.
In reality, Beowulf clusters are good for only a subset of supercomputing tasks and the "real" supercomputers are still best at general purpose supercomputing.
If you can parallelize your application well enough, Beowulf rules, but if you need a lot of node2node communication, the network cost quickly surpasses the cpu cost of the system
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
I wonder which universities/institutes have larger and maybe cheaper clusters, but just don't bother with running benchmarks. I for one am sitting next to a tiny cluster with 40 dual-cpu nodes, which is connected (GRID like) to a 340 dual-node cluster in a nearby town. None of us high energy physicists bothers with running any benchmarks on our clusters, other than our own applications. I wonder how many "linux-cluster-supercomputers" are out there which would easily make it into the top 500, but no one has ever heard of....
Cheers.
KdenLive/PIAVE - non-linear video editing
At the risk of being flamebait - No. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do - it's a common myth that students 'learn' by doing grunt work. I should know - I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them).
Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.
(from the site):Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better.
Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.
Please help metamoderate.
I mean these things are Athlons! Heck, they're saving money just from the fact that they'll never have to turn on the furnace again!
Did you guys notice from the pics that there don't seem to be any fans in the holes on the sides? Are they crazy? These are Athlons. I hope they put enough fans in those things.
My journal has hot
This price/performance ratio seems to make them very attractive compared to general purpose CPUs. According to the NASA G5 Study, the P4 2.66 GHz is only able to achieve 255 MFLOP/s. And the P4 costs about 4x the price of the 6711 DSP.
It seems that DSPs should be the clear winner in supercomputer applications. What are their disadvantages, and why are they not used? Granted there is a lack of mass produced hardware such as motherboards for DSPs, but that alone should not exclude them from the supercomputer realm.
128+4...
That's like 132 isn't it?
From the FAQ:
KASY0's configuration is:
128 + 4 "cold spare" PC nodes, each containing:
One AMD Athlon XP 2600+ (the 2.075GHz version)
One 512MB PC2700 DDR SDRAM
BioStar M7VIT Pro motherboard
Two Linksys LNE100TX NICs
Codegen 6042L case with 400W power supply
18 BenQ SE0024 24-port Fast Ethernet switches
405 Cat5 Fast Ethernet cables
RedHat Linux 9.0, modified Warewulf 1.11
So it's 128, the other 4 are spares!
I'm a chainsmokin' alcoholic sociopath, so-ci-o-path
Nice machine, but this January, CITA and the astro department at the University of Toronto brought a 256 node dual Xeon system on line: "1.2 trillion floating point mathematical operations per second (Tflops) on the standard LINPACK linear algebra benchmark." Total cost: CDN$900K (including tax) (in January prices, that's $600K U.S., or about $500 USD/GFLOPS, i.e. $0.50 USD/MFLOPS.) It's being used for some very cool Astro simulations...
See http://www.cita.utoronto.ca/webpages/mckenzie
Gah feel free to mod the previous version of this comment into oblivion, I hit submit accidentally.
The numbers you're looking at are marketing numbers first off, and overly generous. Second you don't scale for free - you never get anything like 100 times the performance of a single box when you wire 100 together, for the same reason that you don't get twice the horsepower out of an engine twice the size.
The previous price/performance champ was in fact a PS2 cluster, mentioned here, but this AMD cluster is roughly three times the performance for the dollar. You can check the stats with different assumptions on their FAQ page, particularly the section labeled 'Is KASY0 really the first supercomputer under $100/GFLOPS?'
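The "you don't scale for free" point can be put in rough numbers with Amdahl's law. The comment doesn't name it, and it only models the serial part of a job, not network overhead, so treat this as an illustration. The 95% parallel fraction below is a made-up example value, not a measurement of KASY0 or any PS2 cluster:

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` machines when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


# Hypothetical workload: 95% of the run time parallelizes perfectly.
for n in (1, 10, 100):
    print(f"{n:>3} nodes -> {amdahl_speedup(0.95, n):.1f}x")
```

Even with 95% of the work parallel, 100 boxes give under 17x instead of 100x, and real clusters lose more on top of that to communication.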
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
It's not the first time that these folks in KY work around the definition of the acronym "Flop". A Flop is a floating point operation on 64 bits, not 32 bits. All entries in the Top500 used results with 64 bits HPL, nobody else in the world is running HPL on 32 bits. So claiming the moon on 32 bits is easy, useless for the sake of comparison and almost unethical. I cannot believe that Dr Dietz does not know the difference by now.
The same machine would yield average results on 64 bits. Difficult to draw attention without headline numbers...
Good luck getting a beowulf cluster with that crap. Ethernet is not a good interconnect technology. It's not even a good networking technology. And interconnect technology is the main performance-determining factor with a beowulf cluster.
Anyway, if you think you can do better with PS2s, why don't you do so?
Granted there might be some heat problems, but judging by their setup, I'm guessing the room is well-cooled.
The sending of this message pretty much inconveniences everyone involved.
MOSIX is a parallel cluster operating system based on Linux that can run on nodes of different speeds. They all need to be the same platform, though -- you can't mix Sparc and Intel for example.
Other supercomputer applications are written specifically under the assumption that all nodes are the same speed, they are linked together a certain way, etc. It all depends on the application.
What a shame. Freeloaders. They would never be able to achieve such performance if not for the fruits of labour of SCO .. eeeh.. lawyers?
<^>_<(ô ô)>_<^>