Linux Clusters Finally Break the TeraFLOP barrier
cworley submitted - several times - this well-linked submission about a slightly boring topic - fast computers. "Top500.org has just released its latest list of the world's fastest supercomputers (updated twice yearly). For the first time, Linux Beowulf clusters have joined the teraFLOP club, with six new clusters breaking the teraFLOP barrier. Two Linux clusters now rank in the Top 10: Lawrence Livermore's "MCR" (built by Linux NetworX) ranks #5 achieving 5.694 teraFLOP/s, and Forecast Systems Laboratory's "Jet" (built by HPTi) ranks #8 reaching 3.337 teraFLOP/s. Other Linux clusters surpassing the teraFLOP/s barrier include: LSU's "SuperMike" at #17 (from Atipa), the University at Buffalo at #22 and Sandia National Lab at #32 (both from Dell), an Itanium cluster for British Petroleum Houston at #42 (from HP), and Argonne National Labs at #46 (from Linux NetworX), which reached just over the one teraFLOP/s mark with 361 processors. In the previous Top500 list compiled last June, the fastest Intel-based Netfinity 1024 processor clusters from IBM were sub-teraFLOP/s and the University of Heidelberg's AMD-based "HELICS" cluster (built by Megware) held the top tux rank at #35 with 825 GFLOP/s."
It's going to take me 4 hours to read all of this.
How long until computing is powerful enough to render the probability thought patterns of a manager? That's what I want to know.
Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
From the first line: cworley submitted - several times
So, is THAT how you get something accepted? Really I don't know if posting that story with that attached to the front of it was such a great idea.....
Now everyone who submits a story they think is good will, should it get rejected, simply submit like twenty copies of it....
What a pain for the poor editors.... Really I question the wisdom of telling us this works....
a single node from one of these clusters?
(hey what else can I say, it's already a cluster)
I have often wondered how long it takes to boot one of these things. In the HP-UX world I know how long it takes for a K class (sometimes more than 20 minutes). Superdomes are sometimes faster, but not by much.
Semper ubi sub ubi
1 NEC Earth-Simulator 35860.00
2 Hewlett-Packard 7727.00 Los Alamos
The distance from the first to the second is pretty impressive. What on earth did NEC really do over there?
HTTP/1.1 400
Is there a way to tell how many FLOPS my Linux machine gets? I always wondered.
If all this should have a reason, we would be the last to know.
First you find out how many FLOPS your computer is capable of, then multiply by the % of CPU load (over 100) and the number of seconds.
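That rule of thumb can be written out as a few lines of Python. This is only a sketch: the function name is made up for illustration, the numbers in the example are hypothetical rather than measured, and it assumes every bit of CPU load is floating-point work, which is rarely true.

```python
def estimated_operations(peak_flops, cpu_load_percent, seconds):
    """Estimate floating-point operations performed over an interval.

    Follows the rule of thumb: peak rate * (load / 100) * seconds.
    Assumes all CPU load is floating-point work (an optimistic assumption).
    """
    if not 0 <= cpu_load_percent <= 100:
        raise ValueError("cpu_load_percent must be between 0 and 100")
    return peak_flops * (cpu_load_percent / 100.0) * seconds


if __name__ == "__main__":
    # Hypothetical machine: 1 GFLOPS peak, 50% load, for 60 seconds.
    total = estimated_operations(1e9, 50, 60)
    print(f"{total:.3e} floating-point operations")
```

Note that the result is a count of operations, not a rate; dividing it by the number of seconds again gives the average FLOPS actually sustained.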
Why don't they write it: FLOP/s?
LedgerSMB: Open source Accounting/ERP
I built a small Beowulf cluster. It was actually very easy, apart from writing the MPI-enabled code.
Step 1: Install the LAM packages on all the nodes.
Step 2: Create an account on all nodes, and use a passphrase-less ssh key to avoid prompting.
Step 3: Compile your code with mpicc (rather than gcc).
Step 4: Copy to all nodes.
Step 5: mpirun C ./your-prog
Admittedly it was only a 4 node cluster, but hey ;)
Please, someone break it to me gently if this wasn't actually a Beowulf cluster ;))
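For anyone who wants to script steps 3 to 5, here is a rough Python sketch that only builds the command lines and prints them. It assumes step 5 means running the program under LAM's "mpirun C"; the file name, program name, and node names are hypothetical, and the helper function is invented for this example. LAM normally also wants its runtime started on the nodes first (lamboot), which the steps above do not list, so that is left out here too.

```python
import shlex


def cluster_commands(source, program, nodes):
    """Return shell commands for steps 3-5 as a list of strings.

    Nothing is executed; the caller decides whether to run them.
    """
    # Step 3: compile with the MPI compiler wrapper instead of gcc.
    cmds = [f"mpicc -o {shlex.quote(program)} {shlex.quote(source)}"]
    # Step 4: copy the binary to each node's home directory.
    for node in nodes:
        cmds.append(f"scp {shlex.quote(program)} {shlex.quote(node)}:")
    # Step 5: LAM's "C" asks for one copy per available CPU.
    cmds.append(f"mpirun C ./{shlex.quote(program)}")
    return cmds


if __name__ == "__main__":
    # Hypothetical 4 node cluster.
    for cmd in cluster_commands("hello.c", "hello",
                                ["node1", "node2", "node3", "node4"]):
        print(cmd)
```

Printing instead of executing keeps the sketch safe to try on a machine with no MPI installed.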
Get your own free personal location tracker
Is that enough links there? Glad this isn't that impressive to me.
Will we be able to slashdot every one of them though? Perhaps someone should post some mirrors.
No, FLOPS is how fast it computes when dropped out of a window.
You think that I'm crazy, you should see this guy!
Now why not try using Macs for your supercomputers?
I know that they aren't as scalable
I think you answered your own question there.
Read it again. What does it say? EARTH-SIMULATOR
It's gonna take some CPU power to simulate earth, don't you think??
Impressive numbers. I suggest you go take a look at the hardware that runs the Earth Simulator (#1 on the top 500 list). That flash movie is impressive. But don't forget that you've got a helluva lot faster CPU inside your head - your brain beats all that expensive hardware all the way.
----
Since nobody is answering your question: The Top500 supercomputers are ranked by the results of the LinPack benchmark.
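To give a feel for what a Linpack-style number means, here is a toy version in plain Python: solve one dense linear system and divide the conventional operation count (2/3 n^3 + 2 n^2) by the time taken. This is a sketch under loud assumptions: the function names are made up, the matrix is random, and pure Python mostly measures the interpreter, so the figure will be far below what the hardware can do. Real Top500 entries come from tuned runs of the actual benchmark.

```python
import random
import time


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Works on copies, so the caller's matrix and vector are left untouched.
    """
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Pick the largest pivot in column k for numerical stability.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot.
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_flops(n=100, seed=0):
    """Time one dense n x n solve and convert it to FLOP/s."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = time.perf_counter() - start
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional count
    return operations / elapsed, x


if __name__ == "__main__":
    rate, _ = linpack_style_flops()
    print(f"roughly {rate / 1e6:.1f} MFLOP/s in pure Python")
```

The ranking uses the same idea at a much larger problem size, spread across all the nodes of the machine.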
--- Hindsight is 20/20, but walking backwards is not the answer.
While most people seem to be complaining about the number of links in the story, if history is any indicator, 90% of people won't click on one of those links, let alone all of them.
Overrated / Underrated : Moderation
This is not such a dumb question. The LinuxBIOS project was started by and for the Los Alamos National Lab. One of the nifty things this allows them to do is change kernels without taking the machines down. You can then switch to a kernel compiled for different purposes.
Help fight continental drift.
I hope none of those super computers was the webserver or else it's just the top 499 now. :p
Slashdot comments can be accurate, highly modded, or posted quickly. Pick two.
Ah, that would be because Apple's 'supercomputer on the desktop' marketing drivel was just that.
Hell, the Sony Playstation 2 was subject to export restrictions because it was 'too powerful', which was driven by/followed with the requisite marketing drivel, but you don't see any PS2 clusters in the 'World's fastest supercomputer' list either.
It has been a long time since the Apple PPC was competitive in terms of price/performance with x86s. Of course that's not the only reason to buy a computer; I don't want to get the Apple zealots' panties in a bunch.
It's just that Intel/AMD didn't make a song and dance about breaking the GFLOP barrier, since that happened way back with the P3/Athlon 600-800, hardly cutting edge chips.
Hell, a 600MHz Alpha had GFLOP performance years before either the G4 or the x86s.
The PPC has a nice vector processing unit (Altivec), which could make it a good choice in some situations, but given the premium you pay for Beowulf nodes (Xserves?) from Apple, you will, in general, get a lot more bang for the buck from x86.
I gots ta ding a ding dang my dang a long ling long
A real supercomputer supports much faster I/O, higher interconnection bandwidth and lower interconnection latency.
And btw, the new Cray X1 delivers the performance of all but the largest Linux clusters in a single cabinet (820 GFlops peak, that is). In terms of computing efficiency it makes even the Earth Simulator look pale. I am really looking forward to the next iteration of the TOP500, when the first X1 machines are included.
They don't have the kind of memory bandwidth these systems need. With AltiVec, a G4 can indeed get a huge gigaflop number, but SIMD floating point takes up a lot of data (with 128 bit SIMD, 20 bytes per 4 operations) and the G4's memory bus runs at a paltry 1.3 GB/sec (compared to 4.2 GB/sec for a P4). Feeding the G4's AltiVec units at full speed requires 20 GB/sec of bandwidth, so once your dataset falls out of the 256K of L2 cache (which these scientific computing applications surely do) the G4 chokes. Besides, AltiVec doesn't do double precision floating point, which is necessary for this sort of thing.
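The bandwidth argument boils down to a division, sketched here in Python with the figures quoted above (20 bytes per 4 operations, 1.3 GB/sec bus, 20 GB/sec needed at full speed). The function name is invented for illustration, and the model assumes every operand comes from main memory, which only holds once the working set no longer fits in cache.

```python
def bandwidth_bound_flops(bandwidth_bytes_per_s, bytes_per_op):
    """Upper bound on FLOPS when every operand must come from memory."""
    return bandwidth_bytes_per_s / bytes_per_op


# Figure from the comment: 20 bytes of traffic per 4 SIMD operations.
BYTES_PER_OP = 20 / 4

# G4 memory bus at 1.3 GB/sec.
g4_sustained = bandwidth_bound_flops(1.3e9, BYTES_PER_OP)

# Rate implied by the 20 GB/sec needed to keep AltiVec fully fed.
altivec_peak = bandwidth_bound_flops(20e9, BYTES_PER_OP)

# Fraction of that peak the memory bus can actually feed.
fraction_fed = g4_sustained / altivec_peak

if __name__ == "__main__":
    print(f"memory-bound rate: {g4_sustained / 1e9:.2f} GFLOPS")
    print(f"implied AltiVec peak: {altivec_peak / 1e9:.2f} GFLOPS")
    print(f"fraction of peak sustained: {fraction_fed:.1%}")
```

On these numbers the bus can feed only a small fraction of the vector unit's peak, which is the "choking" described above.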
A deep unwavering belief is a sure sign you're missing something...
As when other barriers are broken, a bit of a shock wave was created.
Windows machines for miles around were rattled.
Actually, Macs are used in supercomputer clusters. JPL has an interesting benchmark of 33 Xserves. They get 1/5th of a TeraFLOP of performance. Not bad, considering how cheap they are.
Why don't they write it: FLOP/s?
Because FLOPS means FLoating point Operations Per Second
'/' means 'per'.
FLOP/s would mean FLoating point Operations Per Per Second
FLO/s doesn't seem like a very good idea, except for cleaning your teeth.
Are there any Microsoft Windows-based systems that qualify as supercomputers?
(This is a serious question, I have no idea if they do or do not.)
This collection of links failed to mention that the #1 computer is an "Earth Simulator." How kewl is that! Reminds me of the book _Earth_ by David Brin.
M@
Krispy Cream is people
Did I miss the sarcasm tags on the "slightly boring" comment or something? I think there's a large audience on slashdot who are all very excited about high speed computing. Overclockers aside, I know I hate waiting for a compile.
Lately though, I feel the things I'm waiting on my computer for are not a function of how fast the CPU can run, but of how poorly the software is written. Can someone tell me why my windoze machines sometimes block for up to a min when I try to click the "Location" box on the top of the file browser common dialog control? Or the oft-complained about boot time for most everything? Or the time it takes almost any program to load up the first time you load it?
Anyone else think it's time to start over, and not just assume the faster and faster machines can deal with the laziness we program into the systems we build?
M@
Krispy Cream is people
- The weatherman is usually wrong.
- Aliens are abducting us. We need to send radio signals to Fife, Alabama, not out into space.
- Unified Theory is based on Heisenberg's stuff... You can have relativity and quantum mechanics... but not both at the same time. Damn, that guy was a genius. By the way, the unified theory is:
e = 42; // always 42.
Of course, I'm sure Doom3 has this somewhere in its source code, so ummm... go crunch 40 TFLOPS on that.
</humor>
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
Check out the report on our NetBSD cluster, which would easily scale to many nodes.
It's just a question of proper application software, and the OS doesn't really matter - I can't understand all this fuss about Linux. *shrug*
"It's just that Intel/AMD didn't make a song and dance about breaking the GFLOP barrier..."
I don't know 'bout AMD, but Intel has these funny BunnyPeople to promote anything from breaking speed limits to new processors as shown here. So contrary to what you believe, yes, Intel does make a song and dance (plus commercial) about [insert_marketing_gibberish_here]!
Myrinet software. It supports Windows plus a whole range of *NIXes.
They did. And it seems to be missing from the Top 500 list. According to this, 33 XServes reached 217 GFlops/sec. Now, according to Apple, they should be able to reach a much higher speed than this (roughly twice the performance they actually got), but part of the reason might be that they used 100BaseT instead of Gigabit, and theoretical != real world anyway. This earlier cluster of 76 G4's achieved even higher results. JPL found Macs to be "capable of excellent scalability in performance."
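For what it's worth, the per-node arithmetic behind those figures looks like this in Python. The 217 GFLOPS and 33 nodes come from the numbers above; the 434 GFLOPS "claimed" figure is only my reading of "roughly twice", not a number Apple or JPL published, and the function names are made up.

```python
def per_node_gflops(total_gflops, nodes):
    """Average throughput contributed by each node."""
    return total_gflops / nodes


def efficiency(measured_gflops, claimed_gflops):
    """Fraction of the claimed rate that was actually achieved."""
    return measured_gflops / claimed_gflops


MEASURED = 217.0        # GFLOPS reported for the 33-node cluster
NODES = 33
CLAIMED = 2 * MEASURED  # assumption: "roughly twice" taken literally

if __name__ == "__main__":
    print(f"{per_node_gflops(MEASURED, NODES):.2f} GFLOPS per node")
    print(f"{efficiency(MEASURED, CLAIMED):.0%} of the claimed rate")
```

Keep in mind these are not Linpack numbers, so they can't be lined up directly against entries on the Top 500 list.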
I really thought there would be more Microsoft on the Top 500 Super Computer list, just as a matter of honor and homage to the Chief Software Architect.
:) What a lot of information, thanks for the great article!
Looking at the list, we can see that Super Computers Prefer ANYTHING BUT Microsoft, 499 to 1. I tried to find out more about the "1", but it has been encrypted by Seoul National University using a character set "charset=euc-kr". If anyone has more info on it, please post it in English.
I wonder when Steve Jobs will get a Mac cluster on this list?
According to the SETI@HOME stats page, SETI is running about 45 TFLOPS, which is slightly ahead of the Earth Simulator's 40 TFLOPS or the LANL 10 TFLOPS machines. This isn't real precise - Top500 uses Linpack as their benchmark, which is a lot more realistic and controlled than SETI, so your mileage may vary. And of course that's Today's measurement from SETI, which is fairly variable in its CPU speed.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
These are single precision FLOPS on some Apple fractal program optimized for Altivec and undoubtedly embarrassingly parallel.
The top500 list is based on double-precision linpack scores. This cluster would not score anywhere near that level on the top500 test because Altivec doesn't do double precision, so you use the regular scalar FPU. Furthermore, you need a fairly fast interconnect to get a good fraction of theoretical peak on linpack, so I would estimate that this cluster wouldn't get more than 40 gflops or so in the top500 test.
P4s can do double precision vector operations, and as a result, they get much better linpack scores in a similarly equipped cluster, and for far less money. This is why you don't see big clusters being built out of Macs.