Linux Clusters Finally Break the TeraFLOP barrier
cworley submitted - several times - this well-linked submission about a slightly boring topic - fast computers. "Top500.org has just released its latest list of the world's fastest supercomputers (updated twice yearly). For the first time, Linux Beowulf clusters have joined the teraFLOP club, with six new clusters breaking the teraFLOP barrier. Two Linux clusters now rank in the Top 10: Lawrence Livermore's "MCR" (built by Linux NetworX) ranks #5, achieving 5.694 teraFLOP/s, and Forecast Systems Laboratory's "Jet" (built by HPTi) ranks #8, reaching 3.337 teraFLOP/s. Other Linux clusters surpassing the teraFLOP/s barrier include: LSU's "SuperMike" at #17 (from Atipa), the University at Buffalo at #22 and Sandia National Lab at #32 (both from Dell), an Itanium cluster for British Petroleum Houston at #42 (from HP), and Argonne National Labs at #46 (from Linux NetworX), which reached just over the one teraFLOP/s mark with 361 processors. In the previous Top500 list, compiled last June, the fastest Intel-based Netfinity 1024-processor clusters from IBM were sub-teraFLOP/s, and the University of Heidelberg's AMD-based "HELICS" cluster (built by Megware) held the top tux rank at #35 with 825 GFLOP/s."
It's going to take me 4 hours to read all of this.
How long until computing is powerful enough to render the probability thought patterns of a manager? That's what I want to know..
Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
Could anyone point me to a Windows-based utility that lets me see how many FLOPS my home computer is doing?
From the first line: cworley submitted - several times
So, is THAT how you get something accepted? Really I don't know if posting that story with that attached to the front of it was such a great idea.....
Now everyone who submits a story that they think is good will, should it get rejected, simply submit like twenty copies of it....
What a pain for the poor editors.... Really I question the wisdom of telling us this works....
a single node from one of these clusters?
(hey what else can I say, it's already a cluster)
in anticipation of the barrage of beowulf cliches:
Imagine all that power in a single computer with a single processor!
I know, I'm cheezy
I have often wondered how long it takes to boot one of these things. In the HP-UX world I know how long it takes for a K class (sometimes more than 20 minutes). Superdomes are sometimes faster, but not by much.
Semper ubi sub ubi
1 NEC Earth-Simulator 35860.00
2 Hewlett-Packard 7727.00 Los Alamos
The distance from the first to the second is pretty impressive. What on earth did NEC really do over there?
Is there a way to tell how many FLOPS my Linux machine gets? I always wondered.
She was more perky when I knew her, but I suppose she probably has quite a bit of flop these days.
Is that enough links there? Glad this isn't that impressive to me.
If all this should have a reason, we would be the last to know.
You're too late, there's already been a barrage of beowulf clichés. Sigh.
I built a small Beowulf cluster. It was actually very easy, apart from writing the MPI-enabled code.
Step 1: Install the lam packages on all the nodes
Step 2: Create an account on all nodes, and use a passphrase-less ssh key to avoid prompting.
Step 3: Compile your code with mpicc (rather than gcc)
Step 4: Copy to all nodes.
Step 5: mpirun C ./your-prog ;)
Admittedly it was only a 4 node cluster, but hey
Please, someone break it to me gently if this wasn't actually a Beowulf cluster
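For anyone wondering what "writing the MPI enabled code" boils down to, here is a minimal sketch of the usual rank/size work split, written in plain Python. It makes no MPI calls at all: the ranks are simulated one after another in a single process, and the names `partial_sum` and `simulated_reduce` are made up for illustration. On a real cluster each rank would run on its own node and the final sum would be done by an MPI reduction.

```python
# Sketch of the "rank/size" work split that an MPI program performs.
# Assumption: no MPI library is used; ranks are simulated in one process.

def partial_sum(rank, size, n):
    """Sum the integers in 0..n-1 that belong to this rank (round-robin split)."""
    return sum(range(rank, n, size))


def simulated_reduce(size, n):
    """Stand-in for an MPI sum-reduction: combine every rank's partial result."""
    return sum(partial_sum(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    # 4 simulated nodes, matching the 4-node cluster described in the post
    print(simulated_reduce(size=4, n=1000))
```

Each rank only touches its own slice of the problem, which is why adding nodes helps as long as the combine step stays cheap.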
Get your own free personal location tracker
Bring on the Beowulf Cluster jokes!
But will there be more beowulf cluster jokes than links in the stories?
image from first link
I love those giant black racks, even if it's not the fastest cluster in the world the Space Odyssey nostalgia is still there.
"My God, it's full of stars!"
-Matt
--- Need web hosting?
That sounds an awful lot like the new P4's.
now why not try using macs for your supercomputers?
I know that they aren't as scalable
I think you answered your own question there.
Read it again. What does it say? EARTH-SIMULATOR
It's gonna take some CPU power to simulate earth, don't you think??
a Beowulf Cluster of Beowulf Cluster jokes?
I think it would be interesting to look not just at the processing capacity, but also the costs associated with building and maintaining each mainframe.
Impressive numbers. I suggest you go take a look at that hardware that runs the Earth Simulator (#1 on the top 500 list). That flash movie is impressive. .. But don't forget that you got a helluva lot faster CPU inside your head - your brains beat all that expensive hardware all the way.
----
You would think they would be running their website on a server fast enough not to get /.-ed.
"Anonymous Coward" is for whistleblowers, not unpopular opinions.
I wonder, can it Slashdot every link in that article? :)
Last time i checked a dual g4 1.25 ghz system was below that of a p4 3.06+hyperthreading in graphics benchmarks (adobo after effects + photoshop). (the dual g4 system also cost $1k more).
It may still be ahead in gflops... I'm not into cpu's enough to answer that but i do doubt that. In any case mac's are for graphics people so that should be a real blow.
But I'll bet dual 2.4 ghz xeons will kick the 1.25 ghz system's ass in terms of gflops. Plus they're only like $650 each so the mobo + processors won't cost more than $1400.
Hmmm... Pie...
For some reason, I think most of these actually use Myrinet?
Step 1: Imagine a Beowulf cluster of these.
Step 2: ???
Step 3: Profit!
-- Ed Avis ed@membled.com
Shame top500.org itself ain't running on a supercomputer...
While most people seem to be complaining about the number of links in the story, if history is any indicator, 90% of people won't click on one of those links, let alone all of them.
Overrated / Underrated : Moderation
I was going to suggest creating a cluster from the top 10 there. Would that be possible? A Beowulf cluster of Beowulf clusters?
This is not such a dumb question. The LinuxBIOS project was started by and for the Los Alamos National Lab. One of the nifty things this allows them to do is change kernels without taking the machines down. You can then switch to a kernel compiled for different purposes.
Help fight continental drift.
A large cluster of links. How fitting. Michael has a sense of humor.
Bring on the Beowulf Cluster jokes!
Are there more nodes in a Beowulf cluster than there are jokes about it?
He probably went all crazy because Linux stories tend to get ignored here at Slashdot.
-
Inventor of the term 'pardon my French'.
I hope none of those super computers was the webserver or else it's just the top 499 now. :p
Slashdot comments can be accurate, highly modded, or posted quickly. Pick two.
Ah, that would be because Apple's 'supercomputer on the desktop' marketing drivel was just that.
Hell, the Sony Playstation 2 was subject to export restrictions because it was 'too powerful', which was driven by/followed with the requisite marketing drivel, but you don't see any PS2 clusters in the 'World's fastest supercomputer' list either.
It has been a long time since Apple PPC was competitive in terms of price/performance with x86s. Of course that's not the only reason to buy a computer; I don't want to get the Apple zealots' panties in a bunch.
It's just that Intel/AMD didn't make a song and dance about breaking the GFLOP barrier, since that happened way back with the P3/Athlon 600-800, hardly cutting edge chips.
Hell, a 600MHz Alpha had GFLOP performance years before either the G4 or the x86s.
The PPC has a nice vector processing unit (Altivec), which could make it a good choice in some situations, but given the premium you pay for Beowulf nodes (Xserves?) from Apple, you will, in general, get a lot more bang for the buck from x86.
I gots ta ding a ding dang my dang a long ling long
yup, sage advice on what makes a graphics workstation from someone who uses adobo software. grow up.
A real supercomputer supports much faster I/O, higher interconnection bandwidth and lower interconnection latency.
And btw. the new Cray X1 delivers the performance of all but the largest Linux clusters in a single cabinet (820 GFlops peak that is..). In terms of computing efficiency it makes even the Earth Simulator look pale. I am really looking forward to the next iteration of the TOP500, when the first X1 machines are included.
They don't have the kind of memory bandwidth these systems need. With AltiVec, a G4 can indeed get a huge gigaflop number, but SIMD floating point takes up a lot of data (with 128-bit SIMD, 20 bytes per 4 operations) and the G4's memory bus runs at a paltry 1.3 GB/sec (compared to 4.2 GB/sec for a P4). Feeding the G4's AltiVec units at full speed requires 20 GB/sec of bandwidth, so once your dataset falls out of the 256K of L2 cache (which these scientific computing applications surely do) the G4 chokes. Besides, AltiVec doesn't do double-precision floating point, which is necessary for this sort of thing.
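The bandwidth arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in the post (20 bytes per 4 operations, 1.3 GB/sec and 4.2 GB/sec buses); applying the same bytes-per-flop ratio to the P4 is an assumption made purely for comparison, and `bandwidth_bound_gflops` is a made-up helper name.

```python
# Figures taken from the post; the helper name is hypothetical.
BYTES_PER_VECTOR_OP = 20   # bytes moved per 128-bit SIMD instruction
FLOPS_PER_VECTOR_OP = 4    # four single-precision lanes per instruction


def bandwidth_bound_gflops(bus_gb_per_s):
    """Upper bound on GFLOPS once the data no longer fits in cache."""
    bytes_per_flop = BYTES_PER_VECTOR_OP / FLOPS_PER_VECTOR_OP  # 5 bytes per flop
    return bus_gb_per_s / bytes_per_flop


if __name__ == "__main__":
    print("G4 bus, 1.3 GB/s:", bandwidth_bound_gflops(1.3), "GFLOPS")
    # Assumption: the same 5 bytes/flop ratio is applied to the P4 bus.
    print("P4 bus, 4.2 GB/s:", bandwidth_bound_gflops(4.2), "GFLOPS")
    print("20 GB/s would feed:", bandwidth_bound_gflops(20.0), "GFLOPS")
```

In other words, the 20 GB/sec figure corresponds to about 4 GFLOPS of vector work, while a 1.3 GB/sec bus can only sustain a small fraction of that from main memory.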
A deep unwavering belief is a sure sign you're missing something...
As when other barriers are broken, a bit of a shock wave was created.
Windows machines for miles around were rattled.
Offtopic? Sheesh. I can't find anything ONTOPIC in here.
Actually, Macs are used in supercomputer clusters. JPL has an interesting benchmark of 33 Xserves. They get 1/5th of a TeraFLOP of performance. Not bad, considering how cheap they are.
I'm not trying to start a war or anything. It's just an amusing observation.
Democracy Now! - your daily, uncensored, corporate-free
That's not an answer at all, it's a tautology. What does 'scalable' mean in this context? That you can climb to the top of it? To say that you can't build a cluster of Macs because they're 'not scalable' is the same as saying 'because you can't build a cluster of them'. The answer is probably that you get more performance for less cost from Intel or AMD setups, rather than technical issues.
...wearing a skin-tight topless leather jumpsuit, with cutaway buttocks and transparent crotch panel.
Are there any Microsoft Windows-based systems that qualify as supercomputers?
(This is a serious question, I have no idea if they do or do not.)
Step 1: Install the lam packages on all the nodes
Step 2: Create an account on all nodes, and use a passphrase-less ssh key to avoid prompting.
Step 3: Compile your code with mpicc (rather than gcc)
Step 4: Copy to all nodes.
Step 5: mpirun C ./your-prog
Step 6: PROFIT!
Oh my God! how many freaking links does one story need?
Is that enough times to count as a beowulf cluster of times you've heard beowulf cluster jokes?
Solutions to earth quakes and godzilla.
FLOP/s would mean FLoating point Operations Per Per Second
That is, unless their house style specifies that "FLOP" means "FLoating-point OPeration".
Will I retire or break 10K?
http://apple.slashdot.org/apple/02/11/15/1630248.shtml?tid=181
This collection of links failed to mention that the #1 computer is an "Earth Simulator." How kewl is that! Reminds me of the book _Earth_ by David Brin.
M@
Krispy Cream is people
my uncle works there
the rest run Windows.
Did I miss the sarcasm tags on the "slightly boring" comment or something? I think there's a large audience on slashdot who are all very excited about high speed computing. Overclockers aside, I know I hate waiting for a compile.
Lately though, I feel the things I'm waiting on my computer for are not a function of how fast the CPU can run, but of how poorly the software is written. Can someone tell me why my windoze machines sometimes block for up to a min when I try to click the "Location" box on the top of the file browser common dialog control? Or the oft-complained about boot time for most everything? Or the time it takes almost any program to load up the first time you load it?
Anyone else think it's time to start over, and not just assume the faster and faster machines can deal with the laziness we program into the systems we build?
M@
Krispy Cream is people
- The weatherman is usually wrong.
- Aliens are abducting us. We need to send radio signals to Fife, Alabama, not out into space.
- Unified Theory is based on Heisenberg's stuff... You can have relativity and quantum mechanics... but not both at the same time. Damn, that guy was a genius. By the way, the unified theory is:
e = 42; // always 42.
Of course, I'm sure Doom3 has this somewhere in its source code, so ummm... go crunch 40 TFLOPS on that
</humor>
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
Only if you count duplicates separately
Special Relativity: The person in the other queue thinks yours is moving faster.
check out the report on our NetBSD cluster which would easily scale to many nodes.
It's just a question of proper application software, and the OS doesn't really matter - I can't understand all this fuss about Linux. *shrug*
Now, speaking to the coward... I don't use After Effects or Photoshop that much. The only reason they were mentioned is because I saw how they compared in some credible benchmarks. It's hard to find good benchmarks between Apple systems and PCs... At least it is for me; I use both Macs and PCs but I mostly go to PC hardware sites. Unfortunately the other site with benchmarks that I checked out was apple.com, which was heavily skewed in their favor: the latest dual G4s versus P4s with non-DDR SDR! The G4s of course got DDR RAM... Also, I wasn't trying to give sage advice... just trying to prevent people from overrating the Mac, which happens too much nowadays. Making fun of someone's knowledge based on just one typo is pretty sad, especially by a coward.
Hmmm... Pie...
"It's just that Intel/AMD didn't make a song and dance about breaking the GFLOP barrier..."
I don't know 'bout AMD, but Intel has these funny BunnyPeople to promote anything from breaking speed limits to new processors as shown here. So contrary to what you believe, yes Intel does make a song and dance(plus commercial) about [insert_marketing_gibberish_here]!
It's a pretty awesome machine, each blade has two 1.4GHz Xeon processors and 4 gigs of RAM.
This machine makes the best Counter Strike server I've ever played on!
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.
Myrinet Software. Not only does it support Windows plus a whole range of *NIXes.
They did. And it seems to be missing from the Top 500 list. According to this, 33 XServes reached 217 GFlops/sec. Now, according to Apple, they should be able to reach a much higher speed than this (roughly twice the performance they actually got), but part of the reason might be that they used 100BaseT instead of Gigabit, and theoretical != real world anyway. This earlier cluster of 76 G4's achieved even higher results. JPL found Macs to be "capable of excellent scalability in performance."
The third highest ranked supercomputer in Canada (356 on the list) is located at Sobeys (a grocery chain company)... even above Bell Canada's supercomputers.
What is it about 1.000 teraflops that makes such a number a "barrier"?
Imagine a Beowulf cluster of...
Oh, wait.
Quid latine dictum sit, altum viditur.
Anything said in Latin, sounds profound.
Does anybody know where to find the 'old' X1 video (when the machine was still called the SV2)? I can't find it on the Cray website anymore. If I recall correctly, it was a 20-30 min mpeg file with a LOT more technical details than that smooth commercial you can find on their website these days.
I really thought there would be more Microsoft on the Top 500 Super Computer list, just as a matter of honor and homage to the Chief Software Architect.
:) What a lot of information, thanks for the great article!
Looking at the list, we can see that Super Computers Prefer ANYTHING BUT Microsoft, 499 to 1. I tried to find out more about the "1", but it has been encrypted by Seoul National University using a character set "charset=euc-kr". If anyone has more info on it, please post it in English.
I wonder when Steve Jobs will get a MAC cluster on this list
MPI for Mac OSX Jaguar
http://www.mpi-softtech.com/news/?id=1037037084
According to the SETI@HOME stats page, SETI is running about 45 TFLOPS, which is slightly ahead of the Earth Simulator's 40 TFLOPS or the LANL 10 TFLOPS machines. This isn't real precise - Top500 uses Linpack as their benchmark, which is a lot more realistic and controlled than SETI, so your mileage may vary. And of course that's Today's measurement from SETI, which is fairly variable in its CPU speed.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
An ad on Apple's web site about the G4:
"Twin engines All systems have dual PowerPC G4 processors -- up to 1.25GHz -- and L3 cache for the ultimate in performance and productivity delivering up to 18.3 gigaflops of power."
Is that why the "G" is in G4?
Is there a T4 coming soon?
Ya linux is a Tera Flop alright 10 yeras and the desktop still blows a penor.
Wow the new x that comes in 6 months can actually change resolutions without restarting!
What a radical concept!
The top500 list was made by running a benchmark that solves a dense system of linear equations. Though this test can be relevant for centers like #1 "Earth Simulator Center", #2 and #3 Los Alamos and #4 and #5 Lawrence Livermore labs - they are doing exactly that - who can tell me how relevant it is for Charles Schwab, Verizon, AT&T and Bell South ?!!! These guys are just running portals accessing huge databases, aren't they?
It seems that the test is self-serving: the Earth Center is doing complex simulations - and it is #1 on the test that is designed exactly for that.
Imagine a different computer center - millions of clients are making requests for specific data or knowledge that requires complex data search and sometimes small computations. What would be relevant is a test that measures how many simultaneous requests the center can handle (in the millions per second) and latency for each such request (in milliseconds). I guess that the Earth Center, or national lab performance score will be much worse.
So I wonder how relevant the top500 list is to the real power of a computer cluster that people need?
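For reference, the kind of measurement being discussed (time a dense linear solve, divide an operation count by the elapsed time) can be sketched in pure Python. This is not the actual Linpack code used for the list, just an illustration: `solve_dense` and `linpack_style_mflops` are made-up names, the 2/3 n^3 + 2 n^2 operation count is the conventional estimate for Gaussian elimination, and interpreter overhead means the number it reports will be far below what the hardware can really do.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_mflops(n=100, seed=0):
    """Time one random n x n solve and convert it to MFLOPS (rough estimate)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # conventional operation count
    return flops / elapsed / 1e6


if __name__ == "__main__":
    print("approx MFLOPS (pure Python):", linpack_style_mflops())
```

A request-serving workload like the one described above does almost none of this arithmetic, which is why a score of this kind says little about it.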
These are single-precision FLOPS on some Apple fractal program optimized for AltiVec and undoubtedly embarrassingly parallel.
The top500 list is based on double-precision linpack scores. This cluster would not score anywhere near that level on the top500 test because Altivec doesn't do double precision, so you use the regular scalar FPU. Furthermore, you need a fairly fast interconnect to get a good fraction of theoretical peak on linpack, so I would estimate that this cluster wouldn't get more than 40 gflops or so in the top500 test.
P4s can do a double precision vector, and as a result, they get much better linpack scores in a similarly equipped cluster, and for far less money. This is why you don't see big clusters being built out of macs.
It is certainly an impressive system that seems far beyond clusters. Do you have any idea about the price? And availability outside the USA?
http://pbj.snu.ac.kr/structure/research/body_research_supercom.html
http://pbj.snu.ac.kr/structure/research/body_research_supercom1.html
;)
according to their site, it was funded by M$, intel and samsung.
I wonder what would happen if they installed unix on it
If you were trying to ask the question "does the number of flops of the cluster increase if the flops of the nodes does" then yes. Macs really aren't all that far behind PCs in this field though; the main problem right now is the memory (which will be more than fixed with the PowerPC 970 next year).
XServes running a PowerPC 970 (or even *drool* a Power4 with velocity decoding) will easily beat "wintel" hardware; then again no one will use either for the type of systems in question (I hope..)
Check out this:
http://maccentral.macworld.com/news/0211/14.pooch.php
Slashdot reported a while ago that google.com used a cluster of 8,000 Red Hat boxes. Surely this would make the top 500 list?
Why would you put that stupid link with half-a-dozen popups in your SIG? Are you TRYING to be an asshole? Or do you need the word irony defined for you?
To the mod that modded this off-topic: You are a fucking loser, and I have meta-modded you unfair. This means you are less likely to mod in the future. The topic of this discussion is clustered computing. The topic of this message was clustered computing. Off-topic means those wouldn't be the same. Get a fucking clue.
Anonymous Meta-Mod
UCSD's Concurrent Systems Architecture Group
www.windowsclusters.org
NCSA NT Cluster Consortium
Real Application Performance
These are just a few links - many more are out there if you search for "Windows HPC"
ScottKin
I don't give a rat's behind about "karma" here or anywhere else. Don't like what I have to say here? Deal with it!
Wonder what would happen if we built a super cluster of clusters?
All this big iron and none of 'em are talking to each other outside their local clusters!
Take 10 of these >=1024 node clusters and wire 'em together, and you end up with a monster 10,240 node cluster right..
I'm sure you'd get a slap in the face from Amdahl's law somewhere, but wouldn't it be interesting to find out just where..
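As a rough idea of where that slap lands, Amdahl's law can be sketched in a few lines of Python. The 99.9% parallel fraction below is purely an assumed figure for illustration (real codes and the wide-area links between sites will differ), and `amdahl_speedup` is a made-up helper name.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Amdahl's law: speedup = 1 / (serial + parallel / nodes)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)


if __name__ == "__main__":
    p = 0.999  # assumed parallel fraction, not a measured value
    one_cluster = amdahl_speedup(p, 1024)
    ten_clusters = amdahl_speedup(p, 10240)
    print("1,024 nodes :", round(one_cluster, 1))
    print("10,240 nodes:", round(ten_clusters, 1))
    print("gain from 10x the nodes:", round(ten_clusters / one_cluster, 2))
```

Under that assumption, ten times the nodes buys less than twice the speedup, and that is before counting the extra latency of wiring separate sites together.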
--dez;
http://WebSearch.COM.AU
http://WebSearch.CO.NZ
From the moment I picked your book up until I put it down I was convulsed
with laughter. Some day I intend reading it.
-- Groucho Marx, from "The Book of Insults"
- this post brought to you by the Automated Last Post Generator...