With Linux Clusters, Seeing Is Believing
Roland Piquepaille writes "As last month's release of the latest Top500 list reminded us, the most powerful computers are now reaching speeds of dozens of teraflops. When these machines run a nuclear simulation or a global climate model for days or weeks, they produce datasets of tens of terabytes. How do you visualize, analyze and understand such massive amounts of data? The answer is now obvious: using Linux clusters. In this very long article, "From Seeing to Understanding," Science & Technology Review looks at the technologies used at Lawrence Livermore National Laboratory (LLNL), which will host IBM's BlueGene/L next year. Visualization will be handled by a 128- or 256-node Linux cluster. Each node contains two processors sharing one graphics card. Meanwhile, EVEREST, built by Oak Ridge National Laboratory (ORNL), has a 35-million-pixel screen driven by a 14-node dual Opteron cluster sending images to 27 projectors. Now that Linux superclusters have almost swallowed the high-end scientific computing market, they're building momentum in the high-end visualization one. The article linked above is nine pages long when printed and contains tons of information. This overview focuses more on the hardware deployed at these two labs."
So, if I've got this straight, Slashdot drives the banner ad traffic, real journalists write the content, and all Roland has to do is rip off a few articles, then sit in the middle and collect the checks. How do I get a sweet gig like that?
Now that Linux superclusters have almost swallowed the high-end scientific computing market...
While some simulations parallelize very well to cluster environments, there are still plenty of tasks that don't split up like that.
The reason clusters make up a lot of the Top 500 list is that they are relatively cheap and you can make them faster by adding more nodes, whereas traditional supercomputers need to be designed from the ground up.
G5 nodes do have excellent performance, but don't assume OSX is all they can run.
We at Terra Soft have just released Y-HPC, our version of Yellow Dog Linux, with a full 64-bit development environment, and a bunch of cluster tools built in.
I'm not much of a marketing drone, but since I'm part of the Y-HPC team, I had to put a shameless plug in. Bottom line is, it kicks OSX's ass any 2 ways you look at it.
Y-HPC
no comment
Besides the fact that you are comparing (please forgive me) apples and oranges, your sample size is way too small to use as conclusive evidence. Until we start seeing Xserve clusters in a few more places, we can't be sure of the cost benefit.
Look here.
The speed you quoted is the theoretical peak, not the actual maximum achieved in a real world calculation (like the Top 500 organization's use of Linpack).
System X's equivalent theoretical peak is 20.24 TFlops.
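For anyone wondering where a theoretical peak figure like that comes from: it is normally just total processors times clock rate times floating-point operations per cycle, with no real workload involved. A minimal sketch, assuming 1,100 dual-processor nodes at 2.3 GHz and 4 double-precision flops per cycle per processor (these inputs are assumptions for illustration, not figures quoted in this thread):

```python
def theoretical_peak_tflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak = processors x clock rate x flops per cycle, in TFlops."""
    gflops = nodes * cpus_per_node * clock_ghz * flops_per_cycle
    return gflops / 1000.0

# Assumed inputs (hypothetical, not taken from the thread):
# 1,100 nodes, 2 CPUs each, 2.3 GHz, 4 flops per cycle per CPU.
peak = theoretical_peak_tflops(nodes=1100, cpus_per_node=2,
                               clock_ghz=2.3, flops_per_cycle=4)
print(f"{peak:.2f} TFlops theoretical peak")
```

With those assumed inputs the result lines up with the 20.24 TFlops quoted above. A measured Linpack number will always come in below it, since real runs lose time to memory access and interconnect traffic.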
I'm also not indicting Linux clusters in the least; they've clearly shown they can outperform traditionally architected and constructed supercomputers for many tasks, with the benefit of using commodity parts - at commodity pricing. All I'm saying is that there's a new player here, and it's a real contender, and has done a lot for very little money...which was the whole goal of Linux clusters in this realm in the first place.
(Also, as I said, the volunteer labor model is irrelevant - let's just pretend it was professionally installed for an additional $1M, or even $2M if that would satisfy you. It's still several million dollars cheaper, with 3 TFlops greater performance. These are BOTH rackmount clusters with similar numbers of nodes and processors, running a commodity OS with fast interconnects. There are differences, yes, and perhaps even differences in goals. But looking past that, price/performance for something like this is still an important metric.)
Clusters are proven to be cost effective, but they do require more labor to optimize code to get it to work in that environment. It's easier to have the system and the compiler do the work for you in a single image system. This article addresses those issues and concerns: single image shared vs distributed memory in large Linux systems
I know there will be a dozen predictable responses to this, deriding System X, Virginia Tech, Apple, Mac OS X, linpack, Top 500, and coming up with one excuse after another. But won't anyone consider the possibility that these Mac OS X clusters are worth something?
You're right!
First, System X or the "Big Mac" was thrown together so that people like us would talk about it and to get a good standing on the November 2003 Top 500 list. They did an excellent job at this.
Now for some reality. The system is not yet operational.
When it was first thrown together, everyone "in the know", myself included, questioned how this was going to work without a reliable memory subsystem, and the VT people responded that they were going to write software to correct any hardware errors, and we said OK, whatever. Then they said, hmm, we kinda need a reliable memory subsystem, so let's rip out all 1,100+ machines and start over with these new Xserve boxes that have ECC memory in them.
This system has not come up yet with the new Xserves, according to their website.
Now, I'm going to make a comment on Linpack. Linpack, like all good benchmarks, is really good at measuring performance on that benchmark. Linpack is a good benchmark, but it is also a benchmark that does not require much RAM per node to run. Some applications do need a good amount of RAM per node to run, and since RAM costs $$, the cost adds up very quickly, and the cost per CPU and per teraflop goes up accordingly.
Now for the comparison between System X and NCSA's Tungsten cluster. Personally, I don't know why the Tungsten cluster cost more, because the Mac cluster has more RAM/node and each node should have been cheaper in general. The NCSA cluster uses Myrinet, which I know is expensive, but I do not know how its cost compares with the Infiniband equipment on the Macs. Supposedly, the Infiniband interconnects were what got System X on the Top 500 list with such good results, or at least that is what the head of the project told me.
It's popular here on Slashdot, where many of the readers are younger and inexperienced (and have no money), to praise anything that costs less, with extra brownie points going to an underdog like AMD or Linux. In the real world, however, people will actually pay extra for something to ensure that it works. Working equipment may seem superfluous to the dorm room Linux guru, but trust me, I know what it's like to work with equipment that cost about $1 mil and doesn't work. We could have gone with the 2nd bidder at $1.2 mil and it would have worked. Yes, we "saved" $200,000, but we also wasted well over $500,000 when one considers that over 50% of the equipment is faulty and many people's time has been wasted.
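The bid arithmetic in that last comment can be written out as a quick sketch. The function name is hypothetical, and the $500,000 of waste is treated as a lower bound, since the comment says "well over":

```python
def effective_cost(bid, wasted=0):
    """Purchase price plus whatever was lost to faulty gear and wasted time."""
    return bid + wasted

# Figures from the comment above; the waste figure is a stated minimum.
low_bid_total = effective_cost(1_000_000, wasted=500_000)
second_bid_total = effective_cost(1_200_000)  # assumed to have worked as delivered

extra_spent = low_bid_total - second_bid_total
print(f"Low bid ended up costing at least ${extra_spent:,} more")
```

On those numbers the "cheaper" bid ends up at least $300,000 more expensive than the second bidder would have been.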