Linux Cluster attains 125.2 GFLOPS
akey writes "CPlant98, a Linux cluster composed of 400
Digital Personal Workstation 500as, achieved 125.2 GFLOPS, which would place it at #53 on the Top 500 list. And this was only a 350-node run... "
I'm hearing rumors of 1000+ node Linux clusters. I'm itchin' for
it to come out of the closet so we can see some real
benchmarks.
Such a machine exists. See the subject for the name.
It's linked to from the Beowulf site somewhere.
retrorocket.o not found, launch anyway?
Maybe we've worked with different style cases, but the ones I've used don't have power supply issues.
Probably, there are so many different cases out there. Many of the 'slim' cases I have seen only have 120-175 watt power supplies, which may be adequate for most uses, but may not be the best for sustained heavy duty use. Typical mid tower cases usually have 200-250 watt supplies.
As for floppy drives - simple - don't put one in the box.
The problem there is that it may increase the time necessary to service a machine if it dies.
For bulk serving you don't need it.
Very true under normal circumstances, but my concern is when the proverbial excrement hits the rotary oscillator. I want to be able to fix any problem ASAP. Having to find and install a floppy drive in a machine to bring it back up is time I may not want to take.
Nor a CD ROM drive.
That is something you can usually do without, provided one is available on a network-reachable machine.
If the box is at a colocation, you're going to get to it via ssh, not standing in front of it.
Provided you aren't dealing with a crashed hard drive or some other issue that can't be solved remotely.
Another complaint I have about a lot of 'slim' cases I've seen is that many of them have a limited number of front-panel accessible drive bays. While that isn't a big deal for most things, one useful thing when you are dealing with a large number of servers is the inexpensive IDE 'lock and load' trays, which make swapping hard drives in and out much faster and easier. It can make large scale upgrades or dealing with crashed drives a lot faster, since you can do the work on another machine and then only take down the server box to do the actual swap.
No, an int is usually 32 bits these days, making it more like 4G.
I honestly don't know for certain, but aren't the big Cray machines and other microprocessor supercomputers effectively clusters of SMP nodes? Could the disparity here be due to fairly weak performance of Intel's SMP scheme?
"onward!" cried the copper man, little knowing brass corrupts...
It's not that bad. You can create a 4D network out of 100bT switches and get good performance out of really large networks. 8-port 100bT switches are about $250 these days, and that gives you 4 nodes to a switch, adding only $75 to the price of each computer. Hmm, if you don't want more than 8 hops between any two nodes, that gives you a cap of about sixteen thousand nodes (8**4 switches, times four nodes per switch).
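The node cap above is plain arithmetic, and a rough sketch makes it easy to try other sizes. The function names are made up for illustration, and the per-node figure here is only the bare switch price divided by four nodes, so it does not try to reproduce the $75 quoted in the post.

```python
def mesh_capacity(switches_per_dim, dims, nodes_per_switch):
    """Total nodes in a mesh with switches_per_dim switches along each of dims axes."""
    return switches_per_dim ** dims * nodes_per_switch


def switch_share_per_node(switch_price, nodes_per_switch):
    """Bare switch price divided evenly over the nodes hanging off that switch."""
    return switch_price / nodes_per_switch


if __name__ == "__main__":
    # Figures from the post: 8 switches per dimension, 4 dimensions, 4 nodes per switch.
    print(mesh_capacity(8, 4, 4))         # 16384
    print(switch_share_per_node(250, 4))  # 62.5
```

Dropping to a 3D layout with the same switches gives mesh_capacity(8, 3, 4), or 2048 nodes, which is about the size of the largest clusters rumored in this thread.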
By "pretty large"
Don't forget India. And Pakistan (don't know about Iraq) definitely has the intellectual resources to create supercomputers. Just not economically, and probably not of the same quality as their American counterparts. I don't think building a supercomputer is as complex an engineering task as building a nuclear bomb... correct me if I'm wrong...
In a word, "yes" to all of those questions. Clusters are well-suited to running the software the gummint's concerned about -- simulating nuclear physics -- and they're not sure of what to do about it. There's been a lot of talk, but the politicos just don't understand computer technology. It's like a bunch of MBA PHB's trying to figure out how to detect cycles in a linked list by committee.
Basically, the control freaks are fucked. Computer technology is simply not controllable. If America shuts off all computer exports, then Korea (or any of a hundred other countries) could easily start manufacturing low-end microprocessors and sell them to India et al.
Welcome to the precursor to technological singularity.
"This is the first day of the last day."
Yes, there is usually an upper limit to the number of machines you can effectively add to a cluster. Although, this is usually a pretty fuzzy limit for a few reasons.
First, there are limits that depend entirely on the characteristics of the cluster architecture you're building. For example, things like the maximum practical number of nodes you can put on the network you are using as an interconnect. Theoretically you can scale these things to infinity, but practically you begin to find out that you have too many hops in your network for the interconnect to be effective.
Second, the characteristics of the particular applications you are running on your cluster determine the maximum scalability. This is pretty application-dependent. For example, if you are running an application that uses software-driven distributed shared memory, then this application will usually scale up far less effectively than a message-passing application.
Or get a bunch of SBCs - you can put at least four 2-way SBC cards in a standard backplane in a 19" rackmount and stack them 6 or 10 high, giving you something like 80 CPUs in a standard network rack. 5 racks across is about 8 linear feet for 400 CPUs.
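The rack math above works out as follows. This is a quick sketch with a made-up function name, and it takes the post's figures (four cards per chassis, two CPUs per card) at face value.

```python
def cpus_per_rack(cards_per_chassis, cpus_per_card, chassis_per_rack):
    """CPUs in one rack of stacked SBC chassis."""
    return cards_per_chassis * cpus_per_card * chassis_per_rack


if __name__ == "__main__":
    tall = cpus_per_rack(4, 2, 10)   # stacked 10 high
    short = cpus_per_rack(4, 2, 6)   # stacked 6 high
    print(tall, short)               # 80 48
    print(5 * tall)                  # 400 CPUs across five racks
```

Note that the 80-CPU figure only holds for the 10-high stack; at 6 high a rack holds 48.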
Of course, on this machine the compute nodes aren't equipped with hard drives... :) Seriously, in a huge cluster like this, if a node fails, they will take it out, and may not even bother trying to fix it, I imagine.
A cluster runs a separate operating system on each node. This generally (again, this is hearsay) makes it much harder to maintain a cluster than a supercomputer (meaning one with one operating system). We purchased a small 8-node IBM SP2 computer six months ago, and still haven't figured out how to make it act like a single computer. :( Oh well.
ok, so a cluster has a separate copy of the os for each node, whereas the conventional supercomputer has a single os controlling all its nodes. that being the case, is it possible to take a supercomputer (perhaps the above mentioned t3e) and run it as a cluster, with a separate instance of the os for each node? i'm guessing that you wouldn't really want to do this, but is it possible? an anti-beowulf setup, if you will.
"onward!" cried the copper man, little knowing brass corrupts...
NERSC, for example, has recently purchased an IBM SP system which has two processors (or was it 4?) per node, with plans to upgrade to 16 processors per node.
The problem with SMP and clusters is that the message-passing software has to be smart enough to take advantage of the shared memory situation, and this can also complicate things when you try to optimize your code.
How many Windows NT machines rank in the top 53 of the world's fastest machines?
Werd.
Every time I see a photo of one of these clusters (assuming the Sandia photo is of the cluster they are using), it seems everyone has opted for full-sized boxes. It would seem they could cut down on rack space by 50% or more by going with a slim chassis. Go by any co-location and you can tell the newbies from the vets by who maximizes their shelf space.
re: Big-Ass Clusters
Fermilab has plans to build a 2000-node cluster in the near future but is putting off purchasing all the nodes until the last second to maximize their value.
re: Rackmounts
They're more expensive, and typically the machine rooms at large Beowulf installations have enough space for whatever they choose to use. It's not like Los Alamos has to pay for space at the colo when they add a new pile of Alphas.
Remember that what's inside of you doesn't matter because nobody can see it.
In a talk with someone from VA Linux, they said that they *possibly* have a client who is looking at setting up a 2600-node cluster..
.. umm... :)
Umm.. really fast Quake
IIRC, they're uniprocessor nodes
Christopher A. Bohn
cb
Oooh! What does this button do!?
How much can this type of computer scale up?
At least 2000, it seems (if somebody is trying to do it, then it must theoretically scale to that extent), but do we have a theoretical limit or something like that?
And are these computers mono-processors or SMP?
If Linux were to get a great improvement in SMP for 4+ CPUs, would it then be worth creating a cluster of SMP boxes, given the current price difference between SMP and non-SMP boxes? (I suppose if you build a 2000-node SMP cluster you must get a special price.)
"The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers." Bill Gates,
So it wouldn't be "distributed.net", but "joint research into highly parallelised, highly distributed encryption validation", and "SETI@Home" is actually "joint research into vastly parallelised radio interferometry, using test data from Arecibo".
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
There's a place on the webpage to get a user account if you have a suitable project to run. Hmmm...wonder if they'd consider distributed.net or SETI@home suitable...
Excuse my ignorance, but is there a conceivable limit to the number of nodes you can have on a cluster?
Just how big of a room does it take to house 2600 nodes?
This is only for employees and contractors of that location. They had an online form to fill out, but they were asking for your manager's authorization and all that...
I ate my tag line.
-=Ellis (D)25=-
I know that for what I do (pseudopotential plane wave calculations) one 100bT switch would be too slow to connect an 8 node cluster. As in, you'd be better off with all the memory in one computer, and forget parallelization.
Also, I think that typically you don't want more than two hops between nodes. Of course, it all depends on what you're doing. If you're doing Monte Carlo stuff, you could probably get by with 9600 baud modems if it were cheaper than Ethernet.
Actually, no, there is not really a limit. The only limit you would have is the bandwidth, but with the technologies coming out right now, high bandwidth will be like pocket lint...
Space consumption depends on what they use for their systems. If they use standard workstations then it will take up a lot of room, but if they use SBCs (single-board computers) the amount of room would decrease quite a bit.
I ate my tag line.
-=Ellis (D)25=-
The first Cray-2 cost about $30 million. Running well-optimized code, it churned out about 1 gigaflop. Supercomputers seem to hang at about the same price ($20-30M) but increase performance an order of magnitude or more per generation. This would put us in the range of teraflop machines now, building toward the petaflop for the same $30M. A single 450 MHz Pentium II can do about 70-100 Mflops depending on the exact operation, so ten to fifteen could do the same gigaflop as the Cray-2 (assuming embarrassingly parallel code). These computers are easily available for $1K in quantity. 150 should yield 10 GF at under $200K including network, and 1500 should yield 100 GF at under $2M. I think this should still look like 10X more cost effective than the massively parallel and vector/MPP machines. If you used dual processor systems, the cost/performance would be even better.
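For anyone who wants to check those numbers, here is a back-of-the-envelope sketch. It uses only the figures quoted above (70-100 Mflops and $1K per box, $30M and 1 gigaflop for the Cray-2), assumes perfectly parallel code, and leaves out network cost; the function names are made up.

```python
import math


def boxes_needed(target_mflops, mflops_per_box):
    """Boxes needed to reach a target rate, assuming perfect linear scaling."""
    return math.ceil(target_mflops / mflops_per_box)


def dollars_per_mflops(price, mflops):
    """Hardware dollars spent per Mflops of peak rate."""
    return price / mflops


if __name__ == "__main__":
    # One gigaflop (1000 Mflops) from 450 MHz Pentium II boxes.
    print(boxes_needed(1000, 100), boxes_needed(1000, 70))  # 10 15
    # 150 boxes at the pessimistic 70 Mflops each, in gigaflops.
    print(150 * 70 / 1000)                                  # 10.5
    # Dollars per Mflops: Cray-2 versus a $1K box at 70 Mflops.
    print(dollars_per_mflops(30_000_000, 1000))             # 30000.0
    print(round(dollars_per_mflops(1000, 70), 2))           # 14.29
```

The gap against the original Cray-2 is far bigger than 10X, but that compares a new PC to an old supercomputer. The 10X claim above is against current machines, which this sketch has no prices for.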
enough is too much
There will be 1U chassis for a variation of the Compaq DS10 computer. Pricing hasn't been determined, but it is a huge bang for buck. The main drawback of such size is lack of expandability - only 1GB RAM, one drive and one open PCI slot. The time is this summer, I think.
Hey, Samsung makes Alphas in S. Korea, AMD will make K7s in Dresden, Germany, Fujitsu makes Sparcs in Japan, Intel has loads of plants in places like Israel, Malaysia, Singapore, etc.
I wonder if this will make weenies go for more treaties. Ugh.
The URL is http://www.esd.ornl.gov/facilities/beowulf/
A brilliant epitaph for a Slashdotter would be:
HERE LIES JOE BLOW
LAST COMMENT!
--
Get your fresh, hot kernels right here!
Personally, I think that the biggest problem that people are going to face when creating super huge clusters (200+ machines) is not one of floor space, or heat dissipation. The problem is going to be with the networking of them. Sure, you can go to ATM, or gigabit ethernet, or ... but when it is absolutely critical that data gets to the next machine as fast as possible... and that the packet doesn't get lost somewhere along the way... And the whole reason (ok.. one of the main reasons anyway) for doing clustered supercomputers is because it is cheaper. When you start rack mounting them, and putting gigabit ethernet in them, and ... you are really starting to jack up the price of each node in your cluster.
As far as the question about how large you can go with them, if you use an int to determine which machine you are addressing, that puts a theoretical limit of more than 60,000 nodes.
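How big that limit is depends entirely on how wide the int is. A tiny sketch, with a made-up function name, shows the count of distinct node IDs for a few widths:

```python
def max_node_ids(bits):
    """Number of distinct node IDs an unsigned integer of the given width can hold."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (16, 32):
        print(bits, max_node_ids(bits))
    # 16 65536
    # 32 4294967296
```

A 16-bit int gives the "more than 60,000" figure; a 32-bit int gives roughly 4 billion.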
How large can a cluster be?
Short answer: it depends.
Long answer: it depends on the applications and the usage patterns.
(I'm assuming we're talking about practical limits here, not theoretical ones -- the theoretical limit is probably the address space of a cluster's message-passing interface (i.e. 4 billion nodes).)
Some applications -- the so-called "embarrassingly parallel" ones -- will scale with nearly no deviation from linear to any number of nodes, because they do loosely-coupled problems. (Which means the result of one part of the parallel computation does not depend on a result from some other parallel computation. The Mandelbrot set is a good example of this.)
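To make the Mandelbrot example concrete, here is a minimal sketch in which every row of the image is computed without looking at any other row. The grid size, iteration cap, and function names are arbitrary choices for illustration. The thread pool stands in for cluster nodes only to show that the work splits cleanly; threads in Python will not actually make this faster.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 40, 20, 50


def escape_time(c):
    """Iterations of z = z*z + c before |z| exceeds 2, capped at MAX_ITER."""
    z = 0j
    for i in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return MAX_ITER


def render_row(y):
    """Compute one image row; it needs nothing from any other row."""
    im = -1.0 + 2.0 * y / (HEIGHT - 1)
    return [escape_time(complex(-2.0 + 3.0 * x / (WIDTH - 1), im))
            for x in range(WIDTH)]


def render_serial():
    """All rows computed one after another on a single worker."""
    return [render_row(y) for y in range(HEIGHT)]


def render_parallel(workers=4):
    """Same rows handed out to a pool of workers; results come back in row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))


if __name__ == "__main__":
    # Same picture either way, because no row depends on another.
    print(render_serial() == render_parallel())
```

On a real cluster each row (or block of rows) would go to a different node, and the only communication is sending back the finished rows.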
In general, the more tightly-coupled the problem is, the harder it is to scale, as the amount of data that has to be exchanged pushes the limit of the interconnects. A 32-node cluster constructed on a hub will be faster for loosely-coupled programs than a 24-node cluster on a switch, which could beat the 32-node cluster on a tightly-coupled problem because of communications overhead in the 32-node cluster.
Usage patterns also determine the maximum useful size. If you're at a large lab like Sandia, you can reasonably expect a large number of jobs to be running concurrently, which essentially parallelizes the cluster -- running 6 tightly-coupled programs, each on their own hypercube interconnect, will complete faster than running the six in series, each with the whole cluster.
-_Quinn
Reality Maintenance Group, Silver City Construction Co., Ltd.
yes, I know I am ignorant, but what the heck is a GFLOP?
This sounds really promising, but how does the price/performance of a Linux cluster compare to other "real" supercomputers? For example, what is the price/performance of a VERY high end SGI, and what would it take (price wise) for a Linux cluster to match that. I've heard that Linux clusters cost considerably less, but I've never seen any hard statistics.
Forget about slim chassis; how about no chassis? Take a look at Beowulf on StrongARM boards for $2000. These folks are looking at building 6 StrongARM processors with RAM and the necessary "glue" onto a single PCI card. Since easily obtainable PCs have 3 PCI slots in them, you should be able to set up an 18 node beowulf cluster inside one box (the PC itself acts as the controller). Can you usefully cluster a bunch of these (a cluster of clusters)? I don't know, but it's interesting to think about.
Doug Loss
Don't articles like this one make you want to LAUGH when you read articles that talk about whether or not Linux SCALES as well as NT?