SGI Installing Beowulf
anaZ writes "SGI today announced that it will install the company's first 128-processor Linux(R) cluster at the Ohio Supercomputer Center." It seems SGI will be using 32 Quad Xeon boxes (1400L). I wonder (and hope!) whether it is just the first of many.
Cool, very cool...
As a former employee of a company with offices at the Supercomputer Center (The Greater Columbus FreeNet): way to go!
I'm glad to see them get some new hardware to play Quake on! Do you think this means they'll give away the Onyx2 machines they have now?
Mike Mangino
mmangino@acm.org
I don't know, but when I think of a Beowulf cluster, I think of separate machines. I guess 32 quad-processor boxes are legal :) "... Yeah, and Microsloth is going to own the world .... oops."
So there.
V. Cool.
But how much are those SGI servers going to cost? The PHBs consider SGI a "name" they would be willing to install, and they assume an SGI would cost the same as a Sun.
Also, why put a groovy case on a machine that's going to sit in a darkened server room?
----- Documentation is worth it just to be able to answer all your mail with 'RTFM' - Alan Cox.
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
I feel like one of us is vaguely missing the point...
Surely the idea with clusters (and SMP, for that matter) is to use truckloads of cheap stuff to get a better result than one big expensive thing. Presumably 1024 P2-450s (or G3s) would prove to be cheaper than 32 quad Xeons (128 CPUs) and a damn sight faster into the deal. Probably cheaper than a T3E too.
Anyone feel like doing the sums? You may assume a BOOTP server obviating the need for 1024 hard drives, if you want.
Dave
I write a blog now, you should be afraid.
Good, now I can get those extra fps in Quake to set me apart from the lamers.
---
I've been meaning to stroll across the street and genuflect before the Cray sometime.. looks like I may have an even better reason to drop by there.
Maybe I should go back to school... wait, wtf am I saying?!?
-fester(licious)
-'fester
Wow, I bet those would make a really great... oh... never mind. They did.
A Quad Xeon, for those of you who don't know, is a Studly Machine. Linus Torvalds has one (at home I think) - when it was delivered the truck guy got confused because his house didn't have a loading bay. It compiles the kernel (heck, it compiles ASS) in under 60 seconds.
32 of those boxes (128 CPUs) in one place, in one Beowulf cluster... I think I'm going to have an orgasm.
~ Give me 101 plastic soldiers, and I will conquer the world.
I was just wondering how a modern PC (dual or quad CPU) compares with the Cray 1 (1976 vintage)? I mean, just how fast is today's PC at doing the kinds of things that the Cray was designed for? Can I finally tell people that I've got the power of a Cray by my desk?
just curious,
mike
Dammit, this sucks. They always want everything installed early in the damn morning. 4:30am SUCKS! (Guess who has to install the 7 or so racks?) Ah well, at least I'll be able to play with it before anyone else. hehehe. Hope I can convince engineering to let us use XFS.
some SGI guy near OSC
It is entirely possible that the problem set is a better fit for the Xeons and therefore gains significant performance increases (above the standard Xeon 3%-5%) with them over the standard PII/PIII. This could be due to:
1. A piece of code that needs access to >512 megs of RAM.
2. A piece of code that performs significantly better with >512k of L2 cache.
Although I am not a big fan of saying Xeon=server, there are cases where the Xeon solves a problem MUCH better than the standard PIII. I hope that the appropriate research has been done in this case, as opposed to the salesman walking in and saying "you need..."
Well, it looks like SGI is staying in the supercomputer business after all. I would not be surprised if they started selling Beowulf clusters as their 'supercomputer offering' in the next year or so. They believe (probably rightly) that the old monolithic idea of the supercomputer is dead and the new supercomputer should be large clusters of workstations. It is much cheaper to build a Beowulf cluster than a Cray.
SGI seems to be betting a large chunk of their business on Linux. Is it a shrewd business move, or done out of desperation? Only the future shall tell. And, btw, at current market capitalization RedHat has become the second largest UNIX vendor (i.e. primary business based on UNIX). They could buy SGI and SCO at this point and still have some left over.
Slashdot, would a spell-checker for posting be too much to ask? It's not rocket science!
I bet they're gonna start selling clustered PC's as supercomputers and leave the new Cray unit to fend for itself...this sounds like a demo model more than anything....
Who am I?
Why am I here?
Where is the chocolate?
What is your Slash Rating?
Cray YMP-8: 8 CPUs, 250 MFLOPS theoretical peak per CPU, about 150-200 MFLOPS on real-world code.
Cray T94: 4 CPUs, 1800 MFLOPS theoretical peak per CPU, about 450-900 MFLOPS per CPU on real-world code.
Cray T3E600/LC-136: 136 300MHz DEC Alpha 21164s (8 OS/command + 128 applications), 600 MFLOPS theoretical peak per CPU, about 90-150 MFLOPS per CPU on real-world code.
SGI/Cray Origin 2000: 32 300MHz MIPS R12ks, 600 MFLOPS theoretical peak per CPU, about 160-200 MFLOPS per CPU on real-world code.
OSC Mk.1 Beowulf node: 2 400MHz Pentium IIs, 400 MFLOPS theoretical peak per CPU, about 80-100 MFLOPS per CPU on real-world code.
(Assuming 64-bit floating point throughout; the Intel chips don't suffer as much as you might think from this, as they do all FP internally with 80-bit precision and truncate to 32 or 64 bits.)
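For anyone who wants to turn the per-CPU numbers above into whole-machine figures, here is a rough sketch. It only multiplies the listed values by the CPU counts, so it assumes perfect scaling across CPUs, which real codes do not get. It also assumes the Y-MP real-world figure is per CPU like the other rows, and it counts only the 128 application CPUs on the T3E. The names `SYSTEMS`, `aggregate`, and `summarize` are made up for this illustration.

```python
# (CPUs, peak MFLOPS per CPU, (low, high) real-world MFLOPS per CPU),
# copied from the list above.
SYSTEMS = {
    "Cray YMP-8": (8, 250, (150, 200)),            # real-world assumed per CPU
    "Cray T94": (4, 1800, (450, 900)),
    "Cray T3E600/LC-136": (128, 600, (90, 150)),   # application CPUs only
    "SGI/Cray Origin 2000": (32, 600, (160, 200)),
    "OSC Mk.1 Beowulf node": (2, 400, (80, 100)),
}


def aggregate(cpus, peak, real):
    """Scale per-CPU figures to the whole machine, assuming perfect scaling."""
    low, high = real
    return {
        "peak": cpus * peak,
        "real": (cpus * low, cpus * high),
        "fraction_of_peak": (low / peak, high / peak),
    }


def summarize(systems=SYSTEMS):
    """Return whole-machine figures for every system in the table."""
    return {name: aggregate(*spec) for name, spec in systems.items()}


if __name__ == "__main__":
    for name, row in summarize().items():
        low, high = row["real"]
        f_low, f_high = row["fraction_of_peak"]
        print(f"{name}: peak {row['peak']} MFLOPS, "
              f"real-world {low}-{high} MFLOPS "
              f"({f_low:.0%}-{f_high:.0%} of peak)")
```

The fraction-of-peak column is the interesting part: it shows how far each architecture falls from its theoretical number on real code, which is where memory bandwidth comes in.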
If you've ever wondered why people pay big bucks for Cray vector machines, let me sum it up in three words: sustainable memory bandwidth. The T90 machines can sustain on the order of 13 GB/s memory bandwidth, and the J90/SV1 machines can sustain about 5 GB/s. By comparison, most workstation and PC systems can sustain about 300-500 MB/s on a good day with a tail wind.
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
What do you guys think an Origin is? It's a cluster with a ragingly fast interconnect, but with shared memory as well. SGI has been going this way for a while. Now that all the Sun weenies have had to eat their hats about all the shit they've slung about ccNUMA, they too have cluster-style machines in the wings, and are pushing them HARD.
SMP/S^2MP/MPP and our little Beowulfs are getting closer together every day. (Ever notice that Myrinet is 100% like the Cray T3E torus?) I expect that clusters will continue to grow, and companies like whatever SGI becomes will push multi-CPU clusters on Linux.
da'fly
Wish I had a key. Do they need to hire any programmers?
short end of the stick on this deal. We disagree, and here's why:
1. Reliability: The most common hardware failures in cluster systems are hard drives and power supplies. The 1400Ls have redundant power supplies, and smaller numbers of nodes will generally have a lower component failure rate.
2. Migration Path: The "cluster of SMPs" model is in use in several of the largest computers currently in use, including the ASCI Blue Mountain and Blue Pacific machines. Users should be able to develop code on our cluster and then move their code to these much larger platforms with little additional effort.
3. Application Needs: We have several users with applications which need in excess of a GB of RAM and several GBs of temporary disk storage per node. Many of these applications are "legacy codes" from the vector machines which are difficult to parallelize using message passing approaches, but which can be parallelized relatively easily on SMP systems using compiler directives. The architecture we have selected allows this as well as multilevel parallel programming, using message passing between nodes and compiler directives within a node.
4. Flexibility: We have users in virtually all scientific and engineering disciplines, most of whom (50-75%) write their own code. We need a cluster architecture which can accommodate a mix of serial, SMP parallel, and MPP parallel applications.
There are also drawbacks to this approach, primarily related to memory bandwidth and the added cost for the quad processor nodes.
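To make the "multilevel" idea in point 3 a bit more concrete, here is a minimal sketch of the work decomposition only. It is plain Python and does no actual message passing or threading: the outer split stands in for what would be distributed between nodes (for example with something like MPI, which is an assumption here, not something stated above), and the inner split stands in for what compiler directives would share among the CPUs of one SMP node. The defaults of 32 nodes and 4 CPUs per node match the cluster described in the story; the function names are hypothetical.

```python
def split_range(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous, near-equal (start, stop) chunks."""
    base, extra = divmod(n_items, n_parts)
    chunks = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks each take one leftover item.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def two_level_plan(n_items, n_nodes=32, cpus_per_node=4):
    """Map each node to the list of (start, stop) chunks its CPUs would work on.

    Outer level: one contiguous block per node (the message-passing level).
    Inner level: that block divided among the node's CPUs (the SMP level).
    """
    plan = {}
    for node, (lo, hi) in enumerate(split_range(n_items, n_nodes)):
        plan[node] = [(lo + a, lo + b)
                      for a, b in split_range(hi - lo, cpus_per_node)]
    return plan


if __name__ == "__main__":
    plan = two_level_plan(1000)
    print(plan[0])   # chunks for the four CPUs of node 0
    print(plan[31])  # chunks for the four CPUs of the last node
```

A legacy code that is hard to restructure for message passing could in principle use only the inner level on a single node, which is the case point 3 describes.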
Here is a slightly more detailed description of the new OSC Beowulf than was in the press release:
32 compute nodes plus a front end node, each with
4 Pentium III Xeon 500MHz processors
2 GB RAM
18 GB SCSI-UW disk
1 Fast Ethernet interface
2 Myrinet interfaces
8 16-port Myrinet switches
various software:
SGI's modified Red Hat distribution
PBS queuing system
Portland Group and KAI compilers
AMBER (computational chemistry)
Gaussian 98 (computational chemistry)
Cactus (computational physics)
We will be posting further details at http://oscinfo.osc.edu/hardware/ as things develop.
Sincerely,
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Do they need to hire any programmers?
We often hire OSU students as programmers, gofers, experimental test subjects (oops, did I say that out loud? :), etc. It's a great place to work. Stop by at the beginning of fall quarter or watch the OSU "green sheets".
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Linus has stated that he doesn't want to 'kludge' the kernel to support more memory than 32 bits can address (only on 32-bit CPUs, of course). This creates a limit of 2 gigabytes of addressable memory (not 100% sure here).
Will this limit any of your applications? What was your reasoning behind not choosing a similar solution from an Alpha vendor? (64-bit CPU, many more addresses)
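For reference, the raw arithmetic behind the 32-bit question can be checked in a couple of lines. This is only the size of a flat address space for a given pointer width; how a particular kernel divides that space between user programs and itself, and how much physical RAM it will actually use, is a separate policy question that this sketch does not model. The helper names are made up for illustration.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given pointer width."""
    return 2 ** address_bits


def fits(n_bytes, address_bits):
    """True if n_bytes can be covered by a flat address space of that width."""
    return n_bytes <= addressable_bytes(address_bits)


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
    print(addressable_bytes(31) // GIB, "GiB with 31 usable bits")
    print(fits(2 * GIB, 32))  # the 2 GB of RAM per node in this cluster
```

So a 2 GB figure corresponds to half of what 32 bits can address, not to the full 32-bit range.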
Just because a car is cheap doesn't mean it is not going to cost you in terms of repairs and shoddy engineering down the road. What people forget is that price is a function of economies of scale and functionality. Would you base a car purchase purely on the number of cylinders and torque? Similarly, the computer purchasing experts look at the overall system, the balance between components, cost of parts, availability of drivers and software, cost of ownership and human learning curve. Personally I think too many people without a clue are hoodwinked by fast-talking salesdroids and bells and whistles. If you look at the hard evidence, you might even find that MIPS chips (in the Origins) give better sustained real-world application performance for a certain class of problems than even the highly touted Alphas. Also, if you look at, say, SGI's O2, you find that it is designed for easy rack-mount maintenance. All this comes at a premium.
Sheesh
LL
You guys must be wearing poo-eating grins:
1. Vector supercomputer (Cray T94)
2. MPP supercomputer (Cray T3E)
3. ccNUMA supercomputer (Origin 2000)
4. Adding ANOTHER MPP (Linux Beowulf)
It ALMOST seems redundant to buy the Beowulf, unless you have an inordinate number of parallel-intensive jobs.
Keep us updated on how the Beowulf compares with the T3E.....
Neat. I wonder if they need extra sysadmins...
For every problem, there is at least one solution that is simple, neat, and wrong.