SGI Installing Beowulf

Cool by Mountaineer · 1999-08-12 18:58 · Score: 1

Cool, very cool...

Re:Cool by Dr.+Zymotic · 1999-08-12 18:59 · Score: 1

Extremely cool.
Re:Cool by Anonymous Coward · 1999-08-12 19:28 · Score: 0

Actually, with all that hardware, it should be extremely hot :-)
Re:Cool by psamara · 1999-08-13 02:45 · Score: 1

SGI is playing the game very well to steer clear of trouble. Their sudden move to Linux lets them (and their products) remain in the supercomputing arena for years to come. Furthermore, they win a huge amount of goodwill and publicity from the open source community. Yes, the move is extremely cool.
Re:Cool by Mountaineer · 1999-08-15 18:55 · Score: 1

Heh, yeah. I want one of the new SGI A/C's for the server room at work.

Way to go OSC by mangino · 1999-08-12 19:02 · Score: 1

As a former employee of a company with offices at the Super Computer Center (The Greater Columbus FreeNet), way to go!

I'm glad to see them get some new hardware to play quake on! Do you think this means they'll give away the Onyx2 machines they have now?

--
Mike Mangino
mmangino@acm.org

I thought it was only one processor per box. by NoNsense · 1999-08-12 19:05 · Score: 1

I don't know, but when I think of a Beowulf cluster, I think of separate machines. I guess 32 quad processor boxes are legal :) "... Yeah, and Microsloth is going to own the world .... oops."

--
So there.

Re:I thought it was only one processor per box. by wall · 1999-08-12 23:57 · Score: 1

Nah, they come in up to 8-processor varieties, though I think that the 4 is out and the 8 is coming soon.

da'fly

Couldn't find a price for those 1400L's by anthonyclark · 1999-08-12 19:10 · Score: 1

V. Cool.

But, how much are those SGI servers going to cost? The PHB's consider SGI to be a "name" that they would consider installing and they assume an SGI would cost the same as a Sun.

Also, Why put a groovy case on a machine that's going to sit in a darkened server room?

--
----- Documentation is worth it just to be able to answer all your mail with 'RTFM' - Alan Cox.

Re:Couldn't find a price for those 1400L's by airfabio · 1999-08-13 01:35 · Score: 1

Quad Xeon 500 (512 KB), 256 megs:
around $14000 using Linux
Re:Couldn't find a price for those 1400L's by cweber · 1999-08-13 03:05 · Score: 1

Also, Why put a groovy case on a machine that's going to sit in a darkened server room?

Our supercomputers sit in a glasshouse in full view. You have to give the public something to look at or the $$$ won't come rolling in anymore. :) I'm really glad our servers aren't ugly gray or beige boxes! You want a very fast computer to look the part.

Keep dreaming! :) by Troy+Baer · 1999-08-12 19:11 · Score: 1

The new OSC Beowulf is going to be used as a compute engine -- I'm not even sure if the 1400Ls have graphics cards! I don't see our SGI MIPS graphics hardware going anywhere soon. :)

--Troy Baer, Systems Engineer, OSC Science & Technology Support

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac

Quad Xeon? by WasterDave · 1999-08-12 19:14 · Score: 1

I feel like one of us is vaguely missing the point...

Surely the idea with clusters (and SMP, for that matter) is to use truckloads of cheap stuff to get a better result than one big expensive thing. Presumably 1024 P2-450's (or G3's) would prove to be cheaper than 128 Quad Xeon's and a damn sight faster into the deal. Probably cheaper than a T3E too.

Anyone feel like doing the sums? You may assume a BOOTP server obviating the need for 1024 hard drives, if you want.

Dave

--
I write a blog now, you should be afraid.

Re:Quad Xeon? by NoNsense · 1999-08-12 19:20 · Score: 1

That was the point of my post. I don't think I articulated it well, but I was trying to say that those expensive quad boxes are not what I had in mind for a Beowulf. I expected a whole bunch of PII machines ... true, more bang per server, but more buck, too.

--
So there.
Re:Quad Xeon? by Anonymous Coward · 1999-08-12 19:45 · Score: 1

Wouldn't the slowest parts of the box be I/O, which we'll throw to the side for now, and then the 100 Mb intercommunication links?

I would guess that you are giving it a performance kick by keeping the CPU's close together where they can talk to ram and other CPU's on a medium that's an order of magnitude faster than 100 base network. I'm not thinking that you'd get a 4x increase over single proc boxes, but I'd guess that you'd get more bang for the buck, since you can (if coded for it) keep processes near a group of cpu's and don't have to send info over the slower network. This is where clusters get interesting, how can I keep my part of the program near the closest RAM and CPU's, so I don't have to go over slow links; and still get the full potential of the box.

tsuiter@midusa.net
Re:Quad Xeon? by Anonymous Coward · 1999-08-12 19:45 · Score: 0

I think SGI went with Quad Xeons because that's the only machine they have with proper Linux support. Sure the Visual Workstations "support" Linux but they are not cost effective for a cluster of this sort.
Re:Quad Xeon? by SteveRyan · 1999-08-12 20:35 · Score: 1

You're absolutely right; there is no way they got the best bang-for-the-buck in this deal. It's not entirely irrational, though, as there are a couple of other factors besides price that weigh in here. Although 1024 P2-450's are cheaper to purchase, they still need space to put the boxes in, power to run them, network connections, hard drives, someplace to run all those cables and cooling capacity for all those CPUs and RAM and power supplies.

Also, the quad Xeon's, while not a great deal, are probably not as bad as it looks because each node requires a case, power supply, a minimum of one network interface and probably more for a big cluster, a port or more in the Ethernet switch, and a hard drive. The price tag on all of these does add up, even when you're buying cheap hardware, let alone when it comes from sgi.

I think a lot of (most?) Beowulf clusters do have local storage on each node, and for good reason. The network is usually a bottleneck already, and it costs a lot more than the price of a hard drive to upgrade beyond fast ethernet (per node, that is).
Re:Quad Xeon? by Anonymous Coward · 1999-08-13 01:01 · Score: 0
1. That's 32 Quad Xeons, for a grand total of 128 processors, not 128 quads for 512 CPUs. (Damn cheaper, isn't it? ;)
2. The slowest part of a cluster (from what I could figure out) is the network link. So, by getting multi-processor machines, you group CPUs in a way that ensures better communication between some of 'em.
3. SGI parts are especially far from cheap. So, by reducing the # of hard drives, network cards and motherboards, it might just sum up a tad cheaper than even more single (or 'simply' dual-CPUed) boxes.
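
For anyone who does feel like doing the sums, here is a rough Python sketch. It uses only two figures quoted in this thread: 32 quad boxes, and roughly $14000 for a quad Xeon 500 with 256 megs. The function names are made up for illustration, and the real nodes (more RAM, extra network cards) would presumably cost more, so treat the totals as a floor rather than a quote.

```python
# Rough sums using only figures quoted in this thread.
# Assumption: every node costs the ~$14000 quoted for a 256 MB quad Xeon 500;
# a node with more RAM and extra interconnect hardware would cost more.

QUAD_NODE_PRICE = 14_000   # USD, thread figure for one quad Xeon box
CPUS_PER_NODE = 4
NODES = 32


def cluster_totals(nodes=NODES, cpus_per_node=CPUS_PER_NODE, node_price=QUAD_NODE_PRICE):
    """Return (total CPUs, total price, price per CPU) for a cluster of SMP nodes."""
    total_cpus = nodes * cpus_per_node
    total_price = nodes * node_price
    return total_cpus, total_price, total_price / total_cpus


def single_cpu_break_even(node_price=QUAD_NODE_PRICE, cpus_per_node=CPUS_PER_NODE):
    """Price below which a complete single-CPU box is cheaper per CPU than the quad box."""
    return node_price / cpus_per_node


if __name__ == "__main__":
    cpus, price, per_cpu = cluster_totals()
    print(f"{cpus} CPUs for ${price:,} (${per_cpu:,.0f} per CPU)")
    print(f"break-even price for a single-CPU box: ${single_cpu_break_even():,.0f}")
```

The break-even number only covers purchase price per CPU. It says nothing about the switch ports, floor space, power, and cooling that other posters bring up, which is where a pile of single-CPU boxes gets more expensive.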

Good.. by pen · 1999-08-12 19:20 · Score: 1

Good, now I can get those extra fps in Quake to set me apart from the lamers.

---

Three Cheers for Ohio.. by uncleFester · 1999-08-12 19:31 · Score: 1

I've been meaning to stroll across the street and genuflect before the Cray sometime.. looks like I may have an even better reason to drop by there.

Maybe I should go back to school... wait, wtf am I saying?!?

-fester(licious)

--
-'fester

Re:Three Cheers for Ohio.. by Anonymous Coward · 1999-08-13 15:09 · Score: 0

So, you work at the gas station... or is it the health food store? I know, you're a welder!
Re:Three Cheers for Ohio.. by Anonymous Coward · 1999-08-13 21:50 · Score: 0

heheh. or, could be that they work at "university systems" as it was called when i was there ;) that's not *that* far away :)

Beowulf by John+Campbell · 1999-08-12 19:47 · Score: 2

Wow, I bet those would make a really great... oh... never mind. They did.

Re:Beowulf by Overt+Coward · 1999-08-13 01:39 · Score: 1

But the real question is: does it run Li...
Oh... never mind...

Quad Xeon... yum. by The+Silicon+Sorceror · 1999-08-12 19:51 · Score: 1

A Quad Xeon, for those of you who don't know, is a Studly Machine. Linus Torvalds has one (at home I think) - when it was delivered the truck guy got confused because his house didn't have a loading bay. It compiles the kernel (heck, it compiles ASS) in under 60 seconds.
128 of those boxes in one place, in one Beowulf cluster... I think I'm going to have an orgasm.

--

~ Give me 101 plastic soldiers, and I will conquer the world.

(Offtopic) Cray 1 vs. modern PC by mtm · 1999-08-12 20:01 · Score: 1

I was just wondering how a modern PC (dual or quad CPU) compares with the Cray 1 (1976 vintage)? I mean, just how fast is today's PC at doing the kinds of things that the Cray was designed for? Can I finally tell people that I've got the power of a Cray by my desk?

just curious,
mike

Re:(Offtopic) Cray 1 vs. modern PC by Anonymous Coward · 1999-08-12 20:39 · Score: 0

Sure you can say it, but it doesn't really mean anything. The Cray was a vector processor, and your PC is a scalar processor. They're very different things. You probably don't do a whole lot of math on your PC that would vectorize well; most people don't, and that's why the supercomputer market is a niche market. I forget the MIPS/FLOPS numbers of the old Cray, but if you just want to look at those then your PC is definitely faster. It's not too hard to buy a PC nowadays that has a FLOPS rating as high as the modern Crays. That isn't a rating of how much work it can get done, though.

A Cray T90, for example, is a totally different computer than what you use. You might be surprised to find there is no cache at all in such a machine. That's because all memory in the machine is cache. The big vector iron, when crunching vectorized jobs, can deliver an amazing amount of work, and that's really what it's all about. However, most jobs (models, etc.) can be run on other, cheaper hardware with good performance. Some things still run best on a vector processor, and that's why there is still a niche supercomputer market.

AHHHH! This totally SUCKS! by Anonymous Coward · 1999-08-13 00:03 · Score: 0

Dammit, this sucks. They always want everything installed early in the damn morning. 4:30am SUCKS! (guess who has to install the 7 or so racks?) Ah well, at least I'll be able to play with it before anyone else. hehehe. Hope I can convince engineering to let us use XFS.

some sgi guy near OSC

Re:Quad Xeon? (missed point?) by bored · 1999-08-13 00:04 · Score: 1

It is entirely possible that the problem set is a better fit for the Xeons and therefore gains significant performance increases (above the standard Xeon 3%-5%) with them over the standard PII/PIII. This could be due to:

1. A piece of code that needs access to >512 megs ram.

2. A piece of code that performs significantly better with >512k of L2 cache.

Although I am not a big fan of saying xeon=server, there are cases where the Xeon solves a problem MUCH better than the standard PIII. I hope that the appropriate research has been done in this case, as opposed to the salesman walking in and saying "you need..."

Spinning off Cray, still in the supercomputer biz by Greg@RageNet · 1999-08-13 00:13 · Score: 1

Well, it looks like SGI is staying in the supercomputer business after all. I would not be surprised if they started selling Beowulf clusters as their 'supercomputer offering' in the next year or so. They believe (probably rightly) that the old monolithic idea of the supercomputer is dead and the new supercomputer should be large clusters of workstations. Much cheaper to build a Beowulf cluster than a Cray.

SGI seems to be betting a large chunk of their business on Linux. Is it a shrewd business move, or done out of desperation? Only the future shall tell. And, btw, at current market capitalization RedHat has become the second largest UNIX vendor (i.e. primary business based on UNIX). They could buy SGI and SCO at this point and still have some left over.

--
Slashdot, would a spell-checker for posting be too much to ask? It's not rocket science!

I bet this is why they spun off Cray by shaldannon · 1999-08-13 00:20 · Score: 1

I bet they're gonna start selling clustered PC's as supercomputers and leave the new Cray unit to fend for itself...this sounds like a demo model more than anything....

Who am I?
Why am I here?
Where is the chocolate?

--

What is your Slash Rating?

Re:I bet this is why they spun off Cray by Vryl · 1999-08-13 23:01 · Score: 1

Prolly . . . I guess they took the bits they needed (NUMA) and sold the rest

-- Reverend Vryl
Re:I bet this is why they spun off Cray by patk@ieee · 1999-08-15 23:48 · Score: 1

You bet? You cannot speculate about a company that has shown such on-par marketing techniques up till now! (um, heh.) By the way, no more cube logo for you kids. Use the "sgi", it's for you.
You know, I wish you folks would come back to reality with this Beowulf crap. It may be a cheap way to weld a bunch of little bargain basement peecees together, but no Beowulf abomination is going to do a specific Cray job like a Cray. I am assuming none of you folks that are making these claims have any experience whatsoever in the Science and Engineering disciplines on data mining, visualization, financial forecasting, and aerospace that Cray machines are so well suited for. Cray's market share may be shrinking at the moment, but that is only due to management geniuses and bandwagon jumpers that think this whole linux thing is going to save their company somehow.
Linux is nowhere near the stability, scalability, and performance of a polished proprietary unix like UNICOS. I cringe when I see you kids want to install your precious linux on a T3E so it will be "really fast". A T3E is running the best possible operating system for its architecture, as is the T90 and SV1 (know what those are?). No linux flavor is going to do these professional machines justice, let alone make them BETTER.
Keep it in the basement of 14-25 year old translucent skinned pseudohacker types living at home, because BitchX and Gimp are the only apps those kids are going to run. Oh, and have mom make me an extra grilled cheese sandwich.
-Patrick Krekelberg
Institute of Electrical and Electronics Engineers

Some numbers... by Troy+Baer · 1999-08-13 00:31 · Score: 1

Cray-1: 1 CPU, 80 MFLOPS theoretical peak, about 40-60 MFLOPS on real-world code.

Cray YMP-8: 8 CPUs, 250 MFLOPS theoretical peak per CPU, about 150-200 MFLOPS on real-world code.

Cray T94: 4 CPUs, 1800 MFLOPS theoretical peak per CPU, about 450-900 MFLOPS per CPU on real-world code.

Cray T3E600/LC-136: 136 300MHz DEC Alpha 21164s (8 OS/command + 128 applications), 600 MFLOPS theoretical peak per CPU, about 90-150 MFLOPS per CPU on real-world code.

SGI/Cray Origin 2000: 32 300MHz MIPS R12ks, 600 MFLOPS theoretical peak per CPU, about 160-200 MFLOPS per CPU on real-world code.

OSC Mk.1 Beowulf node: 2 400MHz Pentium IIs, 400 MFLOPS theoretical peak per CPU, about 80-100 MFLOPS per CPU on real-world code.

(Assuming 64-bit floating point throughout; the Intel chips don't suffer as much as you might think from this, as they do all FP internally with 80-bit precision and truncate to 32 or 64 bits.)

If you've ever wondered why people pay big bucks for Cray vector machines, let me sum it up in three words: sustainable memory bandwidth. The T90 machines can sustain on the order of 13 GB/s memory bandwidth, and the J90/SV1 machines can sustain about 5 GB/s. By comparison, most workstation and PC systems can sustain about 300-500 MB/s on a good day with a tail wind.
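
To put the figures above side by side, here is a small Python sketch that turns each peak/sustained pair into a fraction of peak and compares the memory bandwidth numbers. It is only arithmetic on the figures quoted in this post; the names are illustrative, and it assumes the Y-MP sustained range is per CPU like the other entries, which the list does not actually say.

```python
# Per-CPU figures in MFLOPS, copied from the list above:
# (theoretical peak, sustained low, sustained high).
# Assumption: the Y-MP sustained range is per CPU like the others.
SYSTEMS = {
    "Cray-1": (80, 40, 60),
    "Cray YMP-8": (250, 150, 200),
    "Cray T94": (1800, 450, 900),
    "Cray T3E600/LC-136": (600, 90, 150),
    "SGI/Cray Origin 2000": (600, 160, 200),
    "OSC Mk.1 Beowulf node": (400, 80, 100),
}


def sustained_fraction(peak, low, high):
    """Return the (low, high) fraction of theoretical peak reached on real-world code."""
    return low / peak, high / peak


def bandwidth_ratio(fast_mb_s, slow_mb_s):
    """How many times more sustained memory bandwidth the faster machine has."""
    return fast_mb_s / slow_mb_s


if __name__ == "__main__":
    for name, figures in SYSTEMS.items():
        low, high = sustained_fraction(*figures)
        print(f"{name:24s} {low:4.0%} to {high:4.0%} of peak")
    # T90 at about 13 GB/s versus a PC at 300-500 MB/s (1 GB taken as 1000 MB here).
    best, worst = bandwidth_ratio(13000, 500), bandwidth_ratio(13000, 300)
    print(f"T90 vs PC memory bandwidth: roughly {best:.0f}x to {worst:.0f}x")
```

Fraction of peak is a crude yardstick, since "real-world code" differs from machine to machine, but it makes the gap between theoretical and sustained numbers easier to see.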

--Troy Baer

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac

Re:Some numbers... by Anonymous Coward · 1999-08-13 00:54 · Score: 0

some of the 4 CPU J-250(?) Cray machines are now going really cheap - i saw one for $11,000 last week. anyone wanna buy some crays ?
Re:Some numbers... by Anonymous Coward · 1999-08-13 21:54 · Score: 0

oh dear god the ultra enterprise blows....

Uhm HELLO? (knocking sound) McFly! by wall · 1999-08-13 00:43 · Score: 1

What do you guys think an Origin is? It's a cluster with a ragingly-fast interconnect, but with shared memory as well. SGI has been going this way for a while. Now that all the Sun weenies have had to eat their hats about all the shit they've slung about cc:numa, they too have cluster-style machines in the wings, and are pushing them HARD.

SMP/S^2MP/MPP and our little beowulfs are getting closer together every day. (ever notice that myrinet is 100% like the Cray t3e torus?)

I expect that clusters will continue to grow, and companies like whatever sgi becomes will push multi-cpu clusters on linux.

da'fly

Re:Uhm HELLO? (knocking sound) McFly! by Anonymous Coward · 1999-08-13 00:57 · Score: 0

err..origin 2000's have 64 CPUs per machine and fast interconnects...we have a 192 processor origin 2000 cluster here at BU...3 machines that basically do PVM jobs (same as beowulf) but with 64 processors per machine.

Wow. I rollerblade past that building every day. by Anonymous Coward · 1999-08-13 01:03 · Score: 0

Wish I had a key. Do they need to hire any programmers?

Design Decisions, System Details by Troy+Baer · 1999-08-13 01:29 · Score: 1

Several people have commented that OSC might be coming out on the short end of the stick on this deal. We disagree, and here's why:

1. Reliability: The most common hardware failures in cluster systems are hard drives and power supplies. The 1400Ls have redundant power supplies, and smaller numbers of nodes will generally have a lower component failure rate.

2. Migration Path: The "cluster of SMPs" model is in use in several of the largest computers currently in use, including the ASCI Blue Mountain and Blue Pacific machines. Users should be able to develop code on our cluster and then move their code to these much larger platforms with little additional effort.

3. Application Needs: We have several users with applications which need in excess of a GB of RAM and several GBs of temporary disk storage per node. Many of these applications are "legacy codes" from the vector machines which are difficult to parallelize using message passing approaches, but which can be parallelized relatively easily on SMP systems using compiler directives. The architecture we have selected allows this as well as multilevel parallel programming, using message passing between nodes and compiler directives within a node.

4. Flexibility: We have users in virtually all scientific and engineering disciplines, most of whom (50-75%) write their own code. We need a cluster architecture which can accommodate a mix of serial, SMP parallel, and MPP parallel applications.
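
The multilevel idea in point 3 can be sketched as a toy in plain Python. This is not how the real codes would be written: they would use message passing between nodes and compiler directives inside a node. Here both levels are faked with list slicing and a standard-library thread pool, purely to show the two-level decomposition, and every name is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = 32          # compute nodes in the cluster
CPUS_PER_NODE = 4   # processors sharing memory inside one node


def split(items, parts):
    """Split a sequence into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def node_work(node_chunk, cpus=CPUS_PER_NODE):
    """Inner level: share one node's chunk among its CPUs.

    Threads stand in for what compiler directives would do on a real SMP node.
    """
    with ThreadPoolExecutor(max_workers=cpus) as pool:
        partials = pool.map(lambda part: sum(x * x for x in part),
                            split(node_chunk, cpus))
        return sum(partials)


def cluster_sum_of_squares(data, nodes=NODES):
    """Outer level: one chunk per node (a stand-in for message passing),
    then combine the per-node results, like a reduction."""
    return sum(node_work(chunk) for chunk in split(data, nodes))


if __name__ == "__main__":
    data = list(range(10_000))
    print(cluster_sum_of_squares(data))
```

The outer loop runs serially here, so the sketch shows the structure only, not any speedup. The point is that data crosses the "network" once per node rather than once per CPU.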

There are also drawbacks to this approach, primarily related to memory bandwidth and the added cost for the quad processor nodes.

Here is a slightly more detailed description of the new OSC Beowulf than was in the press release:

32 compute nodes plus a front end node, each with:
    4 Pentium III Xeon 500MHz processors
    2 GB RAM
    18 GB SCSI-UW disk
    1 Fast Ethernet interface
    2 Myrinet interfaces
8 16-port Myrinet switches
Various software:
    SGI's modified Red Hat distribution
    PBS queuing system
    Portland Group and KAI compilers
    AMBER (computational chemistry)
    Gaussian 98 (computational chemistry)
    Cactus (computational physics)
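
For a sense of scale, the per-node figures above can be totalled with a few lines of Python. This is just multiplication over the 32 compute nodes (the front end node is left out). The peak number is an assumption, not a measured or published figure: it supposes one floating point operation per clock cycle per CPU.

```python
# Per-node figures from the description above; compute nodes only.
COMPUTE_NODES = 32
CPUS_PER_NODE = 4
RAM_GB_PER_NODE = 2
DISK_GB_PER_NODE = 18
MYRINET_PER_NODE = 2
SWITCHES = 8
PORTS_PER_SWITCH = 16
CLOCK_MHZ = 500
# Assumption (not from the description): one floating point operation
# per clock cycle per CPU.
FLOPS_PER_CYCLE = 1


def compute_totals():
    """Aggregate the per-node figures over all compute nodes."""
    cpus = COMPUTE_NODES * CPUS_PER_NODE
    return {
        "cpus": cpus,
        "ram_gb": COMPUTE_NODES * RAM_GB_PER_NODE,
        "disk_gb": COMPUTE_NODES * DISK_GB_PER_NODE,
        "myrinet_interfaces": COMPUTE_NODES * MYRINET_PER_NODE,
        "myrinet_switch_ports": SWITCHES * PORTS_PER_SWITCH,
        "peak_gflops": cpus * CLOCK_MHZ * FLOPS_PER_CYCLE / 1000,
    }


if __name__ == "__main__":
    for key, value in compute_totals().items():
        print(f"{key:22s} {value}")
```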

We will be posting further details at http://oscinfo.osc.edu/hardware/ as things develop.

Sincerely,

--Troy Baer and Doug Johnson, OSC

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac

Well, if you're an OSU student... by Troy+Baer · 1999-08-13 01:53 · Score: 1

Do they need to hire any programmers?

We often hire OSU students as programmers, gofers, experimental test subjects (oops, did I say that out loud? :), etc. It's a great place to work. Stop by at the beginning of fall quarter or watch the OSU "green sheets".

--Troy

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac

RAM limit a problem? by Anonymous Coward · 1999-08-13 02:35 · Score: 0

Linus has stated that he doesn't want to 'kludge' the kernel to support more memory than 32 bits can address (only in 32-bit CPUs, of course). This creates a limit of 2 gigabytes of addressable memory (Not 100% sure here).

Will this limit any of your applications? What was your reasoning behind not choosing a similar solution from an Alpha vendor? (64-bit CPU, much more addresses)

Re:RAM limit a problem? by Troy+Baer · 1999-08-13 03:04 · Score: 1

Linus has stated that he doesn't want to 'kludge' the kernel to support more memory than 32 bits can address (only in 32-bit CPUs, of course). This creates a limit of 2 gigabytes of addressable memory (Not 100% sure here).

Actually, it's theoretically possible to address up to 4 GB. SGI's "bigmem" patch makes it possible to do this, but I think Linus has rejected the patch for the mainstream kernel. That wouldn't stop SGI from shipping kernels built with this patch, so long as they also distribute the source for it.
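
The 4 GB figure is just the size of a 32-bit address space, which a couple of lines of Python can confirm. The helper name is made up for illustration; the 2 GB figure in the question matches 31 usable address bits, and this sketch takes no position on why a given kernel would end up there.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses that fit in the given number of bits."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (31, 32, 64):
        print(f"{bits} bits: {addressable_bytes(bits) // GIB} GiB")
```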

Will this limit any of your applications?

Not really. Our J90/SV1 and Origin systems both have 16GB of memory, but I don't think we allow a single job to use more than 2-4 GB of memory. Large files are actually more of a problem than large memory; Gaussian can generate 20+ GB output files. Thankfully, support of large files on 32-bit platforms seems to be coming along.

What was your reasoning behind not choosing a similar solution from an Alpha vendor? (64-bit CPU, much more addresses)

We have a fair amount of Alpha experience in-house already; several years ago we had a classroom cluster of DEC Alpha workstations, and we currently have a Cray T3E which is also Alpha-based. Our main concern with the Alpha was software availability, especially compilers. The Compaq Digital Fortran beta is a good start, but I would like to see the Portland Group and KAI compilers for Alpha Linux as well.

--Troy

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac

Cheap != Inexpensive was Re:Quad Xeon? by LL · 1999-08-13 03:47 · Score: 1

> Surely the idea with clusters (and SMP, for that matter) is to use truckloads of cheap stuff to get a better result than one big expensive thing

Just because a car is cheap doesn't mean it is not going to cost you in terms of repairs and shoddy engineering down the road. What people forget is that price is a function of economies of scale and functionality. Would you base a car purchase purely on the number of cylinders and torque? Similarly, the computer purchasing experts look at the overall system, the balance between components, cost of parts, availability of drivers and software, cost of ownership and human learning curve. Personally I think too many people without a clue are hoodwinked by fast talking salesdroids and bells and whistles. If you look at the hard evidence, you might even find that MIPS chips (in the Origins) give better sustained real-world application performance for a certain class of problems than even the highly touted Alphas. Also, if you look at, say, SGI's O2, you find that it is designed for easy rack-mount maintenance. All this comes at a premium.

Sheesh ... the PC makes a barely adequate car for roaming around the information backlanes but some people want big gruntly trucks for industrial computing. Companies pay real money for smarts who can evaluate the difference between the two.

LL

Good points, You guys have GREAT systems by Anonymous Coward · 1999-08-13 04:52 · Score: 0

You guys must be wearing poo-eating grins:

1. Vector supercomputer (Cray T94)
2. MPP supercomputer (Cray T3E)
3. ccNUMA supercomputer (Origin 2000)
4. Adding ANOTHER MPP (Linux Beowulf)

It ALMOST seems redundant to buy the Beowulf, unless you have an inordinate amount of parallel intensive jobs.

Keep us updated on how the Beowulf compares with the T3E.....

Right in my backyard! by Improv · 1999-08-13 05:23 · Score: 1

Neat. I wonder if they need extra sysadmins...

--
For every problem, there is at least one solution that is simple, neat, and wrong.

46 comments