How to get 1.5 TeraFlops from Linux
Oak Ridge National Lab has purchased from SGI an Altix 3000 (flash movie). This article claims that:
SGI Altix 3000 is recognized as the first Linux cluster that scales up to 64 processors within each node and the first cluster ever to allow global shared memory access across nodes.
There is more here, here, and here.
You're better off using MOSIX. It'll allow more normal (i.e., not Beowulf-specific) applications to run across computers. I'd imagine that an openMosix setup (like the ones using the Knoppix boot CDs tailored to it) could make for a fairly powerful computing cluster very easily.
Do not look into laser with remaining eye.
here and here are probably good places to look.
An infinite number of monkeys will eventually come up with the complete works of
Just download ClusterKnoppix and knock yourself out. ; )
http://bofh.be/clusterknoppix/
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms,
Back in my days of parallel programming (read: 1998) on Beowulf clusters I used Fortran and C. The trick to making your program "parallel" is to use special programming libraries that spawn instances of your program across the cluster and let them communicate with each other. The libraries I used were PVM and MPI.
At that time they were working on a Java implementation, but I don't know what happened with that.
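For anyone who hasn't seen the pattern, here is a rough sketch of the "spawn instances and let them communicate" idea. It is not PVM or MPI: Python threads and a standard-library queue stand in for the cluster processes and the network, and the names `worker`, `parallel_sum`, `rank`, and `size` are made up for illustration.

```python
import queue
import threading


def worker(rank, size, data, outbox):
    """One 'instance' of the program: sum its own slice, send the result to the root."""
    chunk = data[rank::size]          # each rank takes every size-th element
    outbox.put((rank, sum(chunk)))    # the 'message' sent back to the root


def parallel_sum(data, size=4):
    """Spawn `size` workers and combine their partial sums on the root."""
    inbox = queue.Queue()             # stand-in for the interconnect
    workers = [
        threading.Thread(target=worker, args=(rank, size, data, inbox))
        for rank in range(size)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    # Receive exactly one message per worker and reduce them to one value.
    partials = dict(inbox.get() for _ in range(size))
    return sum(partials.values())


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

A real PVM or MPI program has the same shape: every instance knows its own rank, works on its share of the data, and sends results to whoever combines them.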
HPC Wire had an article that I referenced in my journal on 6/30.
It's an interesting machine. I'd love to get one to play with. I'm sure our benchmarkers will have some even more interesting comments once they're done. Expect teething problems, folks. Systems of this size and complexity take time to break in.
Do you know why the road less traveled by is littered with the bones of the unwary?
The machine has 256 processors for 1.5 teraflops, not 64.
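A quick sanity check on those numbers, as a small sketch: divide the quoted 1.5 teraflops by the processor count. Only the 1.5 TFlops figure and the 64 and 256 counts come from this thread; the function name is made up.

```python
def gflops_per_processor(total_teraflops, processors):
    """Average peak GFlops each processor must deliver to reach the quoted total."""
    return total_teraflops * 1000.0 / processors


if __name__ == "__main__":
    for procs in (64, 256):
        per_cpu = gflops_per_processor(1.5, procs)
        print(procs, "processors ->", round(per_cpu, 2), "GFlops each")
```

With 256 processors each one needs just under 6 GFlops of peak; with 64 it would need over 23 GFlops each.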
Um..
I always liked Irix, and everyone I ever talked to who used Irix liked it. The GUI is about 500x more usable than the horrors of OpenWindows or CDE on Solaris.. bleugh.
"Hey! Unless this is a nude love-in, get the hell off my property!!"
Another poster mentioned MOSIX, but openMosix is probably a better bet. It's released under the GPL, and is a combination of kernel-patch and user-space tools. Once you get these installed on each node, and connected via ethernet (all with networking set up of course... IP addresses etc) you should have yourself a cluster.
What I find amazing is that the Cell is supposed to run at up to a TeraFlop when it reaches production. Compare that to a 64-processor Linux cluster.
That's 64 processors per node.
We just got ours installed yesterday. I'm still installing software and am starting benchmarks. It's only the deskside version (12 CPUs, 24GB RAM, 1TB disk), but still more powerful than the 4-CPU SGI Origins that we have been using.
It is the first one that the regional SGI reps had actually installed, but since it is almost exactly the same as the MIPS-based Origin 3000 servers (with the exception of the obviously different Itanium 2 CPUs and supporting chipsets), they ran into almost no problems getting it online. I have also been surprised at how many commercial codes have already been ported to the platform.
The main reasons we purchased this machine are the ease of parallelizing code and the floating point performance of the Itanium 2 CPUs. We're computational materials engineers, and the less time we spend optimizing codes to keep the nodes of a cluster busy and to minimize I/O bottlenecks, the more time we have to concentrate on the theoretical issues.
It runs Red Hat 7.2 with some tweaks by SGI called SGI ProPack. The ProPack modifications come on separate CDs, with the proprietary software on separate CDs from the open source software. So far, from the command line, everything works just like my PC. It's kind of strange running Linux on a >$100K machine, but it sure beats dealing with the annoying differences between IRIX and Linux. Now to see if it performs as well as we expect...
I'm 100% sure it's very much a hardware thing. SGI has a long history of building very large hardware shared memory machines (e.g., the Origin line - O2000 and O3000) based on proprietary MIPS processors. They still make those machines, but market pressures forced them to also develop and sell Intel-based shared memory machines. I'll be curious to see how much of SGI's extensive work on IRIX to let it scale efficiently to thousands of processors will bubble out to their Linux systems.
Take a look at OSCAR. We built a nine node cluster out of IBM e-servers using it. It was really quite straightforward.
As far as languages go, you'll need an MPI library (like MPICH, or LAM/MPI, which is also a runtime environment), but the actual code is usually written in C, C++, or Fortran. BTW, OSCAR comes with MPICH and LAM/MPI.
Or try Quantix, which is derived from ClusterKnoppix. It's a self-booting ISO with data analysis software, based on Knoppix. This is geared more toward scientific apps; it doesn't come with OpenOffice, etc., which ClusterKnoppix does.
Eat Lamb, 1 million coyotes can't be wrong
Main product page: http://www.sgi.com/servers/altix/
and here there are a bunch of PDFs to download: http://www.sgi.com/servers/altix/datasheets.html
for example:
SGI Altix 3000 Family of Servers and Superclusters (172K)
Linux Software for the SGI Altix 3000 Family (50K)
SGI Technology Solutions for Linux (48K)
This Like That - fun with words!
The thing about Mosix is the costs of process migration.
First, you have to understand process migration. In a mosix cluster, a running process can be moved, lock stock and barrel, from one CPU to another. All that is left behind is a "stub" process that forwards all file I/O across the network to the new location. So, if the program was a 3D raytracer that had the source description file and the output file open, after migration all file accesses to those files would be forwarded over the network to the stub (since you cannot guarantee that the remote machine can access those files in the same way.)
Now, this is great for programs that do little file I/O but lots of computing (for example the ray tracer I just described.)
However, the process must be set up on the local node first, then migrated. If the process has a 3 GB core image (i.e., is taking up 3 GB of memory), then 3 GB of stuff has to be shoved across the wire while the program is frozen. Thus, migrating a process is expensive.
Now, if you have a bunch of long-running compute bound processes this is a net win (for example, rendering a movie might benefit). But something like building the Linux kernel won't benefit, since what you have is a bunch of short running, high I/O jobs.
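To put rough numbers on the migration cost, here is a back-of-the-envelope sketch. It is not how MOSIX actually decides when to migrate; the 100 Mbit/s link speed, the `speedup` factor, and both function names are assumptions for illustration, and it ignores protocol overhead and the ongoing cost of file I/O forwarded through the stub.

```python
def migration_seconds(image_bytes, link_bits_per_second):
    """Time to push a process image across the wire while the process is frozen."""
    return image_bytes * 8 / link_bits_per_second


def worth_migrating(image_bytes, link_bits_per_second,
                    remaining_cpu_seconds, speedup=2.0):
    """Crude break-even test: migrate only if the time saved beats the freeze time.

    `speedup` is a hypothetical ratio of how much faster the job would run
    on an idle remote node than on the busy local one.
    """
    saved = remaining_cpu_seconds - remaining_cpu_seconds / speedup
    return saved > migration_seconds(image_bytes, link_bits_per_second)


if __name__ == "__main__":
    three_gb = 3 * 10**9          # the 3 GB core image from the example above
    fast_ethernet = 100 * 10**6   # assumed 100 Mbit/s link
    print(migration_seconds(three_gb, fast_ethernet), "seconds frozen")
    print("hour-long render:", worth_migrating(three_gb, fast_ethernet, 3600))
    print("one-minute compile:", worth_migrating(three_gb, fast_ethernet, 60))
```

Under these assumptions the 3 GB image is frozen for about four minutes, which an hour-long render easily earns back and a one-minute compile job never does.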
We have a Mosix cluster at work. I tried using it as a compile farm, and the results were disappointing. Not surprising - I was NOT using it for what it was designed for.
However, if we can ever get the FPGA synthesis tools running natively under Linux, the hardware types are going to be quite happy....
www.eFax.com are spammers
the billion dollar machine
What the hell kind of Origin 3800 do YOU have? ISTR ours (512-proc) was roughly $10M.
This SGI isn't a Beowulf cluster. Traditionally, Beowulf clusters are clusters that use COTS hardware, don't have global shared memory, etc. Lots of people in the cluster community won't even call clusters of workstations Beowulf clusters if they have a high-speed network like Myrinet. We just call ours a Linux cluster, a cluster, a distributed memory supercomputer...
You can program your Beowulf cluster in C or Fortran using a free MPI (Message Passing Interface) implementation called MPICH. I have even seen a scaled-down version of MPI for Python (which requires MPICH to use). So start learning MPI. MPI-1 has 129 functions, but you can write most programs using a small subset of these calls. If you don't want to pay much money I suggest using C, because g77 sucks and there are no free Fortran 90 compilers. We use the Portland Group Fortran and C compilers as well as the Intel Fortran Compiler. I think we are going to switch completely to Intel Fortran and C.
Why do you want to use a Beowulf cluster if you have no clue about them or parallel programming in general? Just because they are 'cool'? A Beowulf cluster is very useful for modeling or data mining, but unless you are running models that take days/weeks/months on your workstation you won't need the processing power of a cluster.
Right now we have someone running a model on 76 processors that takes about 9 hours to finish a 1-year cycle in the model. They want to run the model for a total of 50 years. This is a model of the Pacific Ocean where they introduce carbon into the ocean and then see what effect that has on temperature change, etc. After they get their 50-year results for the Pacific they want to do a global simulation. This is the real use of Beowulf clusters. They aren't for load balancing web servers, playing Quake, or any of the other things people post about every time there is an article about supercomputers/Beowulf clusters.
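To get a feel for the scale of that ocean model run, the arithmetic can be sketched like this. The 9 hours, 50 years, and 76 processors are the figures quoted above; the function name is made up, and it assumes every model year takes the same wall-clock time.

```python
def total_run(hours_per_model_year, model_years, processors):
    """Return (wall-clock hours, wall-clock days, processor-hours) for a full run."""
    wall_hours = hours_per_model_year * model_years
    return wall_hours, wall_hours / 24.0, wall_hours * processors


if __name__ == "__main__":
    hours, days, cpu_hours = total_run(9, 50, 76)
    print(hours, "hours =", days, "days on the cluster;", cpu_hours, "processor-hours")
```

That works out to 450 hours, or close to 19 days of continuous cluster time, for the Pacific run alone.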
The speed-up you will get really depends on your application. The more communication is necessary, the smaller the speed-up will be. If you have a 5-node cluster with 2 processors per node, the theoretical maximum speed-up is 10, but you will never achieve that because of parallel overhead (MPI calls, communication time, etc.). If you want more information on parallel programming and cluster computing, send me a private message telling me what you hope to do with your cluster.
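One common way to estimate how far below the theoretical maximum you will land is Amdahl's law. Using it here is an assumption, not something the post specifies: it treats overhead as a fixed fraction of the work that cannot be parallelized, which is a simplification since real communication cost usually grows with the number of nodes. The function name and the 95% figure are illustrative.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speed-up when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # 5 nodes x 2 processors = 10 processors, as in the example above.
    for fraction in (1.0, 0.95, 0.80):
        print(fraction, "->", round(amdahl_speedup(fraction, 10), 2))
```

Only a perfectly parallel program reaches the full factor of 10; if even 5% of the run time is serial or spent communicating, the estimate drops to a bit under 7.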
SGI Origin 3800 cluster
Just to nitpick... most Origins are not clusters but rather one large single machine. It is possible to partition the machine in firmware and have each partition talk to the others over the existing (and now unused) NUMAlink interconnects... but it's much faster (even for plain MPI code) to just run the beast as one large single machine.
You can find a list here. For most computations and most hardware, you are probably still better off with MPI or PVM rather than shared memory.
Note also that there are several high speed interconnects for Linux clusters available from many different vendors, including InfiniBand, Gigabit Ethernet, FireWire, and Myrinet.
SGI systems (Origin and Altix) have massive interconnects that hold together the single-system architecture. They're fast for shmem-type shared memory apps, but also for MPI. In fact, SGI keeps tweaking their MPI implementation with every release of IRIX and the Linux ProPack, even though MPI is not the "best" way to run apps on their systems.
The interconnects in most Origin and Altix systems are 3.2 gigaBYTES per second with extremely low latency. I don't know about InfiniBand, but I do know that GigE is only 125 MB/sec with really high latency... FireWire 800 is 100 MB/sec with better latency... and I think the best version of Myrinet is 500 MB/sec (4 gigabit) with about 5x the latency of SGI's 'numalink'.
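The bit/byte conversions behind those figures can be sketched as follows. The bandwidth numbers are the ones quoted above (not independently checked), the names are made up, and the calculation ignores latency and protocol overhead, which matter a lot for small messages.

```python
# Peak bandwidths in megabytes per second, as quoted in the post above.
LINKS_MB_PER_SEC = {
    "NUMAlink": 3200.0,      # 3.2 gigabytes/sec
    "Myrinet": 500.0,        # 4 gigabit/sec
    "GigE": 125.0,           # 1 gigabit/sec
    "FireWire 800": 100.0,   # 800 megabit/sec
}


def mbit_to_mbyte(mbit_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_sec / 8.0


def transfer_seconds(megabytes, link):
    """Best-case time to move `megabytes` over `link`, ignoring latency."""
    return megabytes / LINKS_MB_PER_SEC[link]


if __name__ == "__main__":
    for name in LINKS_MB_PER_SEC:
        print(name, "moves 1 GB in", transfer_seconds(1000, name), "seconds")
```

On these peak numbers NUMAlink is roughly 25x GigE and about 6x Myrinet, before latency is even considered.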
The smaller Altix systems (and supposedly, future Altix and Origin systems this fall) can be double cabled or can run at a higher speed... for 6.4 GB/sec per interconnect.
Also, the Altix can handle up to 64 processors per single machine / single node (or 128 with a very beta set of patches). The cluster in the article is actually four Altix systems, each with 64 processors. The Origin 3800/3900 can handle 512 processors per node (or 1024 with a special "XXL" IRIX kernel).
Great stuff for I/O intensive tasks, but massive overkill for 3d rendering or calculating pi.
http://oss.sgi.com/projects/sgi_propack/
The machine has 1024 procs
There are two 1,024-processor Origin 3000's in the world. One is in Eagan, Minnesota. The other is at NASA. The NASA machine is called chapman. It has 256 GB of RAM. Not terabytes.
How do I know this? Because I'm sitting here looking at lomax right now.
You're a... whaddya call it. Liar.
This does not seem to have been mentioned before:
Niflheim at Danish University of Technology
Oh, I found a little page on the sara website where it is clarified (can't get onto the intranet anymore, else I'd have mirrored some better specs). Anyway, more about TERAS here.