High-Performance Linux Clustering
An anonymous reader writes "High Performance Computing (HPC) has become easier, and two reasons are the adoption of open source software concepts and the introduction and refinement of clustering technology. This first of two articles discusses the types of clusters available, uses for those clusters, reasons clusters have become popular for HPC, some fundamentals of HPC, and the role of Linux in HPC."
*ducks*
With Linux and other freely available open source software components for clustering and improvements in commodity hardware, the situation now is quite different. You can build powerful clusters with a very small budget and keep adding extra nodes based on need.
Yeah, I'd like to build one but I'm not sure what I'd use it for. Does that mean I'm a geek?
Bradley Holt
But does it run...
Imagine a Beow...
Ahem. My apologies.
How is this article news? Expect to see regular articles that link to TLDP and *NIX man pages soon on slashdot.
from the beowolf-jokes-deserve-redundant-mods dept.
;-)
I've heard that Beowulf clusters can offer some level of redundancy. Maybe this is just Slashdot moderation imitating life.
What else can we learn from Slashdot moderation, I wonder?
Do you like German cars?
We spent $849,000 on an Itanium cluster and have recently found ourselves SOL since it's a dying architecture.
You can't even run Java on them.
If you "get" pointers add me as a friend (116)!
Clusters are very easy to implement today because there is a lot of software, like OpenMosix, that can configure itself and connect to cluster nodes.
http://www.michel.eti.br
> lynx -source http://www-128.ibm.com/developerworks/linux/library/l-cluster1/ | grep -c Beowulf
0
Just curious: why are you SOL? You have the cluster, Linux will run on it, and you presumably need it to do calculations? So you have the processor power now and you will still have that processor power. Tomorrow of course something better may come along, but that's always the case.
ibook. hive.
xgrid, baby.
i've already computed the meaning of life, with a few of my buddies.
Jokes aside, when people say Linux cluster, do they usually mean Beowulf? Or are there other clusters, and how do they compare? How difficult is it to set up a Beowulf cluster?
EvilCON - Made Famous by
Okay, so I'd really enjoy trying something like the clustered model, just for academic kicks, but a relevant question comes to mind, at least for me.
:)
Where do people get the commodity systems cheap enough to be able to play around with this? I hardly want to spend two thousand bucks on some old P2s just to play around. Anyone have some hot tips where you can find real cheap (dare I dream... free) commodity systems to build a low-end cluster for kicks?
Also, I'm a Windows guy by trade. Will making a Linux cluster make me instantly cool?
From everything I've seen, MOSIX is having some issues right now. Unfortunately, MOSIX is one of the easiest, most flexible ways to set up an HPC, and ever since they forked, development has been slow. I did research about 2 months ago to look into setting up a small MOSIX cluster with a few computers. My main goal was to get my feet wet in setting up a cluster using a few desktop and laptop computers. I figured that setting up a cluster with my Athlon 64 x2, Athlon 64 3500+, and a few laptops would speed up compile times by quite a bit. But, it appears that the 2.6 version of MOSIX is still beta and won't support the kernel I need for my Athlon 64 x2 (versions before 2.6.9 don't support powernow with the x2, and also tend to be flaky). So, I have the choice of running a cluster with slower PC's, or waiting for better support. If you look at the year on some of those whitepapers, only one was written this year, and I'd be willing to bet they are describing how to use MOSIX with the 2.4 kernel, not 2.6. I finally gave up on the idea, as running the latest kernel is more important to me.
You can't even run Java on them.
What do you mean? I thought 1.4.2 and up had support for Itanium. Check this white paper (search for Itanium). Are their claims false, or are you running an older version of the JRE?
I'm wanting to build one of these, but I really don't need it. Time may change that.
DMCA, Hollings, Palladium. What might have sounded like paranoia is now common sense.
For some very good information on F/OSS based clustering, check out aggregate.org. They have really neat ideas that are reasonably well documented and freely implementable/usable. I built a little cluster (AFAPI on a WAPERS switch) with them for my high school senior project, and it was a great experience.
Though MPPs are kind of like clusters, and the boundary between the two is vague, I think there's definitely a distinction. In many MPPs, nodes share access to memory, just at a performance penalty. Often the scientific binary is written using a message-passing tool like MPI, but the OS is often run with direct memory access. Definitely from a systems-administration point of view, an MPP is different from a cluster. In an MPP you don't have 4000 root hard drives and 4000 power supplies to replace when they break. An MPP may be like a (fast) cluster from the programmer's point of view, but they are a lot simpler to deploy and manage. (Blue Gene, XT3, Altix)
I also contest some of the distinctions drawn about vector processor systems. The two vector systems currently on the market, the Cray X1 and the NEC SX-8, are clusters. Each node just happens to be a vector SMP. The Earth Simulator is a 640-node cluster of 8-way SMP boxes, where each of the processors in the SMP is a vector CPU. However, the predominant programming method even on these boxes is explicit message passing like MPI. Co-array Fortran and Unified Parallel C are faster, but slow to catch on.
Good summary of the common case though.
That's what to do to have your article /.ed: use Google Sets and make sure you include at least half a dozen on the first paragraph...
Uncopyrightable: The longest word you can write without repeating a letter.
That's the J2SE not the JRE.
You cannot run Java apps on Itanium.
If you "get" pointers add me as a friend (116)!
Zero CPU implementations of this.
But their links could at least have mentioned OSCAR http://oscar.openclustergroup.org/ or my personal favorite, ROCKS http://www.rocksclusters.org/, as these are more prevalent than xCat systems.
Personally, I like Rocks, as I ran three parallel architectures (i386/AMD64/IA64), on the same based distribution, just with each tuned to their particular processor. Comes with SGE and Myrinet support out of the box, and there are Rolls, i.e. custom software assemblages, for OpenPBS, for those who prefer it, as well as PVFS. It's easy to set up, and easy to administer, as the nodes are presumed to be interchangeable and disposable. When you reboot a node, it's obliterated and a fresh OS and supplementary package repository are laid down on a clean disk. No questions about version skew.
They now have a custom roll to help you build a visualization wall, but I never had a chance to try that one. (try convincing your boss that you want 4 digital projectors and a big room to play with)
The downside to the above distributions is that they presume batch-queue environments, which is appropriate for most of my work, but less so for many people trying to simulate owning an SMP, without paying SMP prices.
Other people assure me that the current version of OSCAR is solid as well, but they seem to lag in the multiple architecture support area (Itanium is always behind), and don't currently support AMD64 natively. On the other hand, they build on top of several RedHatish linuces, as opposed to Rocks where you get CentOS (RHEL), period.
the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
Some guy on an HP forum asks how to get Java code to run in his Web browser on Linux Itanium. Shock and awe follows as he's told you can't run an applet on his multi-thousand dollar 64-bit workstation.
Link
If you "get" pointers add me as a friend (116)!
Rocks has a great system for making high-performance clusters from similar machines. A Rocks cluster consists of a front-end ("master") node and a bunch of compute nodes (and I think special-purpose nodes).
The master gets a full Linux (RedHat-based) install. It's an NFS/DHCP/Kickstart server for the compute nodes, and runs whatever other services you want the compute nodes to use. The master has two network cards and acts as a firewall (NAT optional).
The compute nodes boot via DHCP and Kickstart, downloading their kernel and whatever other OS files you want to their local disk. You decide how much NFS or local disk to use.
Job queueing is handled by, e.g., Sun Grid Engine (an Open Source queueing package) or some other queueing software.
Here's the neat thing: to make a change to a compute node setup, you change the Kickstart config and reboot all the compute nodes (as they finish whatever queued work they're doing, or immediately if you want). That makes the sysadmin's life easy, while still maintaining the speed of having the OS on the local disk.
Raise your children as if you were teaching them to raise your grandchildren, because you are.
I, for one, welcome our HPC Linux clustering overlords.
You can't even run Java on them.
:*)
And you would want to run a low performance non-scalable application development base on them why?
Or you just purchased the HPC with hopes of getting good performance out of JAVA?
This is like saying, "We heated the syrup to 400 degrees F. so it would come out of the jar faster, but now realize this won't work cause the jar keeps breaking."
Who in the heck would be trying to use something like JAVA on a HPC in the first place, and WHY?
Do you realize how much performance you are losing by not picking a better suited development platform? Or even one that doesn't make Solitaire chunk on a 3 GHz P4.
How do I create a database of 8 billion records with 100k size each?
Slashdot = Sarcasm
I completely agree, but you'd be surprised how many scientists (computer scientists even!) will complain that XYZ algorithm is too computationally intensive...and then you find out they implemented it in Java.
It's like "DUH!" C or Fortran. Pick one. Or C++, even. But Not Java.
Of course, when you broach the topic, you will hear things like 'C is old. Fortran is older!' Wtf cares? It's F A S T.
Favorite
I would wager you've never used Java. I say that not as an insult, but because you simply have yet to realize that a change made back in 1996 made the average piece of Java code run as fast as the average piece of C/C++ code.
... suggesting that java performance is catching up to or even pulling ahead of gcc at least.
---
Five composite benchmarks listed below show that modern Java has acceptable performance, being nearly equal to (and in many cases faster than) C/C++ across a number of benchmarks.
1. Numerical Kernels
Benchmarking Java against C and Fortran for Scientific Applications
Mark Bull, Lorna Smith, Lindsay Pottage, Robin Freeman,
EPCC, University of Edinburgh (2001).
The authors test some real numerical codes (FFT, Matrix factorization, SOR, fluid solver, N-body) on several architectures and compilers. On Intel they found that the Java performance was very reasonable compared to C (e.g, 20% slower), and that Java was faster than at least one C compiler (KAI compiler on Linux).
The authors conclude, "On Intel Pentium hardware, especially with Linux, the performance gap is small enough to be of little or no concern to programmers."
2. More numerical methods: SciMark2 scores
R. F. Boisvert, J. Moreira, M. Philippsen, R. Pozo,
Java and Numerical Computing,
Computing in Science & Engineering, 3(2):18-24, Mar.-Apr., 2001.
SciMark includes a number of numerical codes. On a PIII/500, MFlops (higher is better):
ibm jdk 1.3.0              84.5
linux2.2 gcc (2.9x) -O6    87.1
3. Still more numerical methods
From the book Object-Oriented Implementations of Numerical Methods by Didier Besset (MorganKaufmann, 2001):
Operation                          Units   C     Smalltalk   Java
Polynomial 10th degree             msec.   1.1   27.7        9.0
Neville Interpolation (20 points)  msec.   0.9   11.0        0.8
LUP matrix inversion (100 x 100)   sec.    3.9   22.9        1.0
4. Microbenchmarks (cache effects considered)
Several years ago these benchmarks showed java performance at the time to be somewhere in the middle of C compiler performance - faster than the worst C compilers, slower than the best. These are "microbenchmarks", but they do have the advantage that they were run across a number of different problem sizes and thus the results are not reflecting a lucky cache interaction (see more details on this issue in the next section).
These benchmarks were updated with a more recent java (1.4) and gcc (3.2), using full optimization (gcc -O3 -mcpu=pentiumpro -fexpensive-optimizations -fschedule-insns2...). This time java is faster than C in the majority of the tests, by a factor of more than 2 in some cases...
These tests were mostly integer (except for an FFT).
5. Microbenchmarks (cache effects not considered)
In January 2004 OSNews.com posted an article, Nine Language Performance Round-up: Benchmarking Math & File I-O. These
If you "get" pointers add me as a friend (116)!
You could be the best programmer in the world in FORTRAN or C but don't think that because of that you understand Java. The vast majority of complainers and those who knock Java haven't written a line of code in Java ever in their life. Their experience may be with an applet or two in the browser, if that.
:)
That being said, go read this article and then report back. I doubt you'll post because it's hard to dispute raw facts from an unbiased research team
---------------
Performance of Java versus C++
J.P.Lewis and Ulrich Neumann
Computer Graphics and Immersive Technology Lab
University of Southern California
Jan. 2003
updated 2004
This article surveys a number of benchmarks and finds that Java performance on numerical code is comparable to that of C++, with hints that Java's relative performance is continuing to improve. We then describe clear theoretical reasons why these benchmark results should be expected.
http://www.idiom.com/~zilla/Computer/javaCbenchmark.html
If you "get" pointers add me as a friend (116)!
For those interested, there is a new website on clusters called ClusterMonkey. It just got started and has plenty of good free content (and more is coming).
HPC for Primates. Read Cluster Monkey
Anyone remember Transmeta? Well check out what they do now!
http://orionmulti.com/
[Donald's] work in making the "piles of PCs" approach to high performance computing a reality with Beowulf has been responsible for vastly expanding the construction and use of massively parallel systems. Now, virtually any high school - never mind college - can afford to construct a system on which students can learn and apply advanced numerical methods.
In retrospect, however, it would seem that the obvious cost benefits of Beowulf very nearly killed the development and use of large SMP and vector processing systems in the US. My understanding of the situation is this:
* Before Beowulf, academics had a very hard time getting time on hideously expensive HPC systems.
* When Beowulf started to prove itself, particularly with embarrassingly parallel problems using MPI, those academics who happened to sit on DARPA review panels pushed hard to choke off funding for other HPC architectures, promising that they could make distributed memory parallel systems all singing, all dancing, and cheap(er).
* They couldn't really deliver, but in the meantime, Federal dollars for large shared memory and vector processing systems vanished, and the product lines and/or vendors with it.... at least in the US.
* Eight years later, only Fujitsu and NEC make truly advanced vector systems [top500.org], and Cray is only now crawling back out of the muck to deliver a new product. Evidently someone near the Beltway needs a better vector machine, and Congress ain't paying for anything made across the pond.
Cutting to the chase, did [Donald Becker] advance a "political" stand among [his] peers within the public-funded HPC community, or [was he] just trying to get some work done with the budget available at NASA?
Luke, help me take this mask off
I'm sorry, but you're wrong.
Java performance depends entirely on the JVM. If the JVM sucks, then the Java program runs slow. The JVM may be good on PIV x86, but what about Power series? What about Itanium? Clusters don't often use x86's because they consume lots of power and require lots of cooling. And new computing architectures require a new JVM! So when the Cell processors are available for scientific computing, C and Fortran compilers will be written to take advantage of their amazing capabilities. How quickly do you think a JVM will be available??
I read the website you cited, and a lot of the reasons behind its 'Java is faster than C!' claim aren't true. Most of the reasons it gives for why run-time compilation is faster are complete BS. A good Fortran/C compiler will do the _exact same thing_. The website doesn't convince me. It has too few benchmarks and does not provide the code it uses. Not impressed.
So, for scientific computing, Java is still not the answer.
And I'm not alone. If Java were faster, don't you think IBM/Intel/SGI/HP would create nice JVMs on all of their HPC systems, with nice development environments? They don't.
Favorite
http://shootout.alioth.debian.org/benchmark.php?test=hello&lang=java&id=0&sort=fullcpu
Need I say more?
C. None of the above. Clusters are about economics and the effect commodity hardware has on the market. Don did what any good engineer does, "he asks what if?"
HPC for Primates. Read Cluster Monkey
Odd, on both my $30K Itanium2 box and my $850K Itanium2 box I have both the IBM and BEA JREs installed. The link that you supply downthread is for a request for a web browser plugin. I'm sorry but there's no fucking reason for you to need to run a web browser on a $850K cluster of enterprise-class hardware.
Don't worry, Debian will support it. :)
Although I'm being slightly humorous, I really do appreciate this aspect of Debian, and I was a little sad (although I realize it was completely necessary) when they finally got around to dropping some of the lesser-used architectures. But it is nice to know that if I ever get a masochistic urge to run Linux on a Motorola 68LC040 that I know right where to get it.
I think IA64 will probably linger around for a while. Eventually it will just become a question of how much effort you or your company want to put into compiling your own code, once people stop building binaries for it off the shelf. I also don't know much about the applicability of a desktop OS like Debian to a clustered system, so maybe it's of no consequence at all in your situation.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
You can't even run Java on them.
ok, Where to begin...
First, spending a million bucks on a machine that doesn't meet your needs. I hope there is an accountant ready to spank someone over this.
Second, using Java in a massively parallel fashion... Last I knew there wasn't an MPI or PVM port that used Java, plus it kinda defeats the purpose of having big hardware running a slower language (yes, I know compiled Java can be fast, but nowhere near as fast as near-metal C).
Third, giving up on a hellfire machine... Really, don't boo-hoo that you can't make it run Java; open a book, code some C++ and make the thing work. If you have a problem for it to solve, then solve it. Heck, use a cross compiler to translate your Java core over to C/C++ if you feel the urge. Itanium is a brutally fast architecture, and until its MIPS/watt ratio drops well below the norm, your business case for scrapping it is going to be tough.
Storm
How about a HPC running Zimbra http://www.zimbra.com/ ... e-mail for the masses!
Nah, it's all about using the right tools for the job.
Clusters are a good thing, as they provide a very cost-effective platform for running codes with modest communication requirements. Just like running a communication-intensive code on a cluster will limit performance, running a code with little communication on a "real" supercomputer is a waste of money.
The sad thing about the current "HPC crisis" is not the rise of clusters, but the use of clusters for tasks which they are ill suited for (typically, "grand challenge" tasks).
The crucial thing is to have an appropriate balance, so that the tasks which really need a "real" supercomputer can get one, and other tasks can run on clusters.
Hi
This article could not have been posted at a better time!!
I want to build a cluster to host a highly redundant SQL database.
It would have 250ish users so performance is not too major an issue; however, it needs to be solid (5 9s uptime) as I don't want to be fixing this thing on weekends.
Has anyone got a "real world" guide to implementing this?
Our situation is this:
We have a lot of old Dell 1300 servers (10 or so), each with 4x9 GB HDDs and 384 MB RAM.
I want to run a shared calendar for 49 remote sites.
I need it to be reliable. I can't do this with Windows (the cost of software alone will kill this project dead).
So, help me Obi Slashdot Kenobi, you really are my only hope!!
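For scale, the "5 9s uptime" target in the request above leaves very little room for outages. The following is a minimal sketch of the arithmetic only, assuming an average year of 365.25 days; the function name is illustrative and not taken from any clustering tool.

```python
# Allowed downtime for an availability target expressed as a number of nines.
# Assumption: an average year of 365.25 days (leap days included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(nines: int) -> float:
    """Minutes per year a service may be down at (1 - 10**-nines) availability."""
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes/year")
```

Five nines comes to a little over five minutes of downtime per year, which is why a single node without automatic failover is unlikely to meet it.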
Why can't you run Java?
The TeraGrid has several large IA-64 clusters (mostly running SuSE Linux); as far as I know, Java works just fine on them.
from a business point of view. And try to sell it across the business community; companies are still on Windows. They need to not just encourage such endeavours (from their high towers) but also adapt them in order to help small-scale and medium-scale businesses take full advantage of HPC on Linux.
Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
I have deployed several clusters throughout the years, mainly for research in academic environments and small companies, and I can say that clustering makes a lot of things so much easier.
Diskless SSI clustering makes maintenance a breeze, and ensures that all systems are always in sync and up to date. All nodes can run the same system image, whether they are servers, dedicated compute nodes, or regular desktop machines.
Of course you can still have local hard disks if you want, and for some apps it is recommended, but the system boots from the servers nonetheless.
OpenMosix dynamic distribution makes it possible to use heterogeneous hardware, and handles highly dynamic computational load quite well. The applications just wander off to whatever physical machine will run them the fastest.
This also makes simple parallel implementations of code a lot simpler, just fork and forget, and you will pay a small overhead for the benefit of having good load-balancing automagically.
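The "fork and forget" pattern described above can be sketched as follows. This is a minimal illustration, assuming a Unix-like system where `os.fork` is available; nothing in it is openMosix-specific, since (per the comment) process migration is handled transparently by the kernel. `busy_work` is a hypothetical stand-in for a real CPU-bound job.

```python
import os


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound placeholder: sum of squares below n."""
    return sum(i * i for i in range(n))


def fork_workers(sizes):
    """Fork one child process per work item and return the child PIDs."""
    pids = []
    for n in sizes:
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit immediately without running
            # any of the parent's cleanup code.
            busy_work(n)
            os._exit(0)
        pids.append(pid)
    return pids


def wait_all(pids):
    """Reap every child and return a {pid: exit_code} mapping."""
    statuses = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        statuses[pid] = os.WEXITSTATUS(status)
    return statuses


if __name__ == "__main__":
    children = fork_workers([100_000, 200_000, 300_000])
    print(wait_all(children))
```

The program itself contains no placement logic at all; each child is an ordinary process, which is what lets a load-balancing kernel move it to another node.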
Dynamic distribution also makes it possible to use regular desktops as cluster nodes along with the dedicated compute nodes.
Need Windows dual boot on some nodes? No problem: when you shut them down to boot Windows, the processes that used to run on those machines just migrate to another node. When you go back to Linux, the processes come back.
Need explicit parallelism? No probs, MPI / PVM etc. work fine together with the dynamic distribution and complement it for applications that are already well parallelized.
Scaling? This has never been an issue as long as the network infrastructure is up to speed. A decent 100 Mbit or gigabit system has proven to be good enough for just about everything I've seen.
High availability? How about having several servers that can run hot or cold spare for each other, and which can function as compute nodes as well... Nice when a server MB catches fire (yes, I've had that, and lost as much as a few minutes of work time, (the time for someone to walk to the server room, unplug the smoking machine and restart a running (cold spare) backup server). Most of the people at the lab didn't even notice the hiccup.)
Batch/job queues? No probs, use Sun Grid Engine, write your own, or whatever. Simple as cake.
I have mainly used Gentoo Linux for the flexibility and ease of maintenance and I can highly recommend it. It is all fairly simple to implement on Gentoo. Just read up on Gentoo system administration, pxelinux, tftp, openmosix, and whatever you feel you need to use it for.
The main problem right now is the lack of good openmosix support for 2.6 series of kernels. But I'm sure that some or all of this can be built with any or all of the other dynamic distribution systems out there.
If you have off-list questions please contact me at my nick at gmail.com.
What makes a supercomputer?
The fastest, most powerful machine to solve a problem today.
Generally credited to Sid Fernbach and George Michael and others
What if I qualify that with "cost?" ["for the cheapest"]
Then, it's not a supercomputer. Period.
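http://groups.google.com/group/comp.sys.super/browse_frm/thread/93f8fec7407d662/d130c603031cf3b3?lnk=st&q=super+computer+faq+quote&rnum=2#d130c603031cf3b3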
How did this get modded interesting? It is REDUNDANT.
A moronic question (the answer is in the fucking article) that wasted other readers' time and created nothing but glut (since the answer is at the URL given in the story - http://www-128.ibm.com/developerworks/linux/library/l-cluster1/).
HPC hardware falls into three categories:
* Symmetric multiprocessors (SMP)
* Vector processors
* Clusters
Ignoring NUMA is unacceptable in an overview such as this. I guess I was expecting a decent article.
Linux Networx is an entire company providing HPC solutions using Linux.
http://www.lnxi.com/
Ten years ago Informix supported MPP configurations in which it would sit on top of MPP servers like the IBM SP2 (Deep Blue - the chess computer). In this configuration you'd have 20-200 separate nodes, each with its own memory & disk and Informix would be responsible for spreading its data across the nodes, running a query across nodes, and joining the results.
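The spread/query/join pattern described above can be sketched in a few lines. This is only an illustration of the general shared-nothing idea, not Informix's actual implementation: the hash partitioning scheme, the in-memory lists standing in for nodes, and all function names are assumptions.

```python
from collections import defaultdict


def partition(rows, key, n_nodes):
    """Hash-partition rows so each (simulated) node owns a disjoint slice."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes


def local_sum(node_rows, group_key, value_key):
    """The part of the query each node runs on its own data only."""
    totals = defaultdict(int)
    for row in node_rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


def merge(partials):
    """Coordinator step: join the per-node partial results."""
    merged = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            merged[group] += value
    return dict(merged)


if __name__ == "__main__":
    orders = [
        {"order_id": i, "region": "east" if i % 3 else "west", "amount": i}
        for i in range(1, 101)
    ]
    nodes = partition(orders, "order_id", n_nodes=4)
    partials = [local_sum(n, "region", "amount") for n in nodes]
    print(merge(partials))
```

Because each node only touches its own slice, the per-node step can run in parallel; only the small partial results need to cross the network for the final join.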
The SP2 was originally intended for scientific computing, but then most were sold as giant database servers. Around 1996 (I think) DB2 also supported this kind of clustering.
Today the SP2 is no longer sold, but IBM supports the ICE configuration for DB2 - in which you put together 5, 10, 200 blade servers, each with a fibre connection to disk, and local memory. Then DB2 handles queries spread across all blades. And of course, you don't have to use blades - you can use desktops, or whatever. And it supports Windows, AIX, Linux, Solaris, etc. Informix still supports this kind of clustering as well.
Oracle now supports clustering over linux with RAC. It's more expensive, and doesn't scale as high, but probably is better at failover. And I think the teradata server (unix variant os & dbms) is also an mpp. At least it used to be.
Anyhow, I'd be surprised if there are as many scientific applications of clusters in production as there are databases.
OpenVMS clustering, arguably the most mature and most flexible clustering available, was somehow omitted from IBM's view of the clustering universe. Why didn't they address hot/hot[/hot[/hot...]] configurations? How about Single System Image (every member boots from the same system disk) configurations? (These two are not mutually exclusive).
These two are the Holy Grail of clustering capabilities. Um, no wonder IBM didn't mention them. And only the grey-haired /.ers remember VMS anyway.
- The Kessel run is for nerf herders. I can circumnavigate the entire Central Finite Curve in a lot less than 12 parse
yes, kst, it does, in fact, run java just fine. It's part of CTSS if I recall correctly.
I thought with $849,000 worth of the best computers we'd be able to on a rare instance when needed use one of the idle ones to run something a 486 with Netscape 3.0 could run.
Guess not. We'll have to get a computer from 1994 to do that.
If you "get" pointers add me as a friend (116)!
Intel *does*, however, make a C compiler and a Fortran compiler, as well as an MPI implementation. They all work pretty darned nicely on a cluster of Itaniums. Granted, I'm not getting the java compiler to work right now (I seem to have gcj on here), but then, I'm not positive that I even installed the whole Java setup since I don't really care about Java on this (or any) cluster. :)
*Exactly*. And Intel's C/Fortran compiler is F A S T. (I use it on Itaniums as well. Not that Itaniums are all that fast, but who cares if you have 100+ of them. ;) )
For Linux, the Fortran compiler is also free right now (as in beer, for non-commercial use), which allows me to code and test from home on my x86 and then recompile it on Itanium. Easy!
Favorite
I wouldn't call Itanium a dying architecture.
We run a cluster of dual Itanium 2 servers and they work great - Java servers and all. They can handle massive loads compared to Xeon-based servers they replace.
off the front page.
My company (and me specifically) designed/built/runs a Windows 2000 cluster. It's not as affordable as a linux cluster, but our simulation engine is a windows-only product and there does not exist anything close for other platforms (I wish!!!). We have a huge efficiency rating with our in-house designed cluster system. A simulation that takes 8 minutes on a single serial processor takes less than 1 minute on an 8 computer cluster. Yes, you read that right, we are more efficient in a cluster than on a serial system.
It comes partly from how the simulation work is designed, and partly from the available addressable memory. In a single 32-bit system (under Win2k) we can access ~2-3 GB of memory for our simulation work. When we approach that limit for a single simulation, we see an incredible slowdown due to swapping. By clustering the work, not only do we get the multiple processors working together to complete the simulation faster, but we use less memory per processor, eliminating (or at least limiting) any HD swap slowdown.
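The claim of being "more efficient in a cluster than on a serial system" can be made concrete with the usual speedup and efficiency ratios. This is a minimal sketch; the 0.9-minute figure is a hypothetical value chosen only to represent "less than 1 minute", since the post does not give an exact time.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """How many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, n_nodes: int) -> float:
    """Speedup per node; 1.0 is ideal linear scaling, above 1.0 is superlinear."""
    return speedup(serial_time, parallel_time) / n_nodes


if __name__ == "__main__":
    serial_minutes = 8.0     # from the post
    nodes = 8                # from the post
    parallel_minutes = 0.9   # hypothetical stand-in for "less than 1 minute"
    print(f"speedup:    {speedup(serial_minutes, parallel_minutes):.2f}x")
    print(f"efficiency: {efficiency(serial_minutes, parallel_minutes, nodes):.2f}")
```

Any parallel time under exactly one minute gives an efficiency above 1.0 here, which is consistent with the explanation that the serial run is slowed by swapping rather than by computation alone.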
Eventually I suspect that 128bit systems will remove that inherent problem; we have simulations that easily take 16+ GB of memory to complete that we just can't run on a single system. Until then, clusters are the way to go.
You can currently get up to 32G of ram on a dual opteron, 64 on a quad, or 128 on an 8way. This is using 4g (expensive) dimms. 2g dimms are much cheaper nowadays though, and 16/32/64 are still respectable numbers!
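The figures above are consistent with a simple product of sockets, DIMM slots per socket, and DIMM size. The sketch below assumes 4 DIMM slots per socket, a value inferred from the quoted numbers (32 GB on 2 sockets with 4 GB DIMMs) rather than stated in the post.

```python
# Assumption: 4 DIMM slots per socket, inferred from 32 GB / (2 sockets * 4 GB).
SLOTS_PER_SOCKET = 4


def max_ram_gb(sockets: int, dimm_gb: int, slots_per_socket: int = SLOTS_PER_SOCKET) -> int:
    """Maximum RAM when every slot on every socket holds a DIMM of dimm_gb."""
    return sockets * slots_per_socket * dimm_gb


if __name__ == "__main__":
    for dimm in (4, 2):
        sizes = [max_ram_gb(s, dimm) for s in (2, 4, 8)]
        print(f"{dimm} GB DIMMs on 2/4/8 sockets: {sizes} GB")
```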
Well, that's true, but accessing that amount of memory via a Windows program is not likely for any single processor in the SMP scenarios that you describe above. It's 128 GB, but it's 16 per processor and not shared, afaik... I don't do much on more than 2-processor boxes