High-Performance Linux Clustering
An anonymous reader writes "High Performance Computing (HPC) has become easier, and two reasons are the adoption of open source software concepts and the introduction and refinement of clustering technology. This first of two articles discusses the types of clusters available, uses for those clusters, reasons clusters have become popular for HPC, some fundamentals of HPC, and the role of Linux in HPC."
With Linux and other freely available open source software components for clustering and improvements in commodity hardware, the situation now is quite different. You can build powerful clusters with a very small budget and keep adding extra nodes based on need.
Yea, I'd like to build one but I'm not sure what I'd use it for. Does that mean I'm a geek?
Bradley Holt
We spent $849,000 on an Itanium cluster and have recently found ourselves SOL since it's a dying architecture.
You can't even run Java on them.
If you "get" pointers add me as a friend (116)!
Jokes aside, when people say Linux cluster, do they usually mean Beowulf? Or are there other clusters and how do they compare? How difficult is it to setup a Beowulf cluster?
EvilCON - Made Famous by
Okay, so I'd really enjoy trying something like the clustered model, just for academic kicks, but a relevant question comes to mind, at least for me.
:)
Where do people get the commodity systems cheap enough to be able to play around with this? I hardly want to spend two thousand bucks on some old P2s just to play around. Anyone have some hot tips where you can find real cheap (dare I dream... free) commodity systems to build a low-end cluster for kicks?
Also, I'm a Windows guy by trade. Will making a Linux cluster make me instantly cool?
From everything I've seen, MOSIX is having some issues right now. Unfortunately, MOSIX is one of the easiest, most flexible ways to set up an HPC, and ever since they forked, development has been slow. I did research about 2 months ago to look into setting up a small MOSIX cluster with a few computers. My main goal was to get my feet wet in setting up a cluster using a few desktop and laptop computers. I figured that setting up a cluster with my Athlon 64 x2, Athlon 64 3500+, and a few laptops would speed up compile times by quite a bit. But, it appears that the 2.6 version of MOSIX is still beta and won't support the kernel I need for my Athlon 64 x2 (versions before 2.6.9 don't support powernow with the x2, and also tend to be flaky). So, I have the choice of running a cluster with slower PC's, or waiting for better support. If you look at the year on some of those whitepapers, only one was written this year, and I'd be willing to bet they are describing how to use MOSIX with the 2.4 kernel, not 2.6. I finally gave up on the idea, as running the latest kernel is more important to me.
Zero CPU implementations of this.
But their links could at least have mentioned OSCAR http://oscar.openclustergroup.org/ or my personal favorite, ROCKS http://www.rocksclusters.org/, as these are more prevalent than xCat systems.
Personally, I like Rocks, as I ran three parallel architectures (i386/AMD64/IA64), on the same based distribution, just with each tuned to their particular processor. Comes with SGE and Myrinet support out of the box, and there are Rolls, i.e. custom software assemblages, for OpenPBS, for those who prefer it, as well as PVFS. It's easy to set up, and easy to administer, as the nodes are presumed to be interchangeable and disposable. When you reboot a node, it's obliterated and a fresh OS and supplementary package repository are laid down on a clean disk. No questions about version skew.
They now have a custom roll to help you build a visualization wall, but I never had a chance to try that one. (try convincing your boss that you want 4 digital projectors and a big room to play with)
The downside to the above distributions are that they presume batch-queue environments, which is appropriate for most of my work, but less so for many people trying to simulate owning an SMP, without paying SMP prices.
Other people assure me that the current version of OSCAR is solid as well, but they seem to lag in the multiple architecture support area (Itanium is always behind), and don't current support AMD64 natively. On the other hand, they build on top of several RedHatish linuces, as opposed to Rocks where you get Centos (RHEL), period.
the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
Some guy on an HP forum asks how to get Java code to run in his Web browser on Linux Itanium. Shock and awe follows as he's told you can't run an applet on his multi-thousand dollar 64-bit workstation.
Link
If you "get" pointers add me as a friend (116)!
You can't even run Java on them.
ok, Where to begin...
First, spending a million bucks on a machine that doesnt meet your needs. I hope there is an accountant ready to spank someone over this.
Second, using Java in a massivly parallel fashion.. Last I knew there wasnt a MPI or PVM port that used Java, plus it kinda defeats the purpose of having big hardware running a slower language(yes I know compiled java can be fast, but Nowhere like near metal C ).
Third, Giving up on a hellfire machine.... Really dont Boo-hoo that you cant make it run java, open a book, code some C++ and make the thing work. If you have a problem for it to solve, then solve it, heck use a cross compiler to translate your java core over to c-c++ if you feel the urge. Itanium is a brutally fast architecture, and until it's mips/watts ratio drops well below the norm, your buisness case for scrapping it is going to be tough.
Storm
I have deployed several clusters throughout the years, mainly for research in academic environments and small companies, and I can say that clustering makes a lot of things soo much easier.
Diskless SSI clustering makes maintainance a breeze, and ensures that all systems are always in sync and up to date. All nodes can run the same system image, whether they are servers, dedicated compute nodes, or regular desktop machines.
Of course you can still have local hard disks if you want, and for some apps it is recommended, but the system boots from the servers nontheless.
OpenMosix dynamic distribution makes it possible to use heterogenous hardware, and handles highly dynamic computational load quite well. The applications just wander off to whatever physical machine will run them the fastest.
This also makes simple parallel implementations of code a lot simpler, just fork and forget, and you will pay a small overhead for the benefit of having good load-balancing automagically.
Dymanic distribution also makes it possible to use regular desktops as cluster nodes along with the dedicated compute nodes.
Need windows dualboot on some nodes? no problem, when you shut them down do boot windows, the processes that used to run on those machines just migrate to another node. When you go back to linux, processes come back.
Need explicit parallelism? no probs, MPI / PVM etc works fine together with the dynamic distribution and complements it for applications that are already well parallelized.
Scaling? This has never been an issue as long as the network infrastructure is up to speed. A decent 100mb or gigabit system has proven to be good enough for just about everything I've seen.
High availability? How about having several servers that can run hot or cold spare for each other, and which can function as compute nodes as well... Nice when a server MB catches fire (yes, I've had that, and lost as much as a few minutes of work time, (the time for someone to walk to the server room, unplug the smoking machine and restart a running (cold spare) backup server). Most of the people at the lab didn't even notice the hickup.)
Batch/job queues? no probs, use sun grid engine, write your own, or whatever. simple as cake.
I have mainly used gentoo linux for the flexibility and ease of maintainance and I can highly recommend it. It is all fairly simple to implement on gentoo. Just read up on gentoo system administration, pxelinux, tftp, openmosix, and whatever you feel you need to use it for.
The main problem right now is the lack of good openmosix support for 2.6 series of kernels. But I'm sure that some or all of this can be built with any or all of the other dynamic distribution systems out there.
If you have off-list questions please contact me at my nick at gmail.com.