Rocks definitely has its flaws though. Its documentation is lacking too, especially in troubleshooting.
Try making a separate home partition during the install. You'll end up with a borked kickstart server due to the /export/home automount to /home failing. So you reinstall.
Then you'll decide to add OpenPBS/Torque after you've got your cluster up. D'oh! That roll can only be installed during the initial install, so you reinstall again.
Then you decide to make some changes to your systems, say set up channel bonding. One day a power supply blows, you replace it, but when you power the node back up, the OS gets automatically reinstalled, blowing away all of your customizations.
The best bet for clustering is a standard Linux distro. I prefer either Gentoo, SuSE, or Fedora. Diskless setups should be configured with a small initrd containing your root filesystem. /usr and stuff should be NFS mounted. For hard drives, use a backup/restore tool like MCMS to install nodes.
Set up rsh or ssh with passwordless host-based authentication and set up some scripts to run commands on all of the nodes. Then you can easily install software, etc. on all nodes simultaneously. Set up whatever communication libs you need (MPI, PVM, etc.) on all systems using your scripts and an NFS shared directory (either to share the program out, or just to install from).
Need a job scheduler? Fine, use your scripts and install it on everyone. Have a drive fail? Use your backup/restore tool to reimage a current system to the failed one. Now you have an easily customizable and maintainable cluster that is not tied to some other group's release schedule.
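A "run this on every node" script along those lines could look something like the sketch below. It assumes passwordless ssh already works from the head node. The node names and the helper names (ssh_command, run_on_node, run_on_all) are made up for illustration, not taken from any particular toolkit.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node names; substitute your own host list.
NODES = ["node%02d" % i for i in range(1, 5)]


def ssh_command(node, command):
    """Build the argv for running one command on one node over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a node with broken passwordless auth shows up as an error.
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_on_node(node, command, timeout=60):
    """Run the command on a single node; return (node, exit code, output)."""
    try:
        result = subprocess.run(
            ssh_command(node, command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
        return node, result.returncode, result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # ssh missing or the node hung: report it rather than crash.
        return node, -1, str(exc)


def run_on_all(nodes, command, runner=run_on_node):
    """Run the command on every node in parallel; results keep node order."""
    if not nodes:
        return []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(lambda n: runner(n, command), nodes))
```

Something like run_on_all(NODES, "uname -r") then gives you one (node, exit code, output) tuple per node, which makes it easy to spot the one box that didn't take an update.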
Eventually I suspect that 128-bit systems will remove that inherent problem. We have simulations that easily take 16+ GB of memory to complete that we just can't run on a single system. Until then, clusters are the way to go.
You can currently get up to 32G of RAM on a dual Opteron, 64G on a quad, or 128G on an 8-way. This is using 4G (expensive) DIMMs. 2G DIMMs are much cheaper nowadays though, and 16/32/64 are still respectable numbers!
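The arithmetic behind those numbers is easy to check. The sketch below assumes 4 DIMM slots per CPU socket, which is what the figures imply (32G / 4G DIMMs / 2 sockets). Actual boards vary, so treat the slot count as an assumption.

```python
# Assumption: 4 DIMM slots per CPU socket, inferred from the figures above
# (32 GB / 4 GB DIMMs / 2 sockets = 4 slots each). Real boards may differ.
SLOTS_PER_SOCKET = 4


def max_ram_gb(sockets, dimm_gb, slots_per_socket=SLOTS_PER_SOCKET):
    """Maximum RAM in GB with every slot filled with DIMMs of the same size."""
    return sockets * slots_per_socket * dimm_gb


for sockets in (2, 4, 8):
    print("%d-way: %d GB with 4 GB DIMMs, %d GB with 2 GB DIMMs"
          % (sockets, max_ram_gb(sockets, 4), max_ram_gb(sockets, 2)))
# 2-way: 32 GB with 4 GB DIMMs, 16 GB with 2 GB DIMMs
# 4-way: 64 GB with 4 GB DIMMs, 32 GB with 2 GB DIMMs
# 8-way: 128 GB with 4 GB DIMMs, 64 GB with 2 GB DIMMs
```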
Our installs are all done over gigabit from a backed-up image. That initial image might take a couple of hours to generate, but we only do it once. Every system after that takes ~10-15 minutes. Due to the way Linux handles device drivers, it is much faster to take a generic image and set it up for any system than it would be for Windows. The same image can be installed to ANY Opteron or EM64T system and be configured in a couple of minutes.
Now of course if there are RAID controllers in the system, you will need to wait for the RAID to initialize before installing, but I wasn't counting that. You have to do that for any OS you install. And while I haven't done an 8-way Xeon in that timeframe, I have done 8-way Opterons in 15 minutes.
A big reason is the fact Windows was up and running in two hours at all the right patch levels. The installation of SAP took two days on Windows, the installation on Linux Red Hat took two weeks.
2 hours sounds like way too long to me. I'm a tech at a Linux cluster company. We can install basically any Linux distribution (typically with the latest updates already included) on basically any hardware in under 15 minutes.
"The total cost of ownership is actually lower in this case than with Linux because of the hidden costs of the support."
While it is true that MCSEs come cheaper than a good Unix/Linux admin, you typically need more of them to do the same job. In my job, I administer all the servers, keeping on top of security updates easily. This is all in addition to my regular work (95%+ of my time) installing and configuring clusters. Doing this same work with Windows would require more people than we have, actually costing more in total.
growisofs from dvd+rw-tools works great.
It isn't just for dvd+, it works with dvd-r and dvd-rw too.
Turn off SCSI emulation in the kernel and have regular IDE CD-ROM support turned on.
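For anyone scripting their burns, a typical growisofs call can be wrapped like the sketch below. The device and directory in the usage note are placeholders, and the helper names are made up for illustration. -Z starts a new disc and -M appends a session, while -R and -J are handed through to mkisofs for Rock Ridge and Joliet names.

```python
import subprocess


def growisofs_argv(device, paths, append=False):
    """Build the growisofs command line for burning the given paths."""
    # -Z writes an initial session to the disc; -M appends (merges) a session.
    mode = "-M" if append else "-Z"
    # -R and -J are passed through to mkisofs (Rock Ridge + Joliet).
    return ["growisofs", mode, device, "-R", "-J"] + list(paths)


def burn(device, paths, append=False):
    """Run growisofs; raises CalledProcessError if the burn fails."""
    subprocess.check_call(growisofs_argv(device, paths, append))
```

So burn("/dev/hdc", ["/home/me/backup"]) would run growisofs -Z /dev/hdc -R -J /home/me/backup, with the drive showing up as a plain IDE device rather than through SCSI emulation.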
If you're using any modern distro, you can just open up the system update tool and apply all updates in one fell swoop. I have wasted so many hours of my life repetitively rebooting Windows systems to apply more and more updates.
Example:
Apply update foobar
reboot
Apply update foobar 1.01
reboot
Apply update foobar 1.01b
reboot
Apply update for update foobar 1.01b
reboot
In Linux, it would just install the latest version the first time, and unless it was a kernel, it wouldn't even need to reboot a single time!
That's one thing that I find Linux is way ahead in. There is absolutely no reason that MS should still require countless reboots for installing software. An application install should not need a reboot.