HA-OSCAR 1.0 Beta release - unleashing HA Beowulf
ImmO writes "The eXtreme Computing Research (XCR) group at Louisiana Tech University is pleased to announce the first public release of HA-OSCAR 1.0 beta. High Availability Open Source Cluster Application Resource (HA-OSCAR) is an open source project that aims toward non-stop services in the HPC environment through the combined power of High Availability and Performance Computing solutions. Our goal is to enhance a Beowulf cluster system for mission-critical applications and downtime-sensitive HPC infrastructures. To achieve high availability, component redundancy is adopted in an HA-OSCAR cluster to eliminate single points of failure, especially at the head node. HA-OSCAR also incorporates a self-healing mechanism: failure detection and recovery, automatic failover, and fail-back. The 1.0 beta release supports new high-availability capabilities for Linux Beowulf clusters based on OSCAR 3.0. It provides an installation wizard GUI and a web-based administration tool that allows a user to create and configure a multi-head Beowulf cluster. A default set of monitoring services is included to ensure that critical services, hardware components and important resources are always available at the control node."
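For readers wondering what "failure detection, automatic failover and fail-back" might look like in practice, here is a minimal sketch of a heartbeat-driven primary/standby head node pair. It is not HA-OSCAR code: the class name `HeadNodePair`, the `max_missed` threshold of three heartbeats, and the `simulate` helper are all hypothetical, chosen only to illustrate the general idea.

```python
from dataclasses import dataclass, field


@dataclass
class HeadNodePair:
    """Toy model of a primary/standby head node pair (hypothetical, not HA-OSCAR code)."""

    max_missed: int = 3        # assumed threshold: missed heartbeats before failover
    active: str = "primary"    # which node currently acts as the head node
    missed: int = 0            # consecutive heartbeats missed by the primary
    log: list = field(default_factory=list)

    def heartbeat(self, primary_alive: bool) -> str:
        """Process one heartbeat interval and return the currently active node."""
        if primary_alive:
            self.missed = 0
            if self.active == "standby":
                # Fail-back: the recovered primary resumes the head node role.
                self.active = "primary"
                self.log.append("fail-back")
        else:
            self.missed += 1
            if self.active == "primary" and self.missed >= self.max_missed:
                # Failover: the standby takes over after too many missed heartbeats.
                self.active = "standby"
                self.log.append("failover")
        return self.active


def simulate(beats):
    """Feed a sequence of heartbeat results (True = primary answered) to a fresh pair."""
    pair = HeadNodePair()
    states = [pair.heartbeat(alive) for alive in beats]
    return states, pair.log


if __name__ == "__main__":
    states, log = simulate([True, False, False, False, False, True])
    print(states)
    print(log)
```

Requiring several consecutive missed heartbeats before failing over is one common way to avoid switching head nodes on a single dropped packet; a real system would also need to move IP addresses and services, which this sketch leaves out.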
...written by Tong Liu (the lead developer) in last month's LinuxWorld.
You have to be a subscriber to view the HTML, but it seems that you can download the PDF version for free...
The Army reading list
Also worth noting: LinuxWorld magazine has a pretty good article on HA-OSCAR this month!
Have a Happy.
If you have seen all the jokes but still don't know what a Beowulf cluster is, then this site is for you. It has all you need to know about it.
I have a fetish for traffic cones
I've been writing some articles about OSCAR and some of the related projects being developed at NCSA and other places. You can find the latest version of this newsletter at the Linux Developer Newsletter site.
The link in the story to OSCAR 3.0 should be to http://oscar.sourceforge.net instead. The other site is just the parent organization's info page.
The only simple, honest answer to this is: it depends. If your jobs stay completely inside the CPU cache, and nothing else is happening in the system, and the scheduler is smart enough not to move tasks between CPUs without good reason, you should see very nearly 100% scalability. The larger the cache, the more likely this is, so at this point smaller jobs favor Xeon CPUs over Athlon/Opterons. Most jobs do need to access memory and disk, though. In these cases, the Opteron architecture does well: each CPU has its own integrated memory controller, which gives it "dedicated" access to its local RAM, and HyperTransport links connect the CPUs to each other.
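If you want to put a number on "very nearly 100% scalability" for your own jobs, one common measure is parallel efficiency: the speedup over a single CPU divided by the number of CPUs. The sketch below assumes you have timed the same job on one CPU and on several; the function name and the sample timings are made up for illustration.

```python
def parallel_efficiency(t_serial: float, t_parallel: float, n_cpus: int) -> float:
    """Return parallel efficiency: (t_serial / t_parallel) / n_cpus.

    1.0 means perfect ("100%") scaling; lower values mean the extra CPUs
    are partly wasted, for example while waiting on memory or disk.
    """
    if t_serial <= 0 or t_parallel <= 0 or n_cpus < 1:
        raise ValueError("timings must be positive and n_cpus at least 1")
    speedup = t_serial / t_parallel
    return speedup / n_cpus


if __name__ == "__main__":
    # Hypothetical timings in seconds for the same job on 1 CPU and on 2 CPUs.
    print(parallel_efficiency(100.0, 50.0, 2))   # cache-resident job, ideal scaling
    print(parallel_efficiency(100.0, 62.5, 2))   # memory-bound job, poorer scaling
```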