Slashdot Mirror


HA-OSCAR 1.0 Beta release - unleashing HA Beowulf

ImmO writes "The eXtreme Computing Research (XCR) group at Louisiana Tech University is pleased to announce the first public release of HA-OSCAR 1.0 beta. High Availability Open Source Cluster Application Resource (HA-OSCAR) is an open source project that aims to provide non-stop services in the HPC environment through the combined power of High Availability and Performance Computing solutions. Our goal is to enhance Beowulf cluster systems for mission-critical applications and downtime-sensitive HPC infrastructures. To achieve high availability, the HA-OSCAR cluster adopts component redundancy to eliminate single points of failure, especially at the head node. HA-OSCAR also incorporates a self-healing mechanism: failure detection and recovery, automatic failover, and fail-back. The 1.0 beta release supports new high-availability capabilities for Linux Beowulf clusters based on OSCAR 3.0. It provides an installation wizard GUI and a web-based administration tool that allows a user to create and configure a multi-head Beowulf cluster. A default set of monitoring services is included to ensure that critical services, hardware components and important resources are always available at the control node."
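To make the failover and fail-back idea in the summary concrete, here is a minimal sketch in Python. It is not HA-OSCAR code and says nothing about how HA-OSCAR is actually implemented. The names `HeadNode`, `FailoverMonitor`, `max_missed` and the node names are all hypothetical, and a simple boolean stands in for a real heartbeat check over the network.

```python
class HeadNode:
    """Hypothetical stand-in for a cluster head node."""

    def __init__(self, name):
        self.name = name
        self.healthy = True  # a real monitor would probe services instead


class FailoverMonitor:
    """Toy monitor: fail over to the standby after several missed
    heartbeats, and fail back once the primary recovers."""

    def __init__(self, primary, standby, max_missed=3):
        self.primary = primary
        self.standby = standby
        self.max_missed = max_missed  # assumed threshold, not from HA-OSCAR
        self.missed = 0
        self.active = primary
        self.log = []

    def poll(self):
        """Run one monitoring cycle and return the active node's name."""
        if self.primary.healthy:
            self.missed = 0
            if self.active is not self.primary:
                # Primary is back: return control to it (fail-back).
                self.active = self.primary
                self.log.append("fail-back to " + self.primary.name)
        else:
            self.missed += 1
            if (self.missed >= self.max_missed
                    and self.active is self.primary
                    and self.standby.healthy):
                # Too many missed heartbeats: promote the standby (failover).
                self.active = self.standby
                self.log.append("failover to " + self.standby.name)
        return self.active.name


if __name__ == "__main__":
    primary, standby = HeadNode("head-1"), HeadNode("head-2")
    monitor = FailoverMonitor(primary, standby)
    primary.healthy = False          # simulate a head node failure
    for _ in range(3):
        monitor.poll()
    primary.healthy = True           # simulate repair
    monitor.poll()
    print(monitor.log)
```

Requiring several consecutive missed heartbeats before promoting the standby is one common way to avoid failing over on a single dropped check. A real deployment would also have to handle split-brain and shared state, which this sketch ignores.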

10 of 90 comments

  1. There's an article on HA-OSCAR... by tcopeland · · Score: 4, Informative

    ...written by Tong Liu (the lead developer) in last month's LinuxWorld.

    You have to be a subscriber to view the HTML, but it seems that you can download the PDF version for free...

  2. Linuxworld by ViceClown · · Score: 4, Informative

    Worth noting also, Linuxworld magazine has an article this month on HA-OSCAR which is pretty good!

    --
    Have a Happy.
  3. Here's the LinuxWorld article in full by Anonymous Coward · · Score: 1, Informative

    Just click and enjoy. It's a good read.

  4. More about beowulf? by Krik+Johnson · · Score: 5, Informative

    If you have seen all the jokes, but you still don't know what a beowulf cluster is, then this site is for you. It has all you need to know about it.

  5. More info on OSCAR and related projects @ by brechin · · Score: 3, Informative

    I've been writing some articles about OSCAR and some of the related projects being developed at NCSA and other places. You can find the latest version of this newsletter at the Linux Developer Newsletter site.

  6. OSCAR 3.0 Link correction by brechin · · Score: 4, Informative

    The link in the story to OSCAR 3.0 should be to http://oscar.sourceforge.net. The other site is just the parent organization's info page.

  7. Re:OSCAR vs. Grid by ahadsell · · Score: 2, Informative
    OSCAR vs. Grid: Substantially different. Kinda like the difference between a LAN and the Internet.

    OSCAR vs. other cluster software: HA-OSCAR is a logical development of other open-source cluster software out there. For instance, see SLURM, a package for scheduling jobs on a Linux cluster.

  8. Re:sources of failure by jahill_isu · · Score: 2, Informative
    but I am curious if those who have worked on actual clusters could expand on the most common causes of failure...

    As a research assistant who helps maintain a cluster, I find that the most frequent problems in our Commercial Off The Shelf (COTS) clusters are power supplies. We have at least one die each week. Hard drives are a close second.

  9. Re:CPU RAID by straponego · · Score: 3, Informative

    The only simple, honest answer to this is: it depends. If your jobs stay completely inside the CPU cache, and nothing else is happening in the system, and the scheduler is smart enough not to swap the tasks between CPUs without good reason, you should see very nearly 100% scalability. The larger the cache, the more likely this is, so at this point smaller jobs favor Xeon CPUs over Athlon/Opterons. Most jobs do need to access memory and disk, though. In these cases, the Opteron architecture does well, as the HyperTransport bus gives each CPU "dedicated" access to RAM.

  10. how does this compare to openssi? by jelle · · Score: 2, Informative

    How does this compare to OpenSSI? OpenSSI is nice because of the single system image approach, which makes administration very simple. AFAIK, an OpenSSI cluster also supports PVM and MPI in addition to exec and run-time load balancing (à la Mosix).

    OpenSSI has a lot of "HA-" support, including support for various clustered filesystems, failover of network interfaces across nodes, and failover of the first node (hopefully soon without needing shared SCSI storage but using something like drbd).

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.