Slashdot Mirror


HA-OSCAR 1.0 Beta release - unleashing HA Beowulf

ImmO writes "The eXtreme Computing Research (XCR) group at Louisiana Tech University is pleased to announce the first public release of HA-OSCAR 1.0 beta. High Availability Open Source Cluster Application Resource (HA-OSCAR) is an open source project that aims toward non-stop services in the HPC environment through the combined power of High Availability and Performance Computing solutions. Our goal is to enhance a Beowulf cluster system for mission-critical applications and downtime-sensitive HPC infrastructures. To achieve high availability, component redundancy is adopted in the HA-OSCAR cluster to eliminate single points of failure, especially at the head node. HA-OSCAR also incorporates a self-healing mechanism: failure detection & recovery, automatic failover and fail-back. The 1.0 beta release supports new high-availability capabilities for Linux Beowulf clusters based on OSCAR 3.0. It provides an installation wizard GUI and a web-based administration tool that allows a user to create and configure a multi-head Beowulf cluster. A default set of monitoring services is included to ensure that critical services, hardware components and important resources are always available at the control node."

90 comments

  1. Imagine... by Anonymous Coward · · Score: 0, Funny

    Just imagine a beowulf cluster of those! (fp)

    1. Re:Imagine... by Anonymous Coward · · Score: 0, Offtopic

      Thing is, it's actually a fairly amusing comment given the story. Since comments seem to get modded up or down largely based on the first couple of moderations made to them, I'd expect to see at least one make (Score: 5, Funny) while the rest go down to (Score: -1, Redundant/Troll/Flamebait/Offtopic).

      So what you've got to ask yourself is 'Am I feeling lucky?'. Well, are you, punk?

      (I'm not, hence the anonymous posting...)

    2. Re:Imagine... by Anonymous Coward · · Score: 0

      It was titled "You Have the Right to Remain Silent, or Not"

      I posted a reply to it, and can't access it through my user page, although I still see it there. I just get the 'move along nothing to see here' and it's completely gone from the front page.. ahh well.. c'est la vie

    3. Re:Imagine... by Anonymous Coward · · Score: 2, Funny

      Given that this is slashdot, they probably deleted it because they realized it had nothing to do with Anime, Microsoft bashing, or Linux worship.

    4. Re:Imagine... by Anonymous Coward · · Score: 0

      You know, the one about caffeine and paranoia. Why did they delete it?

      haha. I'm not sure, maybe a dupe? It was linked to an article found on wired.com's front page.

  2. There's an article on HA-OSCAR... by tcopeland · · Score: 4, Informative

    ...written by Tong Liu (the lead developer) in last month's LinuxWorld.

    You have to be a subscriber to view the HTML, but it seems that you can download the PDF version for free...

  3. Linuxworld by ViceClown · · Score: 4, Informative

    Worth noting also, Linuxworld magazine has an article this month on HA-OSCAR which is pretty good!

    --
    Have a Happy.
  4. I wonder.. by Orgazmus · · Score: 0, Offtopic

    I wonder how many posts will be constructive on this topic, and how many will be that "imagine a..." joke.

    But all hail HA-OSCAR. Powercomputing to the people. :)

    --
    The system had the verbosity of HTML combined with all the readability of compiled assembly viewed as bitmap images
  5. CPU RAID by manganese4 · · Score: 3, Interesting

    So on a multi-CPU server, if you started the same process synchronously on multiple CPUs, how close in time would they finish, assuming there is sufficient memory and disk drive controller capacity to prevent severe competition?

    --
    I make my face look like this and concerned words come out.
    1. Re:CPU RAID by Anonymous Coward · · Score: 0

      Hate to point out the obvious, but if you start 4 identical tasks on 4 CPUs.. theoretically, barring other system resource jams, as you said, they would finish simultaneously.

    2. Re:CPU RAID by manganese4 · · Score: 1

      But is there enough noise/error in non-competition resource access to cause them to unsync?

      --
      I make my face look like this and concerned words come out.
    3. Re:CPU RAID by straponego · · Score: 3, Informative

      The only simple, honest answer to this is: it depends. If your jobs stay completely inside the CPU cache, and nothing else is happening in the system, and the scheduler is smart enough not to swap the tasks between CPUs without good reason, you should see very nearly 100% scalability. The larger the cache, the more likely this is, so at this point smaller jobs favor Xeon CPUs over Athlon/Opterons. Most jobs do need to access memory and disk, though. In these cases, the Opteron architecture does well, as the Hypertransport bus gives each CPU "dedicated" access to RAM.

  6. to really test it by Anonymous Coward · · Score: 0

    they should put their web page on that cluster and see if it can handle a /.ing...if it does, then I would allow them to claim it's high availability

  7. First thing this cluster could compute: by Da+Fokka · · Score: 2, Funny

    The ratio of 'imagine a...'-jokes to 'now there will be a lot of 'imagine a...''-jokes

    1. Re:First thing this cluster could compute: by OECD · · Score: 0, Offtopic

      100% Redundant? Sorry, guy. I wish I had mod points for you.

      Of course, this IS /. and it seems oddly fitting that the moderators don't RTFP.

      --
      One man's -1 Flamebait is another man's +5 Funny.
    2. Re:First thing this cluster could compute: by Anonymous Coward · · Score: 0
      No problem at all.

      Imagine an Xgrid of these!

    3. Re:First thing this cluster could compute: by Hentai · · Score: 1

      Damn. Thanks to my twitchy mousewheel, I just accidentally moderated this -1 Flamebait instead of +1 Funny. Replying now to cancel the moderation, and my sincere apologies.

      --
      -Hentai [in vita non pacem est]
  8. In Other News by PonyHome · · Score: 5, Funny

    Darl McBride files suit against Louisiana Tech, saying "This is one more example of how SCO innovation has been misappropriated."

  9. Re:Just imagine... by Nick+of+NSTime · · Score: 3, Funny

    What? Imagine what? Don't keep me in suspense!

  10. Re:Just what I need by meringuoid · · Score: 1

    You know, this is the sort of troll I just don't get. There's no way it's for real - throwing out Opterons?! - but it's not inflammatory, or even particularly interesting. Is there a '-1, Pointless' moderation?

    --
    Real Daleks don't climb stairs - they level the building.
  11. Buzzwords Aplenty! by DanoTime · · Score: 4, Funny

    Boy, I could make my manager's head spin just by reading the summary of that article!

    1. Re:Buzzwords Aplenty! by sg_oneill · · Score: 2, Funny

      Haha. Yeah I was kinda thinking there was a bit of a buzzword overload there..

      That said, I think they missed the bit about it using "XML compliant Strategic Webservice Failover Product placements + Redundant steak knives!!"

      Ain't it scary tho, when you read articles like that, and despite having years of IT deep-fried knowledge, you'd probably have to pass it to marketing to decode it.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
  12. WTF? by Anonymous Coward · · Score: 0

    Less TLAs more actual worth please.

    Endless the world's turn, endless the sun's spinning Endless the quest; I turn again, back to my own beginning, And here, find rest.

  13. Here's the LinuxWorld article in full by Anonymous Coward · · Score: 1, Informative

    Just click and enjoy. It's a good read.

  14. Re:Just Imagine..... by Anonymous Coward · · Score: 0, Funny

    Just imagine a beowulf cluster of beowulf clusters of beowulf cluster jokes!

  15. More about beowulf? by Krik+Johnson · · Score: 5, Informative

    If you have seen all the jokes, but you still don't know what a beowulf cluster is, then this site is for you. It has all you need to know about it.

    1. Re:More about beowulf? by deadline · · Score: 1

      You may also find ClusterWorld web-site and magazine useful.

      --
      HPC for Primates. Read Cluster Monkey
  16. Buzzword count by Electrawn · · Score: 4, Funny

    High amount of corporate buzzwords detected: self-healing, mission-critical, GUI, beowulf...

    Oh, this project actually does those things? Quaint!

    Just running the vaporware bullshit o-meter here...

    1. Re:Buzzword count by AviLazar · · Score: 1

      GUI is a corporate buzzword? How many corporate slaves, who aren't semi-techies, know what GUI means - other than "I've been slimed" :) -A

      --

      I mod down so you can mod up. Your welcome.
  17. Re:Just what I need by Anonymous Coward · · Score: 0

    He's just trying to get more links to the site under his name. The porn one.

  18. hold on hold on by tetrahedrassface · · Score: 2, Funny

    I can hear the terrorist governments of the world licking their chops for this one! I'm just joking. Or am I?

  19. Re:Is this Slashdot? by Wateshay · · Score: 0, Offtopic

    At least they were honest about it, instead of trying to pretend they're some random geek who stumbled onto their site.

    --

    "If English was good enough for Jesus, it's good enough for everyone else."

  20. Misread title.... by BenJeremy · · Score: 1, Funny

    I thought this was about Beowulf clusters in NASCAR. :o

  21. More info on OSCAR and related projects @ by brechin · · Score: 3, Informative

    I've been writing some articles about OSCAR and some of the projects that are related that are being developed at NCSA and other places. You can find the latest version of this newsletter at the Linux Developer Newsletter site.

  22. 3d desktop by iamthemoog · · Score: 0, Redundant

    I assume all that horsepower is needed to run a 3d desktop in Java...

    --
    No Norm, those are your safety glasses; I'll wear my own thanks...
  23. OSCAR 3.0 Link correction by brechin · · Score: 4, Informative

    The link in the story to OSCAR 3.0 should be to http://oscar.sourceforge.net The other site is just the parent organization's info page.

  24. /. effect by tehcyder · · Score: 1
    So how come none of the linked sites have been slashdotted?

    Is it because they have un-killable servers, or rather that this is not a hot enough topic here?

    --
    To have a right to do a thing is not at all the same as to be right in doing it
  25. Re:Just what I need by AftanGustur · · Score: 0, Offtopic


    You know, this is the sort of troll I just don't get.

    That's how trolls function.. Put out outrageous statements that about a third of the readers don't understand, a third think are "informative" and the last third think are insulting..
    And then maybe a few percent see it as a troll.

    This guy is just trolling on the fact that very few /. users will ever see this code run, let alone install a Beowulf cluster.

    It's like if /. had an article on new state regulations on buying rocket fuel, and I would say how glad I was I bought my 8 tons of rocket propellant last week, just in time.. And then make up some numbers about how long it could be stored and under what conditions. I'm sure it would look "informative" to a lot of people.

    --
    echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
  26. GUI is a buzzword by Electrawn · · Score: 1

    GUI is a buzzword the pointy haired types can easily understand and make sentences with.

    E:"Our new router blocks 99% of our spam! Saves us millions!"

    B:"But does it have a GUI?"

    E:"No, it..."

    B:"All our TCO spec products must have GUIs!"

    *blink*

    http://www.buzzwhack.com/buzzcomp/indgk.htm

    1. Re:GUI is a buzzword by AviLazar · · Score: 1

      I guess. But this begs the question, do they know what GUI means :) -A

      --

      I mod down so you can mod up. Your welcome.
  27. Kind ruins the cliche by Bob+Loblaw · · Score: 1

    "Just imagine one of these!" doesn't have the same ring ...

  28. Re:Pseudo code version by Mateito · · Score: 1

    Sorry, to get modded up on slashdot you should have written it in obfuscated perl.

  29. OSCAR? by Anonymous Coward · · Score: 0

    Mozilla got in IP trouble with the name Phoenix, as AMI had some sort of browser called Phoenix.
    Then there was a big snit over Firebird, because of the pre-existing database software. These things happen all the time. It's stupid, but it does happen.

    This said, is this project gonna run into issues with AOL regarding the "OSCAR" in the name? (AOL's IM protocol is called OSCAR).

    1. Re:OSCAR? by theguywhosaid · · Score: 1
      I wouldn't imagine so, because
      • not the same product space, though Firebird DB wasn't either
      • AOL's protocol isn't an important brand issue to them, the AIM service is.
      • I just don't
  30. OSCAR vs. Grid by Anonymous Coward · · Score: 0

    How different in technology implementation is the open source OSCAR vs. the grid computing technology? or any other form of cluster s/w out there?

    1. Re:OSCAR vs. Grid by ahadsell · · Score: 2, Informative
      OSCAR vs. Grid: Substantially different. Kinda like the difference between a LAN and the Internet.

      OSCAR vs. other cluster software: HA-OSCAR is a logical development of other open-source cluster software out there. For instance, see SLURM, a package for scheduling jobs on a Linux cluster.

  31. sources of failure by mkstowegnv · · Score: 1

    It isn't surprising that beowulf clusters would want to incorporate mechanisms to deal with node failure, but I am curious if those who have worked on actual clusters could expand on the most common causes of failure. I was surprised to read in a previous slashdot post (sorry no URL) that even clusters of mini-ITX boards without hard drives (the most failure-prone component I would have thought) have frequent failures.

    1. Re:sources of failure by Bombcar · · Score: 2, Funny

      I've heard (no sources, google it) that Grendel is hard for beowulf clusters to deal with.....

      Maybe? I dunno. :)

    2. Re:sources of failure by jahill_isu · · Score: 2, Informative
      but I am curious if those who have worked on actual clusters could expand on the most common causes of failure...

      As a research assistant who helps maintain a cluster, the most frequent problems in our Commercial Off The Shelf (COTS) clusters are power supplies. We have at least one die each week. Hard drives are a close second.

    3. Re:sources of failure by Anonymous Coward · · Score: 0

      Actually, Beowulfs have no problem with Grendels (or their moms) but fire breathing dragons do them in every time. If anyone brings you a gold cup, kill them where they stand.

      It's the heat! It cooks power supplies, drives, blade cards, what have you. Most cheap small PC's are too hot to stack in close quarters and run 24x7 without serious air conditioning.

    4. Re:sources of failure by streepje · · Score: 1

      The sources are essentially no different than for your desktop but if you do the math you'll see that failure is much more common when you have a bunch of them.

      What's the probability that your desktop will crash if you run it fully loaded for a week? Pick a number, say 1%. So it has 99% chance of completing the job.

      Now suppose you have a job that runs in parallel on 100 such nodes flat out for a week, and that any single node crash kills the job. The probability that the job finishes successfully is (0.99)^100, or about 37%.

      So the job has roughly a two-thirds chance (about 63%) of failing. That is why fault tolerance is such an issue on clusters.
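
      The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes node failures are independent and that one node crash kills the whole job, which is a simplification of how real clusters fail.

```python
def job_success_probability(node_success: float, nodes: int) -> float:
    """Chance that every node survives the run.

    Assumes independent failures and that a single node
    crash is enough to kill the whole parallel job.
    """
    return node_success ** nodes


# The example from the post: 100 nodes, each 99% likely to last the week.
p = job_success_probability(0.99, 100)
print(f"job succeeds: {p:.1%}, job fails: {1 - p:.1%}")
```

      Plugging in other numbers shows how quickly this gets worse: with the same 1% per-node failure rate, doubling the node count squares the success probability.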

  32. someone has to say it... by dcordeiro · · Score: 0, Redundant

    imagine a beowulf..

    wait a sec, someone has really imagined it !!!

  33. Imagine... by dustin_royer · · Score: 0, Redundant

    ...a beowulf cluster of, oh wait, nevermind.

  34. Dr Box by ChaserPnk · · Score: 2, Interesting

    I actually go to Louisiana Tech. Chokchai Leangsuksun (Dr. Box), the director of the HA-Oscar program also teaches my Operating Systems class. He came into class today looking tired...he said he'd been working very hard on it.

    I think it's about time LaTech got some recognition.

    --

    "A diplomat is a man who always remembers a woman's birthday but never remembers her age." -Robert Frost
    1. Re:Dr Box by elchican · · Score: 1

      Yes it is time that LaTech got a little recognition. Dr. Box also deserves a great deal of credit. He's a very talented and gifted man. Box actually brought a small cluster of his IBM tablets into class the other day and we actually saw a few of HA-OSCAR's capabilities. Hopefully HA-OSCAR will pan out like expected.

  35. HA Beta?!? by Charles+Dart · · Score: 1

    High availability and beta don't seem to go together to me. I don't think an OS should be classified as such until it is STABLE



    <mumble> I doubt anyone will read this, drowning as it is in stupid Beowulf jokes</mumble>

    This story is burning up enough mod points to give us all karma nirvana.

    Please, stop wasting points modding off-topic.

    1. Re:HA Beta?!? by Rich+Klein · · Score: 1

      Good point. I was wondering why a beta was released at 1.0, which implies, to me, a production release. If it were up to me, I'd release a beta at 0.9 or something.

      If it's stable then they should probably drop the beta suffix.

      --
      -Rich
  36. Imagine my surprise by shancock · · Score: 1

    when there was no mention of the satellites or amateur radio here.

  37. Time to play! by pair-a-noyd · · Score: 1

    I've got about ~55 Compaq's that are bored to death and looking for something to do..

    Now, if the circuit breakers will only hold up long eno

  38. Hopefully fail safe ? by LupeSpywalper · · Score: 2, Funny

    I hear a certain terrorist group's Open Source Application Management Administrator (OSAMA) is already working hard to find some loop holes in the code.

  39. More Cluster Information by deadline · · Score: 1
    Shameless Plug:

    There is now a magazine and a news website dedicated to HPC/Beowulf cluster computing. You may recognize the webpage format.

    We are still running our free three month trial issue offer as well.

    --
    HPC for Primates. Read Cluster Monkey
  40. Does it auto fork processes by nurb432 · · Score: 1

    Mosix does this, but what about this? Or do you have to recompile and optimize for clustering?

    In a 'regular' environment auto propagate would be more useful.

    --
    ---- Booth was a patriot ----
  41. how does this compare to openssi? by jelle · · Score: 2, Informative

    How does this compare to OpenSSI? OpenSSI is nice because of the single system image approach, which makes administration very simple. AFAIK, an OpenSSI cluster also supports PVM and MPI in addition to exec and run-time load balancing (a la mosix).

    OpenSSI has a lot of "HA-" support, including support for various clustered filesystems, failover of network interfaces across nodes, and failover of the first node (hopefully soon without needing shared SCSI storage but using something like drbd).

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.