Slashdot Mirror


Fault Tolerant Shell

Paul Howe writes "Roaming around my school's computer science webserver I ran across what struck me as a great (and very prescient) idea: a fault tolerant scripting language. This makes a lot of sense when programming in environments that are almost fundamentally unstable i.e. distributed systems etc. I'm not sure how active this project is, but its clear that this is an idea whose time has come. Fault Tolerant Shell."

5 of 234 comments (clear)

  1. You're dealing with the problem too high up by phaze3000 · · Score: 5, Interesting
    IMO (as someone who works on clustered systems for a living) you're looking at this from the wrong point of view. A clustered shell is useful only if the system it is running on top of is inherently unstable.

    The real benefit is in having a system which is sufficiently distributed that any program running on top of it can continue to do so despite any sort of underlying failure.

    --
    Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.
  2. Nice, which brings me too.... by MrIrwin · · Score: 5, Interesting

    The idea of being to timeout and exception handle in scripts is a great idea......assuming you want to use scripts. I think most people end up resorting to Perl, Python or whatever for anything more complex. But perhaps with this facility Scripts would be more useful? But...now I come to a related topic. I build factory wide systems, systems which have eg. Automatic warehouses and whatever in the middle. I do a lot of stuff with VB6 not because it is fault tolerant but because it is 'fix tolerant'. During the comminssioning phases I can leave a program running in the debugger and, if it freaks out, I can debug, fix, test by iterating forwards and **backwards** in the the function that caused the hitch, and then continue to run were I left off. Many minor problems get fixed on the fly without users even realizing anything was amiss. In every other respect (syntax, structure, error trapping etc) VB6 is a disaster and not really suited at all to these types of progects, so the fact that I use it is a measure of how important this feature is. Like the fault tolerant shell, it is a 'non-pure' extension insofar as purists say it should not be neccessary, but in pratice it is a godsend. Anybody know an alternative for VB6 in this respect?

    --

    And if you thought that was boring you obviously havn't read my Journal ;-)

  3. this can essentially already be done in /bin/bash by xlurker · · Score: 5, Interesting
    (the concept of fault-tolerant coding encourages sloppy coding. and it makes it harder to see what's actually happening in the script. but that's not what they actually mean.)

    what they seem to essentially want is

    • a try statement and error catching and
    • a fortran like syntax for testing and arithmetic
    I think the authors were a bit misguided. Instead of creating a whole new shell how about just extending a good existing shell with a new try statement a described.

    it can even be done without extending the shell:

    ( cd /tmp/blabla
    &&
    rm -rf tmpdir
    &&
    wget http://some.thing/wome/where
    ) || echo something went wrong

    as for the new syntax of .eq. .ne. .lt. .gt. .to.
    certainly looks like fortran-hugging to me , yuck

    as for integer arithmetic, that can be done with by either using backticks or the $[ ] expansion

    % echo $[ 12 * 12 + 10 ]
    % 154
    --
    ______________________________________________
    sigamajig...
  4. Once again, Babbage was thinking ahead... by andersen · · Score: 5, Interesting

    "On two occasions I have been asked [by members of Parliament!], 'Pray, Mr.
    Babbage, if you put into the machine wrong figures, will the right answers
    come out?' I am not able rightly to apprehend the kind of confusion of ideas
    that could provoke such a question."
    -- Charles Babbage

    --
    -Erik -- --This message was written using 73% post-consumer electrons--
  5. OK, wise guys... by JAPrufrock · · Score: 5, Interesting
    I'm working with Grid and ftsh as we speak. I'm a physicist, not a professional coder. I write reasonable code, but I'm no purist. With that said...

    ftsh has great utility in the realm it's written for. Obviously, it's not a basis for installing kernels or doing password authentication. In a Grid (not just distributed) environment, things break for all sorts of reasons all the time. You're dealing with a Friendly Admin on another system, one who may well be unaffiliated with your institution, project or field of study. He doesn't have any particular reason to consult with you about system changes.

    Now you find yourself writing a grid diagnostic or submitter or job manager. One does not need strongly typed compiled languages for this. Shell scripts are almost always more efficient to write, and the speed difference is unimportant. Right now, most Grid submitters are being written in bash or Python or some such. Bash sucks for exception handling of the sort we're talking about. Python does better with its try: statements, but there's room for improvement. ftsh is a good choice for a sublayer to these scripts. One writes some of the machinery that actually interacts with the Grid nodes and supervisors in this easy, clear and flexible form.

    Now there are a lot of specific points to answer:

    One needs a Windows port to be able to make the Grid software we write in Linux available to the poor drones who are stuck with Win boxes.

    This is not a code spellchecker or coding environment. At all.

    This is not a crutch for inadequate programmers. This is a collection of methods to deal with a specific set of recalcitrant problems.

    As I was pointing out before, this is, after all, an unstable system. One is using diverse resources on diverse platforms in many countries at many institutions. I appreciate the comment made by unixbob about operating in heterogeneous environments.

    This isn't a substitute for wget. One uses wget as an example because it's clear.

    The "pull" model breaks down immediately when there is no unified environment, as is described on infrastructures.org. When you're not the admin, and your software has to be wiped out the minute your job is done, "push" is the only way to do it. This is the case with most Grid computing right now (that I know about)

    All the woe and doom about the sloppy coding and letting the environment correct your deficiencies is... ill-thought-out. That's what a compiler is, folks. Should we all be coding in machine language? :) Use the right tool for the job and save time.

    I do agree, however, that one should indeed hone one's craft. Sloppy coding in projects of importance is inexcusable (M$). There is no reason to stick to strict exception handling, however, in the applications being discussed by ftsh's developers (the same folks who brought you Condor). When code becomes 3/4 exception handling, even when the specific exceptions don't matter, there's a problem, IMHO. :)