Fault Tolerant Shell
Paul Howe writes "Roaming around my school's computer science webserver I ran across what struck me as a great (and very prescient) idea: a fault tolerant scripting language. This makes a lot of sense when programming in environments that are almost fundamentally unstable i.e. distributed systems etc. I'm not sure how active this project is, but its clear that this is an idea whose time has come. Fault Tolerant Shell."
Also, I would appreciate (not quite the same) a auto-completing python interpreter and editor (which can complete methods and objects from modules)... Such kind of stuff really increases productivity !
Spelling mistakes: My is english spoken not tongue of mother.
The real benefit is in having a system which is sufficiently distributed that any program running on top of it can continue to do so despite any sort of underlying failure.
Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.
It's a well meaning idea, but it would cause more problems than it would solve. It would just encourage sloppy code; people would rationalize "I don't need to fix errors because it doesn't matter", which is a very bad habit to get into when programming, ignoring errors, or even warnings
... or probably Perl or Python, either.
It doesn't actually seem to grok the commands that are being run, so something like
proc try {times script} {
if { [catch [uplevel $script] err] } { cleanup ; retry }
}
is all that's needed (of course to do it right you'd need a bit more, but still...).
try {5 times} {
commands...
}
Although Tcl is a bit lower level, and would require you to do exec ls, you could of course wrap that too so that all commands in the $script block would just be 'exec'ed by default.
In any case, better to use a flexible tool that can be tweaked to do what you need then write highly specialized tools.
http://www.welton.it/davidw/
This will not improve people's skills. In fact, it willl make them more prone to mistakes, and more likely to get the result that they didn't expect. It's similat to computer spell checkers. Ever since people started relying on these, their spelling has gone way downhill simly because they don't bother thinking. Computer do all the spelling for them. They don;t need a spell checker. They need spelling lessons.
This si even worse. Computers will try to second guess what the user means, get get it wrong half tyhe time.
A qualified shell scripter will be not make these mistakes in the first place. Anyone who thinks they need this shell actually just need to learn to spell and to ytype accuratly.
While, yes, you manage distributed systems from the center, you don't *push* updates, changes, modifications because, it doesn't scale. You end up having to write stuff like this fault tolerant shell which is frankly backwards thinking.
Instead, you automate everything and *pull* updates, changes, scripts etc. That way if a system is up, it just works, if it's down, it'll get updated next time it's up.
I won't go into details but I'll point you at http://infrastructures.org/
Government of the people, by corporate executives, for corporate profits.
on a loosely configured network, not saying this tool doesn't seem interesting, but it seems prone for use in DOS attacks...
Cloud City Digital: DVD Production at its cheapest/finest
how many times have you hacked something together in perl that ended up being relied on for some pretty important stuff, only to find 6 months down the track that there's some condition (db connects fine, but fails halfway through script execution as an example) you didn't consider and the whole thing just collapses in a heap - a nasty to recover heap cause you didn't write much logging code either.
This would REALLY be useful when you're connecting to services external to yourself - network glitches cause more problems with my code than ANYTHING else, and it's a pain in the arse to write code to deal with it gracefully. i'd really really like to see a universal "try this for 5 minutes" wrapper, which, if it still failed, you'd only have one exit condition to worry about. hey, what the hell, maybe i'll spend a few days and write one myself.
Perhaps the answer to the problem of teenagers dropping bricks from motorway and railway bridges is to sue Tetris.
All the programmers who need the environment to compensate for their inadequacies, step on one side. All the programmers who want to learn from their mistakes and become better at their craft, get on the other side.
Most of us know where this line is located.
The idea of being to timeout and exception handle in scripts is a great idea......assuming you want to use scripts. I think most people end up resorting to Perl, Python or whatever for anything more complex. But perhaps with this facility Scripts would be more useful? But...now I come to a related topic. I build factory wide systems, systems which have eg. Automatic warehouses and whatever in the middle. I do a lot of stuff with VB6 not because it is fault tolerant but because it is 'fix tolerant'. During the comminssioning phases I can leave a program running in the debugger and, if it freaks out, I can debug, fix, test by iterating forwards and **backwards** in the the function that caused the hitch, and then continue to run were I left off. Many minor problems get fixed on the fly without users even realizing anything was amiss. In every other respect (syntax, structure, error trapping etc) VB6 is a disaster and not really suited at all to these types of progects, so the fact that I use it is a measure of how important this feature is. Like the fault tolerant shell, it is a 'non-pure' extension insofar as purists say it should not be neccessary, but in pratice it is a godsend. Anybody know an alternative for VB6 in this respect?
And if you thought that was boring you obviously havn't read my Journal ;-)
So what happens if the files are crucial (let's use the toy example of kernel modules being updated): The modules get deleted, then the update fails because the remote host is down. Presumably the shell can't rollback the changes a la DBMS, as that would involve either hooks into the FS or every file util ever written.
Now I think it's a nice idea, but it could easily lead to such sloppy coding; if your shell automatically tries, backs off and cleans up, why would people bother doing it the 'correct' way and downloading the new files before removing the old ones?
...people start pronouncing "ftsh" as "fetish". Actually, I've started already, just ask the girl sitting next to me. ;-)
- import rlcompleter, readline
and
- here
autocompletion in the editor is availible in vim here______________________________________________
sigamajig...
You obviously didn't read the article.
"It [ftsh] is especially useful in building distributed systems, where failures are common, making timeouts, retry, and alternation necessary techniques."
It doesn't protect you from typos in the script, it handles failures in the commands that are executed.
"Password fairly correct. Root login granted."
I want my karma, and I want it now!
what they seem to essentially want is
- a try statement and error catching and
- a fortran like syntax for testing and arithmetic
I think the authors were a bit misguided. Instead of creating a whole new shell how about just extending a good existing shell with a new try statement a described.it can even be done without extending the shell:
as for the new syntax of .eq. .ne. .lt. .gt. .to.
certainly looks like fortran-hugging to me , yuck
as for integer arithmetic, that can be done with by either using backticks or the $[ ] expansion
______________________________________________
sigamajig...
"On two occasions I have been asked [by members of Parliament!], 'Pray, Mr.
Babbage, if you put into the machine wrong figures, will the right answers
come out?' I am not able rightly to apprehend the kind of confusion of ideas
that could provoke such a question."
-- Charles Babbage
-Erik -- --This message was written using 73% post-consumer electrons--
I'm sorry, but I can't understand why a Windows port (even if not native) is even attempted. Seems kind of useless in a totally GUI environment. Of course, maybe it's just me?
-- There is no spoon. Only fork.
Well.. besides pipes of course
In monopolistic America, you tolerate faulty shell.
I am a viral sig. Please copy me and help me spread. Thank you.
Shell scripts should be short and easy to write. I have seen plenty of them fail due to some resource or another being temporarily down. At first people are neat and then send an email to notify the admin. When this then results in a ton of emails everytime some dodo knocks out the DNS they turn it off and forget about it.
Every scripting language has their own special little niche. BASH for simple things, perl for heavy text manipulation, PHP for creating HTML output. This scripting language is pretty much like BASH but takes failure as given. The example shows clearly how it works. Instead of ending up with PERL like scripts to catch all the possible errors you add two lines and you got a wonderfull small script, wich is what shell scripts should be, that is none the less capable of recovering from an error. This script will simply retry when someone knocks out the DNS again.
This new language will not catch your errors. It will catch other peoples errors. Sure a really good programmer can do this himself. A really good programmer can also create his own libraries. Most find of us in admin jobs find it easier to use somebody elses code rather then constantly reinvent the wheel.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I have often had to write such wrappers myself. Sure even easier/better would have been if somebody added this to say BASH as an extension but perhaps that is not possible.
How often have you needed to write horrible bash code just to pull data from an unreliable source and ended up either with a script that worked totally blind "command && command && command &&" wich never reported if it failed for days on end or ended up with several pages just to catch all the damn network errors that could occur.
I will definitly be giving this little language a try in the near future. Just another tool for the smart sys-admin. (smart people write as little code as possible. Let others work for you)
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
joshua:~#rm -Rf //tmp /tmp
Probable typing error detected. Parsed as rm -Rf /
-
Roses are #FF0000, Violets are #0000FF, find / -name '*base*' |xargs chown -R us && mv zig greatjustice
... was mentioned a few months back in one of the magazines I pick up almost monthly (forget which one out of the several it was).
I think the shell was called dsh. I believe this is the project site: http://dsh.sourceforge.net/
Are the aims of this fault tolerant shell and dsh the same? I'm not a programmer, but I'm trying to teach myself *nix system administration.
Eventually I'm hoping to cluster some older x86 systems I'm going to get at auction together for a Beowulf cluster. It sounds to me like one if not both of these two shells might come in handy!
ftsh has great utility in the realm it's written for. Obviously, it's not a basis for installing kernels or doing password authentication. In a Grid (not just distributed) environment, things break for all sorts of reasons all the time. You're dealing with a Friendly Admin on another system, one who may well be unaffiliated with your institution, project or field of study. He doesn't have any particular reason to consult with you about system changes.
Now you find yourself writing a grid diagnostic or submitter or job manager. One does not need strongly typed compiled languages for this. Shell scripts are almost always more efficient to write, and the speed difference is unimportant. Right now, most Grid submitters are being written in bash or Python or some such. Bash sucks for exception handling of the sort we're talking about. Python does better with its try: statements, but there's room for improvement. ftsh is a good choice for a sublayer to these scripts. One writes some of the machinery that actually interacts with the Grid nodes and supervisors in this easy, clear and flexible form.
Now there are a lot of specific points to answer:
One needs a Windows port to be able to make the Grid software we write in Linux available to the poor drones who are stuck with Win boxes.
This is not a code spellchecker or coding environment. At all.
This is not a crutch for inadequate programmers. This is a collection of methods to deal with a specific set of recalcitrant problems.
As I was pointing out before, this is, after all, an unstable system. One is using diverse resources on diverse platforms in many countries at many institutions. I appreciate the comment made by unixbob about operating in heterogeneous environments.
This isn't a substitute for wget. One uses wget as an example because it's clear.
The "pull" model breaks down immediately when there is no unified environment, as is described on infrastructures.org. When you're not the admin, and your software has to be wiped out the minute your job is done, "push" is the only way to do it. This is the case with most Grid computing right now (that I know about)
:) Use the right tool for the job and save time.
:)
All the woe and doom about the sloppy coding and letting the environment correct your deficiencies is... ill-thought-out. That's what a compiler is, folks. Should we all be coding in machine language?
I do agree, however, that one should indeed hone one's craft. Sloppy coding in projects of importance is inexcusable (M$). There is no reason to stick to strict exception handling, however, in the applications being discussed by ftsh's developers (the same folks who brought you Condor). When code becomes 3/4 exception handling, even when the specific exceptions don't matter, there's a problem, IMHO.
That shell script can be improved a lot by using " set -e " to exit on failure, as follows:
#!/bin/sh
set -e # exit on failure
cd
rm -rf bar
cp -r
This means that, if any command in the script fails, the script will exit immediately, instead of carrying on blindly.
The script's exit status will be non-zero, indicating failure. If it was called by another script, and that had "set -e", then that too will exit immediately. This is a little bit like exceptions in some other languages.
perl -e 'fork||print for split//,"hahahaha"'
Comment removed based on user account deletion
Ofcourse bash can do it as well using the proper constructions. That is not the point. Care should be taken not to view bash as a golden hammer when it comes to shell scripting. The same goes for 'ftsh', ofcourse. It won't try to replace bash for every script out there.
The author merely thinks it would be nice to have a shell in which such fault-tolerant constructions are natural by design. Just to save people headaches when writing simple scripts which are there to get some job done, not to waste time dealing with every single possible failure and time-out by hand.
Then it compiled (on Fedora Core 1).
Then it failed the functions test, because my computer does not have the file /etc/networks. For a fault tolerant shell, it does not seem very tolerant of my machine! After sudo touch /etc/networks, make succeeded.
Anyway, those were the only two problems, and now it's installed. Let's see if it's worth building into an RPM package.
Erlang (http://www.erlang.org) has it.
You can have multiple linked interpreters and
even fault-tolerant database!
It is a scripting language.
From the FAQ:
1.1. In a nutshell, what is Erlang?
Erlang is a general-purpose programming language and runtime environment. Erlang has built-in support for concurrency, distribution and fault tolerance. Erlang is used in several large telecommunication systems from Ericsson. The most popular implementation of Erlang is available as open source from the open source erlang site.
Remounting a filesystem with ACID on, a process sets a rollback point , executing a series of commands with the operating system keeping a record of the changes to the filesystem made by the process and its children. The process would inform the OS to either commit or rollback the changes.
This still raises questions on how to deal with with two or more competing "transactional" processes which rely on read information which another process chooses to rollback to an early state.
The system you pull from is a distribution server, all it does is distribute files. If it's slow, it's slow for all the machines sucking data and you need a bigger infrastructure. If it's down, the client scripts fail safe and do nothing.
l l. shtml
Even here, pull scales better than push, look at a web server as an example thousands of machines sucking web pages from a server is not uncommon. Try pushing those pages out to the same number of machines.
Push methodologies simply don't scale, I've been there, done that and it's a bad architecture for more than trivial numbers of machines and I'm not the only one to notice:
http://www.infrastructures.org/bootstrap/pushpu
Government of the people, by corporate executives, for corporate profits.
I think perl is where it is because so many people use it as "super script." To me that says, a) we recode all the Bourne and csh and bash in perl or b) we look at why people do shell scripting in perl or other languages and add that to the shell. I couldn't tell you which is right. It's a neat idea though and I'm glad they made it.
A real example I can think of, I had a test machine that had some kind of ext3 corruption and so it mounted up in read-only mode when it booted. I spent time diagnosing an application error in our application because nothing caught that; these are redhat type startup scripts. I noticed that our app couldn't write logs and began to debug the system. More interestingly, a dozen or so start-up scripts failed to start up critical components and their failure wasn't noticed. If you can't write to the filesystem, you can't create a socket(AF_UNIX) and all sort's of things go tits up then. If that's how you debug it's only going to get more difficult as you add more and more complexity, you have to detect the lower level failures and report them. Perversely, this wouldn't have been noticed had a different partition been read-only. Turns out that a drive was going bad. Had it been a different partition, it would have been noticed at catastrophic system failure time when the drive died.
I've done a fair amount of embedded work and there is always a test for new guys, you can tell the new guy (new college grad, whatever) because he skips half or more of the error checking in his code. You know printf returns a value? Funnier still, if you develop something like a consumer app in embedded space, you'll eventually see things like printf fail. We know it never should, but with 20,000+ users in different environments and what not, things like that can and do fail and usually point to a greater problem, like a dead drive or something. Instead of logging/alerting something to the critical and unusual printf failure, the app fails in a different way because this printf failed. Heaven forbid that it was sprintf that failed and then you shove bad data in to a database or configuration file and not just fail the system but corrupt the data too. Inspite of all of that, even veterans will forget error checking at times, it's a common bug and so having higher level tools to help assist, like exception in the shell can only be a good thing.
Funny you should mention this, because I was going to write something about pipes. Getting pipes right with good error semantics is hard. For all the "just use set +e in bash" weenies out there, try running
Where's your error?If you think about the unix process and pipe primitives, you will see the difficulty. To create a pipeline, you normally fork, create the pipe, fork again, and run the two ends of the pipeline in the two sub-processes. This is scalable to deeply nested pipelines, but has a cost: Only one of the sub-processes is a child of the shell, so only one exit status can be monitored. To work around this, you really need to build a mini-OS environment on top of unix.
This demonstrates that unix was fundamentally not designed with concern for error semantics (consider Erlang as a diametric example). And this, I'm sure, is why ftsh doesn't have pipes (yet).
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
Spell checkers are *not* a substitute for knowing how words are spelled.
Of course not. But using a spell checker means having time to learn about the homonyms, instead of endlessly playing catch up.
You still predicated your post on "relying" on spell checkers; I'm saying that people learn from good spell checkers. That people can't learn everything from a spell checker is hardly a reason to throw the baby out with the bath water and insist that people use inferior learning techniques anyhow!
A kid that can spell accurately is in a much better position to learn about homonyms then one for whom spelling is still a mysterious, abstract art that involves guessing at things.
The language even allows you to create file "variables" which are safely allocated and destroyed during the operation of a script, analogous to automatic varibles in C++/Java or what-have-you. The point of this language is that it is entirely focused on exception handling. This is _excellent_ programming practice.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Um... no, that doesn't do the same thing. The whole point of ftsh is that the 'try' block encloses a set of statements which must all be executed or it fails. If the 'cd /tmp' fails, bash will blindly run the 'rm -f data' anyway, whereas ftsh will stop and jump to the start of the try block to have another go.
What's with all of the people claiming that FTSH will ruin the world because it makes it easier to be a sloppy programmer. Did you freaking read the documentation?
To massively oversimplify, FTSH adds exceptions to shell scripting. Is that really so horrible? Is of line-after-line of "if [$? -eq 0] then" really an improvement? Welcome to the 1980's, we've discovered that programming languages should try and minimize the amount of time you spent typing the same thing over and over again. Human beings are bad at repetitive behavior, avoid repetition if you can.
Similarlly FTSH provides looping constructs to simplify the common case of "Try until it works, or until some timer or counter runs out." Less programmer time wasted coding Yet Another Loop, less opportunities for a stupid slip-up while coding that loop.
If you're so bothered by the possibility of people ignoring return codes it should please you to know that FTSH forces you to appreciate that return codes are very uncertain things. Did diff return 1 because the files are different, or because the linker failed to find a required library? Ultimately all you can say is that diff failed.
Christ, did C++ and Java get this sort of reaming early on? "How horrible, exceptions mean that you don't have to check return codes at every single level."
Search 2010 Gen Con events
Something perhaps like this?
Does that mean they were (wait for it...) existentialists?
HSJ$$*&#^!#+++ATH0
NO CARRIER