A Diagnosis of Self-Healing Systems
gManZboy writes "We've been hearing about self-healing systems for a while, but (as is usual), so far it's more hype than reality. Well it looks like Mike Shapiro (from Sun's Solaris Kernel group) has been doing a little actual work in this direction. His prognosis is that there's a long way to go before we get fully self-healing systems. In this article he talks a little bit about what he's done, points out some alternative approaches to his own, as well as what's left to do."
The most successful example is Tandem. For decades, systems that have to keep running have run on Tandem's operating system. For an overview of how they did it, see the 1985 paper Why Computers Stop and What Can Be Done About It.
The basic concepts are:
Every time you use an ATM or trade a stock, somewhere a Tandem cluster was involved.
Tandem's problem was that they had rather expensive proprietary hardware. You also needed extra hardware to allow for fail-operational systems. But it all really does work. HP still sells Tandem, but since Carly, it's being neglected, like most other high technology at HP.