Ultra-Stable Software Design in C++?

← Back to Stories (view on slashdot.org)

Ultra-Stable Software Design in C++?

Posted by Cliff on Saturday February 4, 2006 @03:35PM from the failure-minimization dept.

null_functor asks: "I need to create an ultra-stable, crash-free application in C++. Sadly, the programming language cannot be changed due to reasons of efficiency and availability of core libraries. The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms. Basically, it allows users to crunch a god-awful amount of data over several computing nodes. The application is meant to primarily run on Linux, but should be portable to Windows without much difficulty." While there's more to this, what strategies should a developer take to insure that the resulting program is as crash-free as possible? "I'm thinking of decoupling the modules physically so that, even if one crashes/becomes unstable (say, the distributed communication module encounters a segmentation fault, has a memory leak or a deadlock), the others remain alive, detect the error, and silently re-start the offending 'module'. Sure, there is no guarantee that the bug won't resurface in the module's new incarnation, but (I'm guessing!) it at least reduces the number of absolute system failures.

How can I actually implement such a decoupling? What tools (System V IPC/custom socket-based message-queue system/DCE/CORBA? my knowledge of options is embarrassingly trivial :-( ) would you suggest should be used? Ideally, I'd want the function call abstraction to be available just like in, say, Java RMI.

And while we are at it, are there any software _design patterns_ that specifically tackle the stability issue?"

5 of 690 comments (clear)

Min score:

Reason:

Sort:

Here's your best bet. by neo · 2006-02-04 15:42 · Score: 5, Interesting

1. Write the whole thing in Python.
2. Once it's bullet-proof, replace each function and object with C++ code.
3. Profit.
They Write the Right Stuff by Pentclass · 2006-02-04 15:43 · Score: 5, Interesting

Follow NASA's advice... http://www.fastcompany.com/online/06/writestuff.ht ml
test with valgrind! by graveyhead · 2006-02-04 16:25 · Score: 4, Interesting

valgrind -v ./myapp [args]

It gives you massive amounts of great information about the memory usage of your program.

The other day I spent nearly 3 hours trying to decode what was happening from walking the backtrace in gdb. Couldn't for the life of me figure out what was happening. Valgrind figured out the problem on the first run and after that, I had a solution in a few minutes.

Highly recommended software, and installed by default on several distributions, AFAIK.

Enjoy!

--
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
robust software by avitzur · 2006-02-04 18:26 · Score: 4, Interesting

Way back in 1993, thanks to a three month schedule delay in shipping the original Apple Power PC hardware, Graphing Calculator 1.0 had the luxury of four months of QA, during which a colleague and I added no features and did an exhaustive code review. Combine that with being the only substantial PowerPC native application, so everyone with prototype hardware played with it a lot, resulted in that product having a more thorough QA than anything I had ever worked on before or since. It also helped that we started with a mature ten year old code base which had been heavily tested while shipping for years. Combine that with a complete lack of any management or marketing pressure on features, allowed us to focus solely on stability for months.

As a result, for ten years Apple technical support would tell customers experiencing unexplained system problems to run the Graphing Calculator Demo mode overnight, and if it crashed, they classified that as a *hardware* failure. I like to think of that as the theoretical limit of software robustness.

Sadly, it was a unique and irreproducible combination of circumstance which allowed so much effort to be focused on quality. Releases after 1.0 were not nearly so robust.
Re:You're not the first one.... by The_Wilschon · 2006-02-05 04:18 · Score: 4, Interesting

More: http://www.cs.indiana.edu/~jsobel/c455-c511.update d.txt about a guy who wrote the "Fast Multiplication" algorithm very simply in scheme, and then transformed it (using correctness preserving transformations, which are much much easier to do in "Haskell or one of the other functional languages" than in C/C++ and friends) into scheme code that was as optimized as he could come up with, and which furthermore had a pretty much 1-1 correspondence with C statements. He then rewrote it in C (including perfect "goto"s!), and beat all but one person in his class on the speed of the algorithm. Furthermore, he spent significantly less time working on (read debugging) his code than anyone else in the class.

--
SIGSEGV caught, terminating

wait... not that kind of sig.