Ultra-Stable Software Design in C++?

Posted by Cliff on Saturday February 4, 2006 @03:35PM from the failure-minimization dept.

null_functor asks: "I need to create an ultra-stable, crash-free application in C++. Sadly, the programming language cannot be changed, for reasons of efficiency and the availability of core libraries. The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms. Basically, it allows users to crunch a god-awful amount of data over several computing nodes. The application is meant to run primarily on Linux, but should be portable to Windows without much difficulty." While there's more to the question, what strategies should a developer use to ensure that the resulting program is as crash-free as possible? "I'm thinking of decoupling the modules physically so that, even if one crashes or becomes unstable (say, the distributed communication module encounters a segmentation fault, has a memory leak or a deadlock), the others remain alive, detect the error, and silently restart the offending 'module'. Sure, there is no guarantee that the bug won't resurface in the module's new incarnation, but (I'm guessing!) it at least reduces the number of absolute system failures.

How can I actually implement such a decoupling? What tools (System V IPC? A custom socket-based message-queue system? DCE? CORBA? My knowledge of the options is embarrassingly limited :-( ) would you suggest? Ideally, I'd want a function-call abstraction available, just like in, say, Java RMI.

And while we are at it, are there any software _design patterns_ that specifically tackle the stability issue?"
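The physical decoupling described above can be sketched with POSIX process primitives. This is a minimal, Linux/POSIX-only sketch under stated assumptions, not a complete design: each module runs in a forked child process, and the parent acts as a supervisor that starts a fresh child whenever one exits with a non-zero status or is killed by a signal such as SIGSEGV. The function `supervise` and its parameters are hypothetical. It does not detect a deadlocked (hung) module, which would need a heartbeat or timeout, and it does not provide the RMI-style call layer; a Windows port would need `CreateProcess` instead of `fork`. Since `fork` copies only the calling thread, the supervisor should be a small process that forks before starting any threads.

```cpp
#include <csignal>      // signal numbers such as SIGSEGV
#include <cstdio>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical supervisor: runs `module` in a forked child process. If the
// child is killed by a signal or exits with a non-zero status, a new child is
// forked, up to `max_restarts` extra attempts.
// Returns the attempt number that finished cleanly (0 = no restart needed),
// or -1 if every attempt failed or fork()/waitpid() failed.
int supervise(const std::function<int(int attempt)>& module, int max_restarts) {
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        std::fflush(nullptr);  // don't let the child inherit unflushed output
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            // Child: a crash here cannot corrupt the supervisor's memory.
            _exit(module(attempt));
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return attempt;
        if (WIFSIGNALED(status))
            std::fprintf(stderr, "attempt %d: module killed by signal %d\n",
                         attempt, WTERMSIG(status));
        else
            std::fprintf(stderr, "attempt %d: module exited with status %d\n",
                         attempt, WEXITSTATUS(status));
    }
    return -1;
}
```

For example, `supervise([](int) { return run_comm_module(); }, 5)` would restart a hypothetical `run_comm_module` up to five times. The supervisor itself should stay as small as possible, since it is the one piece that cannot be restarted this way.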

9 of 690 comments shown

Here's your best bet. by neo · 2006-02-04 15:42 · Score: 5, Interesting

1. Write the whole thing in Python.
2. Once it's bullet-proof, replace each function and object with C++ code.
3. Profit.
They Write the Right Stuff by Pentclass · 2006-02-04 15:43 · Score: 5, Interesting

Follow NASA's advice... http://www.fastcompany.com/online/06/writestuff.html
Re:You're not the first one.... by merlin_jim · 2006-02-04 16:06 · Score: 3, Interesting

I was going to post pretty much the same thing: managed code comes close enough to C++ efficiency that it shouldn't matter (I've seen figures of 80-95%).

And in Visual Studio .NET 2005 there are built-in high-performance computing primitives: all the management of inter-node communication and logical data semaphore locking is handled by the runtime, which is presumably debugged and stable code...

--
I am disrespectful to dirt! Can you see that I am serious?!
test with valgrind! by graveyhead · 2006-02-04 16:25 · Score: 4, Interesting

valgrind -v ./myapp [args]

It gives you massive amounts of great information about the memory usage of your program.

The other day I spent nearly 3 hours trying to figure out what was happening by walking the backtrace in gdb, and couldn't for the life of me work it out. Valgrind found the problem on the first run, and after that I had a solution in a few minutes.

Highly recommended software, and installed by default on several distributions, AFAIK.

Enjoy!

--
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
robust software by avitzur · 2006-02-04 18:26 · Score: 4, Interesting

Way back in 1993, thanks to a three-month schedule delay in shipping the original Apple PowerPC hardware, Graphing Calculator 1.0 had the luxury of four months of QA, during which a colleague and I added no features and did an exhaustive code review. On top of that, it was the only substantial native PowerPC application, so everyone with prototype hardware played with it a lot, and as a result the product got more thorough QA than anything I have worked on before or since. It also helped that we started with a mature, ten-year-old code base that had been heavily tested while shipping for years. Finally, a complete lack of management or marketing pressure on features allowed us to focus solely on stability for months.

As a result, for ten years Apple technical support would tell customers experiencing unexplained system problems to run the Graphing Calculator Demo mode overnight, and if it crashed, they classified that as a *hardware* failure. I like to think of that as the theoretical limit of software robustness.

Sadly, it was a unique and irreproducible combination of circumstances that allowed so much effort to be focused on quality. Releases after 1.0 were not nearly as robust.
Re:You're not the first one.... by Anonymous Coward · 2006-02-05 03:52 · Score: 3, Interesting

Purely functional languages have two big advantages applicable in this case:

* No (or very, very limited) side effects. In other words, the result of a function does not depend on the current program state. Once it has been exhaustively verified in testing, that function will forevermore return the correct results, because the run-time state cannot affect it.

* The language itself can often be treated as a specification of correctness, and programs can even be formally proved correct through static analysis. As a trivial example, if you write an implementation of factorial in Haskell, it strongly resembles the mathematical definition of factorial: the code is more a description of what the correct result is than a set of low-level steps for carrying out the computation, as in C.
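Both points can be approximated, though not matched, in C++. The sketch below is a C++ analogue rather than Haskell: a side-effect-free `constexpr` factorial written to mirror the mathematical definition, with `static_assert` checks that the compiler evaluates at build time. This is compile-time testing of specific values, not a formal proof, and the 64-bit result type is an assumption (it overflows beyond 20!).

```cpp
#include <cstdint>

// Pure function: the result depends only on n, never on global or mutable
// state, so a check that passes once passes on every later call.
// Mirrors the definition 0! = 1, n! = n * (n-1)!
constexpr std::uint64_t factorial(unsigned n) {
    return n == 0 ? 1 : n * factorial(n - 1);
}

// Evaluated by the compiler; the program does not build if any check fails.
static_assert(factorial(0) == 1, "0! = 1");
static_assert(factorial(5) == 120, "5! = 120");
static_assert(factorial(20) == 2432902008176640000ULL, "20! fits in 64 bits");
```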

Haskell is nice; however, I think the original questioner is better off with something like Erlang, which was designed for exactly this kind of situation. If it's good enough for telephone switches...
Re:You're not the first one.... by The_Wilschon · 2006-02-05 04:18 · Score: 4, Interesting

More: http://www.cs.indiana.edu/~jsobel/c455-c511.updated.txt describes a guy who wrote the "Fast Multiplication" algorithm very simply in Scheme and then transformed it, using correctness-preserving transformations (which are much, much easier to do in "Haskell or one of the other functional languages" than in C/C++ and friends), into Scheme code that was as optimized as he could make it and that had a nearly one-to-one correspondence with C statements. He then rewrote it in C (including perfect "goto"s!) and beat all but one person in his class on the speed of the algorithm. Furthermore, he spent significantly less time working on (read: debugging) his code than anyone else in the class.
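The linked write-up isn't reproduced here, so the following is only an illustration of the general technique, written in C++ rather than Scheme. It assumes "fast multiplication" means shift-and-add (Russian peasant) multiplication; the function names are hypothetical. Each step is obtained from the previous one by a transformation that preserves the result: an accumulator turns the recursion into tail calls, and tail calls become a loop whose statements map directly onto C.

```cpp
#include <cstdint>

// Step 0: the obviously-correct recursive "specification".
std::uint64_t mul_spec(std::uint64_t a, std::uint64_t b) {
    if (b == 0) return 0;
    if (b % 2 == 0) return mul_spec(a * 2, b / 2);  // a*b = (2a)*(b/2)
    return a + mul_spec(a, b - 1);                  // a*b = a + a*(b-1)
}

// Step 1: add an accumulator so every recursive call is a tail call.
// Invariant: acc + a*b equals the original product.
std::uint64_t mul_acc(std::uint64_t a, std::uint64_t b, std::uint64_t acc = 0) {
    if (b == 0) return acc;
    if (b % 2 == 0) return mul_acc(a * 2, b / 2, acc);
    return mul_acc(a, b - 1, acc + a);
}

// Step 2: each tail call becomes a jump back to the top, i.e. a loop.
std::uint64_t mul_loop(std::uint64_t a, std::uint64_t b) {
    std::uint64_t acc = 0;
    while (b != 0) {
        if (b % 2 == 0) { a *= 2; b /= 2; }
        else            { acc += a; b -= 1; }
    }
    return acc;
}
```

Because each version preserves the invariant of the one before, testing the final loop against the simple specification is a cheap sanity check rather than the only evidence of correctness.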

--
SIGSEGV caught, terminating

wait... not that kind of sig.
Sorry, *not* in C++ by HermanAB · 2006-02-05 12:55 · Score: 3, Interesting

You cannot write highly stable code in C++, due to design flaws in the language. For this reason, the FAA doesn't allow C++ for use in aircraft systems. You can improve the situation by using a garbage collector, but if stability and safety are critical, you should use ANSI C. See this: http://www.hpl.hp.com/personal/Hans_Boehm/gc/issues.html

--
Oh well, what the hell...
Re:While I can certainly respect your opinions, by Chemisor · 2006-02-06 03:35 · Score: 3, Interesting

> I guarantee you that I'd rather encounter a
> for_each(components.begin(), components.end(), _1.disable())

It is never that simple. The fact that you can't do what you've typed is one of the reasons I dislike it so much. What you really need is:

for_each (components.begin(), components.end(), mem_fun_ref (&CComponent::disable));

Things suddenly got uglier, didn't they? But wait, what if you need to call a function with an argument? Gotta use a bind2nd adaptor to wrap it, and then it becomes:

for_each (components.begin(), components.end(), bind2nd (mem_fun_ref (&CComponent::SetParameter), value));

Wait until you try to explain to some maintenance programmer how to untangle that! Oh, and just for laughs, try to debug this thing. Put an assert in SetParameter, and you get a lovely call stack from gdb:

(gdb) run
Starting program: /home/user/tmp/tes
tes: tes.cc:18: void CComponent::SetParameter(int): Assertion `!"Check out the callstack!"' failed.

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
Current language: auto; currently c
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7d36126 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
#2 0xb7d37b40 in *__GI_abort () at ../sysdeps/generic/abort.c:88
#3 0xb7d2f610 in *__GI___assert_fail (assertion=0x6 <Address 0x6 out of bounds>, file=0x6 <Address 0x6 out of bounds>, line=6, function=0x80495a0 "void CComponent::SetParameter(int)") at assert.c:83
#4 0x080485f6 in CComponent::SetParameter (this=0x804b008, arg=42) at tes.cc:18
#5 0x08048ac3 in std::mem_fun1_ref_t<void, CComponent, int>::operator() (this=0xbfe2cacc, __r=@0x804b008, __x=42) at stl_function.h:826
#6 0x08048ae8 in std::binder2nd<std::mem_fun1_ref_t<void, CComponent, int> >::operator() (this=0xbfe2cacc, __x=@0x804b008) at stl_function.h:446
#7 0x08048b0c in std::for_each<__gnu_cxx::__normal_iterator<CComponent*, std::vector<CComponent, std::allocator<CComponent> > >, std::binder2nd<std::mem_fun1_ref_t<void, CComponent, int> > > (__first={_M_current = 0x804b008}, __last={_M_current = 0x804b00c}, __f={<> = {<No data fields>}, op = {<> = {<No data fields>}, _M_f = {__pfn = 0x80485c4 <CComponent::SetParameter(int)>, __delta = 0}}, value = 42}) at stl_algo.h:158
#8 0x08048740 in main () at tes.cc:26
(gdb)

Now that's something to scare newbie programmers with! Oh, and forget about putting a breakpoint inside the loop; templated functions aren't targetable until executed.
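For readers on a modern compiler: the adaptors used above (`std::mem_fun_ref`, `std::bind2nd`) were deprecated in C++11 and removed in C++17, and a lambda expresses the same calls directly. A minimal sketch, with a hypothetical `CComponent` standing in for the commenter's class, which isn't shown:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the CComponent class discussed above.
class CComponent {
public:
    void disable() { enabled_ = false; }
    void SetParameter(int value) { parameter_ = value; }
    bool enabled() const { return enabled_; }
    int parameter() const { return parameter_; }
private:
    bool enabled_ = true;
    int parameter_ = 0;
};

// Replaces for_each(..., mem_fun_ref(&CComponent::disable)).
void disable_all(std::vector<CComponent>& components) {
    std::for_each(components.begin(), components.end(),
                  [](CComponent& c) { c.disable(); });
}

// Replaces for_each(..., bind2nd(mem_fun_ref(&CComponent::SetParameter), value)).
void set_all(std::vector<CComponent>& components, int value) {
    std::for_each(components.begin(), components.end(),
                  [value](CComponent& c) { c.SetParameter(value); });
}
```

The extra argument is captured by value in the lambda instead of being bound through an adaptor object.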

> in some code I need to maintain then to encounter
> for(i = 0; i < components.count(); ++i) components[i].disable()

So why not just use an iterator loop? for_each does not have a monopoly on iterators:

foreach (compvec_t::iterator, i, components) i->disable();

(foreach is a macro I wrote because I use this construct so often)
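The commenter's macro definition isn't given. A plausible reconstruction that fits the usage `foreach (compvec_t::iterator, i, components)` is sketched below; it is hypothetical, as are the `Component` type and the helper functions. Since C++11, the range-based `for` loop gives the same container independence without a macro.

```cpp
#include <vector>

// Hypothetical reconstruction; the original macro is not shown in the comment.
// Expands to an ordinary iterator loop over any container with begin()/end().
#define foreach(type, i, container) \
    for (type i = (container).begin(); i != (container).end(); ++i)

struct Component {             // hypothetical element type
    bool enabled = true;
    void disable() { enabled = false; }
};
typedef std::vector<Component> compvec_t;

void disable_all_macro(compvec_t& components) {
    foreach (compvec_t::iterator, i, components) i->disable();
}

// C++11 range-based for: no macro, works with any container providing begin()/end().
void disable_all_range(compvec_t& components) {
    for (Component& c : components) c.disable();
}
```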

> first form permits, for instance, components to be a linked list or even a hash.
> The second is implementation-dependent and if you change the underlying data
> structure, you'll have extra work to refactor.

If you use iterator loops, this wouldn't happen to you.

> I once worked, changing all instances of SomeObject* to auto_ptr
> eliminated altogether 35 bugs we had lurking in the BTS for a long, long time,
> with less than one day of work (strange, delayed errors were suddenly
> transformed into EARLY null-pointer dereferences)

Why were you using SomeObject* in the first place? When I was advocating moderation in the use of auto_ptr, I wa