Tools for Debugging Stack Corruption?

Posted by Cliff on Wednesday January 11, 2006 @06:57AM from the it-won't-corrupt-if-you-keep-it-clean dept.

blackcoot asks: "I know that there are tools which exist on hardened Linux distros and OpenBSD (and probably $your_os_of_choice too), which are designed to help track down stack corruption (which is often symptomatic of buffer overruns). Unfortunately, that's about all I know about those tools. What options are there? How effective are they? What does it take to get access to those tools? Are they really useful enough to make the effort justified?" "My goal here is to increase my effectiveness at hunting down memory bugs, not necessarily to produce bulletproof, secure, production-quality code. The bugs I'm dealing with are, I believe, largely in software delivered by a subcontractor who swears they test their code and can't reproduce my bugs. What I really want is to a) demonstrate to them that a problem does, in fact, exist; b) demonstrate that the problem exists inside their code; and c) give them the tools they need to find and repair the bug, and to verify that it is no longer an issue.

First prize in my mind would be a Valgrind-like tool which only requires trivial changes to the build process, but I'm pretty open to suggestions. If I have to run a hardened Linux to make this all possible, suggestions on pretty leading-edge distros with reasonable automagic self-configuration and hardware detection + laptop support would be greatly appreciated. Thanks much!"

30 comments

Electric Fence by sleepingsquirrel · 2006-01-11 07:07 · Score: 3, Informative

See Electric Fence and your documentation for debugging malloc.
1. Re:Electric Fence by Anonymous Coward · 2006-01-11 07:41 · Score: 0
  
  What a great suggestion, too bad he said STACK corruption and not HEAP corruption. duh.
2. Re:Electric Fence by c0nst · 2006-01-11 07:56 · Score: 1
  
  Electric Fence can only detect heap corruption, but I think the submitter meant memory corruption of any kind. Valgrind can detect both.
Boehm Garbage Collector by sleepingsquirrel · 2006-01-11 07:10 · Score: 2, Informative

Also, you can use the Boehm garbage collector as a leak detector.
A Couple Of Tools by BusDriver · 2006-01-11 07:11 · Score: 4, Informative

You don't have to run a whole new distro!
First of all, there's libsafe which is just a simple compile and install. Seeing as we don't know much about your specific problem, I'm not sure if it'll help.
Second to try is compiling your own custom kernel with the GrSecurity kernel patch. It includes the PaX kernel patch, which is very effective at protecting against overflows. You could even just install the PaX kernel patch itself, but I believe the version in GrSecurity is kept more up-to-date. You can compile it with protection turned off by default, then use the PaX tools to turn on protection for just the binaries you wish to check.
Installing either (or both) of these could well help you, without the need to blow away your current install and start fresh.
1. Re:A Couple Of Tools by sleepingsquirrel · 2006-01-11 07:41 · Score: 3, Informative
  
  If I had to guess, I'd say most likely his problem is caused by fence-post type errors in dynamically allocated arrays (malloc and friends). But on the off chance he's got problems with buffer overruns caused by user input (gets, etc.) there's also stack protectors like ProPolice in addition to libsafe.
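
  To make the fence-post case concrete, here is a minimal C++ sketch of the guard-byte idea that some debugging allocators use: pad each block with a known pattern on both sides and check the pattern later. The names `guarded_alloc`, `guards_intact`, `guarded_free`, `kGuard`, and `kPattern` are made up for illustration, and this is not meant to describe how Electric Fence, libsafe, or ProPolice work internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kGuard = 8;          // guard bytes on each side (arbitrary)
constexpr unsigned char kPattern = 0xA5;   // fill value for the guards (arbitrary)

// Allocate n usable bytes bracketed by guard bytes; returns nullptr on failure.
unsigned char* guarded_alloc(std::size_t n) {
    unsigned char* raw = static_cast<unsigned char*>(std::malloc(n + 2 * kGuard));
    if (raw == nullptr) {
        return nullptr;
    }
    std::memset(raw, kPattern, kGuard);               // leading guard
    std::memset(raw + kGuard + n, kPattern, kGuard);  // trailing guard
    return raw + kGuard;                              // caller sees only the middle
}

// True while both guards still hold the pattern, i.e. no overrun or underrun
// has touched the bytes adjacent to the n-byte block at p.
bool guards_intact(const unsigned char* p, std::size_t n) {
    assert(p != nullptr);
    const unsigned char* raw = p - kGuard;
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (raw[i] != kPattern || raw[kGuard + n + i] != kPattern) {
            return false;
        }
    }
    return true;
}

// Release a block obtained from guarded_alloc.
void guarded_free(unsigned char* p) {
    assert(p != nullptr);
    std::free(p - kGuard);
}
```

  Writing one element past the end (a loop with `i <= n` instead of `i < n`) flips `guards_intact` from true to false, which pins the overrun to a specific block even when the program does not crash.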
2. Re:A Couple Of Tools by Anonymous Coward · 2006-01-12 03:12 · Score: 0
  
  Nope. I think his problem is the "=" instead of "==" in the if statement on line 42.
3. Re:A Couple Of Tools by Anonymous Coward · 2006-01-12 09:53 · Score: 0
  
  Most clueless commercial developers/vendors who use LGPL libraries and state in their EULA that end-users are not allowed to reverse-engineer are actually violating the LGPL library's terms & conditions.
  
  Since libsafe is LGPL, it means if they use libsafe header files or link to libsafe, 'their own terms' (EULA) must contain certain allowances related to allowing reverse-engineering and patent rights.
  
  In other words, LGPL allows you to distribute works that use the library under "your own terms", if and only if "your own terms" meet certain conditions.
  
  Have a lawyer explain what your obligations are when using an LGPL library and then choosing to distribute the resulting works under "your own terms". Be sure to ask about reverse-engineering and patent rights.
  
  If you cannot afford a lawyer, just read & use plain GPL or BSD/MIT licenses which are much less complex to understand.
  
  NOTE: some LGPL libraries use an 'exception clause to LGPL' that allow works that utilize the library to bypass such conditions. libsafe is not one of them.
There are obviously several alternatives. by Z00L00K · 2006-01-11 07:35 · Score: 4, Informative

As stated, Valgrind and Electric Fence are two alternatives. I would also like to point out Purify (now owned by IBM) which I have been using from time to time. Of course the catch with that tool is that it's commercial, but you should at least evaluate it.
Using a complete hardened Linux distro is not necessary for "normal" development work, but it may be a good idea to verify that your application actually works there too.
In addition to the run-time checkers you can also look for static checkers like Splint. It can provide you with some extra information about potential problems that only occur under certain conditions that maybe aren't met during runtime.
Is the effort of trying to track down stack corruption worth it? YES! Absolutely! Catching bugs in an early stage is essential to keep down the lifecycle cost of a system. Also consider the risk of ill will if your product is prone to strange behavior and crashes.

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
1. Re:There are obviously several alternatives. by pthisis · 2006-01-11 07:50 · Score: 2, Informative
  
  The OP was asking for stack debuggers, which are much less common and generally come as intrusive compiler patches. libsafe is the best alternative.
  
  ElectricFence, valgrind, Boehm GC, Purify, etc. are all heap debuggers (for finding problems with overrunning malloc'd memory, memory leaks, etc.). It's possible that Purify has stack debugging capability these days, I'm not sure.
  
  --
  rage, rage against the dying of the light
2. Re:There are obviously several alternatives. by Anonymous Coward · 2006-01-11 07:58 · Score: 0
  
  The OP was asking for stack debuggers...
  ...Maybe it's just me, but the OP sounded pretty newbie. I'd expect someone in the know would have asked a better question. Maybe he'll chime in to provide us with more details on his actual problems. Blackcoot? Are you there?
3. Re:There are obviously several alternatives. by Eric+Sharkey · 2006-01-11 08:39 · Score: 1
  
  Valgrind can debug problems with both the stack and the heap.
4. Re:There are obviously several alternatives. by blackcoot · 2006-01-11 19:19 · Score: 1
  
  unfortunately valgrind isn't quite finely grained enough to localize stack corruption as it happens. it picks up on obscenely large jumps in the stack pointer, but doesn't detect whether addresses are "correct" relative to the stack frame. of course, how hard that is to do is a function of the abi and the architecture (if you do all your arithmetic on the stack this could be very problematic). it's probably not really tractable, now that i think about it.
5. Re:There are obviously several alternatives. by jesup · 2006-01-12 17:02 · Score: 1
  
  Another heap debugger (not stack) well worth using is dmalloc (http://dmalloc.com/). Works in non-MMU embedded environments too; very configurable; checks for write-after-free, etc.
CCured by sleepingsquirrel · 2006-01-11 07:51 · Score: 2, Informative

Also, something I've never tried, but always been interested in is CCured...
CCured is a source-to-source translator for C. It analyzes the C program to determine the smallest number of run-time checks that must be inserted in the program to prevent all memory safety violations. The resulting program is memory safe, meaning that it will stop rather than overrun a buffer or scribble over memory that it shouldn't touch. Many programs can be made memory-safe this way while losing only 10-60% run-time performance (the performance cost is smaller for cleaner programs, and can be improved further by holding CCured's hand on the parts of the program that it does not understand by itself). Using CCured we have found bugs that Purify misses with an order of magnitude smaller run-time cost.
All good suggestions, but not for stack by kbielefe · 2006-01-11 07:53 · Score: 4, Informative

The replies I have seen so far have all been excellent suggestions for detecting buffer overflows on the heap. Adequate stack protection actually requires the code to be compiled with a compiler that adds extra checks to each function call. This page has more information on making gcc do what you want. Gentoo is very easy to set up for it, FYI, but it should be possible on any *nix distro without any kernel changes.
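As a rough illustration of what those extra per-call checks amount to, the sketch below imitates a stack canary by hand in C++. The struct `GuardedFrame`, the constant `kCanary`, and the three helper functions are hypothetical. A real stack protector emits the equivalent in each function's prologue and epilogue and typically picks its guard value at program start, and the overrun here is simulated inside a struct so that the demo itself stays well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one stack frame: a local buffer followed by a guard word.
struct GuardedFrame {
    char buf[8];
    std::uint32_t canary;
};

constexpr std::uint32_t kCanary = 0xDEADC0DEu;  // fixed here only to keep the demo simple

// "Prologue": set up the locals and plant the canary behind the buffer.
void frame_enter(GuardedFrame& f) {
    std::memset(f.buf, 0, sizeof f.buf);
    f.canary = kCanary;
}

// Simulates an unchecked copy into buf. The write goes through the frame's byte
// representation and is capped at the end of the struct, so an "overrun" lands
// on the canary instead of on unrelated memory.
void unchecked_copy(GuardedFrame& f, const char* src, std::size_t n) {
    assert(n <= sizeof(GuardedFrame) - offsetof(GuardedFrame, buf));
    unsigned char* base = reinterpret_cast<unsigned char*>(&f);
    std::memcpy(base + offsetof(GuardedFrame, buf), src, n);
}

// "Epilogue": a changed canary means something wrote past the end of buf.
bool frame_intact(const GuardedFrame& f) {
    return f.canary == kCanary;
}
```

Copying 8 bytes leaves `frame_intact` true; copying 12 bytes overwrites the guard word and the check fails right where the damaged frame is about to return, which is the localization the heap tools above do not give you.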
I feel your pain about bugs in libraries that you must use without the source code. I had an arrangement like that for nearly two years with extremely buggy code. Just relinking the static library with changes to my code, changing where in memory the library would reside, would often cause huge problems. Let's just say I got really good at debugging in assembly with gdb. I got where I could call them up and say something like, "you have some code at the end of function foo that looks like 'a[2] = b', but a was never allocated." They'd always reply with something like, "Yes it is ... oh wait..."

--
This space intentionally left blank.
1. Re:All good suggestions, but not for stack by blackcoot · 2006-01-11 18:15 · Score: 1
  
  i really, /really/ want to avoid the assembly route if at all possible. first, i have a healthy dislike of assembly. but more importantly, analyzing the assembly requires some reverse engineering which has the potential to land me in all sorts of trouble. our collaborators on this particular project are potential competitors on other projects (my company is in a rather small niche), which is why neither of us has the complete code base.
  
  i'm going to try to arrange to send them a set of object files which can then be linked against their stuff into the shared object files which we use to run our system, but other than that i'm out of good ideas... the heisenbugs are particularly frustrating.
How about your local D.A.? by Anonymous Coward · 2006-01-11 08:09 · Score: 0

When politicians are suspected of corruption it is usually traced back to a stack of money.
$your_os_of_choice !?! by chroot_james · 2006-01-11 08:12 · Score: 1

Excuse me! I don't have $ in front of my variables!

I don't like yourOsOfChoice style naming, so I'll give you that...

--
Reality is nothing but a collective hunch.
Purify by wfeick · 2006-01-11 09:29 · Score: 1

If you don't mind spending money, get Rational Purify from IBM. They seem to properly support Linux now, and they are the gold standard. You can download a 2 week evaluation license to give it a try.

As far as your build process, you just take whatever your link command is, and tag "purify" on the beginning of it. The tool will instrument all compiled objects that you are linking in, and produce a binary that does all sorts of really useful checking (read/write of freed memory, buffer overruns, memory leaks, file descriptor leaks, ...).

If you are developing software for money, there is nothing better.

If you're doing open source work for no pay, obviously it's hard to justify it.
1. Re:Purify by pthisis · 2006-01-11 11:08 · Score: 1
  
  If you are developing software for money, there is nothing better.
  
  I've found ccured more consistently finds problems than Purify. Plus it's faster, and free. It's also much harder to set up to work with large projects. So better in some significant ways, and worse in other significant ways. (also it doesn't find leaks but Boehm works fine for that).
  
  --
  rage, rage against the dying of the light
2. Re:Purify by blackcoot · 2006-01-11 19:31 · Score: 1
  
  i suppose that ccured doesn't handle c++. personally, i've got nothing against spending money to get the right tools to solve the problem. whatever gets me from showstopper bugs to running stably for indefinite amounts of time quickest is the path i'm interested in.
Ultimate solution by Nutria · 2006-01-11 15:06 · Score: 1, Redundant

Don't use C!

--
"I don't know, therefore Aliens" Wafflebox1
1. Re:Ultimate solution by blackcoot · 2006-01-11 17:51 · Score: 1
  
  i'm using c++, so i guess that means i'm using c too. it's not difficult to avoid about 99.99% of memory bugs in c++ by judicious use of the stl, avoiding dynamic allocation wherever possible, asserting that indices are always in bounds, etc.
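
  For readers who want to see it spelled out, here is a small sketch of what "asserting that indices are always in bounds" can look like with the STL. The function names `pixel` and `sum_window` are invented for this example. The `assert` check disappears when `NDEBUG` is defined, while `std::vector::at` throws `std::out_of_range` in every build.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Debug-build check: aborts at the faulty call site unless NDEBUG is defined.
inline int& pixel(std::vector<int>& row, std::size_t x) {
    assert(x < row.size() && "pixel index out of bounds");
    return row[x];
}

// Always-on check: at() throws std::out_of_range instead of reading past the end.
inline int sum_window(const std::vector<int>& row, std::size_t first, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += row.at(first + i);
    }
    return total;
}
```

  Either way the failure shows up at the bad access itself rather than later, as corruption somewhere unrelated.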
  
  unfortunately, the folks supplying the code do not use those techniques and i don't have the luxury of re-educating them.
  
  as for why i'm using c++: implementing about 95% of this particular application in any language that isn't compiled to machine code with a darn good optimizer is pretty much a loss -- the application has to process live video streams in real time. c++ happens to be my weapon of choice in the "compiled to machine code with a darn good optimizer" realm.
2. Re:Ultimate solution by Nutria · 2006-01-11 19:31 · Score: 1
  
  FORTRAN? Ada? Pascal?
  
  Their GCC optimization probably isn't that hot, though.
  
  One of the really nice things about VMS is the Common Language Environment. This makes it possible to create executables out of object files written in multiple languages: C, Bliss, Macro/Asm, BASIC, COBOL, FORTRAN, Ada, etc. All of the language parsers create an intermediary language which the GEM common backend code generator takes and uses to build .obj files.
  
  Yes, .NET does that, and so does GCC 4, to a degree, but VMS has had it for 10 years...
  
  --
  "I don't know, therefore Aliens" Wafflebox1
3. Re:Ultimate solution by blackcoot · 2006-01-11 19:37 · Score: 1
  
  all good languages. pity i don't speak them. intel's compilers are pretty darn good at optimization (you pretty much get about 10-15% speed up v. gcc with a recompile, more if you're willing to throw more disk space and time at the problem and more still if you link in their math and image libraries). it's not free, but it's cheap enough that it doesn't have to save much time before it's paid for itself.
Microsoft Visual C++ by functor0 · 2006-01-11 18:57 · Score: 0, Offtopic

If you're on Windows, the first line of defense is running a debug build of the app with /RTC1 and /GZ turned on. I ran into a stack buffer overrun just the other day and a message box came up immediately telling me exactly which variable got corrupted.
probabilistic memory safety (DieHard) by Ristretto · 2006-01-13 11:47 · Score: 1

You may find DieHard useful, if heap errors are also a problem -- it's a plug-in replacement for malloc/free that provides probabilistic guarantees of correct execution in the face of errors.

http://www.cs.umass.edu/~emery/diehard

-- emery
gcc -fbounds-check by nnorwitz · 2006-01-14 19:44 · Score: 1

gcc -fbounds-check could find some memory problems (and some non-problems). I used this feature over 6 years ago before it was integrated into gcc. YMMV.