Bounds Checking for Open Source Code?
roarl asks: "Is anyone working on an Open Source bounds checking system? (A system that checks a program at runtime for array out of bounds access, reading uninitialized memory, memory leaks and so on). I've been using BoundsChecker for some time and believe me, there are situations where you know you are going to spend hours debugging unless you let BoundsChecker sort it out for you. But it annoys me that I have to transfer (and sometimes port) the buggy program to Windows each time. I'd much rather stay in Linux.
Insure works on Linux. I haven't tried Insure for some time, but last time I tried I wasn't especially impressed. Purify seems still not to support Linux, but on other Unix platforms it works great. The problem with all of these products is that they are so da*n expensive. So it makes me wonder, are all Open Source programmers doing without them? If so, what can we expect of the quality of Open Source developed programs? If not, is there a free alternative?"
Need bounds checking for Linux? May I suggest the CMU Common Lisp interpreter and compiler (to machine code) or perhaps Smalltalk. :)
The ever-resourceful Bruce Perens wrote a cool gizmo called "electric fence", which I have used on many occasions. It doesn't actually do bounds checking as such; what it does is provide a replacement "malloc" that allocates unwritable pages either above or below every memory allocation. Your application will then segfault when it misbehaves, and you can then use conventional debugging tools to track down the problem.
It's very "non-invasive" -- all you have to do to use it is link against it, and maybe set a few environment variables.
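A minimal sketch of the kind of bug a guard page is meant to catch. The helper copy_name() and its reserve_nul flag are hypothetical, invented for this illustration, and the sketch assumes Electric Fence is installed as libefence so that it can be linked with -lefence.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copies `src` into a freshly malloc'd buffer.
 * The bug is deliberate: when `reserve_nul` is 0 the buffer is one byte
 * too small, so the terminating '\0' lands just past the allocation.
 */
char *copy_name(const char *src, int reserve_nul)
{
    size_t len = strlen(src);
    char *dst = malloc(len + (reserve_nul ? 1 : 0));

    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';    /* out of bounds when reserve_nul == 0 */
    return dst;
}
```

Built normally, a call like copy_name("fence", 0) will usually appear to work. Linked against Electric Fence it should fault at the marked line instead, although a one-byte overrun can hide in alignment padding; if I remember the man page correctly, setting EF_ALIGNMENT to 0 is the way to catch those.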
Off the top of my head, and with the help of my bookmarks:
I personally had high hopes for the GCC BP project. If you feel like doing something that will earn you the admiration of millions, finish that code up. :-)
Isn't bounds checking just a specialized case for checking any type of access to uninitialized memory? There are several tools that provide replacements for malloc() that can track *all* memory allocation, and some, like Valgrind, provide almost a virtual machine that tracks basically everything your program does. Any time you read, write, or allocate memory, Valgrind will track it, and tell you if it is in error. Like I said, array bounds checking is just a special case of this.
I like to use the bounds checking patches to gcc to check code. You recompile your code and it checks every array access, memory access, etc. http://web.inter.nl.net/hcc/Haj.Ten.Brugge/
Valgrind
How about memprof?
While it may not be EXACTLY what you want, it may be MORE....
With the 80188, Intel actually introduced the bound instruction, which compares a register against a pair of upper/lower bounds and produces interrupt 5 if the register is too high or low. Motorola's 680x0 CHK instruction does the same.
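For comparison, here is a rough software equivalent of what those instructions do in hardware. The struct bounds type and the in_bounds()/check_bounds() names are made up for this sketch, and abort() stands in for the hardware trap.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bounds pair, mirroring the two-word operand the instruction reads. */
struct bounds {
    long lower;   /* lowest valid index */
    long upper;   /* highest valid index (inclusive) */
};

/* Returns 1 if index lies within [lower, upper], 0 otherwise. */
int in_bounds(long index, const struct bounds *b)
{
    return index >= b->lower && index <= b->upper;
}

/* Software stand-in for the trap: report the bad index and abort. */
void check_bounds(long index, const struct bounds *b)
{
    if (!in_bounds(index, b)) {
        fprintf(stderr, "index %ld outside [%ld, %ld]\n",
                index, b->lower, b->upper);
        abort();
    }
}
```

Calling check_bounds() before every array access costs a compare and a branch or two rather than a single instruction, but it works on any CPU.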
It would be useful if gcc produced debugging code to do static array bounds checking.
I've found that ccmalloc helped me to find a lot of problems in C code. The output is more verbose than Purify, but it showed me where some real problems lay with my code.
Check out this site by Ben Zorn on free and other tools for this.
"Provided by the management for your protection."
Insure++ is heavenly. I don't know how long it's been since you've used it, but it detects almost all errors. I think most open source people who use it have their company buy it for them though; it is very expensive. It does very good bounds checking for both reading and writing, but its really amazing help is in tracking down bad or dangling pointers.
It also does very detailed tracking of memory leaks, but can get a little confused when you store the last referencing pointer in a hashtable.
I think other than its somewhat clunky UI, price is the big killer. It takes a pretty fast machine to be able to use it much and it has a large up-front cost, plus a maintenance (upgrades and support) fee. It's really too bad they don't have a program in place with someone like sourceforge to let people use Insure++ on the test machines, because that would not only be great advertising for them, but could also really help the open source developers.
Warning: I'm a language zealot, so be warned that I'm utterly irrational and unamenable to the Sweet Voice of Reason. That said... :)
Use a different language. There are some things which C is appropriate for, but one of the things it's categorically not called for is when you have concerns about buffer-overflow conditions [*]. If this is a purely open-source, noncommercial project, do yourself and your career a favor: learn another language (one which doesn't have these sorts of problems) and write your app in that instead. You'll learn more, and you won't have to spend a dime on Purify or whatnot. If you go this route, I'd suggest Scheme; it's a beautiful LISP derivative.
If this is a commercial project, ask Management how married they are to C. In the overwhelming majority of cases, you can quietly substitute C++ without affecting the APIs one bit. Just wrap the external APIs in extern "C" and, inside the code, use C++'s beautiful vector instead of C-style arrays. Sure, you'll take a minor performance hit, but the increase in reliability will be well worth it.
Anyway, to try and give a (weak) answer to your question--instead of slapping a Band-Aid on the festering wound that is C memory management, you might want to think about doing away with the festering wound altogether. Use the right tool for the job--if C really is the right tool for the job, then fine, may God have mercy on your code. But if there are other, better, tools available... use them instead.
[*] OpenBSD manages to do pretty well with a C kernel, but that's because they're certifiably insane. It also impacts their dev cycle; they spend a great deal of time avoiding the pitfalls of C, so much so that it affects how much time they can devote to new development.
I guess the flag is -C and it does what
you would expect: program checks bounds
on any array access. (Used it a couple
of months ago to track a really nasty bug
in some ancient code).
I doubt this would be easily portable to
the C/C++ side of GCC, because in C you have
myriad ways to access the same memory location
(via different pointers).
Of course, already mentioned Electric Fence
is a really nice tool to debug malloc() problems
(but not other types of memory overruns, like
overrunning a static array).
Linker can put a 0xDEADBEEF after all arrays and
verify that it is the same on the program exit,
might help some...
Paul B.
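The 0xDEADBEEF idea can be tried by hand without any linker support. In this sketch the struct guarded_buf type and its helper functions are hypothetical, and the guard word is placed manually after one array rather than automatically after all of them.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu

/* Hypothetical wrapper: a fixed array followed directly by a guard word. */
struct guarded_buf {
    char     data[16];
    uint32_t guard;
};

/* Clears the array and plants the guard value behind it. */
void guarded_init(struct guarded_buf *g)
{
    memset(g->data, 0, sizeof g->data);
    g->guard = GUARD_WORD;
}

/* Returns 1 while the guard is intact, 0 once something has overwritten it. */
int guarded_ok(const struct guarded_buf *g)
{
    return g->guard == GUARD_WORD;
}
```

Checking guarded_ok() at program exit (or after any suspicious call) tells you that an overrun happened, though not where, so it complements a tool like Electric Fence rather than replacing it.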
An excellent general solution I've found for problems of this nature can be found at "file:///usr/include/assert.h". Seriously, preconditions, postconditions, and invariants are the best approach to avoiding such errors. Will a bounds-checker detect if you access an element that is out-of-bounds in a view (subarray) of a larger array? Also, if you are developing a library, using assertions will greatly assist any end-users who are not using a bounds-checking tool.
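To make the subarray question concrete, here is a small sketch of assertion-checked views. The struct view type and the make_view()/view_get() functions are hypothetical names chosen for the example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view onto part of a larger array. */
struct view {
    int    *base;     /* first element visible through the view */
    size_t  length;   /* number of elements in the view */
};

/* Precondition: the requested window must fit inside the parent array. */
struct view make_view(int *array, size_t array_len, size_t start, size_t length)
{
    assert(array != NULL);
    assert(start <= array_len);
    assert(length <= array_len - start);

    struct view v = { array + start, length };
    return v;
}

/* Precondition: i must be inside the view, not merely inside the parent. */
int view_get(const struct view *v, size_t i)
{
    assert(i < v->length);
    return v->base[i];
}
```

An access such as view_get(&v, v.length) stays inside the parent array's memory, so a purely memory-based checker has nothing to complain about, but the assertion still fires. Compiling with -DNDEBUG removes the checks from a shipping build.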
IMHO, you should do a mix of C and C++ and use the Standard Template Library's vector, deque, or list classes instead of an array. Hell, even if you use an array, the STL functions and algorithms still work on them. You can even use the queue and stack wrappers if that's what you're doing... That's just my opinion though....
Its info page on my GNU/Linux box says assert is a GNU extension. I suppose I could still keep a debugging copy with asserts, and then sed them all out for a shipping copy, or better make configure do it if necessary, but that's work.
Since libc5.4.23 the standard malloc has included rudimentary bounds checking. Just set MALLOC_CHECK_ to 1 or 2. At 1 it prints debugging output, at 2 it calls abort() so you can look at the core and see what happened. The best part is you can even do this on code you don't have the source to. Of course the other suggestions here are good, but I've tracked down a lot of bugs without having to link to one of the special range checkers.
You can also set MALLOC_CHECK_ to 0 to get a malloc like Windows and BSD that's safe against double frees and most off-by-one errors. Not useful for debugging, but can sometimes make a buggy closed source program run without dumping core. It's slower of course, but...
Hi guys, Yes it's true that most of those tools are really really expensive. I think I would like to try and write something like that. I have a lot of C++ and C experience, and I also have experience with memory management. I would like to start writing something like that, which will be GPL or shareware so that people can use it. I'm just sure it will be a very big project. Anyone want to help me out? Anyone want to join in? Anyone know where I can find people who would be able to help me and want to join me and take an active part in it? Let's try and start rolling out something like that!!! ;-)
Cheers
Dory.
wow,
I've had splint.org in my sig for a while now. I think it's one of those projects that needs more attention. This project used to be called lclint but got renamed to splint.
There are lots of papers out there on static checkers. One good intro paper is at http://www.research.ibm.com/people/h/hind/paste01.ps. This would give you a nice intro to pointer analysis, a subtopic of static analysis.
YAMD by Nate Eldredge has much of the functionality you're looking for. Plus, you don't have to recompile your code to use it!
Some are commercial and some are freeware/public domain/whatever.
GNAT, an open source Ada-95 compiler, supports those checks.
Sounds painful...
In the pursuit of creating a language holy war, I'd say to use scripting and virtual machine languages, where possible.
JSP & PHP are great for web sites. Perl & Python are great replacements for shell scripting, as well as most general-purpose stuff. LISP is great, if you're a purist. Java has its uses, to be sure.
The point of all of this is that built-in memory allocation, built-in garbage collection, and a lack of pointers is A Very Good Thing(tm). You basically don't have bounds-checking problems. In general, scripting and VM code won't break due to memory leaks and the like.
Interpreted code, in particular, is highly reliable. As an example, Perl code, if well written, which means it traps all errors, etc., is rock solid. Python, I am told, is even more solid. C, on the other hand, is highly unstable. C++ is almost as bad, and VB code on M$ boxen breaks all the time, as well.
These days, hardware, memory, and disk are SO cheap and fast that you *should* recoup almost all program performance costs associated with interpreted/scripting/vm languages in four ways: 1) faster, easier coding; 2) easier debugging; 3) more portability; 4) more reliable software. Of course, in the case of Perl, you've got to force good style upon yourself, so items 1 and 2 may not apply, sometimes....
I'd also avoid stuff that puts too much faith in the stability of dynamically linkable code. DLL's and COM objects in M$ land are a huge problem. It goes without saying that Linux's .so mess isn't nearly as great as M$'s.
C, C++, etc., have their uses, to be sure. People use C where it doesn't belong. It belongs in writing operating systems, interfaces, drivers, etc., but it isn't, for most intents and purposes, a good business language. C++ is better, but Java and *modern* scripting languages are even better, most of the time.
If we're going after "The Best Tool for the Job," I see that you need to balance among several different tensions: a semi-popular language (so you can get help, when needed), one that's well-documented (good books at your local book store and many web sites that cover it, for example), is highly portable (the larger, older, more successful, and more mature a project gets, the chances it'll get ported increase), does the job with a minimum amount of effort (planning, coding, testing, debugging, and documentation all go into this), won't crash unexpectedly (like C/C++/VB/assembly), runs quickly enough (with modern hardware and preemptive multitasking/multiprocessing operating systems, this isn't a big issue, most of the time), is easy to fix/alter (most scripting languages don't have a compile step, so the code is the executable, ergo, it's usually easier to fix), and is general purpose (not specialized).
Just as important, you need to avoid the tensions of "too many" or "too few" languages for a project. Having 1 language that tries to force the big square peg in the small round hole is just as bad as 10 languages in a small to medium-sized project. Working on a team illustrates this even more. While SQL, OS shells, XML, HTML, and JavaScript are all exceptions to the rule (they're usually the only/main way to accomplish a specific task), having one person writing in C, another in C++, one in Perl, one in Python, another in VB, and still another in Java is usually a ticket to disaster, for most projects.
My personal rule of thumb: Perl for batch processing, utilities, command-line scripts, and most data massaging; PHP for small to medium web apps; JSP for larger web apps, or those created on teams of about 4 or more people; Java for most apps, especially GUIs; C/C++/VB for really specialized stuff; and whatever else, if you've got to support old code (new code from the above list).
glibc 2.2.x has a number of really nice little quirks that you can use to help debug memory problems. Among my favourites are:
MALLOC_CHECK_
If you set the environment variable MALLOC_CHECK_ before running a program, glibc uses a slow but thorough variant of malloc to do some checking on buffer overruns, double-frees, etc... Setting MALLOC_CHECK_ to 0 makes it ignore problems, 1 causes it to print a diagnostic to stderr, and 2 causes it to print a diagnostic and abort(). All of this is in the glibc malloc(3) man page.
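A small sketch of a bug that this checking is described as catching. The make_and_release() function and its free_twice flag are hypothetical, written only to give MALLOC_CHECK_ something to find; behaviour on other glibc versions than the ones discussed here may differ.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical test routine: allocates `size` bytes, uses them, and
 * releases them. With `free_twice` nonzero it deliberately frees the
 * same block a second time.
 * Returns 0 on success, -1 if the allocation failed.
 */
int make_and_release(size_t size, int free_twice)
{
    char *buf = malloc(size);

    if (buf == NULL)
        return -1;
    memset(buf, 'x', size);
    free(buf);
    if (free_twice)
        free(buf);      /* deliberate double free */
    return 0;
}
```

Going by the description above, running a program that calls make_and_release(32, 1) with MALLOC_CHECK_ set to 1 should print a diagnostic at the second free(), and with 2 it should abort there so the core file points at the culprit.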
MALLOC_TRACE and mtrace()
If you "#include <mcheck.h>" in your source, you can call mtrace(3) at some point in your code. This function looks for the environment variable MALLOC_TRACE, which names the file it then logs all malloc(3)s, free(3)s, realloc(3)s and calloc(3)s to. When your program is finished, you can run the mtrace(1) perl script (also supplied with glibc) over this log to print out a list of all unfreed memory, all freed, unallocated memory, all double-freed memory and probably a bit more besides. It's really handy.
I tend to put the "#include <mcheck.h>" and "mtrace()" calls inside "#ifdef HAVE_MTRACE" guards, and then add "-DHAVE_MTRACE" to my CFLAGS when compiling debug builds.
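Roughly, the guarded setup looks like this. The leaky_work() function is a hypothetical workload with one deliberate leak; mtrace() and muntrace() are the real glibc calls declared in <mcheck.h>.

```c
#include <stdlib.h>
#ifdef HAVE_MTRACE
#include <mcheck.h>     /* glibc: declares mtrace() and muntrace() */
#endif

/* Hypothetical workload: allocates n blocks and frees all but the last one. */
int leaky_work(int n)
{
    int leaked = 0;

#ifdef HAVE_MTRACE
    mtrace();           /* start logging to the file named by MALLOC_TRACE */
#endif
    for (int i = 0; i < n; i++) {
        char *p = malloc(64);

        if (p == NULL)
            break;
        if (i < n - 1)
            free(p);
        else
            leaked++;   /* deliberate leak for the mtrace script to report */
    }
#ifdef HAVE_MTRACE
    muntrace();         /* stop logging */
#endif
    return leaked;
}
```

With a debug build (-g -DHAVE_MTRACE), set MALLOC_TRACE to a log file path of your choosing, run the program, and then run the mtrace script with the binary and the log file as arguments; it should list the one 64-byte block that was never freed. Without -DHAVE_MTRACE the tracing calls vanish and the code builds on any libc.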
The documentation for this can be found at http://www.gnu.org/manual/glibc-2.2.3/html_chapte
malloc() and free() are weak symbols.
glibc's copy of free(3) is a `weak' symbol in the library. What this means is that you can write your own functions called malloc() and free() in your program, and those will be called all the time, instead of the proper ones. You can call the originals with _malloc() and _free(), or __malloc() and __free() (can't remember which, think it's the first pair) and do little extra checks and things yourself, such as filling memory with bogus data before returning, to make sure you're not forgetting to zero some bytes here and there.
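The "fill with bogus data" trick can be sketched without overriding malloc() at all. Since the names of the original functions are uncertain above, this example swaps in a separately named wrapper, debug_malloc(), which is hypothetical and simply calls the normal malloc() underneath; the 0xAA fill byte is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

#define FILL_BYTE 0xAA  /* arbitrary, easily recognised "bogus data" pattern */

/* Hypothetical wrapper: like malloc(), but never hands back zeroed memory. */
void *debug_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        memset(p, FILL_BYTE, size);
    return p;
}
```

Code that forgets to initialise a field then sees 0xAAAA... instead of a lucky zero, which tends to make the bug show up immediately. A debug build could route its allocations through the wrapper with a macro, leaving release builds on plain malloc().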
gdb is also really great and has loads of stuff that I've not found in other debuggers. Check out the manual sections on `ignore' (to ignore a breakpoint x times to catch the (x + 1)th malloc), and `commands' (to automatically print out variable values and continue, for example) w.r.t. breakpoints.
http://www.gnu.org/manual/gdb-5.1.1/html_chapte
http://www.gnu.org/manual/gdb-5.
dmalloc is by far the best memory checker I've tried
I'm not sure if it meets your requirements for a *good* object system, but certainly an existing Scheme object system is Scheme--, which is the obvious corollary to C++. I used this several years ago in an algorithms course. I didn't have the understanding of and respect for OO that I currently possess, so I can't say if it was a crock or I simply didn't appreciate it.
Tried to google up some Scheme-- links, but alas, no luck. Sorry.
Bell Labs released vmalloc() for public use and gave a white paper on it at the 1996 USENIX. I recently investigated it for memory leak problems in an embedded real time system. http://www.research.att.com/sw/tools/vmalloc/
Development is currently stopped but it looks super powerful and tracks memory allocation/deallocation by function etc...
http://www710.univ-lyon1.fr/~yperret/fnccheck/p
I evaluated Scheme, Lisp for writing programs that we can distribute.
They don't seem to have many of the facilities that we take for granted in a widely used language. Like standard interfaces to TCP/IP networks, standard interfaces to databases.
What I would have liked to have done is - here is the API for sockets, here is the API for databases and spend time writing code. Agreed, not chasing mallocs saves time but so does not having to write foreign function interfaces to standard functionality. Maybe I was wrong in my evaluation. If so I would be glad to be corrected and begin using these languages.
Without a doubt, the most industrial-strength language available for Free Software use is GNU Ada. Ada won't let you f*ck up. It is truly an awesome language. Check out the GNU Visual Debugger (gvd) for one of the coolest examples of what Free Software Ada technology can do.
Good places to start:
There is a wealth of Ada learning resources on the web, perhaps more online instruction than for any other programming language. Ada is at or above the same level of abstraction as C++. C++ programmers should not have too much trouble learning Ada. One other nice aspect of Ada is that since it was the first ISO standard OOP language, and since the way it interacts with other programming languages is codified as part of that standard, it is very easy to use Ada for the "mission critical" parts of a software project. There is no need to re-write a whole project to start taking advantage of Ada; it can be done piece by piece.
I've found that this works quite nicely for tracking memory usage. It doesn't check for overruns etc. but is pretty flexible and makes spotting leaks/double frees etc. very easy (which is pretty much all I use BoundsChecker for in win32...). It's GPL too and nice and simple to look at... cheers.
http://web.inter.nl.net/hcc/Haj.Ten.Brugge/
I have used it, and it works very well.
Given the requirement of using a
brain-damaged language, it is the best open source
tool I am aware of for finding bounds checking errors.
jeff.deifik@jpl.nasa.gov
With gcc, you can add bounds checking easily with extended inline assembler. It adds only one assembly instruction and very little overhead. The C macro Bound(), defined below, makes it very simple. Here is a demonstration:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /*
     * Note: the x86 "bound" instruction is not available in 64-bit mode,
     * so this has to be built as 32-bit code.
     */

    /* store the lower and upper limits of your test array */
    struct _bounds {
        uint32_t lower;
        uint32_t upper;
    } __attribute__ ((aligned (4)));

    /* if the index is out of range, create a core dump */
    #define Bound(X,Y) __asm__ ( "bound %0,%1\n\t" : : "r" (X), "m" (Y) )
    #define UPPER_BOUND(X) (sizeof(X)-1)
    #define LENGTH 15

    /* create a plain vanilla array for our test */
    static char test_array [LENGTH];
    struct _bounds limits = { 0, UPPER_BOUND(test_array) };

    void
    bound_test (int index)
    {
        Bound (index, limits);
        test_array[index] = 'a';
    }

    /*
     * We can invoke our test procedure bound_test() by entering
     * an array index on the command line. If the index is out
     * of range for the bound_test() procedure, the x86 "bound"
     * instruction will trigger a core dump.
     */
    int main(int argc, char *argv[])
    {
        if (argc > 1) {
            bound_test (atoi(argv[1]));
        }
        return 0;
    }