Using Redundancies to Find Errors
gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."
PDF usually crashes my computer (crappy adobe software). So here's a convenient text link!
: www.stanford.edu/~engler/p401-xie.ps+&hl=en&ie=UTF -8
http://216.239.37.100/search?q=cache:yuZKW8CjTqIC
More details
Appeared in FSE 2002. Finds funny bugs by looking for redundant operations (dead code, unused assignments, etc.). From empirical measurements, code with such redundant errors is 50-100% more likely to have hard errors. Also describes how to check for redundancies to find holes in specifications.
Link to PostScript file for easy viewing/printing
File
Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
html version is here.
There aint no pancake so thin it doesn't have two sides.
Well let me summerize it for you:
That paper explored the hypothesis that redundancies, like type erros, flag higher-level correctness mistakes. They evaluated the approach using four checkers which they applied to the Linux operating system. These simple analyses found many superising (to them) error types. Further, those errors correlated well with known hard errors: redundancies seemed to flag confused or poor programmers who were prone to other error types. According to them, these indicators could be used to decide where to audit a system.
If only the story poster had actually read the paper.
They used a custom checker that finds these things much more effiectivly than lint.
I actually remember the flood of bug reports and kernel patches that toy of theirs generated the first few months they put it to use on the kernel.
For the record, it's been moved...
Larch FTP Site
January 28, 1999
Many files formerly on this site were moved elsewhere after a disk
crash in March, 1998.
The LCLint distribution can be found at
ftp://ftp.sds.lcs.mit.edu/pub/lclint
or http://www.sds.lcs.mit.edu/lclint
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
Using Linux for academic research is hardly a new idea. In my group alone one of the profs has been publishing papers and giving talks about research using Linux since 2000.
d f - about the evolution of Linux
An example of such is http://plg.uwaterloo.ca/~migod/papers/evolution.p
There *are* better tools already available, though not for free.
I've long wished that Gimpel Software would release a Linux version of their lint -- it does everything described in the PDF article and much more.
CQual
It's been used to find security holes.
Gee, reread yourself... Sorry but they're 100% right. If they could acquire the lock, cam and cam->ops cannot be NULL due to the first check.
BTW, in 2.4.18, the code now looks like this for this same function:
struct cam_data *cam = dev->priv; int retval; if (!cam || !cam->ops) return -ENODEV; DBG("cpia_mmap: %ld\n", size); if (size > FRAME_NUM*CPIA_MAX_FRAME_SIZE) return -EINVAL; /* REDUNDANT! */
if (!cam || !cam->ops)
return -ENODEV; /* make this _really_ smp-safe */
if (down_interruptible(&cam->busy_lock))
return -EINTR;
Check out Splint (formerly LCLint). Whereas traditional lint is less needed now that compilers have -W switches, splint has a whole bunch of extra stuff which gcc won't warn about. The only trouble with it is that by default, it wants you to add annotations to your program to help it be checked (for example if a function parameter is a pointerm you can annotate whether it is allowed to be null). If you go along with that then splint can give lots of help in finding places where null pointer dereferences could happen, and other bugs. But even if you don't want to annotate and you use the less strict checking it's still a handy tool. (OK, maybe C99 has some of this stuff too, but splint has more.)
-- Ed Avis ed@membled.com
...using this thing here:
http://pmd.sourceforge.net/cpd.html
CPD uses a variant on Greedy String Tiling to find duplicated code in Java programs. There's also a JavaSpaces version since finding duplicated code is fairly parallelizable....
Yours,
Tom
The Army reading list
>Uhm, if cam was NULL after the first check,
>wouldn't cam->busy_lock cause a segfault
I don't feel bad about being flamed by people that missed my point, but I do feel bad about this. You are 100% right - the way I read the code can't possibly be correct.
--
GCP