Using Redundancies to Find Errors
gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."
PDF usually crashes my computer (crappy adobe software). So here's a convenient text link!
: www.stanford.edu/~engler/p401-xie.ps+&hl=en&ie=UTF -8
http://216.239.37.100/search?q=cache:yuZKW8CjTqIC
More details
Appeared in FSE 2002. Finds funny bugs by looking for redundant operations (dead code, unused assignments, etc.). From empirical measurements, code with such redundant errors is 50-100% more likely to have hard errors. Also describes how to check for redundancies to find holes in specifications.
Link to PostScript file for easy viewing/printing
File
Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
This really old Slashdot logo still in use over on Team Slashdot's page on distributed.net.
"...dead code (code that is never reached)"
Perhaps it's just shy!
html version is here.
There aint no pancake so thin it doesn't have two sides.
To me 'redundant' implies duplication of something already there. (a=1; a=1;)
a=a; and dead code aren't so much redundant as they are superfluous. It's still a sign of possible errors, for sure.
Unfortunately, this paper doesn't really offer any practical advice. Is is probably a little useful to very good, or great programmers. However, for new or moderately good programmers, it probably won't be very useful. It is certainly interesting in the academic sense, but I always want to see more practical advice. (I suppose that good practical advice flows down from good theoretical advice.)
What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?
Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?
How to Download YouTube Videos
Writing repetitive code only once offers the same benefits as using Cascading Style Sheets for your webpages. If there is a serious error, you only have to track it down in the one place where it exists versus every single place you re-wrote the code. Also, it makes adding features much simpler as well. I'm an old school procedural programmer that is making the rocky transition to OOP programming. THIS is where it starts coming together...
C. Griffin
"Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
They also found that:
Russian errors cause code
Incorrect code causes errors
Missing code causes errors
Untested code causes errors
Redundant codec causes redundancies
Driver code causes headaches
C code causes buffer overflows
Java code causes exceptions
Perl code causes illiteracy
Solaris code causes rashes
Novell code causes panic attacks
Slashdot code causes multiple reposts
Slashdot articles cause poor-quality posts
Microsoft code causes exploits
Apple code causes user cults
Uncommented code causes code rage
RIAA code causes computers to stop functioning
(Poor idea causes long, desperate post)
This sig intentionally left bla... dammit!
Who's got the whiteout?
It really is. It's a redundant holdover from ye old BSD versions. Granted, there are one or two times i've used it when -Wall -pedantic -Werror -Wfor-fuck's-sake-find-my-bug-already doesn't work, but a lot of the time it comes up with a LOT of complaints that are really unnecessary. Am i really going to have to step through tens of thousands of lines of code castind the return of every void function to (void)? Come on.
I got a sig so you would remember me.
Three letters: NIH.
Now, if you'll excuse me, I've got to get back to my text editor project.
Isn't this the job of that smart dude down the hall who runs Lunix computers and reads some Slash Period website or something?
Well, at least that's how I finish all my projects.
Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
A good editor could easily cut that article in half without loss of any information.
x += 0;
x += 0;
x += 0;
x += 0;
x += 0;
x += 0;
It actually caused a bug 'cuz they accidentally left the '+' off one of the lines. What an idiot.
"Dead" or unreachable code is almost always caused by patches or fixes to an existing codebase and it's always good to detect and get rid of it because it may point to other problems in the application (in my experience), or is simply dead wood that should be removed.
Ummm... surely if any story should be duped in the near future, it's this one. Please submit story suggestions accordingly.
deus does not exist but if he does
I saw Dawson's talk at FSE (Foundations of Software Engineering). He uses static flow analysis to find problems in the code (like an advanced form of pclint). The most interesting part of his tool is in the ranking of the problem reports. He has developed a couple of heuristics that sort the problems by order of importance and they supposedly do a very good job. Static analysis tools find most of their problems in rarely run code, such as error handlers. Such problems are problematic and sometimes lead to non-deterministic problems, which are extremely hard to find with standard testing and debugging. (This is especially true, when the program under consideration is a kernel.) Dawson also verifies configurations of the kernel that no one would compile, because he tries to get as many possible drivers at the same time as he can. The more code, the better the consistency checks do at finding problems.
By making assumptions about the program and checking the consistency of the program, his tool finds lots of problems. For instance, assume there is a function named foo that takes a pointer argument. His tool will notice how many of the callers of foo treat the parameter as freed versus how many treat the parameter as unfreed. The bigger the ratio, the more likely the 'bad' callers are to represent a bug. It doesn't really matter which view is correct. If the programmer is treating the parameter inconsistently, it is very likely a bug.
He also mentioned that counter to his expectations, the most useful part of his tool was to find 'local' bugs. By local, I mean bugs that are local to a single procedure. They are both easier for the tool to find, more likely to actually be bugs, and much easier for the programmer to verify if they are in fact bugs.
He analyzed a couple of the 2.2.x and 2.4.x versions of the kernel and found hundreds of bugs. Some of them were fixed promptly. Others were fixed slowly. Some were fixed by removing the code (almost always a device driver) from the kernel. Others he couldn't find anyone that cared about the bug enough to fix it. He was surprised at the amount of abandonware in the Linux kernel.
It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large). Without letting other people run his tool (or even better modify it), his research ultimately does little good other than finding bugs in linux device drivers. *heavy sigh* Oh well, eventually someone WILL reimplement this stuff and release it to the world.
On a snide comment, if he was a company he would no doubt have been bought by Microsoft already. Intrinsa was doing some interesting stuff with static analysis and now after they were bought a couple of years ago, their tool is only available inside of Microsoft. *sigh*
And moreover, in the areas where you find these mechanically detectable bugs, the likelihood of more subtle bugs is higher, as these errors are made when the programmer isn't really thinking about what he's doing. It's like triaging your wounded code, to fix the worst first.
Well let me summerize it for you:
That paper explored the hypothesis that redundancies, like type erros, flag higher-level correctness mistakes. They evaluated the approach using four checkers which they applied to the Linux operating system. These simple analyses found many superising (to them) error types. Further, those errors correlated well with known hard errors: redundancies seemed to flag confused or poor programmers who were prone to other error types. According to them, these indicators could be used to decide where to audit a system.
I suppose they've hit my pet peeve. I've seen many simple problems turned into hideous monstrosities with many bugs by people trying to handle bugs that can't ever happen and imaginary special cases because they were never taught how to abstract a function. Perhaps it can't be taught. In 20+ years of programming, its been a very rare time when I've picked up code and not been able to cut out large chunks without replacing them.
If only the story poster had actually read the paper.
They used a custom checker that finds these things much more effiectivly than lint.
I actually remember the flood of bug reports and kernel patches that toy of theirs generated the first few months they put it to use on the kernel.
Hopefully, we can expect much more of such valuable breakthroughs from the academic community in the future, complete with papers full of badly formatted C code!
In the great CONS chain of life, you can either be the CAR or be in the CDR.
I was going to moderate this but there's no +1 Redundant.
I am artificially intelligent.
vi could do it!
For the record, it's been moved...
Larch FTP Site
January 28, 1999
Many files formerly on this site were moved elsewhere after a disk
crash in March, 1998.
The LCLint distribution can be found at
ftp://ftp.sds.lcs.mit.edu/pub/lclint
or http://www.sds.lcs.mit.edu/lclint
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
Using Linux for academic research is hardly a new idea. In my group alone one of the profs has been publishing papers and giving talks about research using Linux since 2000.
d f - about the evolution of Linux
An example of such is http://plg.uwaterloo.ca/~migod/papers/evolution.p
Second, use standard idioms. For some, that may mean learning the standard idioms. These should become second nature. Programmers should express their creativity in the logic, structure, and simplicity of the code, not the non standard grammar. Standard forms allow more accurate coding and easier maintenance.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
How does this get marked up as +3 insightful? Did you read the paper?
They are using analysis techniques to locate bugs at specific points within the parse-tree.Hence, they are locating bugs within specific functions rather than just files. As all of their examples showed. Sure, its a nice point. But it is what they are doing.
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
The Stanford Checker is great. I was blown away when I read their papers last year. Their checker is not released yet, so I wrote a similar checker (smatch.sf.net) based on their publications.
The poster mentions Lint, but I did not have any success using Lint on the kernel sources. The source is too unusual.
Also Lint does not look for application specific bugs. For example, in the Linux kernel you are not supposed to call copy_to_user() with spinlocks held. It took me about 10 minutes to modify one my existing scripts to check for that in the Linux kernel last Monday. It found 16 errors. (It should have found more but I was being lazy.)
A lot of the time, you can't tell what will be a common error until you try looking for it. One funny thing was that there were many places where people had code after a return statement. On the other hand, I didn't find even one place where '=' and '==' were backwards.
It's fascinating stuff playing around with this stuff. I have been learning a lot about the kernel through playing around with Smatch.
DNA code also has high redundancy, which allows error-correcting transcription and other hacks ( see Parity Code And DNA or DNA's Error Detecting Code)
In both cases factors yielding robust DNA code are found to indicate bad digital computer code.
flip
(background: Ars Technica's Computational DNA primer
So many people have made silly comments about this being obvious, useless or whatever. This is probably because they did not actually READ the paper.
The paper is not about obvious code redundancy bugs, it is about subtle errors which are not as simple as just duplicate code. It is about code that *appears* to be executed but actually is not.
Go take a look at the examples and see how long it takes you to notice the different errors...now imagine have a thousand pages of code to peruse..would you catch it? Many of them probably not.
The conclusion of the paper is basically, errors cluster around errors; finding a trivial unoptimal syntactical constructions tends to point to real bugs.
Where there's smoke, there's fire.
- Every programs contains bugs.
- Every program contains redundancies and so can be made smaller without changing behavior.
Therefore the empty program is redundant but still buggy.CQual
It's been used to find security holes.
They made fools out of themselves with this one:
/* make this _really_ smp-safe */
if (!cam || !cam->ops)
return -ENODEV;
if (down_interruptible(&cam->busy_lock))
return -EINTR;
if (!cam || !cam->ops)
return -ENODEV;
Their comment: 'We believe this could be indication of a novice programmer...blabla...shows poor grasp of the code'.
BZZZZZZZZZT
Nice try kids, but unlike you, this piece of code was probably written by an experienced guy that has actually written code for parallel systems before. Since it's tricky, you would be excused if not for the 'novice programmer' comment above and the fact that the code itself says it's there for SMP safety.
Here's a hint: UNTIL you acquire the lock on 'cam', any other process can change the value, including at the point BETWEEN the first check and the acquisation of the lock.
--
GCP
Here's an example they cite from the Linux kernel:That last line assigns a variable to itself. Do you think that's what the programmer intended? Of course not. It's a bug. But no one caught it. If not for their program, maybe it would never have been caught.
You think this research is useless? Do you always write bug-free code? Maybe you should run this program on your own code and see what happens.
Is this dead code going to get removed?
No.
Why not?
Because, one, it's only an opinion that it's dead code. There could be some obscure case that no one imagined that could use it. Two, if some programmer removed it and it turned out that it was needed or the programmer screwed up the removal, the programmer would be blamed and take a lot of grief for it. If it ain't broke, don't fix it.
Now, it could be that the dead code doesn't work properly for the obscure case. But how could you tell? Do you want to write a test case for code that no one can figure out how it gets invoked?
If you look at a CVS repository and identify those files that have high revision numbers, there's a good chance they are full of errors and need to be rewritten.
One visualization is to color code according to it's age - old code blue, and new code red - then look at the results. You will often see that the red code clusters, and there are huge regions of blue that have been stable for years. You will also see relatively small clusters of differening shades of red, as people need to keep banging on the same problematic code.
Try this: pmd
Like more aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.
So I ruthlessly remove dead code. Often, I'll see big blocks like this:
#ifdef old_way_that_doesnt_work_well
blah;
blah;
blah;
#endif
And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!
By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.
I will often see the following in C++ code:
void foo_bar(int unused1, int unused2)
{
unused1 = unused1;
unused2 = unused2;
}
And I will recode it thus:
void foo_bar(int , int )
{
}
That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)
Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT me remove error checking, #assert's, good debugging code, or exception handlers).
www.eFax.com are spammers
Yes, and to the point, critical systems software tests have use code coverage as one of the measures of being 'fit' to use. Dead code cannot be executed and thus falls into the 'untouched code' category. If the software package has much of this untouched/untouchable code, it can't be used in a critical system. For example (going from memory here), Motif couldn't be certified for digital graphical primary instrument display because of this (I think that even after exhaustive testing, it only hit the 66% coverage mark, IIRC) on the Boeing 777. It could therefore only be used for an alternate display and the primary displays had to be implemented using something else.
The problem is that if the code can't be tested, it cannot be trusted and if *somehow* that code got executed while operational, the results could be "bad".
...using this thing here:
http://pmd.sourceforge.net/cpd.html
CPD uses a variant on Greedy String Tiling to find duplicated code in Java programs. There's also a JavaSpaces version since finding duplicated code is fairly parallelizable....
Yours,
Tom
The Army reading list
These researchers obviously have a good hold on compiler technology, since they implemented their checkers with xgcc. They also seem to understand logic quite well, since their code uses and extends on gcc's control-flow analysis algorithms. And they do, actually, understand what's going on here.
As for your particular example, the check really is redundant, but it was almost definitely intentional. It's true that another processor could change the cam variable between the first check and the lock -- but taking the first check out would have no impact on the functionality or correctness of the code. It's just a performance enhancement so that the routine can exit early in the error case, without the overhead of locking the lock. Removing the bit of redundant code would just add a little overhead to the error case.
In short, their checker found a true redundancy. They may have not realized its purpose since they don't have specific experience with this kind of parallel programming, but it's a redundancy. If you had actually read the paper instead of merely glancing over it, you would have seen that their checker respects the volatile nature of variables declared as such -- the checker is fully aware that a second thread can change the value between one operation and the other -- and it still figures out that the check is redundant.
Here's a hint: don't go around claiming people are fools unless you've got some evidence. These guys had hundreds and hundreds of bugs to go through, and expecting them to perfectly analyze every last one of them is unfair.
Oh, and -10 points for using "BZZZZZZT".