Using Redundancies to Find Errors
gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."
NO flame intended i just have no clue as to the point of this . What exactly do these flaws cause and etc.. Maybe I'm just not l33t enough but i could sure use an explanation from someone more learned than me .
A file (module/unit/whatever) is still a fairly high granularity to make decisions upon. I'd be more interested at finding bugs within specific functions rather than just files...
/* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }
Unfortunately, this paper doesn't really offer any practical advice. Is is probably a little useful to very good, or great programmers. However, for new or moderately good programmers, it probably won't be very useful. It is certainly interesting in the academic sense, but I always want to see more practical advice. (I suppose that good practical advice flows down from good theoretical advice.)
What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?
Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?
How to Download YouTube Videos
Writing repetitive code only once offers the same benefits as using Cascading Style Sheets for your webpages. If there is a serious error, you only have to track it down in the one place where it exists versus every single place you re-wrote the code. Also, it makes adding features much simpler as well. I'm an old school procedural programmer that is making the rocky transition to OOP programming. THIS is where it starts coming together...
C. Griffin
"Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
It really is. It's a redundant holdover from ye old BSD versions. Granted, there are one or two times i've used it when -Wall -pedantic -Werror -Wfor-fuck's-sake-find-my-bug-already doesn't work, but a lot of the time it comes up with a LOT of complaints that are really unnecessary. Am i really going to have to step through tens of thousands of lines of code castind the return of every void function to (void)? Come on.
I got a sig so you would remember me.
".. so run lint .."
I understand that there are non-C compilers out there that actually detect code errors without running a separate utility. Amazing!
Rendundant code would be more akin to something like this:
if (a==1)
{
[some chunk of code]
}
else
{
[same (or almost exact) chunk of code]
}
where the same code block appears multiple times in a file/class/project. By having the same block of code appear multiple times the chance of a user-generated error increases. Easily fixed by moving the repeated code into a parameterized function.
It's not, keep the code squeaky clean because cleanliness is next to godliness, it's keep the code clean so it's easy to read. Keep it clean because it's a discipline that will pay off when it's time to spot/fix the real errors in the code.
I've seen this (they were fired the next month): // and now to boost my LOC/Day performance...
Was the manager asking for lines of code/day fired too?
Ugh, counting LOC sucks. I find my most productive days are the ones i REMOVE lines of code, not add them. If i get a loop running tighter and faster, if i remove stuff that i could do better another way... that's what i'm paid to do.
I got a sig so you would remember me.
All too often people cut and paste blocks of code and then change parts of the pasted blocks and don't look at the rest of it enough to recognize problems.
Second, use standard idioms. For some, that may mean learning the standard idioms. These should become second nature. Programmers should express their creativity in the logic, structure, and simplicity of the code, not the non standard grammar. Standard forms allow more accurate coding and easier maintenance.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
A second is that it's probably not really in a fit state for sharing as yet (the tool is not the goal of the research, after all).
This is not a valid reason, I know a lot of people (including me) that would be happy to improve his research code into something useful for the community at large. I remember a paper describing a gcc extension to write semantic checks (for instance, reenable interrupts after disabling them). This program found an amazing number of bugs in the linux kernel. I really wish I could have something like that at hand!
So many people have made silly comments about this being obvious, useless or whatever. This is probably because they did not actually READ the paper.
The paper is not about obvious code redundancy bugs, it is about subtle errors which are not as simple as just duplicate code. It is about code that *appears* to be executed but actually is not.
Go take a look at the examples and see how long it takes you to notice the different errors...now imagine have a thousand pages of code to peruse..would you catch it? Many of them probably not.
The conclusion of the paper is basically, errors cluster around errors; finding a trivial unoptimal syntactical constructions tends to point to real bugs.
Where there's smoke, there's fire.
Here's an example they cite from the Linux kernel:That last line assigns a variable to itself. Do you think that's what the programmer intended? Of course not. It's a bug. But no one caught it. If not for their program, maybe it would never have been caught.
You think this research is useless? Do you always write bug-free code? Maybe you should run this program on your own code and see what happens.
Is this dead code going to get removed?
No.
Why not?
Because, one, it's only an opinion that it's dead code. There could be some obscure case that no one imagined that could use it. Two, if some programmer removed it and it turned out that it was needed or the programmer screwed up the removal, the programmer would be blamed and take a lot of grief for it. If it ain't broke, don't fix it.
Now, it could be that the dead code doesn't work properly for the obscure case. But how could you tell? Do you want to write a test case for code that no one can figure out how it gets invoked?
If you look at a CVS repository and identify those files that have high revision numbers, there's a good chance they are full of errors and need to be rewritten.
One visualization is to color code according to it's age - old code blue, and new code red - then look at the results. You will often see that the red code clusters, and there are huge regions of blue that have been stable for years. You will also see relatively small clusters of differening shades of red, as people need to keep banging on the same problematic code.
Like more aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.
So I ruthlessly remove dead code. Often, I'll see big blocks like this:
#ifdef old_way_that_doesnt_work_well
blah;
blah;
blah;
#endif
And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!
By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.
I will often see the following in C++ code:
void foo_bar(int unused1, int unused2)
{
unused1 = unused1;
unused2 = unused2;
}
And I will recode it thus:
void foo_bar(int , int )
{
}
That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)
Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT me remove error checking, #assert's, good debugging code, or exception handlers).
www.eFax.com are spammers
I suggest you check out Dawson Engler's resume; he has almost certainly done 10x more parallel-systems development than you have. This particular code example might be a bad one, because the analysis that supports the author's conclusion is omitted from the article, but the basic point is still valid: code that contains duplicate condition checks like those in the example is more likely to contain real bugs than less duplicative code, and the "low-hanging fruit" can be identified automatically. It's not hard at all to see how deeper analysis, different rules, or annotation could do a better job of weeding out false duplicates without compromising the tool's ability to flag legitimate areas of concern.
You're arguing about low-level implementation, when the author was trying to make a point about a high-level principle. That's the hallmark of an insecure junior programmer.
Slashdot - News for Herds. Stuff that Splatters.