Slashdot Mirror


Using Redundancies to Find Errors

gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."

20 of 326 comments (clear)

  1. Here's a text link by Anonymous Coward · · Score: 4, Informative

    PDF usually crashes my computer (crappy adobe software). So here's a convenient text link!

    http://216.239.37.100/search?q=cache:yuZKW8CjTqIC: www.stanford.edu/~engler/p401-xie.ps+&hl=en&ie=UTF -8

  2. Patience! by houseofmore · · Score: 5, Funny

    "...dead code (code that is never reached)"

    Perhaps it's just shy!

  3. How to Avoid Mistakes? Practical Advice? by webword · · Score: 4, Insightful

    Unfortunately, this paper doesn't really offer any practical advice. Is is probably a little useful to very good, or great programmers. However, for new or moderately good programmers, it probably won't be very useful. It is certainly interesting in the academic sense, but I always want to see more practical advice. (I suppose that good practical advice flows down from good theoretical advice.)

    What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?

    Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?

    1. Re:How to Avoid Mistakes? Practical Advice? by trance9 · · Score: 4, Insightful


      I think the lesson here is basically that the compiler is your friend. Turn on all the error checking you possibly can in your development environment and pay attention to every last warning.

      If there is something trivial causing a warning in your code--fix it so it doesn't warn, even though it wasn't a "bug". If your compiler output is always pristine, with no warnings, then when a warning shows up if it's a bug you'll notice.

      Kind of common sense if you ask me--but maybe that's just a lesson I learned the hard way.

    2. Re:How to Avoid Mistakes? Practical Advice? by Pseudonym · · Score: 4, Insightful

      I'd put it more strongly than that. Reading between the lines of the paper, don't just fix the warning. Look around the place where the warning happened. You'll most likely find a bug.

      It's also a call for compilers to generate more warnings, which can only be a good thing.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    3. Re:How to Avoid Mistakes? Practical Advice? by sbszine · · Score: 5, Insightful

      Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.

      In my experience, programming for an employer is the process of secretly introducing quality. This usually consists of debugging and refactoring on the sly while your pointy-haired boss thinks you're adding 'features'.

      Is it just me, or this the way it's done most places?

      --

      Vino, gyno, and techno -Bruce Sterling

    4. Re:How to Avoid Mistakes? Practical Advice? by ConceptJunkie · · Score: 4, Interesting

      Sometimes it's worse. I quit a job after 15 months, because I was constantly getting in trouble for trying to fix the cause of the problems rather than the symptoms.

      I was working on a 300,000-line Windows application, and I am not exaggerating here, it was about 5/6 redundant code. 100-line functions would be copy-and-pasted a dozen times and two lines changed in each. Plus, there were numerous executables to this project and often the same code (with minor variations of course) would exist in different executables.

      It was originally written in Borland C++ and the back-end was ported to Visual C++, but all the utility and support functions still existed in _both_ the Borland code and Microsoft code. Worse, they were not identical. Even worse, there was substantial use of STL, which doesn't work the same in Borland C++ (which is an ancient version... circa 1996) and Visual C++.

      That and the fact that using strcpy would have been a step up in maintaining buffer integrity, usually they just copied buffers and if one #define was different from a dozen others in completely different places, memory would be toast and we all know how that manifests.

      Worse, there was UI code written by someone who completely confused parent windows and base classes, such that the child window classes had all the data elements for the parent window, because they were derived from the parent window class!

      I spent an entire week once reviewing every single memcpy, etc, in the entire codebase (which was spaghetti-code in the extreme) just to eliminate all the buffer overruns I was discovering. THe program used a database with about 100 tables (a nightmare of redundancy in itself) and there was a several thousand-line include file of all the field sizes, maintained by hand (with plenty of typos). Eventually, I wrote a util to generate that include file automatically, which of course wasn't appreciated.

      I was trying to overcome these difficulties while being barraged with support calls, sometimes critical because this was software to manage access control systems for buildings, meaning I spent 80% of my time fighting fires. You know, situations like: "Our system is down... you need to fix this bug because we've had to hire guards for each door until we can get it back up again."
      There was only one other person working with me, and he quit in disgust and was not replaced for about 3 months.

      Finally, after stuggling mightily (in time, effort and at the expense of emotional well-being) to overcome the sheer incompetence put into this project (parts of which were 10 years old), I finally gave notice after it looked like my unscrupulous boss (who wrote a lot of this code) was doing everything he could to make it look like my fault to the client (even though they knew better and really liked working with me, precisely because I was not a BS-artist)... and after 15 years over never needing more than two weeks to find good work, I have been unemployed since May 2002.

      There's a moral here, but it escapes me.

      --
      You are in a maze of twisty little passages, all alike.
  4. Good Design = Tight Code by chewtoy-11 · · Score: 5, Insightful

    Writing repetitive code only once offers the same benefits as using Cascading Style Sheets for your webpages. If there is a serious error, you only have to track it down in the one place where it exists versus every single place you re-wrote the code. Also, it makes adding features much simpler as well. I'm an old school procedural programmer that is making the rocky transition to OOP programming. THIS is where it starts coming together...

    --
    C. Griffin
    "Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
  5. And their other findings ... by kruetz · · Score: 5, Funny

    They also found that:

    Russian errors cause code
    Incorrect code causes errors
    Missing code causes errors
    Untested code causes errors
    Redundant codec causes redundancies
    Driver code causes headaches
    C code causes buffer overflows
    Java code causes exceptions
    Perl code causes illiteracy
    Solaris code causes rashes
    Novell code causes panic attacks
    Slashdot code causes multiple reposts
    Slashdot articles cause poor-quality posts
    Microsoft code causes exploits
    Apple code causes user cults
    Uncommented code causes code rage
    RIAA code causes computers to stop functioning
    (Poor idea causes long, desperate post)

    --

    This sig intentionally left bla... dammit!
    Who's got the whiteout?
  6. Intentional redundant code by 1nv4d3r · · Score: 5, Funny
    I've seen this (they were fired the next month):

    // and now to boost my LOC/Day performance...
    x += 0;
    x += 0;
    x += 0;
    x += 0;
    x += 0;
    x += 0;

    It actually caused a bug 'cuz they accidentally left the '+' off one of the lines. What an idiot.

    1. Re:Intentional redundant code by gUmbi · · Score: 4, Insightful

      I've seen this (they were fired the next month): // and now to boost my LOC/Day performance...


      Was the manager asking for lines of code/day fired too?

    2. Re:Intentional redundant code by Anonymous+Hack · · Score: 4, Insightful

      Ugh, counting LOC sucks. I find my most productive days are the ones i REMOVE lines of code, not add them. If i get a loop running tighter and faster, if i remove stuff that i could do better another way... that's what i'm paid to do.

      --
      I got a sig so you would remember me.
  7. Re:I have no idea what this article means ! by The+Bungi · · Score: 4, Insightful
    In any large system there's bound to be some amount of redundant code that can sometimes cause subtle errors, like slow memory leaks. These conditions develop over the lifetime of the code. Analyzing the code *as a whole* provides information about these types of situations and how to fix them, which more often than not is not trivial.

    "Dead" or unreachable code is almost always caused by patches or fixes to an existing codebase and it's always good to detect and get rid of it because it may point to other problems in the application (in my experience), or is simply dead wood that should be removed.

  8. Re:lint is horrible by RhettLivingston · · Score: 4, Insightful

    I enforced a policy of eliminating all Lint info messages on a 1.5 million line, from scratch project. And, I do mean from scratch. we wrote our own operating system, ANSI c Library, and drivers and ran it on hardware that we designed and produced. In the first 2 years of deployment, only five bugs were reported. Lint was only part of the reason, but it was a total part.

  9. Re:redundant? by Karl+Hungus,+nihilis · · Score: 4, Insightful
    Steve McConnell in the excellent book Code Complete talks about this sort of stuff. One of the big things was unused variables. Completely harmless, but a good indication that the code may have problems. If whoever is maintaining the code didn't bother to remove useless variables, what else are they not bothering to take care of?

    It's not, keep the code squeaky clean because cleanliness is next to godliness, it's keep the code clean so it's easy to read. Keep it clean because it's a discipline that will pay off when it's time to spot/fix the real errors in the code.

  10. Saw his talk at FSE by owenomalley · · Score: 5, Interesting

    I saw Dawson's talk at FSE (Foundations of Software Engineering). He uses static flow analysis to find problems in the code (like an advanced form of pclint). The most interesting part of his tool is in the ranking of the problem reports. He has developed a couple of heuristics that sort the problems by order of importance and they supposedly do a very good job. Static analysis tools find most of their problems in rarely run code, such as error handlers. Such problems are problematic and sometimes lead to non-deterministic problems, which are extremely hard to find with standard testing and debugging. (This is especially true, when the program under consideration is a kernel.) Dawson also verifies configurations of the kernel that no one would compile, because he tries to get as many possible drivers at the same time as he can. The more code, the better the consistency checks do at finding problems.

    By making assumptions about the program and checking the consistency of the program, his tool finds lots of problems. For instance, assume there is a function named foo that takes a pointer argument. His tool will notice how many of the callers of foo treat the parameter as freed versus how many treat the parameter as unfreed. The bigger the ratio, the more likely the 'bad' callers are to represent a bug. It doesn't really matter which view is correct. If the programmer is treating the parameter inconsistently, it is very likely a bug.

    He also mentioned that counter to his expectations, the most useful part of his tool was to find 'local' bugs. By local, I mean bugs that are local to a single procedure. They are both easier for the tool to find, more likely to actually be bugs, and much easier for the programmer to verify if they are in fact bugs.

    He analyzed a couple of the 2.2.x and 2.4.x versions of the kernel and found hundreds of bugs. Some of them were fixed promptly. Others were fixed slowly. Some were fixed by removing the code (almost always a device driver) from the kernel. Others he couldn't find anyone that cared about the bug enough to fix it. He was surprised at the amount of abandonware in the Linux kernel.
    It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large). Without letting other people run his tool (or even better modify it), his research ultimately does little good other than finding bugs in linux device drivers. *heavy sigh* Oh well, eventually someone WILL reimplement this stuff and release it to the world.

    On a snide comment, if he was a company he would no doubt have been bought by Microsoft already. Intrinsa was doing some interesting stuff with static analysis and now after they were bought a couple of years ago, their tool is only available inside of Microsoft. *sigh*

  11. Re:Is this not Obvious by gmack · · Score: 5, Informative

    If only the story poster had actually read the paper.

    They used a custom checker that finds these things much more effiectivly than lint.

    I actually remember the flood of bug reports and kernel patches that toy of theirs generated the first few months they put it to use on the kernel.

  12. vi! by gregstoll · · Score: 4, Funny

    vi could do it!

  13. Re:I Hope They Didn't Get Paid by Temporal · · Score: 5, Insightful
    Read the paper, will you? Of course good programmers don't write redundant code... at least, not intentionally. So, when you do see redundant code in a program, it is more likely a typo, and if it's a typo, it is likely a bug. So, if you have a program which detects redundant code, it will likely find bugs for you. They wrote such a program. It found hundreds of bugs in the Linux kernel.

    Here's an example they cite from the Linux kernel:
    /* 2.4.1/net/appletalk/aarp.c:aarp_rcv */
    else { /* We need to make a copy of the entry. */
    da.s_node = sa.s_node;
    da.s_net = da.s_net;
    That last line assigns a variable to itself. Do you think that's what the programmer intended? Of course not. It's a bug. But no one caught it. If not for their program, maybe it would never have been caught.

    You think this research is useless? Do you always write bug-free code? Maybe you should run this program on your own code and see what happens.
  14. Removing dead code... by wowbagger · · Score: 4, Insightful
    I have a saying:
    If a line of code doesn't exists, then it cannot contain a bug.


    Like more aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.

    So I ruthlessly remove dead code. Often, I'll see big blocks like this:


    #ifdef old_way_that_doesnt_work_well
    blah;
    blah;
    blah;
    #endif


    And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!

    By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.

    I will often see the following in C++ code:


    void foo_bar(int unused1, int unused2)
    {
    unused1 = unused1; // silence compiler warning
    unused2 = unused2; // silence compiler warning
    }


    And I will recode it thus:

    void foo_bar(int , int )
    {

    }


    That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)

    Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT me remove error checking, #assert's, good debugging code, or exception handlers).