Slashdot Mirror


Using Redundancies to Find Errors

gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."

36 of 326 comments (clear)

  1. I have no idea what this article means ! by red-beard's · · Score: 1, Insightful

    NO flame intended i just have no clue as to the point of this . What exactly do these flaws cause and etc.. Maybe I'm just not l33t enough but i could sure use an explanation from someone more learned than me .

    1. Re:I have no idea what this article means ! by The+Bungi · · Score: 4, Insightful
      In any large system there's bound to be some amount of redundant code that can sometimes cause subtle errors, like slow memory leaks. These conditions develop over the lifetime of the code. Analyzing the code *as a whole* provides information about these types of situations and how to fix them, which more often than not is not trivial.

      "Dead" or unreachable code is almost always caused by patches or fixes to an existing codebase and it's always good to detect and get rid of it because it may point to other problems in the application (in my experience), or is simply dead wood that should be removed.

    2. Re:I have no idea what this article means ! by RollingThunder · · Score: 2, Insightful

      And moreover, in the areas where you find these mechanically detectable bugs, the likelihood of more subtle bugs is higher, as these errors are made when the programmer isn't really thinking about what he's doing. It's like triaging your wounded code, to fix the worst first.

  2. Analysis "by file" vs "by function"? by mcdrewski42 · · Score: 0, Insightful

    A file (module/unit/whatever) is still a fairly high granularity to make decisions upon. I'd be more interested at finding bugs within specific functions rather than just files...

    --
    /* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }
    1. Re:Analysis "by file" vs "by function"? by smallfries · · Score: 2, Insightful

      How does this get marked up as +3 insightful? Did you read the paper?

      They are using analysis techniques to locate bugs at specific points within the parse-tree.Hence, they are locating bugs within specific functions rather than just files. As all of their examples showed. Sure, its a nice point. But it is what they are doing.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  3. How to Avoid Mistakes? Practical Advice? by webword · · Score: 4, Insightful

    Unfortunately, this paper doesn't really offer any practical advice. Is is probably a little useful to very good, or great programmers. However, for new or moderately good programmers, it probably won't be very useful. It is certainly interesting in the academic sense, but I always want to see more practical advice. (I suppose that good practical advice flows down from good theoretical advice.)

    What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?

    Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?

    1. Re:How to Avoid Mistakes? Practical Advice? by trance9 · · Score: 4, Insightful


      I think the lesson here is basically that the compiler is your friend. Turn on all the error checking you possibly can in your development environment and pay attention to every last warning.

      If there is something trivial causing a warning in your code--fix it so it doesn't warn, even though it wasn't a "bug". If your compiler output is always pristine, with no warnings, then when a warning shows up if it's a bug you'll notice.

      Kind of common sense if you ask me--but maybe that's just a lesson I learned the hard way.

    2. Re:How to Avoid Mistakes? Practical Advice? by Pseudonym · · Score: 4, Insightful

      I'd put it more strongly than that. Reading between the lines of the paper, don't just fix the warning. Look around the place where the warning happened. You'll most likely find a bug.

      It's also a call for compilers to generate more warnings, which can only be a good thing.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    3. Re:How to Avoid Mistakes? Practical Advice? by 1nv4d3r · · Score: 3, Insightful

      I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it.

      Well, identifying it as useless crap is a good first step.

      And for managers:

      while(economy.isDown())
      if(newInterface.isUselessCrap())
      {
      fireEmployee();
      hireNextTelecomRefugee();
      }

      If you want a serious answer, one reason programs get filled with so much useless crap is because 80% of programmers program so they can collect a paycheck. They don't give a flying fuck if their code is good or not. That was a big eye-opener for me when I first got out of school. I couldn't believe how many people just didn't care.

      If you are interested at all in not muddying the interface, you are most of the way there. Give it some thought, consult with your peers, and try to learn from mistakes.

      Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.

    4. Re:How to Avoid Mistakes? Practical Advice? by sbszine · · Score: 5, Insightful

      Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.

      In my experience, programming for an employer is the process of secretly introducing quality. This usually consists of debugging and refactoring on the sly while your pointy-haired boss thinks you're adding 'features'.

      Is it just me, or this the way it's done most places?

      --

      Vino, gyno, and techno -Bruce Sterling

    5. Re:How to Avoid Mistakes? Practical Advice? by megaralf · · Score: 2, Insightful

      On the contrary. This paper is most practical.
      Just use this mechanism and you can find thousands of errors in an already tested system.
      What impressed me most was something like this.
      if( complex statement );
      do=that;

      Notice the semicolon! This kind of errors are very hard to spot and they can stay in the code forever.
      I will propose to use a code-checker like this in our software to improve the quality.

    6. Re:How to Avoid Mistakes? Practical Advice? by blibbleblobble · · Score: 2, Insightful

      "Unfortunately, this paper doesn't really offer any practical advice."

      It looks like it's intended for the people who program GCC, perl, kaffe, etc., as they can use this information to build better checking into their respective compilers, rather than for programmers.

  4. Good Design = Tight Code by chewtoy-11 · · Score: 5, Insightful

    Writing repetitive code only once offers the same benefits as using Cascading Style Sheets for your webpages. If there is a serious error, you only have to track it down in the one place where it exists versus every single place you re-wrote the code. Also, it makes adding features much simpler as well. I'm an old school procedural programmer that is making the rocky transition to OOP programming. THIS is where it starts coming together...

    --
    C. Griffin
    "Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
    1. Re:Good Design = Tight Code by shortscruffydave · · Score: 2, Insightful

      I'm an old school procedural programmer that is making the rocky transition to OOP programming

      I agree whole-heartedly with what you say about code re-use. However I wouldn't see this as being a feature solely of OOP. Get the design right, and you can have some equally tight, highly-reusable procedural code.

    2. Re:Good Design = Tight Code by ArthurDent · · Score: 3, Insightful

      While OOP can be one method of solving repetitive code, good design is always the best way to solve it. What I've found is *any* time you're tempted to use the cut and paste functions within code, you need to ask yourself: Is there a common function that I can factor out to make only one opportunity for errors rather than two?

      You'll have more functions and the code might be a little harder to follow for the unfamiliar, but it will be much easier to debug if there is only one function that does a particular task.

      Ben

  5. lint is horrible by Anonymous+Hack · · Score: 3, Insightful

    It really is. It's a redundant holdover from ye old BSD versions. Granted, there are one or two times i've used it when -Wall -pedantic -Werror -Wfor-fuck's-sake-find-my-bug-already doesn't work, but a lot of the time it comes up with a LOT of complaints that are really unnecessary. Am i really going to have to step through tens of thousands of lines of code castind the return of every void function to (void)? Come on.


    --
    I got a sig so you would remember me.
    1. Re:lint is horrible by RhettLivingston · · Score: 4, Insightful

      I enforced a policy of eliminating all Lint info messages on a 1.5 million line, from scratch project. And, I do mean from scratch. we wrote our own operating system, ANSI c Library, and drivers and ran it on hardware that we designed and produced. In the first 2 years of deployment, only five bugs were reported. Lint was only part of the reason, but it was a total part.

    2. Re:lint is horrible by Anonymous+Hack · · Score: 3, Insightful

      Lint? Lint compains if you call a function that returns an int and you ignore the int. This is particularly irritating in the case of strcpy() and similar functions where you would normally do:

      strcpy(buf, "hello");

      except you're supposed to do:

      (void) strcpy(buf, "hello");

      or...

      buf = strcpy(buf, "hello");

      And that's just the beginning...

      --
      I got a sig so you would remember me.
    3. Re:lint is horrible by Arandir · · Score: 1, Insightful

      Lint compains if you call a function that returns an int and you ignore the int.

      Seeing that you used strcpy() as an example, I can certainly understand why you don't like lint. For those of us who know better than to use such dangerous functions, lint and -Wall are our annoying but useful friends.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    4. Re:lint is horrible by Anonymous+Hack · · Score: 2, Insightful

      Yeah, you're right :-) But it's a bitch, ya know? Explicitly casting 90% of API functions to void, or checking for errors on printf() which 99.999999% of the time works, unless you're running on some weird embedded platform that doesn't have stdout. In my current project checking API returns would be nasty... It may only be a few processor cycles, but having the compiled code do lots of JNZ instructions that most of the time will never be true could mean the difference between waiting 20 seconds and 30 seconds for the program to complete its task... and i think i'd rather have the performance.

      --
      I got a sig so you would remember me.
    5. Re:lint is horrible by pmz · · Score: 3, Insightful

      lint is horrible

      No, it isn't. I took a legacy application that I began maintaining and used lint to eliminate hundreds of lines of code and several real never-before-detected bugs. It also encouraged me to remove dozens of implicit declarations and redundant "extern" statements in favor of real header files. The application really is better for it, and to do this work without lint would have been very very tedious. Granted, my experience is with Sun's compiler's lint, so I can't say whether other implementations are as good.

      ...it comes up with a LOT of complaints that are really unnecessary.

      Actually, all of lint's complaints are about a potential problem. You just have to decide what is worth the time to fix.

      Using lint is a deliberate process that should take several days or weeks for a large application (on the first time through). After that initial investment, using lint is still an important part of the ongoing health of the program, but it should become less and less of an effort each time.

  6. Error checking compilers by Anonymous Coward · · Score: 1, Insightful

    ".. so run lint .."

    I understand that there are non-C compilers out there that actually detect code errors without running a separate utility. Amazing!

  7. Re:redundant? by drzhivago · · Score: 2, Insightful

    Rendundant code would be more akin to something like this:


    if (a==1)
    {
    [some chunk of code]
    }
    else
    {
    [same (or almost exact) chunk of code]
    }


    where the same code block appears multiple times in a file/class/project. By having the same block of code appear multiple times the chance of a user-generated error increases. Easily fixed by moving the repeated code into a parameterized function.

  8. Re:redundant? by Karl+Hungus,+nihilis · · Score: 4, Insightful
    Steve McConnell in the excellent book Code Complete talks about this sort of stuff. One of the big things was unused variables. Completely harmless, but a good indication that the code may have problems. If whoever is maintaining the code didn't bother to remove useless variables, what else are they not bothering to take care of?

    It's not, keep the code squeaky clean because cleanliness is next to godliness, it's keep the code clean so it's easy to read. Keep it clean because it's a discipline that will pay off when it's time to spot/fix the real errors in the code.

  9. Re:Intentional redundant code by gUmbi · · Score: 4, Insightful

    I've seen this (they were fired the next month): // and now to boost my LOC/Day performance...


    Was the manager asking for lines of code/day fired too?

  10. Re:Intentional redundant code by Anonymous+Hack · · Score: 4, Insightful

    Ugh, counting LOC sucks. I find my most productive days are the ones i REMOVE lines of code, not add them. If i get a loop running tighter and faster, if i remove stuff that i could do better another way... that's what i'm paid to do.

    --
    I got a sig so you would remember me.
  11. Re:Patience! by Greeneland · · Score: 2, Insightful

    All too often people cut and paste blocks of code and then change parts of the pasted blocks and don't look at the rest of it enough to recognize problems.

  12. heed all warnings by fermion · · Score: 3, Insightful
    There are two practical upshots of this that I use in my own code. First, it is best to treat all warnings as bugs. Warnings are an indication that the compiler or programmer can get confused with the code. Neither is a good situation. Code that generates warnings should be rewritten in a more understandable manner. Some would say this stifles creativity. This may be true, but we can't all be James Joyce, and, as much as we may like to read his work, few of us would enjoy wading through such creative code.

    Second, use standard idioms. For some, that may mean learning the standard idioms. These should become second nature. Programmers should express their creativity in the logic, structure, and simplicity of the code, not the non standard grammar. Standard forms allow more accurate coding and easier maintenance.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
  13. Re:Saw his talk at FSE by CH-BuG · · Score: 2, Insightful

    A second is that it's probably not really in a fit state for sharing as yet (the tool is not the goal of the research, after all).

    This is not a valid reason, I know a lot of people (including me) that would be happy to improve his research code into something useful for the community at large. I remember a paper describing a gcc extension to write semantic checks (for instance, reenable interrupts after disabling them). This program found an amazing number of bugs in the linux kernel. I really wish I could have something like that at hand!

  14. Where there's smoke, there's fire by rufusdufus · · Score: 3, Insightful

    So many people have made silly comments about this being obvious, useless or whatever. This is probably because they did not actually READ the paper.

    The paper is not about obvious code redundancy bugs, it is about subtle errors which are not as simple as just duplicate code. It is about code that *appears* to be executed but actually is not.

    Go take a look at the examples and see how long it takes you to notice the different errors...now imagine have a thousand pages of code to peruse..would you catch it? Many of them probably not.

    The conclusion of the paper is basically, errors cluster around errors; finding a trivial unoptimal syntactical constructions tends to point to real bugs.

    Where there's smoke, there's fire.

  15. Re:I Hope They Didn't Get Paid by Temporal · · Score: 5, Insightful
    Read the paper, will you? Of course good programmers don't write redundant code... at least, not intentionally. So, when you do see redundant code in a program, it is more likely a typo, and if it's a typo, it is likely a bug. So, if you have a program which detects redundant code, it will likely find bugs for you. They wrote such a program. It found hundreds of bugs in the Linux kernel.

    Here's an example they cite from the Linux kernel:
    /* 2.4.1/net/appletalk/aarp.c:aarp_rcv */
    else { /* We need to make a copy of the entry. */
    da.s_node = sa.s_node;
    da.s_net = da.s_net;
    That last line assigns a variable to itself. Do you think that's what the programmer intended? Of course not. It's a bug. But no one caught it. If not for their program, maybe it would never have been caught.

    You think this research is useless? Do you always write bug-free code? Maybe you should run this program on your own code and see what happens.
  16. Dead code by xyote · · Score: 2, Insightful
    Dead code is most likely found in really old code that has been modified many many times. Does really old code that has been modified many many times have lots of bugs? Quite likely.


    Is this dead code going to get removed?


    No.


    Why not?


    Because, one, it's only an opinion that it's dead code. There could be some obscure case that no one imagined that could use it. Two, if some programmer removed it and it turned out that it was needed or the programmer screwed up the removal, the programmer would be blamed and take a lot of grief for it. If it ain't broke, don't fix it.


    Now, it could be that the dead code doesn't work properly for the obscure case. But how could you tell? Do you want to write a test case for code that no one can figure out how it gets invoked?

    1. Re:Dead code by arkanes · · Score: 2, Insightful

      Well, the dead code detected by the program in the article is not an opinion - it's something that is provably unreachable. If you've got well documented functions, it should still be possible to either generate a test case or prove that certain code branches are unreachable. In fact, if you were working on REALLY mission critical stuff, one of the audits is to trace each and every possible branch of execution. Yes, thats an enormous amount of work. However, it's also (largely) possible to automate it.

  17. Confirms that 80/20 works for defects, too... by bryane · · Score: 2, Insightful
    This isn't really new information - there have been studies that show bugs cluster, as well as the intuition most programmers have that "this part is really bad"

    If you look at a CVS repository and identify those files that have high revision numbers, there's a good chance they are full of errors and need to be rewritten.

    One visualization is to color code according to it's age - old code blue, and new code red - then look at the results. You will often see that the red code clusters, and there are huge regions of blue that have been stable for years. You will also see relatively small clusters of differening shades of red, as people need to keep banging on the same problematic code.

  18. Removing dead code... by wowbagger · · Score: 4, Insightful
    I have a saying:
    If a line of code doesn't exists, then it cannot contain a bug.


    Like more aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.

    So I ruthlessly remove dead code. Often, I'll see big blocks like this:


    #ifdef old_way_that_doesnt_work_well
    blah;
    blah;
    blah;
    #endif


    And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!

    By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.

    I will often see the following in C++ code:


    void foo_bar(int unused1, int unused2)
    {
    unused1 = unused1; // silence compiler warning
    unused2 = unused2; // silence compiler warning
    }


    And I will recode it thus:

    void foo_bar(int , int )
    {

    }


    That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)

    Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT me remove error checking, #assert's, good debugging code, or exception handlers).

  19. Re:Parallel programming 101 by Salamander · · Score: 2, Insightful
    unlike you, this piece of code was probably written by an experienced guy that has actually written code for parallel systems before.

    I suggest you check out Dawson Engler's resume; he has almost certainly done 10x more parallel-systems development than you have. This particular code example might be a bad one, because the analysis that supports the author's conclusion is omitted from the article, but the basic point is still valid: code that contains duplicate condition checks like those in the example is more likely to contain real bugs than less duplicative code, and the "low-hanging fruit" can be identified automatically. It's not hard at all to see how deeper analysis, different rules, or annotation could do a better job of weeding out false duplicates without compromising the tool's ability to flag legitimate areas of concern.

    You're arguing about low-level implementation, when the author was trying to make a point about a high-level principle. That's the hallmark of an insecure junior programmer.

    --
    Slashdot - News for Herds. Stuff that Splatters.