Slashdot Mirror


Using Redundancies to Find Errors

gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."

74 of 326 comments

  1. Here's a text link by Anonymous Coward · · Score: 4, Informative

    PDF usually crashes my computer (crappy adobe software). So here's a convenient text link!

    http://216.239.37.100/search?q=cache:yuZKW8CjTqIC:www.stanford.edu/~engler/p401-xie.ps+&hl=en&ie=UTF-8

  2. More details / PostScript version by Amsterdam+Vallon · · Score: 2, Informative

    More details
    Appeared in FSE 2002. Finds funny bugs by looking for redundant operations (dead code, unused assignments, etc.). From empirical measurements, code with such redundant errors is 50-100% more likely to have hard errors. Also describes how to check for redundancies to find holes in specifications.

    Link to PostScript file for easy viewing/printing
    File

    --

    Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
  3. Errors like... by Anonymous Coward · · Score: 3, Funny

    This really old Slashdot logo still in use over on Team Slashdot's page on distributed.net.

  4. Patience! by houseofmore · · Score: 5, Funny

    "...dead code (code that is never reached)"

    Perhaps it's just shy!

    1. Re:Patience! by Greeneland · · Score: 2, Insightful

      All too often people cut and paste blocks of code and then change parts of the pasted blocks and don't look at the rest of it enough to recognize problems.

    2. Re:Patience! by heikkile · · Score: 2, Funny
      "...dead code "

      It's not dead - it is resting. It is pining for the fjords.

      --

      In Murphy We Turst

  5. html by farnsworth · · Score: 2, Informative

    html version is here.

    --

    There aint no pancake so thin it doesn't have two sides.

  6. redundant? by 1nv4d3r · · Score: 3, Interesting

    To me 'redundant' implies duplication of something already there. (a=1; a=1;)

    a=a; and dead code aren't so much redundant as they are superfluous. It's still a sign of possible errors, for sure.
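
    A hypothetical sketch of how a harmless-looking a=a; can hide a real bug. The struct and function names are invented for illustration and are not taken from the paper or the kernel; the assumption is simply that the author meant to copy from src and typed dst instead.

```c
struct point {
    int x;
    int y;
};

/* Buggy copy: the second statement assigns a field to itself, so
   dst->y silently keeps its old value. The code compiles and "works"
   until the stale y matters. */
void copy_point_buggy(struct point *dst, const struct point *src)
{
    dst->x = src->x;
    dst->y = dst->y;   /* self-assignment: src->y was intended */
}

/* What the author presumably meant. */
void copy_point_fixed(struct point *dst, const struct point *src)
{
    dst->x = src->x;
    dst->y = src->y;
}
```

    The self-assignment itself does nothing, which is exactly why it is worth flagging: the interesting bug is the copy that never happened.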

    1. Re:redundant? by drzhivago · · Score: 2, Insightful

      Redundant code would be more akin to something like this:


      if (a==1)
      {
          [some chunk of code]
      }
      else
      {
          [same (or almost exact) chunk of code]
      }


      where the same code block appears multiple times in a file/class/project. By having the same block of code appear multiple times the chance of a user-generated error increases. Easily fixed by moving the repeated code into a parameterized function.
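
      A minimal sketch of that refactoring, assuming the two pasted branches differed only in one constant. The names scaled_total and compute, and the scale factors 2 and 3, are made up for illustration.

```c
/* Hypothetical helper: the single copy of the logic that used to be
   pasted into both branches. Only the scale factor differed. */
static int scaled_total(const int *values, int count, int scale)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i] * scale;
    return total;
}

/* The branch now only chooses the parameter; the loop exists once,
   so a fix to it cannot be applied to one copy and missed in the other. */
int compute(int a, const int *values, int count)
{
    return scaled_total(values, count, a == 1 ? 2 : 3);
}
```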

    2. Re:redundant? by Karl+Hungus,+nihilis · · Score: 4, Insightful
      Steve McConnell in the excellent book Code Complete talks about this sort of stuff. One of the big things was unused variables. Completely harmless, but a good indication that the code may have problems. If whoever is maintaining the code didn't bother to remove useless variables, what else are they not bothering to take care of?

      It's not "keep the code squeaky clean because cleanliness is next to godliness"; it's "keep the code clean so it's easy to read." Keep it clean because it's a discipline that will pay off when it's time to spot/fix the real errors in the code.

  7. How to Avoid Mistakes? Practical Advice? by webword · · Score: 4, Insightful

    Unfortunately, this paper doesn't really offer any practical advice. It is probably a little useful to very good, or great programmers. However, for new or moderately good programmers, it probably won't be very useful. It is certainly interesting in the academic sense, but I always want to see more practical advice. (I suppose that good practical advice flows down from good theoretical advice.)

    What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?

    Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?

    1. Re:How to Avoid Mistakes? Practical Advice? by trance9 · · Score: 4, Insightful


      I think the lesson here is basically that the compiler is your friend. Turn on all the error checking you possibly can in your development environment and pay attention to every last warning.

      If there is something trivial causing a warning in your code--fix it so it doesn't warn, even though it wasn't a "bug". If your compiler output is always pristine, with no warnings, then when a warning that points to a real bug shows up, you'll notice it.

      Kind of common sense if you ask me--but maybe that's just a lesson I learned the hard way.

    2. Re:How to Avoid Mistakes? Practical Advice? by Pseudonym · · Score: 4, Insightful

      I'd put it more strongly than that. Reading between the lines of the paper, don't just fix the warning. Look around the place where the warning happened. You'll most likely find a bug.

      It's also a call for compilers to generate more warnings, which can only be a good thing.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    3. Re:How to Avoid Mistakes? Practical Advice? by Glass+of+Water · · Score: 2, Interesting
      I think the lesson is just what they're saying. Applying these tests on a body of code is a good way to find high-level errors, even though the tests just check for low-level mistakes.

      That seems pretty practical to me.

      The issue you raise about interfaces is only tangentially related. There, you get into the problem (a very real problem) of confusing coding. This paper does not deal with the issue of whether the code is written well from the point of view of other programmers who need to work with it.

      --
      There are no trolls. There are no trees out here.
    4. Re:How to Avoid Mistakes? Practical Advice? by 1nv4d3r · · Score: 3, Insightful

      I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it.

      Well, identifying it as useless crap is a good first step.

      And for managers:

      while(economy.isDown())
          if(newInterface.isUselessCrap())
          {
              fireEmployee();
              hireNextTelecomRefugee();
          }

      If you want a serious answer, one reason programs get filled with so much useless crap is because 80% of programmers program so they can collect a paycheck. They don't give a flying fuck if their code is good or not. That was a big eye-opener for me when I first got out of school. I couldn't believe how many people just didn't care.

      If you are interested at all in not muddying the interface, you are most of the way there. Give it some thought, consult with your peers, and try to learn from mistakes.

      Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.

    5. Re:How to Avoid Mistakes? Practical Advice? by sbszine · · Score: 5, Insightful

      Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.

      In my experience, programming for an employer is the process of secretly introducing quality. This usually consists of debugging and refactoring on the sly while your pointy-haired boss thinks you're adding 'features'.

      Is it just me, or is this the way it's done most places?

      --

      Vino, gyno, and techno -Bruce Sterling

    6. Re:How to Avoid Mistakes? Practical Advice? by Ed+Avis · · Score: 2, Funny

      I think it is the software version of 'zero tolerance'. Get rid of beggars and squeegee merchants and you make the more serious crimes (bugs) easier to detect and solve. Or something like that.

      --
      -- Ed Avis ed@membled.com
    7. Re:How to Avoid Mistakes? Practical Advice? by megaralf · · Score: 2, Insightful

      On the contrary. This paper is most practical.
      Just use this mechanism and you can find thousands of errors in an already tested system.
      What impressed me most was something like this.
      if( complex statement );
      do=that;

      Notice the semicolon! This kind of error is very hard to spot and can stay in the code forever.
      I will propose to use a code-checker like this in our software to improve the quality.
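
      To make the semicolon problem concrete, here is a small sketch. The function names and the flag test are invented for illustration; they are not the kernel code from the paper.

```c
/* Buggy: the stray ';' is the entire body of the if, so the
   assignment on the next line runs no matter what flag is. */
int ran_buggy(int flag)
{
    int ran = 0;
    if (flag == 1);
        ran = 1;
    return ran;
}

/* Intended: the assignment is the body of the if. */
int ran_fixed(int flag)
{
    int ran = 0;
    if (flag == 1)
        ran = 1;
    return ran;
}
```

      The condition in the buggy version is evaluated and then thrown away, which is the kind of redundancy a checker can flag mechanically. GCC, for example, has a -Wempty-body warning (enabled by -Wextra) for an empty if body.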

    8. Re:How to Avoid Mistakes? Practical Advice? by awol · · Score: 2, Interesting

      I have only one piece of advice I give in answer to this kind of question. If you observe something in an application (and this applies to big ones more than little ones, but it is worthwhile for all software) that you cannot explain, it is a problem: almost certainly a bug, but in any case a problem waiting to bite you in the ass.

      We spent almost two years with software that was live and on about three occasions during that time we observed a problem that was moderately serious (critical but for the redundancy of the architecture). Now eventually we found the problem, design flaw/bug, but the evidence for the problem was there for all to see a year before the system went live, even in testing. It was an apparently benign difference in counters between different instances of the same component, one that none of us could adequately explain, since they should all have been the same. All the other counters were identical (these other counters were of a coarser grain and counting different things). If we had found the cause of the difference at the time we would have found the flaw at the time (guaranteed, since it was obvious once one knew the source of the deviation).

      That one is the best example that I can think of to demonstrate the "If you can't explain it, it's broken" aspect of software behaviour.

      --
      "The first thing to do when you find yourself in a hole is stop digging."
    9. Re:How to Avoid Mistakes? Practical Advice? by blibbleblobble · · Score: 2, Insightful

      "Unfortunately, this paper doesn't really offer any practical advice."

      It looks like it's intended for the people who program GCC, perl, kaffe, etc., as they can use this information to build better checking into their respective compilers, rather than for programmers.

    10. Re:How to Avoid Mistakes? Practical Advice? by ConceptJunkie · · Score: 4, Interesting

      Sometimes it's worse. I quit a job after 15 months, because I was constantly getting in trouble for trying to fix the cause of the problems rather than the symptoms.

      I was working on a 300,000-line Windows application, and I am not exaggerating here, it was about 5/6 redundant code. 100-line functions would be copy-and-pasted a dozen times and two lines changed in each. Plus, there were numerous executables to this project and often the same code (with minor variations of course) would exist in different executables.

      It was originally written in Borland C++ and the back-end was ported to Visual C++, but all the utility and support functions still existed in _both_ the Borland code and Microsoft code. Worse, they were not identical. Even worse, there was substantial use of STL, which doesn't work the same in Borland C++ (which is an ancient version... circa 1996) and Visual C++.

      That and the fact that using strcpy would have been a step up in maintaining buffer integrity, usually they just copied buffers and if one #define was different from a dozen others in completely different places, memory would be toast and we all know how that manifests.

      Worse, there was UI code written by someone who completely confused parent windows and base classes, such that the child window classes had all the data elements for the parent window, because they were derived from the parent window class!

      I spent an entire week once reviewing every single memcpy, etc, in the entire codebase (which was spaghetti-code in the extreme) just to eliminate all the buffer overruns I was discovering. The program used a database with about 100 tables (a nightmare of redundancy in itself) and there was a several thousand-line include file of all the field sizes, maintained by hand (with plenty of typos). Eventually, I wrote a util to generate that include file automatically, which of course wasn't appreciated.

      I was trying to overcome these difficulties while being barraged with support calls, sometimes critical because this was software to manage access control systems for buildings, meaning I spent 80% of my time fighting fires. You know, situations like: "Our system is down... you need to fix this bug because we've had to hire guards for each door until we can get it back up again."
      There was only one other person working with me, and he quit in disgust and was not replaced for about 3 months.

      Finally, after struggling mightily (in time, effort and at the expense of emotional well-being) to overcome the sheer incompetence put into this project (parts of which were 10 years old), I finally gave notice after it looked like my unscrupulous boss (who wrote a lot of this code) was doing everything he could to make it look like my fault to the client (even though they knew better and really liked working with me, precisely because I was not a BS-artist)... and after 15 years of never needing more than two weeks to find good work, I have been unemployed since May 2002.

      There's a moral here, but it escapes me.

      --
      You are in a maze of twisty little passages, all alike.
  8. Good Design = Tight Code by chewtoy-11 · · Score: 5, Insightful

    Writing repetitive code only once offers the same benefits as using Cascading Style Sheets for your webpages. If there is a serious error, you only have to track it down in the one place where it exists versus every single place you re-wrote the code. Also, it makes adding features much simpler as well. I'm an old school procedural programmer that is making the rocky transition to OOP programming. THIS is where it starts coming together...

    --
    C. Griffin
    "Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
    1. Re:Good Design = Tight Code by shortscruffydave · · Score: 2, Insightful

      I'm an old school procedural programmer that is making the rocky transition to OOP programming

      I agree whole-heartedly with what you say about code re-use. However I wouldn't see this as being a feature solely of OOP. Get the design right, and you can have some equally tight, highly-reusable procedural code.

    2. Re:Good Design = Tight Code by ArthurDent · · Score: 3, Insightful

      While OOP can be one method of solving repetitive code, good design is always the best way to solve it. What I've found is *any* time you're tempted to use the cut and paste functions within code, you need to ask yourself: Is there a common function that I can factor out to make only one opportunity for errors rather than two?

      You'll have more functions and the code might be a little harder to follow for the unfamiliar, but it will be much easier to debug if there is only one function that does a particular task.

      Ben

  9. And their other findings ... by kruetz · · Score: 5, Funny

    They also found that:

    Russian errors cause code
    Incorrect code causes errors
    Missing code causes errors
    Untested code causes errors
    Redundant code causes redundancies
    Driver code causes headaches
    C code causes buffer overflows
    Java code causes exceptions
    Perl code causes illiteracy
    Solaris code causes rashes
    Novell code causes panic attacks
    Slashdot code causes multiple reposts
    Slashdot articles cause poor-quality posts
    Microsoft code causes exploits
    Apple code causes user cults
    Uncommented code causes code rage
    RIAA code causes computers to stop functioning
    (Poor idea causes long, desperate post)

    --

    This sig intentionally left bla... dammit!
    Who's got the whiteout?
  10. lint is horrible by Anonymous+Hack · · Score: 3, Insightful

    It really is. It's a redundant holdover from ye old BSD versions. Granted, there are one or two times i've used it when -Wall -pedantic -Werror -Wfor-fuck's-sake-find-my-bug-already doesn't work, but a lot of the time it comes up with a LOT of complaints that are really unnecessary. Am i really going to have to step through tens of thousands of lines of code casting the return of every void function to (void)? Come on.


    --
    I got a sig so you would remember me.
    1. Re:lint is horrible by RhettLivingston · · Score: 4, Insightful

      I enforced a policy of eliminating all Lint info messages on a 1.5 million line, from scratch project. And, I do mean from scratch. We wrote our own operating system, ANSI C library, and drivers and ran it on hardware that we designed and produced. In the first 2 years of deployment, only five bugs were reported. Lint was only part of the reason, but it was a total part.

    2. Re:lint is horrible by Anonymous+Hack · · Score: 3, Insightful

      Lint? Lint complains if you call a function that returns a value and you ignore it. This is particularly irritating in the case of strcpy() and similar functions where you would normally do:

      strcpy(buf, "hello");

      except you're supposed to do:

      (void) strcpy(buf, "hello");

      or...

      buf = strcpy(buf, "hello");

      And that's just the beginning...

      --
      I got a sig so you would remember me.
    3. Re:lint is horrible by Anonymous+Hack · · Score: 2, Insightful

      Yeah, you're right :-) But it's a bitch, ya know? Explicitly casting 90% of API functions to void, or checking for errors on printf() which 99.999999% of the time works, unless you're running on some weird embedded platform that doesn't have stdout. In my current project checking API returns would be nasty... It may only be a few processor cycles, but having the compiled code do lots of JNZ instructions that most of the time will never be true could mean the difference between waiting 20 seconds and 30 seconds for the program to complete its task... and i think i'd rather have the performance.

      --
      I got a sig so you would remember me.
    4. Re:lint is horrible by Ed+Avis · · Score: 2, Informative

      Check out Splint (formerly LCLint). Whereas traditional lint is less needed now that compilers have -W switches, splint has a whole bunch of extra stuff which gcc won't warn about. The only trouble with it is that by default, it wants you to add annotations to your program to help it be checked (for example if a function parameter is a pointer, you can annotate whether it is allowed to be null). If you go along with that then splint can give lots of help in finding places where null pointer dereferences could happen, and other bugs. But even if you don't want to annotate and you use the less strict checking it's still a handy tool. (OK, maybe C99 has some of this stuff too, but splint has more.)

      --
      -- Ed Avis ed@membled.com
    5. Re:lint is horrible by pmz · · Score: 3, Insightful

      lint is horrible

      No, it isn't. I took a legacy application that I began maintaining and used lint to eliminate hundreds of lines of code and several real never-before-detected bugs. It also encouraged me to remove dozens of implicit declarations and redundant "extern" statements in favor of real header files. The application really is better for it, and to do this work without lint would have been very very tedious. Granted, my experience is with Sun's compiler's lint, so I can't say whether other implementations are as good.

      ...it comes up with a LOT of complaints that are really unnecessary.

      Actually, all of lint's complaints are about a potential problem. You just have to decide what is worth the time to fix.

      Using lint is a deliberate process that should take several days or weeks for a large application (on the first time through). After that initial investment, using lint is still an important part of the ongoing health of the program, but it should become less and less of an effort each time.

  11. Most prevalent source of redundant code.... by 1nv4d3r · · Score: 2, Funny

    Three letters: NIH.

    Now, if you'll excuse me, I've got to get back to my text editor project.

  12. Finding errors in your code by Amsterdam+Vallon · · Score: 3, Funny

    Isn't this the job of that smart dude down the hall who runs Lunix computers and reads some Slash Period website or something?

    Well, at least that's how I finish all my projects.

    --

    Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
  13. A good editor... by yintercept · · Score: 3, Funny

    A good editor could easily cut that article in half without loss of any information.

  14. Intentional redundant code by 1nv4d3r · · Score: 5, Funny
    I've seen this (they were fired the next month):

    // and now to boost my LOC/Day performance...
    x += 0;
    x += 0;
    x += 0;
    x += 0;
    x += 0;
    x += 0;

    It actually caused a bug 'cuz they accidentally left the '+' off one of the lines. What an idiot.

    1. Re:Intentional redundant code by gUmbi · · Score: 4, Insightful

      I've seen this (they were fired the next month): // and now to boost my LOC/Day performance...


      Was the manager asking for lines of code/day fired too?

    2. Re:Intentional redundant code by Anonymous+Hack · · Score: 4, Insightful

      Ugh, counting LOC sucks. I find my most productive days are the ones i REMOVE lines of code, not add them. If i get a loop running tighter and faster, if i remove stuff that i could do better another way... that's what i'm paid to do.

      --
      I got a sig so you would remember me.
    3. Re:Intentional redundant code by 1nv4d3r · · Score: 2

      I think he just wanted to look like he was busy. We weren't judged by LOC/day, but we did take regular metrics. I guess he thought if the manager saw 400 new lines that week, the assumption would be that he was doing more work than he was.

  15. Re:I have no idea what this article means ! by The+Bungi · · Score: 4, Insightful
    In any large system there's bound to be some amount of redundant code that can sometimes cause subtle errors, like slow memory leaks. These conditions develop over the lifetime of the code. Analyzing the code *as a whole* provides information about these types of situations and how to fix them, which more often than not is not trivial.

    "Dead" or unreachable code is almost always caused by patches or fixes to an existing codebase and it's always good to detect and get rid of it because it may point to other problems in the application (in my experience), or is simply dead wood that should be removed.

  16. Redundancy? by kubrick · · Score: 2, Funny

    Ummm... surely if any story should be duped in the near future, it's this one. Please submit story suggestions accordingly.

    --
    deus does not exist but if he does
  17. Saw his talk at FSE by owenomalley · · Score: 5, Interesting

    I saw Dawson's talk at FSE (Foundations of Software Engineering). He uses static flow analysis to find problems in the code (like an advanced form of pclint). The most interesting part of his tool is in the ranking of the problem reports. He has developed a couple of heuristics that sort the problems by order of importance and they supposedly do a very good job. Static analysis tools find most of their problems in rarely run code, such as error handlers. Such bugs are troublesome and sometimes lead to non-deterministic failures, which are extremely hard to find with standard testing and debugging. (This is especially true when the program under consideration is a kernel.) Dawson also verifies configurations of the kernel that no one would compile, because he tries to enable as many drivers as possible at the same time. The more code, the better the consistency checks do at finding problems.

    By making assumptions about the program and checking the consistency of the program, his tool finds lots of problems. For instance, assume there is a function named foo that takes a pointer argument. His tool will notice how many of the callers of foo treat the parameter as freed versus how many treat the parameter as unfreed. The bigger the ratio, the more likely the 'bad' callers are to represent a bug. It doesn't really matter which view is correct. If the programmer is treating the parameter inconsistently, it is very likely a bug.
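
    To illustrate the idea (and only the idea), here is a rough sketch of ranking by caller agreement. This is not the tool's actual heuristic, which isn't available; the struct, the field names and the simple majority ratio are all assumptions made up for this example.

```c
#include <stddef.h>

/* Hypothetical tally for one function: how many call sites treat the
   pointer argument as freed after the call, and how many keep using it. */
struct belief_count {
    const char *function;  /* name of the callee, e.g. "foo" */
    int treat_as_freed;    /* callers that never touch the pointer again */
    int treat_as_live;     /* callers that go on using the pointer */
};

/* Fraction of call sites that agree with the majority view.
   Returns 0.0 when there are no call sites at all. */
double majority_ratio(const struct belief_count *b)
{
    int total = b->treat_as_freed + b->treat_as_live;
    int major = b->treat_as_freed > b->treat_as_live
                    ? b->treat_as_freed
                    : b->treat_as_live;
    return total == 0 ? 0.0 : (double)major / total;
}

/* Index of the function whose dissenting callers deserve a look first:
   the one with the most lopsided split that still has at least one
   dissenter. Returns -1 if every function is treated consistently. */
int most_suspicious(const struct belief_count *counts, size_t n)
{
    int best = -1;
    double best_ratio = 0.0;
    for (size_t i = 0; i < n; i++) {
        int total = counts[i].treat_as_freed + counts[i].treat_as_live;
        double ratio = majority_ratio(&counts[i]);
        int minority = total - (int)(ratio * total + 0.5);
        if (minority > 0 && ratio > best_ratio) {
            best = (int)i;
            best_ratio = ratio;
        }
    }
    return best;
}
```

    With 19 callers of foo treating the pointer as freed and 1 still using it, foo ranks ahead of a function split 5 to 5: the lone dissenter is far more likely to be the bug than either half of an even split.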

    He also mentioned that counter to his expectations, the most useful part of his tool was to find 'local' bugs. By local, I mean bugs that are local to a single procedure. They are easier for the tool to find, more likely to actually be bugs, and much easier for the programmer to verify as bugs.

    He analyzed a couple of the 2.2.x and 2.4.x versions of the kernel and found hundreds of bugs. Some of them were fixed promptly. Others were fixed slowly. Some were fixed by removing the code (almost always a device driver) from the kernel. Others he couldn't find anyone that cared about the bug enough to fix it. He was surprised at the amount of abandonware in the Linux kernel.
    It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large). Without letting other people run his tool (or even better modify it), his research ultimately does little good other than finding bugs in linux device drivers. *heavy sigh* Oh well, eventually someone WILL reimplement this stuff and release it to the world.

    On a snide note, if he was a company he would no doubt have been bought by Microsoft already. Intrinsa was doing some interesting stuff with static analysis and now after they were bought a couple of years ago, their tool is only available inside of Microsoft. *sigh*

    1. Re:Saw his talk at FSE by RodgerDodger · · Score: 2, Interesting

      It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large).

      There's probably a bunch of reasons why he hasn't done this. The most likely one is that he's using it as a research tool, and he doesn't want someone else to beat him to the punch in his research. A second is that it's probably not really in a fit state for sharing as yet (the tool is not the goal of the research, after all).

      He's got a bunch of papers up describing how the tool works, so it can be reimplemented. Also, if he's like most academics, he'll probably talk your ear off if you ask him how it works. :)

      --
      "Software is too expensive to build cheaply"
    2. Re:Saw his talk at FSE by CH-BuG · · Score: 2, Insightful

      A second is that it's probably not really in a fit state for sharing as yet (the tool is not the goal of the research, after all).

      This is not a valid reason, I know a lot of people (including me) that would be happy to improve his research code into something useful for the community at large. I remember a paper describing a gcc extension to write semantic checks (for instance, reenable interrupts after disabling them). This program found an amazing number of bugs in the linux kernel. I really wish I could have something like that at hand!

  18. Re:I have no idea what this article means ! by RollingThunder · · Score: 2, Insightful

    And moreover, in the areas where you find these mechanically detectable bugs, the likelihood of more subtle bugs is higher, as these errors are made when the programmer isn't really thinking about what he's doing. It's like triaging your wounded code, to fix the worst first.

  19. Re:I have no idea what this article means ! by Anonymous Coward · · Score: 3, Informative

    Well let me summarize it for you:

    That paper explored the hypothesis that redundancies, like type errors, flag higher-level correctness mistakes. They evaluated the approach using four checkers which they applied to the Linux operating system. These simple analyses found many surprising (to them) error types. Further, those errors correlated well with known hard errors: redundancies seemed to flag confused or poor programmers who were prone to other error types. According to them, these indicators could be used to decide where to audit a system.

  20. If unnecessary code is "redundant" by RhettLivingston · · Score: 2, Interesting

    I suppose they've hit my pet peeve. I've seen many simple problems turned into hideous monstrosities with many bugs by people trying to handle bugs that can't ever happen and imaginary special cases because they were never taught how to abstract a function. Perhaps it can't be taught. In 20+ years of programming, it's been a very rare time when I've picked up code and not been able to cut out large chunks without replacing them.

  21. Re:Is this not Obvious by gmack · · Score: 5, Informative

    If only the story poster had actually read the paper.

    They used a custom checker that finds these things much more effectively than lint.

    I actually remember the flood of bug reports and kernel patches that toy of theirs generated the first few months they put it to use on the kernel.

  22. "redundancies ... correlate with ... errors" by voodoo1man · · Score: 2, Interesting
    At the risk of being modded redundant (hah!) I have to point out that a correlation between sloppy coding and errors does, in fact, exist. Many of us who write software have suspected this for a long time, and it is good to know that our hypothesis is supported by concrete research from the academic community, who seem to have finally proven that "redundancies seemed to flag confused or poor programmers who were prone to other error types."

    Hopefully, we can expect much more of such valuable breakthroughs from the academic community in the future, complete with papers full of badly formatted C code!

    --

    In the great CONS chain of life, you can either be the CAR or be in the CDR.

  23. Re:Using redundant code to find errors by nebbian · · Score: 2, Funny

    I was going to moderate this but there's no +1 Redundant.

  24. vi! by gregstoll · · Score: 4, Funny

    vi could do it!

  25. lclint pointer incorrect... by Samrobb · · Score: 2, Informative

    For the record, it's been moved...

    Larch FTP Site
    January 28, 1999

    Many files formerly on this site were moved elsewhere after a disk
    crash in March, 1998.

    The LCLint distribution can be found at
    ftp://ftp.sds.lcs.mit.edu/pub/lclint
    or http://www.sds.lcs.mit.edu/lclint

    --
    "Great men are not always wise: neither do the aged understand judgement." Job 32:9
  26. Linux for research by Champaign · · Score: 2, Informative

    Using Linux for academic research is hardly a new idea. In my group alone one of the profs has been publishing papers and giving talks about research using Linux since 2000.

    An example of such is http://plg.uwaterloo.ca/~migod/papers/evolution.pdf - about the evolution of Linux

  27. heed all warnings by fermion · · Score: 3, Insightful
    There are two practical upshots of this that I use in my own code. First, it is best to treat all warnings as bugs. Warnings are an indication that the compiler or programmer can get confused with the code. Neither is a good situation. Code that generates warnings should be rewritten in a more understandable manner. Some would say this stifles creativity. This may be true, but we can't all be James Joyce, and, as much as we may like to read his work, few of us would enjoy wading through such creative code.

    Second, use standard idioms. For some, that may mean learning the standard idioms. These should become second nature. Programmers should express their creativity in the logic, structure, and simplicity of the code, not the non standard grammar. Standard forms allow more accurate coding and easier maintenance.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
  28. Re:Analysis "by file" vs "by function"? by smallfries · · Score: 2, Insightful

    How does this get marked up as +3 insightful? Did you read the paper?

    They are using analysis techniques to locate bugs at specific points within the parse tree. Hence, they are locating bugs within specific functions rather than just files, as all of their examples showed. Sure, it's a nice point. But it is what they are doing.

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  29. Smatch by Error27 · · Score: 3, Interesting

    The Stanford Checker is great. I was blown away when I read their papers last year. Their checker is not released yet, so I wrote a similar checker (smatch.sf.net) based on their publications.

    The poster mentions Lint, but I did not have any success using Lint on the kernel sources. The source is too unusual.

    Also Lint does not look for application specific bugs. For example, in the Linux kernel you are not supposed to call copy_to_user() with spinlocks held. It took me about 10 minutes to modify one of my existing scripts to check for that in the Linux kernel last Monday. It found 16 errors. (It should have found more but I was being lazy.)
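
    To give a feel for what such an application-specific check involves, here is a toy sketch in C. It is not how Smatch or the Stanford checker work; it assumes the calls made along one straight-line code path have already been extracted into an array of names, and count_copy_under_lock() is a name made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy check: given the function calls made along ONE straight-line path,
 * in order, count the copy_to_user() calls that happen while at least one
 * spinlock is still held. Branches, loops and lock variants such as
 * spin_lock_irqsave() are deliberately ignored in this sketch. */
static int count_copy_under_lock(const char *const calls[], size_t n)
{
    int depth = 0;   /* number of spinlocks currently held */
    int errors = 0;  /* copy_to_user() calls seen with depth > 0 */

    assert(calls != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(calls[i], "spin_lock") == 0)
            depth++;
        else if (strcmp(calls[i], "spin_unlock") == 0 && depth > 0)
            depth--;
        else if (strcmp(calls[i], "copy_to_user") == 0 && depth > 0)
            errors++;
    }
    return errors;
}
```

    For the path spin_lock, copy_to_user, spin_unlock, copy_to_user this reports one error: only the first copy happens with the lock held. A real checker has to follow every path through the function, which is where the hard part is.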

    A lot of the time, you can't tell what will be a common error until you try looking for it. One funny thing was that there were many places where people had code after a return statement. On the other hand, I didn't find even one place where '=' and '==' were backwards.

    It's fascinating playing around with this stuff. I have been learning a lot about the kernel through playing around with Smatch.

    1. Re:Smatch by JohnFluxx · · Score: 3, Interesting

      Did you get the bugs that you found fixed?
      It wasn't clear if you submitted it.

      Btw, I'm mulling over the idea of writing a write-time checker that would do a lot of this sort of stuff but as you are coding. That way your favourite editor can underline errors (like a spell checker does).
      One of the things I was most interested in doing was number-ranges. Basically if you have a for-loop that loops x between 0 and say 10, then inside that for-loop you know x=[0-10]. Then you can check if you access an array outside of those bounds.
      Do you have any idea how useful this would be? Or any ideas if it has been done, or anything?
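
      The number-range idea can be sketched in a few lines of C. This is only an illustration under simple assumptions (a counted loop with a fixed start and end, and an array of known length); struct range, loop_range() and index_in_bounds() are hypothetical names, not part of any existing checker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Closed interval [lo, hi] of values a variable may hold. */
struct range {
    long lo;
    long hi;
};

/* Range of x inside the body of: for (x = start; x < end; x++).
 * Assumes start < end, i.e. the body runs at least once. */
static struct range loop_range(long start, long end)
{
    assert(start < end);
    struct range r = { start, end - 1 };
    return r;
}

/* True if every value in r is a valid index into an array of len elements. */
static bool index_in_bounds(struct range r, size_t len)
{
    return r.lo >= 0 && r.hi >= r.lo && (unsigned long)r.hi < len;
}
```

      With int a[10] and a loop written as for (x = 0; x <= 10; x++), x ranges over [0, 10], which is loop_range(0, 11) here, and index_in_bounds() returns false for a length of 10. A real tool would also have to narrow or widen the range at every branch and assignment inside the loop.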

      It is an area that really interests me, but that I have no knowledge about :)

      JohnFlux

  30. Comparison to Redundant and Unused DNA Code? by mrflip · · Score: 3, Interesting
    This made me immediately think of the 'redundant/unused code' conundrum in biology (Sorting through DNA). Surprisingly little of the DNA string seems to be 'active' code, where active means 'codes for a gene.' Most of it either has no use or has uses that are not clear to us now. One pedestrian use for this 'dead' code is simply to separate the genes logically and spatially; this reduces the probability of simultaneous defects in multiple genes.

    DNA code also has high redundancy, which allows error-correcting transcription and other hacks (see Parity Code And DNA or DNA's Error Detecting Code).

    In both cases factors yielding robust DNA code are found to indicate bad digital computer code.

    flip

    (background: Ars Technica's Computational DNA primer)

  31. Where there's smoke, there's fire by rufusdufus · · Score: 3, Insightful

    So many people have made silly comments about this being obvious, useless or whatever. This is probably because they did not actually READ the paper.

    The paper is not about obvious code redundancy bugs, it is about subtle errors which are not as simple as just duplicate code. It is about code that *appears* to be executed but actually is not.

    Go take a look at the examples and see how long it takes you to notice the different errors... now imagine having a thousand pages of code to peruse. Would you catch them? Many of them, probably not.

    The conclusion of the paper is basically that errors cluster around errors: finding trivial, suboptimal syntactic constructions tends to point to real bugs.

    Where there's smoke, there's fire.

  32. Even the smallest program... by OttoM · · Score: 2, Funny
    This is widely known:

    1. Every program contains bugs.
    2. Every program contains redundancies and so can be made smaller without changing behavior.
    Therefore the empty program is redundant but still buggy.
  33. Related research at Berkeley: CQual by sailesh · · Score: 2, Informative

    CQual

    It's been used to find security holes.

  34. Parallel programming 101 by Skuto · · Score: 3, Interesting

    They made fools out of themselves with this one:

    if (!cam || !cam->ops)
        return -ENODEV; /* make this _really_ smp-safe */

    if (down_interruptible(&cam->busy_lock))
        return -EINTR;

    if (!cam || !cam->ops)
        return -ENODEV;

    Their comment: 'We believe this could be indication of a novice programmer...blabla...shows poor grasp of the code'.

    BZZZZZZZZZT

    Nice try kids, but unlike you, this piece of code was probably written by an experienced guy that has actually written code for parallel systems before. Since it's tricky, you would be excused if not for the 'novice programmer' comment above and the fact that the code itself says it's there for SMP safety.

    Here's a hint: UNTIL you acquire the lock on 'cam', any other process can change the value, including at the point BETWEEN the first check and the acquisition of the lock.

    --
    GCP

    1. Re:Parallel programming 101 by idletask · · Score: 2, Informative

      Gee, reread yourself... Sorry but they're 100% right. If they could acquire the lock, cam and cam->ops cannot be NULL due to the first check.

      BTW, in 2.4.18, the code now looks like this for this same function:

      struct cam_data *cam = dev->priv;
      int retval;

      if (!cam || !cam->ops)
          return -ENODEV;

      DBG("cpia_mmap: %ld\n", size);

      if (size > FRAME_NUM*CPIA_MAX_FRAME_SIZE)
          return -EINVAL;

      /* REDUNDANT! */
      if (!cam || !cam->ops)
          return -ENODEV;

      /* make this _really_ smp-safe */
      if (down_interruptible(&cam->busy_lock))
          return -EINTR;

    2. Re:Parallel programming 101 by Salamander · · Score: 2, Insightful
      unlike you, this piece of code was probably written by an experienced guy that has actually written code for parallel systems before.

      I suggest you check out Dawson Engler's resume; he has almost certainly done 10x more parallel-systems development than you have. This particular code example might be a bad one, because the analysis that supports the author's conclusion is omitted from the article, but the basic point is still valid: code that contains duplicate condition checks like those in the example is more likely to contain real bugs than less duplicative code, and the "low-hanging fruit" can be identified automatically. It's not hard at all to see how deeper analysis, different rules, or annotation could do a better job of weeding out false duplicates without compromising the tool's ability to flag legitimate areas of concern.

      You're arguing about low-level implementation, when the author was trying to make a point about a high-level principle. That's the hallmark of an insecure junior programmer.

      --
      Slashdot - News for Herds. Stuff that Splatters.
    3. Re:Parallel programming 101 by Skuto · · Score: 2, Informative

      >Uhm, if cam was NULL after the first check,
      >wouldn't cam->busy_lock cause a segfault

      I don't feel bad about being flamed by people that missed my point, but I do feel bad about this. You are 100% right - the way I read the code can't possibly be correct.

      --
      GCP

  35. Re:I Hope They Didn't Get Paid by Temporal · · Score: 5, Insightful
    Read the paper, will you? Of course good programmers don't write redundant code... at least, not intentionally. So, when you do see redundant code in a program, it is more likely a typo, and if it's a typo, it is likely a bug. So, if you have a program which detects redundant code, it will likely find bugs for you. They wrote such a program. It found hundreds of bugs in the Linux kernel.

    Here's an example they cite from the Linux kernel:
    /* 2.4.1/net/appletalk/aarp.c:aarp_rcv */
    else { /* We need to make a copy of the entry. */
        da.s_node = sa.s_node;
        da.s_net = da.s_net;
    That last line assigns a variable to itself. Do you think that's what the programmer intended? Of course not. It's a bug. But no one caught it. If not for their program, maybe it would never have been caught.
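
    To show how mechanical this kind of check can be, here is a deliberately naive sketch in C. It is a purely textual check on a single line of source, which is an assumption made to keep it short; the authors' checker works inside the compiler rather than on raw text, and is_self_assignment() is a hypothetical helper written for this post.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Skip leading whitespace. */
static const char *skip_ws(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* True if line looks like "X = X;" with identical text on both sides.
 * Toy check: no parsing, no macros, no awareness of side effects. */
static bool is_self_assignment(const char *line)
{
    assert(line != NULL);

    const char *lhs = skip_ws(line);
    const char *eq = strchr(lhs, '=');
    if (eq == NULL || eq == lhs)
        return false;

    /* Reject ==, !=, <=, >= and compound assignments such as +=. */
    if (eq[1] == '=' || strchr("=!<>+-*/%&|^", eq[-1]) != NULL)
        return false;

    /* Trim trailing whitespace from the left-hand side. */
    const char *lhs_end = eq;
    while (lhs_end > lhs && isspace((unsigned char)lhs_end[-1]))
        lhs_end--;

    /* The right-hand side runs up to the first semicolon. */
    const char *rhs = skip_ws(eq + 1);
    const char *rhs_end = strchr(rhs, ';');
    if (rhs_end == NULL)
        return false;
    while (rhs_end > rhs && isspace((unsigned char)rhs_end[-1]))
        rhs_end--;

    size_t lhs_len = (size_t)(lhs_end - lhs);
    size_t rhs_len = (size_t)(rhs_end - rhs);
    return lhs_len > 0 && lhs_len == rhs_len && memcmp(lhs, rhs, lhs_len) == 0;
}
```

    Fed the two assignments above, it flags "da.s_net = da.s_net;" and passes "da.s_node = sa.s_node;". It would also flag deliberate self-assignments, so anything it reports still needs a human to look at it.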

    You think this research is useless? Do you always write bug-free code? Maybe you should run this program on your own code and see what happens.
  36. Dead code by xyote · · Score: 2, Insightful
    Dead code is most likely found in really old code that has been modified many many times. Does really old code that has been modified many many times have lots of bugs? Quite likely.


    Is this dead code going to get removed?


    No.


    Why not?


    Because, one, it's only an opinion that it's dead code. There could be some obscure case that no one imagined that could use it. Two, if some programmer removed it and it turned out that it was needed or the programmer screwed up the removal, the programmer would be blamed and take a lot of grief for it. If it ain't broke, don't fix it.


    Now, it could be that the dead code doesn't work properly for the obscure case. But how could you tell? Do you want to write a test case for code that no one can figure out how it gets invoked?

    1. Re:Dead code by Squeak · · Score: 2, Interesting

      On a project I worked on recently one file contained a comment reading something like:

      // The below code is unnecessary but the program
      // does not work if it is removed.
      // With the code commented out everything is fine.

      The group of people involved in that area of code were also masters of redundancy and inefficiency. Their code could often be rewritten and shortened to 20% of its original length. Not BY 20%, TO 20%.

      --
      This sig is a figment of your imagination.
    2. Re:Dead code by arkanes · · Score: 2, Insightful

      Well, the dead code detected by the program in the article is not an opinion - it's something that is provably unreachable. If you've got well documented functions, it should still be possible to either generate a test case or prove that certain code branches are unreachable. In fact, if you were working on REALLY mission critical stuff, one of the audits is to trace each and every possible branch of execution. Yes, thats an enormous amount of work. However, it's also (largely) possible to automate it.

  37. Confirms that 80/20 works for defects, too... by bryane · · Score: 2, Insightful
    This isn't really new information - there have been studies that show bugs cluster, as well as the intuition most programmers have that "this part is really bad"

    If you look at a CVS repository and identify those files that have high revision numbers, there's a good chance they are full of errors and need to be rewritten.

    One visualization is to color code according to its age - old code blue, and new code red - then look at the results. You will often see that the red code clusters, and there are huge regions of blue that have been stable for years. You will also see relatively small clusters of differing shades of red, as people need to keep banging on the same problematic code.

  38. And for the java programmers ... by Paul+Lamere · · Score: 2, Interesting

    Try this: pmd

  39. Removing dead code... by wowbagger · · Score: 4, Insightful
    I have a saying:
    If a line of code doesn't exist, then it cannot contain a bug.


    Like most aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.

    So I ruthlessly remove dead code. Often, I'll see big blocks like this:


    #ifdef old_way_that_doesnt_work_well
    blah;
    blah;
    blah;
    #endif


    And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!

    By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.

    I will often see the following in C++ code:


    void foo_bar(int unused1, int unused2)
    {
        unused1 = unused1; // silence compiler warning
        unused2 = unused2; // silence compiler warning
    }


    And I will recode it thus:

    void foo_bar(int, int)
    {

    }


    That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)
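
    For plain C, where the parameter names have to stay, the usual idiom is a cast to void instead of a self-assignment. A minimal sketch, assuming a callback whose signature is fixed by some interface; handler_fn, ignore_event() and dispatch() are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Signature imposed by a (hypothetical) callback interface. */
typedef int (*handler_fn)(int code, int flags);

/* This handler needs neither argument. Casting each to void marks it as
 * deliberately unused without writing a redundant "x = x" assignment. */
static int ignore_event(int code, int flags)
{
    (void)code;
    (void)flags;
    return 0;
}

/* Calls whichever handler was registered. */
static int dispatch(handler_fn fn, int code, int flags)
{
    assert(fn != NULL);
    return fn(code, flags);
}
```

    A call such as dispatch(ignore_event, 7, 1) returns 0. The cast is an expression with no effect, so a redundancy checker that looks for self-assignments has nothing to trip over.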

    Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT mean removing error checking, #assert's, good debugging code, or exception handlers).

  40. Re:I have no idea what this article means ! by fitten · · Score: 3, Interesting

    Yes, and to the point, critical systems software tests use code coverage as one of the measures of being 'fit' to use. Dead code cannot be executed and thus falls into the 'untouched code' category. If the software package has much of this untouched/untouchable code, it can't be used in a critical system. For example (going from memory here), Motif couldn't be certified for the digital graphical primary instrument display on the Boeing 777 because of this (I think that even after exhaustive testing, it only hit the 66% coverage mark, IIRC). It could therefore only be used for an alternate display and the primary displays had to be implemented using something else.

    The problem is that if the code can't be tested, it cannot be trusted and if *somehow* that code got executed while operational, the results could be "bad".

  41. You can find duplicated Java code... by tcopeland · · Score: 2, Informative

    ...using this thing here:

    http://pmd.sourceforge.net/cpd.html

    CPD uses a variant on Greedy String Tiling to find duplicated code in Java programs. There's also a JavaSpaces version since finding duplicated code is fairly parallelizable....

    Yours,

    Tom

  42. What papers have you published? by pclminion · · Score: 3, Interesting
    You're one of those guys who thinks that anyone who doesn't grasp precisely all the different technical fields you do, must be a fool.

    These researchers obviously have a good hold on compiler technology, since they implemented their checkers with xgcc. They also seem to understand logic quite well, since their code uses and extends on gcc's control-flow analysis algorithms. And they do, actually, understand what's going on here.

    As for your particular example, the check really is redundant, but it was almost definitely intentional. It's true that another processor could change the cam variable between the first check and the lock -- but taking the first check out would have no impact on the functionality or correctness of the code. It's just a performance enhancement so that the routine can exit early in the error case, without the overhead of locking the lock. Removing the bit of redundant code would just add a little overhead to the error case.

    In short, their checker found a true redundancy. They may have not realized its purpose since they don't have specific experience with this kind of parallel programming, but it's a redundancy. If you had actually read the paper instead of merely glancing over it, you would have seen that their checker respects the volatile nature of variables declared as such -- the checker is fully aware that a second thread can change the value between one operation and the other -- and it still figures out that the check is redundant.
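
    Here is a minimal user-space sketch of the pattern being argued about, with a pthread mutex standing in for the kernel's down_interruptible(). It is not the cpia driver code: this cam_data is a cut-down hypothetical struct, cam_use() is a made-up name, and whether ops can really be cleared by another thread in the real driver is an assumption made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct cam_ops {
    int unused;  /* placeholder member for this sketch */
};

/* Cut-down, hypothetical stand-in for the driver's per-device data. */
struct cam_data {
    pthread_mutex_t busy_lock;
    const struct cam_ops *ops;  /* assumed: may be cleared under busy_lock */
};

static int cam_use(struct cam_data *cam)
{
    int err;

    /* Cheap early exit taken without the lock; advisory only. */
    if (!cam || !cam->ops)
        return -ENODEV;

    err = pthread_mutex_lock(&cam->busy_lock);
    if (err != 0)
        return -err;

    /* cam is a local pointer: no other thread can have changed it, and it
     * was just dereferenced, so re-testing it here would be redundant. */
    assert(cam != NULL);

    /* Only the shared field is worth re-reading once the lock is held. */
    if (!cam->ops) {
        pthread_mutex_unlock(&cam->busy_lock);
        return -ENODEV;
    }

    /* ... use cam->ops here ... */

    pthread_mutex_unlock(&cam->busy_lock);
    return 0;
}
```

    The first test lets the error case return without touching the lock. After the lock is taken, the only check that can still tell you anything is the one on the shared field; repeating the test on the local pointer adds nothing.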

    Here's a hint: don't go around claiming people are fools unless you've got some evidence. These guys had hundreds and hundreds of bugs to go through, and expecting them to perfectly analyze every last one of them is unfair.

    Oh, and -10 points for using "BZZZZZZT".