Slashdot Mirror


Coverity Report Finds OSS Bug Density Down Since 2006

eldavojohn writes "In 2008, static analysis company Coverity analyzed security issues in open source applications. Their recent study of 11.5 billion lines of open source code reveals that static analysis defect density in open source fell between 2006 and 2009. The numbers say that open source defects have dropped from one in 3,333 lines of code to one in 4,000 lines of code. If you enter some basic information, you can get the complimentary report that has more analysis and puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby. While Coverity has developed automated error checking for Linux, their static analysis seems to be indifferent toward open source."
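For readers who think in defects per thousand lines (KLOC), the summary's "one in 3,333" and "one in 4,000" convert as shown below. This is a minimal sketch of the arithmetic only: the helper names are made up for illustration, and nothing here reproduces Coverity's own methodology.

```python
def defects_per_kloc(lines_per_defect: float) -> float:
    """Convert a 'one defect in N lines' figure into defects per 1,000 lines."""
    return 1000.0 / lines_per_defect


def percent_drop(old_lines_per_defect: float, new_lines_per_defect: float) -> float:
    """Relative fall in defect density, in percent, between two 'one in N' figures."""
    old = defects_per_kloc(old_lines_per_defect)
    new = defects_per_kloc(new_lines_per_defect)
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    print(f"2006: {defects_per_kloc(3333):.3f} defects/KLOC")  # 0.300
    print(f"2009: {defects_per_kloc(4000):.3f} defects/KLOC")  # 0.250
    print(f"drop: {percent_drop(3333, 4000):.1f}%")            # 16.7
```

In other words, the headline improvement amounts to roughly one sixth fewer flagged defects per line of code.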

15 of 79 comments

  1. Three? by Dhar · · Score: 5, Funny

    "...puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby."

    Counting, apparently, was low in quality.

    1. Re:Three? by akadruid · · Score: 5, Funny

      and then you get so-called slashdotphiles, who think they can hear artifacts in the lossy story compression.

      let's see how you fare in a double blind test

      --
      "Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
    2. Re:Three? by eldavojohn · · Score: 5, Insightful

      TFA says four.

      So, not only are the /. summaries merely paragraphs copied from the article nowadays, they're paragraphs copied incorrectly.

      So if my summary was "merely paragraphs copied from the article", then where did I get the 1 in 3,333 and 1 in 4,000 numbers from?

      Also, if all I did was copy and paste the article, I'd be plagiarizing, and, not only that, I would have pasted the correct count of projects at Rung 3. Instead, I skimmed the report and was thinking "Rung 3" when I wrote that sentence, so a three went in instead of a four. That doesn't make me any less wrong, but I hate anonymous, non-constructive criticism that gets modded up. I apologize for my human error; obviously the human editor also missed it. Since you're anonymous, I can't assume you're human and beg you to relate to my plight of errors. I'm sure my error made the summary completely unreadable. I'm also certain that you've published hundreds of articles on Slashdot without so much as a single error in any of them.

      You do know that, of my recent submissions, almost all have had some flaw or error in them? That's simply because I realize there's no reward for fact-checking, and there's no penalty for getting an error published. So, assuming the summary attracts eyeballs and there's no error large enough to get it rejected, the next thing that matters is timing. I've written submissions that were beaten by a few minutes, and they got marked "dupe" in the Firehose. That pushes me from taking 10-15 minutes to write a summary down to 2-3 minutes. Oh well; the worst penalty is that if I respond to the article (like this), I get modded down by righteous moderators. That doesn't really bother me.

      If the editors aren't catching the errors and I've got no incentive to reduce the errors, do you think they're going to go away?

      --
      My work here is dung.
    3. Re:Three? by evanbd · · Score: 2, Insightful

      We're bitching about the slashdot editors, not you. It's their job to catch submitter mistakes. That is what an editor does. The really annoying thing is they're as likely to "edit" the summary to introduce mistakes as to remove them.

    4. Re:Three? by Ihmhi · · Score: 2, Funny

      I have gold-plated Ethernet cables, so my Internets sound nice and crisp. You can really hear the richness in the lower kbps range.

  2. Oblig reference by StuartHankins · · Score: 4, Funny

    "... and puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby."

    Our chief weapon is surprise...surprise and fear...fear and surprise....
    Our two weapons are fear and surprise... and ruthless efficiency....
    Our three weapons are fear, surprise, and ruthless efficiency...
    and an almost fanatical devotion to the Pope....
    Our four... no...
    Amongst our weapons... Amongst our weaponry...
    are such elements as fear, surprise...
    I'll come in again.

  3. Wonder when MS, IBM and others will publish? by MosesJones · · Score: 4, Interesting

    The obvious question, "Is one defect in 4,000 lines good, average or bad?", can't be answered, because closed-source companies just aren't going to publish this sort of information.

    So what we can say is that the quality of OSS is trending upwards, but we can't say whether this makes it better than, equivalent to, or worse than its closed-source competitors.

    What are the odds on any of them taking up the challenge?

    --
    An Eye for an Eye will make the whole world blind - Gandhi
    1. Re:Wonder when MS, IBM and others will publish? by Anonymous Coward · · Score: 2, Informative

      Actually, this topic has been researched, and the blog below quotes a book that puts Microsoft at one defect per 2,000 lines of code.
      http://amartester.blogspot.com/2007/04/bugs-per-lines-of-code.html

      Of course, these studies try to estimate the number of defects that have not been found yet, so the numbers should be taken with a grain of salt; but apparently testing the software before delivery catches 90% of the bugs.

      The Coverity report is likely based on what the tool says, so you need a grain of salt for that too.

      The trend is probably what matters most. This stuff is really about improving your code, finding what's wrong, checking that you are making progress and trying hard enough.

    2. Re:Wonder when MS, IBM and others will publish? by jc42 · · Score: 2, Interesting

      There can be some serious "methodology" problems in many definitions of "bug", and they can seriously confuse the bug counters.

      An example that I like to use is a project I worked on in the late 1990s. An important part of the package that I delivered included a directory of several hundred C source files, mostly small, with at least one bug in each. The project's leaders got some chuckles out of mentioning this at meetings, commenting that they had no intention of letting me fix any of the bugs, since they were an important contribution to the project. This produced much confusion among the higher ups, who took some time understanding what was going on and how to account for it.

      Some readers might have guessed what my task was: Building a regression-testing suite for the C compiler. The directory in question was for testing the diagnostics in the compiler. Each source file had one or more carefully designed "bugs". The makefile ran the C compiler on each, and sent the stderr output to a validator that verified that the compiler had successfully identified the bug and produced the right error message.

      We had a bit of fun confusing people by asking them whether these test files really contained "bugs" or not. According to the C standard, they certainly did. But according to the test procedure, these weren't bugs; they were tools for testing the compiler. If they were "fixed", the test scripts would no longer be able to validate the compiler's error messages.

      The higher-ups did finally understand the value of this, and agreed that although this batch of files was full of "bugs", they shouldn't be counted as such in the bug reports.

      I also sometimes listed my job as the project "bugger". It's always fun to construct new words by stripping prefixes off words that usually have them. But I wasn't sure what term was best for the task of making sure that a routine actually contains the bug that the specs say it should have. "Debugging" doesn't seem right when the job is making sure that the right bug is there.

      (Actually, I mostly thought that the project had a minor management problem, since any competent software development manager should understand the value of making sure that the software's error messages are correct and useful. But we all know how rarely this is actually done well. How often does your compiler point to the right place in the code when it produces an error message? And how often does the message describe the actual error?)
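The diagnostics suite described in this comment (deliberately broken C files, with the compiler's stderr fed to a validator) might look something like the sketch below in modern terms. It is hypothetical, not the original project's makefile: the `/* expect-error: ... */` annotation convention, the function names, and the assumption that a compiler called `cc` is on the PATH are all invented for illustration.

```python
from __future__ import annotations

import os
import re
import subprocess
from pathlib import Path

# Hypothetical convention: each test file names a fragment of the error it must
# trigger, e.g.   int x = undeclared;   /* expect-error: undeclared */
EXPECT_RE = re.compile(r"/\*\s*expect-error:\s*(.*?)\s*\*/")


def expected_diagnostics(source: str) -> list[str]:
    """Collect the message fragments a test file says the compiler must emit."""
    return EXPECT_RE.findall(source)


def validate(source: str, returncode: int, stderr: str) -> list[str]:
    """Return a list of problems; an empty list means the 'bug' was diagnosed."""
    problems = []
    expected = expected_diagnostics(source)
    if not expected:
        problems.append("test file declares no expected diagnostic")
    if returncode == 0:
        problems.append("compiler accepted a file that should have been rejected")
    for fragment in expected:
        if fragment not in stderr:
            problems.append(f"missing diagnostic containing {fragment!r}")
    return problems


def run_case(path: Path, cc: str = "cc") -> list[str]:
    """Compile one deliberately broken file and check its stderr (assumes `cc` exists)."""
    result = subprocess.run(
        [cc, "-c", str(path), "-o", os.devnull],
        capture_output=True,
        text=True,
    )
    return validate(path.read_text(), result.returncode, result.stderr)


def run_suite(directory: Path, cc: str = "cc") -> int:
    """Run every *.c file in `directory`, print problems, and return the failure count."""
    failures = 0
    for path in sorted(directory.glob("*.c")):
        for problem in run_case(path, cc):
            print(f"{path.name}: {problem}")
            failures += 1
    return failures
```

Matching a message fragment rather than an exact line keeps each case from breaking on cosmetic changes such as column numbers, while still failing if the compiler stops diagnosing the planted "bug" or reports the wrong one.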

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  4. Survivorship bias by vlm · · Score: 5, Interesting

    Survivorship bias

    http://en.wikipedia.org/wiki/Survivorship_bias

    The projects that were alive back then, and are still alive now, are obviously more mature, and thus would have fewer bugs. Unless, that is, you believe in spontaneous generation of bugs at a constant rate in unchanged code (in my experience, actually not too unbelievable for old C++ compiled by the newest G++, due to specification drift).
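A toy calculation shows how a change in which projects are sampled can move a pooled figure even when no project improves. The two projects and their line and defect counts below are invented purely for illustration and do not come from the Coverity report: if the buggier project simply drops out of the sample, pooled density moves from one defect in about 3,333 lines to one in 4,000, although neither project's code changed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    lines: int
    defects: int


def lines_per_defect(projects: list[Project]) -> float:
    """Pooled density for a sample: total lines divided by total defects."""
    total_lines = sum(p.lines for p in projects)
    total_defects = sum(p.defects for p in projects)
    return total_lines / total_defects


# Hypothetical cohort: neither project's code changes between the two scans.
alpha = Project("alpha", lines=400_000, defects=100)  # one defect per 4,000 lines
beta = Project("beta", lines=600_000, defects=200)    # one defect per 3,000 lines

cohort_2006 = [alpha, beta]
survivors_2009 = [alpha]  # beta is abandoned and never rescanned

print(round(lines_per_defect(cohort_2006)))     # 3333
print(round(lines_per_defect(survivors_2009)))  # 4000
```

Comparing the same fixed set of projects in both years avoids this particular artifact, though not the maturity effect described in the comment.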

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  5. Re:Umm yeah by Volante3192 · · Score: 3, Insightful

    If they check 1 line of code every second it would take 133,101.85 years to check 11.5 billion lines of code. At 1000 lines of code every second you are looking at 133.10 years to check that much code. At 4000 lines of code every second (e.g. 4GHz) you are looking at 33.2 years to check that much code.

    And if they were only using one system to do this, I'd imagine that would be a problem. I wonder, though, whether spreading the processing across, oh, say, 512 processors could get that time down under a month...
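Redoing the quoted arithmetic is a useful check: 11.5 billion lines at one line per second comes to about 133,102 days (about 365 years), so the quoted "years" figures are really days. The sketch below uses only the 11.5 billion figure from the summary and the rates and processor count mentioned in this thread, and it assumes the work splits perfectly across processors.

```python
SECONDS_PER_DAY = 86_400
TOTAL_LINES = 11.5e9  # lines of code analysed, per the story summary


def days_to_scan(lines_per_second: float, processors: int = 1) -> float:
    """Wall-clock days to process TOTAL_LINES, assuming perfectly parallel work."""
    seconds = TOTAL_LINES / (lines_per_second * processors)
    return seconds / SECONDS_PER_DAY


for rate in (1, 1_000, 4_000):
    print(f"{rate:>5} lines/s: {days_to_scan(rate):>12,.2f} days")

# The 512-processor case from the comment, at 1 line/s per processor.
print(f"512 CPUs at 1 line/s: {days_to_scan(1, processors=512):,.1f} days")
```

At 1, 1,000 and 4,000 lines per second this gives roughly 133,102, 133 and 33 days, and 512 processors at one line per second each would need about 260 days.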

  6. Re:Umm yeah by Trepidity · · Score: 2, Informative

    Isn't 4,000 lines of code a second 4 kHz, not 4 GHz, if we're using Hz to measure the frequency of line processing?

  7. Re:Umm yeah by Disgruntled+Goats · · Score: 5, Informative

    At 4000 lines of code every second (e.g. 4GHz) you are looking at 33.2 years to check that much code.

    GHz = 1 billion cycles per second. You're only about 6 orders of magnitude off.

  8. Re:Umm yeah by eldavojohn · · Score: 5, Insightful

    A: We know they didn't check the code by hand.

    Of course not, do you know what static code analysis is? I repeatedly said that in the summary.

    B: The methodology didn't classify defects (cosmetic, security, minor, major, etc.)

    From the report, which is linked to in the article and you obviously didn't care to read before criticizing:

    NULL Pointer Dereference
    Resource Leak
    Unintentional Ignored Expressions
    Use Before Test (NULL)
    Use After Free
    Buffer Overflow (statically allocated)
    Unsafe Use of Returned NULL
    Uninitialized Values Read
    Unsafe Use of Returned Negative
    Type and Allocation Size Mismatch
    Buffer Overflow (dynamically allocated)
    Use Before Test (negative)

    They then go on to discuss Function Length and Complexity Metrics.

    C: The numbers aren't normalized nor broken by application size.

    I don't understand how this is statistically relevant. The summary I gave reports static-analysis defects per line of code, and the report also looks at function length. Of course a project with 4 million lines of code would have more defects than one with 4 thousand lines of code. The lines of code is the normalization!

    D: The use of a bug reporting database needs to be measured with regard to a baseline filing/fix percentage, not a total volume (as we need to correlate it with new lines of code being added)

    Does it make any difference to the end user whether 90% of the project is new lines of code or 9% of the project is new lines of code?

    It reads like something from the Onion.

    You didn't read the report so you can't really speak.

    Dear Lord, journalism is dead...

    Says the poster who didn't read or understand the report.

    --
    My work here is dung.
  9. Re:Fixing issues improves code... by chromatic · · Score: 2, Informative

    If you fix the issues, Coverity moves the project to a new rung and performs stricter analysis to find more types of errors.