Slashdot Mirror


Software Code Quality Of Apache Analyzed

fruey writes "Following Reasoning's February analysis of the Linux TCP/IP stack (putting it ahead of many commercial implementations for its low error density), they recently pitted Apache 2.1 source code against commercial web server offerings, although they don't say which. Apparently, Apache is close, but no cigar..."

22 of 442 comments

  1. So if they found them... by Marx_Mrvelous · · Score: 5, Funny

    Why don't they fix them? It seems almost paradoxical: if you find 0.53 errors per thousand lines of code and fix them, then you'll have 0 errors. But since we can only fix errors we can detect, we only detect errors we can fix. Ok, it's too early on a Monday morning...

    --

    Moderation: Put your hand inside the puppet head!
    1. Re:So if they found them... by Jeremy+Erwin · · Score: 5, Informative
      If you download the defect report (available from here*), it will explain exactly where the bugs are.
      For instance, the first bug is

      DEFECT CLASS: Null Pointer Dereference DEFECT ID 1
      LOCATION: httpd-2.1/modules/aaa/mod_auth_basic.c :291
      DESCRIPTION The local pointer variable current_provider, declared on line 235, and assigned on line 257, may be NULL where it is dereferenced on line 291.
      PRECONDITIONS The conditional expression (res) on line 253 evaluates to false AND
      The conditional expression (!current_provider) on line 264 evaluates to true AND
      The conditional expression (!provider || !provider->check_password) on line 268
      evaluates to false AND
      The conditional expression (auth_result != AUTH_USER_NOT_FOUND) on line
      282 evaluates to false AND
      The conditional expression (!conf->providers) on line 287 evaluates to false.


      Each bug report is followed by the snippet of source code containing the defect.

      The metric report simply reports the statistics. For instance, the most bug ridden file is otherchild.c. The most common bug class is "dereferencing a NULL pointer".

      If the Apache developers simply want to fix the bugs, they can use the Defect Report. If they want to conduct a brutal purge of their contributors, they can use the Metric Report.

      *Yes, Reasoning wants an email address. They will mail you a URL (a rather simple one at that) to access the reports.
    2. Re:So if they found them... by tomstdenis · · Score: 5, Interesting

      Agreed. Things like splint often report "warnings" on code that shouldn't trigger them. For instance

      int some_func(char *somebuf)
      {
          if (somebuf == NULL) return ERROR;
          somebuf[0] = 'a';
          return OK;
      }

      Will generate a warning with splint saying "pointer may be null" despite the fact it cannot be.

      Those tools are generally too sensitive and give too many false positives to be useful in the long run.

      Tom

      --
      Someday, I'll have a real sig.
    3. Re:So if they found them... by Anonymous Coward · · Score: 5, Insightful

      The funny thing is that this "bug" doesn't appear to actually be one...

      Note that current_provider is set to conf->providers on line 257. The loop starts and neither current_provider nor conf->providers changes. Then on line 287 there's a conditional break if conf->providers is NULL.

      If current_provider is going to be NULL at line 291, then conf->providers must be as well, so the conditional break will happen and the NULL dereference will be skipped.

      Or am I missing something else?
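
The argument above can be checked against a toy model. The following is a minimal sketch reconstructed only from the preconditions quoted in the defect report, not from the actual mod_auth_basic.c source; the struct names, the function name, and the `default_finds_user` parameter are all hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real httpd types. */
typedef struct provider_node {
    int found_user;                 /* 1 if this provider knows the user */
    struct provider_node *next;
} provider_node;

typedef struct {
    provider_node *providers;       /* NULL means "use the default provider" */
} auth_conf;

/* Walks the provider list the way the report's preconditions describe and
 * returns how many providers were consulted before the loop ended. */
int count_providers_tried(const auth_conf *conf, int default_finds_user)
{
    const provider_node *current = conf->providers;  /* analogue of line 257 */
    int tried = 0;

    do {
        /* analogue of the (!current_provider) branch on line 264 */
        int found = current ? current->found_user : default_finds_user;
        tried++;

        if (found) {
            break;                  /* analogue of the line 282 check */
        }
        if (!conf->providers) {
            break;                  /* analogue of the line 287 check */
        }
        current = current->next;    /* analogue of the line 291 dereference */
    } while (current);

    return tried;
}
```

In this model the dereference is only reached with a non-NULL pointer: on the first pass `current` equals `conf->providers`, so a NULL list takes the break, and on later passes the `while (current)` condition has already ruled NULL out. Whether the real file has the same shape is exactly what the comment above is asking.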

  2. apache 2.1? by fishynet · · Score: 5, Interesting

    2.1 isn't even out yet! The latest is 2.0.46!

    --

    Cats: All your base are belong to us.
    Captain: Take off every sig !!
  3. It's not fair! by jpmahala · · Score: 5, Funny

    Just because Open-Source coders can't spell when they insert comments doesn't mean that they can't write good code!

  4. Defect? by Jason_says · · Score: 5, Interesting

    Reasoning found 31 software defects in 58,944 lines of source code of the Apache http server V2.1 code.

    So what are they calling a defect?

    1. Re:Defect? by richie2000 · · Score: 5, Informative
      From the report:
      NULL Pointer Dereference (Expression dereferences a NULL pointer) 29 instances
      Uninitialized Variable (Variable is not initialized prior to use) 2 instances

      They also list the files and code snippets where the errors were found.

      In addition, the comparison is made against an industry average of commercial code they have tested this way, NOT against other webservers.

      --
      Money for nothing, pix for free
  5. their own code? by Jearil · · Score: 5, Funny

    Why does it seem a bit odd to be testing software quality with other software? I wonder if they ran their own software through its own program, but then that gets kinda weird when a program starts noticing errors about itself... maybe it'd get depressed and start ranting at the creator on how they should have taken better care of it... ok, I need more sleep

  6. more to it than # flaws-per-unit-"whatever" by Asprin · · Score: 5, Insightful


    What bothers me about these articles is that there is more to software quality than the # of flaws-per-unit-"whatever".

    Like design.

    It seems to me most of the problems with Apache's main competitor in terms of software quality are the result of design and engineering choices made by MS's IIS development team.

    In other words, it does exactly what they designed it to do, but what they designed it to do was a very bad idea.

    --
    "Lawyers are for sucks."
    - Doug McKenzie
  7. No cigar, my ass. by KFury · · Score: 5, Insightful
    The article claims Apache's error density, based on a meager 5100 lines of code, is 0.53, while that of 'comparable commercial applications' is 0.51.

    The problems with this are:
    • 5100 lines of code does not give you a confidence range of less than 0.02, especially when the error rate can be expected to be heterogeneous across the code base, as would be the case in an open-source product where different code pieces are created by entirely different groups.
    • 'Comparable' my ass. If they can't provide details of what software they're comparing to (I somehow doubt they got a look at IIS source code) then the stats are worthless, because anyone who's ever programmed knows that quality control isn't a constant factor across commercial products any more than it is among open-source products.
    • What's the error rate of their 'defect analysis'? If they're so good at finding defects, why aren't they out there writing perfect software? If their defect detection rate is less than 98% accurate, then the difference between a rate of 0.51 and 0.53 is meaningless anyhow.
    • There's a big difference between caught coding exceptions and fundamental security problems. The first can cause code to run a little slower, the second can destroy your company. This testing methodology doesn't even look at the second.
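
One way to put a number on the first bullet is to treat a defect count as a Poisson-style count, whose statistical noise is roughly the square root of the count. The sketch below is only that, a sketch: the Poisson assumption and the function name are illustrative and are not Reasoning's stated methodology. It is meant to be fed the figures quoted elsewhere in this thread (31 defects in 58,944 lines) rather than the 5100-line figure above.

```c
/* Rough significance check, assuming defect counts behave like Poisson
 * counts (an assumption of this sketch, not Reasoning's stated method).
 * Returns 1 if the observed count differs from the count expected at the
 * reference density by more than one standard deviation, else 0. */
int differs_by_more_than_one_sigma(int observed_defects,
                                   double kloc,
                                   double reference_density)
{
    double expected = reference_density * kloc;
    double delta = (double)observed_defects - expected;

    /* For a Poisson count the variance equals the mean, so compare the
     * squared difference with the expected count and avoid sqrt(). */
    return delta * delta > expected;
}
```

Called as `differs_by_more_than_one_sigma(31, 58.944, 0.51)`, the expected count at the commercial density is about 30 defects, so the two figures are separated by roughly one defect, well inside the noise.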
  8. Does it matter? by pubjames · · Score: 5, Interesting


    So?

    There are errors and there are errors. There are errors that don't matter a jot, and there are errors that are show-stoppers.

    I've worked on banking software containing code that was written in assembly for PDP-11s and developed over decades. The most horrible spaghetti code you could ever imagine. Why did the banks keep using it? Because for any particular input it always gave the correct output.

    Years of bug fixing had made the code horrible and probably full of errors if you were looking at it from a purely theoretical/software engineering viewpoint. But from an input/output point of view, it was faultless.

  9. what is a "software error"? by siskbc · · Score: 5, Insightful
    If Apache and, say, IIS are roughly equivalent in terms of code defects, you have to ask yourself "well, why does IIS have so many more general problems and security flaws than Apache, when they both carry the same general amount of coding defects?". Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?

    First, are all of IIS's issues "software errors" per se? I'm wondering if all security problems would have been caught, or if that was really the goal of the analysis. Perhaps it was, but I'm not sure. One could contest that IIS has a lot of things unprotected, but that this doesn't constitute a software error.

    And as you say, severity would be another issue. It's always been typical open-source style to get the mission-critical parts hardened against nuclear attack while leaving the other bits a tad soft. I wouldn't be surprised to learn that was the case with Apache.

    One thing I want to know - did MS (or whoever) give these guys source or were they analyzing the binaries?

    --

    -Looking for a job as a materials chemist or multivariat

  10. Different standards? by NotClever · · Score: 5, Insightful
    When the same group said that the IP stack in Linux was cleaner than a comparable one, everyone was screaming from the rooftops that it validated the open source model. When they say that an open source project and a closed source project are roughly comparable, all of a sudden everyone criticizes the methodology of the report!

    --
    Hell, there are no rules here. We're trying to accomplish something. - Thomas Edison
  11. automatically detected defects exclude security by brlewis · · Score: 5, Insightful

    Another post seems to indicate this was done via software to automatically detect defects. Many (most?) security defects cannot be detected automatically, as they involve using the software in an unintended way.

  12. Bad Statistics... by FunkZombie · · Score: 5, Insightful

    Also keep in mind that defect density is just an average. If you have 31 defects in 60k lines of code, that is potentially 31 security risks, or out-of-operation risks. If the other software tested had double the lines of code (120k), the density would imply that they had slightly less than double the defects, so say 58 or 60. That implies _58_ potential security or uptime risks. In this case, imho, defect density is not a good indicator of the reliability of the software.

    My general rule is that if someone is quoting statistics to you, they are lying. At least on average. :)
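
The density-versus-count point is easy to make concrete. This is a small sketch with illustrative function names of my own choosing; the inputs in the note below are the figures quoted in this thread (31 defects in 58,944 lines for Apache, 0.51 per thousand lines for the commercial average).

```c
/* Defects per thousand lines of code. */
double defects_per_kloc(int defects, long lines_of_code)
{
    return defects * 1000.0 / (double)lines_of_code;
}

/* Absolute number of defects implied by a density and a code size. */
double expected_defects(double density_per_kloc, long lines_of_code)
{
    return density_per_kloc * (double)lines_of_code / 1000.0;
}
```

`defects_per_kloc(31, 58944)` comes out just under 0.53, matching the reported figure, while `expected_defects(0.51, 120000)` is a little over 61: a slightly lower density, but about twice as many actual defects in a code base twice the size.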

  13. FACT: Reading is Good by Cancel · · Score: 5, Informative
    That's not what they're saying at all. In fact, Reasoning concluded that there was no statistically significant difference in 'defect density' between Apache and the unnamed commercial product.
    "In our February study that compared the defect density of the Linux TCP/IP stack to the average defect density of commercially developed TCP/IP stacks, we concluded that Open Source had a significantly lower defect density compared to commercial equivalents," said Bill Payne, President & CEO of Reasoning. "We received numerous inquiries about that study and took seriously requests for us to examine defect density rates in a less mature Open Source application and compare it with the commercial equivalent. Taking advantage of our database of automated software code inspection projects, we were able to do exactly that, and found the difference in defect density between the two was not significant." (emphasis mine)
  14. Don't assume IIS by m00nun1t · · Score: 5, Insightful

    Ok, IIS is the obvious choice as being the second most popular web server after Apache. But I hardly think Microsoft will be letting these guys all over the IIS source code.

    It could also be Zeus, SunOne or one of the other lesser known web servers out there.

  15. Defect Details by Eustace+Tilley · · Score: 5, Informative
    Interested persons can download the full defect report free of charge.

    Some things I found interesting:
    1. Apache 2.1 (dev) is a mere 76,208 LOC.
    2. No memory leaks detected
    3. 29 NULL pointer dereferences
    4. 2 Uninitialized variables
    5. No bounds errors, no bad deallocs
    6. otherchild.c had a rate of 7 NULL pointer dereferences per 1000 KSLC


    7. One of the explanations (given by Reasoning) for a NULL pointer dereference is "can occur in low memory conditions," which I think means the original allocator did not check for malloc failure.

      So you can get a sense of what a defect looks like, here is #21. The original uses bold and fonts to improve readability, but I don't know how to reproduce that in slashcode:
      DEFECT CLASS: Null Pointer Dereference

      DEFECT ID 21

      LOCATION: httpd-2.1/srclib/apr/misc/unix/otherchild.c : 137

      DESCRIPTION The local pointer variable cur, declared on line 126, and assigned on line 128, may
      be NULL where it is dereferenced on line 137.
      PRECONDITIONS The conditional expression (cur) on line 129 evaluates to false.
      CODE FRAGMENT
      124 APR_DECLARE(void) apr_proc_other_child_unregister(void *data)
      125 {
      126 apr_other_child_rec_t *cur;
      127
      128 cur = other_children;
      129 while (cur) {
      130 if (cur->data == data) {
      131 break;
      132 }
      133 cur = cur->next;
      134 }
      135
      136 /* segfault if this function called with invalid parm */
      137 apr_pool_cleanup_kill(cur->p, cur->data, other_child_cleanup);
      138 other_child_cleanup(data);
      139 }
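
For comparison, here is a minimal sketch of the same lookup with the missing-entry case handled explicitly. It uses a hypothetical, simplified record type rather than the real apr_other_child_rec_t, and it is not the APR project's fix; the comment on line 136 of the fragment suggests the original authors knowingly accepted a crash on bad input.

```c
#include <stddef.h>

/* Hypothetical, simplified record; not the real apr_other_child_rec_t. */
typedef struct child_rec {
    void *data;
    struct child_rec *next;
} child_rec;

/* Same search loop as the fragment above, followed by a NULL guard.
 * Returns 0 when a matching record was found and cleared, or -1 when
 * no record carries the given data pointer. */
int unregister_child(child_rec *head, const void *data)
{
    child_rec *cur = head;

    while (cur) {
        if (cur->data == data) {
            break;
        }
        cur = cur->next;
    }

    if (cur == NULL) {
        return -1;      /* nothing registered under this pointer */
    }

    cur->data = NULL;   /* stand-in for the real cleanup calls */
    return 0;
}
```

Whether returning an error is better than crashing early on a caller's bug is a design choice; the sketch only shows what the guard the analyzer is asking for would look like.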
  16. Re:Code defects appear to be a small part of the e by jdh-22 · · Score: 5, Insightful
    Every hacker on the planet has full access to the code - which means that they can review it and find vulnerabilities in it. Not many people have access to Windows or IIS code.
    To quote Bruce Schneier: "If I had a letter, sealed it in a locked vault and hid the vault somewhere in New York, then told you to read the letter, that's not security, that's obscurity. If I made a letter, sealed it in a vault, gave you the blueprints of the vault, the combinations of 1000 other vaults, access to the best locksmiths in the world, then told you to read the letter, and you still can't, that's security." Open source does have an upper hand on holes and bugs, but the code isn't where we should be looking.

    The majority of the security holes are from the people setting up the web servers. The holes are usually abused by "wanna-be" hackers, or script-kiddies. The problem is that people are not educated enough to run some of these programs. Being able to understand Apache, and how to make it operate correctly, is not everyone's top priority. As long as it works, people don't care how it works (as goes for many other things in this world).
    --
    Every Super Villan uses Linux.
  17. Here are the links to the defect reports by arrogance · · Score: 5, Informative
    Defect Report

    Metric Report

    They make you fill out a form that asks for your email and then do an opt out checkbox at the bottom of the form (you have to check it to NOT get spam from them). The site's a bit slashdotted right now though.

  18. Re:Code defects appear to be a small part of the e by aziraphale · · Score: 5, Interesting

    One word: architecture.

    And not just the architecture of the web server, but the architecture of the entire platform. But specifically looking at the architecture of Apache versus the architecture of IIS, you'll immediately see that the goals of the two pieces of software are not the same. Look at things like IIS's metabase - the structural details of the server's configuration are kept in an in-memory data structure, which is easily modified while the server is running. Apache, in contrast, reads its configuration at startup, and uses it to determine which modules of code are loaded, and how they are used to process requests - fixing the behavior of the web server at startup.

    IIS follows typical MS enterprise software design - it has to interface with COM, and the NT security model, and active directory, and the registry, and a million other systems, all in the name of integration, and enterprise management. Apache doesn't have PHBs telling it that it needs another way for the metabase to be edited, or a new instrumentation API, or whatever else a particular large customer asked for - and can get on with just providing its facilities cleanly.

    That's why IIS has so many more security holes, even if it does (as may or may not be the case) have the same raw coding error rate as Apache.