Slashdot Mirror


Software Code Quality Of Apache Analyzed

fruey writes "Following Reasoning's February analysis of the Linux TCP/IP stack (putting it ahead of many commercial implementations for it's low error density), they recently pitted Apache 2.1 source code against commercial web server offerings, although they don't say which. Apparently, Apache is close, but no cigar..."

139 of 442 comments

  1. So if they found them... by Marx_Mrvelous · · Score: 5, Funny

    Why don't they fix them? It seems almost paradoxical: if you find 0.53 errors per thousand lines of code and fix them, then you'll have 0 errors. But since we can only fix errors we can detect, we only detect errors we can fix. Ok, it's too early on a Monday morning...

    --

    Moderation: Put your hand inside the puppet head!
    1. Re:So if they found them... by dkh2 · · Score: 3, Insightful

      Sure, they found them, but did they catalog them in any way? .53/KLOC errors translates to approx. 1 error every 1886 LOC on average. On top of that, on further investigation, which of these are actual errors and which only look like errors?
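
      The conversion from a density to an average gap is just a reciprocal. A minimal C sketch, where loc_per_defect and print_gap are names made up for this illustration:

```c
#include <stdio.h>

/* Average number of lines between defects for a density given in
 * defects per thousand lines (KLOC).
 * loc_per_defect is a hypothetical helper, not part of any report. */
double loc_per_defect(double defects_per_kloc)
{
    return 1000.0 / defects_per_kloc;
}

/* Print the gap for one density value. */
void print_gap(double defects_per_kloc)
{
    printf("%.2f defects/KLOC is about one defect every %.0f lines\n",
           defects_per_kloc, loc_per_defect(defects_per_kloc));
}
```

      Calling print_gap(0.53) reports a gap of roughly 1,887 lines; the 1886 quoted here is the same value truncated rather than rounded.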

      I'm just glad I'm not the poor go-coder who has to go through the code to find and fix these few "errors."

      --
      My office has been taken over by iPod people.
    2. Re:So if they found them... by Jeremy+Erwin · · Score: 5, Informative
      If you download the defect report (available from here*), it will explain exactly where the bugs are.
      For instance, the first bug is

      DEFECT CLASS: Null Pointer Dereference DEFECT ID 1
      LOCATION: httpd-2.1/modules/aaa/mod_auth_basic.c :291
      DESCRIPTION The local pointer variable current_provider, declared on line 235, and assigned on line 257, may be NULL where it is dereferenced on line 291.
      PRECONDITIONS The conditional expression (res) on line 253 evaluates to false AND
      The conditional expression (!current_provider) on line 264 evaluates to true AND
      The conditional expression (!provider || !provider->check_password) on line 268
      evaluates to false AND
      The conditional expression (auth_result != AUTH_USER_NOT_FOUND) on line
      282 evaluates to false AND
      The conditional expression (!conf->providers) on line 287 evaluates to false.


      Each bug report is followed by the snippet of source code containing the defect.

      The metric report simply reports the statistics. For instance, the most bug ridden file is otherchild.c. The most common bug class is "dereferencing a NULL pointer".

      If the Apache developers simply want to fix the bugs, they can use the Defect Report. If they want to conduct a brutal purge of their contributors, they can use the Metric report.

      *Yes, Reasoning wants an email address. They will mail you a URL (a rather simple one at that) to access the reports.
    3. Re:So if they found them... by MisterFancypants · · Score: 4, Insightful
      None of that bug report is at all useful if there is no logical way for all of those preconditions they listed to actually be met.

      I mean, yeah, it would be nice if code would explicitly check for a NULL before dereferencing, but if there's no earthly way for the pointer to actually BE a NULL pointer at that time (barring memory corruption -- in which case all bets are off and your code is doomed anyway) then I wouldn't call those errors.

      This whole exercise seems very suspect to me.

    4. Re:So if they found them... by tomstdenis · · Score: 5, Interesting

      Agreed. Things like splint often report "warnings" on code that shouldn't be. For instance

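      #include <stddef.h>

      enum { OK, ERROR };  /* status codes; the snippet as posted leaves them undefined */

      int some_func(char *somebuf)
      {
          /* Guard: reject NULL before the buffer is touched. */
          if (somebuf == NULL) return ERROR;
          somebuf[0] = 'a';
          return OK;
      }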

      Will generate a warning with splint saying "pointer may be null" despite the fact it cannot be.

      Those tools are generally too sensitive and give too many false positives to be useful in the long run.

      Tom

      --
      Someday, I'll have a real sig.
    5. Re:So if they found them... by tomstdenis · · Score: 2, Informative

      Neat, well it's been nearly a year since I used splint last. Maybe they have just updated the code.

      Either way I prefer

      "--std=c99 -pedantic -Wall -W -Wshadow" as my warnings for GCC. It catches a shit-load of common coding foobars and also ensures the code follows ISO C [definite bonus].

      Tom

      --
      Someday, I'll have a real sig.
    6. Re:So if they found them... by coliva · · Score: 2, Informative

      I found it interesting that they used a 1/31/03 version of Apache 2.1-dev. This wasn't mentioned anywhere in the article- either that it was a development version or that their analysis was of a development-level piece of software 5 months ago.

      It would be interesting to see how far 2.1 has progressed since then.

    7. Re:So if they found them... by Skjellifetti · · Score: 4, Informative
      None of that bug report is at all useful if there is no logical way for all of those preconditions they listed to actually be met.

      Well, Yes and No. The problem is that there may be no logical way that the pointer may be NULL today. But tomorrow, a new coder will add something that modifies the preconditions and suddenly that pointer can indeed be NULL. Even where you are sure that a condition is impossible, it is usually a good idea to check for NULL in order to avoid future errors.

      And for those who haven't seen this trick before, a nice habit to get into is to write your checks like so:
      if (NULL == myPointer) { ... }
      This lets the compiler catch errors where you meant '==' rather than just '='. As in
      /* Do we really mean this? */
      if (myPointer = NULL) { ... }
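
      A small C sketch of why the constant-on-the-left form helps. The function name is_null is invented for the example, and the behaviour described in the comment follows from NULL not being assignable:

```c
#include <stddef.h>

/* Constant-on-the-left comparison: returns 1 when p is NULL, else 0. */
int is_null(const void *p)
{
    if (NULL == p) {
        return 1;
    }
    return 0;
}

/*
 * Dropping one '=' from the check above gives
 *     if (NULL = p) { ... }
 * which is rejected at compile time because NULL is not an lvalue.
 * The pointer-on-the-left typo
 *     if (p = NULL) { ... }
 * still compiles: it assigns NULL to p and the branch is never taken.
 */
```

      With the pointer on the left you are relying on the compiler's warnings instead; with the constant on the left the typo cannot build at all.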
    8. Re:So if they found them... by Anonymous Coward · · Score: 5, Insightful

      The funny thing is that this "bug" doesn't appear to actually be one...

      Note that current_provider is set to conf->providers on line 257. The loop starts and neither current_provider nor conf->providers changes. Then on line 287 there's a conditional break if conf->providers is NULL.

      If current_provider is going to be NULL at line 291, then conf->providers must be as well, so the conditional break will happen and the NULL dereference will be skipped.
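
      A stripped-down model of the control flow described here may make the argument easier to check. The types and the function name below are invented for illustration and are not the actual mod_auth_basic.c source; only the names current_provider and conf->providers come from the report.

```c
#include <stddef.h>

/* Simplified stand-ins; these are NOT the real Apache types. */
struct provider_node {
    int id;
    struct provider_node *next;
};

struct auth_conf {
    struct provider_node *providers;   /* NULL means "use the default" */
};

/* Walks the provider list in the shape described above and returns
 * how many passes the loop made. */
int count_provider_passes(const struct auth_conf *conf)
{
    const struct provider_node *current_provider = conf->providers;
    int passes = 0;

    do {
        passes++;   /* stands in for the per-provider password check */

        /* current_provider started as conf->providers and only ever
         * advances along that list, so an empty list leaves the loop
         * here and never reaches the dereference below. */
        if (!conf->providers) {
            break;
        }
        current_provider = current_provider->next;
    } while (current_provider);

    return passes;
}
```

      In this model an empty provider list makes exactly one pass and exits through the break, so the dereference is only reached with a non-NULL pointer.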

      Or am I missing something else?

    9. Re:So if they found them... by fnorky · · Score: 2, Insightful
      I found it interesting that they used a 1/31/03 version of Apache 2.1-dev. This wasn't mentioned anywhere in the article- either that it was a development version or that their analysis was of a development-level piece of software 5 months ago. It would be interesting to see how far 2.1 has progressed since then.

      After reading the review I came away with the impression that the reviewers were trying to hide this very fact. No mention this is a development version of Apache. No mention of what the "several commercial equivalents" are. Not much to back up their claim "Apache http server V2.1 code has defect density rate similar to the average found within commercial applications - Findings differ from previous Open Source Study".

      I dare say that at first glance this seems to be a case of FUD.

    10. Re:So if they found them... by apankrat · · Score: 2, Insightful

      .. But tomorrow, a new coder will add something that modifies the preconditions and suddenly that pointer can indeed be NULL.

      That's what assert() exists for. And 'preconditions' you are referring to are actually 'invariants', so if "suddenly that pointer can indeed be NULL" it means that someone broke a fundamental design assumption and should not be tweaking the code anyway.
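
      A minimal sketch of an invariant stated with assert(). The function first_char is hypothetical and only there to carry the check:

```c
#include <assert.h>
#include <stddef.h>

/* Design assumption (invariant): callers never pass NULL.
 * first_char is a hypothetical function used only for illustration. */
char first_char(const char *buf)
{
    /* Documents the invariant and aborts in debug builds if someone
     * breaks it. The check is compiled out when NDEBUG is defined. */
    assert(buf != NULL);
    return buf[0];
}
```

      The assert costs nothing in a release build, and in a debug build it points straight at whoever broke the assumption.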

      And for those who haven't seen this trick before, a nice habit to get into is to write your checks like so:..

      I found this trick pretty annoying. First of all any decent compiler can catch this with a warning. Second, if you are in fact misplacing == with = so often that you need a special habit for fighting it, then perhaps you should look at what you type :) There are plenty of C language constructions that can ruin your code with a single misplaced character:

      "xFF" vs "\xFF"
      comma operator; for instance, f(param) vs f,(param)
      misplaced structure initializers
      etc, etc

      It does not mean the programmer needs to guard against all these too, it just means that the code must be proofread as it's being written, which is a reasonable thing to expect from a professional developer.
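
      For anyone who has not run into the first two, here is a small C sketch of how far a single character moves the meaning. All function names are made up for the example.

```c
#include <string.h>

/* Missing backslash: three ordinary characters 'x', 'F', 'F'. */
size_t plain_len(void)  { return strlen("xFF"); }

/* With the backslash: a single byte with value 0xFF. */
size_t escape_len(void) { return strlen("\xFF"); }

static int twice(int n) { return 2 * n; }

/* Intended call. */
int with_call(int param)  { return twice(param); }

/* Stray comma: twice is named but never called, so the expression
 * evaluates to param itself. The (void) cast only silences the
 * "operand has no effect" warning some compilers emit. */
int with_comma(int param) { return ((void)twice, (param)); }
```

      Both pairs compile cleanly, which is the point: nothing but reading the code tells you which one was meant.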

      --
      3.243F6A8885A308D313
    11. Re:So if they found them... by Error27 · · Score: 2, Interesting

      FUD??? Gimme a break.

      It says pretty clearly that they purposely chose a less mature sample of open source software than they did last time. The point is, does open source software start out bug free or do the bugs get worked out with age?

    12. Re:So if they found them... by coliva · · Score: 2, Informative

      Correct. The title of the report is clear. However, that info didn't make it into the news release that they put out.

    13. Re:So if they found them... by Jeremy+Erwin · · Score: 4, Insightful

      The earlier study was of polished code, many iterations after release. This latest study is of an unpolished developer snapshot. I suppose that you might be able to divine some kind of wisdom about the development of open-source software-- Development branches shall be as stable as commercial code. Release branches shall be more so.

      The metrics report does mention the version number (dev-1/31/03), though the fact that this is development code is not explicitly noted. No mention is made of who commissioned this study. Perhaps the company is simply fishing for clients.

    14. Re:So if they found them... by pclminion · · Score: 2, Insightful
      Considering that Brian Kernighan, co-author of The C Programming Language, advocates this coding style in his book The Practice of Programming, I think it might be you who's the moron (and the 12 year old). This is a classic error that thousands of programmers have made and continue to make. It's the difference of a single repeated keystroke.

      So shut up, you little twerp.

    15. Re:So if they found them... by conway · · Score: 3, Informative

      Turning on all warnings in gcc (-Wall) catches this, and many other common errors.
      (In effect it does a lint-like check on the source.)

    16. Re:So if they found them... by aborchers · · Score: 2, Funny
      It's an argument to a function. It cannot be modified by another thread/process.


      Thanks for the reality slap. Years of LISP and Java have made me weak and flabby. :-)

      --
      Trouble making decisions? Just flip for it.
    17. Re:So if they found them... by Ed+Avis · · Score: 2, Informative

      The null pointer in C is written as 0, and tests as false when used as a Boolean. It might be stored internally as the bit value 1010101, but still in C source code it is 0, and false. So

      if (pointer) ...

      is perfectly legal, and portable, C.
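
      A short sketch of the two spellings side by side. The function names are invented for the example:

```c
#include <stddef.h>

/* Returns 1 when the pointer is non-null, written without an
 * explicit comparison. */
int has_value(const void *pointer)
{
    if (pointer) {
        return 1;
    }
    return 0;
}

/* The same test spelled out; both forms compare against the null
 * pointer, whatever its bit pattern is on the machine. */
int has_value_explicit(const void *pointer)
{
    return pointer != NULL;
}
```

      The two functions agree for every argument, so choosing between them is a matter of style, not portability.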

      --
      -- Ed Avis ed@membled.com
    18. Re:So if they found them... by the_duke_of_hazzard · · Score: 2, Insightful
      "The defect density of the Apache code inspected was 0.53 per thousand lines of source code, while the commercial average defect density came to 0.51 per thousand lines of source code."

      A simple reductio ad absurdum from this: if you produce thousands and thousands of lines of harmless, simple code to do something that could be done in a line, then your more verbose code is "better" than the concise one by this metric.

      This is assuming that it is possible to reliably statically test for errors in the first place, and that one "error" is equivalent to another... All seems a little suspect to me.
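
      The padding effect is easy to put numbers on. In the sketch below, defect_density is a made-up helper and the 5-defect figures are invented purely to illustrate the argument; only 31 and 58,944 come from the report.

```c
/* Defects per thousand lines of code (KLOC).
 * defect_density is a hypothetical helper for this illustration. */
double defect_density(int defects, long lines)
{
    return defects * 1000.0 / (double)lines;
}

/*
 * Same 5 defects, two code sizes (invented numbers):
 *   defect_density(5, 1000)    the concise version
 *   defect_density(5, 100000)  the same logic padded with harmless lines
 * The padded version scores a hundred times "better" without a single
 * defect having been removed.
 */
```

      For reference, defect_density(31, 58944) comes out just under the 0.53 quoted for Apache.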

      This signature is intentionally pointless.

    19. Re:So if they found them... by aborchers · · Score: 2, Insightful

      Sorry to get pedantic, but char* buffers are not error prone. Programmers are prone to make errors when using them. Lack of maturity (so to speak) in the language and bad programmer form are not the same. Bad form is bad form in C or Java. That one lacks array bounds checking that the other provides is irrelevant. Languages that protect the programmer from errors may make bad form less likely to result in a failure, but failing to employ best practices in code design can still lead to hard-to-detect logic bombs.

      In this case, the bad form in using early returns is that using them leads one to not look at the whole routine as a cohesive whole where all the antecedents and consequents are correctly considered and accounted for. It's similar to why:

      if (a) { ...
      }
      else if (b) { ...
      }

      is bad form compared to

      if (a) { ...
      }
      else {
      if (b) { ...
      }
      }

      From tracing point of view, they are indistinguishable. They may even compile to the same set of instructions. The second, however, shows a level of diligence on the part of the engineer that all the possible routes are considered and there is no dangling consequent.

      Disclaimer: The real reasons why these things are bad form are practically impossible to convey in an example that doesn't make use of real code. i.e. it's the "..." bit that provides the opportunity for the bad-form constructs to leak bugs.

      --
      Trouble making decisions? Just flip for it.
    20. Re:So if they found them... by aborchers · · Score: 2, Informative

      A chained else-if structure is equivalent to a switch.


      Funny you should point that out: a chained else-if structure without a terminal else is equivalent to a switch without a default which is notoriously vulnerable to the same sort of logic errors.
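
      A sketch of the two shapes with the catch-all branch in place. The enum and function names are hypothetical, chosen only for the example:

```c
enum severity { SEV_OK, SEV_WARNING, SEV_UNKNOWN };

/* Chained else-if with a terminal else: every input lands somewhere. */
enum severity classify_if(int code)
{
    enum severity result;

    if (code == 0) {
        result = SEV_OK;
    } else if (code == 1) {
        result = SEV_WARNING;
    } else {
        result = SEV_UNKNOWN;   /* the branch that is easy to forget */
    }
    return result;
}

/* Equivalent switch; default plays the role of the final else. */
enum severity classify_switch(int code)
{
    enum severity result;

    switch (code) {
    case 0:
        result = SEV_OK;
        break;
    case 1:
        result = SEV_WARNING;
        break;
    default:
        result = SEV_UNKNOWN;
        break;
    }
    return result;
}
```

      Remove the final else or the default and result is returned uninitialized for any unexpected code, which is exactly the kind of hole being described.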

      if you can use the simpler structure without duplicating code, then you should


      While I agree with that principle, the whole issue of good form (which I won't argue can be inefficient and cumbersome) is that following it slavishly can prevent the coding patterns that lead to hard-to-find bugs. It protects us from our own worst tendencies, one of which is assuming when we write the code that we know exactly what we mean it to do. :-) Optimization is a valuable step to be sure, but optimizing too soon is a route to buggy code.

      --
      Trouble making decisions? Just flip for it.
    21. Re:So if they found them... by Alsee · · Score: 2, Funny

      I don't think using a development branch is really a good choice at all. Dev branches are just that, development, not intended for normal, every day use (except by the very brave).

      Some people love the thrill of skydiving and opening their parachute 5 seconds before they hit the ground. Some people defy death by wrestling crocodiles bare handed. Others get a rush pushing 200 MPH going into a turn in a Formula One race car.

      Me, I get my adrenaline pumping by running code on the development branch.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
    22. Re:So if they found them... by murr · · Score: 2, Informative

      Interestingly enough, that very first bug report demonstrates a limitation in the logical reasoning of the analysis tool, not a defect in the Apache code:

      current_provider was assigned from conf->providers (line 257), so it cannot possibly be NULL unless conf->providers is NULL, and that condition is tested for on line 287.

      NEXT!

  2. A bit late, aren't we? by Anonymous Coward · · Score: 2, Interesting
  3. apache 2.1? by fishynet · · Score: 5, Interesting

    2.1 isn't even out yet! the latest is 2.0.46!

    --

    Cats: All your base are belong to us.
    Captain: Take off every sig !!
  4. It's not fair! by jpmahala · · Score: 5, Funny

    Just because Open-Source coders can't spell when they insert comments doesn't mean that they can't write good code!

    1. Re:It's not fair! by MrPerfekt · · Score: 4, Funny

      Unless they can't spell other things like...

      inklude
      dephine
      retern
      brake... etc.

      --
      I just wasted your mod points! HA!
    2. Re:It's not fair! by Drathos · · Score: 2, Insightful

      That's what compiler errors are for.. How else are you supposed to find typos when vim doesn't have a spellchecker? :)

      --
      End of line..
    3. Re:It's not fair! by Jucius+Maximus · · Score: 3, Funny
      "(putting it ahead of many commercial implementations for it's low error density)"

      This line gave me a good chuckle. I expect that most people did not even notice the grammatical error in a sentence talking about low error densities.

      Note: The rules for its/it's are not covered in Bob's Quick Guide To The Apostrophe, You Idiots since the Guide covers nouns and 'it' is a pronoun.

  5. Code defects appear to be a small part of the equa by mao+che+minh · · Score: 4, Insightful
    I suppose now we have to question the severity of the defects (and also factor in the implementation and use of the code). If Apache and, say, IIS are roughly equivalent in terms of code defects, you have to ask yourself "well, why does IIS have so many more general problems and security flaws than Apache, when they both carry the same general amount of coding defects?". Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?

    But here's the kicker: the vast majority runs Apache on either BSD or Linux. All of this code, from the kernel to the library that tells Apache how to use PHP, is open source. Every hacker on the planet has full access to the code - which means that they can review it and find vulnerabilities in it. Not many people have access to Windows or IIS code. So why do IIS and Windows come out as far less secure, and why are they exploited so much more?

    I think the answer lies in the severity of the code defects, and the architecture and design of the operating system that powers the web server. And yes, I know that Apache can run on Windows.

  6. Wait a second by Knife_Edge · · Score: 3, Insightful

    Has Apache 2.1 been released as a stable, non-developmental release? If not I would say testing it for defects is a bit premature.

    1. Re:Wait a second by AftanGustur · · Score: 2, Interesting


      Has Apache 2.1 been released as a stable, non-developmental release?

      According to the official site.
      The latest 2.* release is "2.0.46 " and version 2.1 is nowhere to be seen ....

      So the question is : Which version did they audit ??

      --
      echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
    2. Re:Wait a second by willamowius · · Score: 2, Funny

      They probably compared it to IIS 7.4 to make it a fair comparison. ;-)

  7. Defect? by Jason_says · · Score: 5, Interesting

    Reasoning found 31 software defects in 58,944 lines of source code of the Apache http server V2.1 code.

    so what are the calling a defect?

    1. Re:Defect? by Anonymous Coward · · Score: 2, Funny

      so what are the calling a defect?

      I guess would be quite a good example.

    2. Re:Defect? by richie2000 · · Score: 5, Informative
      From the report:
      NULL Pointer Dereference (Expression dereferences a NULL pointer) 29 instances
      Uninitialized Variable (Variable is not initialized prior to use) 2 instances

      They also list the files and code snippets where the errors were found.

      In addition, the comparison is made against an industry average of commercial code they have tested this way, NOT against other webservers.

      --
      Money for nothing, pix for free
  8. How do they get to look at closed source? by 3.5+stripes · · Score: 3, Interesting

    And don't most NDAs for when they do let you look forbid any competitive analysis?

    Or am I just too far out of that line of work to know how these things work?

    --


    He tried to kill me with a forklift!
  9. 2.1 ? by Aliencow · · Score: 4, Insightful

    Wouldn't that be unstable? I thought the latest was 2.0.46 or something.. If I'm not mistaken, it would be a bit like saying "Freebsd 4.8 has less bugs than Linux 2.5!"

  10. What do reasoning do? by SystematicPsycho · · Score: 4, Insightful

    So basically they offer a service like lclint only many times more advanced ? What is to say they haven't missed anything?

    This is probably a publicity stunt for them although a good one. I think it would be a good idea for them to sell software suites of their product if they don't already.

    --
    Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
  11. FACT: 3 is a larger number than 2 by TheRaven64 · · Score: 4, Insightful
    Hmm, so they looked at 58,944 lines of code, and found 31 defects? Did they find every defect? Can they prove this? What about those found in commercial code? If it were possible to find all of the defects in a piece of code this big in a small amount of time, then there would be no defects, since they would all be identified and fixed before release.

    As far as I can see, this article says 'We have two arbitrary numbers, and one is bigger than the other. From this we deduce that Apache is not as good as commercial software.'

    --
    I am TheRaven on Soylent News
  12. Apache 2.1...? by bc90021 · · Score: 4, Insightful

    According to Apache.org, Apache's latest stable version is 2.0.46. Is that a typo on their part, or are they testing a development version? Also, since 1.3.27 is widely used, it would have been interesting to see how that stacked up as well, having been developed longer.

    Either way, to have only 31 errors in close to 60,000 lines of code is impressive!

    1. Re:Apache 2.1...? by Bu+Na+Dan · · Score: 2, Funny

      the error density in the announcement of reasoning.com is pretty high ... testing a non released software against an unknown commercial software ... sounds like an ancient tale. where are the people who accept this kind of crap?

    2. Re:Apache 2.1...? by jbp4444 · · Score: 3, Insightful

      I was quite impressed by the fact that Apache can cram all the functionality into ~59k lines. So besides defect rate, I would like to know how many lines of code the commercial package had ... 0.51 defects per 1000 lines sounds good, unless there are 1,000,000 lines more code in the commercial package.
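
      To put rough numbers on that: the sketch below turns a density back into an absolute count. expected_defects is a made-up helper, the million-line-larger package is the hypothetical from this comment rather than anything in the report, and a uniform density across the whole code base is a simplifying assumption.

```c
/* Expected number of defects for a code base of the given size,
 * assuming the density applies uniformly (a simplifying assumption).
 * expected_defects is a hypothetical helper for this illustration. */
double expected_defects(double defects_per_kloc, long lines)
{
    return defects_per_kloc * (double)lines / 1000.0;
}

/*
 *   expected_defects(0.53, 58944)     Apache as inspected
 *   expected_defects(0.51, 1058944)   hypothetical package with
 *                                     1,000,000 more lines
 * The first is about 31 defects; the second is about 540.
 */
```

      A slightly lower density on a much larger code base still means many more defects in total.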

    3. Re:Apache 2.1...? by pmz · · Score: 2, Insightful

      I was quite impressed by the fact that Apache can cram all the functionality into ~59k lines.

      Agreed. It would be interesting to know whether this low LOC is accomplished through good architecture that emphasizes simplicity and maintainability or "clever" hacks that compress a 10-line loop down into a three-line abomination of pointer arithmetic. I genuinely hope it is not the latter.

      Regardless, 59K lines is small enough a program that--given a good architecture--can be studied and debugged relatively easily by one or two people. I'd estimate that this is why Apache is known for its low number of exploits in spite of its enormous web server market share.

  13. "Defect Density"? by sparkhead · · Score: 4, Insightful
    A key reliability measurement indicator is defect density, defined as the number of defects found per thousand lines of source code.

    Since LOC is a poor metric, a "defect density" measurement based on that will be just as poor.

    Yes, I know there's not much else to go on, but something along the lines of putting the program through its paces, stress testing, load testing, etc. would be a much better measurement than a metric based on LOC.

  14. Open Source versus Closed by ElectronOfAtom · · Score: 3, Informative

    The difference is that now that someone has found 31 errors in the open source Apache software, they will be fixed fairly quickly whereas closed source software will have to have the company do a cost-benefit analysis, put together a team to do the fixes, probably charge to put out patches or minor upgrades (assuming the product is Microsoft's IIS ;b)...

    --
    Only two things are infinite, the universe, and human stupidity,
    and I'm not sure about the former.
  15. their own code? by Jearil · · Score: 5, Funny

    Why does it seem a bit odd to be testing software quality with other software? I wonder if they ran their own software through its own program, but then that gets kinda weird when a program starts noticing errors about itself... maybe it'd get depressed and start ranting at the creator on how they should have taken better care of it... ok, I need more sleep

  16. What kind of BS test is this? by dtolton · · Score: 2, Interesting

    They are comparing a development version to an un-named commercial web server?

    Why don't they compare it to apache 2.0.46 if they want a newer, but release product? I expect they did, but they didn't get the results they wanted.

    This is a development version, it's an odd numbered release for crying out loud.

    I wouldn't be surprised to see this is bankrolled by M$. Let's compare IIS in development to Apache 2.1, and then see what IIS bug density rate is.

    Bah!!

    --

    Doug Tolton

    "The destruction of a value which is, will not bring value to that which isn't." -John Galt
  17. Apache 2.1 does not yet exist by David+McBride · · Score: 4, Informative

    Umm, Apache 2.1 hasn't been released yet. Current latest stable is 2.0.46.

    I can only assume that they're looking through the current DEVELOPMENT codebase -- finding a higher ``defect density'' in such a development codebase compared with commercial offerings is not exactly unexpected.

    They're also some automated code inspection product; the press release doesn't go into details as to the severity of the defects found or the testing methodology.

    It'll be necessary to read through the full report before drawing any sound conclusions.

    1. Re:Apache 2.1 does not yet exist by David+McBride · · Score: 4, Informative

      The above link wants your email address. Bah.

      The direct URLs for the reports are:
      Defect Report
      Metric Report

  18. Links to the Reports (no free reg required) by Anonymous Coward · · Score: 2, Informative
    AC, thank you for contacting Reasoning!

    Here are the links to the Apache Open Source Inspection Report you requested:

    Apache Defect Report: http://www.reasoning.com/pdf/Apache_Defect_Report.pdf
    Apache Metric Report: http://www.reasoning.com/pdf/Apache_Metric_Report.pdf

    Reasoning provides the world's leading automated software inspection service. We boost the productivity of development teams by finding software defects faster and at a far lower cost than traditional approaches. Please let me know if you would like additional information. Thank you again for contacting Reasoning!

    Sincerely,
    Reasoning

  19. more to it than # flaws-per-unit-"whatever" by Asprin · · Score: 5, Insightful


    What bothers me about these articles is that there is more to software quality than the # of flaws-per-unit-"whatever".

    Like design.

    It seems to me most of the problems with Apache's main competitor in terms of software quality are the result of design and engineering choices made by MS's IIS development team.

    In other words, it does exactly what they designed it to do, but what they designed it to do was a very bad idea.

    --
    "Lawyers are for sucks."
    - Doug McKenzie
  20. Interesting, with or without modules? by hughk · · Score: 3, Interesting
    If anyone has an Apache 2.1 dist around, they say they checked 58,000 lines - does this seem reasonable? Is this with any of the modules such as PHP or Perl or is this raw????

    I know that Apache has vulnerabilities but it should come better than IIS. You can't realistically give a verdict on IIS without looking at the libraries called.

    As for the rest, I can imagine some commercial products coming in better, but not many.

    --
    See my journal, I write things there
    1. Re:Interesting, with or without modules? by alder · · Score: 2, Informative
      they checked 58,000 lines - does this seem reasonable?
      It looks reasonable if they checked only the server "core".
      • All *.c files under httpd-2.0.46 - 375K lines
      • APR (i.e. srclib) - 230K lines
      • All modules - 93K lines
      • modules/http - 5K lines
      • modules/loggers - 1.6K lines
      • modules/cache - 0.4K lines
      • some files from modules/mappers - 4K lines
      375 - 230 - 93 + 5 + 1.6 + 0.4 + 4 = 63K in ~ 100 files

      Subtract 53 lines per file on Apache Software Licence and you'll end up with ~58K.
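
      The same estimate written out as C, for anyone who wants to plug in their own counts. The function names are invented; the figures are the ones listed above, with roughly 100 files and a 53-line licence header per file taken as approximations.

```c
/* Estimated size of the inspected "core" in thousands of lines,
 * licence headers included, using the figures from the list above. */
double core_kloc_with_headers(void)
{
    return 375.0 - 230.0 - 93.0 + 5.0 + 1.6 + 0.4 + 4.0;
}

/* Same estimate after removing a fixed-size licence header from
 * each file. */
double core_kloc_without_headers(int files, int header_lines)
{
    return core_kloc_with_headers() - files * header_lines / 1000.0;
}
```

      core_kloc_without_headers(100, 53) lands a little under 58, in line with the 58,944 lines Reasoning reports having inspected.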

  21. No cigar, my ass. by KFury · · Score: 5, Insightful
    The article claims Apache's error density, based on a meager 5100 lines of code, is 0.53, while that of 'comparable commercial applications' is 0.51.

    The problems with this are:
    • 5100 lines of code does not give you a confidence range of less than 0.02, especially when the error rate can be expected to be heterogeneous across the code base, as would be the case in an open-source product where different code pieces are created by entirely different groups.
    • 'Comparable' my ass. If they can't provide details of what software they're comparing to (I somehow doubt they got a look at IIS source code) then the stats are worthless, because anyone who's ever programmed knows that quality control isn't a constant factor across commercial products any more than it is among open-source products.
    • What's the error rate of their 'defect analysis'? If they're so good at finding defects, why aren't they out there writing perfect software? If their defect detection rate is less than 98% accurate, then the difference between a rate of 0.51 and 0.53 is meaningless anyhow.
    • There's a big difference between caught coding exceptions and fundamental security problems. The first can cause code to run a little slower, the second can destroy your company. This testing methodology doesn't even look at the second.
    1. Re:No cigar, my ass. by HowlinMad · · Score: 3, Informative

      FYI

      5100 != 58,944

      58,944 is the number from the article.

  22. BSD codestyle... by BigBadDude · · Score: 3, Funny


    The defect density of the Apache code inspected was 0.53 per thousand lines of source code...


    We can bring this number down to 0.2 by avoiding the BSD style guidelines. No kidding, have you seen the density of MFC code?

    BSD code:

    char*
    foo(int bar, double baz)
    {

    /* do something */
    return bar + random();

    }



    MS code:

    char* Foo(int nBar, double dBaz) { return bar + random() + m_ExtraWindowsBugModifier(); }

  23. This is an ad for their software by Sikmaz · · Score: 2, Insightful

    This looks like it was just an ad/demo of their code testing software.

    I am trying to get the main analysis downloaded now, but they must have been prepared for a slashdot posting ;)

  24. Does it matter? by pubjames · · Score: 5, Interesting


    So?

    There are errors and there are errors. There are errors that don't matter a jot, and there are errors that are show-stoppers.

    I've worked on banking software containing code that was written in assembly for PDP-11s and developed over decades. The most horrible spaghetti code you could ever imagine. Why did the banks keep using it? Because for any particular input it always gave the correct output.

    Years of bug fixing had made the code horrible and probably full of errors if you were looking at it from a purely theoretical/software engineering viewpoint. But from an input/output point of view, it was faultless.

  25. Re:FACT: 3 is a larger number than 2 by frankthechicken · · Score: 2, Insightful

    Completely and utterly agree. I mean, hell, I could write fifty thousand lines of code, each line completely and utterly without meaning, run it through the checker and produce 0 defects, apart from one overall defective piece of software. Does this article have any point to it at all? Even if the results had any meaning, what on earth is the point of comparing a known to an unknown?

  26. what is a "software error"? by siskbc · · Score: 5, Insightful
    If Apache and, say, IIS are roughly equivalent in terms of code defects, you have to ask yourself "well, why does IIS have so many more general problems and security flaws than Apache, when they both carry the same general amount of coding defects?". Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?

    First, are all of IIS's issues "software errors" per se? I'm wondering if all security problems would have been caught, or if that was really the goal of the analysis. Perhaps it was, but I'm not sure. One could contest that IIS has a lot of things unprotected, but that this doesn't constitute a software error.

    And as you say, severity would be another issue. It's always been typical open-source style to get the mission-critical parts hardened against nuclear attack, but leaving the other bits a tad soft. I wouldn't be surprised to learn that was the case with apache.

    One thing I want to know - did MS (or whoever) give these guys source or were they analyzing the binaries?

    --

    -Looking for a job as a materials chemist or multivariat

    1. Re:what is a "software error"? by Q2Serpent · · Score: 2, Informative

      Obviously they had source code access. That's the way Reasoning works - their program reads in and parses the source code, generates a parse tree, and then analyzes that. That's why it's called "static analysis" - no binaries, runtimes, or testcases are needed, and errors can even be found in code that is never exercised.

    2. Re:what is a "software error"? by Tony-A · · Score: 4, Insightful

      It's always been typical open-source style to get the mission-critical parts hardened against nuclear attack, but leaving the other bits a tad soft.

      IMNSHO, that ought to be standard for any mission-critical software. Bugs and the places that bugs live in are not created equal. The beauty of Apache (at least 1.13) is that the overall system can be very robust and reliable with rather buggy modules. I suspect the problem with IIS is that everything assumes everything else is perfect, which overall doesn't quite work so well.

    3. Re:what is a "software error"? by beta21 · · Score: 2, Funny

      These acronyms sometimes get me. IMNSHO?

      I Am Not A Single Horny Octopus?

  27. That's so weird ... by SuperDuG · · Score: 3, Interesting
    I found just the opposite.

    Important Tech City, CA, July 7th 2003
    For Immediate Release
    Sbj: Apache beats other webservers

    Recently we had our staff (some guy's kid) look over the source code of 3 major webserver packages available. In that code nearly 8 million lines of error were found, but surprisingly the damned things still worked?!

    We ran a performance test (click a link and see if porn comes faster) with Apache and 3 other commercial offerings. Apache seemed to knock them all out of the water. Boy, will those other three companies be mad now.

    While we cannot tell you what the other three offerings were (that might make this whole thing more believable) we can tell you that we think they're popular.

    Here's the results

    Apache ------------------- 104
    Com 1 --------32
    Com 2 -----------45
    Com 3 ---------------53

    As you can see by the clear test results, apache wins in all tests.

    Since when are unfounded results from a company that doesn't explain what the "32 defects" were, newsworthy. Don't act like these guys are worth my time, this is bullshit.

    --
    Ignore the "p2p is theft" trolls, they're just uninformed
  28. Re:Code defects appear to be a small part of the e by phre4k · · Score: 4, Informative

    Pretty lame when we are talking server software where Apache has the lead. (Apache 63% vs IIS 25%, netcraft.com)

    /Esben

    --
    "Nobody really checks their email any more. They just delete their spam"
  29. Dubious by cca93014 · · Score: 4, Insightful

    Is it just me that finds this entire concept of "code defects per 000 lines" sounding like a little bullshit?

    If the company has developed proprietary tools to enable them to identify defects in medium-sized software projects, which of the following business models do you think is more effective:

    1. Design proprietary tools to identify defects in medium-sized software projects.
    2. Fix defects
    3. Profit

    or

    1. Design proprietary tools to identify defects in medium-sized software projects.
    2. Sit around mumbling about defects, Open Source software, closed source software and why farting in the bath smells worse
    3. ???
    4. Profit

    Secondly, where on earth did they get hold of a closed source enterprise level (which Apache undoubtedly is) web server software codebase?

    "Hi, is that BEA? Do you mind if we take a copy of your entire code base so that we can peer review it against Apache's? What's that? Yes, Apache might come out on top, and we will make the results public..."

    How do they define a defect anyway? A memory leak? A missing overflow check? A tab instead of 4 spaces?

    It just sounds like bullshit to me...

  30. Different standards? by NotClever · · Score: 5, Insightful
    When the same group said that the IP stack in Linux was cleaner than a comparable one, everyone was screaming from the rooftops that it validated the open source model. When they say that an open source project and a closed source project are roughly comparable, all of a sudden everyone criticizes the methodology of the report!

    --
    Hell, there are no rules here. We're trying to accomplish something. - Thomas Edison
  31. automatically detected defects exclude security by brlewis · · Score: 5, Insightful

    Another post seems to indicate this was done via software to automatically detect defects. Many (most?) security defects cannot be detected automatically, as they involve using the software in an unintended way.

  32. If Apache is so poor in quality... by tsetem · · Score: 4, Funny

    ...then why is it their webserver? :)

    Of course it is Apache 1.3.23...

  33. So the error level in pre-release Apache ... by burgburgburg · · Score: 4, Insightful

    is equivalent to the error level in post-release commercial web serving software. Sounds like an endorsement to me.

    1. Re:So the error level in pre-release Apache ... by Kynde · · Score: 4, Insightful

      is equivalent to the error level in post-release commercial web serving software. Sounds like an endorsement to me.

      That, too, but I'm damn certain that they must have tried it on a recent stable 2.0.46ish release as well. The question is, why weren't those results made public?

      I'm guessing it's because the results were something that would've put their "defect detection sw" in a bad light. I.e. nothing as fancy as the aforementioned "use of uninitialized variable" and "dereference of a NULL pointer" (which strikes me as really odd in the first place).

      Naturally the other explanation is endorsement. It would be so much not-the-first-time that I don't even bother... but I wouldn't bet that this is the case here, because the defect counts were only compared to production release code averages (which strikes me as the other extremely dubious part of this whole "experiment").

      --
      1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
    2. Re:So the error level in pre-release Apache ... by yaphadam097 · · Score: 3, Insightful
      I've worked on open source projects and I've also worked in commercial development shops. I think that their findings are accurate but misleading:
      1. In my experience there are generally fewer bugs in pre-release code on a commercial project because there is a stronger culture of code ownership, and most if not all code is independently reviewed before being committed.
      2. There are generally a high number of defects in pre-release open source code, because developers commit early and commit often. Independent review happens more often in open source projects, but it usually happens after the code has already been committed to the dev branch (Before that, the geographically dispersed dev team has no access to it.)
      3. The quality of code released to production in a commercial environment is usually very similar to the quality of code in the development branch. Once it is reviewed and committed it enters a QA cycle where an independent team tries to find any bugs. At this point there is invariably strong pressure to release. So, bug fixes happen quickly and quality suffers (I've always found it ironic that we called this "Quality Assurance.")
      4. Once an open source project has been completed (Meaning all of the features have been developed) it enters a much longer period of code review, bug hunting, and alpha release. For a project like Apache it was over a year before anyone started to use 2.0 in production. Most commercial companies can't afford nearly that much "QA" time, because they are spending money to make money.
  34. Bad Statistics... by FunkZombie · · Score: 5, Insightful

    Also keep in mind that defect density is just an average. If you have 31 defects in 60k lines of code, that is potentially 31 security risks, or out-of-operation risks. If the other software tested had double the lines of code (120k), the density would imply that they had slightly less than double the defects, so say 58 or 60. That implies _58_ potential security or uptime risks. In this case, imho, defect density is not a good indicator of the reliability of the software.

    My general rule is that if someone is quoting statistics to you, they are lying. At least on average. :)

    1. Re:Bad Statistics... by Lxy · · Score: 4, Funny

      My general rule is that if someone is quoting statistics to you, they are lying. At least on average. :)

      39% of Slashdot readers already know that.

      --

      There is no reasonable defense against an idiot with an agenda
      :wq
  35. to be expected from Open Source by Illserve · · Score: 3, Interesting

    By its very nature, Open source will tend to fix important bugs and leave unimportant ones unfixed, while standard QA processes associated with commercial software will tend to fix little UI issues during the release schedule before dealing with vulnerabilities.

    So seems pretty clear to me that in Open source, the ratio of showstopper bugs to miscolored widget bugs will be much lower than for commercial software.

    1. Re:to be expected from Open Source by Daniel_Staal · · Score: 2, Insightful

      I don't think the poster meant to dis commercial QA work: he was instead of the opinion that commercial software will value the widgets and so on more than open source does.

      That is: he is sure that *both* processes take into account severity and priority of bugs. The poster just felt that their priorities were different. (Polish being more important for commercial code, absolute correctness for open source. The question of the 'correct' balance is left up to the reader.)

      --
      'Sensible' is a curse word.
  36. FACT: Reading is Good by Cancel · · Score: 5, Informative
    That's not what they're saying at all. In fact, Reasoning concluded that there was no statistically significant difference in 'defect density' between Apache and the unnamed commercial product.
    "In our February study that compared the defect density of the Linux TCP/IP stack to the average defect density of commercially developed TCP/IP stacks, we concluded that Open Source had a significantly lower defect density compared to commercial equivalents," said Bill Payne, President & CEO of Reasoning. "We received numerous inquiries about that study and took seriously requests for us to examine defect density rates in a less mature Open Source application and compare it with the commercial equivalent. Taking advantage of our database of automated software code inspection projects, we were able to do exactly that, and found the difference in defect density between the two was not significant." (emphasis mine)
  37. Actually the article suggests apache is better by sterno · · Score: 4, Insightful

    This doesn't indicate that the commercial equivalents are better. You've got the DEVELOPMENT branch of Apache, which is derived from the 2.0.x code, which is a complete rework of the original 1.X branch of code. So it's a rather new code base and it's showing similar defect rates to a code base that has been around for a while. I'd say this proves that open source is better.

    --
    This sig has been temporarily disconnected or is no longer in service
  38. Recursion by sterno · · Score: 2, Funny

    They didn't do that because if they did that, then they'd find bugs in their bug finder, so they'd have to run the bug finder on the bug finder to find bugs there, but then they'd have to run the bug finder on the bug finder on the...

    --
    This sig has been temporarily disconnected or is no longer in service
    1. Re:Recursion by fgb · · Score: 4, Funny

      That reminds me of an old (early 1980's) product named BILF (Basic Infinite Loop Finder). It was supposed to be run against BASIC source code and it would find all infinite loops in the code, or so the vendor claimed.
      A magazine reviewed the product. In their review they included a formal mathematical proof that such a program could never work. The vendor responded to the proof by saying that they would fix that problem in the next release!

    2. Re:Recursion by nick255 · · Score: 2, Interesting

      Yes the proof is quite a simple application of the famous halting problem proof.

      Imagine you made the program go into an infinite loop whenever the program it was analysing did not have an infinite loop.

      Then run the program on itself......

  39. Wrong Math by bstadil · · Score: 4, Insightful
    You got the math reversed

    The longer the lines and the more content you have per line, the higher the likelihood of an error per line.

    As an example, with one error in 100 lines you get a 1% error rate. Imagine you could do the whole thing in one line. Now you have a 100% error rate.

    --
    Help fight continental drift.
    1. Re:Wrong Math by BigBadDude · · Score: 2, Informative

      yeah, that was actually my point. nice someone got it :)

      The source of most free software [KDE is an exception] tends to be smaller, more readable and more effective. Ever wondered why winword.exe is 10.598.984 bytes?

  40. In other news... I have begun testing by teamhasnoi · · Score: 4, Funny
    Apache 4.2 Alpha, a release that is yet to be even a twinkle in its Daddies' eyes. I have found a whole bunch of errors, bad comments, a few scribbles on napkins, some old Populous save games, and a letter to 'Mom' asking for money.

    I compared this to my 'other' server, for now unnamed.

    My 'other' server brought me coffee, 2 pieces of toast, 2 eggs OVER EASY, 4 strips of bacon, *and* Smucker's Grape Jelly with nary a misstep or hesitation. This other server smiled, asked how my wife was, and brought me a new fork when I dropped my first one.

    Congratulations, Gloria! You win the 'great server' award!

    This article isn't worth the 2 dollar tip.

  41. Here's an idea by Daath · · Score: 4, Funny

    Why doesn't Reasoning fill the niche, and code a completely error free web server? They know other people's mistakes, so they should know how to code an error free one.
    Well, seriously, I wouldn't put much in their obvious estimation.

    --
    Any technology distinguishable from magic, is insufficiently advanced.
  42. Don't assume IIS by m00nun1t · · Score: 5, Insightful

    Ok, IIS is the obvious choice as being the second most popular web server after Apache. But I hardly think Microsoft will be letting these guys all over the IIS source code.

    It could also be Zeus, SunOne or one of the other lesser known web servers out there.

  43. Apache 2 is not Apache 1 by defile · · Score: 2, Insightful

    The test may be more interesting if applied to Apache 1. As someone who has had to migrate a mod_perl site from Apache 1 to Apache 2, I can tell you that Apache 2 is a very new beast, and it doesn't shock me at all that there are dozens of bugs that still need to be shaken out. Fewer users are running Apache 2 in a production environment as well, since it's considered a development branch. See less eyeballs rule.

  44. Defect Details by Eustace+Tilley · · Score: 5, Informative
    Interested persons can download the full defect report free of charge.

    Some things I found interesting:
    1. Apache 2.1 (dev) is a mere 76,208 LOC.
    2. No memory leaks detected
    3. 29 NULL pointer dereferences
    4. 2 Uninitialized variables
    5. No bounds errors, no bad deallocs
    6. otherchild.c had a rate of 7 NULL pointer dereferences per KSLC


    7. One of the explanations (given by Reasoning) for a NULL pointer dereference is "can occur in low memory conditions," which I think means the original allocator did not check for malloc failure.

      So you can get a sense of what a defect looks like, here is #21. The original uses bold and fonts to improve readability, but I don't know how to reproduce that in slashcode:
      DEFECT CLASS: Null Pointer Dereference

      DEFECT ID 21

      LOCATION: httpd-2.1/srclib/apr/misc/unix/otherchild.c : 137

      DESCRIPTION The local pointer variable cur, declared on line 126, and assigned on line 128, may
      be NULL where it is dereferenced on line 137.
      PRECONDITIONS The conditional expression (cur) on line 129 evaluates to false.
      CODE FRAGMENT
      124 APR_DECLARE(void) apr_proc_other_child_unregister(void *data)
      125 {
      126 apr_other_child_rec_t *cur;
      127
      128 cur = other_children;
      129 while (cur) {
      130 if (cur->data == data) {
      131 break;
      132 }
      133 cur = cur->next;
      134 }
      135
      136 /* segfault if this function called with invalid parm */
      137 apr_pool_cleanup_kill(cur->p, cur->data, other_child_cleanup);
      138 other_child_cleanup(data);
      139 }
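
    A minimal sketch of how a dereference like the one flagged at line 137 could be guarded. It assumes a simplified stand-in for the APR list: `node_t`, `list_head`, `unregister_node` and `unregister_demo` are hypothetical names, not APR's API, and this is not necessarily how the Apache developers chose to handle it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for apr_other_child_rec_t; not the real APR type. */
typedef struct node {
    struct node *next;
    void *data;
    int cleaned;            /* set once the node has been "cleaned up" */
} node_t;

static node_t *list_head = NULL;

/* Walks the list like the fragment above, but returns -1 instead of
 * dereferencing cur when no node matches data. */
int unregister_node(void *data)
{
    node_t *cur = list_head;

    while (cur) {
        if (cur->data == data) {
            break;
        }
        cur = cur->next;
    }

    if (cur == NULL) {
        return -1;          /* invalid parameter: nothing to unregister */
    }

    cur->cleaned = 1;
    return 0;
}

/* Small usage example: an unknown pointer is rejected, a known one works. */
void unregister_demo(void)
{
    int known = 1, unknown = 2;
    node_t n = { NULL, &known, 0 };

    list_head = &n;
    assert(unregister_node(&unknown) == -1);   /* error, not a segfault */
    assert(unregister_node(&known) == 0);
    assert(n.cleaned == 1);
    list_head = NULL;
}
```

    Whether returning an error is better than the documented "segfault if this function called with invalid parm" behaviour is a design choice; the sketch only shows that the check itself is small.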
  45. Defects and maturity of code base by the+eric+conspiracy · · Score: 4, Insightful

    This study makes a lot of sense to me - that the defect rate is tied to the maturity of the code base. I have long felt that Microsoft's business model where they redo the operating system in order to churn their user base and induce cash flow will always result in more defects and security problems than a model where software change is driven on a solely technical basis.

    I think the next step for these folks would be to take a project that has a long history, say perhaps Apache 1.x and show defect rates over the life of the project.

  46. Null dereferences and uninitialized variables by ByTor-2112 · · Score: 2, Informative

    29 possible "null dereferences" and 2 possible "uninitialized variables". Some of them are simple "fail to check return value of malloc() for null", and others are not bugs in the code but bugs in the logic of the scanner. This is, of course, based on a cursory review of their document. All in all, these are absolutely minor bugs if they are real at all.

  47. Having read the reports.. by David+McBride · · Score: 4, Insightful

    Well, the reports simply state that, in the 360 files they checked (most of them header files) they found 29 cases of a potential NULL pointer dereference and 2 potentially uninitialized variables. This is from the Apache 2.1 codebase as of 31st Jan this year, about 58k lines of code.

    Their automated checker also searched for out-of-bounds array accesses, memory leaks, and bad deallocations. It found none.

    They also state that they ran the same checks against other codebases, and found that they did marginally better, on average.

    In short, this report says that OLD development code for an unreleased opensource project is nearly as good as current commercial offerings. That's at best, when you consider the huge gamut of possible defects that this checker won't pick up. That margin probably disappears in the +/- of the sampling if you were to do a proper statistical analysis.

    The report is fairly useless. It certainly should not be taken as a reason to not trust Apache; to do so would be foolhardy particularly given Apache's track record.

    Oh, and Reasoning's webserver is being pounded into the ground. You can get my local copy of the reports from here.

  48. That was not the conclusion: RTFA by arrogance · · Score: 2, Interesting

    As others have stated, the article states that "the difference in defect density between the two was not significant." Meaning that defect density, especially with such a small differential, has little bearing on the overall quality of the software. We know nothing of the severity, impact, etc of the defects: they could all be cosmetic for all we know. This is probably nothing more than a marketing strategy by Reasoning: publish a study without any details on a hotly debated topic and see how many people check out their site. It'd be nice if they had a downloadable version of their software to test drive.

    FxCop is an example of a "defect" or code analysis tool. While I have NO idea of Reasoning's methodology, I know that with FxCop (which is specifically for .NET code analysis), you have to set it up to filter out the majority of its rules or you'll get 3000 instances of "You didn't name this variable the way MS says you're supposed to." FxCop is extensible though. The point is, not a single poster on this page (unless they work for the companies involved) knows what Reasoning's methodology or rule set was when they did this so we can glean virtually zero value from this analysis. I look forward to 600 anti-Microsoft posts because of it though....

  49. Re:Code defects appear to be a small part of the e by jdh-22 · · Score: 5, Insightful
    Every hacker on the planet has full access to the code - which means that they can review it and find vulnerabilities in it. Not many people have access to Windows or IIS code.
    To quote Bruce Schneier: "If I had a letter, sealed it in a locked vault and hid the vault somewhere in New York, then told you to read the letter, that's not security, that's obscurity. If I made a letter, sealed it in a vault, gave you the blueprints of the vault, the combinations of 1000 other vaults, access to the best locksmiths in the world, then told you to read the letter, and you still can't, that's security." Open source does have an upper hand on holes and bugs, but the code isn't where we should be looking.

    The majority of the security holes come from the people setting up the web servers. The holes are usually abused by "wanna-be" hackers, or script-kiddies. The problem is that people are not educated enough to run some of these programs. Being able to understand Apache, and how to make it operate correctly, is not everyone's top priority. As long as it works, people don't care how it works (as goes for many other things in this world).
    --
    Every Super Villan uses Linux.
  50. sorry, but thats pure BS... by BigBadDude · · Score: 3, Informative


    One of the explanations (given by Reasoning) for a NULL pointer dereference is "can occur in low memory conditions," which I think means the original allocator did not check for malloc failure.


    Apache has its own malloc() that kills the child (and closes the connection) if it fails to allocate enough bytes.
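
    The pattern described here, an allocator that never returns NULL because it terminates the process on failure, can be sketched as follows. `alloc_or_die` is a hypothetical name used for illustration; this is not Apache's or APR's actual allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper, not Apache's or APR's actual allocator. */
void *alloc_or_die(size_t size)
{
    void *p;

    /* malloc(0) may legitimately return NULL, so rule it out up front. */
    assert(size > 0);

    p = malloc(size);
    if (p == NULL) {
        fputs("out of memory, aborting\n", stderr);
        abort();    /* the process dies here; callers never see NULL */
    }
    return p;
}
```

    With a wrapper like this, a static checker that only looks at the caller has no way to know the NULL branch is unreachable, which would explain reports of the "can occur in low memory conditions" kind.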

    1. Re:sorry, but thats pure BS... by Eustace+Tilley · · Score: 2, Informative
      Hmm, Defect 10 is a little trickier:
      DEFECT CLASS: Null Pointer Dereference DEFECT ID 10
      LOCATION: httpd-2.1/modules/mappers/mod_negotiation.c : 2495
      DESCRIPTION The local pointer variable arr, declared on line 2349, and assigned on line 2365, may be NULL where it is dereferenced on line 2495. This NULL pointer dereference only happens in an Out Of Memory context.

      PRECONDITIONS The conditional expression (neg->send_alternates && neg->avail_vars->nelts) on
      line 2364 evaluates to true AND
      The function apr_array_make, called on line 2365, returns NULL AND
      The conditional expression (neg->send_alternates && neg->avail_vars->nelts) on
      line 2494 evaluates to true.

      CODE FRAGMENT
      2336 static void set_neg_headers(request_rec *r, negotiation_state *neg,
      2337 int alg_result)
      2338 {
      ...
      2349 apr_array_header_t *arr;
      ...
      2364 if (neg->send_alternates && neg->avail_vars->nelts)
      2365 arr = apr_array_make(r->pool, max_vlist_array, sizeof(char *));
      2366 else
      2367 arr = NULL;
      ...
      2494 if (neg->send_alternates && neg->avail_vars->nelts) {
      2495 arr->nelts--; /* remove last comma */
      2496 apr_table_mergen(hdrs, "Alternates",
      2497 apr_array_pstrcat(r->pool, arr, '\0'));
      2498 }
      2499
      2500 if (neg->is_transparent || vary_by_type || vary_by_language ||
      2501 vary_by_language || vary_by_charset || vary_by_encoding) {
      2502
      2503 apr_table_mergen(hdrs, "Vary", 2 + apr_pstrcat(r->pool,
      2504 neg->is_transparent ? ", negotiate" : "",
      2505 vary_by_type ? ", accept" : "",
      I traced through the code on lxr.webperf.org and it appears that pool_alloc can return NULL.

      Is the idea that this code will never be executed in an out-of-memory condition, because it is only executed by a child, and the child dies automatically on malloc failure?
  51. It's all in how you calculate a defect by sterno · · Score: 3, Insightful

    The thing that always kills IIS is the integration it has with Windows. This isn't a defect in IIS, or Windows, per se, but rather a defect that arises because of how they integrate with each other. A script executes on IIS in a way that's not innately a bug, but then when it interacts with Windows, Exchange, etc, suddenly it becomes one.

    Apache is just a webserver, and that's all. PHP, JSP, etc, are all separate applications treated separately. The integration does make things more efficient, yes, but also more prone to problems.

    --
    This sig has been temporarily disconnected or is no longer in service
  52. Re:Code defects appear to be a small part of the e by AftanGustur · · Score: 2, Interesting


    Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?

    Microsoft will try to make people believe whatever is in their interests .. Even if it means contradicting themselves ..

    Last Friday Microsoft called all their Premier customers in France with "information" related to the upcoming "hackerfest" last Sunday.

    According to Microsoft mostly Unix and Linux servers would be the target of the hackers but it did not exclude IIS Web servers to come under attack.

    The FUD coming from MS is absolutely unbelievable..

    --
    echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
  53. Something is wrong here... by XaXXon · · Score: 2, Insightful

    I have to play the BS card here.

    There is no magic "defect detector" for software. If there was such a thing, they would be making a helluva lot more money than they get for doing little defect tests.

    It is very difficult to prove a program to be correct, and there's a lot of REALLY smart people who have tried.

    Maybe these people have stuff that can look for buffer overflows and stuff, but actually being able to tell if Apache is returning the correct results requires far more than generic tests.

    And I'll all but guarantee they didn't get together an entire development team to understand the code base and how it works as apache is a very large and complex code base.

    Maybe they take what they find with their generic tests and extrapolate that if they find more generic problems there are probably more specialized errors as well, but they make it very clear in the report that the difference between .51 and .53 defects / KLoC (thousand lines of code) is statistical noise.

    Anyways, I'm not saying the entire thing is worthless, just not to read too much into it -- either this one that puts Apache slightly behind some unnamed commercial implementation or the one that put the Linux TCP/IP stack ahead of some other commercial implementation (though I'd say it would probably be easier to test a TCP/IP for correct behaviour than a web server).

  54. Re:Confuse with Linux? by bofkentucky · · Score: 2, Informative

    Now they do: 2.0.x are stable production releases, 2.1.x are testing branches.

    --
    09f911029d74e35bd84156c5635688c0
  55. Here are the links to the defect reports by arrogance · · Score: 5, Informative
    Defect Report

    Metric Report

    They make you fill out a form that asks for your email and then do an opt out checkbox at the bottom of the form (you have to check it to NOT get spam from them). The site's a bit slashdotted right now though.

  56. Re:Magic software by Eustace+Tilley · · Score: 2, Insightful
    Ok, pretend you are the magic software and you see this code:
    int ar[50];
    for (int i = 0; i<=50; i++) { ar[i] = 1;}
    How are you going to "automatically" fix that? Change the comparison operator? Change the array size? Replace the loop with a library function?

    "Fixing" requires understanding the code's intent.
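
    To make the question concrete, here is a small sketch of two of the candidate repairs for the off-by-one loop above. Both compile cleanly, but they write a different number of elements, so a tool cannot pick one without knowing what the author meant. The names `AR_LEN`, `fill_keep_size`, `fill_grow_array` and `compare_repairs` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define AR_LEN 50

/* Repair 1: keep the array size, tighten the comparison to i < AR_LEN. */
void fill_keep_size(int ar[AR_LEN])
{
    for (size_t i = 0; i < AR_LEN; i++) {
        ar[i] = 1;
    }
}

/* Repair 2: keep the i <= 50 loop, grow the array to 51 elements. */
void fill_grow_array(int ar[AR_LEN + 1])
{
    for (size_t i = 0; i <= AR_LEN; i++) {
        ar[i] = 1;
    }
}

/* Shows that the two repairs are not interchangeable. */
void compare_repairs(void)
{
    int a[AR_LEN + 1] = {0};

    fill_keep_size(a);
    assert(a[AR_LEN - 1] == 1 && a[AR_LEN] == 0);   /* 50 elements written */

    fill_grow_array(a);
    assert(a[AR_LEN] == 1);                         /* 51 elements written */
}
```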
  57. This is a dupe by presroi · · Score: 2, Informative

    This Slashdot-Posting was featuring the same PR from Reasoning.

  58. Lies, damned lies, and statistics by UnknowingFool · · Score: 4, Insightful
    Numbers can mean anything. It's the interpretation that matters. 31 errors in 58,944 lines. Hmmm. Even if we take Reasoning's word that these are errors and not "features", that's 0.53 error rate. The unnamed commercial software had an error of 0.51. So what does that prove?

    1) Apache 2.1 has more bugs than some unknown commercial competitor. If the version is correct, a development (not-ready-for-release) build was pitted against a released commercial build. Not fair playing ground.

    2) Reasoning does not detail the severity or kind of the bugs. Certainly, a web server not being able to handle a type of format (pdf, csv, ogg vorbis) is less severe than a security hole. Pitted against IIS, I would trust Apache even if it had more bugs, because historically it has had fewer security patches. Check out Apache's 2.0 known patches vs IIS 5.0
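
    For anyone who wants to check the arithmetic, the density figure is just defects divided by thousands of lines. A minimal sketch, with `defects_per_kloc` being a name made up here: 31 defects in 58,944 lines works out to roughly 0.526, which rounds to the quoted 0.53.

```c
#include <assert.h>

/* Defects per thousand lines of code (KLOC). */
double defects_per_kloc(long defects, long lines)
{
    assert(lines > 0);      /* a density over zero lines is meaningless */
    return (double)defects * 1000.0 / (double)lines;
}
```

    Calling `defects_per_kloc(31, 58944)` reproduces the Apache figure; the number says nothing about which defects are real or how severe they are.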

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  59. Re:every program. by lucas_gonze · · Score: 2, Insightful

    that's not just reductio ad absurdum, it's actually useful. you should always write the least code possible, and since features mean code, you should have as few features as you can get away with.

  60. Re:Code defects appear to be a small part of the e by MisterFancypants · · Score: 2, Insightful
    Every hacker on the planet has full access to the code - which means that they can review it and find vulnerabilities in it.

    Do you know how long it takes to read someone else's code on something like an Apache-level webserver and understand it to the point where you can make useful changes and fixes? The big lie of the "all bugs are shallow" argument is that such a thing is simple, when in fact it is not.

    Fixing a non-obvious bug in a 100k or so line C or C++ project is hard enough when you wrote the code yourself. If someone else wrote the code, it is harder still.

  61. RTFAdvertising by tanguyr · · Score: 4, Insightful

    As has been pointed out a couple of times in other comments, 2.1 is the development branch of the Apache web server, i.e. "beta", "buggy", "work in progress", etc. Instead of reading this as "Apache has roughly as many defects as closed source web servers", let's read this as "the development version of Apache has as many defects as... well, some unidentified (beta? shipping?) version of some unknown (iPlanet? IIS?) web server". But you can be *much* more confident that these defects will be fixed in Apache than in the *other* product.

    Heck, forget confidence - YOU CAN JUST CHECK.

    The fact that Reasoning didn't have to go and get permission from Apache to run this test - coupled with the fact that we don't even know what Apache is being compared to - is the *real* point behind this "article". /t

    ps: IANAL but don't they have to include a copy of the Apache License given that they publish fragments of the source code in their defect report?

    --
    #!/usr/bin/english
  62. Some "defects" aren't really... by peerogue · · Score: 2, Interesting

    Look at defect ID #26 in the report.

    You'll see that this can only happen when nItems is 0. This means that if a pre-condition was added to the routine tsort() that the nItems argument MUST be strictly positive, defect #26 vanishes.

    If I'd put:

    assert(nItems > 0);

    at the routine entry, it would prevent the later null-pointer dereference and flag the bug immediately when it occurs. I'm not sure how well a crashing web server would be perceived, but it would be no worse than a kernel panicking, and there is indeed a potential bug there.

    My point is that to call #26 a defect (or not), we'd have to check all the callers, and if all the callers were to guarantee that nItems is strictly positive, then there would be no bug at all.
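
    A minimal sketch of that precondition idea follows. It is not the tsort() from the report: link_and_peek and struct node are made-up names for illustration, and the only thing carried over is the assumption that the pointer can be NULL only when nItems is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; not taken from the Apache source. */
struct node {
    int value;
    struct node *next;
};

/*
 * Links nodes[0..nItems-1] into a singly linked list and returns the
 * value stored in the head node. Illustrative stand-in for a routine
 * whose pointer can only be NULL when nItems is 0.
 */
int link_and_peek(struct node *nodes, int nItems) {
    assert(nItems > 0);  /* precondition: callers must pass at least one item */

    struct node *head = NULL;
    for (int i = nItems - 1; i >= 0; i--) {
        nodes[i].next = head;
        head = &nodes[i];
    }

    /* The loop ran at least once, so head cannot be NULL here. */
    return head->value;
}
```

    With the assert in place, a call with nItems == 0 stops at the routine entry in a debug build instead of reaching the dereference. Note that assert() is compiled out when NDEBUG is defined, so in a release build it documents the contract rather than enforcing it.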

    Apart from this remark, I think that kind of work is really great. I'd love to see it applied to my favorite open-source Linux Gnutella client (all Gnutella clients are by definition an HTTP client/server). We'd see how a small open-source project compares to a big one.

  63. sco! by Anonymous Coward · · Score: 2, Funny

    The lower defect rate in Linux TCP/IP can only be explained by a large chunk of more mature, commercial, stable SCO UNIX code.

  64. Defect is too strong a word... by Bazman · · Score: 4, Insightful
    Take the null pointer dereferencing thing. All this program seems to do is see if there's a possible path for null-pointer dereferencing. It has no clue as to whether this is logically going to happen. For example:
    2815 while (1) {
    2816 ap_ssi_get_tag_and_value(ctx, &tag, &tag_val, 1);
    2817 if ((tag == NULL) && (tag_val == NULL)) {
    2818 return 0;
    2819 }
    2820 else if (tag_val == NULL) {
    2821 return 1;
    2822 }
    2823 else if (!strcmp(tag, "var")) {
    2824 var = ap_ssi_parse_string(r, ctx, tag_val, NULL,
    2825 MAX_STRING_LEN, 0);
    The software claims that tag could be NULL on line 2823. But that's only if, on return from ap_ssi_get_tag_and_value, tag is a NULL pointer and tag_val is non-NULL. If ap_ssi_get_tag_and_value can't return that combination then this is not a defect. If anything it's a red flag, in case the return values of ap_ssi_get_tag_and_value could ever satisfy that condition.

    I suspect the following code will be flagged as a defect:

    char *tag=NULL;
    doOrDie(&tag);
    strcmp(tag,"do");
    As long as doOrDie() does its job and never returns a NULL, where's the defect? The guys who wrote this tester seem to want you to check every pointer against NULL before dereferencing it. I might already be doing this in my doOrDie() function, and I don't want to have to do it twice.
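
    To make the argument concrete, here is a hedged sketch of a callee contract that rules out the flagged path. Everything in it is hypothetical: struct pair, get_tag_and_value and classify are illustration names, and this is not the real ap_ssi_get_tag_and_value, whose actual guarantees would have to be checked in the Apache source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed directive; either field may be missing. */
struct pair {
    const char *tag;
    const char *val;
};

/*
 * Hypothetical parser with an explicit contract:
 * if *tag comes back NULL, then *val comes back NULL as well.
 */
static void get_tag_and_value(const struct pair *src,
                              const char **tag, const char **val) {
    *tag = NULL;
    *val = NULL;
    if (src == NULL || src->tag == NULL) {
        return;              /* nothing parsed: both stay NULL */
    }
    *tag = src->tag;
    *val = src->val;         /* may be NULL: a tag without a value */
}

/* Returns 0 at end of input, 1 for a missing value, 2 for "var", 3 otherwise. */
int classify(const struct pair *src) {
    const char *tag;
    const char *val;

    get_tag_and_value(src, &tag, &val);
    if (tag == NULL && val == NULL) {
        return 0;
    } else if (val == NULL) {
        return 1;
    }
    /* val is non-NULL here, so by the contract tag is non-NULL too. */
    assert(tag != NULL);
    return strcmp(tag, "var") == 0 ? 2 : 3;
}
```

    A checker that looks only at classify() sees a path where tag is NULL at the strcmp() call. Whether that path is real depends entirely on what the callee promises, which is why such a finding is better treated as a question for a reviewer than as a confirmed defect.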
    1. Re:Defect is too strong a word... by DrInequality · · Score: 3, Interesting
      Defect is way too strong. Take Defect 1. It can only dereference a NULL pointer if a number of preconditions are all true. The last one is that (!conf->providers), where conf->providers is the pointer in question, must be false.

      !!conf->providers => conf->providers => conf->providers != NULL

      Their program has detected "defects" where there are none. Perhaps the greater coding style variation on open source projects exposes more defects in their automated program!

  65. Re:Code defects appear to be a small part of the e by AftanGustur · · Score: 2, Interesting


    Maybe that's because the majority of web servers are running on Unix/Linux?

    True, but according to statistics 56% of defaced webservers run Microsoft IIS, and (only) 34% Apache..

    This is not brand new data, but it is the latest I can find... And if Microsoft had some stats showing different results, you can be sure they would publish them.

    The competition was about defacing 6000 webservers in 6 hours, so one would tend to conclude from the above that Microsoft IIS would be the primary targets..

    --
    echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
  66. Useless information presented confusingly by albin · · Score: 4, Interesting

    Slashdot's summary of this article is way off base, and the article itself couldn't be less useful. Counting the number of "errors" in lines of code... and the ratio is supposed to mean something to us? As compared to unnamed other software? C'mon, I have better things to do with my time.

    *plonk*

    --
    A hen is only an egg's way of making another egg. -- Samuel Butler
  67. OSS Standards by pmiller396 · · Score: 2, Insightful

    Okay, we've beat to death the fact it was a pre-release version. But look at it this way:

    When Open Source software is about the same quality as closed source, the developers consider it unstable and warn people that they may run into problems.

    It shows a big difference, to me, in the quality standards that OSS developers (and users) expect.

  68. Null pointers and uninitialized variables by mystran · · Score: 2, Insightful
    I don't know, probably some of these defects might be actual problems, but unless the software is real good, it's always possible that certain cases never happen, although automatic software can find "defects".

    As a rather "stupid" example, I had to initialize a Map to an empty HashMap just last week to get Sun's Java compiler to accept my code, although the only two references to the Map were within two if-blocks, within the same function, both of which depended on the same boolean value, which wasn't changed anywhere in the function.

    There's a difference between a defect and a bug. Tools that help in finding problems are great, but after all, they can only point out possibly unsafe spots. Of course it's good to write code that doesn't trigger any such possibilities in the first place.

    --
    Software should be free as in speech, but if we also get some free beer, all the better.
  69. Re:Code defects appear to be a small part of the e by bwt · · Score: 3, Interesting


    One of the best ways to get to know a large code base like Apache or something else is to find a repeatable bug and track it down. To fix a bug you do not need to understand the whole program, just the relevant parts. I've submitted bug fixes to several projects, so I must strenuously disagree, especially because, ahem, I have never submitted a bug fix to a proprietary project because it's impossible.

  70. Thank you, Captain Obvious by Sxooter · · Score: 2, Funny

    Well, this certainly falls under the "duh" category. Freshly written code tends to have more bugs than older, well reviewed, well tested code.

    Wow, next we'll learn how you shouldn't buy any Ford, GM, or Chrysler product in the first year of production.

    --

    --- It is not the things we do which we regret the most, but the things which we don't do.
  71. Coding errors & program logic errors by MROD · · Score: 3, Insightful

    Of course, this test of the code is purely a test of coding errors rather than errors in the code logic.

    The most worrying errors in programs are generally not coding errors as they are either terminal (ie. crash) or they are benign (the error may cause memory corruption in a place where it does no harm). Of course, there are exceptions such as buffer overflows, but I'd class those, in general, into the logic error category.

    Logic or algorithmic errors are far more dangerous as they can be well hidden and are more likely to make the code do unintended things. The code itself may be perfect, but if the algorithm is faulty then there's a major problem.

    --

    Agrajag: "Oh no, not again!"
  72. Re:FACT: 3 is a larger number than 2 by bwt · · Score: 2, Insightful

    I agree completely. Any metric based on lines of code is a harmful metric. Any metric based on defect counts is also harmful. Both of these are leftovers from attempts to (mis)apply statistical process control. Controlling crappy metrics gives crappy quality.

    Suppose I had 100K lines of code with 100 defects, or 1.0 per KLOC. After reviewing my code I discovered that I could refactor it to 80K lines, and suppose further that doing so had no effect on the defect count. Defects per KLOC would rise to 1.25, so the metric would look worse after an improvement.

    Also, given that this is an automated program, I have to ask how they calibrate and validate its results. How many of the 31 errors found actually aren't errors? How many existing known bugs were not found by this program? I really can't accept these results as anything more than fluff with numbers.

  73. Development release by Door-opening+Fascist · · Score: 3, Insightful

    Why did they use the development branch of Apache, when only a handful of sites are running it? I would have found an analysis of the stable 1.3 branch, which 60% of the web-serving world uses, to be more informative.

    1. Re:Development release by sabat · · Score: 3, Insightful

      Why did they use the development branch of Apache

      Let me restate this: why are they comparing pre-alpha software with production releases?

      Simplest answer: because they wanted to find flaws. The second most popular web server software is IIS. This looks like a Microsoft tactic: anonymously hire this company to "evaluate" code so that the results look unbiased. Everyone will likely realize that the competitor is Microsoft's IIS, so it doesn't need to be stated bluntly. MS wins; another (small) battle for mindshare is won.

      --
      I, for one, welcome our new Antichrist overlord.
  74. BINGO by Anonymous Coward · · Score: 3, Informative

    In almost every case they listed the pathway was via a failed malloc.

    Apache has its own malloc that kills the connection (and the child) if it fails.

    That code can never be reached. Their test is invalid.
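
    For anyone unfamiliar with the pattern, here is a rough sketch of an allocator that never hands NULL back to its caller. This is not Apache's allocator or its API; alloc_or_abort and dup_string are hypothetical names, and abort() merely stands in for "kill the connection and the child".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wrapper: either returns usable memory or terminates
 * the process. Callers never see a NULL return value.
 */
void *alloc_or_abort(size_t n) {
    void *p = malloc(n ? n : 1);   /* avoid the implementation-defined malloc(0) */
    if (p == NULL) {
        fputs("out of memory\n", stderr);
        abort();
    }
    return p;
}

/* Caller written against that guarantee: no NULL check after allocating. */
char *dup_string(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = alloc_or_abort(len);
    memcpy(copy, s, len);          /* copy cannot be NULL here */
    return copy;
}
```

    An analyzer that treats alloc_or_abort() like plain malloc() would report the memcpy() as a possible NULL dereference, even though that path cannot be taken. Whether this describes most of the reported Apache defects is the parent's claim, not something this sketch establishes.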

  75. Apache 1.3? by Spazmania · · Score: 4, Interesting

    First, as many posters have noted, Reasoning DID NOT TEST APACHE 2.1. They tested Apache 2.1-dev. That's dev, as in development branch. As in: I have new untested code, so don't use me on a production server until I'm released in the STABLE series.

    For a valid comparison versus commercial software, the testers should have used Apache 2.0.46, the most current STABLE series release.

    Second, I'd be interested to see a comparison of 2.0.46 versus 1.3.27. I have a pet theory that multithreaded C code has more bugs than single-threaded C code, and I'd like to see whether there is evidence to support it.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    1. Re:Apache 1.3? by Piquan · · Score: 2, Insightful

      I keep hearing this, and I'm not convinced.

      I didn't see anything in the article about what versions of closed-source codebases they used for comparison. But I'd hypothesize that it's code that they've been contracted to analyze. That means it's probably development code in that event, too.

      We can't gritch about them using Apache 2.1-dev unless we have reason to believe they didn't compare against dev versions. We can gritch about not having this information.

  76. Re:Code defects appear to be a small part of the e by jdh-22 · · Score: 3, Interesting

    You have the wrong idea here. There is a point at which you must work out what information you can release without compromising the security of your system. While I can give you the plans to my vault, I will not give you the combination, nor the first or second numbers in it.

    For the star wars geeks out there, if you were a Jedi, you don't go around telling everyone you're a Jedi, nor do you flash your light saber in public places. They do realize when to show their light saber, and when they can tell people they are a Jedi. Nor do they not tell anyone who they are, or never show their lightsaber.

    You might want to check out Secrets and Lies, which will give you a better understanding of security philosophy.

    --
    Every Super Villan uses Linux.
  77. Re:Code defects appear to be a small part of the e by johnnyb · · Score: 3, Informative

    Actually, I've found that fixing bugs in large projects is about the same whether or not you are familiar with the project, provided that the author was not smoking crack at the time he wrote it.

    For example, I managed to code, test, and patch a "fix" for PostgreSQL this weekend in under 2 hours, having never seen the code before.

    The "fix" wasn't for a bug, per se; it's just that the output of pg_dump wasn't optimal in my usage for dumping the schema for CVS revision control. I added two flags, -m and -M, which molded the output to my liking.

    If you haven't seen your code in two months, you and an outsider have about the same chance at finding and detecting bugs/misfeatures.

  78. Errors mean nothing... by Foofoobar · · Score: 2, Insightful
    Errors in coding mean next to nothing when it is a machine that is checking the syntax of your code. Variations in coding techniques that are perfectly acceptable often show up as errors merely because the program doing the code checking does not understand your syntax. I've seen it happen time and again with error checkers and one could even say that 2% of all errors found by error checkers are mere differences in syntax.

    My wife who is a lead QA tester could vouch for that...

    --
    This is my sig. There are many like it but this one is mine.
  79. Hmm, the first claim seems to be wrong... by marcink1234 · · Score: 2, Insightful

    I have just read the first 'null dereference' claim and it seems to me that in fact it is not possible. Maybe we got a count of Reasoning's own bugs?

  80. Re:Code defects appear to be a small part of the e by schon · · Score: 3, Insightful

    Every time I hear the "obscurity is not security" mantra I chuckle. Of course it isn't, but that doesn't make publishing the information a good idea.

    Nobody's saying that the information should be published - what they're saying is that you can't rely on that information being a secret.

    Is Fort Knox secure? Probably. If so, then why don't they publish the blueprints, guard rotation schedule and security policies?

    That's pretty much the point you're missing - even if that information was published, it wouldn't diminish the security of Fort Knox..

    If the people in charge relied on the fact that they don't publish those details, that would be obscurity, because it would lead them to make errors elsewhere. (Oh, it's OK if we leave the main vault open tonight - nobody knows that there will be no guards around it for 10 minutes at 3:30 AM tonight.)

  81. Re:Code defects appear to be a small part of the e by aziraphale · · Score: 5, Interesting

    One word: architecture.

    And not just the architecture of the web server, but the architecture of the entire platform. But specifically looking at the architecture of Apache versus the architecture of IIS, you'll immediately see that the goals of the two pieces of software are not the same. Look at things like IIS's metabase - the structural details of the server's configuration are kept in an in-memory data structure, which is easily modified while the server is running. Apache, in contrast, reads its configuration at startup, and uses it to determine which modules of code are loaded, and how they are used to process requests - fixing the behavior of the web server at startup.

    IIS follows typical MS enterprise software design - it has to interface with COM, and the NT security model, and active directory, and the registry, and a million other systems, all in the name of integration, and enterprise management. Apache doesn't have PHBs telling it that it needs another way for the metabase to be edited, or a new instrumentation API, or whatever else a particular large customer asked for - and can get on with just providing its facilities cleanly.

    That's why IIS has so many more security holes, even if it does (as may or may not be the case) have the same raw coding error rate as Apache.

  82. Microsoft C++ catches this. Doesn't gcc? by Phronesis · · Score: 2, Informative
    This lets the compiler catch errors where you meant '==' rather than just '='.

    MY compiler (Microsoft C++) does catch this

    if (myPointer = NULL) { ... }
    and issues a warning. Doesn't gcc?
    1. Re:Microsoft C++ catches this. Doesn't gcc? by Rasta+Prefect · · Score: 4, Informative
      This lets the compiler catch errors where you meant '==' rather than just '='.
      MY compiler (Microsoft C++) does catch this

      if (myPointer = NULL) { ... }
      and issues a warning. Doesn't gcc?


      Yes, it does. So does every other C compiler I've ever used (quite a few). I suspect the original poster may be the sort who ignores warnings....
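
      For reference, a small sketch of the typo and the two usual ways of writing the test. The function names are made up for illustration. The warning text quoted in the comment is the one gcc prints under -Wall (its -Wparentheses check); other compilers word it differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The typo under discussion, left in a comment so this file compiles cleanly:
 *
 *     if (p = NULL) { ... }
 *
 * It assigns NULL to p and then tests the result, so the branch never runs.
 * gcc -Wall reports: "suggest parentheses around assignment used as truth value".
 */

/* What was meant: a comparison. */
int is_null(const char *p) {
    return p == NULL;
}

/* Constant-first style: mistyping this as 'NULL = p' does not compile at all. */
int is_null_const_first(const char *p) {
    return NULL == p;
}

/* Both spellings agree; call this from main() to check. */
void self_check(void) {
    assert(is_null(NULL) && is_null_const_first(NULL));
    assert(!is_null("x") && !is_null_const_first("x"));
}
```

      The constant-first form turns the typo into an error even with warnings switched off, which is presumably what the quoted "this lets the compiler catch errors" refers to. With warnings enabled and actually read, either style is safe.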

      --
      Why?
  83. Worst kind of science by AYEq · · Score: 4, Interesting

    Reasoning's code inspection service is based on a combination of proprietary technology and repeatable process.

    Am I the only one who looks at Reasoning's results with suspicion (even when I agree with them)? Any analysis using methods that are not open and repeatable is not science. This just feels like marketing to me. (It is sad, because the study of code quality is such a worthwhile pursuit.)

  84. what doesn't kill you makes you stronger by f00zbll · · Score: 2, Insightful

    The report hardly takes down OSS or Apache. The report is reasonable and doesn't over-extrapolate about quality. For me, the report is encouraging: MS has something like 80 programmers working on IIS, while Apache is made up of volunteers with far fewer resources, so this is pretty darn impressive for alpha code. I haven't looked at the list of active committers lately, but I know it's nowhere near 80. Draw your own conclusions.

  85. everyone is reading this wrong by Major+Tom · · Score: 2, Insightful

    There is no need to freak out about this being some sort of attack on open source software or agonize over what the unnamed commercial product used for comparison was.

    The article seems to indicate that the .51 error density for "commercial software" is talking about commercial software in the abstract. Presumably, this isn't the error density of some secret web server, but the average density of all the commercial products they've analyzed so far.

    This report is simply an attempt to prove a simple hypothesis about OSS: it gets increasingly refined as it matures.

    Reasoning believes they've proved the hypothesis because Apache, a middle-aged project, I suppose, has an error density comparable to commercial software, while the TCP/IP stack, a mature project, has a significantly lower density.

    This isn't intended to be a comparison of web servers (come on, people, *of course* they didn't have access to IIS); it is intended to be a mildly interesting observation about the life cycle of open source software.

    It would be a lot more interesting if we could see an analysis of whether or not commercial software goes through a similar maturing process. Maybe commercial products also grow refined with age. Maybe not. If so, which matures faster?

    --
    What's good for the syndicate is good for the country. --Milo Minderbinder
  86. This sooo does not matter by LilMikey · · Score: 2, Insightful

    This is a pointless study. Yes, the slight possibility that one may dereference a NULL pointer is a bad thing, but it's minuscule compared to bad design. A perfectly programmed web server designed poorly will have bazillions more bugs and security flaws than a slightly bugged, well-designed one. An objective code-scanning bug-finder can't fix stupid.

    --
    LilMikey.com... I'll stop doing it when you sto
  87. prove it. by Mark19960 · · Score: 4, Interesting

    They don't say what they used for a comparison.
    When they tell us what they used, then I will believe it.
    This smells of Microsoft.

    bring it on! we want to know what it was compared against, sure as hell was NOT IIS...

  88. within the statistical margin of error by dh003i · · Score: 2, Insightful

    0.53 errors per 1000 lines for Apache, vs. 0.51 per 1000 for "commercial equivalents" (note that they fail to say how many equivalents were used to generate the average, or which ones)? That's definitely within the margin of error. Not only that, but Apache is a less mature FS/OSS project, so the comparison seems to favor the FS/OSS model.

    Furthermore, while presumably many commercial equivalents were used to generate the commercial average, only one project, Apache, was used to generate the FS/OSS average error density. Again, very crappy statistics.

    Even if 100 different FS/OSS projects like Apache were used to generate that 0.53 average, and 100 different commercial equivalents used to generate the commercial average, it's probably still within the margin of error (or standard deviation).

    In short, this study = completely insignificant. Likewise, so was their previous study showing that FS/OSS has a lower bug-density, as it only used one FS/OSS project. To get useful statistics, you need hundreds of data-points -- not one.

  89. The first "defect" is provably not a defect at all by Anonymous Coward · · Score: 3, Informative

    Looking at their first "bug", a little manual inspection shows that it's in the "can't happen" category, even without knowing about hidden information. The code looks like this:

    current_provider = conf->providers;
    do {
    {some safe code}
    if (!conf->providers) {
    break;
    }
    current_provider = current_provider->next;
    } while (current_provider);

    and they identify the second-to-last line as the "possible NULL pointer reference". Note that the "break" before that line will be taken if the pointer is NULL, so it can't happen. In fact, the static analysis could have determined this if it were a little better at propagating values.
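
    Here is a compilable sketch of that loop shape, for anyone who wants to step through it. The struct layouts and the name walk_providers are assumptions made for illustration; only the control flow follows the fragment quoted above, and the real types are in the Apache source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Apache types. */
struct provider {
    struct provider *next;
};

struct config {
    struct provider *providers;
};

/* Walks the provider list the same way the quoted loop does and
 * returns how many passes the loop body made. */
int walk_providers(const struct config *conf) {
    const struct provider *current_provider = conf->providers;
    int passes = 0;

    do {
        passes++;                      /* stands in for "{some safe code}" */
        if (!conf->providers) {
            break;                     /* empty list: leave before touching the pointer */
        }
        /* First pass: equals conf->providers, which is non-NULL here.
         * Later passes: the while condition already checked it. */
        assert(current_provider != NULL);
        current_provider = current_provider->next;
    } while (current_provider);

    return passes;
}
```

    With an empty list the body runs once and exits through the break. With two providers it runs twice and exits through the while condition. The assert never fires on either path, which matches the "can't happen" reading, at least for this simplified model.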

    First conclusion: subtract at least one "bug" from the 31 defects in Apache. This lowers the rate to 0.51, the same as the "average commercial code" number they quote. Yahoo!

    Second conclusion: their static analysis must identify a lot of false positives, if the very first one in the list is one (I would look at more, but I should really get back to work...)
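
    The arithmetic behind the first conclusion, as a tiny sketch. The function name defects_per_kloc is made up here; the 58,944-line figure is the one quoted elsewhere in this thread, not something taken from Reasoning's report directly.

```c
#include <assert.h>

/* Defects per thousand lines of code (KLOC). */
double defects_per_kloc(int defects, int lines) {
    assert(lines > 0);   /* a density over zero lines is meaningless */
    return 1000.0 * defects / lines;
}
```

    defects_per_kloc(31, 58944) comes to roughly 0.526, which rounds to the reported 0.53. defects_per_kloc(30, 58944) comes to roughly 0.509, which rounds to 0.51. A single false positive is enough to erase the reported gap.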