Slashdot Mirror


Software Code Quality Of Apache Analyzed

fruey writes "Following Reasoning's February analysis of the Linux TCP/IP stack (putting it ahead of many commercial implementations for its low error density), they recently pitted Apache 2.1 source code against commercial web server offerings, although they don't say which. Apparently, Apache is close, but no cigar..."

37 of 442 comments

  1. Code defects appear to be a small part of the equa by mao che minh · · Score: 4, Insightful
    I suppose now we have to question the severity of the defects (and also factor in the implementation and use of the code). If Apache and, say, IIS are roughly equivalent in terms of code defects, you have to ask yourself "well, why does IIS have so many more general problems and security flaws than Apache, when they both carry the same general amount of coding defects?". Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?

    But here's the kicker: the vast majority runs Apache on either BSD or Linux. All of this code, from the kernel to the library that tells Apache how to use PHP, is open source. Every hacker on the planet has full access to the code - which means that they can review it and find vulnerabilities in it. Not many people have access to Windows or IIS code. So why do IIS and Windows come out as far less secure, and get exploited so much more?

    I think the answer lies in the severity of the code defects, and the architecture and design of the operating system that powers the web server. And yes, I know that Apache can run on Windows.

  2. Wait a second by Knife_Edge · · Score: 3, Insightful

    Has Apache 2.1 been released as a stable, non-developmental release? If not I would say testing it for defects is a bit premature.

  3. 2.1 ? by Aliencow · · Score: 4, Insightful

    Wouldn't that be unstable? I thought the latest was 2.0.46 or something. If I'm not mistaken, it would be a bit like saying "FreeBSD 4.8 has fewer bugs than Linux 2.5!"

  4. What do reasoning do? by SystematicPsycho · · Score: 4, Insightful

    So basically they offer a service like lclint, only many times more advanced? What is to say they haven't missed anything?

    This is probably a publicity stunt for them although a good one. I think it would be a good idea for them to sell software suites of their product if they don't already.

    --
    Analytic & algebraic topology of locally Euclidean metrization of infinitely differentiable Riemannian manifold
  5. FACT: 3 is a larger number than 2 by TheRaven64 · · Score: 4, Insightful
    Hmm, so they looked at 58,944 lines of code, and found 31 defects? Did they find every defect? Can they prove this? What about those found in commercial code? If it were possible to find all of the defects in a piece of code this big in a small amount of time, then there would be no defects, since they would all be identified and fixed before release.

    As far as I can see, this article says 'We have two arbitrary numbers, and one is bigger than the other. From this we deduce that Apache is not as good as commercial software.'

    --
    I am TheRaven on Soylent News
  6. Apache 2.1...? by bc90021 · · Score: 4, Insightful

    According to Apache.org, Apache's latest stable version is 2.0.46. Is that a typo on their part, or are they testing a development version? Also, since 1.3.27 is widely used, it would have been interesting to see how that stacked up as well, having been developed longer.

    Either way, to have only 31 errors in close to 60,000 lines of code is impressive!

    1. Re:Apache 2.1...? by jbp4444 · · Score: 3, Insightful

      I was quite impressed by the fact that Apache can cram all the functionality into ~59k lines. So besides defect rate, I would like to know how many lines of code the commercial package had ... 0.51 defects per 1000 lines sounds good, unless there are 1,000,000 lines more code in the commercial package.

  7. "Defect Density"? by sparkhead · · Score: 4, Insightful
    A key reliability measurement indicator is defect density, defined as the number of defects found per thousand lines of source code.

    Since LOC is a poor metric, a "defect density" measurement based on that will be just as poor.

    Yes, I know there's not much else to go on, but something along the lines of putting the program through its paces, stress testing, load testing, etc. would be a much better measurement than a metric based on LOC.

  8. more to it than # flaws-per-unit-"whatever" by Asprin · · Score: 5, Insightful


    What bothers me about these articles is that there is more to software quality than the # of flaws-per-unit-"whatever".

    Like design.

    It seems to me most of the problems with Apache's main competitor in terms of software quality are the result of design and engineering choices made by MS's IIS development team.

    In other words, it does exactly what they designed it to do, but what they designed it to do was a very bad idea.

    --
    "Lawyers are for sucks."
    - Doug McKenzie
  9. No cigar, my ass. by KFury · · Score: 5, Insightful
    The article claims Apache's error density, based on a meager 5100 lines of code, is 0.53, while that of 'comparable commercial applications' is 0.51.

    The problems with this are:
    • 5100 lines of code does not give you a confidence range of less than 0.02, especially when the error rate can be expected to be heterogeneous across the code base, as would be the case in an open-source product where different code pieces are created by entirely different groups.
    • 'Comparable' my ass. If they can't provide details of what software they're comparing to (I somehow doubt they got a look at IIS source code) then the stats are worthless, because anyone who's ever programmed knows that quality control isn't a constant factor across commercial products any more than it is among open-source products.
    • What's the error rate of their 'defect analysis'? If they're so good at finding defects, why aren't they out there writing perfect software? If their defect detection rate is less than 98% accurate, then the difference between a rate of 0.51 and 0.53 is meaningless anyhow.
    • There's a big difference between caught coding exceptions and fundamental security problems. The first can cause code to run a little slower, the second can destroy your company. This testing methodology doesn't even look at the second.
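    A rough way to put numbers on the first point, as a sketch only: if defects are treated as independent rare events (a Poisson assumption that is mine, not the report's), a count of n defects carries a standard error of about sqrt(n). The C below uses hypothetical function names and is meant to be fed the 31 defects in 58,944 lines quoted elsewhere in this discussion, rather than the 5100-line figure above.

```c
#include <stdio.h>

/* Square root by Newton's method, to keep the sketch free of libm. */
static double newton_sqrt(double x)
{
    if (x <= 0.0) {
        return 0.0;
    }
    double guess = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) {
        guess = 0.5 * (guess + x / guess);
    }
    return guess;
}

/* Defects per thousand lines of code. */
double defect_density(int defects, long lines)
{
    return 1000.0 * defects / (double)lines;
}

/* One-standard-error half-width of the density, assuming Poisson counts. */
double density_std_error(int defects, long lines)
{
    return 1000.0 * newton_sqrt((double)defects) / (double)lines;
}

/* Print the density together with its rough one-sigma uncertainty. */
void report_density(int defects, long lines)
{
    printf("%.3f +/- %.3f defects per KLOC\n",
           defect_density(defects, lines),
           density_std_error(defects, lines));
}
```

    Calling report_density(31, 58944) gives a density near 0.53 with an uncertainty near 0.09 under that assumption, several times larger than the 0.02 gap between the two quoted rates.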
  10. what is a "software error"? by siskbc · · Score: 5, Insightful
    If Apache and, say, IIS are roughly equivalent in terms of code defects, you have to ask yourself "well, why does IIS have so many more general problems and security flaws than Apache, when they both carry the same general amount of coding defects?". Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?

    First, are all of IIS's issues "software errors" per se? I'm wondering if all security problems would have been caught, or if that was really the goal of the analysis. Perhaps it was, but I'm not sure. One could contend that IIS has a lot of things unprotected, but that this doesn't constitute a software error.

    And as you say, severity would be another issue. It's always been typical open-source style to get the mission-critical parts hardened against nuclear attack, while leaving the other bits a tad soft. I wouldn't be surprised to learn that was the case with Apache.

    One thing I want to know - did MS (or whoever) give these guys source or were they analyzing the binaries?

    --

    -Looking for a job as a materials chemist or multivariat

    1. Re:what is a "software error"? by Tony-A · · Score: 4, Insightful

      It's always been typical open-source style to get the mission-critical parts hardened against nuclear attack, while leaving the other bits a tad soft.

      IMNSHO, that ought to be standard for any mission-critical software. Bugs and the places that bugs live in are not created equal. The beauty of Apache (at least 1.3) is that the overall system can be very robust and reliable with rather buggy modules. I suspect the problem with IIS is that everything assumes everything else is perfect, which overall doesn't work so well.

  11. Dubious by cca93014 · · Score: 4, Insightful

    Is it just me, or does this entire concept of "code defects per 1000 lines" sound a little like bullshit?

    If the company has developed proprietary tools to enable them to identify defects in medium-sized software projects, which of the following business models do you think is more effective:

    1. Design proprietary tools to identify defects in medium-sized software projects.
    2. Fix defects
    3. Profit

    or

    1. Design proprietary tools to identify defects in medium-sized software projects.
    2. Sit around mumbling about defects, Open Source software, closed source software and why farting in the bath smells worse
    3. ???
    4. Profit

    Secondly, where on earth did they get hold of a closed source enterprise level (which Apache undoubtedly is) web server software codebase?

    "Hi, is that BEA? Do you mind if we take a copy of your entire code base so that we can peer review it against Apache's? What's that? Yes, Apache might come out on top, and we will make the results public..."

    How do they define a defect anyway? A memory leak? A missing overflow check? A tab instead of 4 spaces?

    It just sounds like bullshit to me...

  12. Different standards? by NotClever · · Score: 5, Insightful
    When the same group said that the IP stack in Linux was cleaner than a comparable one, everyone was screaming from the rooftops that it validated the open source model. When they say that an open source project and a closed source project are roughly comparable, all of a sudden everyone criticizes the methodology of the report!

    --
    Hell, there are no rules here. We're trying to accomplish something. - Thomas Edison
  13. automatically detected defects exclude security by brlewis · · Score: 5, Insightful

    Another post seems to indicate this was done via software to automatically detect defects. Many (most?) security defects cannot be detected automatically, as they involve using the software in an unintended way.

  14. So the error level in pre-release Apache ... by burgburgburg · · Score: 4, Insightful

    is equivalent to the error level in post-release commercial web serving software. Sounds like an endorsement to me.

    1. Re:So the error level in pre-release Apache ... by Kynde · · Score: 4, Insightful

      is equivalent to the error level in post-release commercial web serving software. Sounds like an endorsement to me.

      That, too, but I'm damn certain that they must have tried it on the recent stable 2.0.46ish release as well. The question is, why weren't those results made public?

      I'm guessing it's because the results were something that would've placed their "defect detection sw" in a bad light, i.e. nothing as fancy as the aforementioned "use of uninitialized variable" and "dereference of a NULL pointer" (which strikes me as really odd in the first place).

      Naturally the other explanation is endorsement. It would be so much not-the-first-time that I don't even bother... but I wouldn't bet that this is the case here, because the defect counts were only compared to production release code averages (which strikes me as the other extremely dubious part of this whole "experiment").

      --
      1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
    2. Re:So the error level in pre-release Apache ... by yaphadam097 · · Score: 3, Insightful
      I've worked on open source projects and I've also worked in commercial development shops. I think that their findings are accurate but misleading:
      1. In my experience there are generally fewer bugs in pre-release code on a commercial project because there is a stronger culture of code ownership, and most if not all code is independently reviewed before being committed.
      2. There are generally a high number of defects in pre-release open source code, because developers commit early and commit often. Independent review happens more often in open source projects, but it usually happens after the code has already been committed to the dev branch (Before that, the geographically dispersed dev team has no access to it.)
      3. The quality of code released to production in a commercial environment is usually very similar to the quality of code in the development branch. Once it is reviewed and committed it enters a QA cycle where an independent team tries to find any bugs. At this point there is invariably strong pressure to release. So, bug fixes happen quickly and quality suffers (I've always found it ironic that we called this "Quality Assurance.")
      4. Once an open source project has been completed (Meaning all of the features have been developed) it enters a much longer period of code review, bug hunting, and alpha release. For a project like Apache it was over a year before anyone started to use 2.0 in production. Most commercial companies can't afford nearly that much "QA" time, because they are spending money to make money.
  15. Bad Statistics... by FunkZombie · · Score: 5, Insightful

    Also keep in mind that defect density is just an average. If you have 31 defects in 60k lines of code, that is potentially 31 security risks, or out-of-operation risks. If the other software tested had double the lines of code (120k), the density would imply that they had slightly less than double the defects, so say 58 or 60. That implies _58_ potential security or uptime risks. In this case, imho, defect density is not a good indicator of the reliability of the software.

    My general rule is that if someone is quoting statistics to you, they are lying. At least on average. :)
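
    The density-to-count arithmetic above can be sketched as follows. The function name is hypothetical, and the 120,000-line competitor is this comment's own what-if, not a figure from the report.

```c
/* Expected number of defects for a code base of `lines` lines at a
 * given density (defects per thousand lines of code). */
double expected_defects(double density_per_kloc, long lines)
{
    return density_per_kloc * (double)lines / 1000.0;
}
```

    At the quoted commercial density of 0.51, expected_defects(0.51, 120000) comes to about 61, a little above the rough 58 to 60 guessed above and still about twice the 31 found in Apache.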

  16. Actually the article suggests apache is better by sterno · · Score: 4, Insightful

    This doesn't indicate that the commercial equivalents are better. You've got the DEVELOPMENT branch of Apache, which is derived from the 2.0.x code, which is a complete rework of the original 1.x branch of code. So it's a rather new code base and it's showing similar defect rates to a code base that has been around for a while. I'd say this proves that open source is better.

    --
    This sig has been temporarily disconnected or is no longer in service
  17. Wrong Math by bstadil · · Score: 4, Insightful
    You got the math reversed

    The longer each line is and the more content it holds, the higher the likelihood of errors per line.

    As an example, with one error in 100 lines you get a 1% error rate. Imagine you could do the whole thing in one line. Now you have a 100% error rate.

    --
    Help fight continental drift.
  18. Re:So if they found them... by dkh2 · · Score: 3, Insightful

    Sure, they found them, but did they catalog them in any way? 0.53 errors/KLOC translates to approx. 1 error every 1886 LOC on average. On top of that, on further investigation, which of these are actual errors and which only look like errors?

    I'm just glad I'm not the poor go-coder who has to go through the code to find and fix these few "errors."

    --
    My office has been taken over by iPod people.
  19. Don't assume IIS by m00nun1t · · Score: 5, Insightful

    Ok, IIS is the obvious choice as being the second most popular web server after Apache. But I hardly think Microsoft will be letting these guys all over the IIS source code.

    It could also be Zeus, SunOne or one of the other lesser known web servers out there.

  20. Defects and maturity of code base by the eric conspiracy · · Score: 4, Insightful

    This study makes a lot of sense to me - that the defect rate is tied to the maturity of the code base. I have long felt that Microsoft's business model where they redo the operating system in order to churn their user base and induce cash flow will always result in more defects and security problems than a model where software change is driven on a solely technical basis.

    I think the next step for these folks would be to take a project that has a long history, say perhaps Apache 1.x and show defect rates over the life of the project.

  21. Having read the reports.. by David McBride · · Score: 4, Insightful

    Well, the reports simply state that, in the 360 files they checked (most of them header files) they found 29 cases of a potential NULL pointer dereference and 2 potentially uninitialized variables. This is from the Apache 2.1 codebase as of 31st Jan this year, about 58k lines of code.

    Their automated checker also searched for out-of-bounds array accesses, memory leaks, and bad deallocations. It found none.

    They also state that they ran the same checks against other codebases, and found that they did marginally better, on average.

    In short, this report says that OLD development code for an unreleased opensource project is nearly as good as current commercial offerings. That's at best, when you consider the huge gamut of possible defects that this checker won't pick up. That margin probably disappears in the +/- of the sampling if you were to do a proper statistical analysis.

    The report is fairly useless. It certainly should not be taken as a reason to not trust Apache; to do so would be foolhardy particularly given Apache's track record.

    Oh, and Reasoning's webserver is being pounded into the ground. You can get my local copy of the reports from here.

  22. Re:Code defects appear to be a small part of the e by jdh-22 · · Score: 5, Insightful
    Every hacker on the planet has full access to the code - which means that they can review it and find vulnerabilities in it. Not many people have access to Windows or IIS code.
    To quote Bruce Schneier: "If I had a letter, sealed it in a locked vault and hid the vault somewhere in New York, then told you to read the letter, that's not security, that's obscurity. If I made a letter, sealed it in a vault, gave you the blueprints of the vault, the combinations of 1000 other vaults, access to the best locksmiths in the world, then told you to read the letter, and you still can't, that's security." Open source does have an upper hand on holes and bugs, but the code isn't where we should be looking.

    The majority of the security holes are from the people setting up the web servers. The holes are usually abused by "wanna-be" hackers, or script-kiddies. The problem is that people are not educated enough to run some of these programs. Being able to understand Apache, and how to make it operate correctly, is not everyone's top priority. As long as it works, people don't care how it works (as goes for many other things in this world).
    --
    Every Super Villain uses Linux.
  23. It's all in how you calculate a defect by sterno · · Score: 3, Insightful

    The thing that always kills IIS is the integration it has with Windows. This isn't a defect in IIS, or Windows, per se, but rather a defect that arises because of how they integrate with each other. A script executes on IIS in a way that's not innately a bug, but then when it interacts with Windows, Exchange, etc., suddenly it becomes one.

    Apache is just a webserver, and that's all. PHP, JSP, etc, are all separate applications treated separately. The integration does make things more efficient, yes, but also more prone to problems.

    --
    This sig has been temporarily disconnected or is no longer in service
  24. Lies, damned lies, and statistics by UnknowingFool · · Score: 4, Insightful
    Numbers can mean anything. It's the interpretation that matters. 31 errors in 58,944 lines. Hmmm. Even if we take Reasoning's word that these are errors and not "features", that's an error rate of 0.53 per thousand lines. The unnamed commercial software had an error rate of 0.51. So what does that prove?

    1) Apache 2.1 has more bugs than some unknown commercial competitor. If the version is correct, a development (not-ready-for-release) build was pitted against a released commercial build. Not fair playing ground.

    2) Reasoning does not detail the severity or kind of the bugs. Certainly, a web server not being able to handle a type of format (pdf, csv, ogg vorbis) is less severe than a security hole. Pitted against IIS, I would trust Apache even if it had more bugs, because historically it has had fewer security patches. Check out Apache's 2.0 known patches vs IIS 5.0

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  25. Re:So if they found them... by MisterFancypants · · Score: 4, Insightful
    None of that bug report is at all useful if there is no logical way for all of those preconditions they listed to actually be met.

    I mean, yeah, it would be nice if code would explicitly check for a NULL before dereferencing, but if there's no earthly way for the pointer to actually BE a NULL pointer at that time (barring memory corruption -- in which case all bets are off and your code is doomed anyway) then I wouldn't call those errors.

    This whole exercise seems very suspect to me.

  26. RTFAdvertising by tanguyr · · Score: 4, Insightful

    As has been pointed out a couple of times in other comments, 2.1 is the development branch of the Apache web server - ie "beta", "buggy", "work in progress", etc. etc. Instead of reading this as "Apache has roughly as many defects as closed source web servers" let's read this as "the development version of Apache has as many defects as... well, some unidentified (beta? shipping?) version of some unknown (iPlanet? IIS?) web server". But you can be *much* more confident that these defects will be fixed in Apache than in the *other* product.

    Heck, forget confidence - YOU CAN JUST CHECK.

    The fact that Reasoning didn't have to go and get permission from Apache to run this test - coupled with the fact that we don't even know what Apache is being compared to - is the *real* point behind this "article". /t

    ps: IANAL but don't they have to include a copy of the Apache License given that they publish fragments of the source code in their defect report?

    --
    #!/usr/bin/english
  27. Defect is too strong a word... by Bazman · · Score: 4, Insightful
    Take the null pointer dereferencing thing. All this program seems to do is see if there's a possible path for null-pointer dereferencing. It has no clue as to whether this is logically going to happen. For example:

    2815 while (1) {
    2816     ap_ssi_get_tag_and_value(ctx, &tag, &tag_val, 1);
    2817     if ((tag == NULL) && (tag_val == NULL)) {
    2818         return 0;
    2819     }
    2820     else if (tag_val == NULL) {
    2821         return 1;
    2822     }
    2823     else if (!strcmp(tag, "var")) {
    2824         var = ap_ssi_parse_string(r, ctx, tag_val, NULL,
    2825                                   MAX_STRING_LEN, 0);

    The software claims that tag could be null on line 2823. But that's only if, on return from ap_ssi_get_tag_and_value, tag is a NULL pointer and tag_val is non-NULL. If ap_ssi_get_tag_and_value can't return these conditions then this is not a defect. If anything it's a red flag, in case the return values of ap_ssi_get_tag_and_value could satisfy that condition.

    I suspect the following code will be flagged as a defect:

    char *tag=NULL;
    doOrDie(&tag);
    strcmp(tag,"do");
    As long as doOrDie() does its job and never returns a NULL then where's the defect? The guys who wrote this tester seem to want you to check any pointer dereferencing against NULL before use - I might be doing this in my doOrDie() function, I don't want to have to do it twice.
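
    A minimal sketch of that idea, assuming a doOrDie() that terminates the process rather than hand back NULL. Only the name doOrDie comes from the snippet above; the body and the tagIsDo helper are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: either stores a valid string in *out or never
 * returns. Callers may rely on *out being non-NULL afterwards. */
void doOrDie(char **out)
{
    char *value = malloc(3);
    if (value == NULL) {
        fprintf(stderr, "doOrDie: out of memory\n");
        abort();                /* the "die" half of the contract */
    }
    memcpy(value, "do", 3);     /* copies the terminating NUL too */
    *out = value;
}

/* Hypothetical caller: no second NULL check, because doOrDie() already
 * made one. A path-based checker that does not look inside doOrDie()
 * may still flag the strcmp() call. */
int tagIsDo(void)
{
    char *tag = NULL;
    doOrDie(&tag);
    int same = (strcmp(tag, "do") == 0);
    free(tag);
    return same;
}
```

    Whether a given checker reports the strcmp() call depends on whether it analyzes across function boundaries; the sketch only shows why such a report need not be a real defect.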
  28. Coding errors & program logic errors by MROD · · Score: 3, Insightful

    Of course, this test of the code is purely a test of coding errors rather than errors in the code logic.

    The most worrying errors in programs are generally not coding errors as they are either terminal (ie. crash) or they are benign (the error may cause memory corruption in a place where it does no harm). Of course, there are exceptions such as buffer overflows, but I'd class those, in general, into the logic error category.

    Logic or algorithmic errors are far more dangerous as they can be well hidden and are more likely to make the code do things unintended. The code itself may be perfect but if the algorithm is faulty then there's a major problem.

    --

    Agrajag: "Oh no, not again!"
  29. Re:So if they found them... by Anonymous Coward · · Score: 5, Insightful

    The funny thing is that this "bug" doesn't appear to actually be one...

    Note that current_provider is set to conf->providers on line 257. The loop starts and neither current_provider nor conf->providers changes. Then on line 287 there's a conditional break if conf->providers is NULL.

    If current_provider is going to be NULL at line 291, then conf->providers must be as well, so the conditional break will happen and the NULL dereference will be skipped.

    Or am I missing something else?
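
    The shape of that argument can be sketched in C. This is a reconstruction from the description above, not the Apache source: the struct layouts, the id field, and the function name are hypothetical, and the quoted line numbers are only echoed in comments.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the structures the comment describes. */
struct provider {
    int id;
};

struct config {
    struct provider *providers;
};

/* Returns the provider id, or -1 when no provider is configured. */
int first_provider_id(const struct config *conf)
{
    /* Corresponds to "line 257": both pointers start out equal. */
    struct provider *current_provider = conf->providers;

    for (;;) {
        /* Corresponds to "line 287": leave the loop if nothing is configured. */
        if (conf->providers == NULL) {
            break;
        }
        /* Corresponds to "line 291": neither pointer was reassigned, so
         * current_provider is non-NULL whenever this line is reached. */
        return current_provider->id;
    }
    return -1;
}
```

    If the real loop reassigns current_provider somewhere before the dereference, the two pointers can diverge and the argument no longer holds, which may be what the checker assumed.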

  30. Development release by Door-opening Fascist · · Score: 3, Insightful

    Why did they use the development branch of Apache, when only a handful of sites are running it? I would have found an analysis of the stable 1.3 branch, which 60% of the web-serving world uses, to be more informative.

    1. Re:Development release by sabat · · Score: 3, Insightful

      Why did they use the development branch of Apache

      Let me restate this: why are they comparing pre-alpha software with production releases?

      Simplest answer: because they wanted to find flaws. The second most popular web software is IIS. This looks like a Microsoft tactic: anonymously hire this company to "evaluate" code so that the results look unbiased. Everyone will likely realize that the competitor is Microsoft's IIS, so it doesn't need to be stated bluntly. MS wins; another (small) battle for mindshare is won.

      --
      I, for one, welcome our new Antichrist overlord.
  31. Re:Code defects appear to be a small part of the e by schon · · Score: 3, Insightful

    Every time I hear the "obscurity is not security" mantra I chuckle. Of course it isn't, but that doesn't make publishing the information a good idea.

    Nobody's saying that the information should be published - what they're saying is that you can't rely on that information being a secret.

    Is Fort Knox secure? Probably. If so, then why don't they publish the blueprints, guard rotation schedule and security policies?

    That's pretty much the point you're missing - even if that information was published, it wouldn't diminish the security of Fort Knox.

    If the people in charge relied on the fact that they don't publish those details, that would be obscurity, because it would lead them to make errors elsewhere. (Oh, it's OK if we leave the main vault open tonight - nobody knows that there will be no guards around it for 10 minutes at 3:30 AM tonight.)

  32. Re:So if they found them... by Jeremy Erwin · · Score: 4, Insightful

    The earlier study was of polished code, many iterations after release. This latest study is of an unpolished developer's snapshot. I suppose that you might be able to divine some kind of wisdom about the development of open-source software-- Development branches shall be as stable as commercial code. Release branches shall be more so.

    The metrics report does mention the version number (dev-1/31/03), though the fact that this is development code is not explicitly noted. No mention is made of who commissioned this study. Perhaps the company is simply fishing for clients.