Software Code Quality Of Apache Analyzed
fruey writes "Following Reasoning's February analysis of the Linux TCP/IP stack (putting it ahead of many commercial implementations for it's low error density), they recently pitted Apache 2.1 source code against commercial web server offerings, although they don't say which. Apparently, Apache is close, but no cigar..."
Why don't they fix them? It seems almost paradoxical: if you find .53 errors per thousand lines of code and fix them, then you'll have 0 errors. But since we can only fix errors we can detect, we only detect errors we can fix. OK, it's too early on a Monday morning...
Moderation: Put your hand inside the puppet head!
http://www.neowin.net/comments.php?id=12345&category=main
--r
2.1 isn't even out yet! The latest is 2.0.46!
Cats: All your base are belong to us.
Captain: Take off every sig !!
Just because Open-Source coders can't spell when they insert comments doesn't mean that they can't write good code!
But here's the kicker: the vast majority runs Apache on either BSD or Linux. All of this code, from the kernel to the library that tells Apache how to use PHP, is open source. Every hacker on the planet has full access to the code, which means that they can review it and find vulnerabilities in it. Not many people have access to Windows or IIS code. So why do IIS and Windows come out as far less secure, and get exploited so much more?
I think the answer lies in the severity of the code defects, and the architecture and design of the operating system that powers the web server. And yes, I know that Apache can run on Windows.
Has Apache 2.1 been released as a stable, non-developmental release? If not I would say testing it for defects is a bit premature.
Reasoning found 31 software defects in 58,944 lines of source code of the Apache http server V2.1 code.
So what are they calling a defect?
And don't most NDAs, for when they do let you look, forbid any competitive analysis?
Or am I just too far out of that line of work to know how these things work?
He tried to kill me with a forklift!
Wouldn't that be unstable? I thought the latest was 2.0.46 or something.. If I'm not mistaken, it would be a bit like saying "Freebsd 4.8 has less bugs than Linux 2.5!"
So basically they offer a service like lclint only many times more advanced ? What is to say they haven't missed anything?
This is probably a publicity stunt for them although a good one. I think it would be a good idea for them to sell software suites of their product if they don't already.
Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
Would someone please tell me what the point of releasing an article comparing one known product against an un-named one?
As far as I can see, this article says 'We have two arbitrary numbers, and one is bigger than the other. From this we deduce that Apache is not as good as commercial software.'
I am TheRaven on Soylent News
I once traced sendmail's source code. Absolutely messy.
According to Apache.org, Apache's latest stable version is 2.0.46. Is that a typo on their part, or are they testing a development version? Also, since 1.3.27 is widely used, it would have been interesting to see how that stacked up as well, having been developed longer.
Either way, to have only 31 errors in close to 60,000 lines of code is impressive!
libertarianswag.com
Since LOC is a poor metric, a "defect density" measurement based on that will be just as poor.
Yes, I know there's not much else to go on, but something along the lines of putting the program through its paces, stress testing, load testing, etc. would be a much better measurement than a metric based on LOC.
The difference is that now that someone has found 31 errors in the open source Apache software, they will be fixed fairly quickly whereas closed source software will have to have the company do a cost-benefit analysis, put together a team to do the fixes, probably charge to put out patches or minor upgrades (assuming the product is Microsoft's IIS ;b)...
Only two things are infinite, the universe, and human stupidity,
and I'm not sure about the former.
Why does it seem a bit odd to be testing software quality with other software? I wonder if they ran their own software through its own program, but then that gets kinda weird when a program starts noticing errors about itself... maybe it'd get depressed and start ranting at the creator on how they should have taken better care of it... ok, I need more sleep
They are comparing a development version to an un-named commercial web server?
Why don't they compare it to apache 2.0.46 if they want a newer, but release product? I expect they did, but they didn't get the results they wanted.
This is a development version, it's an odd numbered release for crying out loud.
I wouldn't be surprised to see this is bankrolled by M$. Let's compare IIS in development to Apache 2.1, and then see what the IIS bug density rate is.
Bah!!
Doug Tolton
"The destruction of a value which is, will not bring value to that which isn't." -John Galt
I can't think of any reason why should anybody trust this analysis until they publish the methods used. Anybody can say "Hey, I tested something using my proprietary method, and $foo has more bugs than $bar!". Unfortunately, such tests really don't say anything substantial about the quality of software. IMHO.
Hypothesis: Taking down IIS, Windows or Microsoft is more fun/cool.
Umm, Apache 2.1 hasn't been released yet. Current latest stable is 2.0.46.
I can only assume that they're looking through the current DEVELOPMENT codebase -- finding a higher ``defect density'' in such a development codebase compared with commercial offerings is not exactly unexpected.
They're also some automated code inspection product; the press release doesn't go into details as to the severity of the defects found or the testing methodology.
It'll be necessary to read through the full report before drawing any sound conclusions.
Here are the links to the Apache Open Source Inspection Report you requested:
Apache Defect Report: http://www.reasoning.com/pdf/Apache_Defect_Report.pdf
Apache Metric Report: http://www.reasoning.com/pdf/Apache_Metric_Report.pdf
Reasoning provides the world's leading automated software inspection service. We boost the productivity of development teams by finding software defects faster and at a far lower cost than traditional approaches. Please let me know if you would like additional information. Thank you again for contacting Reasoning!
Sincerely,
Reasoning
What bothers me about these articles is that there is more to software quality than the # of flaws-per-unit-"whatever".
Like design.
It seems to me most of the problems with Apache's main competitor in terms of software quality are the result of design and engineering choices made by MS's IIS development team.
In other words, it does exactly what they designed it to do, but what they designed it to do was a very bad idea.
"Lawyers are for sucks."
- Doug McKenzie
I know that Apache has vulnerabilities, but it should come out better than IIS. You can't realistically give a verdict on IIS without looking at the libraries called.
As for the rest, I can imagine some commercial products coming in better, but not many.
See my journal, I write things there
The problems with this are:
Kevin Fox
The defect density of the Apache code inspected was 0.53 per thousand lines of source code...
We can bring this number down to 0.2 by avoiding the BSD style guidelines. No kidding, have you seen the density of MFC code?
BSD code:
char*
foo(int bar, double baz)
{
return bar + random();
}
MS code:
char* Foo(int nBar, double dBaz) { return bar + random() + m_ExtraWindowsBugModifier(); }
This looks like it was just an ad/demo of their code testing software.
;)
I am trying to get the main analysis downloaded now, but they must have been prepared for a slashdot posting
I see the point in automatically checking the
source code for common programming errors,
but how can such a system ever find semantic
errors, such as complicated protocol handling
issues?
It seems to me that those just happen to be
strong points of open source software.
So?
There are errors and there are errors. There are errors that don't matter a jot, and there are errors that are show-stoppers.
I've worked on banking software containing code that was written in assembly for PDP-11s and developed over decades. The most horrible spaghetti code you could ever imagine. Why did the banks keep using it? Because for any particular input it always gave the correct output.
Years of bug fixing had made the code horrible and probably full of errors if you were looking at it from a purely theoretical/software engineering viewpoint. But from an input/output point of view, it was faultless.
For the same reason that windows boxes get hacked more often. The more a platform is used the more attacks on it.
This comparison is totally fishy!
In Soviet Rush, today's Tom Sawyer gets high on you.
Completely and utterly agree, I mean hell, I could write fifty thousand lines of code, each line completely and utterly with no meaning, run it through the checker and produce 0 defects, except for one overall defective piece of software. Does this article have any point whatsoever to it at all, I mean, even if the results had any meaning, what on earth is the point of comparing a known to an unknown ?
Still, mitigate that with the pre-release status of Apache 2.1 and it cancels out.
Kevin Fox
First, are all of IIS's issues "software errors" per se? I'm wondering if all security problems would have been caught, or if that was really the goal of the analysis. Perhaps it was, but I'm not sure. One could contest that IIS has a lot of things unprotected, but that this doesn't constitute a software error.
And as you say, severity would be another issue. It's always been typical open-source style to get the mission-critical parts hardened against nuclear attack, but leaving the other bits a tad soft. I wouldn't be surprised to learn that was the case with apache.
One thing I want to know - did MS (or whoever) give these guys source or were they analyzing the binaries?
-Looking for a job as a materials chemist or multivariat
Since when are unfounded results from a company that doesn't explain what the "32 defects" were, newsworthy. Don't act like these guys are worth my time, this is bullshit.
Ignore the "p2p is theft" trolls, they're just uninformed
Pretty lame when we are talking about server software, where Apache has the lead (Apache 63% vs. IIS 25%, netcraft.com).
/Esben
"Nobody really checks their email any more. They just delete their spam"
Is it just me that finds this entire concept of "code defects per 000 lines" sounding like a little bullshit?
If the company has developed proprietary tools to enable them to identify defects in medium-sized software projects, which of the following business models do you think is more effective:
1. Design proprietary tools to identify defects in medium-sized software projects.
2. Fix defects
3. Profit
or
1. Design proprietary tools to identify defects in medium-sized software projects.
2. Sit around mumbling about defects, Open Source software, closed source software and why farting in the bath smells worse
3. ???
4. Profit
Secondly, where on earth did they get hold of a closed source enterprise level (which Apache undoubtedly is) web server software codebase?
"Hi, is that BEA? Do you mind if we take a copy of your entire code base so that we can peer review it against Apache's? What's that? Yes, Apache might come out on top, and we will make the results public..."
How do they define a defect anyway? A memory leak? A missing overflow check? A tab instead of 4 spaces?
It just sounds like bullshit to me...
Invoicing, Time Tracking, Reporting
Hell, there are no rules here. We're trying to accomplish something. - Thomas Edison
Another post seems to indicate this was done via software to automatically detect defects. Many (most?) security defects cannot be detected automatically, as they involve using the software in an unintended way.
Turing says no.
turn up the jukebox and tell me a lie
It says the results are "objective and comparable across software applications, development methodologies, and coding styles".
Boffoonery - downloadable Comedy Benefit for Bletchley Park
You compare 50k lines of semi-optimized beta code against 1000k lines of a "commercial" product. This is why statistics are the best liars. What does it take to get the "commercial" product to lower the contamination in parts per thousand?
Bloat.
Add a few more comments and unessential error checking; hell, add DRM to check whether you are hosting the latest Eminem MP3s. That should do it.
If anything is defective or dense, it's the people who came up with the statistics for the sake of PR.
...then why is it their webserver? :)
Of course it is Apache 1.3.23...
is equivalent to the error level in post-release commercial web serving software. Sounds like an endorsement to me.
Also keep in mind that defect density is just an average. If you have 31 defects in 60k lines of code, that is potentially 31 security risks, or out-of-operation risks. If the other software tested had double the lines of code (120k), the density would imply that they had slightly less than double the defects, so say 58 or 60. That implies _58_ potential security or uptime risks. In this case, imho, defect density is not a good indicator of the reliability of the software.
:)
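To make the arithmetic in this comment concrete, here is a minimal C sketch of the density calculation. The function names are made up for illustration; the only inputs taken from the thread are the 31 defects, the 58,944 lines, and the 0.51 figure quoted for the commercial average.

```c
/* Defects per thousand lines of code (KLOC).
 * Returns 0.0 for an empty code base to avoid dividing by zero. */
double defect_density_per_kloc(unsigned defects, unsigned long lines)
{
    if (lines == 0)
        return 0.0;
    return (double)defects * 1000.0 / (double)lines;
}

/* Number of defects a given density implies for a code base of `lines` lines. */
double expected_defects(double density_per_kloc, unsigned long lines)
{
    return density_per_kloc * (double)lines / 1000.0;
}
```

With these, `defect_density_per_kloc(31, 58944)` comes out at roughly 0.53, and `expected_defects(0.51, 120000)` at roughly 61, which is in the same range as the poster's "58 or 60": a lower density can still mean a larger absolute number of defects.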
My general rule is that if someone is quoting statistics to you, they are lying. At least on average.
By its very nature, Open source will tend to fix important bugs and leave unimportant ones unfixed, while standard QA processes associated with commercial software will tend to fix little UI issues during the release schedule before dealing with vulnerabilities.
So seems pretty clear to me that in Open source, the ratio of showstopper bugs to miscolored widget bugs will be much lower than for commercial software.
This doesn't indicate that the commercial equivalents are better. You've got the DEVELOPMENT branch of Apache, which is derived from the 2.0.x code, which is a complete rework of the original 1.x branch. So it's a rather new code base, and it's showing similar defect rates to a code base that has been around for a while. I'd say this proves that open source is better.
This sig has been temporarily disconnected or is no longer in service
This is a development version, it's an odd numbered release for crying out loud.
You refer to the version numbering rules used by the developers of the Linux kernel. Does Apache follow the numbering scheme of Linux?
Will I retire or break 10K?
Hmm, so they looked at 58,944 lines of code, and found 31 defects? Did they find every defect? Can they prove this?
Proving program correctness and bug-freeness is really hard. If they did find every defect and they can prove it, then I suspect that it would be a significant breakthrough in Computer Science, not to mention a commercial goldmine.
As you can imagine, I am a bit sceptical.
My Karma: ran over your Dogma
StrawberryFrog
They didn't do that because if they did that, then they'd find bugs in their bug finder, so they'd have to run the bug finder on the bug finder to find bugs there, but then they'd have to run the bug finder on the bug finder on the...
This sig has been temporarily disconnected or is no longer in service
The longer the line and the more content you have per line, the higher the likelihood of an error per line.
For example, with one error in 100 lines you get 1% error. Imagine you could do the whole thing in one line. Now you have 100% error.
Help fight continental drift.
So if they can write software to automatically spot coding errors, then it must be possible for them to automatically fix them, no?
I compared this to my 'other' server, for now unnamed.
My 'other' server brought me coffee, 2 pieces of toast, 2 eggs OVER EASY, 4 strips of bacon, *and* Smucker's Grape Jelly with nary a misstep or hesitation. This other server smiled, asked how my wife was, and brought me a new fork when I dropped my first one.
Congratulations, Gloria! You win the 'great server' award!
This article isn't worth the 2 dollar tip.
I wonder what scope of errors they are looking at? For instance, are they counting assignment errors (overflow), IIS->Com higher level type errors, or both.
-t
http://unmoldable.com W:"No one of consequence" I:"I must know" W:"Get used to disappointment"
Why doesn't Reasoning fill the niche, and code a completely error free web server? They know other peoples mistakes, so they should know how to code an error free one.
Well, seriously, I wouldn't put much in their obvious estimation.
Any technology distinguishable from magic, is insufficiently advanced.
Ok, IIS is the obvious choice as being the second most popular web server after Apache. But I hardly think Microsoft will be letting these guys all over the IIS source code.
It could also be Zeus, SunOne or one of the other lesser known web servers out there.
Read reviews of shopping cart software
The test may be more interesting if applied to Apache 1. As someone who has had to migrate a mod_perl site from Apache 1 to Apache 2, I can tell you that Apache 2 is a very new beast, and it doesn't shock me at all that there are dozens of bugs that still need to be shaken out. Fewer users are running Apache 2 in a production environment as well, since it's considered a development branch. See less eyeballs rule.
Some things I found interesting:
One of the explanations (given by Reasoning) for a NULL pointer dereference is "can occur in low memory conditions," which I think means the original allocator did not check for malloc failure.
So you can get a sense of what a defect looks like, here is #21. The original uses bold and fonts to improve readability, but I don't know how to reproduce that in slashcode:
DEFECT CLASS: Null Pointer Dereference
DEFECT ID 21
LOCATION: httpd-2.1/srclib/apr/misc/unix/otherchild.c : 137
DESCRIPTION The local pointer variable cur, declared on line 126, and assigned on line 128, may
be NULL where it is dereferenced on line 137.
PRECONDITIONS The conditional expression (cur) on line 129 evaluates to false.
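For readers who have not seen this class of finding, here is a hedged C sketch of the general pattern the description points at: a pointer that walks a list, a condition on that pointer that can evaluate to false, and a later dereference. This is NOT the code from otherchild.c; the `node` struct and `lookup_value` are hypothetical and exist only to show the shape of the problem and the usual guard.

```c
#include <stddef.h>

/* Hypothetical list node; not the APR structure from otherchild.c. */
struct node {
    int id;
    int value;
    struct node *next;
};

/* Return the value stored for `id`, or `fallback` if no node matches.
 * The loop ends either on a match or when cur becomes NULL, so cur
 * has to be tested before it is dereferenced. */
int lookup_value(const struct node *head, int id, int fallback)
{
    const struct node *cur = head;

    while (cur && cur->id != id)
        cur = cur->next;

    /* The flagged pattern would be `return cur->value;` right here:
     * when (cur) is false, that dereferences a NULL pointer. */
    if (cur == NULL)
        return fallback;

    return cur->value;
}
```

Whether a finding like this is a real bug depends on whether the NULL case can actually be reached at run time, which is exactly what a static checker often cannot tell.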
This study makes a lot of sense to me - that the defect rate is tied to the maturity of the code base. I have long felt that Microsoft's business model where they redo the operating system in order to churn their user base and induce cash flow will always result in more defects and security problems than a model where software change is driven on a solely technical basis.
I think the next step for these folks would be to take a project that has a long history, say perhaps Apache 1.x and show defect rates over the life of the project.
29 possible "null dereferences" and 2 possible "uninitialized variables". Some of them are simple "fail to check return value of malloc() for null", and others are not bugs in the code but bugs in the logic of the scanner. This is, of course, a precursory review of their document. All in all, these are absolutely minor bugs if they are real at all.
Well, the reports simply state that, in the 360 files they checked (most of them header files) they found 29 cases of a potential NULL pointer dereference and 2 potentially uninitialized variables. This is from the Apache 2.1 codebase as of 31st Jan this year, about 58k lines of code.
Their automated checker also searched for out-of-bounds array accesses, memory leaks, and bad deallocations. It found none.
They also state that they ran the same checks against other codebases, and found that they did marginally better, on average.
In short, this report says that OLD development code for an unreleased opensource project is nearly as good as current commercial offerings. That's at best, when you consider the huge gamut of possible defects that this checker won't pick up. That margin probably disappears in the +/- of the sampling if you were to do a proper statistical analysis.
The report is fairly useless. It certainly should not be taken as a reason to not trust Apache; to do so would be foolhardy particularly given Apache's track record.
Oh, and Reasoning's webserver is being pounded into the ground. You can get my local copy of the reports from here.
As others have stated, the article states that "the difference in defect density between the two was not significant." Meaning that defect density, especially with such a small differential, has little bearing on the overall quality of the software. We know nothing of the severity, impact, etc of the defects: they could all be cosmetic for all we know. This is probably nothing more than a marketing strategy by Reasoning: publish a study without any details on a hotly debated topic and see how many people check out their site. It'd be nice if they had a downloadable version of their software to test drive.
FxCop is an example of a "defect" or code analysis tool. While I have NO idea of Reasoning's methodology, I know that with FxCop (which is specifically for .NET code analysis), you have to set it up to filter out the majority of its rules or you'll get 3000 instances of "You didn't name this variable the way MS says you're supposed to." FxCop is extensible though. The point is, not a single poster on this page (unless they work for the companies involved) knows what Reasoning's methodology or rule set was when they did this so we can glean virtually zero value from this analysis. I look forward to 600 anti-Microsoft posts because of it though....
"for it's low error density"
for it is low error density?
TARD.
If you read the actual report, it does cite what type of defects they looked for, and what they actually found.
29 NULL pointer dereferences
2 Uninitialized variables
The uninitialized variable is just a -Wall issue; the NULL pointer thing may or may not be serious depending on the context...
It's hard to tell the cool to chill, my favorite hotel room has a view to an ill.
The majority of the security holes are from the people setting up the web servers. The holes are usually abused by "wanna-be" hackers, or script-kiddies. The problem is that people are not educated enough to run some of these programs. Being able to understand Apache, and how to make it operate correctly, is not everyone's top priority. As long as it works, people don't care how it works (as goes for many other things in this world).
Every Super Villan uses Linux.
One of the explanations (given by Reasoning) for a NULL pointer dereference is "can occur in low memory conditions," which I think means the original allocator did not check for malloc failure.
Apache has its own malloc() that kills the child (and closes the connection) if it fails to allocate enough bytes.
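As an illustration of the idea only, here is a minimal C sketch of an allocator wrapper that terminates the process on failure, so callers never see a NULL return. `checked_malloc` is a hypothetical name; this is not Apache's or APR's actual allocator, just the general technique the comment describes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate `size` bytes or terminate the process.
 * Callers can use the result without a NULL check, because a failed
 * allocation never returns. (malloc(0) may legitimately return NULL,
 * so that case is not treated as a failure.) */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}
```

If the code base allocates through a wrapper like this, a static checker that only knows about plain malloc() semantics could report "may be NULL in low memory conditions" for dereferences that cannot actually fail that way.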
The thing that always kills IIS is the integration it has with Windows. This isn't a defect in IIS, or Windows, per se, but rather a defect that arises because of how they integrate with each other. A script executes on IIS in a way that's not innately a bug, but then when it interacts with Windows, Exchange, etc., suddenly it becomes one.
Apache is just a webserver, and that's all. PHP, JSP, etc, are all separate applications treated separately. The integration does make things more efficient, yes, but also more prone to problems.
This sig has been temporarily disconnected or is no longer in service
Is IIS just inherently insecure because it is used on a Windows platform? Is it because hackers generally target IIS and not Apache (most people will rush to this conclusion)?
Microsoft will try to make people believe whatever is in their interests .. Even if it means contradicting themselves ..
Last Friday Microsoft called all their Premier customers in France with "information" related to the upcoming "hackerfest" last Sunday.
According to Microsoft mostly Unix and Linux servers would be the target of the hackers but it did not exclude IIS Web servers to come under attack.
The FUD coming from MS is absolutely unbelievable..
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
Exactly. Mostly you are fixing security-critical bugs; the other ones remain unrecognized and untouched (what is a bug anyway?). It does not matter (mostly) if your original raw code has 1.0 or 0.5 bugs per line. If the security-critical bugs are, say, only 10 percent, and you fix them only in the first case, you still have 0.9 > 0.5, but much more secure code.
You obviously didn't read the article.
They stated that the difference in defect density between Apache and typical commercially available web servers was minute. This means there's not a heck of a lot of difference between the two...in source code errors, that is.
The fact the results might be somewhat useful to the Apache community is a happy bonus.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
Questioning Apache's superiority by comparing it with inferiority?
...by looking for bugs in a development version?
...by kissing /.-editor's for spreading this incompetent FUD?
Man, someone needs to re-define the meaning of "serious journalism"... especially when it comes to reports about open source projects and there is no corp. which can kick the authors ass for bad journalism.
I have to play the BS card here.
There is no magic "defect detector" for software. If there were such a thing, they would be making a helluva lot more money than they get for doing little defect tests.
It is very difficult to prove a program to be correct, and there are a lot of REALLY smart people who have tried.
Maybe these people have stuff that can look for buffer overflows and stuff, but actually being able to tell if Apache is returning the correct results requires far more than generic tests.
And I'll all but guarantee they didn't get together an entire development team to understand the code base and how it works, as Apache is a very large and complex code base.
Maybe they take what they find for their generic tests and extrapolate that if they find more generic problems there are probably more specialized errors as well, but they make it very clear in the report that the difference between .51 and .53 defects / KLoC (thousand lines of code) is statistical noise.
Anyways, I'm not saying the entire thing is worthless, just not to read too much into it -- either this one that puts Apache slightly behind some unnamed commercial implementation or the one that put the Linux TCP/IP stack ahead of some other commercial implementation (though I'd say it would probably be easier to test a TCP/IP for correct behaviour than a web server).
Analysis of the quantity of bugs in a software application is by no means a qualitative analysis of the performance of that application.
The predominant httpd servers available on the market today are Apache; iPlanet/SunOne; and IIS. Additionally, there are lesser-known httpd servers (zeus, cern), as well as 'niche' httpd servers (caucho) which typically perform additional functions to parsing HTML code (such as acting as a Java server, etc).
According to Netcraft, Apache is the #1 httpd server in use today, and has been for nearly 7 years.
Regardless of the purported 'quality' provided by commercial, closed-source alternatives, the Apache httpd server is the only solution in the marketplace that supports - in a stable, qualitative fashion - a startling variety of additional software to provide functionality to a website.
A primary example of this bundled flexibility would be the vast number of scripting languages supported by Apache. Java, Perl, PHP, and TCL are all free, stable, and work wonderfully with Apache. This kind of flexibility in application environments is simply unparalleled by the other httpd servers.
You might say that 'you can run java, perl, php, and tcl on iPlanet or IIS, though'. Sure you can. Have you tried that?
First, your commercial vendor won't support it - Microsoft will only support you if you're running ASP.NET et al on IIS; Sun will only support you if you're running Java on iPlanet.
Second, non-supported scripting languages often don't work on non-apache httpd servers. Why? Because the source code for the httpd server isn't available to the scripting language developers - making intelligent integration more difficult - additionally, the major vendors don't test competitive scripting language functionality on their products, meaning that while the writers of PHP, Perl, TCL, etc may offer a version of their product for other httpd servers - Microsoft and Sun aren't testing them on their httpd servers - plus, they aren't guaranteed to work, and often don't. (At my company, we've never been able to get PHP to work correctly under iPlanet - and guess what? Sun doesn't give a shit. Big surprise, huh?).
Commercial httpd servers may indeed have fewer bugs - but they certainly are not as stable in performance, nor do they support as wide a variety of available software extensions - as Apache.
I'll gladly take that extra .02 in software bugs over a commercial, proprietary httpd server any day.
Metric Report
They make you fill out a form that asks for your email and then do an opt out checkbox at the bottom of the form (you have to check it to NOT get spam from them). The site's a bit slashdotted right now though.
The # flaws per leads to:
-Every program can be at least one line shorter.
-Every program has at least x bugs per xxx lines.
Conclusion:
The ideal program has no lines and no bugs.
and to prevent any insightful moddings of this post:
Yes, the design is more important than the quality of the software, ask MS about this.
Apache isn't dying.
So whatever these people claim about the quality of Apache is really not useful. For being the most used web server software (a factor of 3 over certain commercial offerings) with continued growth, it suffers from the least bugs and is generally the most stable.
Are we to read this as anything other than FUD?
Join Tor today!
This Slashdot-Posting was featuring the same PR from Reasoning.
Tonight (if it's not already done) all those 31 "bugs" will be removed from the Apache CVS tree. Now, who said open source development is not effective?
When will the "only 0.51/KLOC" IIS bugs be removed? Next service pack?
bance.net
Reasoning's code inspection service is based on a combination of proprietary technology and repeatable process. The results are objective and comparable across software applications, development methodologies, and coding styles
I have been thinking of writing a program that can detect security holes (buffer overflows in particular) automatically. It would be very hard. But they claim they have such a program, that just finds bugs automatically, and all they use it for is counting them? Somehow I can't believe that. So I guess they don't have such software and are doing something which isn't really objective and comparable at all...
Another reason for this suggestion is that they count bugs per line of code. That for one is not comparable across coding styles.
The site reasoning.com is running Apache/1.3.23 (Unix) (Red-Hat/Linux) on Linux.
Maybe that's because the majority of web servers are running on Unix/Linux? Or maybe that involves too much common sense to be believed? I guess so.
1) Apache 2.1 has more bugs than some unknown commercial competitor. If the version is correct, a development (not-ready-for-release) build was pitted against a released commercial build. Not fair playing ground.
2) Reasoning does not detail the severity or kind of the bugs. Certainly, a web server not being able to handle a type of format (pdf, csv, ogg vorbis) is less severe than a security hole. Pitted against IIS, I would trust Apache even if it had more bugs, because historically it has had fewer security patches. Check out Apache's 2.0 known patches vs IIS 5.0
Well, there's spam egg sausage and spam, that's not got much spam in it.
What is a line of code anyway? Is that the number of hard returns or the number of semicolons? Even so, can we talk about the number of times a line of code is executed? For instance, an efficiently looped statement can often be broken out into tedious and unnecessary repetition. In this way, bad style can reduce your "defect density" by padding your overall volume of text. At the very least I can change
int x,y;
into
int x;
int y;
and reduce my defect density by 50% (if the above code weren't brilliantly flawless).
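To make the "hard returns or semicolons" question concrete, here is a rough sketch in C. The helper names are made up for illustration, and the counting is deliberately naive (it would also count semicolons inside string literals or a for(;;) header), so treat it as a toy and not as how Reasoning measures anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts occurrences of c in a NUL-terminated string. */
static size_t count_char(const char *src, char c)
{
    size_t n = 0;
    assert(src != NULL);
    for (; *src != '\0'; src++) {
        if (*src == c)
            n++;
    }
    return n;
}

/* "Lines of code" measured as hard returns. */
static size_t count_newlines(const char *src)
{
    return count_char(src, '\n');
}

/* "Lines of code" measured as semicolons (naive: no parsing at all). */
static size_t count_semicolons(const char *src)
{
    return count_char(src, ';');
}
```

For "int x,y;\n" both counts are 1, and for "int x;\nint y;\n" both are 2, so in this particular case switching to semicolons does not stop the split from doubling the size. It would only protect against padding that adds no statements.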
* Please do not read my signature.
That's why they rewrote IIS 6.0 (included with .net server 2003) from scratch.
"When a ball dreams, it dreams it's a frisbee"
Visible errors in the code: dereferencing NULL pointers, not saying 'int i=0;', etc. Not "well, if you issue this crafted URL to the app, and it's a full moon and a Monday and the weather is 78F, you can get root!"
did you even bother to read the article?
oh wait, this is slashdot. my bad.
Do you know how long it takes to read someone else's code on something like an Apache-level webserver and understand it to the point where you can make useful changes and fixes? The big lie of the "all bugs are shallow" argument is that such a thing is simple, when in fact it is not.
Fixing a non-obvious bug in a 100k or so line C or C++ project is hard enough when you wrote the code yourself. If someone else wrote the code, it is harder still.
As has been pointed out a couple of times in other comments, 2.1 is the development branch of the Apache web server - i.e. "beta", "buggy", "work in progress", etc. Instead of reading this as "Apache has roughly as many defects as closed source web servers" let's read this as "the development version of Apache has as many defects as... well, some unidentified (beta? shipping?) version of some unknown (iPlanet? IIS?) web server". But you can be *much* more confident that these defects will be fixed in Apache than in the *other* product.
/t
Heck, forget confidence - YOU CAN JUST CHECK.
The fact that Reasoning didn't have to go and get permission from Apache to run this test - coupled with the fact that we don't even know what Apache is being compared to - is the *real* point behind this "article".
ps: IANAL but don't they have to include a copy of the Apache License given that they publish fragments of the source code in their defect report?
#!/usr/bin/english
Look at defect ID #26 in the report.
You'll see that this can only happen when nItems is 0. This means that if a pre-condition was added to the routine tsort() that the nItems argument MUST be strictly positive, defect #26 vanishes.
If I'd put:
assert(nItems > 0);
at the routine entry, it would prevent the later null-pointer dereference and flag the bug as soon as it occurs. I'm not sure how well a crashing web server would be received, but it would be no worse than a kernel panic, and there is indeed a potential bug there.
My point is that to call #26 a defect (or not), we'd have to check all the callers, and if all the callers were to guarantee that nItems is strictly positive, then there would be no bug at all.
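A minimal sketch of what such a precondition looks like in C. This is a made-up stand-in function, not Apache's tsort(), and the names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in (NOT Apache's tsort): returns the largest of nItems values. */
static int largest_item(const int *items, int nItems)
{
    int i, best;

    assert(items != NULL);
    assert(nItems > 0);  /* precondition: callers must pass at least one item */

    best = items[0];     /* safe only because of the precondition above */
    for (i = 1; i < nItems; i++) {
        if (items[i] > best)
            best = items[i];
    }
    return best;
}
```

Keep in mind that assert() is compiled out when NDEBUG is defined, so in a release build the precondition only documents the contract. Whether the unchecked access is a real bug still depends on what the callers guarantee.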
Apart from this remark, I think that kind of work is really great. I'd love to see it applied to my favorite open-source Linux Gnutella client (all Gnutella clients are by definition an HTTP client/server). We'd see how a small open-source project compares to a big one.
Netcraft on Reasoning.com's webserver. Apache 1.3.23 on Redhat.
-phish
I mean, slashdot has an apache section (which is retarded as the radio section, and only slightly more popular)... shouldn't a story about apache go there?
The report says they're using 1/31/2003 code.
IIRC, in 1/31/2003 Apache 2.1 branch was only a couple of months old, it wasn't even alpha quality...
I'm curious who is footing the bill for this "research"?
grisha.org
http://slashdot.org/article.pl?sid=03/05/11/001
./ yet.
Just remember to move the last url 0 back one space, after ctrl-c and ctrl-v the above url. Sorry I worked for the firm and do not know how to link in htm with
OH THE SHAME I fell off the wagon and use sigs again!
Thank you so much. I *hate* when people pull that crap out because guess what in the server room Windows is still losing. Thanks.
Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
Misinformation at its finest.
:
Who paid them to do this report? Microsoft?
Maybe slashdot's new byline should be
Slashdot,
Misinformation that pretends to be news for geeks.
Stuff that looks like it should matter, but doesn't.
Not quite as catchy though is it huh?
The lower defect rate in Linux TCP/IP can only be explained by a large chunk of more mature, commercial, stable SCO UNIX code.
If the Apache developers simply want to fix the bugs, they can use the Defect Report. If they want to conduct a brutal purge of their contributors, they can use the Metric report.
Any developer knows that the code that crashes is rarely the code that contains the defect. If it were, the bug would have been found long ago because its faulty behaviour would have presented itself immediately. Difficult bugs (those probably found by this test) are those that start somewhere in the code but do not surface until much later, masking their true identity.
That being said, certainly a list of crash sites may provide hints as to where to look for the real bug.
that's the first time that phrase has ever been used before.
I suspect the following code will be flagged as a defect:
as long as doOrDie() does its job and never returns a NULL then where's the defect? The guys who wrote this tester seem to want you to check any pointer dereferencing against NULL before use - I might be doing this in my doOrDie() function, I don't want to have to do it twice.
Who judged the code they used to judge Apache? I bet their code has even more defects...
Maybe that's because the majority of web servers are running on Unix/Linux?
True, but according to statistics 56% of defaced webservers run Microsoft IIS, and (only) 34% Apache..
This is not brand new data, but it is the latest I can find ... And If Microsoft had some stats showing different results, you can be sure they would publish them..
The competition was about defacing 6000 webservers in 6 hours, so one would tend to conclude from the above that Microsoft IIS would be the primary targets..
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
Slashdot's summary of this article is way off base, and the article itself couldn't be less useful. Counting the number of "errors" in lines of code... and the ratio is supposed to mean something to us? As compared to unnamed other software? C'mon, I have better things to do with my time.
*plonk*
A hen is only an egg's way of making another egg. -- Samuel Butler
Poor Example. A better one is testing your spam blocker by "open sourcing" your email address (plastering it all over the web, usenet, opting in, etc) vs keeping your email address private.
Okay, we've beat to death the fact it was a pre-release version. But look at it this way:
When Open Source software is about the same quality as closed source, the developers consider it unstable and warn people that they may run into problems.
It shows a big difference, to me, in the quality standards that OSS developers (and users) expect.
As a rather "stupid" example, I had to initialize a Map to an empty HashMap just last week to get Sun's Java compiler to accept my code, although the only two references to the Map were within two if-blocks, within the same function, both of which depended on the same boolean value, which wasn't changed in the whole function.
There's a difference between a defect and a bug. Tools that help in finding problems are great, but after all, they can only point out possibly unsafe spots. Of course it's good to write code that doesn't trigger any such possibilities in the first place.
Software should be free as in speech, but if we also get some free beer, all the better.
One of the best ways to get to know a large code base like Apache or something else is to find a repeatable bug and track it down. To fix a bug you do not need to understand the whole program, just the relevant parts. I've submitted bug fixes to several projects, so I must strenuously disagree, especially because, ahem, I have never submitted a bug fix to a proprietary project because it's impossible.
Well this certainly falls under the "duh" category. Freshly written code tends to have more bugs than older, well reviewed, well tested code.
Wow, next we'll learn how you shouldn't buy any Ford, GM, or Chrysler product in the first year of production.
--- It is not the things we do which we regret the most, but the things which we don't do.
Here's a duplicate code report (generated by CPD) for a checkout from the APACHE_2_0_BRANCH as of about a month ago. Time for some refactoring....
The Army reading list
Of course, this test of the code is purely a test of coding errors rather than errors in the code logic.
The most worrying errors in programs are generally not coding errors as they are either terminal (ie. crash) or they are benign (the error may cause memory corruption in a place where it does no harm). Of course, there are exceptions such as buffer overflows, but I'd class those, in general, into the logic error category.
Logic or algorithmic errors are far more dangerous as they can be well hidden and are more likely to make the code do things unintended. The code itself may be perfect but if the algorithm is faulty then there's a major problem.
Agrajag: "Oh no, not again!"
I agree completely. Any metric based on Lines of Code anything is a harmful metric. Any metric based on defect counts is also harmful. Both of these are left-overs from attempts to (mis)-apply statistical process control. Control of crappy metrics give crappy quality.
Suppose I had 100K lines of code with 100 defects. After reviewing my code I discovered that I could refactor it to 80K lines and suppose further that doing so had no effect on the defect count. Defects per line of code would look worse after an improvement.
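The arithmetic, as a small C sketch (the function name is made up for illustration):

```c
#include <assert.h>

/* Hypothetical helper: defects per thousand lines of code. */
static double defects_per_kloc(unsigned defects, unsigned long lines)
{
    assert(lines > 0);
    return (double)defects * 1000.0 / (double)lines;
}
```

With 100 defects, 100,000 lines gives 1.0 per KLOC and 80,000 lines gives 1.25 per KLOC: the refactored code looks 25% worse on paper with the exact same bugs in it. Plugging in the figures quoted for Apache (31 defects, 58,944 lines) gives roughly 0.526, which is presumably where the reported 0.53 comes from.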
Also, given that this is an automated program, I have to ask how they calibrate and validate its results. How many of the 31 errors found actually aren't errors? How many existing known bugs were not found by this program? I really can't accept these results as anything more than fluff with numbers.
To quote me:
"If I had a letter, sealed it in a vault, gave you the blueprints of the vault, the combinations of 1000 other vaults, access to the best lock smiths in the world, then told you to read the letter, and you still can't, and I decided that I was safe from the rest of the world, then I am a stupid sysadmin."
If I have a vault that claims to be secure, then I am an idiot for publishing the details. True, obscurity will not substitute for security, but it does slow the attacker down, and possibly raise the cost of attack so that it is not economic to attack my vault.
Every time I hear the "obscurity is not security" mantra I chuckle. Of course it isn't, but that doesn't make publishing the information a good idea. Is Fort Knox secure? Probably. If so, then why don't they publish the blueprints, guard rotation schedule and security policies? Because that would be stupid, that's why.
Go tell every LAN admin that they need to publish their LAN architecture, firewall config and security policies, because "obscurity is not security." Watch them laugh their ass off at you.
Maybe I drank too much coffee this morning...
If you are going to compare Apache w/ IIS you need to compare Apache+PHP (or mod_perl or similar) w/ IIS+ASP. The addition of a server-side programming language adds a lot of complexity. How many of the IIS bugs are in the IIS server itself vs. its handler DLLs? All the ida and indexing service attacks were this type of vuln.
Why did they use the development branch of Apache, when only a handful of sites are running it? I would have found an analysis of the stable 1.3 branch, which 60% of the web-serving world uses, to be more informative.
In almost every case they listed the pathway was via a failed malloc.
Apache has its own malloc that kills the connection (and the child) if it fails.
That code can never be reached. Their test is invalid.
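The pattern being described here (and in the doOrDie() comment further up) can be sketched in C roughly like this. The wrapper below is hypothetical and is not Apache's actual allocator. It only shows why a caller can skip the NULL check when the allocator's contract is "return valid memory or don't return at all".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns a valid pointer or terminates the process. */
static void *alloc_or_die(size_t size)
{
    void *p = malloc(size ? size : 1);  /* malloc(0) may legally return NULL */
    if (p == NULL) {
        fputs("out of memory\n", stderr);
        abort();                        /* never returns to the caller */
    }
    return p;
}

/* The caller dereferences without its own NULL check, relying on the contract. */
static char *copy_string(const char *s)
{
    size_t len;
    char *out;

    assert(s != NULL);
    len = strlen(s) + 1;
    out = alloc_or_die(len);
    memcpy(out, s, len);
    return out;
}
```

A checker that looks at copy_string() in isolation, without knowing that alloc_or_die() cannot return NULL, could plausibly flag the memcpy() as a possible null dereference.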
First, as many posters have noted, Reasoning DID NOT TEST APACHE 2.1. They tested Apache 2.1-dev. That's dev, as in development branch. As in: I have new untested code, so don't use me on a production server until I'm released in the STABLE series.
For a valid comparison versus commercial software, the testers should have used Apache 2.0.46, the most current STABLE series release.
Second, I'd be interested to see a comparison of 2.0.46 versus 1.3.27. I have a pet theory that multithreaded C code has more bugs than single-threaded C code, and I'd like to see whether there is evidence to support it.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
You have the wrong idea here. There is a point at which you must realize what information you can release without compromising the security of your system. While I can give you the plans to my vault, I will not give you the combination, nor the first or second numbers in it.
For the star wars geeks out there, if you were a Jedi, you don't go around telling everyone you're a Jedi, nor do you flash your light saber in public places. They do realize when to show their light saber, and when they can tell people they are a Jedi. Nor do they not tell anyone who they are, or never show their lightsaber.
You might want to check out Secrets and Lies which will give you a better understanding of security philosophy.
Every Super Villan uses Linux.
Actually, I've found that fixing bugs in large projects is about the same whether or not you are familiar with the project, provided that the author was not smoking crack at the time he wrote it.
For example, I managed to code, test, and patch a "fix" for PostgreSQL this weekend in under 2 hours, having never seen the code before.
The "fix" wasn't a bug, per se, it's just that the output of pg_dump wasn't optimal in my usage for dumping the schema for CVS revision control. I added two flags, -m -M, which molded the output to my liking.
If you haven't seen your code in two months, you and an outsider have about the same chance at finding and detecting bugs/misfeatures.
Engineering and the Ultimate
My wife who is a lead QA tester could vouch for that...
This is my sig. There are many like it but this one is mine.
I have just read the first 'null dereference' claim and it seems to me that in fact it is not possible. Maybe we got amount of reasoning bugs?
Maybe I'm naive, but I read the press release as an endorsement of open source, and a prediction that as the code (the development branch of Apache in this case) matures and is subject to peer review, the code quality will improve (by whatever measure Reasoning is using).
Here's why I think that:
Every time I hear the "obscurity is not security" mantra I chuckle. Of course it isn't, but that doesn't make publishing the information a good idea.
Nobody's saying that the information should be published - what they're saying is that you can't rely on that information being a secret.
Is Fort Knox secure? Probably. If so, then why don't they publish the blueprints, guard rotation schedule and security policies?
That's pretty much the point you're missing - even if that information was published, it wouldn't diminish the security of Fort Knox..
If the people in charge relied on the fact that they don't publish those details, that would be obscurity, because it would lead them to make errors elsewhere. (Oh, it's OK if we leave the main vault open tonight - nobody knows that there will be no guards around it for 10 minutes at 3:30 AM tonight.)
One word: architecture.
And not just the architecture of the web server, but the architecture of the entire platform. But specifically looking at the architecture of Apache versus the architecture of IIS, you'll immediately see that the goals of the two pieces of software are not the same. Look at things like IIS's metabase - the structural details of the server's configuration are kept in an in-memory data structure, which is easily modified while the server is running. Apache, in contrast, reads its configuration at startup, and uses it to determine which modules of code are loaded, and how they are used to process requests - fixing the behavior of the web server at startup.
IIS follows typical MS enterprise software design - it has to interface with COM, and the NT security model, and active directory, and the registry, and a million other systems, all in the name of integration, and enterprise management. Apache doesn't have PHBs telling it that it needs another way for the metabase to be edited, or a new instrumentation API, or whatever else a particular large customer asked for - and can get on with just providing its facilities cleanly.
That's why IIS has so many more security holes, even if it does (as may or may not be the case) have the same raw coding error rate as Apache.
What's wrong with the bottom code?
Everybody take a deep breath.
Their conclusion is that while the INITIAL defect rate of Apache is roughly equivalent to a closed source product (since they are testing a development release), the Open Source methodology reduces the defects to a greater extent and results in code with fewer defects over time.
They are saying that Open Source coding methods are producing _better_ code in the long run.
Freedom Is Universal
Linux-Universe
Why does IIS and Windows come out as far less secure? That is easy. It is less secure by design. Or should I say security was not a factor (or at least not a critical factor) in the design of Windows. Over the years MS has tried to add security to an inherently insecure system.
I once heard Bruce Schneier say "you can't build a secure system on an insecure foundation" (at a DefCon some years ago). He was talking about ALL OS's, not just Windows. Linux, BSD and Windows are all inherently insecure.
What little I know of IIS suggests that it was designed on the assumption that the security in Windows would be enough. Obviously it isn't enough.
Why is IIS and Windows exploited so much? Well, it seems that the vast majority of exploits are done by script kiddies. Script kiddies and the ones who make the scripts seem to go after the easiest targets. Linux and BSD are also inherently insecure, but are tougher nuts to crack than Windows.
That's pretty much the point you're missing - even if that information was published, it wouldn't diminish the security of Fort Knox.
That makes no sense at all. No installation has bulletproof security. I agree (as I said above) with the assertion that obscurity is no substitute for security.
If you had a security system that was totally unbreachable (which does not exist), then, yes, you could publish away.
Unfortunately, no security installation is 100% secure. If it is less than 100% secure, then nondisclosure raises the security. Inversely, disclosure lowers the security.
Again, we agree that (at one extreme) obscurity alone will not make something secure. The other extreme is that security with full disclosure somehow exists, and that we should totally dismiss any solution that relies on obscurity to raise the cost of breaching it.
This is nonsense. Back to Fort Knox and firewall configs, keeping that stuff secret does, indeed, raise the security.
"I told you they would listen to reason"
Real SUV's don't have cupholders
It's 5:42 A.M., do you know where your stack pointer is?
What's your point? IIS was written from scratch. What make scratch so much more secure?
Do you mean that IIS was designed fundamentally insecure? Where did you get that information?
MY compiler (Microsoft C++) does catch this and issues a warning. Doesn't gcc?
Maybe we will find out which company they used. Having valid third party tests confirm that your webserver is coded better than Apache would be hard for any company to pass up. Especially marketing types who advertise stuff they imagine they heard.
Of course it may not be DESIGNED any better, but they dot every i and cross every t. That's pretty incredible in my experience.
Your comments are not really interesting nor informative or enlightening. The same old sales job about IIS problems being the result of severe bugs and bad design.
Here is the real story. Windows' worst enemy is the dumb sysadmins who are put in charge of running the boxes. In fact, for a lot of companies (including my employer), there are no dedicated sysadmins. Since the overhead seems minimal, programmers are put in charge of running the boxes, which is the beginning of the bad rap that IIS gets nowadays. The last thing a programmer wants to worry about is locking down an IIS box. After 6 months of developing an application, these guys want to work on something else, and focusing on configuring the application and the OS is extremely boring and uninteresting.
Don't worry about the errors in the code, I'm sure the apache developers will listen to Reason.
I didn't see anything about this in the article or on their website (I didn't look too hard). Did anyone else find anything? They imply better than three "9"s of accuracy (31 bugs in 60k), but how much better? If I run their product on a project with millions of lines am I going to be chasing false positives all month? Are they finding bugs, possible bugs, or what? Sounds fishy to me...
Quite typical of the slashdot crowd, the slurs and insults are being spewed from both sides of the 'community mouth' when it comes to MSFT vs. F/OS/S. In this case, IIS sucks because it is closed source, so the bugs are worse. Other times, it sucks because MCSEs and department supervisors around the world don't know how to configure it (look at the dumb users! hahah). Finally we have the matter of familiarity and popularity, often pointed to by sympathetic and/or apologistic commentators, usually to the tune of much derision and contempt... but that is not the case here, as Apache has the market.
The reality, of course, lies squarely in the middle of these extreme opinions. IIS (pre-Server 2003 versions, anyway) had some flaws and NT has some flaws, namely shipping with a soft pre-configuration (which, believe it or not, makes sense from a certain standpoint. It's called ease of use). This is often the main reason for IIS being a major target, and most exploits are performed against flaws that have already been patched. Of course, this crowd will spit on automatic updates as an invasion of privacy or some such malarkey, while trumpeting Apache's superiority in between their own patch binges. As you can see, there is very little space for MSFT to do right here. They do a great job given what pressures they face, and the new IIS/S'03 is fantastic, and though we'll need to wait on indications of security, things are looking good.
But as it turns out, the most likely cause of the rampant IIS exploits is that your hax0rs and script-kiddies are often the same F/OS/S enthusiasts flaming MSFT in a forum like slashdot, bearing a senseless grudge against an important and influential developer like MSFT and gleefully proving their point with cheap trick after cheap trick.
Finally, as an aside, I had a friend visiting this weekend, here in Seattle. He's an F/OS/S enthusiast, so I took him on a tour of 1 Microsoft Way in Redmond. Aside from the pleasant, tranquil atmosphere and a small group of Indian developers playing Futball, the only thing that stood out for us was the many banners urging everyone at the company to "make it trustworthy", hung over every door, on every light-post and wall, it seemed. It struck me that the last couple times MSFT set their sights on a goal like that, it was dominance of the web (IE) and total hardware compatibility (win95). Regardless of your personal feelings on the matter, it is hard to argue that they didn't succeed in those pursuits!
Am I the only one who looks at reasoning's results with suspicion (even when I agree with them). Any analysis using methods that are not open and repeatable is not science. This just feels like marketing to me. (it is sad because the study of code quality is such a worthwhile pursuit)
You mean you hate people like yourself?
The Netcraft figure cited compares domain names. When you compare the actual number of public webserving hosts, Apache and IIS are pretty much equal. When you add IIS's considerable lead in private intranet hosting, they are the larger target. Then add the number of personal machines (unintentionally) running a webserver, and IIS comes out as the best worm food.
Yeah, you're right, probably a more truthful statement would have been:-
I usually write fifty thousand lines of code, each line completely and utterly with no meaning, run it through the checker and produce 0 defects, except for one overall defective piece of software
I'd like to say that this may have been the most intelligent and interesting discussion I've seen on slashdot in quite some time.
Why do you think IIS is exploited "so much more"? Not in my experience, and not for many others as well. I host on multiple platforms, the only one that has ever been hacked is the Redhat box running Apache. Never have I lost a FreeBSD or Windows box. IIS can be just as good or better than Apache in many ways. Most people are using statistics derived from attrition.org that are three years old to show that IIS is hacked more often. Times have changed my friend. Linux systems running Apache are nailed far more often these days, and a great percentage of the mass hacks involve Apache. Anyway, it really kind of comes down to the setup doesn't it?
The report hardly takes down OSS or Apache. The report is reasonable and doesn't over-extrapolate about quality. For me, the report is encouraging because MS has something like 80 programmers working on IIS and Apache is made up of volunteers with far fewer resources, that is pretty darn impressive for alpha code. I haven't looked at the list of active committers lately, but I know it's nowhere near 80. Draw your own conclusions.
You either work on the Windows Kernel for MS or you are age 13.
Most bugs are due to subtle errors introduced by the coder that look correct. A good example is an extra semicolon directly after the if and before the block/statement. This tends to happen more often if you use ANSI style rather than K&R. It is a subtle bug that is easily overlooked by the eye (all statements end in a semicolon).
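For anyone who hasn't been bitten by this one, a minimal illustration in C (function names made up for the example):

```c
#include <assert.h>

/* Buggy: the stray ';' is the entire body of the if, so the block
   below it is an ordinary block that always runs. */
static int abs_buggy(int x)
{
    if (x < 0);
    {
        x = -x;
    }
    return x;
}

/* Fixed: the block is now the body of the if. */
static int abs_fixed(int x)
{
    if (x < 0)
    {
        x = -x;
    }
    return x;
}
```

abs_buggy(5) returns -5 because the negation is unconditional, while abs_fixed(5) returns 5. Some compilers can warn about the empty if body when the relevant warnings are switched on, but the code is perfectly legal C and compiles either way.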
The grandparent offered a good suggestion on defensive coding. I would suggest that you take it.
But they _do_ publish the information... the guards need to know when they are working for instance. The guards' spouses need to know when they are working too. See how this data is going beyond the needs of just the securing of Ft. Knox.
There is no need to freak out about this being some sort of attack on open source software or agonize over what the unnamed commercial product used for comparison was.
.51 error density for "commercial software" is talking about commercial software in the abstract. Presumably, this isn't the error density of some secret web server, but the average density of all the commercial products they've analyzed so far.
The article seems to indicate that the
This report is simply an attempt to prove a simple hypothesis about OSS: it gets increasingly refined as it matures.
Reasoning believes they've proved the hypothesis because Apache, a middle-aged project, I suppose, has an error density comparable to commercial software, while the TCP/IP stack, a mature project, has a significantly lower density.
This isn't intended to be a comparison of web servers (come on, people, *of course* they didn't have access to IIS), it is intended to be a mildly interesting observation about the life-cycle of open source software.
It would be a lot more interesting if we could see an analysis of whether or not commercial software goes through a similar maturing process. Maybe commercial products also grow refined with age. Maybe not. If so, which matures faster?
What's good for the syndicate is good for the country. --Milo Minderbinder
This is a pitfall of any statistical measure, but is not an indictment of statistics itself as a science or as a tool. You need to understand the metric (in this case, looking at the raw numbers would reveal that the total defects has not increased) to use it properly.
scripsit demaria:
I don't think so... I'm not sure what I would do with a `r00ted' Windows box if it were given to me; why expend effort on it?
In principio creauit Linus Linucem.
I welcome someone else spending time debugging my code. Sometimes they find benign bugs, sometimes they find real bugs. It's all good. It helps make my product stronger. I assume Apache will shortly have 0 bugs as reported by this automaton.
Don't waste time getting bogged down in what appear to me to be irrelevant statistics about what percentage of bugs per whatever.... Just fix the bugs and move on.
I would expect the current report from Reasoning to be bogus also. The previous report (on TCP/IP stacks) was about code density. This means a code base that was 3 times as bloated (code size) but only had twice the number of bugs would come out as being better than its competitors. And that report did not give information on code size or total number of bugs or on the performance of the tcp/ip stack.
I18N == Intergalacticization
So, if I insert 9 empty lines between each line of code, I've just lowered the defect density by 90%??? Are they counting comments and whitespace in the LOC count?
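Whether padding works depends entirely on how the counter treats blank lines. A rough sketch in C of a counter that ignores them (the name is hypothetical, and this version still counts comment-only lines, so it answers only half of the question):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts lines that contain at least one non-whitespace character. */
static size_t count_nonblank_lines(const char *src)
{
    size_t lines = 0;
    int seen_text = 0;

    assert(src != NULL);
    for (; *src != '\0'; src++) {
        if (*src == '\n') {
            if (seen_text)
                lines++;
            seen_text = 0;
        } else if (!isspace((unsigned char)*src)) {
            seen_text = 1;
        }
    }
    /* A final line without a trailing newline still counts. */
    return lines + (seen_text ? 1 : 0);
}
```

With this kind of counter, "int x;\n\n\n\nint y;\n" is still 2 lines, so blank-line padding buys nothing. A raw newline count would call it 5.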
"Freedom means freedom for everybody" -- Dick Cheney
I've examined defect #1, and it obviously isn't a bug (the code checks the variables and breaks out of the loop if it is NULL). This casts serious doubts as to the accuracy of their results, doesn't it? Anybody want to examine the other 30 "defects"?
"Freedom means freedom for everybody" -- Dick Cheney
It appears that you ASS-U-ME that you never make mistakes in your code. Hence my earlier suggestion that you either work at MS or are a very junior coder.
This is a pointless study. While yes, the slight possibility that one may dereference a NULL pointer is a bad thing, it's minuscule compared to bad design. A perfectly programmed web server designed poorly will have bazillions more bugs and security flaws than a slightly bugged well-designed one. An objective code scanning bug-finder can't fix stupid.
LilMikey.com... I'll stop doing it when you sto
they dont say what they used for a comparison.
when they tell us what they used, then I will believe it.
this smells microsoft.
bring it on! we want to know what it was compared against, sure as hell was NOT IIS...
What does this have to do with the web server performance? 0.53 vs. 0.51 defects per thousand lines is all well and good, but a) how often do these occur, and b) what about actual running time? This test seems worthless...
god's lonely man
Lets complain about open vs closed source quality. Yet, the guy codes at work then comes home at night and does some free code...
:)
Then we look at the linux kernel and see all the sco unix code it contains.
0.53 errors per 1000 for Apache, vs. 0.51 per 1000 for "commercial equivalents" (note that they fail to say how many equivalents were used to generate the average, or which ones)? That's definitely within the margin of error. Not only that, but Apache is a less mature FS/OSS project, so the comparison seems to favor the FS/OSS model.
Furthermore, while presumably many commercial equivalents were used to generate the commercial average, only one Apache was used to generate the FS/OSS average error density. Again, very crappy statistics.
Even if 100 different FS/OSS projects like Apache and Apache were used to generate that 0.53 average, and 100 different commercial equivalents used to generate the commercial average, it's probably still within the margin of error (or standard deviation).
In short, this study = completely insignificant. Likewise, so was their previous study showing that FS/OSS has a lower bug-density, as it only used one FS/OSS project. To get useful statistics, you need hundreds of data-points -- not one.
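A back-of-the-envelope check of the "within the margin of error" claim, as a C sketch. The big assumptions are mine: defects are treated as Poisson counts (so the variance of the count equals its expected value), and the 0.51 commercial figure is treated as exact. The function name is made up.

```c
#include <assert.h>

/* Returns 1 if the observed defect count lies within one standard deviation
   of the count expected at reference_rate (defects per KLOC), under a
   Poisson model. Squares are compared so no sqrt() is needed:
   |observed - expected| <= sqrt(expected)  <=>  diff*diff <= expected. */
static int within_one_sigma(unsigned observed, double kloc, double reference_rate)
{
    double expected, diff;

    assert(kloc > 0.0);
    expected = reference_rate * kloc;
    diff = (double)observed - expected;
    return diff * diff <= expected;
}
```

For the numbers in the report (31 defects, 58.944 KLOC, reference rate 0.51) the expected count is about 30, the observed count is off by less than one defect, and one standard deviation is about 5.5 defects. So under these assumptions a single extra or missing defect swamps the 0.53 vs. 0.51 gap.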
social sciences can never use experience to verify their statemen
This just feels like marketing to me.
Duh! It's a press release, not a scientific paper.
They are simply saying - look, we found 31 bugs in Apache, imagine how many bugs we can find in your software.
I'd be much more interested in a metric based on a number-of-statements metric.
This wouldn't be difficult to calculate: obviously, they have lexers and parsers which grok C code. This insulates the metric against coding styles, so that, say, some guy who litters his source with comments and braces, but writes the exact same effective code as another fellow will have statistically similar defect densities.
The comments and braces (and whatnot) aren't the only examples where this is useful, either. Consider the two snippets which follow:
char *strcpy(char *P, const char *Q) {
    char *p;
    for (p = P; *Q; *p++ = *Q++);
    *p = (char)0;
    return P;
}
and
char *strcpy(char *P, const char *Q)
{
    char *p = P;
    while ((*p++ = *Q++));
    return P;
}
as well as
char *strcpy(char *P, const char *Q)
{
    char *p = P;
    do
    {
        *p = *Q;
        p++;
        Q++;
    } while (p[-1]);
    return P;
}
Even without comments, it's pretty plain to see that these segments, which pretty much all implement the same function with effectively the same code, with probably the same defects, have drastically different line counts -- but VERY similar statement counts. Remember, ++ side-effects, assignment operations, the three parts of the for(;;), would all be considered individual statements, even if they are not blatantly decomposed as such in the source code.
And, for the -fpedantic PITAs out there, no I haven't even bothered to compile the code (or really think about it). It's just a friggin' example!
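For what it's worth, here is a rough sketch of the two metrics side by side. It is not anything Reasoning is known to use. It assumes that counting semicolons outside string and character literals is a good-enough stand-in for counting statements; a real tool would walk the parse tree and would also count ++ side effects separately, which this does not. The names count_lines and count_semicolons are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count source lines: the number of '\n' characters, plus one if the
   text is non-empty and does not end with a newline. */
size_t count_lines(const char *src)
{
    assert(src != NULL);
    size_t n = 0;
    const char *s;
    for (s = src; *s; s++)
        if (*s == '\n')
            n++;
    if (s != src && s[-1] != '\n')
        n++;
    return n;
}

/* Count semicolons outside string and character literals, as a crude
   proxy for a statement count. Comments are NOT handled. */
size_t count_semicolons(const char *src)
{
    assert(src != NULL);
    size_t n = 0;
    char quote = 0;              /* 0, or the quote character we are inside */
    for (const char *s = src; *s; s++) {
        if (quote) {
            if (*s == '\\' && s[1])
                s++;             /* skip the escaped character */
            else if (*s == quote)
                quote = 0;
        } else if (*s == '"' || *s == '\'') {
            quote = *s;
        } else if (*s == ';') {
            n++;
        }
    }
    return n;
}
```

Feed it the same loop written with and without extra braces and the line count changes while the semicolon count stays put, which is the whole point of normalizing by statements instead of lines.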
Do daemons dream of electric sleep()?
so, evidently numbers are involved with this somehow... fascinating!
-pyrrho
so it occurs to me to run a few tests of my own... now that we are going to count not only actual errors, but errors that might potentially be added to the language.
[scribble scribble][calculate][ponder] yes... the number of defects, including potential defects, is exactly infinite!
Now, if only they used a language where it is impossible to code a defect, like Java, or Godland, there would be no problem!
oh, the sarcasm! I'm so full of it!
-pyrrho
I looked at the "defect" report for Apache, and 29 of the 31 errors were null pointer dereferences (the other two were references to uninitialized variables). NO array overruns. I'd much rather have a null dereference (from run-time allocated memory, ostensibly) than an array overrun, which could be used for a buffer-overflow attack. Apache had none of those.
Furthermore, almost all the errors were in a handful of files, which one could probably assume weren't particularly critical. I'd love to see a re-analysis of Apache's "guts," as I believe it would be rock-solid.
-Looking for a job as a materials chemist or multivariat
Looking at their first "bug", a little manual inspection shows that it's in the "can't happen" category, even without knowing about hidden information. The code looks like this:
current_provider = conf->providers;
do {
    /* some safe code */
    if (!conf->providers) {
        break;
    }
    current_provider = current_provider->next;
} while (current_provider);
and they identify the second-to-last line as the "possible NULL pointer reference". Note that the "break" before that line will be taken if the pointer is NULL, so it can't happen. In fact, the static analysis could have determined this if it were a little better at propagating values.
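To make the argument concrete, here is a stripped-down, hypothetical version of that loop. The struct and function names are invented for illustration and are not the actual Apache types. The assert marks the spot the tool complains about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Apache structures. */
struct provider { struct provider *next; };
struct config   { struct provider *providers; };

/* Walks the list the same way the flagged loop does and returns the
   number of providers visited. */
int count_providers(const struct config *conf)
{
    int visited = 0;
    const struct provider *current_provider = conf->providers;
    do {
        if (!conf->providers) {
            break;               /* empty list: current_provider is NULL */
        }
        visited++;
        /* Reached only if the list head was non-NULL on the first pass,
           or the while condition was true on later passes. */
        assert(current_provider != NULL);
        current_provider = current_provider->next;
    } while (current_provider);
    return visited;
}
```

With an empty list the break fires before the dereference; with a non-empty list the while condition guards every later pass.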
First conclusion: subtract at least one "bug" from the 31 defects in Apache. This lowers the rate to 0.51, the same as the "average commercial code" number they quote. Yahoo!
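The arithmetic, using the 31 defects and 58,944 lines from Reasoning's summary (defects_per_kloc is just a name made up for this sketch):

```c
#include <assert.h>

/* Defects per thousand lines of source code. */
double defects_per_kloc(int defects, int lines)
{
    assert(lines > 0);
    return defects * 1000.0 / lines;
}
```

defects_per_kloc(31, 58944) comes out at roughly 0.526, which rounds to 0.53; defects_per_kloc(30, 58944) is roughly 0.509, which rounds to 0.51.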
Second conclusion: their static analysis must identify a lot of false positives, if the very first one in the list is one (I would look at more, but I should really get back to work...)
Defect ID 14 in the report complains that:
"the expression pattern++ = '\0' is not a valid pointer."
in this line of code:
1480 *pattern++ = '\0';
Apparently their C parser has incorrect precedence rules; they think the assignment operator "=" binds more tightly than the pointer dereference "*".
So far, every one of their "defects" that I've examined looks like a non-bug. This one is not only not a bug; their tool doesn't even parse the C code properly.
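A quick way to convince yourself how that line parses: postfix ++ binds tighter than unary *, and assignment binds loosest of the three, so the statement means *(pattern++) = '\0'. The function below is a made-up example, not the Apache code around line 1480.

```c
#include <assert.h>
#include <stddef.h>

/* Writes a terminator at *pattern and returns the advanced pointer,
   which is exactly what "*pattern++ = '\0';" does. Hypothetical helper. */
char *terminate_and_advance(char *pattern)
{
    assert(pattern != NULL);
    *pattern++ = '\0';           /* parsed as *(pattern++) = '\0' */
    return pattern;
}
```

Called on the second byte of a buffer holding "abc", it zeroes that byte, leaves the neighbors alone, and hands back a pointer to the third byte.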
We can't assume Apache and IIS are roughly equivalent in terms of code defects, and we certainly can't make any assumptions on the OS based on the fragmentary information given by Reasoning.
For one, a large number of the "defects" listed by Reasoning are false positives, such as a warning about dereferencing a NULL pointer where the pointer cannot possibly be NULL due to an action on the previous line.
And second, we have no idea what they compared Apache to or how they got ahold of the source code to these mystery commercial offerings. They could be making everything up, and I'm inclined to believe that they are given the reluctance of commercial providers to disclose source code.
The fact is, IIS has a much smaller market share than Apache according to Netcraft, and it is closed-source, so attackers can't just read the code... Yet it's broken into more often according to Zone-H, and more advisories come out for IIS than for Apache according to CERT.
Statistically speaking, IIS must have a much higher incidence of severe defects.
Your comment was not insightful. It was misleading.
Rewriting something from scratch is unfortunately the surest way to introduce new bugs. If the architecture was completely flawed it'll pay off eventually, but I would NOT trust the first iteration, just like I wouldn't buy a new car in its first model year. No one's had a chance to really hammer on it yet.
When you add IIS's considerable lead in private intranet hosting
/Esben
Tell me, if it is so private, how can people hack it? INTRANET means that the people who have access to it are very limited. How can it be a target for hackers if it is so limited?
"Nobody really checks their email any more. They just delete their spam"
I modded you up cause you had good knowledge in your post, but why are you AC?
Now all my moderation in this thread is gone. Gimme a break Slashdot, why can't moderators post AC in threads they moderate? Are they going to put all five points on their AC post? It's going to fucking show up in meta-moderation and they're going to get screwed anyways.
Fuck you Slashdot. THINK a little bit about the implementation with checks and balances before you rush to check out your new affiliate check from ThinkGeek.
Let me point out that the problem phrase is "security through obscurity". Both security and obscurity are useful.
Obscurity isn't hiding it in a vault somewhere in New York, and telling you to try to read the letter. Obscurity is when you don't want anyone to read the letter, and so you bury it in a vault and don't tell anyone.
But you can't depend on obscurity. You especially can't depend on obscurity if you're trying to sell a product. So if that's so, you'd better have security in addition to any obscurity you have.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Apache, in contrast, reads its configuration at startup, and uses it to determine which modules of code are loaded, and how they are used to process requests - fixing the behavior of the web server at startup.
Not true. The command "killall -SIGUSR1 httpd" will tell Apache to reread its config file.
Changes can quite happily be made to the server configuration without having to restart it.
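As a generic illustration of how a daemon can pick up new settings on a signal, here is a minimal sketch. This is not Apache's actual code; the function names are made up, and the reload itself is reduced to bumping a counter where a real server would re-parse its config file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t reload_requested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    reload_requested = 1;        /* only async-signal-safe work here */
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
int install_reload_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main loop between requests. Returns 1 if the
   configuration was "re-read" (generation bumped), 0 otherwise. */
int reload_if_requested(int *config_generation)
{
    assert(config_generation != NULL);
    if (!reload_requested)
        return 0;
    reload_requested = 0;
    (*config_generation)++;      /* a real server would re-parse its config here */
    return 1;
}
```

The handler only sets a flag; the actual work happens later in the main loop, so requests already in flight are not interrupted halfway through.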
Free software licenses work because of copyright law. Copyright law says that you cannot copy the code, but the authors grant you exceptions under contract law if you obey the terms of the license.
However, quoting excerpts is considered fair use under copyright law, so they can ignore the license.
I'm not a lawyer either.
Hahahaha!
If I recall the original name of Apache, it was a play on words of "a patch." Considering that it is "a patch", the result is really not that surprising when compared to its commercial counterparts. The good thing is that it's free. Yay!!!
hahahhahaha
have you seen how many security exploits have been released for apache over the years?
No one should trust apache.
Why can't somebody write a little preprocessor that fixes all the warning-type code by prepending stuff like
if (NULL == myPointer) {
    ...
}
where necessary, so as to minimize the count of these sorts of "errors"?
seems trivial to me, but i'm not that good a coder
How hard is it to just fool the thing by making a preprocessor that "corrects" "incorrect" code (adds the necessary code, etc.)? Wouldn't that be easy?
Of course, they were all fixed within 7 hours.
You have not heard of the Design by Contract paradigm? Because this is what we're talking about here.
In design by contract, the callee makes up the contract (precondition) and the caller promises to never violate the precondition. If you do, it's an error in the caller.
You can choose to enforce the precondition by checking it in the routine. When your software is validated, you can remove this runtime checking. But with design by contract, you never replicate the check of the precondition.
After you have started practicing this paradigm, you will see that your code is clearer, is less cluttered with tests, and more robust than it can be with defensive programming.
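A small C sketch of the idea, using assert() as the precondition check. Compiling with -DNDEBUG removes the check once the callers have been validated. Both function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Contract: s must not be NULL. The caller guarantees this; the
   routine does not try to recover if the contract is broken. */
size_t string_length(const char *s)
{
    assert(s != NULL);           /* precondition, compiled out under NDEBUG */
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* The caller honors the contract up front instead of relying on
   string_length to defend itself. */
size_t length_or_zero(const char *maybe_null)
{
    return maybe_null ? string_length(maybe_null) : 0;
}
```

The NULL test lives in exactly one place, at the caller that can actually see a NULL, and is not repeated inside the routine.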
Just to update those interested, I contacted Reasoning and asked why they used a pre-release development version of Apache to analyze, and their response was that they are hoping to show the impact of peer inspections over time, and they will be posting more inspections of Apache as the code matures. I think it's great that the unstable development branch has only as many defects as the commercially available web servers... Should be interesting!