Bug Hunting Open-Source vs. Proprietary Software
PreacherTom writes "An analysis comparing the top 50 open-source software projects to proprietary software from over 100 different companies was conducted by Coverity, working in conjunction with the Department of Homeland Security and Stanford University. The study found that no open source project had fewer software defects than proprietary code. In fact, the analysis demonstrated that proprietary code is, on average, more than five times less buggy. On the other hand, the open-source software was found to be of greater average overall quality. Not surprisingly, dissenting opinions already exist, claiming Coverity's scope was inappropriate to their conclusions."
I scanned through the article, it didn't seem to mention how they tested the top proprietary software. I can well understand that there are are a lot of bugs in open source code since it is written by humans. But human also right the proprietary code. How did they test it?
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
Knuth used to have this great offer where he'd send you a check for pi or e or something if you managed to find a bug in his code.
Well, what is a bug?
I doubt he'd send me a check if I told him that TeX doesn't have an easily accessible iconic user interface. No, his concept of a bug is a deviation from the specified functionality.
But what if that functionality is wrong or sucks?
Apple does really well at creating functionality that doesn't suck. They suffer from the same problems of deviations from the spec as much as anyone, but they manage to mold their spec around what users want. Microsoft, to some extent, does the same and they release products that conform to what users want (generally) because they change the spec as necessary when customers demand change.
If you are implementing towards a standard (like most OSS projects with any traction are wont to do), then you are necessarily restricted by what that spec says. If the spec says to do something inane, the standard-follower must implement it that way.
I don't really have a point here except to say that unless they say "this is what we mean by bug", there can be no way to really examine their results.
You are assuming that "a whole lot of people" actually check the code and submit patches for FOSS projects... My guess is that most testing, even of FOSS software, is done with the compiled program, not by reading the source code.
Clever signature text goes here.
"Deanna Asks A Ninja: What is the circumference of a moose?!"
"It's michael pailum with his face in a pie times douglas adams squared."
This answer makes as much sense as the article.
Except "Ask A Ninja" made more sense. And was more accurate. And more entertaining.
Can I just get a Ninja hit out on this guy something so these articles will not make it slashdot anymore?
An open source software is tested by a whole lot of people over the world and everyone is free to take the code and test if. On the other hand in case of proprietary software this is not the case and is tested by far less number of individuals.
That sounds rather idealistic... The coverage on OSS varies a lot. Most is not tested much, and the testing is not systematic and analyzed, but ad hoc. And if a bug is found, many just shrug and think of it as buggy software, but don't do more about it. There is a world of difference between the high standards of large projects like the linux kernel, apache and eclipse vs. the thousands of small projects found on freshmeat or just googling for something to scratch your itch.
The problem is that there are different types of Bugs. things like a typo in a help file, or American spelling vs British spelling, vs a bug were the app crashes the system when installed on a system with an early version of Quicktime are clasdsified differently.
The summary just says all bugs, which is not fair if the proprietary has 5 times the number of critical or super-critical bugs.
"It is a greater offense to steal men's labor, than their clothes"
Somebody please explain to me exactly what kind of software bug can be found by automatic scanning that isn't found by standard debugging and compile-time checks. If a computer can ascertain exactly what the programmer intended to do, why do we need programmers?
The simple answer to this is that they can't. That's the point behind hiring human codeslingers to write applications. Considering that most software bugs are logic bugs (off by one, etc) that can't be directly seen in the code without actually, you know, RUNNING the program, I find it difficult to believe that AI has come to the point where it can guess the coder's intentions and infer the purpose of an application.
Additionally, where exactly did this proprietary code come from? I don't know which companies these programs were released by, but other than the SharedSource initiative which mostly caters to programming students (if they still even have this program), Microsoft would call their next o/s Windows Sh*tburger (I think I'll trademark that) before they would release their code, especially to a third party consulting firm.
I find this article highly suspect
as scanned by coverity.
;)
linux 2.6: 3,315,274 lines of code, 0.138 / 1000 lines of code.
kde: 4,518,450 lines of code, 0.012 bugs / 1000 lines of code.
based on this I would say we are doing pretty good with open source.
but we shouldn't forget that this tool only scans coding errors, not coding logic.
wine for example only has 0.112 / 1000 lines of code as well.
and we all know it by far doesn't always do what we want it to do.
...and while it is on the list on the web page, I was happy to determine that most of the issues they found were false alarms. They found three real bugs, none of which were likely to bite, and even if they did bite it is not exploitable. Nonetheless, those bugs probably wouldn't have been found otherwise, so I was happy for the scan.
Rather than brag (I won't say who I am or the name of my project), I'm just going to sit back and read all the defensive flames from self-appointed "security experts" whose open-source project didn't do so well. After all the flames from these "security experts" that I've endured, I'm going to enjoy watching them squirm.
It's karma.
Why does this surprise anyone? Propriety software traditionally undergoes a formalized, designed testing process. It's not perfect, but it's an ordered approach to boundary testing, design level implementation of quality, and more. Open source software must rely on after-the-fact testing in the form of "this broke when I tried to do this".
In the end, it comes down to black box vs. white box testing. Commercial software has a strong QA engineering component. Open Source software relies primarily on a black box testing approach.
Open source has MANY benefits and MANY advantages over commercial software. This just doesn't happen to be one of them, but unlike the commercial software, the bug fix cycle on open sourced stuff can be a LOT quicker, so it evens out in the end.
While I appreciate that PreacherTom was good enogh to bring this to us, the sentence "...no open source project had fewer software defects than proprietary code." just does not match TFA.
TFA says that no open source project is as good as the BEST of proprietary, but it also says that the AVERAGE open source is better than the AVERAGE proprietary.
No, *popular* open-source software is 5x as buggy as *safety-critical* closed software. The linked dissenting opinion is at least partly right; they're comparing apples to oranges.
Maybe they should try comparing open- and closed-source software that's actually trying to solve the same problem? That'd be a bit more valid of a comparison...
the report (carried by Business Week) said that the Porpriatary software that beat out the open source stuff was avionics software or controls for reactors or other heavy industrial software. That stuff is all small, done in assembly, and extensively tested.
It was not an apples to apples comparison, more like apples to diamonds. Dom't worry, just fix any real problems identified. Many of the bugs found are theoretical, not real. Many others are style questions. the experts will probably never quit arguing about what is 'good' and what is 'bad'.
There are also some very real race conditions, memory leaks, and things like that. A real list by line number would be nice.
That said, what would really matter is a comparison program by program. I don't think my quick and dirty one off is really in the same class as the Kernal, or Firefox,I havn't seen how this diferentiates.
Everybody knows 3 people with my name.
Open-source software is expensive if you want a commercial support contract (because you are asking a professional to spend a lot of time learning).
Closed-source software doesn't have the function that you want, and you cannot fix it to add the funcion that you want.
You pays your money and you takes your choice. You can always stick to pencil-and-paper, and not use this 'software' stuff at all, if you prefer.
The article makes it quite clear that the proprietary software which is much better that open source is mission-critical software. A class of software where ensuring minimum bugs is a top priority, and also a class of software which mostly does just not exist in OSS. If you are an OSS developer, would you try to develop open source air traffic control software? And even if yes, how would you do it anyway?
Basically, my own conclusion from reading the article was that it IS possible to write excellent software with very few bugs, if that is a top priority. And, that the author seems to say that while mission-critical software (which happens to be proprietary) is fortunately much better than the rest, among all that other non-mission-critical software, open source tends to be better than proprietary.
Not surprising, and quite encouraging...
they tested it by using a program that systemattically scans code for common errors.
A method known to have flaws. It raises a ton of false positives, things that might "look like" potential bugs but aren't because of the data flow. You have to do a data flow analysis to see if they really are bugs.
For example, not checking for buffer overflows when copying strings, etc, is usually considered a (potential) bug. Certainly it is when dealing with unknown input. However, in a function buried deep behind layers of other code that has already filtered out potentially overrunning strings, it is quite safe. (At least until said code is changed, so it is considered bad practise, but the fact remains that if there's no data path that can cause a problem, it's not a bug.)
Mind, there's bugs and there's bugs. The Space Shuttle software is generally considered to be about the most defect-free software around, but they'll launch with some bugs. Specifically, minor bugs in the display (ie a misspelled word, a column that doesn't quite align) are allowed because they're a much lower risk than going in and changing the software to fix them (and possibly introducing a different bug).
-- Alastair
He's comparing "bugs" in a project such as Apache with "bugs" in the software controlling a jet engine on an airplane.
He refuses to accept that different projects have different requirements. When the project results in people dying if it fails, you spend a LOT more money and time finding all the "bugs".
When the worst that happens is that you don't see a web page, your money/time requirements are not so high.
Even so, from his finding, Open Source is, on average, better than the closed source projects (not counting the closed source projects that result in loss-of-life in the event of a failure).
He's an idiot for confusing the different requirements.
The selection of programs from the two populations of programs (open source, proprietary) are not going to be comparable: vendors of proprietary software have a say over which code gets scanned, and they are going to select a different population of programs than the company selected for open source projects. This isn't a fixable problem: there is no way of doing this sort of study so that you can compare the two data sets. The best they could do is compare something like OpenOffice against Microsoft Office, or Apache against IIS.
Furthermore, Coverity simply cannot accomplish what they claim to accomplish: there is no way of detecting "bugs" automatically--if there were, compilers would already be doing it. Coverity effectively does little more than compare code against a set of internal coding conventions; that can be useful if it's done right, but it's not a measure of code quality. Some completely correct code will score thousands of violations against their tool, while other code may contain thousands of bugs, none of which register. Furthermore, it is likely that a lot of their customers are Windows based and that Coverity is biased towards Windows-based coding conventions, giving more false positives on non-Windows code. Before publishing such comparisons, Coverity first would need to demonstrate that their tool does not contain such biases.
Finally, and perhaps most importantly, the company isn't publishing its data, so nobody can verify or even evaluate their claims. Not only do they fail to publish their raw data (obviously, they can't do that for proprietary software), they also fail to list their summary statistics by vendor and project (which they could, but obviously won't do). They don't even give a summary statistic by class of application, class of organization, and code size. Their results are meaningless because they're not reproducible.
These numbers tell you nothing about FOSS code quality relative to commercial code quality. What they tell you is that Coverity apparently doesn't know how to do statistics, misrepresents what their product can do, and doesn't know how to report experimental results properly. Now, do you want to put your trust in such a company?
This is just smart marketing. Imagine they put up a survey that did not make any controversial claims (something like, open source and proprietary software are comparable), then would that generate as much heat? Now many people hear about the company because more people talk about this now than if the survey said something less controversial.
Now to compare every open source software application to aerospace software is really comparing apples to oranges. There is a big difference in the expected quality between an editor and an aerospace application. It's alright even if my editor crashes once in every 20 times I invoke it. Is that acceptable with an aeroplane?
I'm sure the folks at Coverity understand all this. But if they really speak what is right, they will not get all the eyeballs and publicity. In classic slashdot lingo:
1. Do something (anything) that involves open source and proprietary software
2. Make claims that sound outrageous / controversial
3. Profit! (with all the free publicity)
I'm much more funny, interesting and insightful than the moderators think
- This is the typical meaning behind the word "bug".
- That's where your TeX complaint would fall under. It's "by design" that it doesn't have an iconic user interface, but that doesn't mean it's something that shouldn't be addressed ever
- This is actually a result of the bug tracking system that we use. Rather than sending e-mail, which often gets lost, we often track work items as bugs. For example, "Need to turn off switch X on the test server when we get to milestone Y"
To further complicate things, there is a severity and priority attached to every bug. Severity is a measure of the impact the bug has on the customer/end-product. It can range from 1 (Bug crashes system) to 4 (Just a typo). Priority is a measure of the importance of the bug. It ranges from 0 (Bug blocks team from doing any further work, must fix now), to 3 (Trivial bug, fix if there is time). (I don't know why the ranges don't match, BTW, seems silly to me)
As anyone who works on large-scale project probably knows, there are always a wide range of bugs, across all the pri/sev levels. To me, a simple count of all the bugs isn't terribly useful. A project could have a ton of bugs, but most of them being DCRs (which are knowingly going to be postponed till the next release) and/or low pri/sev bugs. Or maybe it's the beginning of the project and they're all known work items. Or a project could have only a few bugs, but with all of them being critical pri/sev ones.
So, whenever I see a report that simply talks about bug count, I take it with a huge grain of salt. If I had to guess (I skimmed the article), it seems like OSS projects have far more bugs, but perhaps lower pri/sev since the product itself has been evaluated as being higher quality. In the end, it's the quality that the customer really cares about.
-- jchenx
The '11 of the top 15' is also grossly misleading. There were twice as many proprietory projects as OSS projects - you'd expect them to take 10 out of the top 15 slots. Deviation by 1 from that is lost in the noise.
I agree with your analysis - I've been on the fixing end of a lot of these kinds of reports, and have known that the flagged error can never occur, but the linting nazis insist that there must be zero warnings at any cost.
I thought Voyager was the ultimate in stable code, not the space shuttle?
FatPhil
Also FatPhil on SoylentNews, id 863
> I guess we cannot blame Micro$oft (directly) for this
Or indirectly, or even mention them anywhere on the same page.
Just because something shows Open Source software in a bad light don't automatically assume that Microsoft is behind it.
I don't think Microsoft actually care that much about 99% of Open Source software out there. They probably even use a whole bunch in house.
There's a whole lot less conspiracies in the world than you think.
What they have not done is compare comperable projects -- IE to Firefox, Open Office to MS Office, Windows to OS-X to Linux-KDE. There is, as far as I know, no Open Source software product that is really intended for mission critical applications -- I guess maybe SSH might qualify, but I don't see it in their list.
So, I think what we have here is a comparison of Apples to Turnips using a dubiously calibrated error-o-graph machine that uses an unknown technology to perform undefined tests on software.
Don't get me wrong. I sure as hell wouldn't run a nuclear power plant with Linux-X-Windows-whatever. Nor with Windows -- neither Windows 9 nor NT based Windows. They don't meet my admittedly subjective standards of quality either. But if we waited for near perfect software quality, we'd still be trying to get text mode right. Personally, I 'd vote for that because I think building major structures on weak foundations will likely lead to big trouble a decade or three out, but I think I'd lose that vote about 93 to 1 with maybe 6 abstensions.
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
And n00b developers are also capable of finding bugs. Aren't they? .... you have test cases for the software? So you run them after your refactoring? What now? All pass as before? Oops, if so: then you had no test case for that piece of suspicious code you just have fixed! So you still don't know if there was an error or not!
No they are not to the extend of a experienced developt.
going through the code dow not find bugs. Either you do a formal correct approach, that is a walk through or a code inspection then you may find bugs, or you only have the chance to find occasional off by one errors in a loop or array index. Just by looking over code as you say in your n00b appoach you only find suspicious pieces of code.
What now? You change it to be less suspicious? And then? You commit it? So you don't know if somethign elsewhere is breaking now because of your change? Ah
Testing means to DEFINE how individual pieces of code should behave and writing a test case exactly for that. Changing software and fixing bugs means to have tests, lots of tests, not eyeballs.
angel'o'sphere
P.S. that does not mean that formal walk throughs / inspections don't work, they do!! But informal ones are only for educational purpose intersting.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Now these days you often get studies claiming that proprietary software is less buggy than free software, but it misses some very significant points, the ones we used to respond to MozillaQuest articles still apply very much to today:
- Free software projects very often have an open bug database so it's easy to see how many open bugs are in a project, most proprietary software doesn't have an open bug database so you have to trust the manufacturer and your own testing
- Not all bugs in open databases are really bugs. Some are requests for enhancement, some are duplicates and some are rants
- In some cases one persons bug may be another persons feature (e.g. if an application does something differently to the platform guidelines, some people may like this alternative behaviour, others will consider it a bug).
- The profit motive - companies have a lot to lose by letting people know about bugs, volunteer led projects tend to want people to know about bugs in the hope someone will help fix them (this is getting a bit blurred now that more and more organisations are making money off free software but the fact still is with proprietary software you can't fix the bugs so they gain nothing by telling you about them)
Sorry if this is redundant, I'm working on call at the moment and was halfway through typing this when I had some work to do!Quality of Programmers is critical...
Which would you rather have 100 monkeys programming on a project or 10 skilled programmers?
More programmers and more 'eyes' on a project does not mean it is going to be inherently more bug free. In fact with a group of bad programmers in the mix, it can cause severe harm to a project.
I'm not knocking Open Source, but for people to just expect it to be better because more people have access to work on it, have obviously not met as many programmers as I have.
There are a lot programmers that put time into project (and yes Open Source) that have no business developing a VB application for a 10yr old kiddie game, yet they are taking part in large scale coding projects that truly would be better off with them not working on it.
When working with XWindows years ago, I ran into a few people that scared the hell out of me and other people. They had no vision or scope past the specific things they were trying to do, and would often come up with modifications or 'features' that would break more than it added to the project.
In the Windows world of 3rd Party developers I have also found 100s of people I wouldn't want them to develop Hello World. As they had no concept or regard to security, Unicode or many other features that would fail when the applications would run on a non-English system with a user having administration privledges.
You can even find many commericial products in the 3rd party Windows world that also have these problems, but are yet produced by big companies are popular products.
I wish that all ideas would be welcomed into a project, but the people having the final say could trump crap programmers and crap ideas if they are detrimental to the project.
When you look at the Linux Kernel or BSD, you can quickly understand why Linus and others don't want to let the 'deciding' control into the masses, or both of these core OS would become crap in a matter of months of unregulated programmer additions.
SIGSEGV caught, terminating
wait... not that kind of sig.
Coverity's study is based on their analysis of the code itself, not on bug reports, so the considerations you mention are not relevant.
A guess is that this is because much of emacs' functionality is implemented in elisp code, which is not part of the core program and so not included in the source line count, whereas most of vim is implemented directly.
I just ran across an interesting blog entry by Federico Mena Quintero that linked to two interesting papers : When should a test be automated? and Classic testing mistakes.
Automated tests are often more expensive to write than manual tests, so when should you do automated tests versus manual ones? The answer is not always automated tests. Also typical unit testing is only part of what kind of testing needs to happen.
Using Ring Buffer Logging to Help Find Bugs was also another good link that Federico had in another entry.
openoffice code is a mess because i is very old... remember that there was a staroffice in DOS time, the same code was update over and over and over before release to the opensource community...
since then many people try to clean it, but its hard and risky to clean a such big app
most projects have a coding style that everyone should follow, and many force you to comply if they want their code to be accepted
Higuita
Coverity, of course, knows that reports like this will be written up in exactly the way this summary was, clearly associating their company with the idea of enumerating the bugs in a piece of source code. While not illegal, this type of marketing is of course deceptive; while published papers describe the type of defects (or non-defects) actually detected, the overwhelming volume of commentary will reflect the broader, and incorrect, view that Coverity == bug-finder. It would be just as meaningful (which is to say, not very) to publish the number of lint warnings or missed opportunities to qualify pointer arguments with the const keyword, and neither would require an expensive piece of overhyped software.
Just Say No to Coverity's marketing gimmicks.
The codebase is very old, contains a bunch of legacy stuff nobody really understands, as the codebase has passed hands from a German company to Sun to the OpenOffice.org foundation. It's also picked up a layer of java along the way (for whatever reason).
It's too bad because it actually works kinda okay, but it's a real effort to get your hands dirty with.
Blender is also like that... it seems when a codebase has 'gotten around' it tends to pick up the bad habits of all the hands its been through.
MySQL is a bad state because it's really only developed by MySQL AB -- no one else is contributing to it so they have no reason to make it any more maintainable than it is. PostgreSQL, on the other hand, had the luxury of being the fruit of some academic research projects and was rewritten once or twice, so it's a little more maintainable.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON