450 Million Lines of Code Can't Be Wrong: How Open Source Stacks Up
An anonymous reader writes "A new report details the analysis of more than 450 million lines of software through the Coverity Scan service, which began as the largest public-private sector research project focused on open source software integrity, and was initiated between Coverity and the U.S. Department of Homeland Security in 2006. Code quality for open source software continues to mirror that of proprietary software — and both continue to surpass the industry standard for software quality. Defect density (defects per 1,000 lines of software code) is a commonly used measurement for software quality. The analysis found an average defect density of .69 for open source software projects, and an average defect density of .68 for proprietary code."
"450 Million Lines of Code Can't Be Wrong"
should have been
"450 Million Lines of Code Can't ALL Be Wrong"
Just ask apk!
Proprietary defects are ones that may cause financial harm. FOSS defects are ones that cause annoyance.
I know that our code has more defects than we'd consider fixing purely because the CBA isn't there.
Code quality for open source software continues to mirror that of proprietary software ....
That thar is fight'n words, pardner!
Open Source is SUPERIOR!
The very definition of 'proprietary software' indicates you don't have access to the code to calculate defect density, and even if you did, you cannot independently verify that the code you have is production code. How did the researchers quantify it?
Good people go to bed earlier.
Compelling numbers, friend.
I'd like to have my own personnel verify thi... ah, right.
It must be just me, but if we could simply find all these defects with a scan, why weren't they fixed before release?
definitely more fun than 68 :P
Never anthropomorphize computers, they do not like that
"Code quality for open source software continues to mirror that of proprietary software — and both continue to surpass the industry standard for software quality."
What is this third kind of software that is neither open source nor proprietary which is bringing down the average industry standard for software quality? Because if there is only open source and proprietary then they can't both be better than average. Or perhaps the programmers are from Lake Wobegon?
I don't think that it will be that hard to find at least 450 million lines of code that are wrong.
Heck, I can probably find that in my unfinished/abandoned projects folder.
Errors per lines of code may give you a hard number, but that number has nothing to do with the quality of code. It only takes one well-placed error to ruin a piece of software.
If the defect density is 0.69 per 1000 lines of code, then of 450 million lines of code, more than 300000 are wrong. Therefore, so is the title.
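For anyone who wants to check that arithmetic, here is a minimal sketch. It takes the commenter's assumption that the 0.69 open source density applies to all 450 million lines, and the function name `expected_defects` is made up for illustration.

```python
# Sketch: defect count implied by a density quoted per 1,000 lines of code.
# 450 million lines and 0.69 defects/KLOC are the figures from the summary;
# applying 0.69 to every line is the commenter's simplification.

def expected_defects(total_lines: int, defects_per_kloc: float) -> float:
    """Return the number of defects implied by a per-KLOC density."""
    return total_lines / 1000 * defects_per_kloc


implied = expected_defects(450_000_000, 0.69)
print(round(implied))
```

That comes to roughly 310,500 defects, so "more than 300000" holds under that assumption.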
Actually, this study does not say anything directly about code quality, because Density = Total Defects Found / Code Size. The problem is with the "Total Defects Found" part. How they are found and how they are reported may differ vastly from one project or company to another. The report says that the quality of code increases with larger codebases in proprietary projects. In fact, the best you can say is that the metric decreases with larger codebases in proprietary projects. Maybe many of the defects have not been found yet in proprietary projects. Maybe they have less manpower to look for errors, or maybe they just don't care as long as nothing crashes. But smaller defects may still be in the code. Open source code is more open to "finding the defects", and so may score worse on the "quality" the article is talking about. I think this has to be kept in mind when reading the report.
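To make the "Total Defects Found" point concrete, here is a rough sketch with entirely hypothetical numbers. It assumes, as the comment does, that projects differ in how thoroughly defects are looked for; `detection_rate` is an invented parameter, not something the report defines.

```python
# Sketch: the measured density depends on how many real defects get found.
# Every number here is hypothetical and chosen only to illustrate the point.

def measured_density(true_defects: int, detection_rate: float, lines: int) -> float:
    """Defects found per 1,000 lines, given the fraction of real defects detected."""
    found = true_defects * detection_rate
    return found / (lines / 1000)


# Two hypothetical projects: same size, same number of real defects.
well_scrutinized = measured_density(true_defects=1000, detection_rate=0.9, lines=1_000_000)
barely_examined = measured_density(true_defects=1000, detection_rate=0.5, lines=1_000_000)

print(well_scrutinized, barely_examined)
```

The project that gets examined less reports the lower density, even though both contain the same number of real defects.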
Everyone knows OSS doesn't have defects, it just develops random features
Stuff written in less popular old obsolete languages that is very sloppy yet unique and confusing and difficult to wrap your head around and extremely buggy is a pain and thus it costs companies money to pay someone to do work on it. Not only does it take an intuitive person who can tolerate annoyingly crappy code, but they'd be stupid to take the job in the first place unless the payoff was worth it. Since open source is free and clean and popular, it's friggin awesome.
Quality metrics can have unexpected side effects.
"No matter where you go, there you are." -- Buckaroo Banzai
FTA:
The article gives numbers: above 1M LOC, defect density increases for open source projects, and decreases for proprietary projects.
Increasing defect density with size is plausible: beyond a certain size, the code base becomes intractable.
Decreasing defect density with size is harder to understand: why should the quality fairy only visit specially big proprietary projects?
Perhaps the way those proprietary projects get into the MLOC range in the first place is with huge tracts of boilerplate, duplicated code, or machine-generated code.
That would inflate the denominator in the defects/KLOC ratio.
But then that calls the whole defects/KLOC metric into question.
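A quick sketch of that denominator effect, with made-up numbers. It assumes the generated or boilerplate code adds lines but no new flagged defects, which is only a guess about how such code behaves under a scan.

```python
# Sketch: padding a codebase with defect-free lines lowers defects/KLOC
# without fixing anything. All figures are hypothetical.

def density_per_kloc(defects: int, lines: int) -> float:
    """Defects per 1,000 lines of code."""
    return defects / (lines / 1000)


handwritten_lines = 1_000_000
defects = 1_000
generated_lines = 500_000  # assumed to contain no flagged defects

before = density_per_kloc(defects, handwritten_lines)
after = density_per_kloc(defects, handwritten_lines + generated_lines)

print(f"{before:.2f} -> {after:.2f}")
```

The defect count never changes; only the line count does, and the density drops by a third.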
Why on earth do they choose 2 colours that are hard to tell apart in that graph? They were black & dark blue. It took me several seconds to work out which was which. Many other reports/... seem to do the same.
In open source, a defect gets fixed when someone feels the urge to fix it. Most of the time that is because it is their own dog food: many open source projects are used by their own developers, and they fix the issues that irritate them most. The rest of the bugs get fixed based on their impact on other users and the developers' passion for the project.
In a closed source project, it is often the bugs affecting the loudest paying customer that get fixed. If a fix is not going to advance sales, it won't happen.
Given this dynamic it is not at all surprising both methods have similar levels of that elusive "quality". I think software development should eventually follow the model of academic research. There is scientific research done by the universities that have no immediate application or exploitation potentials. The tenured academic professors teach courses and do research on such topics. Then as the commercialization potential gets understood, it starts going towards sponsored projects and eventually it goes into commercial R&D and product development.
Similarly we could envision people who teach programming languages at colleges maintaining open source projects. The students develop features and fix bugs for course credit. As the project matures, it might go commercial or might stay open source or it could become a fork. The professors who maintain such OSS projects should get the same bragging rights and prestige as professors who publish academic research on language families or bird migration or the nature of urban planning in ancient Rome.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Where everybody is above average. Or is there a 3rd category of software other than proprietary and open?
Dude,
Go in for treatment. Please.
Proprietary, open, and 'subject to endless litigation regarding its status' (SCO and friends), maybe?
Internally-written software that is not being released for ``external'' consumption, perhaps? There's likely far more of that in use than what is being sold for profit or being given away.
CUR ALLOC 20195.....5804M
What, however, are the propensities for it compared to other sources?
Code quality for open source software continues to mirror that of proprietary software — and both continue to surpass the industry standard for software quality. Defect density (defects per 1,000 lines of software code) is a commonly used measurement for software quality.
Since there are two types of software open source and proprietary and both of them surpass the industry standard for software quality, what exactly is the industry standard based on?
The article states that the industry standard is 1 defect per 1,000 lines of code. But at the rates given, open source is 1 defect in 1,449 lines of code and proprietary software is 1 defect in 1,470 lines of code. Maybe it's time to change the industry standard?
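Those "1 defect in N lines" figures are just 1,000 divided by the density. A minimal sketch, using only the three densities quoted in this thread (the function name is made up):

```python
# Sketch: convert defects per 1,000 lines into "one defect every N lines".
# The densities are the ones quoted from the report and the article.

def lines_per_defect(defects_per_kloc: float) -> float:
    """Average number of lines of code per defect."""
    return 1000 / defects_per_kloc


for label, density in [
    ("open source", 0.69),
    ("proprietary", 0.68),
    ("industry standard", 1.0),
]:
    print(f"{label}: 1 defect per {lines_per_defect(density):,.1f} lines")
```

The proprietary figure comes out at about 1,470.6, so the 1,470 quoted above is truncated rather than rounded.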
Counterintuitively, defect density is actually an INVERSE indication of quality - better quality code will have MORE defects per line.
The reason I say that is because better code has fewer lines per problem. Consider strcpy(), a function to copy an array of characters (a C string). You can't use strcpy() in your code - you're supposed to create strcpy(), copying each element of the array.
Take a moment to consider how you'd write that before looking below.
Roughly how many lines of code did you use to copy an array? Here's what a typical corporate programmer might do:
while (source[i] != '\0')
{
    dest[i] = source[i];
    i++;
}
So one error in that code would be 1 defect per five lines or so.
Here's all the code you need, what a better programmer would write:
while (*dest++ = *src++);
If the typical programmer and the expert both had exactly one error, the expert would have five times as many bugs PER LINE as the typical programmer! So you're better off with code that has a higher density of errors - better code will have fewer lines per error.
This is the same reason LOC is an inverse indicator of productivity. Yesterday I fixed a junior programmer's code that looked like this:
if ($category == 'rings') {
    $page = 'rings.html';
}
if ($category == 'necklaces') {
    $page = 'necklaces.html';
}
if ($category == 'bracelets') {
    $page = 'bracelets.html';
}
if ($category == 'loose_stones') {
    $page = 'loose_stones.html';
}
if ($category == 'charms') {
    $page = 'charms.html';
}
Of course I changed that code to, well, zero lines; I just used the $category variable where he had used the $page variable. Code which accomplishes a task in zero to one line is better software, written by a better programmer, than code that uses eighteen lines to accomplish the same thing.
The more eyes that can view a piece of code, the more bugs can be spotted and the better the algorithm development can be. Open source code is also a great way to teach young developers, because the best way to learn programming is to read code and to program, something which can't easily be done with locked-down software.
At least half of those lines come from PHP alone! Magnificent!
most "open source" is written by people working for corporations, so this comparison is idiotic in its design.
What are they counting as a "defect"?
Their FAQ lists examples, but ends with "and many more".
Which leads us to the question of who set the "industry standard" at 1.0, and what did THEY define "defect" to mean? If it is a standard there should be a standard list of defect types.
Learning HOW to think is more important than learning WHAT to think.
"Coverity Scan service ... was initiated between Coverity and the U.S. Department of Homeland Security"
If software is a "Homeland Security" issue, shouldn't they be focusing on the proprietary software that most consumers, businesses and government agencies are using?
the article mentions "defect" 18 times but does not define it. a few examples in the end but really, what are they measuring?
You can have poorly written code, but a good program.
You can have perfectly written code, but a shoddy bit of software.
Take for example an OS that hangs because the network layer is pegging the CPU somewhere somehow.
Vs an OS that continues to be responsive even if the network layer is overloaded.
So they are looking at .69 defect density vs .68 defect density. Community-driven software, which is designed for an end user rather than for a marketing staff to force upon an end user, is going to be close to 100% better.
Most OSS projects, even when used in critical places, don't have the money to pay for this very expensive (AFAIK ~ 50k$) tool. Proprietary vendors get paid for their product and can spend money on this tool or on other ways (like more staff) to reduce the defects.
HACK THE PLANET!
It doesn't matter if it's proprietary or open source, the danger is in any system that is compromised.
Homeland security needs to protect infrastructure and other interests that can impact the state of the nation. Something as benign as somebody hacking the AP twitter feed and posting that a bomb injured the president cost the market over $100B. A series of hacking attacks can result in economic or social destabilization.
Software is also built in layers, so some parts are proprietary, others are open, but a vulnerability in either one can cause issues with all parts of the system.
D6 63 0D 70 89 81 BB 8E 7B 7C 5F 5D 54 EA AB 73
...and look how that turned out.
I've seen the guts of a fair bit of commercial code, and it's usually not that great. Couple of stories; back in the OS/2 days I had a customer complain that the OS/2 time API specified you could set milliseconds, but this didn't appear to be the case. Well I just so happened to have access to the assembly language function in OS/2 that did that (IIRC it was shipped on one of their dev CDs) and upon examination it appeared that it was keeping an eye on 2 different interrupts. The first one was a clock tick interrupt that happened every few milliseconds and incremented the millisecond counter by however many milliseconds that was (22 sticks in my head for some reason.) However, a comment in the function stated that these could occasionally be missed, and so whenever the periodic 1-second interrupt rolled around, it would zero the millis out. Brilliant.
Every IBM story should have a Sun story to go along with it, as karmic retribution for that time I was walking along behind some of their engineers and they were dissing on the code quality in the Linux kernel. Yeah well, I've seen the code out of Sun, too. Like that webapp they did where all the authentication routines were static methods. Worked great as long as there was only one user!
If anything, professional programmers are on average worse. I had to clean up behind one, back in the '90's, who didn't realize that C strings were null terminated. Every fucking string in her code (Which she'd apparently NEVER compiled or run) was exactly long enough for the constant string she was assigning. These people sneak into your company and work there until getting caught. Usually they have another job lined up and bail just before getting caught. At least the open source guys ENJOY programming.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
What this tells me is that current business practices are flawed. There are commercial software companies that are able to produce quality code that exceeds most, if not all, open source projects. But such companies are not the norm.
Here are some questions we should ask:
1. Does commercial software have a realistic incentive to reach for excellence in coding? Maybe..
2. Does commercial software have enough resources to produce excellent software? Demonstrably so, but..
3. Does commercial software use their resources efficiently and effectively? I suspect not.
“Common sense is not so common.” — Voltaire
OTOH majority of code is not hand written anyway but copy-pasted and/or automatically generated by some fancy tool so of course you have less faults.
Tomorrow's headline:
"Unfortunately, a bug in the code used to detect programming errors caused these estimates to be severely under-estimated..."
Presumably the definition of defect is something flagged by Coverity, a tool similar to lint. We use it where I work. It is slightly better than gcc -Wall or g++ -Wall. Being clean in Coverity is like being clean in any such tool. It finds a lot of dangerous code and brain-dead bugs, but can't ensure good or correct code.
If the 300 open source projects weren't randomly selected (and judging from the article, they weren't), they aren't a representative sample of open source projects.
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
There is a big difference between getting paid to show up to work, and getting paid to write solid robust software.
“Common sense is not so common.” — Voltaire
For fuck sake, PLEASE STOP THAT! ` ` and ' ' are NOT proper quotation marks!
First off: I'm glad that people write software and don't care about being compensated for it. But I continually find that open source software is written by programmers for programmers. I'm an engineer; when I sit down at a piece of software, I expect it to work and to be able to use it intuitively. Most open source projects write code or come up with GUIs so arcane that it takes you a week to figure out how to use them, and I don't have that kind of time when I work. I've picked up several block diagram programs that are free/open source in the last few days; they are not intuitive and have a huge learning curve just to get them to work. Paid projects have people who sit around all day and say "How can I improve the interface to make it more usable?"
http://slashdot.org/comments.pl?sid=3040317&cid=40946043
http://slashdot.org/comments.pl?sid=3040729&cid=40949719
http://slashdot.org/comments.pl?sid=3040697&cid=40949343
http://slashdot.org/comments.pl?sid=3040597&cid=40948659
http://slashdot.org/comments.pl?sid=3037687&cid=40947927
http://slashdot.org/comments.pl?sid=3040425&cid=40946755
http://slashdot.org/comments.pl?sid=3038791&cid=40942439
http://slashdot.org/comments.pl?sid=3024445&cid=40942207
http://slashdot.org/comments.pl?sid=3038597&cid=40942031
http://slashdot.org/comments.pl?sid=3038601&cid=40942085
http://slashdot.org/comments.pl?sid=3040803&cid=40950045
http://slashdot.org/comments.pl?sid=3040867&cid=40950563
http://slashdot.org/comments.pl?sid=3040921&cid=40950839
http://slashdot.org/comments.pl?sid=3041035&cid=40951899
http://slashdot.org/comments.pl?sid=3041081&cid=40952169
http://slashdot.org/comments.pl?sid=3041091&cid=40952383
http://slashdot.org/comments.pl?sid=3041123&cid=40952991
http://slashdot.org/comments.pl?sid=3041313&cid=40954201
http://slashdot.org/comments.pl?sid=3042199&cid=40956625
http://slashdot.org/comments.pl?sid=3029723&cid=40897177
http://slashdot.org/comments.pl?sid=3029589&cid=40894889
http://slashdot.org/comments.pl?sid=3027333&cid=40886171
http://slashdot.org/comments.pl?sid=3042451&cid=40959497
http://slashdot.org/comments.pl?sid=3042547&cid=40960279
http://slashdot.org/comments.pl?sid=3042669&cid=40962027
http://slashdot.org/comments.pl?sid=3042765&cid=40965091
http://slashdot.org/comments.pl?sid=3042765&cid=40965087
http://slashdot.org/comments.pl?sid=3043535&cid=40967049
You sure did own apk hard. He's not going to reply because he's crying to his mommy. apk has proven himself to be technically incompetent and unemployable time and time again. Congratulations to brave people like you who put him in his place. I can't believe how thoroughly you DESTROYED him. Great job.
I agree -- apk gets his ass spanked every time he posts. He must have a fetish for being spanked or else he'd stop coming back for more.
You guys are right -- APK completely demolished in this tech debate as he always does. Why does he keep coming back for more when he knows he'll never win?
Defect density (defects per 1,000 lines of software code) is a commonly used measurement for software quality. The analysis found an average defect density of .69 for open source software projects, and an average defect density of .68 for proprietary code.
If it's that easy to determine which lines of code are defective, then why not simply allow the detection software to make the fix? For example, if you are certain the code is incorrect, then you certainly must know what the correct code is, or you cannot say for sure that it is wrong.
Let's say two different programs, A and B, do the same thing, and they each have 6 bugs. If program A has twice as many LoC (Lines of Code) as program B, then program A gets the higher score! Program A has half the error density of program B, but program A is clearly inferior, as it uses more memory, uses more disk space, probably runs more slowly, and is harder to debug.
I can easily fatten up any program to use more LoC, and not just with newlines, with real code, that might even be executed now and then. Coverity could, I suppose, counter my sabotage with a code-coverage tool to find the bloat, but there are sneaky ways to fool that, too.
I18N == Intergalacticization
Your "better code" is actually not equivalent (the first loop doesn't copy the nul terminator).
So you're pointing out that the longwinded code doesn't even result in a valid C string. Ergo, the longer code is broken.
Even if it was equivalent, I don't think I would necessarily call it "better".
The authors of every major language library call it "better". That example is copy-paste from glibc and everyone from Stroustrup to Knuth has used the same code.
Given that Knuth knows far better than anyone on Slashdot, I'll stick with what he uses.
You're embarrassing yourself Jeremiah Cornelius http://slashdot.org/comments.pl?sid=3581857&cid=43276741 since you posted that using your registered username by mistake (instead of your usual anonymous coward submissions by the 100's the past 2-3 months now on slashdot shown in the post I just replied to in fact) giving away that it's you spamming these forums almost constantly, just as you have in the post I just replied to.
Didn't appear to be the case here, 3x in a row http://it.slashdot.org/comments.pl?sid=3725365&cid=43659719 and all of Jeremiah Cornelius' "$10,000 Challenge" spams don't count. They're purest trolling, the only thing he does (like the online pest he is).
He won't answer a simple question http://slashdot.org/comments.pl?sid=3733655&cid=43678303 (since he outright lies on trolling others by ac posts, uses sockpuppets galore, & was caught doing so in the link below red-handed), & he also claims to have worked at Microsoft (b.s., imo because of what I state next), & yet got completely spanked by "yours truly" on actual computer technical information regarding custom hosts files - very fundamental networking & algorithmically oriented stuff no less, real CSC-101 stuff http://it.slashdot.org/comments.pl?sid=3725365&cid=43659719 (which he can't disprove, but it certainly did prove his ac trollings which he outright lied about & by the 100's that he gave away he was doing here http://slashdot.org/comments.pl?sid=3581857&cid=43276741 ).
Bottom-line here? Hey - Unlike YOU, I don't NEED 'support' like you do, sockpuppet master: Facts did you in & they're ALL I need.
APK
P.S.=> You fail Jeremiah Cornelius - you're pitiful... apk