Code Quality: Open Source vs. Proprietary
just_another_sean sends this followup to yesterday's discussion about the quality of open source code compared to proprietary code. Every year, Coverity scans large quantities of code and evaluates it for defects. They've just released their latest report, and the findings were good news for open source. From the article:
"The report details the analysis of 750 million lines of open source software code through the Coverity Scan service and commercial usage of the Coverity Development Testing Platform, the largest sample size that the report has studied to date. A few key points: Open source code quality surpasses proprietary code quality in C/C++ projects. Linux continues to be a benchmark for open source quality. C/C++ developers fixed more high-impact defects. Analysis found that developers contributing to open source Java projects are not fixing as many high-impact defects as developers contributing to open source C/C++ projects."
Sunlight is the best bleach.
Java project developers participating in the Scan service only fixed 13 percent of the identified resource leaks, whereas participating C/C++ developers fixed 46 percent. This could be caused in part by a false sense of security within the Java programming community, due to protections built into the language, such as garbage collection. However, garbage collection can be unpredictable and cannot address system resources so these projects are at risk.
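To make the resource-leak point concrete, here is a minimal sketch, written in Python rather than Java purely for brevity (Java's try-with-resources plays the same role as the `with` block here). The `Connection` class, its `open_count` counter, and the `leaky`/`careful` functions are hypothetical stand-ins for a real system resource such as a file handle or socket; nothing here comes from the Coverity report.

```python
class Connection:
    """Hypothetical stand-in for a system resource (file handle, socket, ...)."""
    open_count = 0  # number of connections currently open

    def __init__(self):
        Connection.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            Connection.open_count -= 1

    # Context-manager support: close() runs even if the body raises.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def leaky(registry):
    """Opens a connection and never closes it. The garbage collector does not
    call close(), and the object stays reachable through `registry` anyway."""
    conn = Connection()
    registry.append(conn)


def careful():
    """Releases the resource deterministically, independent of garbage collection."""
    with Connection() as conn:
        return conn.closed  # still open inside the block


registry = []
leaky(registry)
careful()
print(Connection.open_count)  # the leaked connection is the only one still open
```

The design point is that releasing the resource is tied to scope exit, not to whenever (or whether) the collector gets around to the object.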
This is especially amusing in light of all the self-righteous bashing that C was getting over OpenSSL's problems. Seems it's true that using a "safe" language just makes the programmer lazy.
Always proprietary ( your own code )
First, we shouldn't confuse Coverity's numerical measurements with actual code quality, which is a much more nuanced property.
Second, this report can't compare open source to proprietary code, even on the narrow measure of Coverity defect counts. In the open source group, the cost of the tool is zero (skewing the sample versus the commercial world) and Coverity reserved the rights to reveal data. Would commercial customers behave differently if they were told Coverity might reveal to the world their Coverity-alleged-defect data?
Again, having good Coverity numbers can't be presumed to be causally related to quality. For example, Coverity failed to detect the "heartbleed" bug, demonstrating that the effect of bugs on quality is very nonlinear. 10 bugs is not always worse than 1 bug; it depends on what that one bug is.
Yeah, I have seen the source code to the Windows 7 OS, Cisco's IOS and Linux of course.
They all suck equally.
However, that being said, I am currently running a version of the Linux OS I built and modified for my customers' use in a Postgres server which is quite large.
Open Source wins again because I can correct the suck. :-)
Got Geometrodynamics? Awe, too hard to figure out? Too bad.
From another article/puff-piece on the site:
"Since 2006, Python has achieved a defect density of .005 (or .005 defects per 1,000 lines of code) and has eliminated all high-risk defects [that they recognized] in its codebase."
So why didn't this scan catch it? Scans are not good enough for everything. It was a weird memcpy call.
The problem often with open source is culpability. I'm not going to yell about the OpenSSL guys. They are good guys... at least from a coding perspective and, from what I hear, from a human perspective as well.
If it were MS, you bet everyone would be yelling, and there is someone you could actually sue.
This sounds like an MS report claiming their stuff is great. I think we need more humility here and less deflection.
Also, fixing is great and all, but how about not producing defects in the first place? Shouldn't these scans be part of the builds before stuff even gets into git?
The report doesn't really go into an important measure.
What is the defect density of the new code that is being added to these projects?
Large projects and old projects in particular will demonstrate good scores in polishing - cleaning out old defects that are present. The new code that is being injected into the project is really where we should be looking... Coverity has the capability to do this, but it doesn't seem to be reported.
Next year it would be very interesting to see the "New code defect density" as a separate metric - currently it is "all code defect density" which may not reflect if Open Source is *producing* better code. The report shows that the collection of *existing* code is getting better each year.
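A rough sketch of why the two metrics can diverge, using made-up numbers (the line counts and defect counts below are hypothetical, not taken from the report). Density is computed as defects per 1,000 lines of code, the definition the report uses.

```python
def defect_density(defects, lines_of_code):
    """Defects per 1,000 lines of code."""
    return defects / (lines_of_code / 1000)


# Hypothetical project: a large, well-polished legacy base plus a year of new code.
legacy_lines, legacy_defects = 900_000, 270
new_lines, new_defects = 100_000, 230

all_code = defect_density(legacy_defects + new_defects, legacy_lines + new_lines)
new_code = defect_density(new_defects, new_lines)

print(f"all-code density: {all_code:.2f}")
print(f"new-code density: {new_code:.2f}")
```

With these assumed figures the all-code number looks healthy while the newly added code is several times worse, which is the gap a separate "new code" metric would expose.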
"First, we shouldn't confuse Coverity's numerical measurements with actual code quality, which is a much more nuanced property."
Yeah, but good quality might well correspond to some sort of measurable anyway. Provided you've got the right measure. Maybe some sort of measure of the degree of interconnectedness of the code? The more things are isolated from each other, across lots of levels (in a fractal dimension sense, perhaps) the better things are likely to be.
Maybe that would only apply to a larger project, and I'm not sure what effect system libraries (and other externals) would have. Yet the fact that it might be a scale-invariant approach makes me a bit more hopeful, as it wouldn't be so susceptible to the "ravioli code" problem, where the code's nicely packaged up into little pieces, but the pieces interconnect in a horrible mess of higher-level spaghetti code. Worked on a large project? You'll have probably seen it in the wild. (Yeah, I've had people argue to me that their code didn't use goto and so it had no spaghetti code problems, despite the fact that everything was so nastily interconnected that nobody else could understand it. If that's not indicative of a problem, what is?)
"Little does he know, but there is no 'I' in 'Idiot'!"
Stuff that matters?
I'm not convinced that they can ever truly account for bias in the data in this case. With open source, a far greater quantity of code can be analysed, and as such you are likely to eventually approximate the mean "code quality" of all software.
With proprietary code, the ratio being analysed is in all likelihood a far smaller subset of the whole, and so is far more susceptible to outliers dragging the quality metric (regardless of whatever that is) one way or the other.
I would expect "open source" code to be of approximately equal quality to proprietary code. In each ideology you will get people who care (about quality), and people who don't, in approximately equal proportions; the same goes for skill, ingenuity and passion for the work.
There is such a massive range in quality across all code that drawing a generic "x is better" conclusion is ridiculous at best, and purely the realm of the uptight illogical zealots on either side of the debate.
Coverity: Hey you, proprietary software developer with the deep pockets. Yeah, you. We've got this great tool for finding software defects. You should buy it.
Proprietary software developer: get lost.
Coverity: Hey, open source dudes, we've got this great defect scanner. Want to use it? Free of course!
Open source dudes: Meh, why not?
Coverity: Hey proprietary software developer, did we mention those dirty hippie neck beards are beating the stuffing out of you in defect (that we detect)-free code?
PSD: Fine, how much?
That's because C/C++ developers don't count on the garbage man to take their trash out.
This is a useless analogy. Code quality is a function of both skill and the stewardship of the team supporting the code. Tools help as well, but you can write elegant, high-quality code regardless of the language chosen. You can also write some real shit too, but ultimately how many defects a piece of software has comes down to the design and testing that go along with it. Some bodies of work get rigorous testing, and OpenSSL's recent problem wasn't about deficient design; it was about a faulty implementation. Faulty implementations of logic happen all the time, and some bugs just take a while to become known. Even test-driven development and code analysis tools probably couldn't have found this particular issue, but how long it sat in the code base without anybody questioning it reflects on the stewardship not only of the team but also of the rest of the world using the code. If anything, this situation points out that FOSS can have vulnerabilities just like proprietary software. The advantage is that with FOSS you can get it fixed much more quickly, and because other people can see the implementation, it can be scrutinized by folks outside the team that develops and maintains it.
In the case of Heartbleed the system worked. A problem was found and fixed; it's now just a matter of rolling out the fix, and regression tests are being put in place to help ensure that it doesn't happen again. The repercussions are that another gaping hole in our privacy was closed and that "bad guys" may have stolen data, so roll out the fix ASAP. What was stolen is a matter of research and conjecture at this point; your guess is as good as mine. I doubt that the bad guys will tell us what they gained by exploiting it. Let's also be clear that until the systems with the bug are patched, they're vulnerable, so cleanup on aisle 5.
To be honest, it's a bit naive to assume that FOSS software that handles security doesn't have potential vulnerabilities. Likewise, it's naive to assume that proprietary code has it licked, given the revelations of NSA spying over the past year. Given that there are numerous nefarious companies that sell vulnerabilities to anybody who can pay for them, unless you're buying them you probably will never know what is exposed until somebody trips over it. What this means for everybody is that only when those vulnerability-selling companies are out of business can you assume that your software is free of the easier-to-exploit vulnerabilities; governments will always use all their tools to get intelligence, including subverting standards and paying off companies who can give them access to what they want.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Now all we have to do is get Dice to Open Source their stuff so we can FIX IT!
The more complex the task, the simpler the steps need to be.
"The more things are isolated from each other, across lots of levels (in a fractal dimension sense, perhaps) the better things are likely to be."
Language has a lot to do with that.
If your project is written in a managed language, allocated memory is always initialised first, there is no pointer arithmetic and array bounds are always checked, so it's impossible to read random data from memory.
If your project is written in C, all code has access to all memory.
with nearly 2x the LOC.
Java calls C for anything performance-critical, anyway.
Code repositories were compromised by the NSA (or other capable group)
"If any question why we died, Tell them because our fathers lied."
"My name is Linux Torvalds... and I pronounce him 'Linus'...".
"If your project is written in a managed language, allocated memory is always initialised first, there is no pointers arithmetic and array bounds are always checked, so it's impossible to read random data from memory."
Except when you forgot to remove some reference to an object, so it's still sitting around in a list somewhere because it can't be garbage-collected, and some code then uses whatever objects happen to be in that list.
No language is safe for an unthinking programmer to use.
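A minimal sketch of the stale-reference problem described above, in Python as an example of a managed language. The `Session` class, the `active_sessions` list and the function names are all hypothetical and exist only to illustrate the pattern.

```python
class Session:
    def __init__(self, user, token):
        self.user = user
        self.token = token
        self.logged_out = False


active_sessions = []  # long-lived list that keeps every session reachable


def login(user, token):
    session = Session(user, token)
    active_sessions.append(session)
    return session


def logout_buggy(session):
    # Bug: the session is flagged but never removed from active_sessions,
    # so it cannot be garbage-collected and its token stays in memory.
    session.logged_out = True


def tokens_for_report():
    # Uses whatever objects happen to be in the list, stale ones included.
    return [s.token for s in active_sessions]


alice = login("alice", "secret-a")
logout_buggy(alice)
print(tokens_for_report())  # still contains alice's token after logout
```

No bounds were violated and no memory was read "at random", yet data that should be gone is handed to later code.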
Good news: Open source has good quality
Bad news: Sometimes we get a bug that affects most of the internet (Heartbleed)......
Just kidding
http://saveie6.com/
That would be: ... the fix will be obvious to someone.
Given enough eyeballs, all bugs are shallow
Eric S Raymond
Although ESR called it "Linus' Law", it's ESR's writing, from CATB. Linus has a completely different concept that he calls "Linus' Law". Linus talks about motivations for what we do.
With all the noise about OpenSSL lately, running this Coverity test on it (and other security software like GNUTLS) and sharing the results seems like it would be a good thing...
Your four-sentence comment has five glaring errors that make it obvious that you have absolutely no idea what you're talking about. You very much remind me of the job applicant who told me he has experience in C, C+, and C++.
If you have good quality people, especially a good leader, your code will be good.
Even if the people are relatively inexperienced.
At this point, just about everything in IT/CS is a research project, not innovation.
So it's a matter of diligently doing the work based on past archetypes.
Futurist Traditionalism
Of course for the poster child of all code proprietary and closed, even secret, look at Microsoft. Their record of vulnerabilities upon vulnerabilities goes back well over a decade and there seems no end in sight for this circus.
The fiasco with OpenSSL is due to everyone giving zero dollars, thanks for the freeware.
Coverity is not the best "yardstick". Too many false negatives and too expensive.
Some open source projects will have better code than closed source projects and vice versa; you can't just draw a clean line.
That's an interesting thought. Had it been typed, it might be a typo. I was thinking of a guy who said that, out loud, face-to-face. That's not the only comment that made it clear he was claiming four times as much as he in fact knew.
Of course, in an interview I give someone leeway - my mind went blank once in an interview when I was asked "what are the four pillars of object oriented programming?". At the time, I could have implemented objects in C using the preprocessor*, but interview stress caused a brainfart. This guy was obviously clueless and trying to BS his way through it, though. Perhaps hoping he'd only be interviewed by a manager who wasn't a programmer.
* Thanks to Perl for teaching me objects from the inside out. Understanding Perl's implementation of objects, I could see that language support for objects is 98% syntactic sugar: object.method() is the same thing as function(* object), where "object" is an associative array aka namespace aka lookup table, plus a list of class names it has.
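A rough illustration of that "syntactic sugar" view, sketched in Python rather than Perl or C. The object is a plain dictionary plus a list of class names, and a method call is an ordinary function call with the object as first argument. `CLASSES`, `new`, `call` and the animal example are invented for this sketch and are not how any particular language implements objects.

```python
def speak(obj):
    """A plain function taking the 'object' as its first argument."""
    return f"{obj['name']} says {obj['sound']}"


def fetch(obj):
    return f"{obj['name']} fetches"


# Hypothetical class table: class name -> {method name -> function}
CLASSES = {
    "Animal": {"speak": speak},
    "Dog": {"fetch": fetch},
}


def new(class_names, **fields):
    """Build an 'object': a lookup table plus the list of class names it has."""
    obj = dict(fields)
    obj["__classes__"] = list(class_names)
    return obj


def call(obj, method, *args):
    """Find the method through the object's class list, then call function(obj)."""
    for cls in obj["__classes__"]:
        if method in CLASSES[cls]:
            return CLASSES[cls][method](obj, *args)
    raise AttributeError(method)


rex = new(["Dog", "Animal"], name="Rex", sound="woof")
print(call(rex, "speak"))  # dispatches to speak(rex)
print(call(rex, "fetch"))  # dispatches to fetch(rex)
```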
are you sure about that?
that's valid C#, all you need to do is inject something like that into the codebase and let the JIT compile it (using all the lovely features they added to support dynamic code) and you're good to get all the memory you like.
Now I know the CLR will not let you do this so easily, but there's always a security vulnerability lying around waiting to be discovered that will, or an unpatched system that already has such a bug found in any of the .NET framework, for example this one that exploits... a "buffer allocation vulnerability", and is present in Silverlight.
The moral is ... don't think C programs are somehow insecure and managed languages are perfectly safe.
Remember that Linux itself is written in C.
So you can't use Linux any more, since it is just C.
A managed language would not have protected against Heartbleed, because the program maintained its own freelist to prevent memory from being deallocated. If it did not do this, then being written in a managed language would have prevented Heartbleed; but then again, if it did not do this, the C code wouldn't have been vulnerable either.
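A sketch of the general pattern this comment describes, in Python as a stand-in for a managed language. This is not OpenSSL's code: `BufferPool`, `echo` and the sample secret are hypothetical. It only shows that a recycled buffer plus a trusted, caller-supplied length can leak old data without ever going out of bounds.

```python
class BufferPool:
    """Hypothetical freelist: buffers are recycled rather than handed back to the runtime."""

    def __init__(self, size):
        self.size = size
        self.free = []

    def acquire(self):
        # Reuses an old buffer as-is; its previous contents are NOT cleared.
        return self.free.pop() if self.free else bytearray(self.size)

    def release(self, buf):
        self.free.append(buf)


def echo(pool, payload, claimed_length):
    """Echo back `claimed_length` bytes, trusting the caller's length (the bug)."""
    buf = pool.acquire()
    buf[:len(payload)] = payload
    reply = bytes(buf[:claimed_length])  # inside the buffer, so no bounds error
    pool.release(buf)
    return reply


pool = BufferPool(32)

# An earlier request leaves sensitive data in a buffer, then returns it to the pool.
secret = b"password=1234"
first = pool.acquire()
first[:len(secret)] = secret
pool.release(first)

# A later request sends 2 bytes but claims 16; the reply includes stale bytes.
leak = echo(pool, b"hi", 16)
print(leak)
```

Every index here is bounds-checked by the runtime; the leak comes from the application-level reuse, which is the commenter's point.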
Guess which language the JVM is mostly written in? Dumbass.
The only problem with this is, of course, that what they claim to be doing (automatically examining code for defects) is literally impossible.
...home?
Most people will put more effort into something that will be public (both out of positive motivation and the negative motivation of shaming.)
Open Source will always, in general, be better than closed source. Again - in general. There are people who will engineer things properly irrespective of whether or not someone will be browsing your github account or checking it out of the company's private server... Too bad there's not more of them ;).
One advantage Open Source has is that there are no deadlines and a good project leader can simply reject sub par code. For commercial code no company is going to pay a programmer big bucks and simply throw away his output because it sucks.
This is really silly without more insight on the data.
Open Source code quality varies a lot, just like proprietary code.
Some of the open source code gets much less scrutiny than (internally) peer-reviewed code in a proprietary setting.
This idea that we can compare "Open Source vs proprietary" and say what the "best" is is childish.
See subject above.
Yes, so your argument is that you can, with great difficulty, cause a possible security issue in C#, but in order to do so, you have to basically say... I'm about to do something possibly bad, please don't check to make sure what I'm doing is bad. Then modify the compiler from default to allow said code to be compiled, then put it into a fully trusted assembly so it bypasses all security checks, and THEN you might have an issue.
And this is in comparison to C/C++, where you can write an exploit in 2 lines of code by accident, using nothing but defaults.
Another problem with the comparison is that the average closed-source project is four times as big as the average open-source project. I'd expect defect density to go up with size of codebase. (Of course, this may not be an issue with what Coverity detects, but if so that emphasizes that Coverity doesn't find all the important defects.)
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
How can software that has any bugs be considered as good quality??? I guess that if guns are legal in your country, then buggy software may be too.
This is the same broken metric that Coverity has been mis-using year after year.
"Defect density (defects per 1,000 lines of software code) is a commonly used measurement for software quality, and a defect density of 1.0 is considered the accepted industry standard for good quality software."
In other words, if you double the size of the code base by adding no-op code, you increase your quality score.
Also, if you leave the bugs in, but reduce code size, you are reducing your quality score.
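The arithmetic behind that objection, as a small sketch with made-up numbers (100 defects and the line counts below are hypothetical). Density is defects per 1,000 lines, as in the quoted definition.

```python
def defect_density(defects, lines_of_code):
    """Defects per 1,000 lines of code."""
    return defects / (lines_of_code / 1000)


defects = 100  # the same bugs in every scenario

original = defect_density(defects, 100_000)  # baseline code base
padded = defect_density(defects, 200_000)    # same bugs, size doubled with no-op code
trimmed = defect_density(defects, 50_000)    # same bugs, code size halved

print(original, padded, trimmed)
```

The number of bugs never changes, yet the padded code base scores "better" and the trimmed one scores "worse".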
I18N == Intergalacticization