Code Quality In Open and Closed Source Kernels
Diomidis Spinellis writes "Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or loser, but there were interesting differences in specific areas. As the summary concludes: '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.'"
That is if you can figure out which of the 12 links are the actual FA and which are supporting material.
..that Open Source code is of quality, but at least the point of things like the GPL is that you have the power to change that, and improve that code..
"The OpenSolaris kernel was a welcomed surprise: it was the only body of source code that did not require any extensions to CScout in order to compile."
Given that the Solaris kernel has been compiled by two very different compilers (Sun Studio, of course, and gcc), it isn't that surprising. Because of the compiler issues, it is likely the most ANSI compliant of the bunch.
If I am understanding correctly, you were looking for 'winners' and 'losers' (weasel words in and of themselves, but anyway...) in terms of 'quality' (another semi-subjective term that could make someone go crazy and drive motorcycles across the country for the rest of their lives).
You found that '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.' -- or in plain English: "the app specs had a much bigger influence when compared to internal efficiencies".
I would wonder if you're just seeing a statistical wash-out. Are you dealing with data sets (tens of millions of lines and thousands of functions) that are so large, that patterns simply get washed out in the analysis?
Oh dear, my post is no more clear than the summary...
davejenkins.com |
It's not a very good summary, but the paper is well-written, which is interesting considering that the author is the one who submitted the summary to Slashdot. I suspect that he assumes we have more familiarity with the subject than we actually do.
I'm sorry, but if this is what passes for serious academic computer-science work, close the schools. This all appears to boil down to: quality code (definition left to the reader) is produced by good programmers (can't define, but I know one when I see his/her code) who are given the time to produce quality code. Rushed projects by teams of average-to-crappy programmers results in low-quality code. All the tools and management theories in the world have little impact on this basic fact of life. My PhD, please?
It's obvious what the results would be.
Half completed, unpolished commercial software usually stays unreleased and safe from this sort of scrutiny. However many of the same types of projects get left out in the open and easily visible to everybody when developed as open source. The low code quality of these projects would drag down the average for open source projects as a whole.
On the lighter side, you could say that you'd only consider software that was "out of beta" or version 1.0 or greater, but that would leave out most open source projects and commercial "Web 2.0" products....
If good code and bad code were a simple automated analysis away, don't you think everyone would be doing it? What methodolgy could possibly give a quantitative weighting for "quality"?
"To my surprise there was no clear winner or loser..." Not really a surprise at all, actually.
People make claims about the need for closed source all the time, usually revolving around the need to a predictable level of quality, or some other factor. The fact is, this results proves that its a wash whether you choose open or closed--so why not choose open?
There's a deep significance here I'm failing to capture completely. Someone else word it better if they can. But there didn't need to be some blow-out victory of open source over closed source for this to be a victory. All open source needed to do was compare--which it did, clearly--with closed source, in terms of value, to secure its worth.
I didn't write the last part when I submitted the story, and, yes, the summary given here is comprehensible, because it appears out of context. What the sentence '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.' means is that when you build something complex and demanding, say a dam or an operating system kernel, the end result will have a specific level of quality, no matter how you build it. For this reason the differences in the software built with a tightly-controlled proprietary software process and that built using an open-source process are not that big.
Using one of the tools you propose, you will still not obtain results regarding the analysability, changeability or readability of the code.
There is a company that, at the heart of their business, exists a 6000 line SQL statement that no one understands, no one can modify, and occasionally doesn't work without anyone knowing why but a restart of the program seems to take care of it.
It has lasted that way for a very very long time.
Is it good code simply as function of its survival and (sort of) working?
I tend to think of good code like good engineering or good architecture. Surely you wouldn't define good architecture as "a building that remains standing," would you? The layout of the rooms, how well that space is used, how well it fits the needs of the users, how difficult it is to make modifications, etc all factor in to "good design" and have nothing to do with whether the building "works."
I am not sure you can put a metric to it anymore than I could put a metric to measuring the quality of abstract expressionism or how well a circuit is laid out--there may be metrics to aid in the process, but in the end one can't necessarily assign a numerical rating to the final outcome for the purpose of rating.
That doesn't mean that there isn't such a thing as good quality and bad quality code.
Integrate Keynote and LaTeX
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
How useful is it to write something about computers that needs to be translated for the slashdot audience? Jargon is a great way to provide specialized information to insiders quickly and efficiently, but this is slashdot. If slashdot readers need for you to restate your description of a problem or observation related to the Linux kernel (even if that description is taken out of context), could it be that the paper could be written in a more open manner? The quote you provided from your paper seems to speak to a narrow audience; how narrow must your audience be, however, if it excludes a good portion of slashdot's readers?
If I seem overly critical, I do not mean to, it is only that I hate to see good, useful research made less accessible to non-academics by the use of academic language.
Yes, but there is absolutely no evidence that open source is any better in this respect than commercial software (in fact the actual evidence points to it being little different in this respect). And when it DOES crash, a 1-800 number is often better than a pile of badly commented code.
It will, in the end, come down to a value proposition. The value proposition of freedom to modify code is very hard to quantify, so that will probably factor into the eventual success of open source not at all. The actual quality, usability, documentation, trainability, ease of install, compatibility with existing infrastructure (usually Microsoft), etc., will probably be the deciding factors, and I don't see open source having a clear-cut advantage in those metrics.
E pluribus unum