Slashdot Mirror


Code Quality In Open and Closed Source Kernels

Diomidis Spinellis writes "Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or loser, but there were interesting differences in specific areas. As the summary concludes: '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.'"
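The summary describes storing parsed kernel code in databases and comparing the systems with SQL queries. Here is a rough, non-authoritative sketch of that kind of query. It loads a few per-function metrics into an in-memory SQLite database and compares one metric (mean function length) across kernels. The table name `functions`, its columns, and every row value are hypothetical toy data made up for illustration. They are not the schema or the results of the published databases.

```python
import sqlite3

# Hypothetical schema: one row per function, tagged with the kernel it came from.
# These values are invented toy data, not measurements from the study.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE functions (kernel TEXT, name TEXT, lines INTEGER)")
rows = [
    ("linux", "f1", 12), ("linux", "f2", 40),
    ("freebsd", "g1", 25), ("freebsd", "g2", 35),
]
conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)

# Compare one simple metric (mean function length in lines) across kernels.
query = """
    SELECT kernel, AVG(lines) AS mean_lines, COUNT(*) AS n
    FROM functions
    GROUP BY kernel
    ORDER BY kernel
"""
results = conn.execute(query).fetchall()
for kernel, mean_lines, n in results:
    print(f"{kernel}: {mean_lines:.1f} lines/function over {n} functions")
```

Grouping by kernel keeps each system's figures separate, so any single metric can be compared side by side. A real comparison would need many such metrics and care with differences in code-base size and configuration.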

17 of 252 comments

  1. Re:Is it just me? by Anonymous Coward · · Score: 5, Insightful

    That is, if you can figure out which of the 12 links is the actual FA and which are supporting material.

  2. No-one has ever claimed by wellingtonsteve · · Score: 4, Insightful

    ..that Open Source code is of quality, but at least the point of things like the GPL is that you have the power to change that, and improve that code..

    1. Re:No-one has ever claimed by KutuluWare · · Score: 3, Insightful
      You haven't been paying attention to many Open Source proponents if you have never seen them claim that Open Source code is of vastly superior quality to proprietary code. Hell, ESR's claim to fame is a whole paper he wrote on that exact topic. For example, the OSI itself puts this claim at the very top of their advocacy document on selling OSS to your management:

      The foundation of the business case for open-source is high reliability. Open-source software is peer-reviewed software; it is more reliable than closed, proprietary software. Mature open-source code is as bulletproof as software ever gets.
      (Open Source Case for Business)

      There is a pretty clear divide in the F/OSS community between the OSI-side people, who view Open Source as a development model that leads to better software with fewer bugs and quicker turnaround, and the FSF-side people, who think of Free Software as a moral imperative that leads to more freedom in addition to better software with fewer bugs and quicker turnaround.

      Having worked heavily in both areas of software development, I think this particular article's conclusion was obvious: code quality depends on the people who wrote it, not the process they used to license it. But only people who have done extensive proprietary and open-source development could really see that first-hand, and our opinions are automatically dismissed as the work of pro-Microsoft shills. Thus, I predict this paper will be roasted over an open flame, crushed into a tiny ball, soaked in gasoline, lit on fire, and ejected into deep space by the most devoted open source proponents in both camps.

  3. CScout Compilation by allenw · · Score: 5, Insightful

    "The OpenSolaris kernel was a welcomed surprise: it was the only body of source code that did not require any extensions to CScout in order to compile."

    Given that the Solaris kernel has been compiled by two very different compilers (Sun Studio, of course, and gcc), it isn't that surprising. Because it has to build cleanly with both compilers, it is likely the most ANSI-compliant of the bunch.

  4. statistical wash-out? by davejenkins · · Score: 4, Insightful

    If I am understanding correctly, you were looking for 'winners' and 'losers' (weasel words in and of themselves, but anyway...) in terms of 'quality' (another semi-subjective term that could make someone go crazy and drive motorcycles across the country for the rest of their lives).

    You found that '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.' -- or in plain English: "the app specs had a much bigger influence when compared to internal efficiencies".

    I wonder if you're just seeing a statistical wash-out. Are you dealing with data sets (tens of millions of lines and thousands of functions) so large that patterns simply get washed out in the analysis?

    Oh dear, my post is no more clear than the summary...

    1. Re:statistical wash-out? by raddan · · Score: 3, Insightful
      With regard to the guy who went crazy and drove his motorcycle across the country-- I think the point of the book was to demonstrate that "subjective" and "objective" are specious terms. Science gets all hot and bothered when words like "good" and "bad" are used, but not when words like "point" are used. So if we can make allowances for axiomatic terms, why not so-called "qualitative" terms? After all, the word "axiom" means, according to Wikipedia:

      The word "axiom" comes from the Greek word axioma, a verbal noun from the verb axioein, meaning "to deem worthy", but also "to require", which in turn comes from axios, meaning "being in balance", and hence "having (the same) value (as)", "worthy", "proper". Among the ancient Greek philosophers an axiom was a claim which could be seen to be true without any need for proof.

      Indeed, if you look at many of our "quantitative" measures, they are, at their heart, a formalization of "goodness" and "badness". If you're a mathematician, you might argue that this is not true (since there are loads of mathematical constructs whose only requirement is self-consistency rather than conformance to any external phenomenon), but if you're an engineer, your whole career balances on the fine points of "goodness" and "badness". It is an essential concept!

      My personal opinion is that if statistics are a wash-out in general, then the researcher is asking the wrong questions. I know that the author pre-defined his metrics in order to avoid bias, but that's not necessarily good science. Scientific inquiry should be directed toward answering specific questions, and the investigatory process must allow the scientist to ask new questions based on new data.

      There is clear non-anecdotal evidence that these operating systems behave differently (and, additionally, we assign a qualitative meaning to this behavior), so the question as I understand it is: is this a result of the development style of the OS programmers? The author should seek to answer that question as unambiguously as possible. If the answer to that question is "it is unclear", then the author should have gone back and asked more questions before he published his paper, because all he has shown is that the investigatory techniques he used are ill-suited to answering the question he posed.

  5. Re:Is it just me? by raddan · · Score: 4, Insightful

    It's not a very good summary, but the paper is well-written, which is interesting considering that the author is the one who submitted the summary to Slashdot. I suspect that he assumes we have more familiarity with the subject than we actually do.

  6. Really? by jastus · · Score: 3, Insightful

    I'm sorry, but if this is what passes for serious academic computer-science work, close the schools. This all appears to boil down to: quality code (definition left to the reader) is produced by good programmers (can't define one, but I know one when I see his/her code) who are given the time to produce quality code. Rushed projects by teams of average-to-crappy programmers result in low-quality code. All the tools and management theories in the world have little impact on this basic fact of life. My PhD, please?

  7. Re:Not that surprising by ivan256 · · Score: 3, Insightful

    It's obvious what the results would be.

    Half-completed, unpolished commercial software usually stays unreleased and safe from this sort of scrutiny. However, many of the same types of projects get left out in the open and easily visible to everybody when developed as open source. The low code quality of these projects would drag down the average for open source projects as a whole.

    On the lighter side, you could say that you'd only consider software that was "out of beta" or version 1.0 or greater, but that would leave out most open source projects and commercial "Web 2.0" products....

  8. KLOCs? by Baavgai · · Score: 4, Insightful

    If telling good code from bad code were a simple automated analysis away, don't you think everyone would be doing it? What methodology could possibly give a quantitative weighting for "quality"?

    "To my surprise there was no clear winner or loser..." Not really a surprise at all, actually.

  9. The winner is still open source by abolitiontheory · · Score: 3, Insightful
    Does anybody see that these results are still in favor of open source? The fact is, it's actually a beautiful thing that the difference in quality is marginal. This equality then becomes the rubric by which to judge other elements of the design process, and choices about whether to develop and deploy programs with open source or closed source.

    People make claims about the need for closed source all the time, usually revolving around the need for a predictable level of quality, or some other factor. The fact is, these results prove that it's a wash whether you choose open or closed, so why not choose open?

    There's a deep significance here I'm failing to capture completely. Let someone else word it better if they can. But there didn't need to be some blow-out victory of open source over closed source for this to be a victory. All open source needed to do was compare with closed source in terms of value, which it clearly did, to secure its worth.

  10. Re:Is it just me? by Diomidis+Spinellis · · Score: 4, Insightful

    I didn't write the last part when I submitted the story, and, yes, the summary given here is incomprehensible, because it appears out of context. What the sentence '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.' means is that when you build something complex and demanding, say a dam or an operating system kernel, the end result will have a specific level of quality, no matter how you build it. For this reason the differences between the software built with a tightly controlled proprietary software process and that built using an open-source process are not that big.

  11. Re:Stupid metrics by Diomidis+Spinellis · · Score: 3, Insightful
    It took me about two months of work to collect these metrics. Yes, also running the code of the four kernels through a static analysis tool would have been even better, but this would have been considerably more work: you need to adjust each tool to the peculiarities of the code, add annotations in the code, weed out false positives, and even then you only get one aspect of quality, the one related to bugs, like deadlocks and null pointer dereferences.

    Using one of the tools you propose, you will still not obtain results regarding the analysability, changeability or readability of the code.

  12. Re:"Code quality" is bunk by Llywelyn · · Score: 3, Insightful

    There is a company at the heart of whose business sits a 6000-line SQL statement that no one understands and no one can modify. Occasionally it doesn't work, without anyone knowing why, but a restart of the program seems to take care of it.

    It has lasted that way for a very very long time.

    Is it good code simply as a function of its survival and (sort of) working?

    I tend to think of good code like good engineering or good architecture. Surely you wouldn't define good architecture as "a building that remains standing," would you? The layout of the rooms, how well that space is used, how well it fits the needs of the users, how difficult it is to make modifications, etc. all factor into "good design" and have nothing to do with whether the building "works."

    I am not sure you can put a metric to it any more than I could put a metric to the quality of abstract expressionism or how well a circuit is laid out. There may be metrics to aid in the process, but in the end one can't necessarily assign a numerical rating to the final outcome.

    That doesn't mean that there isn't such a thing as good quality and bad quality code.

    --
    Integrate Keynote and LaTeX

  13. Re:Not that surprising by FishWithAHammer · · Score: 3, Insightful

    "Generally speaking, commercial desktop apps are still way ahead of their open counterparts, with the exception of code development tools and anything that directly implements a standard (browsers, mail clients, etc.)."

    Code development tools? VS says hi. (And somebody is now going to leap in and say that that monstrosity Eclipse is somehow "better" than VS... this will be amusing.)
    --
    "You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."

  14. Re:Is it just me? by legutierr · · Score: 5, Insightful

    How useful is it to write something about computers that needs to be translated for the Slashdot audience? Jargon is a great way to provide specialized information to insiders quickly and efficiently, but this is Slashdot. If Slashdot readers need you to restate your description of a problem or observation related to the Linux kernel (even if that description is taken out of context), could it be that the paper should have been written more accessibly? The quote you provided from your paper seems to speak to a narrow audience; how narrow must your audience be, however, if it excludes a good portion of Slashdot's readers?

    If I seem overly critical, I do not mean to, it is only that I hate to see good, useful research made less accessible to non-academics by the use of academic language.

  15. Re:Not that surprising by samkass · · Score: 3, Insightful

    Yes, but there is absolutely no evidence that open source is any better in this respect than commercial software (in fact, the actual evidence points to little difference between them). And when it DOES crash, a 1-800 number is often better than a pile of badly commented code.

    It will, in the end, come down to a value proposition. The value of the freedom to modify code is very hard to quantify, so it will probably not factor into the eventual success of open source at all. The actual quality, usability, documentation, trainability, ease of install, compatibility with existing infrastructure (usually Microsoft), etc., will probably be the deciding factors, and I don't see open source having a clear-cut advantage in those metrics.

    --
    E pluribus unum