Code Quality In Open and Closed Source Kernels
Diomidis Spinellis writes "Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or loser, but there were interesting differences in specific areas. As the summary concludes: '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.'"
Final line in the paper: "Therefore, the most we can read from the overall balance of marks is that open source development approaches do not produce software of markedly higher quality than proprietary software development."
Interesting, but not shocking for those who have worked with disciplined commercial teams. I wonder what the results would be in less critical areas than the kernel, say certain types of applications.
I haven't seen anybody else comment on the fact that the statement that the quality of the code had more to do with the engineering than the process through which the code was developed is quite interesting.
;)
From my personal experiences, it typically seems code is written to solve a specific need. Said another way, in the pursuit of solving a given problem, whatever engineering is required to solve the problem must be accomplished - if existing solutions to problems can be recognized, they can be used (for example, Gang of Four/GOF patterns), otherwise, the problem must have a new solution engineered.
Seeing as how there are teams successfully developing projects (with both good, and bad code quality) using traditional OO/UML modeling, the software development life-cycle, capability maturity model, scrum, agile, XP/pair programming, and a myriad of other methods, it would seem to be that what the author is saying is, it didn't necessarily matter which method was used, it was how the solution was actually built (the.. robustness of the engineering) that mattered.
Further clarification on the difference between engineering and "process" would strengthen this paper.
I went to a Microsoft user group event some time ago - and the presenter described what they believed the process of development of code quality looked like. They suggested the progression of code quality was something like:
crap -> slightly less crappy -> decent quality -> elegant code.
Sometimes, your first solution at a given problem is elegant.. sometimes, it's just crap.
Anyways, just my two cents. Maybe two cents too many..
SixD
Sorry, I've been in the business for over 25 years and had to hear one pin head after another spout about code quality or productivity. Its all subjective at best.
The worst looking piece of spaghetti code could have fewer bugs, be more efficient, and be easier to maintain than the most modular object oriented code.
What is the "real" measure of quality or productivity? Is it LOC? No. Is it overall structure? no. Is it the number of "globals?" maybe not.
The only real measure of code is the pure and simple darwinian test of survival. If it lasts and works, its good code. If it is constantly being rewritten or is tossed, it is bad code.
I currently HATE (with a passion) the current interpretation of the bridge design pattern so popular these days. Yea, it means well, but it fails in implementation by making implementation harder and increasing the LOC benchmark. The core idea is correct, but it has been taken to absurd levels.
I have code that is over 15 years old, almost untouched, and still being used in programs today. Is it pretty? Not always. Is it "object oriented" conceptually, yes, but not necessarily. Think the "fopen,"fread," file operations. Conceptually, the FILE pointer is an object, but it is a pure C convention.
In summation:
Code that works -- good.
Code that does not -- bad.