The rules of academic publishing are that you have to cite relevant related work. This includes both fresh results and old classics. Where possible, we tried to cite the most recent studies. Some studies that appear dated indicate a research opportunity to update the corresponding area. Also, it would be wrong to dismiss a paper because of its age. Some of the older studies we cite present theoretical frameworks of enduring value and importance, demonstrated by the thousands of citations they have received over the years. For instance, the 2003 study by Venkatesh and his colleagues on the user acceptance of information technology, which we cite, has received almost five thousand citations. It would be wrong to ignore it just because of its age.
You have a point here. And you haven't mentioned the huge cost associated with procurement processes for proprietary software, especially in the public sector. These can drag on for months. In contrast, acquiring an open-source product is often simply a matter of a one-click download. Even if the organization's legal department has trouble understanding open source licenses, this is a hurdle you have to overcome just once.
The article cited refers to software planted on the phone exchange, not the towers. The rogue wiretapping software was essentially a rootkit, complete with a backdoor for future access and detection countermeasures.
A few hours after replying to the "code quality is that it 'works'" comment, I read Joseph Bergin's Do the Right Thing design pattern in an IEEE Software article. I found it quite funny.
The absolute worst part of critiques like yours is the ideas it gives pin headed MBAs who bungee jump into engineering departments, book in hand, with no practical experience. The ideas spouted by the book become the drive, not the product. It is an almost certainty the project will be dreadfully late or never finished.
I absolutely agree.
Please let me clarify here:
I can not extrapolate the agreeable portions of your thought to the seemingly obvious short comings of the Windows operating system. On any facet, whether it is security, stability, functionality or reliability. Windows is, far behind on all fronts.... aside from secrecy from a Microsoft point of view.
I'm not claiming anything regarding these external quality attributes of Windows; the metrics I collected just show that there are no vast differences in the code's quality.
Or, perhaps, the WRK has been a meticulous focus at Microsoft before it's release... this is likely possible, as it's WIDELY known, from nearly ALL examples of closed source proprietary software being released to the Open Source, that it takes years just to clean up and prepare for the ultra high standards of the OS community.
This is entirely possible. In fact, a README file in the distribution states:
The primary modifications to WRK
from the released kernel are related to cleanup and removal of server
support, such as code related to the Intel IA64.
Economy is an attribute different from quality, and this is where engineering comes in. The engineer has to balance the various demands on quality against factors like cost, time to market, and customer demands. All your arguments are perfectly valid, and they are engineering decisions.
To my surprise there was no clear winner or loser.
Forget what you *think* you're measuring (code quality). Instead, consider whether you're measuring anything at all. That is, is there any information in the data you've measured?
In the past other researchers have used a few of the metrics I used to measure what they called a system's maintainability, and they were able to match this with the subjective perceptions of developers at HP regarding the code's quality. So these measures are not just noise.
For another indication, consider this figure, showing a trend that matches our expectations: how the maintainability of the FreeBSD system is, in general, falling over time. Again, this is derived from some of the metrics I used to compare the four kernels. These metrics do not yield noise.
My personal opinion is that if statistics are a wash-out in general, then the researcher is asking the wrong questions. I know that the author pre-defined his metrics in order to avoid bias, but that's not necessarily good science. Scientific questions should be directed toward answering specific questions, and the investigatory process must allow the scientist to ask new questions based on new data.
There is clear non-anecdotal evidence that these operating systems behave differently (and, additionally, we assign a qualitative meaning to this behavior), so the question as I understand it is: is this a result of the development style of the OS programmers? The author should seek to answer that question as unambiguously as possible. If the answer to that question is "it is unclear", then the author should have gone back and asked more questions before he published his paper, because all he has shown is that the investigatory techniques he used are ill-suited to answering the question he posed.
Wait a minute here: being unable to prove a hypothesis is a long-established scientific path. Due to the tiny number of samples (four kernels) I could not prove my case with statistical rigor, but still publishing the results that show I could not find a difference is the scientifically honest thing to do. Reformulating the questions until you find an answer that suits you distorts the picture due to the file drawer problem.
I've put the data and the SQL queries on the web. It is therefore easy for you to do what you suggest, because the filenames are stored in the database. Just perform a cascade delete for the files that you think don't belong to each system's core and rerun the queries. I'd be interested to know the results.
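As a rough illustration of the cascade-delete approach, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, file paths, and numbers are all hypothetical and are not the schema or data of the published database; the point is only that deleting a file row removes the measurements that depend on it, so rerunning a query gives figures for the remaining files.

```python
import sqlite3

# Hypothetical schema and data: the published database is laid out differently.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only cascades when this is enabled
conn.executescript("""
CREATE TABLE files (
    fid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE functions (
    id    INTEGER PRIMARY KEY,
    fid   INTEGER NOT NULL REFERENCES files(fid) ON DELETE CASCADE,
    nstmt INTEGER NOT NULL  -- number of statements in the function
);
""")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(1, "kern/sched.c"), (2, "dev/foo/foo.c"), (3, "dev/bar/bar.c")],
)
conn.executemany(
    "INSERT INTO functions (fid, nstmt) VALUES (?, ?)",
    [(1, 10), (1, 30), (2, 200), (3, 400)],
)

def mean_function_size(conn):
    """Rerunnable metric query: average statements per function."""
    return conn.execute("SELECT AVG(nstmt) FROM functions").fetchone()[0]

before = mean_function_size(conn)
# Drop the files considered outside the core; their functions go with them.
conn.execute("DELETE FROM files WHERE name LIKE 'dev/%'")
after = mean_function_size(conn)
print(before, after)  # 160.0 20.0
```

Other database engines express the same idea with their own syntax; the sketch uses SQLite only because it needs no server.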
what was the most foul comment you encountered:D ? and where did it reside
Decency laws in various parts of the world do not allow me to answer this question. However, I can say that in total the four kernels contain in C files 18389 comments marked XXX. The most famous Unix comment is of course the well-known "You are not expected to understand this". See dmr's page for more details. This is also an interesting comment, especially considering the current troubles of the person who wrote it.
This is a very perceptive comment. It goes deeper: Linux and FreeBSD can be (and often are) configured by end-users. These can tailor the kernel in hundreds of ways. In FreeBSD 6.2 I measure more than 340 kernel options. These are mostly handled by the preprocessor.
I don't think that my results can support us in making arguments regarding 'slightly' higher quality, or 'exactly the same quality'. My figures are based on possibly interdependent, unweighted, and unvalidated metrics. Therefore they only allow us to make conclusions involving large differences.
The preprocessor algorithm I described in the Dr. Dobb's article is the one I used for parsing the code of this study. A strange preprocessor construct in the Linux kernel caused the macro-expansion algorithm I used previously to fail.
What I'm saying is that when we're looking at maintainability of a large operating system (FreeBSD) there are few outliers. Therefore, one can make the case that in another similarly large operating system we can get a representative picture of its maintainability by looking at a subset of its code. My conclusion is related to the Law of large numbers, nothing deeper or more complex.
Think of my argument as looking at the people living in China and seeing that there are no areas occupied by giants or dwarfs. I then say that based on that fact, I can obtain the average height of people living in America, by looking at the people living in California.
It is not a water-tight argument, but it is the best argument I could make. I really wish Microsoft would supply me with more code (and ideally also process data) to study, but this is the best I could do with the available code.
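The sampling argument can be sketched with synthetic numbers. This is only an illustration under stated assumptions: the scores are made up, the subset is drawn at random (which the code actually available for study is not), and the function name subset_error is hypothetical. It shows that when a system has no extreme modules, the mean over a subset lands close to the mean over the whole, whereas a handful of wildly atypical modules breaks that.

```python
import random
import statistics

def subset_error(population, k, rng):
    """Absolute gap between the mean of a random k-element subset and the full mean."""
    subset = rng.sample(population, k)
    return abs(statistics.fmean(subset) - statistics.fmean(population))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic per-module scores with no extreme values (made-up numbers).
uniform_system = [rng.gauss(50, 10) for _ in range(100_000)]

# The same system with 20 modules replaced by wildly atypical ones ("giants").
skewed_system = uniform_system[:-20] + [1_000_000.0] * 20

err_uniform = subset_error(uniform_system, 1_000, rng)
err_skewed = subset_error(skewed_system, 1_000, rng)
print(f"no outliers: {err_uniform:.2f}, with outliers: {err_skewed:.2f}")
```

The first error stays small because each sampled module is typical; the second is large whether or not the subset happens to include one of the atypical modules.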
With a liberal reading of "if it works" you're right. You can say that if the code is functional, reliable, usable, efficient, maintainable, and portable, then it is of high quality. But this is a circular definition, because this is how software quality is defined. As somebody else posted earlier, the quest for quality can lead you to an endless motorcycle trip on America's back-roads.
The way you license code can't directly affect its quality, but the way you develop it can. Here are some possible ways in which a company can affect (positively or negatively) the quality of the software:
- Have managers and an oversight group control quality (+)
- Through its bureaucracy remove incentives to find creative solutions to quality problems (-)
- Pay for developers to attend training courses (+)
- Provide a nice environment free of distractions that allows developers to focus on developing quality software (+)
- Buy expensive tools that can detect quality problems (+)
- Developers take their paycheck for granted and lose interest in what they are doing (-)
- Developers write obfuscated code for job security (-)
And here are some possible ways in which an open source development effort can affect (positively or negatively) the quality of the software:
- Volunteers are more motivated than paid employees (+)
- Nobody takes responsibility for the overall quality of the code; responsibility is diffused (-)
- Working conditions can be suboptimal (-)
- Developers work part-time (-)
- Developers eat their own dog food and therefore care about their code (+)
- There are many eyeballs to spot code problems (+)
- There are no marketing pressures to deliver substandard work (+)
- Developers are geographically dispersed and can't communicate easily (-)
Both lists can be expanded, and many of the arguments can be refuted. Still you get the idea: the inputs to the two development processes differ substantially and this could affect quality.
This is a very interesting comment. I had not thought of my results in this light, because, based on my experience as a (minor) FreeBSD committer and as a Windows user, I was expecting to see a large difference in favor of open source code. Yes, in the way you put it, open source is a winner.
Ten years ago I wrote an article criticizing the Windows API. Most of what I wrote then continues to be true today. Based on that external view of Windows, and the BSODs I regularly see, I was expecting to find many worse things in the kernel. The header file you mention is a clear manifestation of an inappropriate design, and I suspect that at higher levels of system functionality (say OLE or the GDI) there will be more parts of similarly bad quality.
You can automatically recognize some bad smells of poor quality code. However, this will still let through poor quality code that has been explicitly written to guard against the bad smells. So, you can say for sure that some code stinks, but you can't (and I suspect you will never be able to) tell that some code excels.
Coding to achieve some code quality metrics is dangerous, but so is saying that code that works is good. Let me give you two examples of code I wrote a long time ago that still survives on the web.
This example is code that works and also has some nice quality attributes: 96% of the program lines (631 out of the 658) are comment text rendering the program readable and understandable. With the exception of the two include file names (needed for a warning-free compile) the program passes the standard Unix spell checker without any errors.
This example is also code that works, and is quite compact for what it achieves.
I don't consider either of the two examples quality code. And sprucing up bad code with object orientation, design patterns, and a layered architecture will not magically increase its quality. On the other hand, you can often (but not always) recognize bad quality code by looking at figures one can obtain automatically. If the code is full of global variables, gotos, huge functions, copy-pasted elements, meaningless identifier names, and automatically generated template comments, you can be pretty sure that its quality is abysmal.
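To give a flavor of what such automatically obtained figures look like, here is a minimal sketch that counts two simple indicators in C source text: goto statements and comments marked XXX. It is not the tooling used for the study; the function name smell_counts and the regular expressions are assumptions made for illustration, and the regex approach ignores string literals, so a comment marker inside a string would be miscounted.

```python
import re

# Matches /* ... */ block comments and // line comments (string literals are ignored).
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
GOTO_RE = re.compile(r"\bgoto\b")

def smell_counts(c_source):
    """Count goto statements outside comments and comments containing XXX."""
    comments = COMMENT_RE.findall(c_source)
    code_only = COMMENT_RE.sub(" ", c_source)  # strip comments before counting gotos
    return {
        "gotos": len(GOTO_RE.findall(code_only)),
        "xxx_comments": sum("XXX" in c for c in comments),
    }

sample = """
int f(int x) {
    /* XXX this should not be needed */
    if (x < 0) goto fail;   // goto is fine here
    return x;
fail:
    return -1;
}
"""
print(smell_counts(sample))  # {'gotos': 1, 'xxx_comments': 1}
```

High counts are a warning sign, but low counts prove nothing: code written to dodge the counters will pass untouched.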
It took me about two months of work to collect these metrics. Yes, additionally running the code of the four kernels through a static analysis tool would have been even better, but this would have been considerably more work: you need to adjust each tool to the peculiarities of the code, add annotations to the code, weed out false positives, and even then you only get one aspect of quality, the one related to bugs such as deadlocks and null pointer dereferences.
Using one of the tools you propose, you will still not obtain results regarding the analysability, changeability or readability of the code.
It's not a very good summary, but the paper is well-written, which is interesting considering that the author is the one who submitted the summary to Slashdot. I suspect that he assumes we have more familiarity with the subject than we actually do.
In my submission I did not include the last sentence with the "summary", which, I agree, is completely incomprehensible in the form in which it appears.
I've occasionally daydreamed a fun academic paper would be to collect sets of password hashes, rub them up against a rainbow table, and make graphs and correlations and wild assumptions about the correlation coeff of IQ and rate of easily cracked pwd vs site etc etc. Sounds like fun so its probably been done before.
Yes, it's been done on 70 million passwords. See http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
Very well put. I didn't have space to explain this in the submission's summary, but this is the gist of the paper.
Thanks for pointing this out.
This is a very good point...