Scott Trappe's Answers About Code Quality

Posted by Roblimo on Thursday March 20, 2003 @04:45AM from the I-want-an-editor-with-a-do-what-I-mean-command dept.

We got some excellent questions for Scott Trappe, the code quality expert who runs Reasoning. In return, he gave us some excellent thoughts on how to improve code quality, along with some insight about how he reached his now-famous "Linux TCP/IP stack is better than most proprietary TCP/IP stacks" conclusion.

1) sample size and conclusions
by tim_maroney

How can any conclusions about the relative virtues of two development methodologies with a universe in the millions of components be drawn from a single sample, and one as small and atypical as a TCP/IP stack?

Scott:

You raise an excellent point, and answering it will take some exposition, so bear with me.

You're right, these results do not "prove" that Open Source is superior to proprietary development, nor were we trying to prove superiority of either development process. However, they do counter the numerous articles and studies alleging that Open Source software is inherently inferior in quality because there is no formal development process, no comprehensive test plan or infrastructure, and no dedicated resources to provide follow-on support and maintenance. Our study is a single counterexample that discredits those assertions. We are in the process of inspecting other Open Source projects to see whether these results are anomalous, and will report our findings.

We chose to examine the TCP/IP stack because it's "atypical" in a way that makes it perfect for comparison: it has a very well-defined set of published requirements that have been stable for several years; adherence to published standards is an essential element of any implementation; there are hundreds of books and articles covering design alternatives, performance measurements, and sample implementations; and there are publicly available conformance tests.

Thus, unlike most software projects, all the implementations we looked at had a common set of requirements to meet and access to a wealth of information about how to design a high performance, well behaved implementation. All of the implementations have been in existence for several years. Thus in comparing them, we didn't have to consider the quality of the requirements document, the uniqueness of the design challenges, nor the resources available for validation.

In short, all the ingredients exist for all of these implementations to be of very high quality. The commercial implementations should enjoy the significant advantages that other studies have pointed out, i.e., resources available to them to ensure comprehensive testing, response to bugs, etc. The five commercial implementations were developed at large corporations that have well-deserved reputations for developing high quality products, and invest significant resources to ensure that quality. I don't believe it is a question of developer skill; the companies all have smart, capable, experienced engineers.

That an Open Source implementation would compare so favorably to commercial implementations was a surprising result and one that we thought others would be interested in. The question remains, what is different about the Open Source development process that, at least in this case, resulted in such a difference in measured quality? One of the most obvious differences between Open Source and proprietary development projects is the peer review process -- code inspection -- that is very intense for major Open Source projects, and underemployed in most commercial development projects because of resource and time constraints.

2) What needs to happen?
by argmanah

If open source has such a direct correlation to better quality, why do you feel companies are still keeping their source proprietary? Do you think that we should try and convince them to open source their code in every case, and if so, what do you think needs to happen before they can be convinced to change their minds?

Scott:

As I mentioned in the answer to the previous question, we haven't proven that Open Source is always superior, just that it can be. One of the clear advantages enjoyed by Open Source development projects is the luxury of time: the developers can take as long as they deem necessary to get it "right." Commercial projects rarely have this opportunity. Customer demands or competitive pressures often force aggressive schedules, and teams simply run out of time.

Even if Open Source were always superior in quality, quality and profitability are not the same thing. Most commercial software companies are leery of Open Source because they don't see how to make money at it. Red Hat, perhaps the most successful company with a business model based on Open Source, reported only a tiny profit last year. Until executives are convinced that making their software Open Source will not negatively impact their profitability or valuation, it won't happen.

I think trying to lobby companies to make ALL their code Open Source will fail. However, I can see a hybrid approach where companies make some of their software Open Source and keep some of it closed. For example, some of the engineers and marketing staff inside Reasoning lobbied me to make our "Code Collection" tools Open Source. They convinced me that this provided a benefit for our customers at relatively little cost. Other companies may be persuaded to do the same on at least a limited basis.

3) Influence of project size
by arvindn

The parallelizability of bug-fixing is quite clearly very effective for high-visibility projects such as the Linux kernel and Apache. However, considering that most open-source projects have only between 1 and 5 developers, how popular do you think a project needs to be for it to significantly benefit from people looking at the source code?

Scott:

I think a project of ANY size benefits from peer review, and that it is even more important if there is only a single developer! That said, there is no question that as the number of active developers on a project grows, the opportunity for misunderstandings and miscommunication grows geometrically. A significant portion of the errors we find are clearly the result of two (or more) developers working on different parts of the code not understanding the interfaces between their components.

4) Quality Software vs Fewer Bugs?
by gosand

I work in software Quality Assurance, and have for going on 10 years now. My experience tells me that true software bugs are only part of the quality of software. So much can get lost in the software development lifecycle. An unclear requirement can travel through the lifecycle and come out the other end as a bug to the customer. Usability is another part of quality. It could be bug-free, but if it is really difficult to use or doesn't fit the needs of the customer, it may not matter.

It sounds like your company focuses on analyzing the code bugs, and not necessarily the perceived bugs. What are your opinions on this? I know that locating and eliminating the bugs *is* a critical part of software QA, but do you feel that bug-free ensures true quality? A bug-free Open Source project may still be too difficult to use or confusing for the non-technically inclined.

Scott:

You are absolutely right, and we don't claim that Illuma is a panacea. For the two issues you raised, there are good methods available today. Conventional black-box testing is a good way to confirm that an application meets its specified requirements, and usability testing with "real" users is important to determine if the customer's needs are truly being met. The problem that we see is that typically there are so many "code bugs" in the software that QA ends up spending most of their time running into those and not having enough time to adequately confirm conformance to the requirements. What Reasoning seeks to do is change that by providing developers with frequent, in-depth feedback on code bugs so that when QA does get the application, it is largely functional and robust.

5) General quality of programming
by pro-mpd

Do you find that the quality of the programming depends upon the geographic location of the programmers? So, for instance, an open source program will be troubleshot and combed over by people from potentially a dozen different countries. Closed source software is checked by people where it is written. Since, as a general rule, education varies in quality and areas of emphasis around the world, does it help having people attacking a program from many different angles (i.e., open source, checked worldwide) rather than simply drawing from a set of people who may share many of the same abilities, backgrounds, etc.?

Scott:

You have really posed two different questions: (1) how does software quality vary based on the location of the developers, and (2) do geographically distributed teams result in better quality because of the (implied) diversity of the developers?

To answer the first question, there are certainly significant differences in the quality of education around the world, and I believe cultural factors also play a role. However, numerous studies have shown that programmer productivity and quality vary over a huge range -- 10 to 1 or more, even when controlling for ethnicity and education. So I believe quality depends more on the individuals that make up the team than on their location.

For the second question, I disagree with the underlying assumption. Open Source projects could well be composed of people from similar backgrounds and education. In contrast, many commercial development teams, whether geographically concentrated or distributed, have a tremendous diversity among their developers. Again, I think the individual programmers are by far the most significant factor, and that location plays a much smaller role.

6) Issues behind test cases for proprietary vs. open
by Tekmage

One of the bigger challenges facing open source projects as compared to their proprietary equivalents is how to manage confidentiality of test cases. With companies such as Red Hat and Ximian involved, it's certainly less of an issue for their core products and projects they oversee, but there will always be cases where there is friction when the best/only person who can fix a particular problem is on the outside, unable to work with the test cases in question.

What are your thoughts on this trade-off between test case management and confidentiality as it relates to proprietary vs. open source code development?

Scott:

This is a common problem regardless of the state of the source. For example, a semiconductor company may run into a serious defect with a device simulation program. They may view their "test case" -- the chip they are designing -- as being so proprietary that they are not willing to share it with anyone outside, regardless of whether the application is Open Source or closed. Open Source may have an advantage if it is practical to bring in an outside expert on the application and do the bug-fixing work on the customer's premises. The framework that exists for open source projects (SourceForge.net, etc.) is better suited for this than most proprietary development environments. However, companies will always look for a competitive edge to deliver superior results relative to their competitors. If the company is based on Open Source, then they must identify other means to differentiate themselves. Red Hat, Suse, and others differentiate based on (a) components included in their distribution; (b) a "certification" that the integrated components will work together; and (c) quality of service and support. Thus, keeping test suites proprietary is a logical way for a company like Red Hat, for example, to differentiate its version of Linux from others, and I think this will be common in businesses based on Open Source.

7) How do you maintain your neutrality?
by arvindn

Given that on more than one occasion "independent institutions" which conducted similar studies (and concluded that closed source is superior) were revealed to have been sponsored by the other side [microsoft.com], how do you convince other people of your neutrality? Since you are selling a service [reasoning.com], not a product, I would guess that the confidence of your customers in your independence is pretty important from a business perspective. How do you win and keep that confidence? The article notes that you agree with ESR's pro-open-source reasoning. Wouldn't the perception of your having an OSS bias be something you'd want to avoid?

Scott:

As I answered in question #1, we didn't set out to prove Open Source superior or inferior to proprietary software. Our mission is to give our customers a competitive edge by helping them deliver great software--by accelerating their development process, saving time and money. Code inspection has been demonstrated to be the single most effective way to eliminate defects, and we offer a way for organizations to perform thorough inspections quickly and cheaply. When we started our comparison, we didn't know how it would turn out, but the results confirm our belief that code inspection -- which happens naturally for major Open Source applications -- has a dramatic impact on software quality.

8) So if open source is so good...
by anthony_dipierro

Where can I get the source code to these automated inspection tools?

Scott:

How much are you willing to pay? :-)

Seriously, as I discussed in the answer to question #2, I haven't seen a way to provide Reasoning's shareholders with an equivalent (much less superior) return by making our source code Open. I think this is one of the most significant challenges that advocates of Open Source have yet to successfully address.

9) The future of automated code inspection
by phamlen

According to the article, it appears that you look for buffer overflows, freeing memory early, and other memory issues.

What errors are currently hard to detect automatically but which you would really like to be able to find? What is the next category of errors that you're trying to detect with automatic code inspection?

To give you some ideas, what about:

"unrefactored" code - code which has a lot of duplication and should be cleaned up
"untested" code - code (or branches in the code) that is currently untested by unit tests
"programmer intention" errors - code which doesn't do what the programmer intends

Scott:

Well, I've always wanted an editor with a "DWIM" command (Do What I Mean) so I wouldn't have to think so hard when I'm writing -- whether it's code or just a letter. :-)

More seriously, as was alluded to in the answer to question #4, automated inspection can detect "errors of effect" but not "errors of intent". By that, I mean that we cannot validate that the application does what the requirements document says it should. For example, if the requirement for a routine is that it calculate the geometric mean of a list of numbers, but instead it calculates the arithmetic mean, we cannot detect that. Today, we detect NULL pointer dereferences, references to uninitialized variables, array bounds violations (a superset of buffer overflows), memory leaks, and bad deallocations.
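The geometric-versus-arithmetic mean case can be sketched in a few lines. This is only an illustration in Python, not anything taken from Reasoning's tools, and the function names are made up. Both functions are internally consistent and break no language rule, so nothing in the code alone marks the second one as wrong; only the requirement does.

```python
import math


def geometric_mean(values):
    """What the (hypothetical) requirement asks for: the n-th root of the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_buggy(values):
    """Error of intent: clean, well-formed code that computes the arithmetic mean."""
    return sum(values) / len(values)


samples = [1.0, 4.0, 16.0]
correct = geometric_mean(samples)      # approximately 4.0
wrong = geometric_mean_buggy(samples)  # 7.0, and no crash or warning to flag it
```

Catching the difference takes a test derived from the requirement, such as checking the result for a known input.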

In general, a static analysis approach is limited to identifying defects resulting from (a) internal inconsistencies -- such as setting a pointer to NULL at one point in the program and dereferencing that pointer at a later point; and (b) violations of constraints imposed by the language semantics or standard library functions.
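To make category (a) concrete, here is a toy checker in Python that flags a name assigned `None` at one point and dereferenced later, without ever running the program. It is a sketch under heavy assumptions: it uses Python's standard `ast` module, handles only straight-line top-level statements, and says nothing about how any commercial inspection tool actually works. The function name and sample program are invented for illustration.

```python
import ast


def find_none_dereferences(source):
    """Return (line, name) pairs where a name last assigned None is dereferenced.

    Toy analysis: straight-line, top-level statements only; no branches,
    loops, functions, or aliasing.
    """
    none_names = set()
    findings = []
    for stmt in ast.parse(source).body:
        # Flag attribute access or subscripting on a name known to hold None.
        for node in ast.walk(stmt):
            if isinstance(node, (ast.Attribute, ast.Subscript)):
                target = node.value
                if isinstance(target, ast.Name) and target.id in none_names:
                    findings.append((node.lineno, target.id))
        # Then update what is known from simple `name = value` assignments.
        if isinstance(stmt, ast.Assign):
            is_none = (
                isinstance(stmt.value, ast.Constant) and stmt.value.value is None
            )
            for tgt in stmt.targets:
                if isinstance(tgt, ast.Name):
                    if is_none:
                        none_names.add(tgt.id)
                    else:
                        none_names.discard(tgt.id)
    return findings


SAMPLE = """\
buf = None
size = 10
buf.append(size)
buf = []
buf.append(size)
"""

findings = find_none_dereferences(SAMPLE)
```

On the sample, only the first `buf.append(size)` (line 3) is reported, because by line 5 `buf` has been reassigned to a list.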

So, staying within those bounds, I'd like to find: (a) Concurrency-related defects: deadlocks, race conditions, etc. These are notoriously difficult to find and even harder to test. (b) Resource leaks: similar to memory leaks but applied to other types of resources such as file descriptors, handles, etc. (c) Protocol violations: misusing an API, such as trying to read or write a file after it has been closed. (d) Dead code: code that could never be executed. (e) Performance bottlenecks: either inefficient algorithms (such as using Bubble sort), or indications of "hot spots" in the code.
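Item (c) can be shown with a small Python sketch. The example uses the standard library's `io.StringIO` as a stand-in for a real file so that it is self-contained; the function names are made up. The first function reads from a stream after closing it, which surfaces only at run time as a `ValueError`; the second ties the close to a `with` block, which also avoids the leak described in item (b).

```python
import io


def read_after_close():
    """Protocol violation: the stream is used after it has been closed."""
    stream = io.StringIO("payload")
    stream.close()
    return stream.read()  # raises ValueError: I/O operation on closed file


def read_with_context_manager():
    """Same work, with the close handled by the `with` block."""
    with io.StringIO("payload") as stream:
        return stream.read()


try:
    read_after_close()
    violation_detected = False
except ValueError:
    violation_detected = True

safe_result = read_with_context_manager()
```

A static checker for this class of defect would aim to report the call in `read_after_close` without executing it.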

Your ideas are good too; so far we have focused on "critical defects" -- those likely to make the application abort, hang, or silently corrupt data. Your first example is more in the category of "poor programming practice". I can see integrating this type of analysis in the future, because there is a correlation between overly complicated routines and the likely presence of bugs.

10) Test first
by neurojab

What do you think about the new "test first" software development methodology? For those that haven't heard of it, it's a method wherein the test cases for a program are written, and no code is written that doesn't cause a failing test case to pass. All test cases are automated and run after every code change. Would you advocate this in an open-source project? This would mean every contributor would write test cases for each new feature, and add it to a project's common test case repository... What do you think?
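For readers who have not seen the practice, the cycle described above looks roughly like this. It is a minimal Python sketch; the `slugify` function and its tests are hypothetical and chosen only for illustration.

```python
# Step 1: write the tests before any implementation exists.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"


def test_slugify_strips_surrounding_whitespace():
    assert slugify("  Hello  ") == "hello"


# Step 2: write only as much code as it takes to make the failing tests pass.
def slugify(title):
    return "-".join(title.lower().split())


# Step 3: rerun every test after each change.
ALL_TESTS = [
    test_slugify_lowercases_and_joins_words,
    test_slugify_strips_surrounding_whitespace,
]
for test in ALL_TESTS:
    test()
```

In a shared project, `ALL_TESTS` would correspond to the common test case repository that each contributor adds to.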

Scott:

In my experience, programmers like to write code. Period. They don't like to write documentation, they don't like to write system tests, and they don't like to write unit tests. Programmers are also optimists--how else could they tackle building these enormously complex systems and think they had any chance of working? Programmers like instant gratification (who doesn't?). They enjoy coming up with a solution to a problem and seeing that solution implemented immediately.

Because programmers are optimists, that is reflected in their unit tests. Time and time again I've seen developer-written tests that demonstrate the feature works -- because the tests reflect the thinking of the developer about how the feature will be used. They rarely do a good job of testing corner cases, limits, or "unusual" situations (like running out of memory or other finite resources).
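A small Python sketch of that pattern, using a made-up `average` function: the first test mirrors how the developer expects the feature to be used and passes, while the second probes an "unusual" input the first never touches.

```python
def average(values):
    """Hypothetical feature under test: the mean of a list of numbers."""
    return sum(values) / len(values)


def optimistic_test():
    # Demonstrates that the feature works on the input the author had in mind.
    assert average([2, 4, 6]) == 4


def corner_case_test():
    # Empty input is the unusual situation; this implementation crashes on it.
    try:
        average([])
    except ZeroDivisionError:
        return "crashed on empty input"
    return "handled empty input"


optimistic_test()
corner_result = corner_case_test()
```

Here the optimistic test passes and the corner-case probe reports a crash, which is the gap being described.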

I think the "test first" methodology is too at odds with what motivates programmers to do what they do. Would Linux have ever been created if Linus' original postings to the net had been test cases for a UNIX-like operating system? And invited others to write more test cases? How many would have responded? How many would have become excited about the prospect of building an Open Source operating system if the first year was going to be spent writing unit tests?

Maybe I'm just a skeptic, but Test First reminds me of so many other software development methodologies proposed over the years that promise great benefits but rarely deliver them. That said, I am excited by other aspects of some of the newer methodologies, such as the "program in pairs" in Extreme Programming.

Making a return by Surak · 2003-03-20 05:04 · Score: 2, Informative

Seriously, as I discussed in the answer to question #2, I haven't seen a way to provide Reasoning's shareholders with an equivalent (much less superior) return by making our source code Open. I think this is one of the most significant challenges that advocates of Open Source have yet to successfully address.

Really? No open source advocates have addressed this at all?

--
My journal has hot /. gossip.
Testing by argmanah · 2003-03-20 05:14 · Score: 4, Informative

In my experience, programmers like to write code. Period. They don't like to write documentation, they don't like to write system tests, and they don't like to write unit tests.
In a corporate environment, isn't this what testers are for? You don't waste the programmers' time on this, you have testers write the test cases, write system test scripts, automate the testing process, and execute tests. The programmers can do what they're paid to do. As long as the requirements are well defined, the people writing the test cases and the people writing the code don't have to be the same people.

In fact, they shouldn't be the same people. That way there are two sets of people going over the requirements. Sometimes a developer will interpret a requirement differently than a tester. It allows ambiguous requirements to be found during the development cycle rather than having a person write the test, write the code, and have the customer say "I didn't want it to work like that."

--
Overrated Moderation: This posts sucks... because.
1. Re:Testing by ClosedSource · 2003-03-20 05:34 · Score: 2, Informative
  
  Well, in a lot of companies unit testing by programmers is considered an important step. You don't really want to hand over your application to the test group and have them find a trivial bug in the 1st hour of testing.
  
  This is particularly true in companies where the hand-off is somewhat formal and may require paperwork. Of course, any bugs found have to be documented as well which takes more time.
  
  So the bottom line is that the total cost of having only the testers test the code is often higher than the cost of having the programmer do her own unit testing.
  
  For system testing, however, I agree with you.
Re:Why Didn't They Ask The Metrics Question? by paranoic · 2003-03-20 05:59 · Score: 2, Informative

You might want to look at the Jan. issue of Dr. Dobb's, "Automated Defect Identification" by Kevin Smith.
They are probably counting lint or other types of compiler warnings.
smatch by Error27 · 2003-03-20 08:55 · Score: 3, Informative

I have to toot on my own trumpet.

Check out kbugs.org. These are the smatch results from testing the 2.5.65 kernel. We found 1400 possible bugs in the 2.5.65 kernel but probably over half of those are false positives.

Smatch is an open source checker that finds similar sorts of problems to the Reasoning software. For example, both look for dereference bugs.

The bad news is that smatch is still in the pre-alpha stages and it only works on C for now. And also the kbugs.org site is crappy.

The cool thing about smatch is that you can write checks which are custom to your code. Mostly it is used for the kernel, but Michael Stefaniuc has used it for Wine specific bugs as well.