Estimating the Size/Cost of Linux

Posted by ryuzaki0 on Friday July 5, 2002 @02:09AM from the now-thats-a-lotta-dough dept.

2bits writes "Wow... A billion dollars' worth of software on my system for free! Check this guy out: he came up with a counting/pricing method for quite a few types of source code. Here is the program. The results on the site are sorta dated (based on RH 7.1), but the app is pretty cool! Hey, I can finally find out how much all my side projects are worth / costing me..."

4 of 196 comments

Re:Billion dollars? by virve · 2002-07-05 02:21 · Score: 2, Informative

> Where did he get the billion-dollar estimate from? I see no direct correspondence between lines of code and monetary value.

He specifically talks about cost, not value. But you are right that the relationship between SLOC and cost is non-trivial. That is one reason cost estimation is hard, but it is far easier than guessing the cost of a project before the source exists.
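To make the "non-trivial" point concrete, here is a minimal sketch assuming the organic-mode Basic COCOMO formula, effort = 2.4 × KSLOC^1.05 person-months (these constants are an assumption here, not something stated in this comment). Because the exponent is above 1, doubling the code size slightly more than doubles the estimated effort.

```python
def effort_person_months(ksloc: float) -> float:
    """Basic COCOMO effort in person-months (assumed organic-mode constants)."""
    return 2.4 * ksloc ** 1.05

small = effort_person_months(10.0)   # a 10,000-SLOC project
large = effort_person_months(20.0)   # the same kind of project at twice the size
ratio = large / small                # equals 2 ** 1.05, about 2.07

print(f"{small:.1f} PM vs {large:.1f} PM (ratio {ratio:.3f})")
```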

--
virve
Re:value? by GigsVT · 2002-07-05 03:36 · Score: 2, Informative

> cost of using Linux.

For many Windows "sysadmins", the cost of Linux is the cost of actually learning the basics of how TCP/IP works, some basics of how their computer works, and the basics of how some application-level protocols work.

For many Windows admins, the hidden cost of Linux is the time they have to spend learning things they should already know.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:value? by The+Creator · 2002-07-05 05:23 · Score: 2, Informative

>If you can convince someone to pay $1,000,000 for Linux, then it's worth $1,000,000. That's it.

I bet if it were an exclusive licence, M$ would shell out for it :)

--

FRA: STFU GTFO
Responses from the author!! by dwheeler · 2002-07-05 07:02 · Score: 5, Informative
Since I'm the author of this paper (More than a Gigabuck: Estimating GNU/Linux's Size), I suppose I should respond to some of the comments made here:
1. How did I arrive at the estimate of $1 billion? The short answer is "see the paper". I wrote a tool to compute the number of physical source lines of code (SLOC), used Boehm's well-respected COCOMO model to determine the effort (in person-years) from the SLOC, and then converted that effort into an estimated development cost using programmer salary averages and wrap rates. See the paper for the details.
2. It's true that there's no necessary relationship between cost and value. I don't see how that contradicts the paper; the paper never claims that there is one. Clearly, you can spend $1 million to develop a program that is worthless; it happens all too often. Proprietary vendors make money by earning more from sales than it cost to develop the software, so proprietary software vendors are very aware of the difference between value and cost. Look carefully at the phrasing. All the paper says is that "Had this Linux distribution been developed by conventional proprietary means, it would have cost over $1.08 billion (1,000 million) to develop in the U.S. (in year 2000 dollars)." The paper does not claim that Red Hat actually spent $1 billion, or that its distribution's sale value is related to this development cost figure. Indeed, what the paper shows is that by using OSS/FS approaches, it's possible to build large systems that would cost over $1 billion to develop using conventional proprietary means.
3. Several have complained about the use of COCOMO for estimating effort from lines of code. COCOMO is certainly not perfect, but it's a well-tested, widely accepted, and widely used model. It's also very clearly documented, so there are no "hidden assumptions". In particular, the model and constants used in COCOMO are based on a wide variety of real projects. It's ridiculous to believe that its results are accurate to the nearest hour; as noted throughout the paper, this is only an estimate. A few people have noted that their software took less time to develop, but there are many factors at work. One is that highly experienced people can develop code more quickly; however, not everyone is equally skilled, so with large systems and many developers this effect should even out. Another is that COCOMO includes design time, documentation time, and testing time. Also, the cost includes not only an average U.S. programmer's salary but also the wrap rate for overhead (building costs, insurance, and so on), which programmers don't see in their paychecks but which traditional businesses certainly pay. Don't like COCOMO? That's fine; use your own model, preferably one that's been widely tested in the industry. This paper shows you exactly how to do this sort of analysis.
4. I do not claim that every line of code is a "complete rebuild". I'm simply trying to estimate how much it would take to build the system if it were rebuilt.
5. The problems with physical SLOC's sensitivity to formatting are well documented, and I note them in the paper. It's not as bad as you'd think when analyzing larger systems, due to averaging. But if you would rather use logical SLOC, feel free to write code to do that and contribute it to sloccount. In short, instead of complaining, contribute.
6. As documented in the paper, I only used Basic COCOMO. I don't have enough information about each project to really use the more detailed COCOMO models effectively. However, the paper has all you need if you want to do more detailed analysis using other effort and cost estimation models, including the versions of COCOMO that require more input (e.g., Intermediate COCOMO).
7. SLOC isn't a very good measure of productivity, but it's generally a very good way to estimate effort. This distinction is important. If programmer A can do something in 100 SLOC, and programmer B needs 10,000 SLOC to do the same thing, it's crazy to think that programmer B is more productive. But it is reasonable to believe that it will take more effort for programmer B to do the same thing (and thus more money). It's possible to game this (e.g., creating separate print commands for each letter to be output as a string), but the resulting code is pretty ugly and programmers generally only intentionally game things if they believe having higher SLOC values will improve their salaries (an unlikely claim for the software in the Red Hat Linux distribution). The paper only measures effort to develop Red Hat Linux 7.1. You'll have to determine if that's a comparable level of functionality to other systems.
8. This doesn't count "the operating system". It counts "Red Hat Linux 7.1". Thus, it includes the word processors, spreadsheets, and so on. It's not easy to determine what to leave out; you could compute just the minimal "base", but few people would want to use such a system. Again, I think that's extremely clearly stated in the paper.
9. Others have been inspired by my paper to analyze the Debian GNU/Linux distribution using my tool sloccount. You can see their very interesting paper, Counting Potatoes: The size of Debian 2.2, at http://people.debian.org/~jgb/debian-counting. They found that Debian 2.2 includes more than 55 million physical SLOC and would have taken over 14,000 person-years and nearly $1.9 billion USD to develop using traditional proprietary techniques.
10. Yeah, I need a better picture. I just haven't gotten around to it.
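The SLOC → effort → cost pipeline from items 1, 3 and 6 can be sketched in a few lines of Python. This is a hypothetical illustration, not sloccount's code: the SLOC counter only skips blank lines and `#` line comments, and the constants are assumed to be the organic-mode Basic COCOMO coefficients (2.4 and 1.05) plus the default average salary ($56,286/year) and overhead multiplier (2.4) that sloccount prints in its summary.

```python
# Hypothetical sketch of physical SLOC -> Basic COCOMO effort -> cost.
# Assumed constants: organic-mode Basic COCOMO and sloccount's default
# salary/overhead figures; none of this is taken from the paper's code.

AVERAGE_SALARY_USD = 56_286  # assumed average annual programmer salary
OVERHEAD = 2.4               # assumed wrap rate (overhead multiplier)


def physical_sloc(source: str) -> int:
    """Count lines that are neither blank nor pure '#' comment lines.

    Simplification: block comments, docstrings and other comment
    syntaxes are not recognized.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def effort_person_months(sloc: int) -> float:
    """Basic COCOMO, organic mode: person-months = 2.4 * KSLOC ** 1.05."""
    return 2.4 * (sloc / 1000) ** 1.05


def development_cost_usd(sloc: int) -> float:
    """Effort in person-years times average salary times the wrap rate."""
    person_years = effort_person_months(sloc) / 12
    return person_years * AVERAGE_SALARY_USD * OVERHEAD


sample = "# greet the user\n\nprint('hi')\nprint('bye')\n"
print(physical_sloc(sample))
print(f"${development_cost_usd(100_000):,.0f}")
```

Under these assumptions a 100,000-SLOC program comes to roughly $3.4 million. Splitting one statement across several lines raises `physical_sloc` without changing what the program does, which is the formatting sensitivity item 5 refers to.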
--
- David A. Wheeler (see my Secure Programming HOWTO)
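As a quick arithmetic cross-check of the Debian 2.2 figures in item 9, assume (this is not stated above) that the Debian study priced each person-year at an average salary of $56,286 times an overhead multiplier of 2.4. Then 14,000 person-years lands almost exactly on the quoted "nearly $1.9 billion":

```python
# Cross-check of the Debian 2.2 cost figure, assuming the same per-person-year
# cost model (average salary x wrap rate) as the Red Hat analysis.
AVERAGE_SALARY_USD = 56_286  # assumed average annual salary
OVERHEAD = 2.4               # assumed wrap rate

cost_per_person_year = AVERAGE_SALARY_USD * OVERHEAD  # about $135,086
person_years = 14_000                                 # "over 14,000 person-years"
total_cost = person_years * cost_per_person_year

print(f"${total_cost / 1e9:.2f} billion")  # prints "$1.89 billion"
```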