Estimating the Size/Cost of Linux

Billion dollars? by SpatchMonkey · 2002-07-05 02:13 · Score: 1

Where did he get the billion dollar estimate from? I see no direct correspondence between lines of code and monetary value.

Re:Billion dollars? by SpatchMonkey · 2002-07-05 02:19 · Score: 1

I understand that, but I find the methods he has used to come to that figure questionable. His very simplistic formula is listed in section 3.7. Compared to the analysis listed in the rest of the document, which is very interesting, this cost estimation seems relatively naive.
Re:Billion dollars? by virve · 2002-07-05 02:21 · Score: 2, Informative

Where did he get the billion dollar estimate from? I see no direct correspondence between lines of code and monetary value.

He specifically talks about cost, not value. But you are right that the correlation between SLOC and cost is a non-trivial one. That is one reason why cost estimation is hard, but it is far easier than guessing the cost of a project before one has the source.

--
virve
Re:Billion dollars? by Oculus+Habent · 2002-07-05 02:27 · Score: 3, Insightful
Sure, but what about the time spent in bug fixes, patches, etc? I suppose you could do something like this:
- Standard programming takes A minutes per line on average.
- Bug fix/patching programming takes B minutes per line on average.
- Standard/Patch programming take up C/D percent of the time.
- Average (mode, perhaps) programmer salary is E dollars.
Programming cost = E dollars * ((X lines * C percent * A minutes) + (X lines * D percent * B minutes))
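
The formula above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are made up here, and it assumes the salary E is expressed as an hourly rate that gets converted to dollars per minute, which the post does not spell out.

```python
def programming_cost(lines, minutes_per_new_line, minutes_per_fix_line,
                     share_new, share_fix, dollars_per_hour):
    """Rough cost of writing `lines` lines of code.

    lines                 -- X, total lines of code
    minutes_per_new_line  -- A, minutes per line of standard programming
    minutes_per_fix_line  -- B, minutes per line of bug-fix/patch work
    share_new, share_fix  -- C and D, as fractions (0.75 means 75 percent)
    dollars_per_hour      -- E, assumed here to be an hourly rate
    """
    rate_per_minute = dollars_per_hour / 60.0
    minutes_new = lines * share_new * minutes_per_new_line
    minutes_fix = lines * share_fix * minutes_per_fix_line
    return rate_per_minute * (minutes_new + minutes_fix)


if __name__ == "__main__":
    # Made-up inputs: 1000 lines, 2 and 4 minutes per line, 75/25 split, $60/hour.
    print(programming_cost(1000, 2.0, 4.0, 0.75, 0.25, 60.0))
```

Documentation, man pages and support would have to be added as separate terms; the sketch only covers the coding time.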

You could even go fancy and calculate lines-per-minute based on each language. But then, what about man pages, documentation, support sites, etc.? These are things you would pay for in commercial software. Shouldn't these be a factor as well?
--
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
Re: Billion dollars? by Black+Parrot · 2002-07-05 03:27 · Score: 2

> Where did he get the billion dollar estimate from? I see no direct correspondence between lines of code and monetary value.

Using his numbers, I calculate that my part time effort on a hobby project over the last 9 months has resulted in a quarter of a million dollars worth of code.

Any takers?

--
Sheesh, evil *and* a jerk. -- Jade
Re:Billion dollars? by Anonymous Coward · 2002-07-05 04:50 · Score: 0

Linux code is at least as good as windows'. So, start with what it cost to develop windows and add from there...
Re:Billion dollars? by Anonymous Coward · 2002-07-05 08:32 · Score: 0

I've known some programmers. Their view has been that they will get paid for anything they possibly can. Example: While testing a long-running section of code a programmer who shall remain nameless need only run the program in the background for 4 days continuously. This programmer bills the customer for (4 days * 24 hours) of work for the hours racked up while an old box pulled out of mothballs sat and hummed.

Yes, it's possible the box could have been doing some other useful work, but the fact of the matter was that this was a computer which wasn't being used for anything before or after this job was completed. Some of you will no doubt insist that screwing the customer is the only ethical thing to do in order to make sure everyone pays enough for their software but the fact of the matter is that for no effort and at no cost to him he has just pumped the customer for 96 hours of downloading porn and otherwise going about his pre-employment activities.

It's not how bad the code is or how big the code is, it's for how many hours the customer can be billed. If that means you type "#BBBBBB" for a million lines and can con someone into paying for it, somehow that's considered a justifiable alternative to useful work.

lets see here..... by Anonymous Coward · 2002-07-05 02:14 · Score: 4, Funny

[cmdrtaco@localhost]$ est slashcode
Analyzing slashcode.....
Result: $6.66
[cmdrtaco@localhost]$

Re:lets see here..... by Anonymous Coward · 2002-07-05 02:24 · Score: 0, Funny

According to VA Linux's most recent quarterly filings, slashcode has a negative value.
Re:lets see here..... by Anonymous Coward · 2002-07-05 02:51 · Score: 1, Insightful

who the fuck is modding that offtopic? did you not read the article? the article deals with cost estimation of source code. In this post, we see a satirical representation of what CmdrTaco might experience if he were to run the cost estimator tool (the topic of the article) against slashcode, the code that runs slashdot. The little value returned is the running gag that slashcode is a pos (similar to the one gag of slashdot always being infected with the latest IIS security hole)

The resulting value of 666 is also a common joke among geeks.

sigh -- Maybe this is why some people have .sigs saying offtopic means the moderator missed the joke.
Re:lets see here..... by Anonymous Coward · 2002-07-05 03:30 · Score: 0

burn in hell heathen!

WTF? by Ctrl-Z · 2002-07-05 02:14 · Score: 1, Flamebait

Okay, so now Slashdot is posting this story that is over a year old?

From the header of the paper:

More Than a Gigabuck: Estimating GNU/Linux's Size
David A. Wheeler (dwheeler@dwheeler.com)
June 30, 2001 (updated November 8, 2001)
Version 1.06

--
www.timcoleman.com is a total waste of your time. Never go there.

Re:WTF? by be-fan · 2002-07-05 02:18 · Score: 2, Funny

The funny thing is that this story was posted on Slashdot a year ago!

--
A deep unwavering belief is a sure sign you're missing something...
Re:WTF? by Anonymous Coward · 2002-07-05 02:39 · Score: 0

Yes, I remember having read that one a while ago.

Slow news day, Taco? by damiam · 2002-07-05 02:16 · Score: 5, Interesting

Good god, people. This app has been out there for years. It's been mentioned in previous /. stories. Most people already know about it. This isn't news.

I know I'll get modded down for saying this, but Taco, as an "editor", couldn't you at least have fixed This Guy's Moronic Capitalization Scheme?

--
It's hard to be religious when certain people are never incinerated by bolts of lightning.

Re:Slow news day, Taco? by carlos_benj · 2002-07-05 02:30 · Score: 3, Funny

...couldn't you at least have fixed This Guy's Moronic Capitalization Scheme?

That's not a scheme. The entire post is a very long title for a very short book he's writing...

--
--

As a matter of fact, I am a lawyer. But I play an actor on TV.
Re:Slow news day, Taco? by Anonymous Coward · 2002-07-05 02:50 · Score: 0

Actually, it's wrong even for title/headline case.
Re:Slow news day, Taco? by stikves · 2002-07-05 04:44 · Score: 2

Forget it. This is Slashdot. You can find articles with,
- Typing errors (25 hours per day)
- Incorrect information
- Seen "n" time stories
every day. We also have "trolls" and "flamebaits" here. Once we also had "first posters". But I think they are gone (at least after I set minimum rating to +2).

Get used to it!
Re:Slow news day, Taco? by damiam · 2002-07-05 04:54 · Score: 1

I'm aware of all this ("The Who Towers" :-). But this just seems worse than usual.

--
It's hard to be religious when certain people are never incinerated by bolts of lightning.

Yeah.... by graphicartist82 · 2002-07-05 02:16 · Score: 3, Funny

A Billion Dollars Worth Of Software On My System For Free!

Yeah, that's what happens when you use P2P _WAY_ too much

Re:Yeah.... by Anonymous Coward · 2002-07-05 03:08 · Score: 0

Someone mod this up!

I don't believe it!!!! by joshsnow · 2002-07-05 02:16 · Score: 0, Troll

Someone finally acknowledging that OpenSource/Free(beer) software actually has an associated cost - what next? Wait - is that a flying pig I see?

Interesting. by jellomizer · 2002-07-05 02:17 · Score: 2

Although I rember this article in the Past a fiew months ago. But I am to lazy to look it up. But it is instering how the Open Source movement just by a lot of people just doing a lot of little things (and some not so little) has created a product that would take a lot resources for a large company to complete. Open Source Software in my opinion is the only way the Little Guy to play with the Big Guns.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.

Re:Interesting. by rnd() · 2002-07-05 02:32 · Score: 2

Open Source Software in my opinion is the only way the Little Guy to play with the Big Guns
Not the only way. A bunch of coders could put together a software company and develop great products and recruit top talent. The company would grow and might eventually displace Microsoft.
Microsoft was once a couple of college-age kids who stayed up all night writing code who happened to get the DOS contract.
Companies have an advantage over OSS developers in that when the company is poised for success, people want to invest money in the company in order to reap larger returns later. This gives the company the advantage of more money to recruit top full time talent, etc. Most people regrettably have bills to pay, and the poorly funded nature of most OSS projects will always limit the amount of some people's time that the projects can obtain.

--
Amazing magic tricks
Re:Interesting. by carlos_benj · 2002-07-05 02:32 · Score: 1

Although I rember this article in the Past a fiew months ago. But I am to lazy to look it up. But it is instering how the Open Source movement just by a lot of people just doing a lot of little things (and some not so little) has created a product that would take a lot resources for a large company to complete. Open Source Software in my opinion is the only way the Little Guy to play with the Big Guns.

--
If My spelling bugs you. Then my work is done.

In that case, you can go home now.

--
--

As a matter of fact, I am a lawyer. But I play an actor on TV.
Re:Interesting. by carlos_benj · 2002-07-05 02:54 · Score: 2, Interesting

Microsoft was once a couple of college-age kids who stayed up all night writing code who happened to get the DOS contract.

The chances of that happening again are fairly slim. This was clearly a case of being in the right place at the right time. A couple of years later and they would have found themselves trying to supplant the standard desktop OS. The combination of the right hardware platform, a 'new' OS and a viable business app all had to click at the same time. Had the PC revolution started years earlier and those same two college kids tried to unseat that alternate universe's Microsoft juggernaut it wouldn't happen, no matter how good a marketeer Bill is.

Companies have an advantage over OSS developers in that when the company is poised for success, people want to invest money in the company in order to reap larger returns later.

Precisely. Given the dominance of Microsoft in the market, those savvy people aren't likely to gamble with funds they want a return on. That's why OSS really is a viable way significant inroads can be made in the market. You now have several companies helping to fund that development. Entire countries are looking to OSS to free them from the Microsoft treadmill of costly upgrades and zany licensing fees. The momentum is building and Microsoft sees it. They don't have a problem with Apple because they see them as a niche player, but I don't think they'd be writing licenses with anti-GPL language in it if they didn't genuinely see it as a threat to marketshare. As much as some of us like to bash Microsoft the executives are not stupid and are quite capable of interpreting the GPL and understanding that their 'take' on the license just isn't supported by the GPL's language.

--
--

As a matter of fact, I am a lawyer. But I play an actor on TV.

How to put MS's 40 billion to good use. by Anonymous Coward · 2002-07-05 02:18 · Score: 0

Maybe MS could spend the money for a working OS in about a week or two.

I want to know by Anonymous Coward · 2002-07-05 02:18 · Score: 0

how much can I charge for my 5 line C++ "Hello world\n" program? ;-)

Re:I want to know by Anonymous Coward · 2002-07-05 03:09 · Score: 0

come on, some moderator has to find this funny ;)

At last! The real name of the X-Window System(tm) by Anonymous Coward · 2002-07-05 02:19 · Score: 0

This must be about the first time I've read an article about Linux, (GNU/Linux, if that's what you like to call it), that hasn't called the X-Window-System(tm), X-Windows.

As far as I am aware, Windows(tm) is a trademark of Microsoft Corporation(tm). The X Consortium actually give recommended names for X, in the X man page.

bad news for Linux? by tps12 · 2002-07-05 02:19 · Score: 5, Funny

This looks like a serious problem for Linux distributors like Red Hat, Mandrake, and Debian. They sell their products (which consist of software and support and manuals) for $40-$100, usually. Now we see that what they put into their product (i.e., the cost) is orders of magnitude beyond that. Even if Red Hat sold every single copy it packaged (it doesn't even come close), and even if nobody downloaded it for free or copied the CDs for a friend (again, an incredibly optimistic assumption), it would still be looking at huge losses.

This might have worked a few years ago, but with accounting practices coming under scrutiny across the board, I fear that these companies are headed for trouble.

--

Karma: Good (despite my invention of the Karma: sig)

Re:bad news for Linux? by John+Hasler · 2002-07-05 02:37 · Score: 2, Flamebait

This looks like a serious problem for Linux
distributors like Red Hat, Mandrake, and Debian.
They sell their products ... for $40-$100,
usually.

Wrong. Debian doesn't sell anything.

Now we see that what they put into their product
(i.e., the cost) is orders of magnitude beyond
that.

Wrong again. Red Hat's costs are what they actually spend, not what the stuff they distribute would have cost if it had not been given to them.

even if nobody downloaded it for free...

There's your clue: _Red_ _Hat_ downloads the stuff they distribute for free.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:bad news for Linux? by dattaway · 2002-07-05 02:49 · Score: 2, Insightful

A serious problem for them?

The IRS is going to love me come audit day...
Re:bad news for Linux? by jsse · 2002-07-05 03:58 · Score: 3, Funny

To: ceo@redhat.com
From: Congress

Dear Sir,

We figured out recently that you are selling software which is worth 1 billion dollars at a suspiciously low price (~$30-$200).

Worse still, you also allow people to download your software products from your website for Free! We have reason to suspect that you are also involved in anti-competitive practices.

I hereby invite you and your accountants to come to congress to answer some of our questions.

Best Rgds,

P.S. Do not attempt to destroy any accounting records, we are watching you.
Re:bad news for Linux? by Anonymous Coward · 2002-07-05 04:53 · Score: 0

tps12, you'd better watch your back. PhysicsGenius has got your routine bested. You might get the Funny mods, but PG gets all the class points.
Re:bad news for Linux? by Anonymous Coward · 2002-07-05 05:46 · Score: 0

That is harsh. I have nothing but respect for my colleague, Mr. Genius.

I am also the first to admit that I suck.

OTOH, I take my karma where I can get it, and to get my +0 Bonus back, I will take a Funny mod any day.
Re:bad news for Linux? by sir99 · 2002-07-05 16:22 · Score: 0, Offtopic

People that miss the joke are pretty funny themselves :)
Humor? What's that, the stuff in your eye? (vitreous humor, for the people I'm referring to)

--
The ocean parts and the meteors come down
Laid out in amber, baby.
Re:bad news for Linux? by Anonymous Coward · 2002-07-09 16:50 · Score: 0

I am also the first to admit that I suck.
Yeah right, I doubt that you are the first troll admiting to sucking goatse's asshole!

This guy is a fraud by Randy+Rathbun · 2002-07-05 02:19 · Score: 0, Flamebait

Why? He obviously does not use Linux. Just look at his picture! What Linux user out there is gonna be caught dead wearing a white shirt and a tie? Okay, maybe to a wedding/funeral, but that's it.

He also went off and shaved and combed his hair for his picture.

The man just ain't right, I tell you!

Re:This guy is a fraud by Anonymous Coward · 2002-07-05 05:59 · Score: 0

I remember the last wedding/funeral I heard about. This Palestinian group wanted to celebrate the happy event (wedding) by firing a mortar. That's right - a fucking mortar. Ended up killing about 25 people, including the groom (funeral). Darwin awards candidates, all.

Hmmm... sloccount, you say? by jaunty · 2002-07-05 02:21 · Score: 1

woody:~# apt-cache search sloccount
sloccount - Programs for counting physical source lines of code (SLOC)
...so it appears there's a *.deb of it already (or is this an old story...) Hmmm... you be the judge.

--
Why did I post this? Ask me now!

Re:Hmmm... sloccount, you say? by Ctrl-Z · 2002-07-05 02:26 · Score: 1

EVAL: it appears there's a *.deb of it already (or is this an old story...)

RESULT: TRUE.

--
www.timcoleman.com is a total waste of your time. Never go there.

Six dollars and 66 cents? by Anonymous Coward · 2002-07-05 02:21 · Score: 0, Funny

For slashcode? Dude, you got ripped off!

Re:Six dollars and 66 cents? by Anonymous Coward · 2002-07-05 02:45 · Score: 0

Did not, asswipe. Just because I chose to take another angle does not mean I missed his reference.

Here's one applicable to you: Slashbot shows the world he has a single-digit IQ. Film at 11.
Re:Six dollars and 66 cents? by Anonymous Coward · 2002-07-05 02:57 · Score: 0

Did not

Slashdotter whines about being made fun of. Film at 11.
Re:Six dollars and 66 cents? by Anonymous Coward · 2002-07-05 04:34 · Score: 0

Argument between indistinguishable ACs. Film at 11.
Re:Six dollars and 66 cents? by Theologian · 2002-07-05 05:14 · Score: 1

Did not, asswipe.
Sounds like someone needs to go back and do some paperwork using 2-ply and some sandpaper.....

--

Crapdot
News from birds. Stuff that splatters.

value? by rnd() · 2002-07-05 02:25 · Score: 3, Insightful

It's fun to see someone do something like this. However, the fact that most people don't use Linux means that the value of using Linux is less than the cost of using Linux. Therefore, since the source code is free, there must be other costs that are preventing most people from using Linux.

Instead of wasting time figuring out fictitious pricing based on the way that corporate America prices software, why not figure out a way to remove the aforementioned hidden costs from Linux so that the masses can begin to see what many of us on /. have known for a while: that GNU/Linux and Open Source Software represent a great choice.

--

Amazing magic tricks

Re:value? by GigsVT · 2002-07-05 03:36 · Score: 2, Informative

cost of using linux.

For many Windows "sysadmins", the cost is the cost of actually learning the basics of how TCP/IP works, some basics about how their computer works, and basics about how some application level protocols work.

The hidden cost of Linux is the time you have to spend learning things you should already know, for many Windows admins.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:value? by The+Creator · 2002-07-05 05:23 · Score: 2, Informative

>If you can convince someone to pay $1,000,000 for linux, then it's worth $1,000,000. that's it

I bet if it was an exclusive licence, M$ would shell it out :)

--

FRA: STFU GTFO
Re:value? by ztwilight · 2002-07-05 08:36 · Score: 1

Therefore, since the source code is free there must be other costs that are preventing most people from using Linux.

Actually, since there exists the thing called a Microsoft Tax, and since Microsoft makes software which is generally slightly easier to use, and since people are used to running Windows, and a few other reasons, people are still running Windows.

--
Who moved my sig?
Re:value? by rnd() · 2002-07-05 12:17 · Score: 2

Actually, since there exists the thing called a Microsoft Tax, and since Microsoft makes software which is generally slightly easier to use, and since people are used to running Windows, and a few other reasons, people are still running Windows.
You hit the nail on the head. From an economic perspective, the improved ease-of-use of Microsoft software combined with the benefit derived from the fact that people are used to Windows have a 'value' greater than the licensing fees that Microsoft charges.

--
Amazing magic tricks
Re:value? by okmijnuhb · 2002-07-05 13:34 · Score: 1

This is the perfect way to promote GNU/Linux. Much the way software companies use phony methods to measure financial losses to piracy, when you download GNU/Linux for free, you can see how much it's 'really' worth. Although '# of lines of code' might be an unfair measure, since open source tends to be more stable. So just add 50% to that value.

Hmmm by cca93014 · 2002-07-05 02:25 · Score: 1

It may well contain "over 30 million physical source lines of code (SLOC)", but what about the lines of source code? Eh?

Didn't think about that, did you?

--

Invoicing, Time Tracking, Reporting

Nonsense by qlmatrix · 2002-07-05 02:28 · Score: 2, Interesting

I don't think the measurement of the length of code or the time that has been or might have been taken to produce the code is in any way related to the value for the use of the software produced.

The same people who argue in these categories also try to legitimize open source software by its better "quality" in terms of fewer errors. The result of this argument is that MS software would be great to use if it contained fewer errors. But that's not the main point. As can be seen when MS does such horrible things as allowing themselves to destroy your software (DRM EULA change), the problem is not the result but the way they produce their software. I'd argue that because their development model is bad the resulting software is bad too, but that's only a minor problem in comparison to the harm they do to the software culture in general.

Re:Nonsense by Anonymous Coward · 2002-07-05 20:39 · Score: 0

I don't think the measurement of the length of code or the time that has been or might have been taken to produce the code is in any way related to the value for the use of the software produced.

I've never really believed in this LOC stuff, and like to think of the customer benefit of a piece of code, which can be huge if it saves them manual processing time on a weekly or daily basis.
It's nice to be told, though, that something a group of 3 of us did in 6 months should have taken 10 man years, hmmm. I was using Emacs for a big chunk of that. I can work a lot faster now in Eclipse.
Its person-year estimates for Java must be well over the mark, or Java is very easy to write in, or very verbose, or there's massive variation between developers. I can run it against small packages that may have taken a few days or a week for just me, and get comparatively big results, like 1.78 man months. SLOC is about half of wc -l.
Has anyone else compared this program against their own code bases?

slashdotted! by Anonymous Coward · 2002-07-05 02:30 · Score: 0

This paper analyzes the amount of source code in GNU/Linux, using Red Hat Linux 7.1 as a representative GNU/Linux distribution, and presents what I believe are interesting results.

In particular, it would cost over $1 billion ($1,000 million - a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). Compare this to the $600 million estimate for Red Hat Linux version 6.2 (which had been released about one year earlier). Also, Red Hat Linux 7.1 includes over 30 million physical source lines of code (SLOC), compared to well over 17 million SLOC in version 6.2. Using the COCOMO cost model, this system is estimated to have required about 8,000 person-years of development time (as compared to 4,500 person-years to develop version 6.2). Thus, Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2. This is due to an increased number of mature and maturing open source / free software programs available worldwide.
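
As a rough illustration of how a COCOMO-style figure comes out of a line count, here is a minimal Python sketch of the basic COCOMO "organic" effort equation (effort in person-months = 2.4 * KSLOC^1.05). The abstract only says that COCOMO was used, so everything else here is an assumption for illustration: the salary and overhead constants are placeholder values, and the function names are made up.

```python
from typing import Mapping

# Basic COCOMO, organic mode: standard textbook coefficients.
ORGANIC_A = 2.4
ORGANIC_B = 1.05

# Placeholder values, assumed for illustration only.
ASSUMED_SALARY_PER_YEAR = 56286.0   # average programmer salary in USD
ASSUMED_OVERHEAD = 2.4              # multiplier for non-salary costs


def effort_person_months(sloc: int) -> float:
    """Estimated effort in person-months for `sloc` physical source lines."""
    return ORGANIC_A * (sloc / 1000.0) ** ORGANIC_B


def estimated_cost(sloc: int,
                   salary: float = ASSUMED_SALARY_PER_YEAR,
                   overhead: float = ASSUMED_OVERHEAD) -> float:
    """Estimated development cost in dollars."""
    person_years = effort_person_months(sloc) / 12.0
    return person_years * salary * overhead


def distribution_cost(package_sloc: Mapping[str, int]) -> float:
    """Sum of per-package estimates for a whole distribution."""
    return sum(estimated_cost(n) for n in package_sloc.values())


if __name__ == "__main__":
    # Hypothetical package sizes, not figures from the paper.
    packages = {"kernel": 2_000_000, "browser": 1_500_000, "editor": 600_000}
    print(round(distribution_cost(packages)))
```

Because the exponent is greater than 1, estimating each package separately and summing gives a smaller total than feeding the combined line count into the equation once, so the choice of granularity matters.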

Many other interesting statistics emerge. The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X Window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system). The languages used, sorted by the most lines of code, were C (71% - was 81%), C++ (15% - was 8%), shell (including ksh), Lisp, assembly, Perl, Fortran, Python, tcl, Java, yacc/bison, expect, lex/flex, awk, Objective-C, Ada, C shell, Pascal, and sed.

The predominant software license is the GNU GPL. Slightly over half of the software is simply licensed using the GPL, and the software packages using the copylefting licenses (the GPL and LGPL), at least in part or as an alternative, accounted for 63% of the code. In all ways, the copylefting licenses (GPL and LGPL) are the dominant licenses in this GNU/Linux distribution. In contrast, only 0.2% of the software is public domain.

This paper is an update of my previous paper on estimating GNU/Linux's size, which measured Red Hat Linux 6.2 [Wheeler 2001]. Since Red Hat Linux 6.2 was released in March 2000, and Red Hat Linux 7.1 was released in April 2001, this paper shows what's changed over approximately one year. More information is available at http://www.dwheeler.com/sloc.

1. Introduction

The GNU/Linux operating system has gone from an unknown to a powerful market force. Netcraft found that, of the systems running web servers in June 2001, GNU/Linux was now the second most popular operating system (with 29.6%, versus Windows' 49.6%) [Netcraft 2001]. Another survey, of primarily European and educational sites, found that GNU/Linux was used more than any other operating system (of the sites it surveyed) [Zoebelein 1999]. IDC found that 25% of all server operating systems purchased in 1999 were GNU/Linux, making it second only to Windows NT's 38% [Shankland 2000a].

There appear to be many reasons for this, and not simply because GNU/Linux can be obtained at no or low cost. For example, experiments suggest that GNU/Linux is highly reliable. A 1995 study of a set of individual components found that the GNU and GNU/Linux components had a significantly higher reliability than their proprietary Unix competitors (6% to 9% failure rate with GNU and Linux, versus an average 23% failure rate with the proprietary software using their measurement technique) [Miller 1995]. A ten-month experiment in 1999 by ZDnet found that, while Microsoft's Windows NT crashed every six weeks under a ``typical'' intranet load, using the same load and request set the GNU/Linux systems (from two different distributors) never crashed [Vaughan-Nichols 1999].

However, possibly the most important reason for GNU/Linux's popularity among many developers and users is that its source code is generally ``open source software'' and/or ``free software''. A program that is ``open source software'' or ``free software'' is essentially a program whose source code can be obtained, viewed, changed, and redistributed without royalties or other limitations on these actions. A more formal definition of ``open source software'' is available from the Open Source Initiative [OSI 1999], a more formal definition of ``free software'' (as the term is used in this paper) is available from the Free Software Foundation [FSF 2000], and other general information about these topics is available at Wheeler [2000a]. Quantitative rationales for using open source / free software are given in Wheeler [2000b]. The GNU/Linux operating system is actually a suite of components, including the Linux kernel on which it is based, and it is packaged, sold, and supported by a variety of distributors. The Linux kernel is ``open source software''/``free software'', and this is also true for all (or nearly all) other components of a typical GNU/Linux distribution. Open source software/free software frees users from being captives of a particular vendor, since it permits users to fix any problems immediately, tailor their system, and analyze their software in arbitrary ways.

Surprisingly, although anyone can analyze GNU/Linux for arbitrary properties, I have found little published analysis of the amount of source lines of code (SLOC) contained in a GNU/Linux distribution. Microsoft unintentionally published some analysis data in the documents usually called ``Halloween I'' and ``Halloween II'' [Halloween I] [Halloween II]. Another study focused on the Linux kernel and its growth over time is by Godfrey [2000]; this is an interesting study but it focuses solely on the Linux kernel (not the entire operating system). Paul G. Allen posted some results from running Scientific Toolworks, Inc.'s tools on the Linux kernel, but this analysis only considered C code (including headers) - ignoring the many other languages used in constructing the Linux kernel (e.g., assembly language), and only concentrating on the kernel. The Free Code Graphing Project at http://fcgp.sourceforge.net generates a graphical representation of a program (currently, the Linux kernel), but only of the C code. In a previous paper, I examined Red Hat Linux 6.2 and the numbers from the Halloween papers [Wheeler 2001].

This paper updates my previous paper, showing estimates of the size of one of today's GNU/Linux distributions, and it estimates how much it would cost to rebuild this typical GNU/Linux distribution using traditional software development techniques. Various definitions and assumptions are included, so that others can understand exactly what these numbers mean. I have intentionally written this paper so that you do not need to read the previous version of this paper first.

For my purposes, I have selected as my ``representative'' GNU/Linux distribution Red Hat Linux version 7.1. I believe this distribution is reasonably representative for several reasons:

Red Hat Linux is the most popular Linux distribution sold in 1999 according to IDC [Shankland 2000b]. Red Hat sold 48% of all copies in 1999; the next largest distribution in market share sales was SuSE (a German distributor) at 15%. Not all GNU/Linux copies are ``sold'' in a way that this study would count, but the study at least shows that Red Hat's distribution is a popular one.
Many distributions (such as Mandrake) are based on, or were originally developed from, a version of Red Hat Linux. This doesn't mean the other distributions are less capable, but it suggests that these other distributions are likely to have a similar set of components.
All major general-purpose distributions support (at least) the kind of functionality supported by Red Hat Linux, if for no other reason than to compete with Red Hat.
All distributors start with the same set of open source software projects from which to choose components to integrate. Therefore, other distributions are likely to choose the same components or similar kinds of components with often similar size for the same kind of functionality.

Different distributions and versions would produce different size figures, but I hope that this paper will be enlightening even though it doesn't try to evaluate ``all'' distributions. Note that some distributions (such as SuSE) may decide to add many more applications, but also note this would only create larger (not smaller) sizes and estimated levels of effort. At the time that I began this project, version 7.1 was the latest version of Red Hat Linux available, so I selected that version for analysis.

Note that Red Hat Linux 6.2 was released in March 2000, Red Hat Linux 7 was released in September 2000 (I have not counted its code), and Red Hat Linux 7.1 was released in April 2001. Thus, the differences between Red Hat Linux 7.1 and 6.2 show differences accrued over 13 months (approximately one year).

Clearly there is far more open source / free software available worldwide than is counted in this paper. However, the job of a distributor is to examine these various options and select software that they believe is both sufficiently mature and useful to their target market. Thus, examining a particular distribution results in a selective analysis of such software.

Section 2 briefly describes the approach used to estimate the ``size'' of this distribution (more details are in Appendix A). Section 3 discusses some of the results. Section 4 presents conclusions, followed by an appendix. GNU/Linux is often called simply ``Linux'', but technically Linux is only the name of the operating system kernel; to eliminate ambiguity this paper uses the term ``GNU/Linux'' as the general name for the whole system and ``Linux kernel'' for just this inner kernel.

2. Approach
My basic approach was to:

install the source code files in uncompressed format; this requires carefully selecting the source code to be analyzed.
count the number of source lines of code (SLOC); this requires a careful definition of SLOC.
use an estimation model to estimate the effort and cost of developing the same system in a proprietary manner; this requires an estimation model.
determine the software licenses of each component and develop statistics based on these categories.

More detail on this approach is described in Appendix A. A few summary points are worth mentioning here, however.

2.1 Selecting Source Code

I included all software provided in the Red Hat distribution, but note that Red Hat no longer includes software packages that only apply to other CPU architectures (and thus packages not applying to the x86 family were excluded). I did not include ``old'' versions of software, or ``beta'' software where non-beta was available. I did include ``beta'' software where there was no alternative, because some developers don't remove the ``beta'' label even when it's widely used and perceived to be reliable.

I used md5 checksums to identify and ignore duplicate files, so if the same file contents appeared in more than one file, they were only counted once (as a tie-breaker, each such file is assigned to the first build package it applies to, in alphabetic order).
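The de-duplication step can be sketched roughly as follows. This is only an illustration of the idea, not the code sloccount actually runs; the function names and the in-memory `packages` structure are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def assign_unique_files(packages: dict) -> dict:
    """Map each build package to the files it 'owns' after de-duplication.

    `packages` maps a package name to {file path: file contents}.
    Contents already seen in an alphabetically earlier package are
    skipped, so each distinct file content is counted only once.
    """
    seen = set()
    owned = defaultdict(list)
    for pkg in sorted(packages):              # alphabetic order is the tie-breaker
        for path in sorted(packages[pkg]):
            digest = md5_of(packages[pkg][path])
            if digest in seen:
                continue                      # duplicate contents: ignore
            seen.add(digest)
            owned[pkg].append(path)
    return dict(owned)


# Hypothetical example: the same file shipped in two packages.
example = {
    "zlib": {"zlib/crc32.c": b"int crc;\n"},
    "apache": {"apache/crc32.c": b"int crc;\n", "apache/main.c": b"int main;\n"},
}
result = assign_unique_files(example)
```

Here the shared file is credited to "apache" because that name sorts first, and "zlib" ends up owning nothing.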

The code in makefiles and Red Hat Package Manager (RPM) specifications was not included. Various heuristics were used to detect automatically generated code, and any such code was also excluded from the count. A number of other heuristics were used to determine if a file was a source program file, and if so, what its language was.

Since different languages have different syntaxes, I could only measure the SLOC for the languages that my tool (sloccount) could detect and handle. The languages sloccount could detect and handle are Ada, Assembly, awk, Bourne shell and variants, C, C++, C shell, Expect, Fortran, Java, lex/flex, LISP/Scheme, Makefile, Objective-C, Pascal, Perl, Python, sed, SQL, TCL, and Yacc/bison. Other languages are not counted; these include XUL (used in Mozilla), Javascript (also in Mozilla), PHP, and Objective Caml (an OO dialect of ML). Also code embedded in data is not counted (e.g., code embedded in HTML files). Some systems use their own built-in languages; in general code in these languages is not counted.
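A toy version of the "what language is this file" heuristic might look like the sketch below. The extension table and the shebang fallback are assumptions made up for illustration; the paper only says that heuristics were used, not which ones, and a real tool needs many more rules (for example, `.h` files are not always C).

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical lookup tables covering a few of the languages listed above.
EXTENSION_TO_LANGUAGE = {
    ".c": "C", ".h": "C",
    ".cc": "C++", ".cpp": "C++",
    ".py": "Python", ".pl": "Perl", ".sh": "Bourne shell",
    ".java": "Java", ".tcl": "TCL", ".y": "Yacc/bison", ".l": "lex/flex",
}
INTERPRETER_TO_LANGUAGE = {
    "sh": "Bourne shell", "bash": "Bourne shell",
    "perl": "Perl", "python": "Python",
}


def guess_language(path: str, first_line: str = "") -> Optional[str]:
    """Guess a file's language from its extension, then from a #! line."""
    ext = PurePath(path).suffix.lower()
    if ext in EXTENSION_TO_LANGUAGE:
        return EXTENSION_TO_LANGUAGE[ext]
    if first_line.startswith("#!"):
        parts = first_line[2:].split()
        if parts:
            interpreter = parts[0].rsplit("/", 1)[-1]
            if interpreter == "env" and len(parts) > 1:
                interpreter = parts[1]        # handle "#!/usr/bin/env python"
            return INTERPRETER_TO_LANGUAGE.get(interpreter)
    return None                               # unknown: the file is not counted
```

A file that matches neither rule (PHP, in this table) returns `None`, which corresponds to the "other languages are not counted" case.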

Re:slashdotted! by Anonymous Coward · 2002-07-05 02:32 · Score: 0

2.2 Defining SLOC
The ``physical source lines of code'' (physical SLOC) measure was used as the primary measure of SLOC in this paper. Less formally, a physical SLOC in this paper is a line with something other than comments and whitespace (tabs and spaces). More specifically, physical SLOC is defined as follows: ``a physical source line of code is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.'' Comment delimiters (characters other than newlines starting and ending a comment) were considered comment characters. Data lines only including whitespace (e.g., lines with only tabs and spaces in multiline strings) were not included.
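For a language whose only comment form runs from a marker to the end of the line, the definition above reduces to a few lines of code. This is a simplified sketch, not sloccount itself: it assumes `#`-style comments and ignores the case of a `#` inside a string literal, which a real counter has to handle per language.

```python
def physical_sloc(text: str, comment_prefix: str = "#") -> int:
    """Count lines holding at least one non-whitespace, non-comment character."""
    count = 0
    for line in text.split("\n"):
        code = line.split(comment_prefix, 1)[0]   # drop any trailing comment
        if code.strip():                          # anything besides whitespace left?
            count += 1
    return count


sample = "#!/bin/sh\n# say hello\n\necho hello   # greet\n   \nexit 0"
sloc = physical_sloc(sample)   # only the "echo" and "exit" lines count
```

Reformatting the same program onto more or fewer lines changes the result, which is the sensitivity to formatting that physical SLOC is known for.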
Note that the ``logical'' SLOC is not the primary measure used here; one example of a logical SLOC measure would be the ``count of all terminating semicolons in a C file.'' The ``physical'' SLOC was chosen instead of the ``logical'' SLOC because there were so many different languages that needed to be measured. I had trouble getting freely-available tools to work on this scale, and the non-free tools were too expensive for my budget (nor is it certain that they would have fared any better). Since I had to develop my own tools, I chose a measure that is much easier to implement. Park [1992] actually recommends the use of the physical SLOC measure (as a minimum), for this and other reasons. There are disadvantages to the ``physical'' SLOC measure. In particular, physical SLOC measures are sensitive to how the code is formatted. However, logical SLOC measures have problems too. First, as noted, implementing tools to measure logical SLOC is more difficult, requiring more sophisticated analysis of the code. Also, there are many different possible logical SLOC measures, requiring even more careful definition. Finally, a logical SLOC measure must be redefined for every language being measured, making inter-language comparisons more difficult. For more information on measuring software size, including the issues and decisions that must be made, see Kalb [1990], Kalb [1996], and Park [1992].
Note that this required that every file be categorized by language type (so that the correct syntax for comments, strings, and so on could be applied). Also, automatically generated files had to be detected and ignored. Thankfully, my tool ``sloccount'' does this automatically.

2.3 Estimation Models
This decision to use physical SLOC also implied that for an effort estimator I needed to use the original COCOMO cost and effort estimation model (see Boehm [1981]), rather than the newer ``COCOMO II'' model. This is simply because COCOMO II requires logical SLOC as an input instead of physical SLOC.
Basic COCOMO is designed to estimate the time from product design (after plans and requirements have been developed) through detailed design, code, unit test, and integration testing. Note that plans and requirement development are not included. COCOMO is designed to include management overhead and the creation of documentation (e.g., user manuals) as well as the code itself. Again, see Boehm [1981] for a more detailed description of the model's assumptions. Of particular note, basic COCOMO does not include the time to develop translations to other human languages (of documentation, data, and program messages) nor fonts.
There is reason to believe that these models, while imperfect, are still valid for estimating effort in open source / free software projects. Although many open source programs don't need management of human resources, they still require technical management, infrastructure maintenance, and so on. Design documentation is captured less formally in open source projects, but it's often captured by necessity because open source projects tend to have many developers separated geographically. Clearly, the systems must still be programmed. Testing is still done, although as with many of today's proprietary programs, a good deal of testing is done through alpha and beta releases. In addition, quality is enhanced in many open source projects through peer review of submitted code. The estimates may be lower than the actual values because they don't include estimates of human language translations and fonts.
Each software source code package, once uncompressed, produced zero or more ``build directories'' of source code. Some packages do not actually contain source code (e.g., they only contain configuration information), and some packages are collections of multiple separate pieces (each in different build directories), but in most cases each package uncompresses into a single build directory containing the source code for that package. Each build directory had its effort estimation computed separately; the efforts of each were then totalled. This approach assumes that each build directory was developed essentially separately from the others, which in nearly all cases is quite accurate. This approach slightly underestimates the actual effort in the rare cases where the development of the code in separate build directories is actually highly interrelated; this effect is not expected to invalidate the overall results.
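The per-directory estimate can be sketched as below. Two things here are assumptions rather than statements from this section: the coefficients are the Basic COCOMO organic-mode values from Boehm [1981] (2.4 and 1.05), and the build directory names and sizes are invented. The salary and wrap rate are the figures given in the paragraph that follows.

```python
# Basic COCOMO effort: person-months = A * KSLOC ** B.
# Organic-mode coefficients are assumed; this section does not state the mode.
EFFORT_A, EFFORT_B = 2.4, 1.05
SALARY = 56_286        # US$ per year, from the ComputerWorld survey cited below
WRAP_RATE = 2.4        # overhead multiplier, discussed below


def effort_person_months(sloc: int) -> float:
    """Estimated effort for one build directory of `sloc` physical SLOC."""
    return EFFORT_A * (sloc / 1000) ** EFFORT_B


def total_effort(build_dirs: dict) -> float:
    """Estimate each build directory separately, then total the efforts."""
    return sum(effort_person_months(sloc) for sloc in build_dirs.values())


def cost_dollars(person_months: float) -> float:
    """Convert effort to cost: person-years times salary times overhead."""
    return person_months / 12 * SALARY * WRAP_RATE


dirs = {"pkg-a": 10_000, "pkg-b": 40_000}          # hypothetical sizes
separate = total_effort(dirs)                      # the approach described above
combined = effort_person_months(sum(dirs.values()))
```

Because the exponent is greater than 1, `separate` comes out smaller than `combined`, which is why treating interrelated directories as independent underestimates effort.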
For programmer salary averages, I used a salary survey from the September 4, 2000 issue of ComputerWorld; their survey claimed that this annual programmer salary averaged $56,286 in the United States. I was unable to find a publicly-backed average value for overhead, also called the ``wrap rate.'' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate. Some Defense Systems Management College (DSMC) training material gives examples of 2.3 (125.95%+100%) not including general and administrative (G&A) overhead, and 2.81 when including G&A (125% engineering overhead, plus 25% on top of that amount for G&A) [DSMC]. This at least suggests that 2.4 is a plausible estimate. Clearly, these values vary widely by company and region; the information provided in this paper is enough to use different numbers if desired. These are the same values as used in my last report.

2.4 Determining Software Licenses
A software license determines how that software can be used and reused, and open source software licensing has been a subject of great debate. The Software Release Practice HOWTO [Raymond 2001] discusses briefly why license choices are so important to open source / free software projects:
The license you choose defines the social contract you wish to set up among your co-developers and users ...
Who counts as an author can be very complicated, especially for software that has been worked on by many hands. This is why licenses are important. By setting out the terms under which material can be used, they grant rights to the users that protect them from arbitrary actions by the copyright holders.
In proprietary software, the license terms are designed to protect the copyright. They're a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.
In open-source software, the situation is usually the exact opposite; the copyright exists to protect the license. The only rights the copyright holder always keeps are to enforce the license. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant -- but the license terms are very important.

Well-known open source licenses include the GNU General Public License (GPL), the GNU Library/Lesser General Public License (LGPL), the MIT (X) license, the BSD license, and the Artistic license. The GPL and LGPL are termed ``copylefting'' licenses, that is, the license is designed to prevent the code from becoming proprietary. See Perens [1999] for more information comparing these licenses. Obvious questions include ``what license(s) are developers choosing when they release their software'' and ``how much code has been released under the various licenses?''
An approximation of the amount of software using various licenses can be found for this particular distribution. Red Hat Linux uses the Red Hat Package Manager (RPM), and RPM supports capturing license data for each package (these are the ``Copyright'' and ``License'' fields in the specification file). I used this information to determine how much code was covered by each license. Since this field is simply a string of text, there were some variances in the data that I had to clean up, for example, some entries said ``GNU'' while most said ``GPL''. In some cases Red Hat did not include licensing information with a package. In that case, I wrote a program to attempt to determine the license by looking for certain conventional filenames and contents.
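The clean-up and tally described above could look something like this sketch. Only the ``GNU'' to ``GPL'' mapping comes from the text; the rest of the alias table, the function names, and the sample data are hypothetical.

```python
from collections import Counter

# Map variant spellings of a license field to one canonical label.
# Only "gnu" -> "GPL" is taken from the paper; the rest is illustrative.
ALIASES = {"gnu": "GPL", "gpl": "GPL", "lgpl": "LGPL", "bsd": "BSD", "mit": "MIT"}


def normalize_license(raw: str) -> str:
    """Return a canonical license label for a free-text RPM license field."""
    cleaned = raw.strip()
    if not cleaned:
        return "Unknown"                      # no license recorded for the package
    return ALIASES.get(cleaned.lower(), cleaned)


def sloc_by_license(packages) -> dict:
    """Total SLOC per license from (license field, SLOC) pairs."""
    totals = Counter()
    for license_field, sloc in packages:
        totals[normalize_license(license_field)] += sloc
    return dict(totals)


# Hypothetical package data.
sample = [("GPL", 1000), ("GNU", 500), (" gpl ", 200), ("distributable", 300), ("", 50)]
totals = sloc_by_license(sample)
```

Labels outside the table, such as ``distributable'', pass through unchanged, so the vague categories remain visible in the totals rather than being guessed at.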
This is an imperfect approach. Some packages contain different pieces of code with different licenses applying to different pieces. Some packages are ``dual licensed'', that is, they are released under more than one license. Sometimes these other licenses are noted, while at other times they aren't. There are actually two BSD licenses (the ``old'' and ``new'' licenses), but the specification files don't distinguish between them. Also, if the license wasn't one of a small set of common licenses, Red Hat tended to assign nondescriptive phrases such as ``distributable''. My automated techniques were limited too; in particular, while some licenses (e.g., the GPL and LGPL) are easy to recognize automatically, BSD-like and MIT-like licenses vary the license text and so are more difficult to recognize automatically (and some changes to the license would render them non-open source, non-free software). Thus, when Red Hat did not identify a package's license, a program dual licensed under both the BSD and GPL license might only be labelled as having the GPL using these techniques. Nevertheless, this approach is sufficient to give some insight into the amount of software using various licenses. Future research could examine each license in turn and categorize them; such research might require several lawyers to determine when two licenses in certain circumstances are ``equal.''
One program worth mentioning in this context is Python, which has had several different licenses. Version 1.6 and later (through 2.1) had more complex licenses that the Free Software Foundation (FSF) believes were incompatible with the GPL. Recently this was resolved by another change to the Python license to make Python fully compatible with the GPL. Red Hat Linux 7.1 includes an older version of Python (1.5.2), presumably because of these licensing issues. It can't be because Red Hat is unaware of later versions of Python; Red Hat uses Python in its installation program (which it developed and maintains). Hopefully, the recent resolution of license incompatibilities with the GPL license will enable Red Hat to include the latest versions of Python in the future. In any case, there are several different Python-specific licenses, all of which can legitimately be called the ``Python'' license. Red Hat has labelled Python itself as having a ``Distributable'' license, and package Distutils-1.0.1 is labelled with the ``Python'' license; these labels are kept in this paper.

No more functions for me... by evilviper · 2002-07-05 02:31 · Score: 3, Funny

I'll never use macros, functions, classes, or the stl again!

"Look, I wrote a program which does the exact same thing as another program, but mine is worth much, much more!"

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Re:No more functions for me... by rjw57 · 2002-07-05 03:22 · Score: 3, Insightful

That's precisely the point. Not using STL or standard functions increases the time taken to code, the amount of programming required and decreases the maintainability of the code -- in short your code would _cost_ _more_ to develop if you were a company paying for it.

cost != value in general

--
Rich
Re:No more functions for me... by Tony-A · 2002-07-05 07:56 · Score: 2

Good one!
I'll never use macros, functions, classes, or the stl again!
"Look, I wrote a program which does the exact same thing as another program, but mine is worth much, much more!"

Costs much, much more. Almost certainly.
Worth much, much more. Maybe.

With the cheaper way, you are at the mercy of the subroutines (of whatever binding) that you are using. The price is some variant of DLL hell.
With the more expensive way, everything is or can be optimized for exactly what you are doing. You don't need to solve problems you don't have. The price is a vastly larger scope of responsibility.

Which is better depends of course on the context.
Good example of the difficulties of defining any rational metric on software.
Re:No more functions for me... by evilviper · 2002-07-05 13:02 · Score: 2

Not using STL or standard functions increases the time taken to code
If I was to essentially rewrite the STL myself, sure...

I was just implying that I would cut-and-paste every relevant piece of STL code into my program, rather than '#include'ing it.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Yeah, whatever... by st0rmshad0w · 2002-07-05 02:33 · Score: 1

Just try explaining it to your insurance company after your house gets robbed, or some idiot airport security inspector accidentally trashes your laptop.

Heck, given that theory, one fire should net me more than enough to retire on.

Re:Yeah, whatever... by Anonymous Coward · 2002-07-05 05:31 · Score: 0

Or explain it to customs when you attempt to take a copy to another country...

Slashdot costs industry $1billion/year by pubjames · 2002-07-05 02:34 · Score: 5, Interesting

I love these kind of stats.

Slashdot has, say, 100,000 US readers per day.

Each spends an hour reading slashdot when they should be working.

Let's say an average Slashdot reader is worth say, $40 an hour, and they read Slashdot on 300 days during the year.

That means Slashdot costs the USA $1,200,000,000 dollars a year! Crikey! Don't tell Bush!

Re:Slashdot costs industry $1billion/year by Anonymous Coward · 2002-07-05 02:57 · Score: 0

Problems with your analysis:

1) Less than 1% of slashbots have ever held a job.
2) The average slashbot is worthless as a human being.

Your final result is correct, however.

That means Slashdot costs the USA $1,200,000,000 dollars a year!

Yes. In welfare payments.
Re:Slashdot costs industry $1billion/year by Anonymous Coward · 2002-07-05 03:13 · Score: 0

If you're going to analyze things to death, let me give it a whirl. Based on your two points, and your objection to the parent post, the final result is NOT correct. In order for slashdot to cost the US 1.2B a year in welfare, that would have to mean that the slashbots don't have a job BECAUSE they read slashdot. But you seem to disagree with that contention.
Re:Slashdot costs industry $1billion/year by Garg · 2002-07-05 04:14 · Score: 1

You're assuming that if I'm not goofing off reading /., then I'm not goofing off some other way...

You underestimate me, sir.

Garg

--
Garg
Alumnus, Xavier's School for Gifted Youngsters
Re:Slashdot costs industry $1billion/year by Anonymous Coward · 2002-07-05 05:28 · Score: 0

I wouldn't take the collective mass of slashdotters for $40. In fact, I'd say that Slashdot is saving the US money by providing an environment where the geeks are less likely to breed.
Re:Slashdot costs industry $1billion/year by bogie · 2002-07-05 05:52 · Score: 1

While your post is still funny, I don't think the average Slashdot reader earns $83,200 a year. If they did we would have a hell of a lot more buying power and could change the landscape of the software industry overnight with the right coordination.

--
If you wanna get rich, you know that payback is a bitch
Re:Slashdot costs industry $1billion/year by Papineau · 2002-07-05 06:24 · Score: 2

He's not saying that you're paid $40/h. That's what a typical Slashdot reader costs to his employer (salary, rent for building, phone line, equipment, etc.). It's the amount of money the employer must pay in a year for employing an employee divided by the hours worked by the employee. It's probably used with regular time only, else it'll end up lower (more hours without the related rise in costs).
Re:Slashdot costs industry $1billion/year by jrexilius · 2002-07-05 06:30 · Score: 0

$85k/yr is avg salary for good unix engineer..
Re:Slashdot costs industry $1billion/year by xtremex · 2002-07-05 10:32 · Score: 1

Very true....and that number is double what an MCSE is worth :)

--
If you're not a Liberal in your 20's, then you have no heart.If you're still a Liberal in your 30's you have no brain.
Re:Slashdot costs industry $1billion/year by Anonymous Coward · 2002-07-06 02:58 · Score: 0

> Let's say an average Slashdot reader is worth say, $40 an hour, and they read Slashdot on 300 days during the year.
That means Slashdot costs the USA $1,200,000,000 dollars a year! Crikey! Don't tell Bush!

Considering the alternative is porn,
Just think of all the marriages that slashdot is saving!
Bush wouldn't mind that.

But.. by iONiUM · 2002-07-05 02:34 · Score: 1

A shorter program that did the same thing as a longer program, but was more efficient than a longer program might have taken much more time/effort to code.. I don't think it could possibly take this into consideration.
Personally, I'd feel bad if I wrote a program which was just a bunch of spaghetti.

Now we know why... by Navius+Eurisko · 2002-07-05 02:35 · Score: 2

Microsoft puts so much code bloat into their programs...

Re:Now we know why... by Anonymous Coward · 2002-07-05 02:52 · Score: 0

look at the article, Mozilla (M18) has more lines of code than the Linux kernel!!!
CAN SOMEONE SAY BLOAT?

Makes you wonder... by Anonymous Coward · 2002-07-05 02:37 · Score: 0

Part of that $1bil could have helped feed a programmer's family and gone toward making a more stable OS. And with all the layoffs in the industry, don't you just feel awful for patronizing such software?!

Yeah, right by af_robot · 2002-07-05 02:38 · Score: 1

So 10 billion lines of bad bloated code will be worth more than 10,000 lines of pure, clean and fast code?

Re:Yeah, right by Oculus+Habent · 2002-07-05 02:58 · Score: 2, Insightful

I think Microsoft has proved that true.

Bloated code may not be best, but it gets out the door faster.

Can you imagine what would happen if Microsoft cleaned the code to Windows XP? Imagine, they release a 40-MB service pack that trims the OS size down 300MB, decreases boot-time by 75%, improves program launch speed 300%, improves security, stability, and functionality; all while making the OS easier to upgrade, and implement.

Of course, when this release is finally out in 2057, it won't make much difference.

--
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
Re:Yeah, right by martyn+s · 2002-07-05 03:01 · Score: 1

See this post

The study talks about cost, not value.

That's 1 Billion (year 2000) US dollars by Anonymous Coward · 2002-07-05 02:42 · Score: 0

But how many rupees?
Just think how much they could have saved if they had outsourced it to an Indian contractor!

Re:That's 1 Billion (year 2000) US dollars by Anonymous Coward · 2002-07-05 07:38 · Score: 0

Just think how much they could have saved if they had outsourced it to an Indian contractor!
Yeah, but the man pages would read
Dear sir, to obtain a directory listing, we can type 'ls'. u can also use svral cmdline options, which are all diferent.
Much lookig forward to next topic. thank!!1!
Hmmmmm.

His Paper Is Bunk by dbretton · 2002-07-05 02:42 · Score: 5, Insightful

To put it mildly...

In his paper, he uses the basic COCOMO model for estimating the cost. This model, quite frankly, sucks. Boehm's book even states, more or less, that the COCOMO model is only accurate to a factor of 10.

Since I no longer have the Boehm book, this quote from a google-found web page will have to do. This is a quote of a quote from Boehm's book, Software Engineering Economics:

"Basic COCOMO is good for rough order of magnitude estimates of software costs, but its accuracy is necessarily limited because of its lack of factors to account for differences in hardware constraints, personnel quality and experience, use of modern tools and techniques, and other project attributes known to have a significant influence on costs."

Basically, this means that the estimate could be anywhere from $100M->10B in true cost.

At the very least, this kid should have stated which of the model variants he was using.

Better yet, he should have subdivided the source code into multiple categories: kernel+drivers, tools, productivity software, etc. etc., and then applied the various models to them.

Just my 2 bits.

BTW, here is the google-found page which has the quote I stole. Plus, it gives a nice, albeit brief, overview of COCOMO.

-d

Re:His Paper Is Bunk by sean23007 · 2002-07-05 02:55 · Score: 2

If it's off by a factor of 10, how could it range between 100M and 10B? Wouldn't that be 2 factors of 10? And that's a whole hell of a lot of linux!

--

Lack of eloquence does not denote lack of intelligence, though they often coincide.
Re:His Paper Is Bunk by sk8king · 2002-07-05 03:15 · Score: 1

factor of 10 difference from 1B [either greater or lesser]. The original poster is correct.
Re: His Paper Is Bunk by Black+Parrot · 2002-07-05 03:20 · Score: 1

> Basically, this means that the estimate could be anywhere from $100M->10B in true cost.

So if you're buying, argue for $100M, but if you're selling then politely suggest that $10B is more accurate.

--
Sheesh, evil *and* a jerk. -- Jade
Re:His Paper Is Bunk by gosand · 2002-07-05 03:25 · Score: 2

Another quote by Boehm, as quoted in Software Engineering A Practitioner's Approach, 3rd edition, by Roger S. Pressman:
Today, a software cost estimation model is doing well if it can estimate software development costs within 20% of actual costs, 70% of the time, and on its own turf (that is, within the class of projects to which it has been calibrated)...This is not as precise as we might like, but it is accurate enough to provide a good deal of help in software engineering economic analysis and decision making.
I type this in from the dusty book sitting on my desk, which was the textbook for my last CS class in college, back in '93. Software engineering. Most useful class I ever took in college.
This is hardly an endorsement of COCOMO. (COnstructive COst MOdel) Not to slam the author of the paper, it was an interesting idea. Just don't go around thinking that his findings are entirely accurate.

--

My beliefs do not require that you agree with them.
Re:His Paper Is Bunk by sean23007 · 2002-07-05 03:56 · Score: 2

Yes, thank you. Bear in mind that he did not say that the "estimate" was $1B, which was a key assumption that I did not make. And isn't it possible to tell whether you're off by a factor of 10 too high or too low? I mean, that's something a human should pick up on pretty easily, so one of the options could be dropped, in all likelihood.

--

Lack of eloquence does not denote lack of intelligence, though they often coincide.
Re:His Paper Is Bunk by Random+Walk · 2002-07-05 05:01 · Score: 1, Flamebait

Talking about accuracy: his program estimates 11.71 person-years to build one of the applications I have developed. Actually, I have been working on it for three years in my spare time ... maybe I have unknowingly figured out how to warp time ?
Re:His Paper Is Bunk by Anonymous Coward · 2002-07-05 05:20 · Score: 0

Christ, you're such an idiot. Just put down the mouse and back slowly away from the computer.
Re:His Paper Is Bunk by damiam · 2002-07-05 11:31 · Score: 1

A similar story: I'm working on a pet project that's currently at about 550 lines of C. I, a relatively incompetent programmer, have spent about ten hours on it. A good programmer could have written it in two hours or less. Yet, sloccount estimates the total cost as $14,835, and the total development time as 1.32 months.

--
It's hard to be religious when certain people are never incinerated by bolts of lightning.

Don't be confused by EdMcMan · 2002-07-05 02:44 · Score: 2, Interesting

Well, when I saw the tidbit on /., I thought, wow, a billion dollars worth of software in a Linux distro? That is not what this article says. It simply says that RedHat (would have) had to pay the developers a billion dollars to complete that much work. To find out how much it should probably cost, add some money for profit, and divide that by how many probable users there are. This would only make sense for Linux as a whole, and not just RedHat.

Re:Don't be confused by Anonymous Coward · 2002-07-05 06:05 · Score: 0

It also ignores code reuse, which is quite popular in open sores software (Most X11 Window Managers are modifications of twm or fvwm). Heck, look at cp, mv, and rm. They're 90% similar, but are being considered 3 separate and unique programs.

PWPBOT IS DEAD by Anonymous Coward · 2002-07-05 02:46 · Score: 0, Funny

I just heard some sad news on talk radio - troller/crapflooder pwpbot was found dead in its basement this morning. There weren't any more details. I'm sure everyone in the Slashdot community will miss it - even if you didn't enjoy his work, there's no denying its contributions to popular culture. Truly an Slashdot icon.

isn't SLOC junk? by *weasel · 2002-07-05 02:47 · Score: 3, Interesting

if analyzing SLOC says nothing about developer contributions, efficiency, or effectiveness - then isn't estimating value based off SLOC fundamentally flawed?

i mean, you can't have it both ways. Either SLOC shows how productive programmers are, or it doesn't.

if it does - then get over the SLOC analysis in your job reviews.
if it doesn't - then you cannot even remotely accurately gauge monetary worth through SLOC.

good luck to the people trying to estimate worth of OSS. good luck to the people trying to estimate the worth of programmers.

i just don't know why people don't count 'Customer Problems Solved Over Time' as the end-all, be-all.

(and time and energy fixing software bugs doesn't count. that's not the customers problem. it's the developers)

who cares how many SLOC are in a product. how many needs of the end user does it fulfill, and how long did it take to get done from the word 'go'?

yeah, you'd need to define customer needs much more carefully than most shops do... but isn't that part of the eXtreme Programming retinue /. loves to trumpet?

--
// "Can't clowns and pirates just -try- to get along?"

Re:isn't SLOC junk? by p3d0 · 2002-07-05 04:38 · Score: 2

SLOC is not a good measure of how "good" software is; merely of how complex it is, and how long it takes to develop. Studies have shown that SLOC is better at this than most other metrics:
...lines of code has commonly been found to outperform many of the more complex composite measures of software development.

- Powell, 1998

(Citeseer says it was published in 1996, but it's actually from 1998.)

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

value / payback Linux-centric? by fw3 · 2002-07-05 02:52 · Score: 2

the fact that most people don't use Linux means that the value of using Linux is less than the cost of using linux.

The cost analysis was done based on linux, however most of the code analysed in fact is for things that run on other platforms, and much of which was in development for years before linux 0.9 hit the 'Net.

So the measure of value based on who uses Linux includes everyone who uses linux-hosted apache servers. The more general case includes everyone who accesses servers that depend on (Perl, BIND, sendmail, mysql .... etc) or were/are developed using (X11, CVS, bitkeeper, emacs, gcc .... etc)

The economic value isn't small. That much I'm pretty certain of, just how big, well it works for me, I'll leave the analysis to the economists.

--
Linux is Linux, if One need clarify their dist: <Dist>/GNU Linux
bsds are of course just BSD

Inflated prices? by sean23007 · 2002-07-05 02:53 · Score: 2

I kind of hope that nobody uses this to price software that they're selling to a company, lest they lose their credibility. There is no assurance that this guy did not lean toward making this software seem more valuable than it really is, thus making open source software more attractive (because you're getting something for nothing). I'd be careful using this in any other capacity than your home computer for the purpose of having fun.

On a similar note, do the prices seem accurate, for those of you who have used this thing?

--

Lack of eloquence does not denote lack of intelligence, though they often coincide.

Moderation on CRACK ! by Anonymous Coward · 2002-07-05 02:54 · Score: 0

This guy is correct.
The story already appeared on /. a year ago.
If you use moderation to abreact your sexual impotence, then get viagra or stop moderating.

Good Lord by Numeros · 2002-07-05 02:56 · Score: 0

Good lord, taco, you should have known that ~7000 stories ago somebody posted this already!!!
</sarcasm>

Come on people, cut the guy some slack. I am sure you can't remember every story posted!!

Re:Good Lord by EastCoastSurfer · 2002-07-05 03:40 · Score: 3, Insightful

I would agree, but even the crappy slashdot search came up with the old story post while searching for SLOC. It only came back with 3 stories including this current one. The best part is that Taco also posted the original.
Re:Good Lord by Zenithal · 2002-07-05 05:25 · Score: 0, Offtopic

> but even the crappy slashdot search

Then why did you use it?

> The best part is that Taco also posted the
> original.

Why are you reading this site?

I'm sure this'll cost me, but I don't care. I'm fucking sick of whining, bitching and moaning from people about how bad slashdot sucks ON SLASHDOT. For the love of God. What the hell is wrong with you people?

You do realize that you're the absolute prototypical hypocrite. Right?
Don't get me wrong, there's plenty of stuff about Slashdot, just like everything else, that could use improvement. But directionless, solutionless criticism solves absolutely nothing.

If you have a suggestion, make it. If you have a baseless complaint or a mindless comment about how bad the editors are, or how inappropriate a story is, wtf are you doing on the site to begin with, and certainly wtf are you doing posting in the story you think is inappropriate?!

I'm sick to death of impotent elitists dictating what a good story is and isn't... and then many of the same cloned complaints getting modded up. That's insane. It has nothing to do with the topic, it has no contributory value. It's meaningless diatribe.

What makes me even MORE annoyed is that I've just wasted the last 10 minutes of my life, adding to the offtopic, worthless discussion because it just pisses me off so f**king much that people get a wicked, free service like slashdot, and then use it to complain about getting it.

*deep breath*
Ok. It's ok. 10 9 8 7...

--

Aaron
AaronCameron.net
Re:Good Lord by EastCoastSurfer · 2002-07-05 06:16 · Score: 1

What makes me even MORE annoyed is that I've just wasted the last 10 minutes of my life, adding to the offtopic, worthless discussion because it just pisses me off so f**king much that people get a wicked, free service like slashdot, and then use it to complain about getting it.

LOL, take that breath :) I understand your rant, but don't think that I haven't tried to improve it. I also never said that slashdot sucks, they just seem not to be trying very much lately.

For a while I was trying to look for and send timely news articles in hopes of getting recent news up on slashdot. Out of the 3 I sent, all 3 were rejected. OK, that's fine, I thought, it must not be a good slashdot story. Then all 3 stories got accepted from someone else, albeit days later.

> but even the crappy slashdot search

Then why did you use it?

The /. search is pretty crappy. I was going to use it for Taco's defense, but it even found the duplicate article.

The reason I complained here about slashdot was that I don't want to continue to see it degrade. I have been in the process of persuading friends into the whole linux/OS thing and, with slashdot (for better or worse) being the "news" site for OS software, had them reading it daily. The last time I asked one particular friend if he had seen such and such story, he said that he had quit reading /. His reasons were that most of the news was old stuff that he had already seen and that it had become just a zealotry rant site.

*sigh* I guess by your logic I should just give up and quit coming here. Should I quit voting too?

handle by stud9920 · 2002-07-05 02:59 · Score: 1

"2bits writes"

With such a handle, how many pennies is his opinion worth?

An interesting thumbsuck by Twylite · 2002-07-05 03:04 · Score: 5, Interesting

Run the same SLOC figures against the statistics from the Function Points methodology and you get a different picture. You are looking at 2500 person years of effort, with a cost optimum development time of 6.5 years. However, to deal with the complexity involved you will need approximately 3000 average and 1500 above average developers (at average development rate you could expect a 13 year delivery!). Total price tag: around $2 billion (that's 2e9, in case your definition of billion is different).

Of course, this is still a very skewed figure. There is no accounting for the quality of code (at the end of such a complex development cycle, you could expect as many as 7 million defects!), and both FP and COCOMO estimate development effort inclusive of design work and documentation, which in OpenSource typically don't match those in mature commercial development environments (from which the FP and COCOMO statistics are derived).

There is also a huge, and invalid, assumption made by the author, regarding the application of COCOMO (and my FP calculations suffer the same problem). The complexity of a system is MORE than the sum of its parts. This is because developer productivity declines as system complexity increases.

At 10,000 FP, a developer is often only 60% as productive as at 1,000 FP. The situation is obviously far worse at 300,000 FP (the entire distribution), yet the kernel itself only weighs in at around 20,000 FP. And even then, clear modularisation reduces complexity for individual developers. So it is grossly unfair to base calculations on the system as a whole.

The kernel (around 2.5 MLOC) as a single system would be a task for 300 skilled developers over around 3 years, while the Gimp (around 500 KLOC, still near the top of the list in size) would be looking at 50 developers over 18 months. More complex projects need relatively more time and more developers. Doing all these projects in parallel (assuming it were possible - which it isn't because of dependencies, and that's another factor) would take less than the most complex task (kernel = 3 years) and relatively fewer developers than estimated based on the complexity of that task (30 MLOC / 2.5 MLOC * 300 developers = max 3600 for entire distribution). Max cost: 3600 * 3 * $55k = $594 million.
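
The upper bound worked out above can be checked in a few lines of Python. This is only a sketch of the arithmetic: the figures (30 MLOC, 2.5 MLOC, 300 developers, 3 years, $55k per developer-year) are the ones quoted in this comment, and the variable names are made up for illustration.

```python
# Scale the kernel staffing estimate up to the whole distribution (figures from the comment).
distribution_mloc = 30.0     # size of the entire distribution, in millions of lines
kernel_mloc = 2.5            # size of the kernel, in millions of lines
kernel_developers = 300      # developers estimated for the kernel alone
kernel_years = 3             # duration of the most complex task
salary_per_year = 55_000     # salary only, no employment overhead

# Upper bound: staff every 2.5 MLOC slice as heavily as the kernel.
max_developers = distribution_mloc / kernel_mloc * kernel_developers
max_cost = max_developers * kernel_years * salary_per_year

print(f"{max_developers:.0f} developers, ${max_cost / 1e6:.0f} million")
```

Note that the bound uses salary alone, so adding employment overhead would raise it.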

And you're STILL not accounting for the fact that employing someone costs a lot more than just paying a salary. Which puts all estimates (mine and the author's) up.

--
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net

Re:An interesting thumbsuck by foobar104 · 2002-07-05 04:19 · Score: 2

Total price tag: around $2 billion (that's 2e9, in case your definition of billion is different).

Man, what a bargain! Over two thousand man years of effort for only $1,024!

Of course, the poster meant 10e9, not 2e9. Or 2e30, I guess, but I'm assuming 10e9.
Re:An interesting thumbsuck by Hydrogenoid · 2002-07-05 04:43 · Score: 1

Err...
2e9 is a short form of 2 * 10^9, ya know...
So, yes, he was right...
Re:An interesting thumbsuck by Chandon+Seldon · 2002-07-05 04:44 · Score: 1

He meant 2e9 which expands as 2 * (10 ^ 9)

--
-- The act of censorship is always worse than whatever is being censored. Always.
Re:An interesting thumbsuck by foobar104 · 2002-07-05 04:51 · Score: 2

2e9 is a short form of 2 * 10^9, ya know...

Really? No, I didn't know. Ooooops....
Re:An interesting thumbsuck by Anonymous Coward · 2002-07-05 07:17 · Score: 0

Besides, 2^9 is 512.

as everybody knows : by stud9920 · 2002-07-05 03:06 · Score: 2, Insightful

Linux is free (as in beer) if your time is worthless.
</flamebait>

No, he's right by vrt3 · 2002-07-05 03:07 · Score: 2

Because we don't know if it's off to the low or to the high. If his estimate was 10 times too low, it was really 10B; if it was 10 times too high, it was really 100M.

--
This sig under construction. Please check back later.

Re:No, he's right by sean23007 · 2002-07-05 03:50 · Score: 2

Oh, okay. Makes sense. I didn't see it said that the "estimate" in question was $1B. Had that been there it would have been much simpler.

--

Lack of eloquence does not denote lack of intelligence, though they often coincide.
Re:No, he's right by p3d0 · 2002-07-05 04:23 · Score: 1

If only someone had included that estimate someplace obvious where you couldn't possibly miss it, like in the story itself...
Wow... A Billion Dollars Worth Of Software On My System For Free!

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

Re:His Paper Is Bunk. You're right! by msevior · 2002-07-05 03:14 · Score: 5, Interesting

A proof point from AbiWord. I just ran the program over our abi-unstable directory: about 300,000 LOC, estimated cost to produce about $10,000,000.

I also ran the program over the abiword plugins directory. Estimated cost to produce, $1,200,000.

Now I know from direct experience that building the main code base of the AbiWord Word Processor took about 100 times more effort than the plugins.

Cheers

Martin Sevior
AbiWord Developer

!!!! NEWBIE ALERT !!!! by Anonymous Coward · 2002-07-05 03:14 · Score: 0

Oculus Habent is a newbie - he is really new to computers .

Re:!!!! NEWBIE ALERT !!!! by Oculus+Habent · 2002-07-05 04:21 · Score: 1

I didn't intend to say that bloated code was "better" because it was faster to market. On the contrary. Bloated code runs slowly and is more prone to failure and security issues. I apologize for the implication.

As for newbie, I've only been programming for 14 years, so I can see how you could make that mistake.

--
That what was all this school was for... to teach us how to solve our own problems. -- janeowit

Linux's true cost: by Anonymous Coward · 2002-07-05 03:18 · Score: 4, Funny

Priceless

No, SLOC isn't Junk, and You Missed the Point by dbretton · 2002-07-05 03:23 · Score: 3, Insightful

if analyzing SLOC says nothing about developer contributions, efficiency, or effectiveness - then isn't estimating value based off SLOC fundamentally flawed?

1) SLOC says nearly *EVERYTHING* about developer contributions. After all, the SLOC is what the developer contributes.

2) Efficiency is a measurable metric, and can be quite as simple as (SLOC/MM)-(NumBugs/MM), where MM=Man-Month.
While there is a variance in the efficiency of programmers, for any given company a median efficiency can be determined. From this, a decent cost-estimate for SLOC may be determined.
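
As a rough sketch of how such a metric could be computed, the snippet below applies (SLOC/MM)-(NumBugs/MM) to a few programmers, takes the median, and turns it into a cost per line. All the records and the cost per man-month are hypothetical numbers chosen for illustration, and letting one bug cancel one line is simply the formula as stated above, not an established weighting.

```python
from statistics import median

def efficiency(sloc, bugs, man_months):
    """The metric from the comment: (SLOC/MM) - (NumBugs/MM)."""
    return sloc / man_months - bugs / man_months

# Hypothetical per-programmer records: (SLOC delivered, bugs reported, man-months worked)
records = [(6000, 120, 12), (3000, 30, 6), (9000, 600, 12)]

scores = [efficiency(sloc, bugs, mm) for sloc, bugs, mm in records]
median_eff = median(scores)              # company-wide median efficiency

cost_per_mm = 10_000                     # hypothetical loaded cost of one man-month
cost_per_sloc = cost_per_mm / median_eff

print(scores, median_eff, round(cost_per_sloc, 2))
```

Using the median rather than the mean keeps one unusually fast or slow programmer from dominating the estimate.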

i just don't know why people don't count 'Customer Problems Solved Over Time' as the end-all, be-all.

That collected metric would have almost no utility, unless you could atomize the concept of a 'customer problem'.

"Well, it took us 6MM to create that web-based
accounting system, so it should take us about
the same to develop these kernel drivers"

Something like the above doesn't help anyone. It doesn't help the programmers who take part in recording the data; it doesn't help the managers plan and predict the product lifecycle; it doesn't help the customer in letting him know when to expect to see the next product release.

What you failed to do was drill down further in your analysis of the problem.
Let's say you just finished putting out product "X", which solved some customer problem. Now the customer wants product "Y" to solve some other problem. How do you estimate "Y" based upon "X"?
Answer: Break it down. "X" required the following capabilities: A,B,C, and D. You recorded and tracked the amount of time it took to accomplish each capability.

Now, you break down the customer problem, "Y", and determine what it would take to solve it.
If you did a good job at atomizing the customer problem on project "X", then you should have been able to come up with an average amount of time/AtomicProblem. Apply this metric and Voilà!, you should have a good idea about the scope of "Y".
Many people like to take the AtomicProblem and equate it to a SLOC estimate.
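
The breakdown described above can be sketched as follows. The capability names A to D come from the comment; the hours and the decomposition of "Y" are invented for illustration, and a real estimate would weight problems of different sizes rather than use a flat average.

```python
# Hours recorded per capability on the finished project "X" (hypothetical figures).
project_x = {"A": 120, "B": 80, "C": 200, "D": 100}

# Average time per atomic problem, derived from the tracked history.
hours_per_atomic_problem = sum(project_x.values()) / len(project_x)

# New project "Y", already broken down into atomic problems (hypothetical list).
project_y = ["E", "F", "G", "H", "I", "J"]

estimate_y = hours_per_atomic_problem * len(project_y)
print(f"{hours_per_atomic_problem} hours per problem, {estimate_y} hours for Y")
```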

What SLOC counting does is try to establish a commonality among various projects so that future projects of various natures may be estimated using previous metrics. This is not perfect, but it should be used as an aid in determining overall project scope and costs.

i mean, you can't have it both ways. Either SLOC shows how productive programmers are, or it doesn't.

SLOC shouldn't be used to estimate programmer productivity. It should be used to estimate project productivity.

-D

Lies, damned lies, and statistics by Control-Z · 2002-07-05 03:32 · Score: 3, Funny

Obligatory Simpsons quote:

"Oh, people can come up with statistics to prove anything, Kent. 14% of people know that."

Re:Lies, damned lies, and statistics by damiam · 2002-07-05 11:34 · Score: 1

And, of course, 82% of statistics are made up on the spot.

--
It's hard to be religious when certain people are never incinerated by bolts of lightning.

Worrying? by peterpi · 2002-07-05 03:39 · Score: 1

Disclaimer: I know very little of what goes on at kernel level on a unix system.

Does anybody else find it worrying that the kernel is by far the largest component of RHL? I kinda expected it to be one of the smaller of the large projects; way smaller than the likes of KDE / GNOME / Gimp / etc..

Debian 10k packages by cyrilc · 2002-07-05 03:47 · Score: 1

I just wonder how much Debian GNU/Linux would have cost based on the same calculation, knowing that it now includes more than 10K packages.

It's even more interesting from an accounting view by rcs1000 · 2002-07-05 03:52 · Score: 1, Redundant

If a corporation buys a Linux seat (or heck, downloads an ISO) then it has acquired an asset. Admittedly a digital one, but an asset nonetheless.

Now, if GE can revalue its pension assets upwards, when their value has gone down, then surely the corporation can revalue it to a 'market' rate of (say) $10,000 a seat.

Roll it out to all the people in your organisation and, gosh!, your company is suddenly as profitable as Enron or WorldCom were.

Best of all, so long as you never run out of blank CDs, your company can continue to make massive profits.

--
--- My dad's political betting

value? by White+Shade · 2002-07-05 03:56 · Score: 2

I thought the value of a program (or any other noun) is related only to the amount of money that someone will pay for it ... If you can convince someone to pay $1,000,000 for linux, then it's worth $1,000,000. that's it.

a nifty little formula which analyzes the actual FUNCTION of a program to figure out how much it's worth is all well and good, but it doesn't mean anything. I bet the functional worth of Internet Explorer is quite a lot, but no one's willing to pay for it, so it's, in reality, worth nothing.


Fun but meaningless... by mwillems · 2002-07-05 04:10 · Score: 2

These stats, of course, are fun but entirely meaningless.

If you are going to take the entire design cost into one copy, ok, so let's also add the cost of the CD (probably five billion or so in development cost) and the cost of the Microprocessor used to beta-test: around 50 billion I am guessing. Quite an expensive copy of RedHat.

The serious point is: to be at all meaningful, "cost" needs to be divided by number of users over the lifetime of the product. I would love to see those stats (and compare them to MS).

I venture Linux would still outvalue MS on that basis (if only because there are fewer users).

Michael

--

---
BDOS ERR ON A:>

visible man, with invisible shirt by Dr.+Awktagon · 2002-07-05 04:12 · Score: 4, Funny

Note to Mr. Wheeler: when your shirt is the same color as the background of your web site, you might want to put a thin border around the picture with your favorite free image editing software.. though I'm wondering why exactly your picture is there at all..

funny, but actually closer to $1,000,000 by Pinball+Wizard · 2002-07-05 04:15 · Score: 4, Funny

Sloccount run on Slashcode 2.25 gives us this:

Total Estimated Cost to Develop = $ 996,916

I would have posted the entire output of the program, but unfortunately, their million-dollar lameness filter wouldn't let me!

--

No, Thursday's out. How about never - is never good for you?

Of course it's a billion dollars. by blair1q · 2002-07-05 04:23 · Score: 2

Of course it cost a billion dollars to write the software everyone has on their machine. But Microsoft has $40 billion in the bank and collects $7-15 billion a year in revenues.

You do the math.

--Blair

Re:His Paper Is Valuable by mysticgoat · 2002-07-05 04:24 · Score: 3, Insightful

His paper is valuable, priceless even, in that it is throwing a spotlight on a part of the Open Source phenomenon that has not yet come into public discussion.

While I don't know COCOMO, I accept that his numbers are highly suspect. But you have provided a range of accuracy that corrects for this. I am very confident that any reasonable assessment of the Linux development effort is going to be greater than $100 million and less than $10 billion.

So it is indisputable that Linux is a resource whose development effort exceeds $100 million.

And no reasonable person can question that this resource is now available at very low cost to anyone or any institution, on a global level.

It is difficult to see how anyone could not recognize that the use of this resource increases global wealth. Linux does make the world pie bigger.

I think that is the real story here. Linux is a tool, a lever, that has required at least $100 million of effort to develop, but which anyone can put to work for extremely low cost. I think this kind of phrasing needs to be brought to the attention of those who are being FUDded by groups that feel threatened by Open Source.

The software industry is losing BILLIONS! by jellybear · 2002-07-05 04:29 · Score: 1

Just as I thought! Every copy of linux is costing the software industry over a billion dollars!

so by my calculation... by Anonymous Coward · 2002-07-05 04:36 · Score: 0

a gigabuck of non-business capable software = a gigabuck down the drain

does this get you thinking ?

Wow! by SomeOtherGuy · 2002-07-05 04:37 · Score: 2

One thing I got was that the number of lines of code in Mozilla was about the same as everything else (minus the kernel) put together...

--
(+1 Funny) only if I laugh out loud.

Ahem... by Anonymous Coward · 2002-07-05 04:37 · Score: 0

That's "NMUBER OF TEH BESAT"

damages incurred.. by mcdade · 2002-07-05 04:42 · Score: 2

so does this mean that all the people who had their place raided and their linux box taken have now incurred $1 billion in damages???

Software is not worth "SLOC" by stikves · 2002-07-05 04:49 · Score: 2

Software value should not be calculated by the amount the vendor spends, but by the amount the "user gains".

Linux saves software cost. Also linux saves you from NIMDA. But linux means more expenses in tech team.

So value of linux is =
Value of Windows
+ Value that would be lost due to NIMDA, etc
- Cost of tech department difference

Which I guess is "much" more than $1G in total.

cost to develop != market value by jafac · 2002-07-05 04:58 · Score: 2

I mean, come on, sure, some of this stuff was written by the finest minds in the industry, who could easily have fetched premium rates for their work, but chose not to for "the good of humanity" (or some other variation of the rationalization). Then there's the contributions from people who might not be able to hold down a job bussing tables at Denny's. Those are two extremes. You could easily compute an average cost from hours spent there.

But what could you sell the software for?

Nothing. Its market value is zero - because its market is a Linux box, and we all know that nobody will pay for software on Linux, right? ;)

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Unfair! by orkysoft · 2002-07-05 05:03 · Score: 1

This method severely underestimates Perl programmers' efforts! :-P

--

I suffer from attention surplus disorder.

For a single project yes. by The+Creator · 2002-07-05 05:14 · Score: 1

The idea is that the inaccuracies go both ways, and for a whole lot of projects they even out. If you get enough data then the low precision(*) won't matter if the accuracy(*) is good (or was it the other way around?).

*) Yes, I'm using the math definitions of these words, not the dictionary ones. Because the dict. ones suck.

--

FRA: STFU GTFO

Maybe we should call it Mozilla/Linux :-) by vanguard · 2002-07-05 05:20 · Score: 2

* The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system).

Since the second largest part of the system is now Mozilla and not gcc, maybe we should stop calling it GNU/Linux and start calling it Mozilla/Linux. :-)

Vanguard

--
That which does not kill me only makes me whinier

Utterly ridiculous by MattRog · 2002-07-05 05:39 · Score: 2

This method of software cost estimation is patently ridiculous. I can't even imagine how anyone could take him even remotely seriously.

Counting MySQL, PHP, etc. lines of code as part of the OS is misleading -- did he count MS SQL, Access, etc. and other pieces of software which could be bundled with a particular flavor of Windows? A consumer Windows OS distribution contains a lot more application code (e.g. Office bundled, vendor-supplied drivers/goodies/etc.) than the 'stock' Windows code numbers listed in his comparisons. Further, Windows does not contain individual drivers for every single piece of hardware out there; it has some generic drivers and then relies upon your vendor to supply the rest, which is typically free. How many vendor-supplied drivers vs. homebrew are in Linux?

Further, he bases his cost as if Red Hat 7.x was a complete rebuild -- as if every single line of code was re-written from the previous version. Concluding from that that so-many-million man-minutes went into making it is wrong. Someone invented the wheel many (tens of?) thousands of years ago. I bet a lot of man hours have been spent refining the wheel. Do auto manufacturers include that into the cost of cars? Do they make you pay for 10,000 years of refinement from the rock-with-a-hole-in-it to wagon wheels to the run-flat tires of today? No, they include the cost of the materials that went into making it and certainly *some* R&D time, but that cost calculation is determined from various sources, not 'how many molecules of rubber are in my tire'.

His LOC calculation is misleading as well.
if( something )
{
    stuff
}
else
{
    stuff
}

Contains 4 superfluous lines of code. According to his calculations I did 2x more work than if I wrote it like this:
if( something )
    stuff
else
    stuff

If you're frisky you can write it in a single line:
if( something ) { stuff } else { stuff }
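
To make the formatting sensitivity concrete, here is a minimal physical-line counter applied to the three layouts above. It is a simplification written for this example, not sloccount's actual implementation: it counts every line containing a non-whitespace character and does not strip comments or know anything about the language.

```python
def physical_sloc(source: str) -> int:
    """Count lines that contain at least one non-whitespace character."""
    return sum(1 for line in source.splitlines() if line.strip())

braces_on_own_lines = """if( something )
{
    stuff
}
else
{
    stuff
}"""

no_braces = """if( something )
    stuff
else
    stuff"""

one_line = "if( something ) { stuff } else { stuff }"

for name, text in [("braces", braces_on_own_lines),
                   ("no braces", no_braces),
                   ("one line", one_line)]:
    print(name, physical_sloc(text))
```

The same logic counts as 8, 4, or 1 physical lines depending purely on layout.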

Why this article was even mentioned here is beyond me. If I could moderate it I'd put it at (-1: Stupid).

--

Thanks,
--
Matt

Re:Utterly ridiculous by Anonymous Coward · 2002-07-05 07:25 · Score: 0

"Counting MySQL, PHP, etc. lines of code as part of the OS is misleading -- did he count MS SQL, Access, etc. and other
pieces of software which could be bundled with a particular flavor of Windows?"

I never saw Windows bundled with these things the way Red Hat is. Get a CD with this software on it that also installs Windows and we'll talk. I bet you'll be in jail, though; this is most likely illegal. :(

Yeah, right. by aardvarkjoe · 2002-07-05 05:41 · Score: 5, Insightful

According to this program, a little calculator program I've occasionally worked on in my spare time over the last couple years would have cost $ 85,659 to develop. (At the money that I was making as a co-op, roughly 3 years, full-time.) Another project, which my two roommates and I have been working on for most of the last year, again in our spare time, is reported to be $ 1,877,009.

So either I'm doing enough work to be worth several hundred thousand dollars a year, or this thing is complete nonsense.

--

How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?

Re:Yeah, right. by Anonymous Coward · 2002-07-05 08:16 · Score: 0

Your calculator very likely isn't commercial-ready. If you wanted to sell your calculator, you'd need to write help, you'd need to do usability testing, you'd need to fix all the edge case bugs that random users would find that don't affect you when you use it as a personal tool.

If you did all that work, it probably would cost $85K.
Re:Yeah, right. by aardvarkjoe · 2002-07-05 09:08 · Score: 2

Well, by that standard, an awful lot of a standard Linux distribution isn't commercial-ready either. (Not that I would disagree with that, of course -- but the point still stands.)

--

How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
Re:Yeah, right. by Anonymous Coward · 2002-07-06 03:00 · Score: 0

> So either I'm doing enough work to be worth several hundred thousand dollars a year, or this thing is complete nonsense.

I think you should be flattered!
Have you asked for a raise yet? ;-)

What the hell???? by Anonymous Coward · 2002-07-05 06:28 · Score: 0

Who the hell let these people moderate? This guy obviously meant his post to be funny, yet someone modded it as flamebait? Sheesh, people, get a sense of humor!

bull by Anonymous Coward · 2002-07-05 06:53 · Score: 0

this app produces nonsense. i have a project here which i have worked on for half a year, and it is declared to be the product of a year and a half of coding by three people, and worth 200000. They prolly ran the prog on itself, saw how much crap it is, and raised the values to save their self-esteem...

Wow! by ucblockhead · 2002-07-05 06:59 · Score: 2

I must be a genius...I ran this on the free-time project I started last December and it tells me that it would take a man-year to reproduce!

Wow! Apparently I can do the work of four normal programmers... time to talk to my boss about a raise!

--
The cake is a pie

Responses from the author!! by dwheeler · 2002-07-05 07:02 · Score: 5, Informative

Since I'm the author of this paper (More than a Gigabuck: Estimating GNU/Linux's Size), I suppose I should respond to some of the comments made here:

How did I arrive at the estimate of $1 billion? The short answer is "see the paper". I wrote a tool to compute the number of physical source lines of code (SLOC), used Boehm's well-respected COCOMO model to determine the effort (in person-years) from the SLOC, and then converted that effort into an estimated development cost using programmer salary averages and wrap rates. See the paper for the details.
It's true that there's no necessary relationship between cost and value. I don't see how that contradicts the paper; the paper never claims that there is one. Clearly, you can spend $1 million to develop a program that is worthless; it happens all too often. Proprietary vendors make money by making more money from sales than it cost to develop the software, so proprietary software vendors are very aware of the difference between value and cost. Look carefully at the phrasing. All the paper says is that "Had this Linux distribution been developed by conventional proprietary means, it would have cost over $1.08 billion (1,000 million) to develop in the U.S. (in year 2000 dollars)." The paper does not claim that Red Hat actually spent $1 billion, or that their distributions' sale value is related to this development cost figure. Indeed, what the paper shows is that by using OSS/FS approaches, it's possible to build large systems that would cost over $1 billion to develop using conventional proprietary means.
Several have complained about the use of COCOMO for estimating effort from lines of code. COCOMO is certainly not perfect, but it's a well-tested, widely accepted, and widely used model. It's also very clearly documented, so there are no "hidden assumptions". In particular, the model and constants used in COCOMO are based on a wide variety of real projects. It's ridiculous to believe that its results are accurate to the nearest hour; as noted throughout the paper, this is only an estimate. A few people have noted that their software took less time to develop, but there are many factors at work. One is that highly experienced people can develop code more quickly; however, not everyone is equally skilled, so with large systems and many developers this effect should even out. Another is that COCOMO includes design time, documentation time, and testing time. Also, this includes not only an average U.S. programmer's salary, but also the wrap rate for overhead (building costs, insurance, and so on) - which programmers don't see in their paychecks, but which is certainly paid for by traditional businesses. Don't like COCOMO? That's fine - use your own model, preferably one that's been widely tested in the industry. This paper shows you exactly how to do this sort of analysis.
I do not claim that every line of code is a "complete rebuild". I'm simply trying to estimate how much it would take to build the system if it was rebuilt.
The problem with physical SLOC's sensitivity to formatting is well-documented, and I note that in the paper. It's not as bad as you'd think when analyzing larger systems, due to averaging. But if you would rather use logical SLOC, feel free to write code to do that and contribute it to sloccount. In short, instead of complaining, contribute.
As documented in the paper, I only used Basic COCOMO. I don't have enough information about each project to really use the more detailed COCOMO models effectively. However, the paper has all you need if you want to do more detailed analysis using other effort and cost estimation models, including the versions of COCOMO that require more input (e.g., Intermediate COCOMO).
SLOC isn't a very good measure of productivity, but it's generally a very good way to estimate effort. This distinction is important. If programmer A can do something in 100 SLOC, and programmer B needs 10,000 SLOC to do the same thing, it's crazy to think that programmer B is more productive. But it is reasonable to believe that it will take more effort for programmer B to do the same thing (and thus more money). It's possible to game this (e.g., creating separate print commands for each letter to be output as a string), but the resulting code is pretty ugly and programmers generally only intentionally game things if they believe having higher SLOC values will improve their salaries (an unlikely claim for the software in the Red Hat Linux distribution). The paper only measures effort to develop Red Hat Linux 7.1. You'll have to determine if that's a comparable level of functionality to other systems.
This doesn't count "the operating system". It counts "Red Hat Linux 7.1". Thus, it includes the word processors, spreadsheets, and so on. It's not as easy to determine what to leave out; you could compute just the minimal "base", but few people would want to use such a system. Again, I think that's extremely clearly stated in the paper.
Others have been inspired by my paper to do an analysis of the Debian GNU/Linux distribution, using my tool sloccount. You can see their very interesting paper Counting Potatoes: The size of Debian 2.2 at http://people.debian.org/~jgb/debian-counting. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly $1.9 billion USD and taken over 14,000 person-years to develop using traditional proprietary techniques.
Yeah, I need a better picture. I just haven't gotten around to it.

--
- David A. Wheeler (see my Secure Programming HOWTO)

I don't see what all the fuss is about... by twoslice · 2002-07-05 07:09 · Score: 1

"Estimating the Size/Cost of Linux"

Let's see now: the size is five letters (thank god I don't have to use my other hand!) and the cost is of course "Free" (look ma... no hands!)

--

From excellent karma to terrible karma with a single +5 funny post...

What. A. Load. Of. Bollocks. by Joel+Rowbottom · 2002-07-05 07:12 · Score: 4, Interesting

We've run some metrics here at work.

We worked out that it took 8 MAN YEARS to write some code.

That's all well and good, but it's been mostly me writing it on 37.5-hour weeks for the past 10 months.

This is a big "duh" in my book.

--
Smegma.

What would M$ pay by OhYeah! · 2002-07-05 07:16 · Score: 1

What would Microsoft pay to buy up an exclusive right to use all of the Linux distributions? Maybe $1B is on the low end?

Re:His Paper Is Bunk. You're Right. by ghopper · 2002-07-05 08:14 · Score: 1

In his paper, he uses the basic COCOMO model for estimating the cost. This model, quite frankly, sucks. Boehm's book even states, more or less, that the COCOMO model is only accurate to a factor of 10.

I have the COCOMO II book, and I have used the COCOMO model for certain projects. I agree that it is not appropriate here. COCOMO was designed with a narrow focus in mind, and applies best to repeatable projects in a structured work environment. It requires you to estimate parameters for factors such as "Programmer Unfamiliarity", "Precedentedness", "Development Flexibility", "Team Cohesion", "Process Maturity", "Multisite Development", etc. Each of these fudge-factors makes it extremely difficult to correctly apply the model to someone else's work.

Also, each of these factors is likely to be different for each major component.

"I was unable to find a publicly-backed average value for overhead, also called the 'wrap rate.' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate."(from here)

He is using an average overhead rate for a large corporation. He forgot to take into account the fact that Open-Source developers (generally) don't get office space or health insurance or secretaries. They use their own equipment in their own homes. So a more reasonable overhead rate for this project would be close to 0.1.

So taking all of this into account, he's probably off by a factor of more than 100. (If you want to know how accurate he was, compare his estimate to the actual cost of developing a Linux distro... ;) While it might have made interesting headlines, I see little value in the actual number.
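As a quick check on the arithmetic: in this kind of estimate the cost is effort times salary times wrap rate, so it scales linearly with the wrap rate. The sketch below uses made-up placeholder values for effort and salary (they cancel out of the ratio) and a hypothetical function name, and only compares the two overhead figures mentioned above.

```python
def estimated_cost(person_years: float, salary: float, wrap_rate: float) -> float:
    """Cost as effort * annual salary * overhead (wrap) rate."""
    return person_years * salary * wrap_rate


# Placeholder inputs, purely illustrative; they cancel out of the ratio.
PERSON_YEARS = 100.0
SALARY = 50_000.0

corporate = estimated_cost(PERSON_YEARS, SALARY, wrap_rate=2.4)
home_based = estimated_cost(PERSON_YEARS, SALARY, wrap_rate=0.1)
ratio = corporate / home_based
print(f"ratio of the two estimates: {ratio:.1f}")
```

The overhead change on its own accounts for a factor of about 24. Getting past 100 requires also counting the factor-of-10 model uncertainty quoted above, which is presumably how the figure was reached.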

Hmm. by mrselfdestrukt · 2002-07-05 08:56 · Score: 1

Just ran it on MS Windows XP. It came to $0.39
Damn!
Does that mean that I've been * coughing* paying too much?

--
"I used to have that really cool,funny sig ,but it got stolen."

in other redundant news... by Anonymous Coward · 2002-07-05 11:22 · Score: 0

so without having any sort of accurate calculation his basic point is that free software would cost a lot more if it wasn't free?

thanks, i'll file that along with 'your computer wouldn't work without electricity' and other such jewels of insight...

mozilla counting is wrong by nslu · 2002-07-05 15:03 · Score: 0

quite a bit of mozilla code is written in javascript, and it isn't counted at all (or counted as c++? dunno)

Windows before Linux? by IWX222 · 2002-07-08 12:17 · Score: 1

Why start with what it cost to develop Windows? As far as I can see there are very few original ideas in Windows in terms of the GUI. If there is one OS that Windows was inspired by then it was surely MacOS! So, if you want to get closer to the billion dollars - add Apple's development costs to the date that Windows 3.0 was developed. Or would that take you over a billion? Feedback on my figures would be appreciated. I would hate to sound like an advocate of MacOS - but I believe it should be respected and left on the shelf for Microsoft to poach, whereas Linux should be revered for what it stands for. ------ IWX222 ------ the solution to the problem caused by the solution to the problem caused by the solution...

--

.sig me!
