Slashdot Mirror


Estimating the Size/Cost of Linux

2bits writes "Wow... A Billion Dollars Worth Of Software On My System For Free! Check This Guy Out, He Came Up With A Counting / Pricing Method For Quite A Few Types of Source Code. Here is the Program. The results on the site are sorta dated, based on RH 7.1, but the app is pretty cool!... Hey, I can finally find out how much all my side projects are worth / costing me..."

25 of 196 comments

  1. lets see here..... by Anonymous Coward · · Score: 4, Funny

    [cmdrtaco@localhost]$ est slashcode
    Analyzing slashcode.....
    Result: $6.66

    [cmdrtaco@localhost]$

  2. Slow news day, Taco? by damiam · · Score: 5, Interesting
    Good god, people. This app has been out there for years. It's been mentioned in previous /. stories. Most people already know about it. This isn't news.

    I know I'll get modded down for saying this, but Taco, as an "editor", couldn't you at least have fixed This Guy's Moronic Capitalization Scheme?

    --
    It's hard to be religious when certain people are never incinerated by bolts of lightning.
    1. Re:Slow news day, Taco? by carlos_benj · · Score: 3, Funny

      ...couldn't you at least have fixed This Guy's Moronic Capitalization Scheme?

      That's not a scheme. The entire post is a very long title for a very short book he's writing...

      --

      As a matter of fact, I am a lawyer. But I play an actor on TV.

  3. Yeah.... by graphicartist82 · · Score: 3, Funny

    A Billion Dollars Worth Of Software On My System For Free!

    Yeah, that's what happens when you use P2P _WAY_ too much

  4. bad news for Linux? by tps12 · · Score: 5, Funny

    This looks like a serious problem for Linux distributors like Red Hat, Mandrake, and Debian. They sell their products (which consist of software and support and manuals) for $40-$100, usually. Now we see that what they put into their product (i.e., the cost) is orders of magnitude beyond that. Even if Red Hat sold every single copy it packaged (it doesn't even come close), and even if nobody downloaded it for free or copied the CDs for a friend (again, an incredibly optimistic assumption), it would still be looking at huge losses.

    This might have worked a few years ago, but with accounting practices coming under scrutiny across the board, I fear that these companies are headed for trouble.

    --

    Karma: Good (despite my invention of the Karma: sig)
    1. Re:bad news for Linux? by jsse · · Score: 3, Funny

      To: ceo@redhat.com
      From: Congress

      Dear Sir,

      We figured out recently that you are selling software worth 1 billion dollars at a suspiciously low price (~$30-$200).

      Worse still, you also allow people to download your software products from your website for Free! We have reason to suspect that you are also involved in anti-competitive practices.

      I hereby invite you and your accountants to come to congress to answer some of our questions.

      Best Rgds,

      P.S. Do not attempt to destroy any accounting records, we are watching you.

  5. value? by rnd() · · Score: 3, Insightful
    It's fun to see someone do something like this. However, the fact that most people don't use Linux means that the value of using Linux is less than the cost of using Linux. Therefore, since the source code is free, there must be other costs that are preventing most people from using Linux.

    Instead of wasting time figuring out fictitious pricing based on the way that corporate America prices software, why not figure out a way to remove the aforementioned hidden costs from Linux so that the masses can begin to see what many of us on /. have known for a while: that GNU Linux and Open Source Software represent a great choice.

    --

    Amazing magic tricks

  6. Re:Billion dollars? by Oculus+Habent · · Score: 3, Insightful

    Sure, but what about the time spent in bug fixes, patches, etc? I suppose you can do something like this:

    • Standard programming takes A minutes per line on average.
    • Bug fix/patching programming takes B minutes per line on average.
    • Standard/Patch programming take up C/D percent of the time.
    • Average (mode, perhaps) programmer salary is E dollars.

    Programming cost = E dollars * ((X lines * C percent * A minutes) + (X lines * D percent * B minutes))
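    The formula above can be sketched as a small function. This is only an illustration: it assumes E has been converted to dollars per minute (the post gives it as a salary) and that C and D are fractions that sum to 1. All names and sample numbers are hypothetical.

```python
def programming_cost(lines, rate_per_minute, new_share, fix_share,
                     new_minutes_per_line, fix_minutes_per_line):
    """Cost = E * (X*C*A + X*D*B), with E expressed in dollars per minute."""
    if abs(new_share + fix_share - 1.0) > 1e-9:
        raise ValueError("new_share and fix_share should sum to 1")
    # Total minutes spent, weighted by how the time splits between
    # standard programming and bug fixing/patching.
    minutes = lines * (new_share * new_minutes_per_line
                       + fix_share * fix_minutes_per_line)
    return rate_per_minute * minutes


# Hypothetical inputs: 10,000 lines, $0.50 per minute, 70% standard
# programming at 2 minutes/line, 30% bug fixing at 5 minutes/line.
print(programming_cost(10_000, 0.50, 0.7, 0.3, 2, 5))
```

    A per-language version would just call this once per language with different minutes-per-line values and add up the results.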

    You could even go fancy and calculate lines-per-minute based on each language. But then, what about man pages, documentation, support sites, etc.? These are things you would pay for in commercial software. Shouldn't these be a factor as well?

    --
    That what was all this school was for... to teach us how to solve our own problems. -- janeowit
  7. No more functions for me... by evilviper · · Score: 3, Funny

    I'll never use macros, functions, classes, or the stl again!

    "Look, I wrote a program which does the exact same thing as another program, but mine is worth much, much more!"

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    1. Re:No more functions for me... by rjw57 · · Score: 3, Insightful

      That's precisely the point. Not using STL or standard functions increases the time taken to code, the amount of programming required and decreases the maintainability of the code -- in short your code would _cost_ _more_ to develop if you were a company paying for it.

      cost != value in general

      --
      Rich
  8. Slashdot costs industry $1billion/year by pubjames · · Score: 5, Interesting


    I love these kinds of stats.

    Slashdot has, say, 100,000 US readers per day.

    Each spends an hour reading slashdot when they should be working.

    Let's say an average Slashdot reader is worth say, $40 an hour, and they read Slashdot on 300 days during the year.

    That means Slashdot costs the USA $1,200,000,000 a year! Crikey! Don't tell Bush!

  9. His Paper Is Bunk by dbretton · · Score: 5, Insightful

    To put it mildly...

    In his paper, he uses the basic COCOMO model for estimating the cost. This model, quite frankly, sucks. Boehm's book even states, more or less, that the COCOMO model is only accurate to a factor of 10.

    Since I no longer have the Boehm book, this quote from a google-found web page will have to do. This is a quote of a quote from Boehm's book, Software Engineering Economics:

    "Basic COCOMO is good for rough order of magnitude estimates of software costs, but its accuracy is necessarily limited because of its lack of factors to account for differences in hardware constraints, personnel quality and experience, use of modern tools and techniques, and other project attributes known to have a significant influence on costs."

    Basically, this means that the estimate could be anywhere from $100M to $10B in true cost.

    At the very least, this kid should have stated which of the model variants he was using.

    Better yet, he should have subdivided the source code into multiple categories: kernel+drivers, tools, productivity software, etc. etc., and then applied the various models to them.
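    As a rough illustration of that suggestion, here is a sketch that applies the three Basic COCOMO modes to separate categories. The (a, b) constants are the ones Boehm published for Basic COCOMO; the component sizes and the mode assigned to each component are hypothetical guesses, not figures from the paper.

```python
# Basic COCOMO (Boehm, 1981): effort in person-months = a * KSLOC ** b
BASIC_COCOMO = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}


def effort_person_months(ksloc, mode):
    """Return the Basic COCOMO effort estimate for one component."""
    a, b = BASIC_COCOMO[mode]
    return a * ksloc ** b


# Hypothetical split of a distribution: size in KSLOC and a guessed mode.
components = {
    "kernel+drivers": (2_500, "semi-detached"),
    "tools": (500, "organic"),
}

for name, (ksloc, mode) in components.items():
    pm = effort_person_months(ksloc, mode)
    print(f"{name}: {pm / 12:,.0f} person-years ({mode})")
```

    Because the exponent differs per mode, the choice of variant matters more as the component gets larger.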

    Just my 2 bits.

    BTW, here is the google-found page which has the quote I stole. Plus, it gives a nice, albeit brief, overview of COCOMO.

    -d

  10. isn't SLOC junk? by *weasel · · Score: 3, Interesting


    if analyzing SLOC says nothing about developer contributions, efficiency, or effectiveness - then isn't estimating value based off SLOC fundamentally flawed?

    i mean, you can't have it both ways. Either SLOC shows how productive programmers are, or it doesn't.

    if it does - then get over the SLOC analysis in your job reviews.
    if it doesn't - then you cannot even remotely accurately gauge monetary worth through SLOC.

    good luck to the people trying to estimate worth of OSS. good luck to the people trying to estimate the worth of programmers.

    i just don't know why people don't count 'Customer Problems Solved Over Time' as the end-all, be-all.

    (and time and energy fixing software bugs doesn't count. that's not the customer's problem. it's the developers')

    who cares how many SLOC are in a product. how many needs of the end user does it fulfill, and how long did it take to get done from the word 'go'?

    yeah, you'd need to define customer needs much more carefully than most shops do... but isn't that part of the eXtreme Programming retinue /. loves to trumpet?

    --
    // "Can't clowns and pirates just -try- to get along?"
  11. An interesting thumbsuck by Twylite · · Score: 5, Interesting

    Run the same SLOC figures against the statistics from the Function Points methodology and you get a different picture. You are looking at 2500 person years of effort, with a cost optimum development time of 6.5 years. However, to deal with the complexity involved you will need approximately 3000 average and 1500 above average developers (at average development rate you could expect a 13 year delivery!). Total price tag: around $2 billion (that's 2e9, in case your definition of billion is different).

    Of course, this is still a very skewed figure. There is no accounting for the quality of code (at the end of such a complex development cycle, you could expect as many as 7 million defects!), and both FP and COCOMO estimate development effort inclusive of design work and documentation, which in OpenSource typically don't match those in mature commercial development environments (from which the FP and COCOMO statistics are derived).

    There is also a huge, and invalid, assumption made by the author, regarding the application of COCOMO (and my FP calculations suffer the same problem). The complexity of a system is MORE than the sum of its parts. This is because developer productivity declines as system complexity increases.

    At 10,000 FP, a developer is often only 60% as productive as at 1,000 FP. The situation is obviously far worse at 300,000 FP (the entire distribution), yet the kernel itself only weighs in at around 20,000 FP. And even then, clear modularisation reduces complexity for individual developers. So it is grossly unfair to base calculations on the system as a whole.

    The kernel (around 2.5 MLOC) as a single system would be a task for 300 skilled developers over around 3 years, while the Gimp (around 500 KLOC, still near the top of the list in size) would be looking at 50 developers over 18 months. More complex projects need relatively more time and more developers. Doing all these projects in parallel (assuming it were possible - which it isn't because of dependencies, and that's another factor) would take no longer than the most complex task (kernel = 3 years) and relatively fewer developers than estimated based on the complexity of that task (30 MLOC / 2.5 MLOC * 300 developers = max 3600 for entire distribution). Max cost: 3600 * 3 * $55k = $594 million.
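    The upper-bound arithmetic in the paragraph above can be checked in a few lines. The numbers are the ones quoted in the comment; only the variable names are made up.

```python
# Figures quoted in the comment above.
kernel_mloc = 2.5      # size of the kernel in MLOC
kernel_devs = 300      # skilled developers for the kernel alone
kernel_years = 3       # duration of the most complex single task
distro_mloc = 30       # size of the whole distribution in MLOC
salary = 55_000        # dollars per developer-year

# Scale the kernel's head count linearly to the full distribution,
# then pay everyone for the length of the longest task.
max_devs = distro_mloc / kernel_mloc * kernel_devs
max_cost = max_devs * kernel_years * salary

print(f"{max_devs:,.0f} developers, ${max_cost / 1e6:,.0f} million")
```

    The linear scaling is the pessimistic part: it staffs every project as if it were as complex per line as the kernel.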

    And you're STILL not accounting for the fact that employing someone costs a lot more than just paying a salary. Which puts all estimates (mine and the author's) up.

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
  12. Re:His Paper Is Bunk. You're right! by msevior · · Score: 5, Interesting

    A proof point from Abiword. I just ran the program over our abi-unstable directory. About 300,000 LOC; estimated cost to produce: about $10,000,000.

    I also ran the program over the abiword plugins directory. Estimated cost to produce, $1,200,000.

    Now I know from direct experience that building the main code base of the AbiWord Word Processor took about 100 times more effort than the plugins.

    Cheers

    Martin Sevior
    AbiWord Developer

  13. Linux's true cost: by Anonymous Coward · · Score: 4, Funny

    Priceless

  14. No, SLOC isn't Junk, and You Missed the Point by dbretton · · Score: 3, Insightful

    if analyzing SLOC says nothing about developer contributions, efficiency, or effectiveness - then isn't estimating value based off SLOC fundamentally flawed?

    1) SLOC says nearly *EVERYTHING* about developer contributions. After all, the SLOC is what the developer contributes.

    2) Efficiency is a measurable metric, and can be as simple as (SLOC/MM)-(NumBugs/MM), where MM=Man-Month.
    While there is a variance in the efficiency of programmers, for any given company a median efficiency can be determined. From this, a decent cost-estimate for SLOC may be determined.
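    Taking that formula at face value, a minimal sketch of the median-efficiency idea might look like this. The per-developer records and the cost per man-month are hypothetical, and dividing a monthly cost by the efficiency is just one assumed way to turn it into a cost per line.

```python
from statistics import median


def efficiency(sloc, bugs, man_months):
    """(SLOC/MM) - (NumBugs/MM), as given in the comment."""
    return (sloc - bugs) / man_months


def cost_per_sloc(monthly_cost, eff):
    """Dollars per net line, assuming a fixed cost per man-month."""
    return monthly_cost / eff


# Hypothetical per-developer records: (SLOC, bugs, man-months).
records = [(12_000, 60, 12), (9_000, 30, 12), (20_000, 200, 12)]

# The company-wide figure is the median across developers.
median_eff = median(efficiency(*r) for r in records)

print(median_eff, cost_per_sloc(10_000, median_eff))
```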

    i just don't know why people don't count 'Customer Problems Solved Over Time' as the end-all, be-all.

    That collected metric would have almost no utility, unless you could atomize the concept of a 'customer problem'.

    "Well, it took us 6MM to create that web-based
    accounting system, so it should take us about
    the same to develop these kernel drivers"

    Something like the above doesn't help anyone. It doesn't help the programmers who take part in recording the data; it doesn't help the managers plan and predict the product lifecycle; it doesn't help the customer in letting him know when to expect to see the next product release.

    What you failed to do was drill down further in your analysis of the problem.
    Let's say you just finished putting out product "X", which solved some customer problem. Now the customer wants product "Y" to solve some other problem. How do you estimate "Y" based upon "X"?
    Answer: Break it down. "X" required the following capabilities: A,B,C, and D. You recorded and tracked the amount of time it took to accomplish each capability.

    Now, you break down the customer problem, "Y", and determine what it would take to solve it.
    If you did a good job at atomizing the customer problem on project "X", then you should have been able to come up with an average amount of time/AtomicProblem. Apply this metric and voilà, you should have a good idea about the scope of "Y".
    Many people like to take the AtomicProblem and equate it to a SLOC estimate.
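    The X-to-Y estimate described above could be sketched like this. The capability names follow the comment (A, B, C, D); the counts of atomic problems, the hours, and the size of "Y" are hypothetical.

```python
# Hypothetical tracked data for project X:
# capability -> (number of atomic problems, hours spent).
project_x = {"A": (4, 120), "B": (2, 70), "C": (6, 200), "D": (3, 90)}


def hours_per_atomic_problem(history):
    """Average time per atomic problem across all recorded capabilities."""
    total_problems = sum(n for n, _ in history.values())
    total_hours = sum(h for _, h in history.values())
    return total_hours / total_problems


def estimate_hours(atomic_problems, rate):
    """Scope of a new project, given its count of atomic problems."""
    return atomic_problems * rate


rate = hours_per_atomic_problem(project_x)
# Suppose project Y breaks down into 20 atomic problems.
print(estimate_hours(20, rate))
```

    Swapping "atomic problems" for estimated SLOC gives the SLOC-based variant of the same calculation.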

    What SLOC counting does is try to establish a commonality among various projects so that future projects of various natures may be estimated using previous metrics. This is not perfect, but it should be used as an aid in determining overall project scope and costs.

    i mean, you can't have it both ways. Either SLOC shows how productive programmers are, or it doesn't.

    SLOC shouldn't be used to estimate programmer productivity. It should be used to estimate project productivity.

    -D

  15. Lies, damned lies, and statistics by Control-Z · · Score: 3, Funny

    Obligatory Simpsons quote:

    "Oh, people can come up with statistics to prove anything, Kent. 14% of people know that."

  16. Re:Good Lord by EastCoastSurfer · · Score: 3, Insightful

    I would agree, but even the crappy slashdot search came up with the old story post while searching for SLOC. It only came back with 3 stories including this current one. The best part is that Taco also posted the original.

  17. visible man, with invisible shirt by Dr.+Awktagon · · Score: 4, Funny

    Note to Mr. Wheeler: when your shirt is the same color as the background of your web site, you might want to put a thin border around the picture with your favorite free image editing software.. though I'm wondering why exactly your picture is there at all..

  18. funny, but actually closer to $1,000,000 by Pinball+Wizard · · Score: 4, Funny

    Sloccount run on Slashcode 2.25 gives us this:

    Total Estimated Cost to Develop = $ 996,916

    I would have posted the entire output of the program, but unfortunately, their million-dollar lameness filter wouldn't let me!

    --

    No, Thursday's out. How about never - is never good for you?

  19. Re:His Paper Is Valuable by mysticgoat · · Score: 3, Insightful

    His paper is valuable, priceless even, in that it is throwing a spotlight on a part of the Open Source phenomenon that has not yet come into public discussion.

    While I don't know COCOMO, I accept that his numbers are highly suspect. But you have provided a range of accuracy that corrects for this. I am very confident that any reasonable assessment of the Linux development effort is going to be greater than $100 million and less than $10 billion.

    So it is indisputable that Linux is a resource whose development effort exceeds $100 million.

    And no reasonable person can question that this resource is now available at very low cost to anyone or any institution, on a global level.

    It is difficult to see how anyone could not recognize that the use of this resource increases global wealth. Linux does make the world pie bigger.

    I think that is the real story here. Linux is a tool, a lever, that has required at least $100 million of effort to develop, but which anyone can put to work for extremely low cost. I think this kind of phrasing needs to be brought to the attention of those who are being FUDded by groups that feel threatened by Open Source.

  20. Yeah, right. by aardvarkjoe · · Score: 5, Insightful
    According to this program, a little calculator program I've occasionally worked on in my spare time over the last couple years would have cost $ 85,659 to develop. (At the money that I was making as a co-op, roughly 3 years, full-time.) Another project, which my two roommates and I have been working on for most of the last year, again in our spare time, is reported to be $ 1,877,009.

    So either I'm doing enough work to be worth several hundred thousand dollars a year, or this thing is complete nonsense.

    --

    How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
  21. Responses from the author!! by dwheeler · · Score: 5, Informative
    Since I'm the author of this paper (More than a Gigabuck: Estimating GNU/Linux's Size), I suppose I should respond to some of the comments made here:
    1. How did I arrive at the estimate of $1 billion? The short answer is "see the paper". I wrote a tool to compute the number of physical source lines of code (SLOC), used Boehm's well-respected COCOMO model to determine the effort (in person-years) from the SLOC, and then converted that effort into an estimated development cost using programmer salary averages and wrap rates. See the paper for the details.
    2. It's true that there's no necessary relationship between cost and value. I don't see how that contradicts the paper; the paper never claims that there is one. Clearly, you can spend $1 million to develop a program that is worthless; it happens all too often. Proprietary vendors make money by making more money from sales than it cost to develop the software, so proprietary software vendors are very aware of the difference between value and cost. Look carefully at the phrasing. All the paper says is that "Had this Linux distribution been developed by conventional proprietary means, it would have cost over $1.08 billion (1,000 million) to develop in the U.S. (in year 2000 dollars)." The paper does not claim that Red Hat actually spent $1 billion, or that their distributions' sale value is related to this development cost figure. Indeed, what the paper shows is that by using OSS/FS approaches, it's possible to build large systems that would cost over $1 billion to develop using conventional proprietary means.
    3. Several have complained about the use of COCOMO for estimating effort from lines of code. COCOMO is certainly not perfect, but it's a well-tested, widely accepted, and widely used model. It's also very clearly documented, so there are no "hidden assumptions". In particular, the model and constants used in COCOMO are based on a wide variety of real projects. It's ridiculous to believe that its results are accurate to the nearest hour; as noted throughout the paper, this is only an estimate. A few people have noted that their software took less time to develop, but there are many factors at work. One is that highly experienced people can develop code more quickly; however, not everyone is equally skilled, so with large systems and many developers this effect should even out. Another is that COCOMO includes design time, documentation time, and testing time. Also, this includes not only an average U.S. programmer's salary, but also the wrap rate for overhead (building costs, insurance, and so on) - which programmers don't see in their paychecks, but are certainly paid for by traditional businesses. Don't like COCOMO? That's fine - use your own model, preferably one that's been widely tested in the industry. This paper shows you exactly how to do this sort of analysis.
    4. I do not claim that every line of code is a "complete rebuild". I'm simply trying to estimate how much it would take to build the system if it was rebuilt.
    5. The problem of physical SLOC's sensitivity to formatting is well documented, and I note that in the paper. It's not as bad as you'd think when analyzing larger systems, due to averaging. But if you would rather use logical SLOC, feel free to write code to do that and contribute it to sloccount. In short, instead of complaining, contribute.
    6. As documented in the paper, I only used Basic COCOMO. I don't have enough information about each project to really use the more detailed COCOMO models effectively. However, the paper has all you need if you want to do more detailed analysis using other effort and cost estimation models, including the versions of COCOMO that require more input (e.g., Intermediate COCOMO).
    7. SLOC isn't a very good measure of productivity, but it's generally a very good way to estimate effort. This distinction is important. If programmer A can do something in 100 SLOC, and programmer B needs 10,000 SLOC to do the same thing, it's crazy to think that programmer B is more productive. But it is reasonable to believe that it will take more effort for programmer B to do the same thing (and thus more money). It's possible to game this (e.g., creating separate print commands for each letter to be output as a string), but the resulting code is pretty ugly and programmers generally only intentionally game things if they believe having higher SLOC values will improve their salaries (an unlikely claim for the software in the Red Hat Linux distribution). The paper only measures effort to develop Red Hat Linux 7.1. You'll have to determine if that's a comparable level of functionality to other systems.
    8. This doesn't count "the operating system". It counts "Red Hat Linux 7.1". Thus, it includes the word processors, spreadsheets, and so on. It's not as easy to determine what to leave out; you could compute just the minimal "base", but few people would want to use such a system. Again, I think that's extremely clearly stated in the paper.
    9. Others have been inspired by my paper to do an analysis of the Debian GNU/Linux distribution, using my tool sloccount. You can see their very interesting paper Counting Potatoes: The size of Debian 2.2 at http://people.debian.org/~jgb/debian-counting. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly $1.9 billion USD using over 14,000 person-years to develop using traditional proprietary techniques.
    10. Yeah, I need a better picture. I just haven't gotten around to it.
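    For readers who want to try the steps in item 1 (SLOC to effort to cost) on their own code, here is a rough sketch for a single project. It assumes the organic-mode Basic COCOMO constants (2.4 and 1.05); the salary and overhead (wrap rate) defaults are assumptions for illustration, so check the paper for the values actually used.

```python
def cocomo_cost(sloc, salary=56_286, overhead=2.4, a=2.4, b=1.05):
    """Physical SLOC -> person-years -> dollars (Basic COCOMO, organic mode).

    salary (dollars per year) and overhead (wrap rate) are assumed values.
    """
    person_months = a * (sloc / 1000) ** b   # effort from KSLOC
    person_years = person_months / 12
    cost = person_years * salary * overhead  # salary plus overhead
    return person_years, cost


# A hypothetical 100 KSLOC project.
years, dollars = cocomo_cost(100_000)
print(f"{years:.1f} person-years, ${dollars:,.0f}")
```

    Since the exponent is greater than 1, running this once on a combined total gives a higher figure than running it per project and summing, which is part of why the choice of unit of analysis matters.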
    --
    - David A. Wheeler (see my Secure Programming HOWTO)
  22. What. A. Load. Of. Bollocks. by Joel+Rowbottom · · Score: 4, Interesting
    We've run some metrics here at work.

    We worked out that it took 8 MAN YEARS to write some code.

    That's all well and good, but it's been mostly me writing it on 37.5-hour weeks for the past 10 months.

    This is a big "duh" in my book.

    --
    Smegma.