Slashdot Mirror


Statistics On Free Software projects

GenericBoy writes: "The first edition of The Orbiten Free Software Survey is out online. Some of the stats are number of authors and projects, the top 10 contributing authors, how many MB are in all of the free software projects put together (!) and a bunch more." Now, as they themselves point out in their Scope and Method, the methodology is crude, and I don't think Orbiten could quite submit it to Nature yet or anything, but it's an interesting bunch of stats.

8 of 93 comments

  1. Here's how to establish credibility by Otter · · Score: 3
    Might I propose that from now on, Slashdot posters saying:

    • Oh, yeah? You have the source. Write it yourself, you moron!
    • QT/GTK is for idiots.
    • Apple is so stupid. If they open-sourced everything we'd fix it for them.
    • M$ code is terrible.
    • Why isn't Company X open-sourcing their product? Proprietary software is evil!
    • Free software project X sucks.
    or such things, be expected to link to this site showing exactly how much they've contributed.

    Although, given that the study has managed to overlook my insignificant but non-zero contributions, maybe I shouldn't propose that.
  2. The figures need a lot of work by Rich · · Score: 4
    I checked out the stats for some apps I've written and I found they are way out. For example the analysis of kgui gives me 52.789% of the code despite the fact I am the sole author!

    In general, the handling of large packages such as KDE seems fairly poor. For example, KDE apparently has no authors according to the by-project listing. I think this is a great idea, but it needs a cleaner source of data. For example, Coolo has been able to give some very interesting and detailed figures by running scripts on the KDE CVS repository. Perhaps this is the sort of thing they need to be using as the initial data set from which they make their analysis.

    Rich.

  3. Discussion on Advogato by Carl · · Score: 5
    This was already discussed on Advogato yesterday.

    The discussion points out some interesting facts about why some individuals are listed as big contributors (such as the author of libtool. Duh.) and why some aren't listed at all. They even have some comments from the developers of the survey.

    And I just love the comment of Havoc Pennington:

    It shows me as a major contributor to "gnuclear" and nothing else - I don't even know what gnuclear is. ;-)
  4. Active vs. Passive OSS Participation by SwissPope · · Score: 3

    I looked at the algorithm used to determine how they collected the names of contributors. They grepped e-mail addresses, rcs ids, and copyright info from various files. I don't think that's the best way to draw any useful conclusions in regards to Open Source software. The only real conclusion found here is that Open Source projects include a lot of code written by other people. That's trivial. This study fails to make a distinction between an active contributor and someone whose code was simply borrowed. This is an important distinction to make! For instance, what if I were to take 1000 physics homework assignments and search for "F=ma" in them. I can't assume that the appearance of "F=ma" on your paper means that Newton helped you with your homework. I can only assume that you used Newton's second law of motion to help you solve the problem.

    Similarly, if you wanted to determine who the most prolific scientific researcher is in a field, would you gather data by simply grepping for names in the texts of papers? No, you'll skew the data by counting the names who appear in the paper's "References" when you should just be counting the actual investigators who are listed as the authors of the paper!

    I would like to see this study repeated but making the distinction between an active contributor to a project and someone whose code was simply included. Only then would a top-heavy distribution suggest anything meaningful in regards to OSS authorship.

    If anyone has looked at the CODD algorithms/code and can show me if they used a more sophisticated method to filter out authors with no active involvement in a project, please post. It's a difficult problem to infer who actively and who passively contributed to a project with just a perl script.
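To make the objection above concrete, here is a minimal sketch of grep-style attribution. It is not the survey's actual code: the regular expression, the function name `credit_by_grep`, the file contents, and the addresses are all assumptions invented for illustration, and it handles only e-mail addresses, not RCS ids or copyright lines. It credits every address found in a file with that file's full size, so a bundled script written elsewhere counts for its author exactly like code written for the project.

```python
import re
from collections import Counter

# Assumed pattern for illustration; the survey's real expressions are not shown in the thread.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def credit_by_grep(files):
    """Credit each e-mail address found in a file with that file's size in characters.

    `files` maps a file name to its text. No attempt is made to tell an
    active contributor from someone whose code was merely bundled.
    """
    credit = Counter()
    for text in files.values():
        # Count each address once per file, however often it appears.
        for address in set(EMAIL_RE.findall(text)):
            credit[address.lower()] += len(text)
    return credit


# Hypothetical project: one small original file, one large borrowed helper script.
files = {
    "main.c": "/* written by alice@example.org */\nint main(void) { return 0; }\n",
    "ltmain.sh": "# maintained by bob@example.org\n" + "echo generated\n" * 50,
}
credit = credit_by_grep(files)
```

In this made-up project the author of the borrowed script ends up with more credit than the person who wrote the program, which is the "F=ma" problem in miniature.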

  5. Well... kinda... by El Volio · · Score: 3
    Yeah, the FSF came out way on top, with Sun and the UCB regents not far behind. OK, but is it really fair to compare them to individuals like Gordon Matzigkeit, et al? I'm not familiar with any of the individuals, but it would seem to me that each of them deserves far more credit.

    OTOH, it's nice to see some sort of a start at studying the free software community...

    --

    "You can never have too many elephants on your team."

  6. They didn't look in the best place by divec · · Score: 4

    They list their sources as follows:


    • RedHat Linux v6.1 source rpms
    • Linux kernel sources version 2.2.14
    • Munitions cryptography/security archive
    • An un-random half of Freshmeat

    Debian would have been a more sensible distro to use, because it is overflowing with (packages|crap). Red Hat (presumably) just ship the ones which it makes commercial sense to ship, whereas Debian has everything that anyone's bothered to include whether it's useful or not. For example, Cooledit (my favourite text editor) is missing from the survey. The only problem with Debian would be stuff missing because it is not DFSG-free. Such stuff is available in the non-free/ directory but it's probably not as comprehensive as the main/ directory is.


    Having said that, it's very interesting to see what they have got. I didn't know Andrew Tridgell did all that stuff, for example. This could be a good tool for the community to get to know people better.

    --

    perl -e 'fork||print for split//,"hahahaha"'

  7. Key contributors by konstant · · Score: 5

    What I find most interesting by far is the composition of the contributions when viewed by project. In nearly every project I viewed, there are two or three elite "key contributors" who provide something on the order of 1/3 to 7/10 or more of the code, with the remainder provided by a slew of sub-1% coders.

    This relates an interesting story. It appears that, while the real strength of OSS is incremental improvement over time, few projects can exist without a guiding intellect or a handful of ambitious coders on the core team.

    Presenting this data to employers who are concerned about losing control of their code may help assuage their fears of open source. Clearly projects that are "owned" by no one are rarities. A corporation *can* have its cake and eat it too.
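The "key contributor" pattern described above can be checked per project with a few lines of arithmetic. This is only a sketch under stated assumptions: the function names, the cut-offs (top three authors, 1% threshold), and the byte counts are made up for illustration and are not taken from the survey.

```python
def contribution_shares(bytes_by_author):
    """Return each author's share of a project as a percentage of total bytes."""
    total = sum(bytes_by_author.values())
    if total == 0:
        return {}
    return {author: 100.0 * n / total for author, n in bytes_by_author.items()}


def summarize(bytes_by_author, key_count=3, minor_threshold=1.0):
    """Return (combined share of the top `key_count` authors, number of authors below `minor_threshold` percent)."""
    ranked = sorted(contribution_shares(bytes_by_author).values(), reverse=True)
    top_share = sum(ranked[:key_count])
    minor_authors = sum(1 for share in ranked if share < minor_threshold)
    return top_share, minor_authors


# Hypothetical project: three large contributors plus sixty small ones.
project = {"core-1": 400, "core-2": 200, "core-3": 100}
project.update({f"minor-{i}": 5 for i in range(60)})

top_share, minor_authors = summarize(project)
```

For this invented data the three largest authors hold 70% of the code and all sixty others fall below 1% each, the same shape as the per-project listings described above.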

    -konstant
    Yes! We are all individuals! I'm not!

  8. Lines of code by El · · Score: 3

    12706 developers working several years on 3149 projects, and they've still produced fewer lines of code than a single release of Win2K... is this because Open Source is more efficient, less feature-rich, or because it doesn't carry the burden of backwards compatibility with DOS 1.0?

    --

    "Freedom means freedom for everybody" -- Dick Cheney