Open Source Developed by Individuals, Not Large Groups
AlainRoy writes "A new article was just published in First Monday, which suggests that most open source projects have rather few developers." He excerpts from the study, done by Sandeep Krishnamurthy: "Based on a study of the top 100 mature products on Sourceforge...most OSS programs are developed by individuals, rather than communities. The median number of developers in the 100 projects I looked at was 4 and the mode was 1."
I can see where a large OSS project could get unwieldy really quickly with 100s of hobby developers scattered across the globe. As the number of "free" developers involved goes up, I'm sure the number of problems skyrockets. If you hand a large project to 4 dedicated people it will probably get done faster than if you farm it out to 100. It seems fairly obvious to me that as the number of people working on a project grows, the number of people flaking out/not delivering on the project increases as well.
They who would give up an essential liberty for temporary security, deserve neither liberty nor security
I tend to agree with this point somewhat. The benevolent dictatorship model has proven to be by far the most efficient model for open source programming (Linux kernel). Ideally the world would work in a similar way: one ultimate being dictates what should happen (and it's good) and people do it (and the result is good).
--Kevin
One of the biggest arguments for Open Source Software has been the "More Eyeballs" argument. Granted, if I use OSS I can view/edit the source myself. However, my company has neither the time nor the human resources to wade through the source code of even the smallest app. The other side is that with apps like Linux there can easily be multiple companies "distributing" their own versions. However, we haven't yet seen whether this is a viable business model in the long term, especially outside of Linux.
There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
...that the OSS always seems to turn out better than commercial software?
I am a software developer for a video driver developer in a team of about 7. I'm disgruntled as hell--the process is overwhelming, the design is far too rigid and thus things take about ten times as long as they would with a smaller team. Yet, when I go home and work on my personal projects, I'm as happy as can be. No code standards, no review processes, no cumbersome integrating of six other people's changes into my code, nothing.
If the success of small-team OSS projects is any indication, why do software managers think that throwing more people on a project will increase efficiency? It won't!
Karma: Excellent Birds (mostly as a result of listening to Laurie Anderson)
This is absolutely not startling at all. Most projects are started with an idea. Ideas are generated by individuals. Therefore, most projects are the work of individuals. But the same can be said for "normal" closed-source businesses as well. They are started by individuals.
The difference, however, is that most open source projects are done in "spare" time while most closed source projects are done in "work" time. Work time is usually well-funded and allows for the creation and gathering of additional resources, while spare time work is in the nature of a hobby and is self-funded, which makes it hard to justify adding resources. More resources means more to manage, and the cost analysis tends to steer people away from wanting to do that. Open source projects are usually started as a way for someone to have fun with a topic they are interested in or to learn new skills. They are not started with the intention of creating more management headaches.
If you really think about it, it's not that surprising that most open source projects are run by individuals.
Open source is not really about communities coming together to contribute to a project. Open source is really about communities learning and growing from the shared knowledge of the individuals in that community.
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
Is there anyone here that believes that SourceForge IS the OSS community? I just don't see how anyone can claim to have an accurate study when using only one source! The most widespread and well-known OSS projects are not hosted on SourceForge at all! Don't get me wrong, I love SF personally, but since anyone can post something on there and call it a project, it's not a good representation of the whole community. I can't tell you how many projects I have seen that sound cool, so I go to them and they say something like "This is a dynamic webpage for C programmers, we are currently looking for a Dynamic web designer and a C programmer".
Sigs are out of style, so I'm not going to use one...oh wait..
This study apparently takes as its presumption that the developers listed in Sourceforge projects are the developers who have actually contributed. Most projects take code from a much wider base than those listed on SF, taking bug fixes from emails.
So a SF project with two developers listed might only have two dedicated people working on it all the time, but might include tidbits of code from 30 different people.
The study's basic conclusion is probably sound: OSS is usually developed by very small core groups. But they need better data before they try to quantify that.
So much for the wonderful idea of many eyes helping me out. Maybe with large projects (although I doubt it, tending to think that you only get a small number of people who actually contribute anything) but for the projects like mine it just doesn't happen.
I've got a crappy FAQ script which I'm rewriting and will also go under the GPL. At the moment it's under some weird-ass restrictive job because at the time (1999) I didn't properly understand the GPL and how cool an idea it is - the rewrite will be GPL'ed but I don't expect to be inundated with patches, suggestions and code.
Recently I mooted my SMS application eGenie as going GPL in the next version. I got 1 person interested and 50 people emailing me demanding I give them the code - no, they weren't interested in helping out - they just wanted the code. Bah, sod that then.
I'm fully aware of the Cathedral and Bazaar ideology, but when there is no-one in the Cathedral and you're the only person giving in the Bazaar, GPL suddenly doesn't seem to be this wonderful solution to bugs, features and support.
Avantslash - View Slashdot cleanly on your mobile phone.
This "STUDY" does not take into account all of the various audits that have taken place in the distributions that use them (RH, OBSD, ...), nor does it take into account all of the smaller patches that have been sent in by dedicated users. What about those projects that are "obvious" but never make it into "mature" according to SourceForge? I would be much more interested in the study if it included "stable/production" projects as well.
I would say this study does more to stereotype OSS developers as 1-man shows rather than as developers with LOTS of feedback from their users.
If you look at the last table in the paper, it's clear that alpha projects have the most developers.
In fact, mature projects should have fewer developers since there's much less left to do. In many cases, it's likely that the one remaining developer can accept all the patches and fix all reported bugs themselves.
Furthermore, the most popular Open Source projects aren't even hosted on Sourceforge. Where's the data on Mozilla, OpenOffice.org, Apache, Bind, the Linux kernel?
Isn't sourceforge designed as an incubator for smaller projects?
Finally, suppose it's true that there are only a few developers on a list. That doesn't even mean they're the only bug-finders. Where are the statistics on numbers of bugs found?
In short, is this just another example of sample-bias?
when they say that the open source community is not a community. Not only are many open source projects not that large that they need a whole bunch of developers, but even then most of them depend on feedback from the users who report bugs and send in small fixes and even suggestions for further development. Those people usually are not listed as developers, but are sometimes mentioned and thanked in the READMEs. They are nevertheless as much part of the development process as the ones that develop the program. And may even become developers themselves after some time. They form the community.
Furthermore, I am not sure if sourceforge is an accurate representation of the OSS community.
***Quis custodiet ipsos custodes***
The paper is only as good as the data from which it is derived.
He used data from SourceForge, and (apparently, I couldn't read through the whole thing) did not contact the individual projects for specifics.
Who is to say that just because a project is on SourceForge, this is the only means being used to manage the development of the project?
A project may only have one developer listed on SourceForge, but who's to say that the sole developer listed isn't just the maintainer, who collects bugfixes, feature upgrades, etc. by some other means (for example a personal CVS server), then takes the code fixes and publishes them to SourceForge for distribution?
Add that to the countless small bugfixes that are sent in to developers via email from random people, and it's easy to throw this paper's theories out the window.
--
I am probably a developer on a dozen projects that use my open source Java libraries. Open source is just different than normal development.
One thing that takes a lot of man hours to do is testing. If the software crashes, an OSS user is many times able to do a backtrace on the core file and give vital information to the developer. This is much harder to do in a commercial product because you usually have to remove any debugging information. Even more mature OSS projects usually go through a release candidate phase. The "release early, release often..." quotes heard with OSS imply we are using you as software testers. Also, a user is more likely to submit bug reports if they know they will be heard and the bugs quickly fixed. Another reason bugs are fixed quickly is the pride and ego associated with the software.
What did we expect, that the Mongolian Hordes technique actually works, or that Brooks's Law is purely theoretical? Lean really is mean, especially for the RAD category that a lot of open source falls into.
Anyway, these figures are spurious: I occasionally submit bug reports, fixes and enhancements to dev teams on SourceForge projects, but I don't join the teams, because I can't commit the effort. But I did review the code; there are just no metrics that capture it. In fact, everyone who downloads, compiles and runs the source is testing the code to some extent.
In any case, I think you'll find tacit agreement that on most software projects (especially once the sales guys panic and start telling you what the customers actually want, halfway through development) creationism is indeed a false ideal, and that it's a few dedicated (obsessed/fanatical/insomniac) individuals that do the vast bulk of the actual code development, while unseen teams break the ground in terms of hardware, requirement capturing and high level design, and clean up squads follow on to fix and maintain the stable versions. There's a lot of scope to be an unsung hero in development; I recently caught a bunch of minor memory leaks in a piece of software that had already been written, reviewed, fixed, and reviewed twice more (i.e. we're still catching bugs on the sixth iteration). And yet, because it's a single file controlled by one developer, it looks like only one person really ever worked on it.
Frankly, I think that for every developer who leaves his fingerprints on the code, there's room for at least three unseen backup guys and gals who do nothing but pave the way, clean up afterwards, and interdict management before they can distract the one productive guy. You just can't let management know that's actually the way it works, because it looks - on paper - like an inefficient process. That's quite apart from the testers, the technical writers, the people who do small parts of any GUI, the IT guys who keep the machines and servers running, the sales, marketing and customer support people who tell you varying shades of truth, and even the receptionist fielding calls for you. Even commercial enterprises don't tend to count these people; my current team has nine developers, but there are at least another two dozen non-technical people whom we absolutely rely on who aren't counted as part of our team. Open source seems to be similar, only with fewer people, and with even some technical people (like testers and casual bug fixers) not being captured in the team size statistics.
If you were blocking sigs, you wouldn't have to read this.
The poster seems to suggest that a few individuals isn't a community. The open source community is a bunch of individuals. This is really not news. How many programmers are working on Samba or Apache? Not many. You don't need shitloads of programmers on a project, and you don't want a shitload of programmers on a project either.
I am not going to reiterate stuff that has already been written down in books, but more programmers doesn't equal more program - usually the other way around.
I work on a team of 3; 2 of us are programmers and one is a UI designer/manager. It works well and we crank out some pretty fantastic stuff.
While a single maintainer or developer wouldn't surprise me, I think that just by virtue of it being open many people submit bug reports more readily, and/or bugfixes where necessary. This is where the "many eyes" come in.
--------------------------------- Born Again Bourne Again Believer: New Life, GNU/Linux Be Free!
By examining only those projects hosted by sourceforge, the author is biased against the most mature Open Source projects of all (none of which can be found there). Some of these being: the Linux kernel, Apache, bind, sendmail, Perl, Python, Mozilla, etc...
All of these are developed by a community.
So there...
Not if there's a reasonably objective basis for choosing the subjects.
Let's see, I'm going to carefully select a test group of people with cars and see what make they drive.
What are your selection criteria? Of course if you don't disclose your selection criteria, the statistics are meaningless. However, this is not what I was suggesting. My suggestion was to select open source projects on the basis of their significance. There are several different criteria one could use to measure significance, but being listed on SourceForge certainly isn't one of them.
First, why limit the study to the top 100 of the mature projects? Since all the data is available, why not include as many projects as possible, unless you've found a specific subset of the data that gives you the conclusion you want?
...
Second, does the study take into account that projects may move from one principal developer to another to another over their lifetime? There may be only one or two at a given time, but there may have been a dozen since its first inception. Perhaps the study took that into account...
Third, "...assistant professor of E-Commerce/Marketing..." I suppose this is the "new education" to go with the "new economy"
I really appreciate those systems that use libraries separate from the interface(s) and modularize their features into groups. I like it even better when different projects work hard to interface with these other projects to create a mesh instead of a mess.
This is in no way saying that hacking is bad, but it is just hacking. We all do it from time to time, but outside of the romantic notion that has been built up from outside and within (often merely accepting the outside perception), the fact remains that in serious development you need to be pragmatic. I don't believe that reducing the number of apps and systems for the simple sake of reducing them is good, but rather that it is good to combine along the lines of overlapping functionality while giving adopters the ability to tailor and customize. That by itself is something that open source has always had that proprietary closed source has never understood (until recently).
Cheers
I'm fully aware of the Cathedral and Bazaar ideology, but when there is no-one in the Cathedral and you're the only person giving in the Bazaar, GPL suddenly doesn't seem to be this wonderful solution to bugs, features and support.
... you're just the only person who happens to be managing your stall in the bazaar at the moment. :-)
It depends on the amount of interest the project generates, how well you get the word out, etc. It sounds like people do report bugs and while, ideally, they would submit patches with the reports, they are reporting bugs you might never have known about and hence never been able to fix, so the quality of your software has benefited directly as a result (even if you have done all the work actually fixing it yourself).
The kind folks at Blender misunderstood the entire software freedom paradigm, as well as the dynamics of free software and open source projects. People have to be interested and excited to spend their valuable (and ever shrinking) free time contributing. They GPLed a skeletal distributed rendering arbiter daemon, then sat back and waited for the community to finish writing the project. When that didn't happen (after all, it was usable in its current form to those of us who knew how to use it, so we used it, reported bugs, and submitted the occasional patch), they concluded that they would have had no gain had they GPLed Blender itself.
Perhaps not, but there are other GPLed 3d modelling and rendering projects that suggest otherwise. It is much more exciting for a programmer/animator/film hobbyist to work on sexy new special effects modules and features for a project than a back-end, distributed rendering daemon that most people don't have the equipment (read: more than one computer) to use anyway.
So they got bad data, made what IMHO was a very bad strategic decision as a result (to not GPL Blender and concentrate on business approaches to leverage that) and now Blender the product and NaN the company are dead, and the community of enthusiasts that grew around it is dying alongside it. A loss to the animation community, to the Linux community, and quite probably a loss to NaN (the makers of Blender) as well.
Another way they didn't understand software freedom was their insistence that "no one will ever have to pay to use blender" (a very kind goal, but NOT what free software is about). There were any number of approaches they could have used in giving out software gratis (charge for documentation, charge for the current version and keep the gratis/libre version a few months behind the pay-to-use version, etc.), that would have been obvious had they understood the philosophy, mindset, and implications of free software and the (software) freedom it represents.
Hmm, I didn't really mean for this to become a requiem for Blender, but in any event I don't blame you at all for being put off by 50 greedy (and likely ungrateful) wretches demanding your code, but remember that this is about freedom, yours as much as those 50 ungrateful wretches (some of whom are, quite likely, deliberate trolls or perhaps even MS astroturf-style agent provocateurs who want you to become disillusioned. The latter would sound utterly paranoid to me, had I not seen it firsthand in action in another context [unrelated to Microsoft]). The bottom line is do you benefit, and are the benefits worth it to you for you to free your code? If the bug reports and community that grow around your tool are beneficial to you, and to the quality of your code, then perhaps it is. If not, then obviously it isn't (though perhaps being able to hand off your code to someone else when you grow weary, or bored, with the project so that it can continue to develop and grow may make it worth your while anyway).
In any event, it's your code to do with as you please, and while effusive gratitude for your doing what many thousands of others think nothing of doing (freeing your code) may not be a realistic thing to expect, I sympathise greatly with your disgust at the ungratefulness many users of free software seem all too eager to display.
BTW - If you're running GNU/Linux, X, KDE/gnome, etc. you aren't the only person going to the bazaar and helping out
The Future of Human Evolution: Autonomy
Furthermore, it would seem to presume (I haven't read it, I'm basing this on the headline) that open source projects on SourceForge are a representative sample of all open source projects. Who's to say that individual developers aren't more likely to use SourceForge while large groups are more likely to have their own servers (e.g., Mozilla.org)? This would explain the gathered data equally well, but it is a completely different conclusion, and I think the data is not complete enough to draw either conclusion. To paraphrase Homer Simpson, "you can prove anything with statistics - 85% of all people know that."
-----
Free P2P Backup, Windows & Linux
I also consider bug reports and feature requests to be a very important contribution. I'm always shocked to hear OSS devs say that this kind of feedback is essentially useless.
While these things are not useless by any stretch of the imagination, it certainly doesn't require an open source program to get them. They are useful to all programmers and are turned in by users on all programs. OSS has always been talked about with the idea that people will not only report bugs, but will also submit bug fixes. It's not the fact that they are only getting bug reports that is the problem, it's the fact that they aren't getting any of the benefits that are supposed to come with open source.
"Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"
Hmm, I'm wondering whether "Mature" software is the right type to be examining. Doesn't mature generally mean "finished development", and that there is little that needs to be improved or expanded for the project?
That would tend to decrease the likelihood of a large developer base. What I'd like to know is if consistent results are seen when the Stable/Production category is used instead of the Mature category.
Fanatically anti-fanatical
Typically, the number of persons who are given authority to enter contributions for projects of any size is much smaller than the number of actual contributors. In "large scale" projects, it is common for a small community of "editors" to be "responsible" for a body of code, by taking the contributions of many individuals, vetting them and ultimately adding these contributions to the code tree.
Thus, for some projects, merely counting the number of committers underestimates the actual number of contributors certainly by one, and perhaps two or more, orders of magnitude.
In addition to the "many submit patches, few apply them to the archive" argument, there's another systematic undercount:
A large project will often have a few people whose job is to integrate the code. Sometimes this means only these integration specialists will have write privileges on the archive. The bigger the project, the more likely this is to occur.
==========
That said, what's "small" about a median of four with write privileges? (The mode of one just means that there are more one-man projects hosted at source forge than N>1 man projects for any particular N.) Four active programmers is a moderately big project, and "median of four" (with mode of non-four) implies there's a bunch with more-than-four as well.
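To make the median-versus-mode point concrete, here is a small sketch. The list of team sizes is made up purely for illustration (it is NOT the study's data); it just shows how a set of projects can have a mode of one and a median of four while still containing plenty of larger teams.

```python
from statistics import median, mode

# Hypothetical developer counts for 11 projects.
# These numbers are invented for illustration, not taken from the study.
team_sizes = [1, 1, 1, 1, 2, 4, 5, 6, 8, 12, 30]

middle = median(team_sizes)       # middle value of the sorted list
most_common = mode(team_sizes)    # single most frequent value

# Count how many projects sit above the median.
larger = sum(1 for n in team_sizes if n > middle)

print("median:", middle)
print("mode:", most_common)
print("projects above the median:", larger)
```

With this invented list, nearly half the projects have more than four developers even though one-man projects are the single most common size.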
Four is a very good size for a large project. Going above that takes a lot of work and is inefficient on a per-programmer basis, unless you have administrative workers who don't program to do the organization. The "human as four-port" analysis shows why.
Consider a human as a "black box" with a number of "ports", representing equal divisions of his time and/or attention. Each port represents enough time per day to communicate with one co-worker or to do one unit of work. Assume also that this amount is such that the number of "ports" on a programmer is about four. (It's probably a bit larger, but four is close and easy to draw.)
In a given amount of time, with a single-committee project:
A one-man project does four units of work.
A two-man project does six units of work.
A three-man project also does six units of work.
A four-man project does FOUR units of work.
And a five-man project bogs down in talk and does no work at all.
With a different number of "ports" the maximum-work group size changes, but the shape of the curve is the same: Adding people first raises the amount of work done, then levels out, then DROPs it until the group paralyzes. For N ports the stall is at N+1 workers. A group of N works about as well as 1, and a group of (N+1)/2 gets the maximum work out. (Ever wonder why you spend half your time in meetings? THAT's why! If the goal is to get the project done in the shortest time regardless of personnel cost, the most effective size for a group of peers has each worker spending about half his time interacting with other workers.)
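The arithmetic above can be sketched in a few lines. This is only my reading of the port model, assuming every worker spends one port on each co-worker and the rest on work, so a team of k with N ports does k * (N + 1 - k) units. The function name `work_units` and its parameters are made up for the illustration.

```python
def work_units(team_size: int, ports: int = 4) -> int:
    """Work done per time slice by a single-committee team (port model).

    Assumption: each member uses one port per co-worker for talking,
    and every remaining port produces one unit of work.
    """
    talking = team_size - 1              # one port per co-worker
    working = max(ports - talking, 0)    # ports left over for real work
    return team_size * working


# Reproduce the four-port numbers from the text.
for k in range(1, 6):
    print(f"{k}-man project: {work_units(k)} units")

# Try a different port count to see the same curve shape.
seven_port_curve = [work_units(k, ports=7) for k in range(1, 9)]
print(seven_port_curve)
```

With seven ports the curve peaks at four workers, which is (N+1)/2, falls back to the one-man level at seven, and stalls at eight, which is N+1.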
Now there are a number of ways around that. For instance:
1 Keep the team small (or one-man) and push out the delivery date.
2 Reduce the "bandwidth" of the communication ports and expand that of the work ports by assigning work on natural modularity boundaries.
3 Build a hierarchical organization, with some people specializing in communication (and doing little or no "work" on the code) and others mostly doing "work" but only interacting with their comm specialist (administrator) and maybe coworkers on closely-associated modules.
4 Build a hierarchical organization with one or a core group making final inclusion decisions and the bulk of the organization doing actual coding in small snippets.
Taking 1 to the limit results in a bunch of one-man projects with long delivery schedules. One man is the most productive on a code-per-manhour basis. But if the project is too large he slows down asymptotically as he approaches his "boggle limit" - the largest codebase he can maintain single-handed but no longer expand. That was once estimated at about 10K-20K lines of code (about the size of the Version 6 Unix kernel, through NO coincidence).
Taking 2 to the extreme is what you get if you consider the open-source movement as a whole as a single project: The developers on each project need little communication with the developers of the others, beyond standards promulgated by a few core developers. Within a project it's the natural way to go, but there are limits to how much it can help.
3 represents your typical industrial software operation. But open-source developers usually hate to become pointy-haired bosses and stop coding themselves, and without paychecks to hand out they have a lot more trouble herding the cats. So a few big open-source projects are run by one or a few notable developers with strong personalities who are able to bite that bullet on managing-over-coding and use reputation points in place of paychecks to motivate their workforces. And the rest are one-man projects or small teams of friends, self-organizing in grand primate style along the lines of 4.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way