Trends in an Open Source Project
Doug Muth writes "On Eric Raymond's website, he has just put a graph
depicting the growth of Fetchmail over the last few years. It's
rather interesting that the number of participants in the project has only
grown linearly - not what one would expect from an open source project.
Anyone have ideas as to why this less than expected growth might be?"
And this proves that Slashdot provides enough eyeballs.
This isn't really too surprising given the nature of the project. Fetchmail is a thing; it's an entity in itself. The number of coders who would contribute to it would be some fraction of the number of coders who are interested in such a project and who have the necessary expertise. This is actually the first I had heard of it, for instance; it's not something I was ever looking for.
For exponential growth I think a couple of things are needed. It needs to be a project that will explode in use by word of mouth and it needs to be the type of project that allows a wide variety of backgrounds to be involved.
Linux as a whole is more like this, and Fetchmail can be seen as one component of it. The kernel itself has a number of subsystems. If your expertise lies mostly in networking stacks, there's a place for you. If the ins and outs of tuning memory architecture for SMP are your forte, then there's a place for you. If you're not so good at coding but can write good, succinct documents, there's a place for you. If the kernel itself holds no interest for you other than as an enabling technology, there are other things you can choose to work on: X11 or other graphics interfaces, themes, productivity tools, games, etc.
The interest in Linux can spread exponentially, at least for a while. When you've got 5% or 10% market share you can grow exponentially for a while. When you've got 98% market share then you can't grow exponentially unless you conquer new markets. The number of potential places a person can work and the variety of interests it matches means that the number of developers can grow as some fraction of the exponential curve.
Now that I've heard about Fetchmail I do have some interest in it, but I probably won't become a developer. That's just not where my talents lie.
Far be it from anyone to think that the NCSA web server wasn't successful, but it has met its demise. It's a project that failed to survive, even in the dawn of open source times.
OTOH, the NCSA web server has died out mainly because it was superseded by Apache, which on a certain level can be viewed as the obvious heir to the NCSA throne (being originally based on the NCSA source, and being mostly upwardly compatible with NCSA). Given that, did NCSA really fail, or not?
> Look at the coding curve. It's logarithmic, and approaching a constant of about 17,000. That means that the additional participants just aren't producing proportionate changes in the open source project.
Just a pedantic point: As a project matures, there will probably be an increasing amount of reworking existing code (cleanup, bug fixes) in proportion to the amount of new code generated. Thus the logarithmic LOC curve probably does not reflect the amount of work going into the project.
Perhaps a more interesting measure would be a plot of the rate of CVS checkins, perhaps weighted by the sizes of the individual checkins. (If anyone can check this for some project, it might provide an interesting/useful datapoint for the community.)
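A rough sketch of what such a measure could look like, assuming the checkin log has already been reduced to (date, lines changed) pairs. The function name and the sample data are made up for illustration; nothing here reads a real CVS repository.

```python
from collections import defaultdict
from datetime import date


def weighted_checkin_rate(checkins):
    """Sum lines changed per calendar month.

    `checkins` is an iterable of (date, lines_changed) pairs.
    Returns a dict mapping (year, month) to total lines changed.
    """
    totals = defaultdict(int)
    for day, lines_changed in checkins:
        totals[(day.year, day.month)] += lines_changed
    return dict(totals)


# Hypothetical sample data, not taken from any real project.
sample = [
    (date(1999, 1, 5), 120),
    (date(1999, 1, 20), 30),
    (date(1999, 2, 3), 45),
]
monthly = weighted_checkin_rate(sample)
for (year, month), lines in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {lines} lines changed")
```

Plotting these monthly totals next to the raw LOC curve would show whether work is still going in even when the total size stays flat.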
But the above does not prejudice your basic thesis; I myself participate as a spectator/part-time-alpha-tester on various development lists, and I'm sure many others do so as well.
Sheesh, evil *and* a jerk. -- Jade
Left shift 1 for e-mail...
Not really surprising. Fetchmail usage may increase non-linearly, but as the program and docs improve, fewer and fewer people see a need to add patches or ask questions.
I think the sheer number of open source projects might keep the number of developers on any given project low. Another factor might be that someone looking to participate in an open source project could conceivably be overwhelmed by that number, and end up selecting more glamorous projects. Not to say that fetchmail isn't a great program, but when it's already stable and full-featured, what need does it have of geometric growth of developers?
The obvious problem is that it would be a bit hard to do (and even a bit harder to "prove correct"), but still... One thing one might consider is to look at all the Linux kernel or Mozilla releases and scan them for e-mail addresses and the like. Maybe only looking at the CREDITS files, maybe also scanning the source itself (e.g. to find driver co-authors that never made it into CREDITS and also those that are explicitly thanked by driver authors for contributing bug reports and fixes). Unfortunately, to make the results interesting, one would also need to be able to compare those numbers with those of the related "user" mailing lists, and collecting that kind of data will be near impossible, I fear. In any case, one would have to be very careful in collecting the data, and even more so in interpreting it.
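For what it's worth, the scanning part is the easy bit. Here is a minimal sketch, assuming the release has already been read into a string; the regular expression is a deliberately simple approximation of an e-mail address, and the sample text and names are invented for illustration.

```python
import re

# Simplified e-mail pattern; an assumption, not a full RFC-compliant matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_addresses(text):
    """Return the set of distinct e-mail addresses in text, lowercased."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


# Made-up sample, not copied from any real CREDITS file.
sample = """
N: Alice Example
E: alice@example.org
N: Bob Example
E: bob@example.com
Thanks to Alice <Alice@Example.org> for bug reports.
"""
addresses = extract_addresses(sample)
print(len(addresses), "distinct addresses")
```

Running this over each release and plotting the count over time would give a crude contributor curve. It would still count people who changed addresses twice and miss anyone credited by name only, so the caution about interpretation applies.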
It probably won't be done, but I would not at all be surprised to find developer curves very similar to the fetchmail one. The curves for the number of users may look very different, but my guess is that the developer one would not.
--
Linux user since early January 1992.
fetchmail doesn't seem like the most high-profile open source project. I haven't needed to use it yet, but I've read about it.
I'd like to see figures on the Linux kernel, Samba, GNOME, KDE and other projects before I pass judgement on the scalability of open source.
They're spectators. IOW, they're subscribed because they want news about Fetchmail, or because they're enamoured with ESR. They aren't coding.
Look at the coding curve. It's logarithmic, and approaching a constant of about 17,000. That means that the additional participants just aren't producing proportionate changes in the open source project.
ESR has graphed how the popularity of his fetchmail project has grown over time, which could very reasonably be linear for such a specialized application. He has not graphed how an open source effort grows. A more suitable graph would indicate number of contributors rather than constituents of the mailing lists.
-konstant
Yes! We are all individuals! I'm not!
I see no reason why it should be linear nor greater than linear, but plenty of reasons why it should be less than linear or possibly logarithmic.
After all, as projects near a state of completion, you'd expect fewer and fewer bug fixes and enhancements from developers both old and new. If there is any pressure for change resulting from an increase in a project's overall audience, it's certainly not very much, just a secondary effect.
What makes you think the opposite?
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
Is it reaching completion?
Features are not being added to IMAP and POP3
at anything resembling an alarming rate, and
judging from the discussion on fetchmail-friends
for the past year or so, most people are happy
with it now, once they get it to work with their
mailer. I do not fall into this category; I do
bizarre things with fetchmail. And, ok, I can
think of a few features that might be useful, but
they are all better handled in the MTA.
As a project manager, Eric does quite well, no doubt. He had the presence of mind to gather this
data. And the wherewithal to turn out the graph.
What I'd like to know is whether gnuplot turned out this graph as is, or if the gimp was involved.
Getting a scatter chart is easy, but getting it to look like exactly what you want, for an ad-hoc chart, is not.
I also wonder what would be involved in adding color support for gnuplot to PNG.
-fb Everything not expressly forbidden is now mandatory.
Here's another set of graphs tracking a free software project -- I've been tracking the number of packages in Debian, as well as how many packages are built with different tools, especially my debhelper tool that most debian packages build with these days.
I'm also seeing linear growth, though the angle is steeper.
see shy jo
Normally growth patterns in software usage tend to be explosive when it's a new phenomenon with a hungry need - the IBM PC, linux, netscape downloads, yahoo, amazon.com, etc. After the "revolutionary" period has worn off, the rate of growth stabilizes or becomes linear. At some point, it is destined to tend towards flatness, since the number of users is finite.
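One common way to model this "explosive, then roughly linear, then flat" pattern is a logistic curve. The sketch below is only an illustration of that shape; the capacity, rate and midpoint values are arbitrary assumptions, not fitted to fetchmail or any other real data.

```python
import math


def logistic(t, capacity, rate, midpoint):
    """Logistic growth: near-exponential early, flattening toward capacity."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


# Arbitrary illustrative parameters (assumptions, not measurements).
capacity = 10000.0  # finite pool of potential users
users = [logistic(t, capacity, 1.0, 10.0) for t in range(0, 21)]

# Growth per time step: small at first, largest near the midpoint,
# then shrinking again as the finite pool of users is used up.
increments = [later - earlier for earlier, later in zip(users, users[1:])]
for t, gain in enumerate(increments):
    print(f"t={t:2d} -> {t + 1:2d}: +{gain:8.1f} users")
```

Around the midpoint the per-step gain changes slowly, so a graph that only covers that stretch can look close to linear.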
My guess is that since fetchmail was catering to a well-established community, it was by definition serving a smaller subset of a mature, growing community, and its rate of growth was likely to be smaller than the parent's. Also, it was not revolutionary and did not fill a major hunger, so it was not likely to grow at a > linear rate.
L.
So the "exponential growth" of the project levelled off early.
Apache and the Linux kernel have much larger scopes, but Apache is pretty simple (most of the explosions in development are in third-party modules and CGI scripts) and Linux' growth in complexity is levelling off. Look at the size of the kernel tarballs over time - the complexity of the project is no longer growing exponentially. So I would expect developer interest in these projects to level off, if they haven't already.
Open source projects ask for talented programmers: not people who just understand the basics of cout and cin, but people who really have skill at programming. While programmers are a dime a dozen, good programmers are not, and the number shrinks further when you ask for talented programmers with the free time to work on an open source project. Some have it, but many don't, and this ratio doesn't magically change. This would be why I think the numbers on projects don't change logarithmically.
I'm a loner Dottie, a Rebel.
Some projects have larger "habitats" - the excitement of Linux is that it's been able to jump from a niche of hobbyists to business applications and even some lower-level users. It has expanded its environment, and thus has room for more rounds of exponential growth. The Internet itself saw this phenomenon when it jumped from only technical/professional users to ordinary people.
A specialized mail application does not have this potential (unless it somehow manages to become indispensable, a "killer app").
Thus exponential growth ceases fairly quickly for it.
- Seth Finkelstein
Resources: The number of people participating in such projects is not increasing exponentially. I would assume a fairly steady percentage of the overall human population is geeky enough to be a candidate for such a project.
While the human population on Earth is showing geometric growth, the population of the high-tech countries isn't. So it seems safe to assume linear growth of the overall population of possible participants.
And the percentage of internet users who can actually program must have been decreasing over the last 4 years.
Competition: There are more and more projects competing for resources. The number of such projects has probably shown geometric growth during the last 2 years.
If you take these two factors together, an average project would grow logarithmically or even not at all.
For Eric this would mean that he is doing very well with fetchmail, as he is able to keep his percentage of the "market".
As a side comment: Hats off for collecting all this data and making use of it. I've seen quite some data graveyards in project management. This one is interesting. Thanks, Eric!
Metrics are a fascinating part of software engineering, though not to be taken too seriously. The problem with them is that if you start taking them seriously, they become the goal of the team. So if you measure lines of code produced, you end up after a while with loquacious coders stuffing code into source control. If you measure defects corrected, you end up with lots of shoddy code being checked in and subsequently fixed...
And if you measure subscribers to devel & support mailing lists, you get lots of new subscribers. Eric, please post an update to your graph tomorrow so we can see the slashdot effect! :)
Left shift 1 for e-mail...
So, you might want to look at a few different trees, with root nodes like the formation of the GNU project, Linux, the Berkeley System Distribution, and the NCSA Web Server Project. I suspect you can make a case for exponential expansion that way.
Thanks
Bruce Perens
There's an easy answer, Fetchmail is a good package with very little work to do. A programmer with some time to work on a new project would be hard pressed to find something worthwhile to do with Fetchmail: it supports almost anything; for most transfers, the server is the performance bottleneck, not Fetchmail; it's stable and reasonably bug-free; it's ported to most relevant platforms. Without having something interesting to do, the programmer is going to look for a different project to work on.
----
Open mind, insert foot.
Years divisible by four but not by 100 are leap years. Years divisible by 400 are also leap years. Thus, 2000 is a leap year, 1900 and 2100 are not.
Y2K *is* a leap year; 2000 is divisible by 400.
Our concern is that many people still get this wrong, as you have just demonstrated.
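For anyone who wants the rule spelled out, here is a minimal sketch of the Gregorian test as stated above. The function name is my own; the check against Python's standard `calendar.isleap` is only there as a sanity test.

```python
import calendar


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 1996, 2000, 2004, 2100):
    # Cross-check against the standard library implementation.
    assert is_leap_year(year) == calendar.isleap(year)
    print(year, "leap" if is_leap_year(year) else "not leap")
```

The usual mistake is to stop after the "divisible by 100" exception and forget the 400 rule, which is exactly the case that matters for 2000.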
See also
http://www.interlog.com/~r937/lycomplaint.html
http://www.mitre.org/research/y2k/docs/LEAP.htm
http://www.urbanlegends.com/science/2000_a_leap
rant
At the bottom of ESR's page is a link to this image, which displays a graph of the Linux kernel with files/1000, lines/10,000, words/100,000 and source tree size (MB).
My personal theory on this problem is that people tend to work on things that don't work for them. For me, I've never had a need to work on fetchmail, as it's always dealt with my environment and desired setup flawlessly.
I'd love to come up with a more detailed analysis of this stuff, as you mentioned though. Perhaps somebody taking some sort of sociology class could find an excuse to write this paper?
But if you think about it, all 'living things' die. But NCSA httpd's 'offspring', Apache, lives on. Seems like a pretty normal sort of life cycle to me I guess.
The difference between archie->download.com and NCSA httpd->Apache is that the Apache project really did start out as a descendant of NCSA httpd. download.com and archie are totally unrelated code bases that kinda fill the same niche. Although the Apache developers have rewritten just about all of Apache by now, originally it was A-patch-y version of NCSA httpd. As NCSA httpd died, Apache grew to fill in the gap just as organic offspring live on to further their species as their parents die.
Wouldn't you expect the number of competent programmers to grow as the open source movement grows? Consider these points:
1. Reading good source is necessary to learn programming - now more good source is available.
2. The concept of playing with code is now open to people who might not have thought of it before. (Scientists with programming skills can develop programs that previously had to be purchased.)
3. Programming tools are cheaper and better than ever. Now it's getting easier to code.
Ballerinas have fins that you'll never find
I think the assumption that the number of participants - I mean programmers + mailing list subscribers - grows exponentially is unfounded.
Why should it be?
There are two fundamental limiting factors
1. The growth of the internet may be exponential,
but it's not clear that this means the number of people who are _interested_ in participating (even just by subscribing to a mailing list) grows exponentially.
One has to remember that the internet doesn't create "technical" people but in reality eats its way into the "normal" population. Naturally the percentage of "technicians" among newcomers to the internet will in fact drop very fast.
2. I guess the number of open source projects grows exponentially itself.
The answer is very simple - Fetchmail is SO evolved and covers its ground so well that there is very little one might add to this project before it bloats (something I don't think ESR is likely to allow to happen). Therefore, the project is simply not very interesting in itself.
The linear growth we DO see is probably due to things people have been doing WITH Fetchmail rather than "IN" Fetchmail.
Gilad.
- "fetchmail-announce" grows because it is a low-volume list with announcements of interest to all users, not only contributors. It must be a low-volume list, so people do not feel much need to unsubscribe.
- "fetchmail-friends" is self-limiting. Too much discussion tends to drive away those who are not interested and participating. The constant number of members suggests it is an active enough group that people are unsubscribing when they are not interested in the discussion. If there were no discussion, fewer people would bother to unsubscribe.
- The line count is changing slowly because it is a special-purpose tool. It is undergoing adjustments and improvements, but its basic function is unchanged. It just works, and people use it.
This is not something which needs dozens of modules and reports to meet different needs. You see geometric growth in a growth medium, not inside a steel girder. What you see in a steel girder is its structural support and the occasional attachment of a needed improvement.
>>Most Open Source users seem to think programming is just something you pick up on the weekend and then you whip out
>>9 or 10 professional-grade apps whenever you want to
See, now if we had Visual Basic for Linux this wouldn't be a problem....
Measuring complexity in terms of number of lines does not seem a good idea to me. Adding drivers makes the number of lines increase significantly, but not necessarily the complexity, when different drivers are just variations of the same code to suit different hardware specs.
On the other hand, leaving ext2fs for a journaling file system might not cause a spectacular increase in the size of the code, but it will increase the complexity dramatically.
---
Dev elpizw tipota, dev phoboumai tipota eimai lephteros http://euclidian.org
It would be even more interesting to have a comparison with figures plotting the usage of fetchmail. This is of course impossible, but would probably give figures that would make people upset about the linear growth in the graph feel better.
bakes
--
Ho! Haha! Guard! Turn! Parry! Dodge! Spin! Ha! Thrust!
In his script, Eric has:
# We don't deal with leap years here because the baseline day is after
# the last leap year (1996) and there's a long time before the next
# one (2004).
Oh dear, oh dear, oh dear...