Ohloh Tracks Open Source Developers
eldavojohn writes "The startup company Ohloh has a database listing 70,000 developers working on 11,000 open source projects. Their aim is to 'rank' open source developers, which raises some interesting questions about exactly how useful this tracking company is. Questions like, 'Is there an accurate way beyond word of mouth to measure the importance and skill of a developer?' I found it slightly alarming that, to this site, the number of commits (with input from the number of kudos) tells how good a developer you are."
I don't know how representative it is, or if it might improve over time, but I looked myself up.
I found mentions in 5 projects - _except_ they're all just versions of 2.6 kernel source with the same contribution for an obscure TV card cx88 variant I did. In practice, I'm sure I'm hardly alone in having contributions (mostly in small ways, but sometimes very considerably) to over 100 projects over the years. I guess I have to go through and add some of those projects.
Naw, CBA. At least I can make sure my resume is accurate.
I tried to think of metrics to relay up the chain (a special thank you to the stat-scm goal in Maven), but I came up with some pretty lame ones:
- Code-to-comment ratio is desired at 1:1 (at least in the commercial world)
- A desired class/method/function/procedure/module size should be defined, and code rated against it
- # of Unit tests
As you can see, these are the ones that I found could be gathered automatically, and even these have exceptions. Anything else I think of either takes too much time to gather or is subjective. This is tough. I would like to default to peer review, but I often find teammates voicing their personal hatred for an individual or taking personal qualities into account when ranking a developer. Real-life example: Teammate A is from MIT, and teammate B thinks everyone from MIT is a god. Unfortunately, Teammate A hasn't done anything but criticize everyone's code without any constructive comments on how to make it better.
I submitted this story hoping it would open a dialog on measuring coding abilities in a semi-automated way.
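The three automatically gatherable metrics listed above can be sketched for Python source with the standard `ast` module. This is a minimal illustration under stated assumptions, not what stat-scm or Maven actually computes: a comment is any line whose first non-blank character is `#`, docstrings count as code, the 30-line "desired size" is a hypothetical threshold, and a "unit test" is simply a function named `test_*`.

```python
import ast

MAX_FUNCTION_LINES = 30  # hypothetical "desired size" threshold


def source_metrics(source: str) -> dict:
    """Gather comment ratio, function sizes, and unit-test count from Python source."""
    comment_lines = code_lines = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither code nor comment
        if stripped.startswith("#"):
            comment_lines += 1
        else:
            code_lines += 1  # includes docstrings and lines with trailing comments

    tree = ast.parse(source)
    funcs = [node for node in ast.walk(tree)
             if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]
    # end_lineno is available on AST nodes from Python 3.8 onward
    sizes = {f.name: f.end_lineno - f.lineno + 1 for f in funcs}

    return {
        "comment_to_code": comment_lines / code_lines if code_lines else 0.0,
        "function_sizes": sizes,
        "oversized": [name for name, n in sizes.items() if n > MAX_FUNCTION_LINES],
        "unit_tests": sum(1 for f in funcs if f.name.startswith("test_")),
    }
```

Even this toy shows the exceptions the poster mentions: padding a file with `#` lines moves the ratio toward 1:1 without making the code any better.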
I'm sure I'm not the only one who looked at that headline and wondered, "How the hell did even a Slashdot editor misspell 'Ohio' that badly?" Even Taco could probably get within one letter of correct, if only because he's from Michigan.
This reminds me of how academics are increasingly judged: by how many papers they publish and how many other people cite them, rather than by the quality of each paper's work or the standing of the citing party. Accordingly, many authors inflate their 'impact' scores by splitting up papers and publishing non-advancing science. No one can blame them for this, as many are trying to justify themselves to their departments or are still on the postdoc merry-go-round, looking for new jobs every 18 months.
You can't effectively rank developers. First, there are just too many to rank. Even in college football, where thousands of people are paid every day to monitor it, they don't try to rank all of the ~119 Div 1 teams, just the top 25. Second, there isn't a simple metric to rank developers. It's about as smart as saying, "Look, I did the most work on this project because I wrote the most lines of code."
This could even have a negative effect if developers get concerned about their ranking and try to game the system instead of making quality contributions to projects.
Most of my contributions were on website documentation, wikis, or mailing lists, which aren't included in these metrics. At the moment, a lot of my commits are done on repositories not directly available to the public. While I don't really need Ohloh to tell me if I've contributed to a project or not, it's still a little annoying.
And what about contributors who submitted patches that had to be committed by someone else? Or people who contribute by providing help on IRC channels, blogs, forums, or other mailing lists?
While ohloh metrics can be useful, they also need to be taken with a grain of salt, particularly the contributor metrics. They're a bit more useful on measuring a project as a whole (but they still miss a lot of activity).
What the heck is their business model, or is this just a hobby site? About the only way I can think of to make some money is to take some under the table in exchange for a higher rating.
Would this discourage contributors to open source projects? Now if I put on my resume that I've contributed to an open source project, somebody is going to want to look me up. I have to deal with all that baggage when I just wanted something to do in my spare time. Also, I really am not sure I feel comfortable being given an absolute rank. People always bring different skills and approaches to different jobs, and I don't think you can reasonably say one is better than another. I've worked in teams where everyone respects the different capabilities and limitations of each member. It's sort of like arguing there is an absolute thing known as "intelligence". Is there really such a thing, or do we all just bring different skills, perspectives, and approaches to the problems we solve? I'd prefer to think the latter: everyone contributes what they can but has their own limitations. Talking about absolute "intelligence" or "value" seems condescending and elitist.
It's as good a measure as any.
Few commits means either you're Donald Knuth, or you're not that actively developing your code.
In Open Source active development does tend to mean a reduction in crapness, software wise.
What else it could say I don't know, but since there are few, if any, definitive means by which code quality can be measured (and don't give me that lines-of-code-versus-man-hours rubbish, I heard enough of that nonsense at uni), it's probably a reasonable metric.
http://en.wikipedia.org/wiki/Wikipedia:Editcountitis
So in other words, I could commit some of my own code to a CVS repository, find some errors that I missed, fix them, commit it again, decide to add more comments, commit it again, find one more thing I probably could have done differently and then rewrite it, commit it again...
And I would be ranked highly as a great developer?
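The thread doesn't give Ohloh's actual formula, so the scorer below is purely hypothetical: one point per commit plus a fixed weight per kudo (`kudo_weight` is an assumption). It illustrates the gaming problem this poster describes: the same 400 lines of work earn four times the score when landed as four commits instead of one.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    author: str
    lines_changed: int


def naive_score(commits, kudos, kudo_weight=5):
    """Hypothetical ranking: one point per commit plus kudo_weight per kudo."""
    scores = {}
    for commit in commits:
        # Size is ignored entirely: a one-line fix counts the same as a rewrite.
        scores[commit.author] = scores.get(commit.author, 0) + 1
    for author, count in kudos.items():
        scores[author] = scores.get(author, 0) + count * kudo_weight
    return scores


# Same 400 lines of work, landed two different ways.
one_big = [Commit("alice", 400)]
four_small = [Commit("bob", 100) for _ in range(4)]  # fix, comments, rewrite...
print(naive_score(one_big + four_small, kudos={}))  # {'alice': 1, 'bob': 4}
```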
"Kudos" is not plural, just a word that happens to end in "s", like "pathos". "Kudo", as used on that site, is as meaningless as "etho" or "mytho". The more frequent references to "many kudos", or other treatments of it as a countable noun, are also incorrect, although much less jarring.
Wouldn't such a system assume that everyone uses only one handle, or their real name, all the time for every project? If so, then a lot of people who contribute under multiple handles, nicks, or whatever you want to call their identities are going to be missed or severely under-rated.
I would rather not have my real name attached to most of what I've contributed. One, because my code is so damn sloppy that it's embarrassing. Two, because I don't want the hassle of my real life - you know, offline - and my, uh, "digital lives" conflicting with each other. Three, if I was easy to find - online - I run the risk of being pestered with silly tech support questions.
UrCreepyNeighbor, while an accurate description of my personality, is one of many identities I have. Same could be said of almost everyone. I'm sure "HotChic17CA" doesn't use that username when she's talking with her grandmother, for example.
You only have to comment on slashdot to tell that this is a really bad idea. You have people modding a comment "troll" because they don't like a stated opinion, for example. You have people modding a first post as "redundant" and a spot-on comment as "offtopic". People suck, especially at judging other people.
And on a thing like that, you may have someone who knows absolutely nothing about code making judgements about coders.
It's a stupid idea. It actually sounds like some harebrained idea thought up by a PHB. This idea needs to die quickly and horribly.
-mcgrew
This is completely ridiculous. They are attempting to judge value when there simply is no objective measure for the kind of things they are trying to judge.
I think I would object to having my name listed on this site, even if the "rating" were high.
It's not a measure at all. For example, on some projects a major contributor might not even have commit rights; the code would have to be committed by others. And so on.
They are attempting to measure something for which there is no consistent measure. As a consequence, there is no question that their "ratings" MUST be distortions.
So in other words, I could commit some of my own code to a CVS repository, find some errors that I missed, fix them, commit it again, decide to add more comments, commit it again, find one more thing I probably could have done differently and then rewrite it, commit it again...
Your willingness to fix errors, add comments, and do code rewrites puts you in the pantheon of programming gods! Next you're going to tell me you actually write your own legible "how to" user guides in PDF!
Like the bards of olde, OS devs don't code for money. They code for prestige and fame amongst their fellows! Surely this site will decide who is the greatest dev to walk the earth. And that dev will have his own code set in stone and copied for ages to come. That developer will be legend.
Unless, heaven forbid, the voting is more like the U.S.'s political system.
http://www.ohloh.net/projects/3547/contributors/1354
# of kernel builds
# of ICQ shouting matches ending in "Nazi!!!"
# of cans of Jolt consumed
# of steps from mom's basement to side door
No! You got it all wrong! Everybody knows that programmer productivity is inversely proportional to the number of Slashdot posts!
It's been done before. It's called Advogato: a site where developers can join, blog, and rate each other based on a trust metric.
Some people will get shiny glory, and some will feel annoyed because their projects/contributions have not been tracked.
I work on Amanda, but the site misrepresents my contributions in two important ways, too. First, I commit a lot of other people's patches, so my name appears in the ChangeLog a lot less often than it appears in the commit log. Second, Amanda moved from CVS to Subversion a few years back, and Ohloh doesn't index the old CVS commits. As a result, the project is marked as just a few years old (it was originally written in '92), and only a few of its many historical contributors are listed. I would like to see some way to "correct the record," but I suppose that's pretty hard.
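One way to separate the two counts this poster describes is to credit the patch author named in the commit message rather than the committer. The `Patch by:` line below is an assumed convention chosen for illustration, not something Amanda or Ohloh is said to use; a project without a consistent convention can't be scored this way.

```python
import re
from collections import Counter

# Assumed convention: the real author is named on a "Patch by:" line.
PATCH_BY = re.compile(r"^[ \t]*Patch by:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def credit(commits):
    """commits: iterable of (committer, message) pairs.

    Returns (by_committer, by_author) commit counts.
    """
    by_committer, by_author = Counter(), Counter()
    for committer, message in commits:
        by_committer[committer] += 1
        match = PATCH_BY.search(message)
        # Fall back to the committer when no separate author is credited.
        by_author[match.group(1) if match else committer] += 1
    return by_committer, by_author
```

A maintainer who mostly applies other people's patches then ranks high in `by_committer` and low in `by_author`, which is exactly the gap between the commit log and the ChangeLog.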
Registering large amounts of personal, almost political, data about a huge number of people without asking their consent would run strongly counter to European privacy law. Selling access to that data, or mining it for commercial gain, even more so.
It's a very simple model really, when you think about it. Let's examine their possible train of thought:
Sites can sell advertising when they get lots of frequent users. Sites need users to get users. Sites need some kind of user list to bootstrap. Where can you get a big list of users from? Why, isn't that opensource stuff based on lots of people communicating in the open, over the net? Oh, hey, let's use those suckers. Hmm. How can we make more suckers sign up after the first ones? Hmm... we need to make the ones who aren't in the DB feel like they should be. I know... rank them!
Hmm... do these people care about rankings? Will it actually be useful? Ahh... Who cares?
This measurement is not particularly good, but what software metrics are?
However, it is still a brilliant move, as it will motivate a lot of developers to add projects to Ohloh's database. Developers will add just the projects they have contributed to, so that their ranking will go up.
Most open source projects' commits are already gathered on cia.navi.cx. I don't see what Ohloh can add besides a link to the real name, which is easy enough to find out anyway.
This could greatly facilitate finding the right people to fund for OSS development,
especially if it's used by the right companies (IBM, Google, Red Hat, Novell, ...).
Quantitative metrics don't work on developers. As soon as a developer learns what it is, they are smart enough to game the system.
I [commit] can [commit] game [commit] any [commit] system [commit] based [commit] on [commit] commit [commit] counts[commit].[commit]
Numver off bugz fixd es eze 2 gaeme two.
Bug-free code and low bug recidivism are easy. [Have a tester check the code before check-in.]
Number of projects? Sure. Every possible sub-component now has its own source tree and project space.
Lines of code? Sure, I can write lots of code. It's one of my favorite things to do. On and on and on I can write code. It's like there doesn't have to be an end to what I say. Lots of productivity here. Oodles and oodles of work product. Lots of code means lots of productivity, and man, I can sure be productive when I have to be. blah blah blah blah balh....
you don't have to use your real name on the interwebs, hoss
I can just see the tv ad:
"Wow! Collect yours today"
Then 2 kids in school uniforms
"I'll trade you my RMS for your Linus and Eric S Raymond!"
There are facilities in Ohloh to map aliases, and it seems anyone can use them. I don't know if there is any conflict management; they seem to take a wiki approach throughout their database.
I don't see a problem here: free software contributions you don't want associated with your legal name won't be associated with your legal name.
That tries to create scarcity out of the abundance of open source.
I am listed as two people with the same pseudonym; my real name is not found. I am listed for two related projects belonging to the same organization. Both of me have the same score albeit for different skills. Ohloh obviously only checks commits to the main branches; my commits and LOCs to an experimental branch of one project would drown my official commits. I won commit status due to my assistance on the mailing lists and a lengthy complicated patch for critical functionality; my name is in the credits, but the commit is credited to someone else. I cured a critical bug in a third project (a component affecting many FOSS projects), but my patch was included as text in the bug tracking system and committed by and credited to someone else. The system measures lines of code, but great programming often reduces LOCs (as happened in the patch to the third project.) I am about to post code in a bug-tracking system that will not be integrated into the project due to management's objections ("This function has been broken for 8 years and we are afraid to fix it"); this will not be credited. Contributions to official documentation are ignored, as is useful information on personal websites.
Another committer has a much higher score. He is involved in more projects and has committed tons of code to the official branches. He has kudos from other developers. From researching several developers whose work I know, I agree with the scores (except mine. My fame is in the corporate world for completing critical, urgent IT projects. I am a very minor player in the FOSS world. I am surprised my score is so high based on the limited contributions considered.)
Some suggestions:
1. List one person as one person. The organization requires a unique pseudonym for all projects. Start with the page translating the pseudonyms to real names.
2. Look at mailing list activity and distinguish between questions and answers. The first post in a user thread is a question; additional posts are often requests for information; the final post not from the original poster is usually the solution. The first post in a dev thread is often a new idea; additional posts are clarifications so give credit for length to avoid "+1" posts but avoid crediting lengthy log listings.
3. Look at bug-tracking systems. Many official commits are patches provided by non-committers through the bug-tracking system.
4. Counting LOCs is usually a poor determination of the quality of contributions. One of my official commits added code. My commit was reverted and replaced with half the LOCs of the original code. While I would accept credit for the concept that was implemented, those LOCs should not contribute positively to my score. Did the committer writing the excellent replacement code lose points for reducing LOC?
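Suggestion 2 is concrete enough to sketch. Everything below is hypothetical rather than anything Ohloh implements: the `Post` type, the 40-character minimum that filters out "+1" replies, and treating quoted (`>`) or date-prefixed lines as log listings that earn no credit.

```python
import re
from dataclasses import dataclass

# Hypothetical filter: quoted text or timestamped log output earns no credit.
LOG_LINE = re.compile(r"^\s*(>|\d{4}-\d{2}-\d{2})")


@dataclass
class Post:
    author: str
    body: str


def classify_user_thread(posts):
    """Label each post in a user thread: question, follow-up, or solution."""
    if not posts:
        return []
    original_poster = posts[0].author
    labels = ["question"] + ["follow-up"] * (len(posts) - 1)
    # The final post not written by the original poster is taken as the solution.
    for i in range(len(posts) - 1, 0, -1):
        if posts[i].author != original_poster:
            labels[i] = "solution"
            break
    return labels


def dev_reply_credit(post, min_chars=40):
    """Credit a dev-thread reply by the length of its own prose."""
    prose = " ".join(line.strip() for line in post.body.splitlines()
                     if line.strip() and not LOG_LINE.match(line))
    # Short replies such as "+1" fall below the threshold and earn nothing.
    return len(prose) if len(prose) >= min_chars else 0
```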
Ohloh ignores many important types of contributions and poorly assigns credit for much work to the committer rather than the contributor. Ohloh does not distinguish between types of commits; many commits are correcting bugs introduced by the same committer. Assigning importance ratings to commits requires integration with bug-tracking systems; CVS and SVN do not have scoring mechanisms.
Ohloh cannot score personality. Much of my career has been accomplishing the "impossible" (whether functional or due to deadlines) as my focus is on business usefulness rather than technical limitations. One person (scored 9 by Ohloh) is an incredible programmer once someone proves something can and should be done, but is extremely resistant to new ideas. Creativity is not reflected in the scores.
Ohloh has a good idea. I like the kudos system (and hope merit wins over popularity.) The system still needs work to be accepted as a reliable source of useful information.
If you bother to go to Ohloh and look, you will see that once you create an account, you can link to it all the different names you have used for different projects, so that Ohloh knows they are all you.
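Conceptually, linking names to an account means folding per-name counts into a per-account total. A minimal sketch follows; the data shapes are assumptions, not Ohloh's actual schema, and it rejects a name claimed by two accounts, which is one simple answer to the conflict-management question raised earlier in the thread.

```python
from collections import Counter


def merge_aliases(commit_counts, aliases):
    """Fold per-name commit counts into one total per account.

    commit_counts: {name seen in a repository: commits}
    aliases: {account: [repository names linked to that account]}
    """
    owner = {}
    for account, names in aliases.items():
        for name in names:
            if owner.get(name, account) != account:
                raise ValueError(f"{name!r} is claimed by {owner[name]!r} and {account!r}")
            owner[name] = account
    merged = Counter()
    for name, count in commit_counts.items():
        merged[owner.get(name, name)] += count  # unlinked names stay separate
    return merged
```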
I like the dollar figure ohloh attaches to projects. "This is what it would have cost an enterprise to develop this software." It really gives you an appreciation for how much the open source community is giving to the world.
Please correct me if I got my facts wrong.
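The thread doesn't say how that dollar figure is computed. A common basis for such estimates is the basic COCOMO model in organic mode, effort = 2.4 * KLOC^1.05 person-months; the sketch below assumes that model and a hypothetical $55,000 yearly salary, so treat it as an illustration rather than Ohloh's own calculation.

```python
def cocomo_estimate(lines_of_code, yearly_salary=55_000):
    """Basic COCOMO, organic mode: effort = 2.4 * KLOC ** 1.05 person-months.

    yearly_salary is a hypothetical figure; the cost is what paying
    developers at that rate for the estimated effort would come to.
    """
    kloc = lines_of_code / 1000
    person_months = 2.4 * kloc ** 1.05
    cost = person_months / 12 * yearly_salary
    return person_months, cost
```

Because the exponent is above 1, estimated effort grows slightly faster than code size, so a 100,000-line project comes out at more than 100 times the effort of a 1,000-line one. It is still a lines-of-code measure, with all the caveats raised elsewhere in this thread.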
Once more, a coding-related site which lumps C and C++ together into one "C/C++" agglomeration. That's the point where I stopped looking.
An increasing number of projects are using Mercurial these days. That seems like a not-small oversight.
Jonathan S. Shapiro (The EROS Guy)
After I gave up looking for good single metrics, years ago I invented a "karma" algorithm for use in CVS Monitor (google it).
The karma score is calculated for a single commit, and uses a combination of lines and files added/removed/changed (with some munging so that file moves don't get line scores), and then adds to that the size of the commit message in bytes, and then applies a maximum upper score per commit.
It has worked pretty well, since all the best ways of gaming your karma score also encourage good practices. For example:
* Write longer and more detailed commit messages
* Commit in several smaller pieces rather than large amounts at once
* You get (up to) double points if you implement on a branch and then merge the branch to trunk. As a bonus, you can pick up major points if you are the one who takes responsibility for merging all the branches to trunk.
Looking at several historical projects, the scoring system does seem to reflect the "importance" of each developer in a project to a reasonable degree.
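The karma rule described above can be sketched as follows. The ingredients (files and lines added/removed/changed, no line points for moves, commit-message size in bytes, a per-commit cap) come from the description; the weights `FILE_POINTS` and `MAX_KARMA` are hypothetical, and this is not CVS Monitor's actual code. The branch-then-merge doubling falls out naturally, because the merge commit is scored again.

```python
from dataclasses import dataclass, field

FILE_POINTS = 10   # hypothetical points per file touched
MAX_KARMA = 500    # hypothetical per-commit cap


@dataclass
class FileChange:
    kind: str              # "add", "remove", "change", or "move"
    lines_added: int = 0
    lines_removed: int = 0


@dataclass
class Commit:
    message: str
    files: list = field(default_factory=list)


def karma(commit):
    """Score one commit from its files, lines, and message size, then cap it."""
    score = 0
    for f in commit.files:
        score += FILE_POINTS
        if f.kind != "move":  # moved files earn file points but no line points
            score += f.lines_added + f.lines_removed
    score += len(commit.message.encode("utf-8"))  # longer messages score more
    return min(score, MAX_KARMA)


big = Commit("Add parser", [FileChange("add", 2000)])
pieces = [Commit("Add parser, part %d" % i, [FileChange("add", 500)]) for i in range(4)]
print(karma(big), sum(karma(c) for c in pieces))  # 500 2000
```

Because of the cap, landing the same 2,000 lines as four commits earns four times the karma of one giant commit, which is the "commit in smaller pieces" incentive from the list.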
Instead of trying to evaluate free software developers, why not help them instead? We know who they are, their names and emails are just a click away from the About box of the software we use, and most of them are googleable. Some of them aren't so lucky in their life outside free software. Some would appreciate some donated webspace, computer hardware, or other support. I have started AlgoLibre.org as a first effort to remind people that some free software developers may need our assistance. If there is any service benefiting free software developers you would like to run, you can run it through AlgoLibre, which is strictly non-commercial as well. Or just drop in and share your interest in the project, perhaps subscribing to the Recent Changes RSS feed if you want to stay tuned. I actually first started the project when I had the idea to provide some kind of medical insurance to all free software developers, so that in case a developer gets ill we can contribute to their medical expenses, and it should run in a wiki-like distributed non-centralised fashion just like free software works, or like a philanthropic circle: developers would explain their need on the site and users would choose whether to contribute or not.
I wrote a paper way back when on a very similar topic. I called it "distributed auditing," and the goal was to try and improve the quality of the code -- the quality of the auditors (and the developers) was a side-effect. I've put that paper up for everyone to enjoy; please don't hammer my site too hard. :p
My biggest active FOSS project has gone through two forks in its lifetime. This was not because of me, but because of problems when the companies I worked for tried to appropriate my (after-hours) hobby work. The final fork happened when I started my own company.
Ohloh has it listed, as well as the second of the two prior forks (both have since died, as the companies couldn't maintain them without me). The newer project is, however, a niche-market project. It allows me to earn a living, but it's hardly the kind of app that your typical computer user will ever want; it's there to do a specific job for a specific market section. It is very successful in that section, however, and has also fathered a number of large 'custom-version' projects which are my livelihood. Ohloh, however, only counts the roughly 8 months since it was forked, not the history of nearly 7 years prior to that.
A while ago, because of a low comment-to-code ratio, I went out of my way to add more comments to the code. Most of them were completely unneeded, since my code made sense to me and to others, but the low ratio was giving a false, bad impression of me as a programmer.
As it stands now, it has a fairly good rating and value there, but if you consider the true history, its 'value' should be much higher. And since my program is so niche-market, I don't get much kudos from it; after all, very few other FOSS people know anything about the field.
This is a complex and powerful application with many levels to it, but it is targeted at a small user base and consequently has a small contributing developer base. Only one person other than me has commit access; I commit other people's patches myself. How else would I manage it in my spare time? I feel Ohloh greatly underestimates the worth of the project and its developers simply because it has no way of distinguishing projects that everybody needs a variation of from those that are for niche uses.
Either way, this is not so much a complaint (my customers recognize the worth with or without Ohloh) as an attempt at constructive criticism. On the upside, it has had 33 downloads from their sites, and we small projects need all the free publicity we can get (it's a small percentage of the total downloads, but every user counts).
I believe the code-to-comment ratio is one of the things Ohloh tracks, but it can't even figure that out for everyone.
For example, Perl modules are often documented in POD rather than "normal" comments beginning with #, but Ohloh doesn't know how to parse POD and so considers lots of well-documented modules to be nearly completely uncommented.
(That would be sort of as if they only counted // comments in Java source but not any in /** ... */ Javadoc comments.)
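A minimal sketch of a POD-aware line counter for Perl source, under simplifying assumptions: a POD block starts at any line beginning with `=` followed by a letter (such as `=head1` or `=pod`) and ends at `=cut`, the paragraph rules of the full POD format are ignored, and a shebang line is counted as a `#` comment.

```python
def perl_line_counts(source):
    """Count code, '#' comment, and POD documentation lines in Perl source."""
    counts = {"code": 0, "hash": 0, "pod": 0}
    in_pod = False
    for line in source.splitlines():
        if in_pod:
            if line.strip():
                counts["pod"] += 1  # non-blank documentation line
            if line.startswith("=cut"):
                in_pod = False
            continue
        if line.startswith("=") and line[1:2].isalpha():
            in_pod = True  # e.g. =head1, =pod, =item
            counts["pod"] += 1
            continue
        stripped = line.strip()
        if not stripped:
            continue
        counts["hash" if stripped.startswith("#") else "code"] += 1
    return counts
```

A counter that only recognizes `#` would report a single comment line for a well-documented module; counting `hash + pod` as documentation gives a far more honest ratio.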
IANA OSS contributor, but I have been a programming enthusiast since 1980 and a professional programmer for ~16 years. For the last ten years I've reported to an executive (non-IT boss) for my own productivity and the productivity. Over these years I've had several conversations with my various supervisors (again, non-IT) regarding how to plan and allocate resources for ~30 projects ranging from static web sites to projects of a much larger nature. I have not found it to be an easy task. Setting aside the whole issue of programming as an art, and therefore by its very nature less predictable than many corporate duties, there's still the challenge of understanding the effect of the team dynamic (or other aggregate effects such as environment and tools) on an individual's productivity. I would think the premise of the project featured in the article fails to account for such a significant factor in tracking metrics for the individual. (Side note: I don't think this is limited to programming either.)
For some reason I keep thinking Joel Spolsky has already shown that lines of code is a poor performance measurement metric. Of course now I can't find the exact article, but he shows how easy it is to game that measurement system.
I was referring to the human factor. A lot of open source projects have only a few people allowed to commit code. Other code that is contributed is committed by those few people. So... who should get the credit for the code?
There is no reliable way to tell, after the fact. Period. And different projects have different commit and documentation standards.
As long as there is no standard way of handling these things among all open source projects (yeah right), this kind of metric will remain meaningless, or even harmful.