perhaps we should have put a disclaimer on the survey. on the other hand, given that almost nobody here seems to have read the text accompanying the survey, it may not be worth it.
the text clearly lists the limitations of the survey including the small code base used; the algorithm to identify and credit authors is clearly documented - and the source code is available on the site FWIW. of course, the survey is full of errors, some of which i've commented on here, on advogato and elsewhere (e.g. gordon matzigkeit).
the main problems are, naturally, that this is impossible to do by hand and has to be automated; that we wanted to look at authorship at the file level (the lowest level of granularity available); and that author credits follow no fixed format. much of the time they're not there at all, which is why copyright holders such as the FSF get a lot of credit too. the only alternative to listing them as they are is to have a huge "uncredited" portion - at least until authors start consistently claiming credit, using the same name or e-mail address in each file they write.
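to make the "no fixed format" problem concrete, here is a minimal sketch in python. it is not the survey's actual algorithm (that one is documented on the site); the patterns, the function name `find_credits` and the sample headers are all made up for illustration.

```python
import re

# Hypothetical patterns for a few common credit styles. Real headers vary
# far more than this, which is exactly the problem described above.
CREDIT_PATTERNS = [
    re.compile(r"(?:written|contributed) by[:\s]+(?P<who>[^<\n(]+)", re.IGNORECASE),
    re.compile(r"authors?:\s*(?P<who>[^<\n(]+)", re.IGNORECASE),
    re.compile(r"copyright\s+(?:\(c\)\s*)?[\d,\s-]+(?P<who>[^<\n(]+)", re.IGNORECASE),
]


def find_credits(header):
    """Return the names credited in a file header, in order of appearance."""
    found = []
    for line in header.splitlines():
        for pattern in CREDIT_PATTERNS:
            match = pattern.search(line)
            if match:
                # Collapse whitespace and trim comment debris around the name.
                name = " ".join(match.group("who").split()).strip(" ,*/")
                if name and name not in found:
                    found.append(name)
                break  # at most one credit per line in this sketch
    return found


header_a = (
    "/* Copyright (C) 1999 Free Software Foundation, Inc.\n"
    "   Written by J. Random Hacker <jrh@example.org> */"
)
header_b = "# Author: jrh"
header_c = "/* utility functions */"

credits_a = find_credits(header_a)
credits_b = find_credits(header_b)
credits_c = find_credits(header_c)
```

in this sketch the first header credits both the copyright holder and the author, the second credits "jrh" as if that were a different person from "J. Random Hacker", and the third credits nobody at all.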
incidentally, it is not possible for us to guess which of the many contributors to a single file are more important; as documented, the credit is currently split equally among them.
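a rough sketch of what an equal split looks like, in python. the assumptions here are mine and not taken from the survey's source: each file is weighted by its line count, files with no credits go to a hypothetical "(uncredited)" bucket, and the names and numbers are invented.

```python
from collections import defaultdict

UNCREDITED = "(uncredited)"  # hypothetical label for files with no credits


def split_credit(files):
    """Split each file's weight equally among the names credited in it.

    `files` is an iterable of (line_count, [credited names]) pairs.
    """
    totals = defaultdict(float)
    for line_count, names in files:
        # A name listed twice in one file still gets only one share.
        unique = sorted(set(names)) or [UNCREDITED]
        share = line_count / len(unique)
        for name in unique:
            totals[name] += share
    return dict(totals)


def as_fractions(totals):
    """Convert absolute credit into each name's fraction of all code."""
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {name: value / grand_total for name, value in totals.items()}


sample = [
    (300, ["alice", "bob", "carol"]),     # three contributors, 100 lines each
    (100, ["Free Software Foundation"]),  # only a copyright holder is listed
    (50, []),                             # no credit found at all
]
totals = split_credit(sample)
fractions = as_fractions(totals)
```

with this sample, alice, bob and carol end up with 100 lines each regardless of who actually wrote most of the first file, which is the limitation described above.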
finally, this is just a start. while we intend to continue working on this, the algorithm source code is available as are all the code bases, so nothing stops you from doing it too.
We've just released the first free software survey of 25 million lines of code charting authors' contributions and project participation. naturally, the FSF is on top, with 11% of all code credited. that's not what we wanted to see, though - we wanted to see who wrote the code, not who owned it through copyright. whether the FSF should ask you to assign them the copyright or not, i think the FSF should most certainly list author credits. one thing people can "earn" from free software is reputation, and not listing authors' names takes that away.