Slashdot Mirror


CPAN: $677 Million of Perl

Adam K writes "It had to happen eventually. CPAN has finally gotten the sloccount treatment, and the results are interesting. At 15.4 million lines of code, CPAN is starting to approach the size of the entire Red Hat 6.2 distribution mentioned in David Wheeler's original paper. Could this help explain Perl's relatively low position in the SourceForge.net language numbers?"

3 of 277 comments

  1. Gilb's Law by YetAnotherName · · Score: 4, Interesting
    For anyone who says that lines of code isn't a useful measure, just remember "Gilb's Law":
    Two years ago at a conference in London, I spent an afternoon with Tom Gilb, the author of Software Metrics ... I found that an easy way to get him heated up was to suggest that something you need to know is "unmeasurable." The man was offended by the very idea. He favored me that day with a description of what he considered a fundamental truth about measurability. The idea seemed at once so wise and so encouraging that I copied it verbatim into my journal under the heading of Gilb's Law:

    Anything you need to quantify can be measured in some way that is superior to not measuring it at all.

    Gilb's Law doesn't promise you that measurement will be free or even cheap, and it may not be perfect---just better than nothing.
    --Tom DeMarco and Timothy Lister, Peopleware 2/E, Dorset House Publishing, New York, 1999.
  2. Low position? by fanatic · · Score: 4, Interesting
    Copying and pasting the linked SourceForge page into a file, then sorting, yields the following highest project counts:

    Perl 5254 projects
    PHP 9010 projects
    Java 12210 projects
    C 13069 projects
    C++ 13255 projects

    So Perl is behind only four others. Given that much Perl project work probably ends up on CPAN instead of SourceForge, this is actually pretty high. Did the poster mean he'd expect a higher position without CPAN?
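
    For anyone who wants to repeat the paste-and-sort step, here is a rough sketch. It assumes the pasted text has one "Language N projects" entry per line, as in the figures quoted above; the real page layout may differ, and the function name is made up for illustration.

```python
import re

# Sample lines in the "Language N projects" shape quoted in this comment.
# The actual layout of the pasted page is an assumption.
pasted = """\
Perl 5254 projects
PHP 9010 projects
Java 12210 projects
C 13069 projects
C++ 13255 projects
"""


def rank_languages(text):
    """Return (language, count) pairs sorted by descending project count."""
    rows = []
    for line in text.splitlines():
        # Language name, then an integer, then the word "projects".
        m = re.match(r"\s*(\S.*?)\s+(\d+)\s+projects\s*$", line)
        if m:
            rows.append((m.group(1), int(m.group(2))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_languages(pasted)
for position, (language, count) in enumerate(ranking, start=1):
    print(f"{position}. {language}: {count}")
```

    Lines that don't match the pattern are skipped, so stray navigation text from the paste shouldn't break the ranking.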

    --
    "that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
  3. All lines are not equal by fanatic · · Score: 4, Interesting
    One line of Perl typically does a lot more than one line of C code (even without absurd "golf" tricks). The same is true of other high-level languages. So even leaving out issues of programmer quality, what does this really mean?

    Also, from the linked article:

    Reasons why these results are meaningless:
    • Most importantly, I've told SLOCCount all of CPAN is one project, which is probably inflating the numbers significantly. When I get more time, I may run SLOCCount per-distribution, then sum the totals. However, SLOCCount appears to have bugs handling this many sub-projects, so I will need to run them separately and manually sum the results.
    • mini-cpan.pl doesn't actually find only the latest versions of everything; some dists are duplicated and some may be ignored.
    • There's probably plenty of generated code not being identified correctly.
    • There's probably plenty of code downloadable from CPAN that wasn't written for CPAN, and so probably shouldn't be counted.
    • All the usual reasons why code metrics based on numbers of lines of source code are meaningless.
    And here's another: CPAN includes perl itself - which is probably a *lot* of lines of C code.
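
    To see why the "one project" point matters, here is a rough sketch of the basic COCOMO arithmetic. The constants are what I believe SLOCCount uses by default (organic mode, effort = 2.4 * KSLOC^1.05, a $56,286 salary and a 2.4 overhead factor); treat them as assumptions. The 10,000 equal-sized distributions are purely hypothetical, picked only to show the effect of the exponent.

```python
# Basic COCOMO "organic" parameters; assumed to match SLOCCount's defaults.
EFFORT_FACTOR = 2.4        # person-months per KSLOC**exponent
EFFORT_EXPONENT = 1.05     # greater than 1, so effort grows faster than size
SALARY_PER_YEAR = 56286    # assumed average salary in USD
OVERHEAD = 2.4             # assumed overhead multiplier


def cost_usd(sloc):
    """Estimated development cost of a single project of `sloc` lines."""
    person_months = EFFORT_FACTOR * (sloc / 1000) ** EFFORT_EXPONENT
    return person_months / 12 * SALARY_PER_YEAR * OVERHEAD


TOTAL_SLOC = 15_400_000    # rounded line count from the story summary
N_DISTS = 10_000           # hypothetical number of equal-sized distributions

# Everything counted as one giant project.
as_one_project = cost_usd(TOTAL_SLOC)
# The same lines split into N_DISTS projects, estimated separately and summed.
per_dist_sum = N_DISTS * cost_usd(TOTAL_SLOC / N_DISTS)

print(f"one project:      ${as_one_project / 1e6:.0f} million")
print(f"{N_DISTS} projects:   ${per_dist_sum / 1e6:.0f} million")
print(f"inflation factor: {as_one_project / per_dist_sum:.2f}")
```

    With these assumed constants the single-project figure comes out close to the headline number, and splitting the same lines into many small projects gives a noticeably lower total, because the exponent above 1 penalizes large projects. Real distributions vary wildly in size, so the actual gap would differ.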