Slashdot Mirror


CPAN: $677 Million of Perl

Adam K writes "It had to happen eventually. CPAN has finally gotten the sloccount treatment, and the results are interesting. At 15.4 million lines of code, CPAN is starting to approach the size of the entire Red Hat 6.2 distribution mentioned in David Wheeler's original paper. Could this help explain perl's relatively low position in the SourceForge.net language numbers?"
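
For context on where a dollar figure like this comes from: SLOCCount turns a line count into a cost by way of a COCOMO-style effort estimate multiplied by a loaded salary. The sketch below reproduces that arithmetic. The constants (2.4 and 1.05 for effort, a $56,286 salary, a 2.4 overhead factor) are believed to match SLOCCount's defaults but are assumptions here, not figures from the story, and the input is the rounded 15.4 million quoted above, so the result should land near, not exactly on, $677 million.

```python
# Basic COCOMO (organic mode) cost sketch.
# Every constant below is an ASSUMED SLOCCount default, not a figure from the story.
EFFORT_FACTOR = 2.4      # person-months per KSLOC**EFFORT_EXPONENT (assumed)
EFFORT_EXPONENT = 1.05   # mild diseconomy of scale (assumed)
ANNUAL_SALARY = 56286    # dollars per person-year (assumed)
OVERHEAD = 2.4           # loaded cost = salary * overhead (assumed)


def estimated_cost(sloc):
    """Return an estimated development cost in dollars for `sloc` source lines."""
    ksloc = sloc / 1000.0
    person_months = EFFORT_FACTOR * ksloc ** EFFORT_EXPONENT
    person_years = person_months / 12.0
    return person_years * ANNUAL_SALARY * OVERHEAD


if __name__ == "__main__":
    # 15.4 million is the rounded line count from the story.
    print(f"${estimated_cost(15_400_000) / 1e6:.0f} million")
```

Because the exponent is above 1, doubling the line count more than doubles the estimate, which is one reason counting all of CPAN as a single project inflates the total.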

22 of 277 comments

  1. yeah, but try removing the punctuation by WebMasterJoe · · Score: 4, Funny

    If you take out the punctuation, though, it's down to twelve lines of code.

    --
    I really hate signatures, but go to my website.
    1. Re:yeah, but try removing the punctuation by haystor · · Score: 3, Funny

      You mean:

      If you take out the punctuation, it's down to twelve lines of comments.

      --
      t
  2. Huh? by Billobob · · Score: 4, Insightful

    Low position? For a language that's not supposed to be a full-blown low-level language like C/C++, perl is pretty damn well represented - over 1/3 the number of projects compared to C isn't that bad. If you have just one file, something like sourceforge usually isn't needed.

    --
    If you have to ask, you'll never know.
  3. Bahhh! by justanyone · · Score: 4, Funny

    Bahhh, I know people richer than that!

    Now compute the economic gain of using Perl vs. any other language:
    Perl vs. Nothing : $677M
    Perl vs. C : $1.25B
    Perl vs. C# : $2.77B
    Perl vs. Hand Optimized Assembly on Honeywell DPS-3E running GCOS operating system: Priceless

  4. Re:Huh? by _14k4 · · Score: 5, Funny

    Here, I'll repost the link from the article you never read:

    sloccount

  5. Perl isn't Linux by gorim · · Score: 3, Insightful

    Perl is a cross-platform tool that existed long before Linux did. Why do such things get posted under Linux? You may as well post it under BSD; it would be doing the same thing. This happened with the recent Bash 3.0 topic as well. Why do people associate things with Linux just because they are open source? (Unless it is BSD open source.)

  6. Useless Measurement? by webword · · Score: 5, Insightful

    What is more important, lines of code or lines of quality code? People are always so impressed with sheer numbers. Quality is important.

    A similar issue is format and structure. You might do something almost right, but it could be better. For example, you might include dates on your web pages but is the format good for users? It can probably be better!

    Numbers are only impressive when they are placed in the context of their overall utility. Of course, regarding code, measuring "overall utility" is no joke. Can you really tell that the code from Programmer A is better than the code from Programmer B?

    In any event, keep your eyes open. Don't let "15.4 million lines of code" amaze you just because the number is big. Let it amaze you because of what it means, and what those lines of code do for users.

    1. Re:Useless Measurement? by Geoff-with-a-G · · Score: 5, Funny

      What is more important, lines of code or lines of quality code? People are always so impressed with sheer numbers. Quality is important.

      Seriously.
      And it's Perl.
      I thought the whole point was that you could write a massive Perl program in a single line.
      15.4 million just tells me that CPAN is getting sloppy. Let's knock that down to say, 17 HUGE lines, okay?

    2. Re:Useless Measurement? by ajs · · Score: 3, Insightful
      LOC isn't a great measure, but when talking about CPAN there are several things to keep in mind that modify the premise of measuring LOC:
      • Perl modules on CPAN include their own, customized installation and testing harness. This renders them far more valuable than a simple dumping ground of LOC.
      • CPAN presents a searchable, globally mirrored database of this code, which again increases its value.
      • Perl itself has an extremely powerful syntax. Many of Perl's detractors, in fact, will claim that this is far too much power to have in a syntax (vs. grammar and/or semantics and/or external libraries), so comparing 1000 LOC in Perl to 1000 LOC in, say, Java or C# or other "mid-level languages" (my phrase) can be quite favorable to Perl. Even comparing to other high-level languages can be, depending on the application (of course, each high level language has its own strengths, and for example, Python's thread handling is much simpler than Perl's, and both Ruby and Python make OO much easier).

      That said, I think that comparing LOC in, say, a Red Hat distribution to LOC in CPAN is valuable, regardless of the fact that structure and format are also concerns. They are equally concerns in both environments, and both environments have roughly equal pressures on improving both incrementally over time (e.g. bad code gets migrated away from the core and good code gets migrated in).

      ALL OF THAT aside, Perl's CPAN is most valuable not because of its size or the quality of the code, but because it is a repository where thousands of people with highly specialized needs share code with each other. Perl is unique in having created such a space that is widely used outside of core advocates of the language. I don't know why that's the case, but as long as it is, it's a very good thing.

      Getting code noticed by your niche's peers and making it available for everyone to use is key to Perl's success as a language.
  7. Relatively low? by stinkyfingers · · Score: 5, Funny

    It's relatively low because that list is in alphabetical order!

  8. Re:Mining CPan by Dr. Zowie · · Score: 3, Funny

    >"C#"
    You misspelled "INTERCAL".

  9. Gilb's Law by YetAnotherName · · Score: 4, Interesting
    For anyone who says that lines of code isn't a useful measure, just remember "Gilb's Law":
    Two years ago at a conference in London, I spent an afternoon with Tom Gilb, the author of Software Metrics ... I found that an easy way to get him heated up was to suggest that something you need to know is "unmeasurable." The man was offended by the very idea. He favored me that day with a description of what he considered a fundamental truth about measurability. The idea seemed at once so wise and so encouraging that I copied it verbatim into my journal under the heading of Gilb's Law:

    Anything you need to quantify can be measured in some way that is superior to not measuring it at all.

    Gilb's Law doesn't promise you that measurement will be free or even cheap, and it may not be perfect---just better than nothing.
    --Tom DeMarco and Timothy Lister, Peopleware 2/E, Dorset House Publishing, New York, 1999.
    1. Re:Gilb's Law by Minwee · · Score: 3, Insightful

      I'm not seeing any connection there.

      Gilb's Law only states that there exists _some_ measure with a value greater than that of not measuring. It doesn't say that every measure, no matter how bizarre, is better than nothing. Gilb's Law tells us nothing about the value of lines of code.

      If measurement for measurement's sake was always a good thing then I could take an eight bit CRC of the source code or the ratio of "e"s to "i"s and use those as metrics for quality.
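
      To make the point concrete, here is a minimal sketch of those two "metrics". Both are trivially computable, which is exactly why computability alone says nothing about usefulness. The CRC polynomial (0x07) and the function names are choices made for this illustration; the comment above does not specify them.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (initial value 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def e_to_i_ratio(text: str) -> float:
    """Ratio of the letter 'e' to the letter 'i', ignoring case."""
    lowered = text.lower()
    e_count = lowered.count("e")
    i_count = lowered.count("i")
    return e_count / i_count if i_count else float("inf")


if __name__ == "__main__":
    source = 'print "Hello, world\\n";\n'  # any source text will do
    print(crc8(source.encode("utf-8")), e_to_i_ratio(source))
```

      Neither number changes in any meaningful way when the code gets better or worse, so neither is the measure that Gilb's Law promises exists.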

  10. Re:Huh? by servognome · · Score: 4, Funny

    /. response efficiency warning!
    To conserve server resources in the future please update your response "Did you even attempt to click the underlined word 'sloccount'? If not, do it now and read the first line of the first paragraph." with the more efficient "RTFA" or "RTFA you stupid noob" if you are not into the whole brevity thing.

    --
    D6 63 0D 70 89 81 BB 8E 7B 7C 5F 5D 54 EA AB 73
  11. Low position? by fanatic · · Score: 4, Interesting
    Copying and pasting the linked Sourceforge page into a file, then sorting, yields the following highest project numbers:

    Perl 5254 projects
    PHP 9010 projects
    Java 12210 projects
    C 13069 projects
    C++ 13255 projects

    So perl is behind only 4 others. Given that much Perl project work probably ends up in CPAN instead of sourceforge, this is actually pretty high. Did the poster mean he'd expect higher without CPAN?
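
    The sort described above can be sketched in a few lines. Only the five counts quoted in this comment are used; the rest of the SourceForge page is not reproduced here, so the ranking below covers these five languages only.

```python
# Project counts exactly as quoted in the comment above.
COUNTS = {
    "Perl": 5254,
    "PHP": 9010,
    "Java": 12210,
    "C": 13069,
    "C++": 13255,
}


def rank(counts):
    """Return (language, projects) pairs, most projects first."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


def position(language, counts):
    """Return the 1-based rank of `language` among the given counts."""
    names = [name for name, _ in rank(counts)]
    return names.index(language) + 1


if __name__ == "__main__":
    for name, projects in rank(COUNTS):
        print(f"{name:5} {projects:6} projects")
    print("Perl rank:", position("Perl", COUNTS))
```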

    --
    "that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
    1. Re:Low position? by lawpoop · · Score: 4, Funny

      Yes, but close to 75% of all those PHP Projects are a DVD/CD cataloging system.

      --
      Computers are useless. They can only give you answers.
      -- Pablo Picasso
  12. All lines are not equal by fanatic · · Score: 4, Interesting
    One line of Perl typically does a lot more than one line of C code (even without absurd "golf" tricks). The same is true of other high level languages. So even leaving out issues of programmer quality, what does this really mean?

    Also, from the linked article:

    Reasons why these results are meaningless:
    • Most importantly, I've told SLOCCount all of CPAN is one project, which is probably inflating the numbers significantly. When I get more time, I may run SLOCCount per-distribution, then sum the totals. However, SLOCCount appears to have bugs handling this many sub-projects, so I will need to run them separately and manually sum the results.
    • mini-cpan.pl doesn't actually find only the latest versions of everything, some dists are duplicated and some may be ignored.
    • There's probably plenty of generated code not being identified correctly.
    • There's probably plenty of code downloadable from CPAN that wasn't written for CPAN, and so probably shouldn't be counted.
    • All the usual reasons why code metrics based on numbers of lines of source code are meaningless.
    And here's another: CPAN includes perl itself - which is probably a *lot* of lines of C code.
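
    As a rough illustration of what "count per distribution, then sum" would involve, here is a deliberately crude line counter. It is far simpler than SLOCCount: it treats any line that is non-blank and does not start with '#' as code, knows nothing about POD or C comments, and would miscount shebang lines as comments. The distribution names and file contents are hypothetical.

```python
def count_sloc(source: str) -> int:
    """Count lines that are neither blank nor whole-line '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def sloc_by_distribution(dists):
    """Map each distribution name to the summed count over its files."""
    return {
        name: sum(count_sloc(text) for text in files)
        for name, files in dists.items()
    }


if __name__ == "__main__":
    # Hypothetical distributions, each a list of file contents.
    dists = {
        "Example-One": ["# header\nuse strict;\n\nprint 1;\n"],
        "Example-Two": ["my $x = 2;  # trailing comment\n", "# only a comment\n"],
    }
    per_dist = sloc_by_distribution(dists)
    print(per_dist, "total:", sum(per_dist.values()))
```

    Keeping the per-distribution numbers separate also makes it possible to spot duplicated or generated code before it inflates the grand total.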
    --
    "that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
  13. Re:Mining CPan by Waffle Iron · · Score: 3, Informative
    Well, what I'd like to see first would be a Python equivalent to CPAN existing in the first place.

    While it's not nearly as big as CPAN, I often find Python code I need in the Vaults of Parnassus

  14. Re:Perl coders make $135k/year? by Minwee · · Score: 5, Informative

    On average, salary is only half of what a company pays for an employee. If you count benefits, office space, training, administration and all of the other costs involved that $135k works out to more like a $67,000 salary.

    A junior programmer working in Manhattan makes about $60,000 a year according to a recent salary survey, going up to $90,000 for a senior guru. Based on those numbers I don't see anything wrong with the $135k/year figure.

    Coders may not _make_ $135,000, but they do _cost_ that much to employ.
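
    The arithmetic behind that argument, as a small sketch. The overhead multiplier of 2 is the "salary is only half" rule of thumb from this comment, and the salary figures are the survey numbers quoted above; none of them are independently verified here.

```python
OVERHEAD_MULTIPLIER = 2.0  # "salary is only half" of total cost, per the comment


def implied_salary(loaded_cost, multiplier=OVERHEAD_MULTIPLIER):
    """Salary implied by a total yearly cost of employment."""
    return loaded_cost / multiplier


def loaded_cost(salary, multiplier=OVERHEAD_MULTIPLIER):
    """Total yearly cost of employment implied by a salary."""
    return salary * multiplier


if __name__ == "__main__":
    print(implied_salary(135_000))   # salary behind the $135k cost figure
    print(loaded_cost(60_000))       # cost of the junior programmer
    print(loaded_cost(90_000))       # cost of the senior guru
```

    Under that rule of thumb the $135k figure sits between the loaded cost of the junior and the senior programmer.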

  15. Re:Nonsense. by Merk · · Score: 4, Insightful

    Read the quote carefully: "Anything you need to quantify can be measured in some way that is superior to not measuring it at all."

    He's not saying that *any* measurement is better than no measurement. He's saying that there exists a measurement that is better than no measurement.

    Which tastes better, ice cream or fresh pineapple? I don't know, but rather than say "It's impossible to say! Any measurement will be flawed," you could do a survey and see what most people think tastes better. That may not be the measurement that is better than no measurement, but for certain purposes it may be.

    In the end, it depends on what your reason for doing the measurement is. If you're going to be marketing a new bubble gum flavour, then this survey is better than no information at all.

  16. Re:Subject rejected - looks too much like ASCII ar by paster · · Score: 3, Funny

    Look at this: XHTML parser using K programming language
    Perl is a really clean language :)

    --
    Create RSS feed from any web page http://Page2RSS.com/
  17. Re:Mining CPan by ajs · · Score: 3, Informative
    I checked out that site.

    I only looked at a handful of the links. It's sort of a Yahoo! (the original indexer, not today's search engine-cum-kitchen sink) for Python code, which is ok, but check out how one uses CPAN in the real world:
    # perl -MCPAN -e shell
    cpan> i /SpamAssassin/
    Distribution F/FE/FELICITY/Mail-SpamAssassin-2.63.tar.gz
    Module Mail::SpamAssassin (F/FE/FELICITY/Mail-SpamAssassin-2.63.tar.gz)
    cpan> install Mail::SpamAssassin
    ---- Unsatisfied dependencies detected during [F/FE/FELICITY/Mail-SpamAssassin-2.63.tar.gz] -----
    Filter::Simple
    Shall I follow them and prepend them to the queue
    of modules we are processing right now? [yes]
    I'm sure you can see how this makes CPAN far more useful for building a large repository of useful Perl modules. How, in Python, can you build several layers of libraries that depend on each other without this kind of repository of dependency information? How does a user "come into the know" about these factors?

    Of course, that ignores the fact that CPAN modules all come with regression testing and online documentation (installed in the system "man" tree) as well.
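
    The "follow them and prepend them to the queue" behavior in the session above amounts to walking a dependency graph and installing prerequisites first. Below is a minimal sketch of that idea, not CPAN.pm's actual implementation. The `REQUIRES` table is a hypothetical stand-in for repository metadata; it contains only the one dependency shown in the transcript.

```python
# Hypothetical metadata: module name -> modules it requires.
# Only the dependency visible in the transcript above is listed.
REQUIRES = {
    "Mail::SpamAssassin": ["Filter::Simple"],
    "Filter::Simple": [],
}


def install_order(target, requires):
    """Return modules in an order where every prerequisite comes first."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:  # already queued; this also stops cycles from looping
            return
        seen.add(name)
        for dependency in requires.get(name, []):
            visit(dependency)
        order.append(name)  # appended only after all its prerequisites

    visit(target)
    return order


if __name__ == "__main__":
    print(install_order("Mail::SpamAssassin", REQUIRES))
```

    Without shared metadata like `REQUIRES`, each user has to discover and order the prerequisites by hand, which is the gap the comment is pointing at.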