CPAN: $677 Million of Perl
Adam K writes "It had to happen eventually. CPAN has finally gotten the sloccount treatment, and the results are interesting. At 15.4 million lines of code, CPAN is starting to approach the size of the entire Red Hat 6.2 distribution mentioned in David Wheeler's original paper. Could this help explain Perl's relatively low position in the SourceForge.net language numbers?"
Wow, using sloccount on the full POPFile source shows that developing it would have cost around $500K in a regular software company. That seems about right given the length of time we've been working on it and the number of people involved. Cool tool.
:-)
Now if only I could push the donations up above $5,000
John.
If you take out the punctuation, though, it's down to twelve lines of code.
I really hate signatures, but go to my website.
Low position? For a language that's not supposed to be a full-blown low-level language like C/C++, perl is pretty damn well represented - over 1/3 the number of projects compared to C isn't that bad. If you have just one file, something like sourceforge usually isn't needed.
If you have to ask, you'll never know.
Did you even attempt to click the underlined word 'sloccount'? If not, do it now and read the first line of the first paragraph.
[Ob. reference: I love Perl and use it all of the time, but a programmer I met years ago said it was the only language where the source code reminded him of line noise]
SourceHosting.net, LLC
Ready. Set. Code.
http://www.sourcehosting.net/
Bahhh, I know people richer than that!
Now compute the economic gain of using Perl vs. any other language:
Perl vs. Nothing : $677M
Perl vs. C : $1.25B
Perl vs. C# : $2.77B
Perl vs. Hand Optimized Assembly on Honeywell DPS-3E running GCOS operating system: Priceless
Unitarian Church: Freethinkers Congregate!
Whatever one's favourite language might be, a project to mine CPAN and port useful modules to Python, Java or C# would be interesting.... Perl syntax reads as a little terse to many non-Perl devs.
Here, I'll repost the link from the article you never read:
sloccount
Pfft 15.4 Million lines?
I could write CPAN in a one liner!
#!/usr/bin/perl
use warnings;
use strict;
print "CPAN:
This
Perl is a cross-platform tool that existed long before Linux did. Why do such things get posted under Linux? May as well post it under BSD; it would be doing the same thing. This happened with the recent Bash 3.0 topic as well. Why do people associate things with Linux just because they are open source? (Unless it is BSD open source.)
What is more important, lines of code or lines of quality code? People are always so impressed with sheer numbers. Quality is important.
A similar issue is format and structure. You might do something almost right, but it could be better. For example, you might include dates on your web pages but is the format good for users? It can probably be better!
Numbers are only impressive when they are placed in the context of their overall utility. Of course, regarding code, measuring "overall utility" is no joke. Can you really tell that the code from Programmer A is better than the code from Programmer B?
In any event, keep your eyes open. Don't let "15.4 million lines of code" amaze you just because the number is big. Let it amaze you because of what it means, and what those lines of code do for users.
How to Download YouTube Videos
It's relatively low because that list is in alphabetical order!
Embarrassing, I know. Maybe I can blame it on the silly new topics and color schemes that are so close to each other :)
/. response efficiency warning!
To conserve server resources in the future please update your response "Did you even attempt to click the underlined word 'sloccount'? If not, do it now and read the first line of the first paragraph." with the more efficient "RTFA" or "RTFA you stupid noob" if you are not into the whole brevity thing.
D6 63 0D 70 89 81 BB 8E 7B 7C 5F 5D 54 EA AB 73
Although their "Book TV" show is usually as dense as Perl, and often profiles books that are write-only.
taken! (by Davidleeroth) Thanks Bingo Foo!
$677 million, 5,000 person-years = ~$135,000/year/person.
I don't know any perl coders who make $135 a year, let alone $135,000!
(sorry, but it's true)
He who fights with monsters should see to it that he does not become a monster in the process. --Nietzsche
What the hell is this post talking about? CPAN? SLOCOUNT? Red Hate 6.2? I honestly have no clue. Is lines of code measured in dollars or lines? If lines, why is there a dollar amount in the headline? If dollars, why is there a lines count in the article?
Comment of the year
So perl is behind only 4 others. Given that much Perl project work probably ends up in CPAN instead of sourceforge, this is actually pretty high. Did the poster mean he'd expect higher without CPAN?
"that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
To conserve server resources in the future... STFU?
Also, from the linked article:
And here's another: CPAN includes perl itself - which is probably a *lot* of lines of C code.
"that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
If you're doing the work for a recognized non-profit organization, then you might want to talk to an accountant and see if it's worthwhile to do. There are probably implications when you work for a non-profit and it produces code that you can re-use in other for-profit applications.
But I'm not a tax lawyer or an accountant, so if you're really interested in writing off some of your work, it'd be worth looking into.
Build it, and they will come^Hplain.
Patently, bad measurements are worse than no measurements.
"Measurement drives performance." If you are measuring the wrong thing or using misleading measurements, you will do the wrong thing.
Anyone who thinks they can devise a meaningful measurement of the quality of Beethoven's Fifth Symphony versus Brahms's First... or which tastes better, vanilla ice cream or fresh pineapple... or who is a better ballplayer, Willie Mays or Sammy Sosa... needs to have their head measured, preferably with a standardized test.
In order to tell whether measurement in some way is superior to not measuring it at all, you need a way to measure the quality of the measurement. But to do that, you need...
"How to Do Nothing," kids activities, back in print!
PERL is nice in that it has a lot of prepackaged modules that provide a lot of functionality. But when you distribute code that uses these modules, the end user must install them. This is a big pain in the rear for the average user, which is why I believe that PERL is a bad choice for programs intended for the end user.
SourceForge is a great tool with meaningful projects there, but you kind of have to take the info you get from looking at overall numbers there with a grain of salt.
Nearly twice as many projects as Python, almost three times as many as VB, and more than a third as many as C or C++ is low? For a scripting language? I think these numbers prove the versatility and popularity of Perl.
This type of measurement is simplistic and poor: it fails to take into account the functional density of the languages. For example, in one line of perl, you can do what takes ten lines of C/C++ -- especially considering that this is one of perl's raisons d'être.
The measurements really should be multi-dimensional:
- lines of code
- number of projects
- number of files
- number of modules (classes, namespaces, etc)
- statement density (e.g. per-language primitives: while, for, @, ->,
I read the whole thing including the comments as CSPAN and was like wtf?
b) Maybe the sloc counter didn't recognize Perl comments, so it overcounted lines. Wait, Perl programs never have comments.
c) Does this make it "a Perl of great price"?
Have you read my blog lately?
I think one of the reasons why many of the things people do in Perl don't end up becoming SourceForge projects is because they're specific to a particular environment -- my company does pretty much everything {that others might do on Windows desktops} using in-house-written Perl scripts accessed through a web browser; but they really aren't general-purpose enough to warrant releasing to the world at large. For instance, we need to store the Ordnance Survey grid references of our customers -- but not everyone will need that functionality. Perl itself provides a kind of "generality-of-purpose abstraction layer"; there's not much sense in writing a program that can handle fifty squillion different data formats if you're only ever going to use one, especially given that processor power and disk space are so cheap nowadays. I also use Perl for jobs that could be done using bash or awk or sed, but Perl is just so handy; and if I need to add one more feature, I know I can. I'll also use perl -e 'print "something\n"' in an Xterm as a calculator {one day I'll even define a key map that puts the sequence on a function key}.
Alternatively, Perl -- thanks to all those wonderful library bindings -- might well be used for an initial "feasibility study", say to develop and test the most important function(s) that will end up forming the core of a project; and, once the proof-of-concept is there, the whole thing is then rewritten "from the ground up" in something like C or C++ {which has bindings for the dead same libraries anyway, but feels more "proper" because it's compiled rather than interpreted}.
Je fume. Tu fumes. Nous fûmes!
don't worry, you still get modded up
I've told SLOCCount all of CPAN is one project,
isn't the sum equal to all parts? I know it is more difficult to do big projects. (all those middle-managers)
generated code
Is code, generated more efficiently.
code downloadable from CPAN that wasn't written for CPAN,
there is probably code in Red Hat that was never written for Red Hat. That is the trouble with open source.
numbers of lines of source code are meaningless.
No no, they give you serious bragging rights!
So things are not that bad. Just the duplicates....
So things aren't quite bad. Just the duplicates....
a programmer I met years ago said it was the only language where the source code reminded him of line noise [emphasis added]
He'd obviously never seen APL -- one of the few languages terser and more cryptic than Perl, and (AFAIK) the only one to require its own font.
-- Alastair
Sure, Slashdotters hate Flash, but why aren't there any ActionScript projects on SourceForge, while there are 1822 JavaScript projects?
--
make install -not war
In my experience with CPAN I have found it follows the Larry Wall concept that there are many ways to do the same thing. For starters, there are several modules which can communicate with a POP3 server. There are many XML parsers and many means of talking to a MySQL database. Unfortunately I would not say each solution is feature complete or even good quality. It is great that it has built-in Pod Doc, but the fact remains that it can be quite difficult to get some things done.
I was able to whip together a webmail client which fetches mail from a POP3 server and parses the MIME types to display content with several Perl modules, which was a pretty amazing feat given the little amount of code I wrote. But as I wrote it I had to come up with many workarounds for incomplete features in the CPAN modules. I also found that some modules were object oriented and some were not.
So in the end I am finding things like the Java Foundation Classes or the .NET 1.1 profile implemented by Mono to be much more appealing. While there may be fewer means of connecting to a POP3 server, there is a good chance the one that is there will work well enough.
But I am still curious how the Ruby folks are doing. They have been committed to object-oriented programming and may be able to produce higher quality solutions. Anyone doing Ruby here?
Brennan Stehling - http://brennan.offwhite.net/blog/
But, how do you know that the way you're measuring it is better than not measuring at all? There are lots of ways to measure things that are worse than no measurement at all, because they reward the wrong activity.
The canonical examples here are paying programmers per bug fixed, or paying testers per bug detected. Either one of these alone is bad - together they allow programmers and testers to print money for themselves.
In theory, nothing is unmeasurable. In practice, some things are so hard to measure that you might as well not even try.
To a Lisp hacker, XML is S-expressions in drag.
Could it mean that folks who write Perl are more likely to submit their work to CPAN?
How does the "instant gratification" of using an interpreted language factor into all this? I know one of the attractions of Perl for me is that I don't have to compile it to see if it works. I just run it.
"Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)
I don't understand either. I do all my Perl programming for Windows.
If anyone's interested, I've developed a hardware/software audit script entirely in Perl for Win32 (binaries included) that stores data in a centralized MySQL server. Visit here.
-- Political fascism requires a Fuhrer.
After the first detailed analysis of the large perl leak onto the net experts reckon CPAN could cost $677M to clean up 8)
If it's the cost of writing the code then it should be a good approximation of the cost of writing it when perl 6 comes out
Clean syntax...
You can write some pretty clean syntax in perl just:
#!/usr/local/bin/perl -Tw
use warnings;
use strict;
use diagnostics;
use vars qw{ ... };
main();
exit;
# Your perl code.
1;
portable libraries?!
What the heck are you smoking dude? I want some!
It works on more platforms than any other language,
including C because it wraps libc platform weirdness into "you don't have to know or care" equivalent.
Think about EBCDIC, incomplete , endianness, file systems
that don't have all unixes attributes.
A decent GUI library?
There's a Perl/Tk. okay it sux.
There's a wrapper for GTK and Windows API.
okay it sux too.
There's an HTML API, where you can write
your entire program in HTML/JavaScript/Perl,
and just install some Apache, with mod_xmlrpc
and mod_perl thing that runs whatever you want locally on the machine. --> "good portable compromise".
You could also use something like C++ Builder
and embed your perl program within your C/C++ application.
http://perllinux.sourceforge.net
Ever looked at CPAN?
Look at this: XHTML parser using K programming language :)
Perl is really clean language
Create RSS feed from any web page http://Page2RSS.com/
...then how'd they become the Ruling Class? You know, not every rich person is a slutty blonde bimbo heiress like Paris Hilton (someone who I'm sure would struggle to make up the bed in just one room of one of daddy's hotels). A good deal of the wealthy class is self made (particularly in North America)--perhaps your view is coloured by the more class-oriented system of the UK, where there is a fair bit more wealth through inheritance.
Jobs and Wozniak founded Apple and Jobs still runs it (hell of a lot bigger than a mere electronics store chain). I'm sure both of them would be more than capable of wiring up a 13A plug seeing as they were capable of designing, building and programming a computer (and devices allowing them to call Europe for free). And while Bill Gates came from a fairly affluent family, he was hardly a billionaire and managed to survive the early Micro-soft days in dumpy New Mexico digs and do low-level assembly programming.
And yes, I'm sure many of the owners of GM and Ford know how to change a tyre--seeing as they are publicly held companies with a large number of shareholders. I'm willing to bet that the executives/management could do it (Lee Iacocca comes across as a guy who is down-to-earth enough that he could).
My sister is the Canadian president of a multi-national corporation and not only can she peel a potato, she peeled many of them making dinner for her two kids every night as a stay-at-home mother when she was in her early twenties.
Fact is, it is no longer the 19th century, democracy is widespread and the "ruling class" is no longer so dominated by inheritance as it once was. This Marxist theory of the proletariat rising up en masse against a ruling class dependent on workers' output just doesn't wash. Today, those of the working class with the capacity and drive to step up are able to rise one-by-one. And once you are part of the "ruling class" it is human nature to defend it regardless of others' actions--particularly when your wealth is earned.
I'd daresay that Gilb's law's meaning is nearer to something more like "anything that can be quantified"... I mean, there are no units in which you can quantify the symphonies' quality or other subjective affairs. But if the thing you wanted to measure in the beginning, is measurable in certain units, then surely any approximation is better than no idea at all...
... from the forgotten corner in europe
...how much of that is devoted to MP3 taggers and MVC frameworks... :-)
I wonder what percentage of the 15.4 million lines of code are tests. CPAN emphasizes tests in every module submitted.
In some cases the size of the test scripts is larger than the core code of the module.
Umm.. Perl has a vast repository of reusable code to avoid reinventing the wheel. It's called CPAN.
404 Not Found: No such file or resource as '.sig'
Another related paper (that I didn't write) is Counting Potatoes: The size of Debian 2.2. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly $1.9 billion USD using over 14,000 person-years to develop using traditional proprietary techniques.
So what's the purpose of all these studies? Insight. There are all sorts of limitations in any measure, including any source lines of code (SLOC) measure. But, in spite of those limitations, there are things you can learn. Using tools (like SLOC counting tools) to measure software can help you understand things about the software, as long as you understand the limitations of the measure.
In particular, many studies have shown that SLOC is very strongly related to effort (so much so that you can even use equations to predict it). If you want to determine effort in CPAN, you can't just go ask people; few open source software / Free Software (OSS/FS) developers record exactly how much effort they invested. So, these kinds of measures are really helpful for estimating how much effort went into developing the software. Obviously, not all effort is equal (a genius can turn a hard problem into an easy one). And not all code is good, or even useful. But if you want to understand and measure effort, then these measures do have a value. In particular, these results have shown that OSS/FS can scale up to large projects requiring large amounts of effort.
- David A. Wheeler (see my Secure Programming HOWTO)
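For anyone wondering what "equations to predict it" looks like in practice, here is a minimal Python sketch. It assumes the Basic COCOMO "organic" model with the parameters SLOCCount documents as its defaults (effort = 2.4 * KSLOC^1.05 person-months, a $56,286 annual salary, and a 2.4 overhead factor). The function name is made up for illustration, and feeding in a round 15.4 million SLOC as a single project is an assumption, not the exact run behind the headline figure.

```python
def cocomo_estimate(sloc, salary=56286.0, overhead=2.4):
    """Rough Basic COCOMO (organic mode) effort and cost estimate.

    The salary and overhead defaults are assumed to match SLOCCount's
    documented defaults; change them to suit your own situation.
    """
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05   # effort equation
    person_years = person_months / 12.0
    cost = person_years * salary * overhead
    return person_years, cost


if __name__ == "__main__":
    years, cost = cocomo_estimate(15_400_000)
    print(f"{years:,.0f} person-years, ${cost / 1e6:,.0f} million")
```

With these assumptions the result lands at roughly 5,000 person-years and somewhat under $700 million, in the same ballpark as the numbers in the story. Note that salary times overhead is about $135,000, which is where the per-person-year figure quoted elsewhere in this thread comes from.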
I use Perl every day and it has been one of the easiest languages to learn and use that I have found. The syntax, to me, seems pretty clear, concise and easy to read. For instance, for a GUI example:
use Tk;
$win=MainWindow->new;
$label=$win->Label(-text=>'Hello World');
$label->pack;
MainLoop;
The syntax actually is very clean and rather simple and easy to use. In my extensive use of Perl I have found the syntax to be very clean, clear, and easy to understand. I think Perl does things a little differently than other languages, and immediately when people see something that is different to them, it seems many think that there is something wrong with it. If you look at another language for the first time it might seem unusual and strange to you. I have looked at PHP, Python, C++ and Java, and those languages seemed difficult and strange to me when I first looked at them, but that's probably because I hadn't used them enough. It shouldn't be wrong to be different. Perl does things differently but that doesn't make it bad or worse than other languages.
As far as OO goes, Perl's OO is not "bolted on". It is elegantly and carefully designed and integrated with the language. The process of creating and using a Perl module is simple and straightforward (I have done it many times) and just as easy as in other programming languages. For example: use Module; $module=Module->new; $module->method(); seems pretty simple and clear to me, in fact more elegant than some other OO languages that I have seen, in my opinion. I have used C++ and Java, and actually do prefer the design of Perl's OO over other languages; it actually takes me less time to use it and code for it than it does in other languages, but that's my personal preference. I have found that with Perl OO and Perl syntax in general it is easy to write clear, good, concise code in less space than many other languages. Perl, to me anyway, requires less language verbiage than, say, C++ or Java does, but is clear and concise.
People have different needs and tastes, and people should use the programming tools that best suit them. Perl best suits my needs and works in a way that is natural and easy to me. It took me less time to learn Perl than it has other languages.
As far as the GUI libraries, Perl has interfaces to a wide range of GUI libraries, from GTK, Qt, OpenGL, Tk, FLTK, etc. Take your pick. Tk is most often distributed with Perl, including with ActiveState Perl on Windows. I have used Tk on many occasions and found it to have a very elegant and well designed, yet powerful API.
Perl modules to me seem to be very portable. I have used them on many different OSs and programs, with no problem. There is nothing inherent in the module system that makes it unportable.
I really haven't seen any of the issues that you have mentioned in my extensive use of the Perl programming language.
If your argument is, "this measure doesn't measure how many lines of code it would have taken in C", or "how much effort would it have taken if it was written in C", well, that's true. So what? That wasn't what was being measured. If that's what you wanted, there are well-known conversion factors where you can estimate the SLOC in C, and convert it to effort. But those conversion factors are estimates with a LOT of slop, and the published conversion factors have almost no published data to justify them, nor do they identify the ranges and standard deviations and other caveats. But if that's what you wanted, I'm not sure if there's a better way to do it.
- David A. Wheeler (see my Secure Programming HOWTO)
SLOCCount measures "physical SLOC", and thus ignores blank lines and comment-only lines (including Perl PODs). It's not the same as "wc -l". Go read its documentation if you want to understand exactly what it does; it has a lengthy description of exactly what it measures, and why, along with references to the (substantial) research literature behind such tools.
- David A. Wheeler (see my Secure Programming HOWTO)
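To make the difference from "wc -l" concrete, here is a rough Python sketch of counting physical SLOC in Perl source: skip blank lines, comment-only lines, and POD blocks. This is a simplification for illustration, not SLOCCount's actual algorithm. The function name is hypothetical, and it deliberately ignores things a real counter has to deal with (here-documents, `__END__` sections, and it treats the `#!` line as a comment).

```python
def count_physical_sloc(source):
    """Count lines that are not blank, comment-only, or inside POD."""
    count = 0
    in_pod = False
    for line in source.splitlines():
        if in_pod:
            # A POD block runs until a line starting with "=cut".
            if line.startswith("=cut"):
                in_pod = False
            continue
        # POD starts with "=" followed by a letter at the start of a line.
        if line.startswith("=") and line[1:2].isalpha():
            in_pod = True
            continue
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank or comment-only line
        count += 1
    return count


sample = (
    "#!/usr/bin/perl\n"
    "use strict;\n"
    "\n"
    "# a comment-only line\n"
    "=head1 NAME\n"
    "\n"
    "foo - an example\n"
    "\n"
    "=cut\n"
    "print 'hi'; # trailing comments still count as code\n"
)
print(len(sample.splitlines()), "lines by wc -l,",
      count_physical_sloc(sample), "physical SLOC")
```

The sample has ten lines as far as "wc -l" is concerned, but only the `use strict;` and `print` lines count as physical SLOC under these rules.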
That's far too simplistic a statement. It depends on your task. If all you want to do is serve up web pages, what PHP was primarily created for, then yes, PHP is easier to work with. But if you want to sequence DNA, Perl is the tool for the job. With Perl, you can get a lot more done with far fewer lines of code to tackle hard-core programming problems.
---Technology will liberate us if it doesn't enslave us first.
industry standard languages
Ok, that was your problem, right there. The languages you refer to may be widely used in YOUR industry, and thus have libraries that you need in that industry, but I assure you that in MY industry, Perl (given CPAN) is far-and-away the best tool for most jobs BECAUSE of its ubiquity and support libraries for just about anything you would ever want to do.
That doesn't say your language is useless, but clearly you're only looking at a small slice of the world.
"At 15.4 million lines of code,"
But how many libraries of congress is CPAN?
-- Bowery's Razor; a corollary (applicable to programmers and other state crafters) of Okham's Razor.
Seastead this.
See, I would argue that the industry standard languages are (in order) C, C++, VB, and LabWindows/LabView/MATLAB/other very high level language, because of my slice.
It's likely that your slice (the Perl-using slice) is also a very small slice. Based on a fairly wide-ranging look, I'd guess that C and C++ are the most widely used.
---
Mod me down, you fucking twits. Go ahead. I dare you.
(I read with sigs off.)
Well, kind of. Technically, TECO wasn't a programming language but a text editor (and corrector). Emacs wasn't so much a program written in it but a collection of editing macros for it.
I'll agree, though, that it looks like line noise.
-- Alastair
Yes and no... it depends on what level of development you're talking about. C and C++ are used throughout (I'm guessing) both of our environments, but your average in-house coder for operations or end-user apps is probably not going to be writing in those languages. In high-tech firms that are bringing new solutions to the business world (e.g. software spin-offs from academia, scientists writing code to support their work, Linux and/or Unix admins trying to get work done) you're going to see a lot of the sorts of high-level languages that are available in that world.
In the insurance companies and other non-technical companies that have a lot of technology dependencies, you're probably going to see much more of the proprietary systems (like VB) in use because they feel more comfortable with something that comes with a salesman (I'm not saying that in a negative way, it's just my experience).
I explicitly stated that as time goes on inheritance of wealth becomes less important to determining success.
My sister was born to a father who worked shift work running the boilers in a meat processing facility at the time. He was not, isn't and probably won't ever be (as he is now retired) president or chairman of a multinational company. My sister worked her way up from selling door-to-door to supplement a fairly modest household income to where she is now twenty years later. She is not related in any way to the founder of the company, and daddy had nothing to do with her current position except to raise his kids well.
While it's sure a hell of a lot easier to be born into a position of wealth, there is NOTHING in the free world today that prevents a "commoner" from improving his lot in life except his or her own sense of limitation.
And why not, the only thing the end user uses his desktop for is pressing the icons for the apps he wants to use, just openoffice, mozilla, a mail client and 99% of the users are happy
I do embedded. Most embedded code is C and C++. I've never seen Java or C# or Perl or any of those (Perl for pretty obvious reasons - you need a substantial base underneath it). In fact, embedded is pretty much a death knell for anything interpreted, and for very HLL.
We use VB and LabWindows to write interfaces to talk to our embedded boards, and MATLAB to do data analysis.
Remember that there's a *lot* of programming that has nothing to do with PCs.
---
Mod me down, you fucking twits. Go ahead. I dare you.
(I read with sigs off.)
Yes. Anyway, the number of development projects using a language is not necessarily the measure of its usefulness or the extent to which it is used. JavaScript is everywhere in HTML but has few whole projects compared with other "languages."
The number of utility scripts and small applications in Perl must be astronomical (many of these of course are available on CPAN and don't need to be developed).
While well written, the author of the linked page completely failed to mention the /real/ date standard, ISO 8601. It is the most logical (descending order) and least confusing.
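A quick Python sketch of why the descending order matters, using only the standard library. The dates are arbitrary examples, not anything from the linked page. ISO 8601 strings are unambiguous and, as a side effect of running from largest unit to smallest, they sort chronologically as plain text.

```python
from datetime import date, datetime, timezone

# An arbitrary example date, formatted as ISO 8601 (YYYY-MM-DD).
example_day = date(2004, 8, 14)
iso_day = example_day.isoformat()

# The same idea with a time and UTC offset attached.
example_moment = datetime(2004, 8, 14, 9, 30, tzinfo=timezone.utc)
iso_moment = example_moment.isoformat()

# Plain string sorting matches chronological order.
stamps = ["2004-08-14", "2003-12-01", "2004-01-31"]
in_order = sorted(stamps)

print(iso_day)
print(iso_moment)
print(in_order)
```

Compare that with something like 01/02/04, which could be January 2nd, February 1st, or even 2001-02-04 depending on who is reading it.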
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
There's a second 's' missing, somewhere...
The post anonymously option you are [not] attempting to use is one that isn't available to your user.