Optimizing Perl
An anonymous reader writes "Perl is an incredibly flexible language, but its ease of use can lead to some sloppy and lazy programming habits. We're all guilty of them, but there are some quick steps you can take to improve the performance of your Perl applications. This article looks at the key areas of optimization, which solutions work and which don't, and how to continue to build and extend your applications with optimization and speed in mind."
Does this mean you can't optimize Perl? =D
Disclaimer: I use Perl almost exclusively for programming.
rm -rf /usr/bin/perl
:) }
{ Just Use Python!
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
Got it now. Mirrored here, JIC. Please hit the link if you can, because they have a "Rate this article" thing at the bottom (I don't know if the form still works from the mirror, relative resources and such.), and we should give the author the good karma he/she deserves.
-hattmoward
The second rule of program optimization (FOR EXPERTS ONLY!): Don't do it yet.
-- fortune
The World Wide Web is dying. Soon, we shall have only the Internet.
Here's my style guide, something I developed while using Perl for over 5 years now.
Pardon the length, it's unavoidable.
Perl Coding Conventions and Style Guide
By Kevin J. Rice, Kevin@justanyone.com
General conventions:
Read the Perl style guide (http://theoryx5.uwinnipeg.ca/CPAN/perl/pod/perlstyle.html), and follow the conventions therein, especially the following:
4-column indent
Blank lines between chunks that do different things.
Use mnemonic variables- the names must mean something useful. No one character names!
Variable naming conventions:
$ALL_CAPS_HERE constants only (beware clashes with perl vars!);
$Some_Caps_Here package-wide global/static (also prefix 'gv_', see below);
$no_caps_here function scope my() or local() variables;
Be consistent.
Be nice.
Specific Coding Practices:
1. Always do a 'use strict;' at the beginning of every module and script. This catches both subtle errors and bad coding practices.
2. Programs should pass 'use warnings;' with a minimum of warnings before going into production. Note: turning off warnings in production is sometimes required for security or stability purposes. Solve root cause for all warnings if possible; don't just eliminate immediate cause.
3. Turn on Taint checking for all cgi / web enabled scripts. Invoke with "perl -T" or "#!/usr/bin/perl -T".
4. Use spaces for indenting, not tab characters. No file should contain any tab characters. These display differently in various terminals/editors, and mixing spaces and tabs makes code very messy. Most modern editors can be set to automatically insert spaces in place of tabs.
5. Each subroutine should perform one distinct task. Feel free to break down lengthy (i.e., more than 1-2 screenfuls of code) subs. This means almost all subroutines should be 120 lines or less; longer ones should be justified in code review.
6. Code blocks, when more than 1 or 2 lines, should have the block { } at the same indentation level to aid visual clarity of where that block starts/stops. Example:
7. Fully parenthesize stuff like "if ($a >= 5 || $b > 4)" into "if (($a >= 5) || ($b > 4))" so the user has no need to know/get wrong the order-of-operations. This includes one line conditionals like, "if (a) {}" - don't do: "if a {}".
8. Evals: Always use evals when doing system calls. If otherwise using them, always comment/explain why. If you know something might 'die', explain it specifically, since it probably isn't obvious.
9. Explicitly 'return' values at the end of every sub. Don't EVER use the last statement's value as a default return value; someone modifying the code later might not know you're depending on that value.
10. All modules must explicitly end with '1;' to provide a return value for the module.
11. Minimize the use of map() due to its confusing nature.
12. Use parentheses around all function calls, such as sort($a, $b) instead of "sort $a $b;" to make it obvious a function call is occurring. Prefer not to use the Perl subroutine operator, as in "&subroutinename($arg1, $arg2);" just do "subroutinename($arg1, $arg2);".
13. Don't use the 'unless' verb. Instead of, "unless($foo) {...}", code: "if (!$foo) {...}". The 'unless' verb is plenty confusing due to its uniqueness to Perl.
14. Modules and scripts, when over 200 or so lines, should have a logMessage() subroutine that allows for various levels of logging (0=silent, 1=minimal, 4=normal, 8=verbose): logMessage(1, "message");
15. Use a main() sub for all scripts, and include an explicit exit with an exit code appropriate to the platform you're on. Do not
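As a rough illustration of items 1, 2, 9 and 14 above, here is a minimal sketch of a leveled logMessage() sub. The variable name $gv_Log_Level, the STDERR destination and the return values are assumptions made for the example; the guide itself only specifies the call form logMessage(1, "message") and the level numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical package-wide threshold; the 'gv_' prefix follows the guide's
# naming convention for globals. 0=silent, 1=minimal, 4=normal, 8=verbose.
my $gv_Log_Level = 4;

# Print $message to STDERR when $level is within the current threshold.
# Returns 1 if the message was written, 0 otherwise (explicit return, item 9).
sub logMessage
{
    my ($level, $message) = @_;

    if (($gv_Log_Level == 0) || ($level > $gv_Log_Level))
    {
        return 0;
    }

    print STDERR "[$level] $message\n";
    return 1;
}

logMessage(1, "starting up");        # written, because 1 <= 4
logMessage(8, "verbose detail");     # suppressed, because 8 > 4
```

Raising $gv_Log_Level to 8 would let the second message through without touching any call site.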
Unitarian Church: Freethinkers Congregate!
For instance, flock is your friend ... and as I outline in my slashdot effect analysis you had better be prepared to handle race conditions. Ignoring the web server overload (mod_perl would have helped here), the code actually hung in there fairly well as I've learned from past "mistakes" when I've seen some pretty funky error messages crop up ... but even this time around, there were two minor corner cases I failed to account for (had never been "tickled" before) ... but those are fixed now so I'll be "more" ready if my Christmas lights show up on Slashdot again ... but then again, you are never really "ready" for Slashdot! ;-)
Hulk SMASH Celiac Disease
Are you accusing me of writing PERL? Come over here and say that again!
No offense but, I think that programming in Perl is a sloppy lazy programming habit.
Laboratree - Scientific collaboration based on OpenSocial.
||= very handy optimization... especially in a persistent environment such as mod_perl or speedy cgi.
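A minimal sketch of what that might look like, assuming a hypothetical load_config() that stands in for some expensive setup step. In a persistent interpreter the cache hash outlives a single request, so the expensive call happens once per process rather than once per hit.

```perl
use strict;
use warnings;

# Hypothetical cache that survives between requests in a persistent
# interpreter (mod_perl, SpeedyCGI); in plain CGI it is rebuilt every run.
my %gv_Config_Cache;
my $load_count = 0;

# Stand-in for an expensive step such as parsing a config file.
sub load_config
{
    my ($name) = @_;
    $load_count++;
    return { name => $name, loaded_at => time() };
}

# ||= only evaluates the right-hand side when the slot is still false,
# so load_config() runs once per name for the life of the process.
sub get_config
{
    my ($name) = @_;
    $gv_Config_Cache{$name} ||= load_config($name);
    return $gv_Config_Cache{$name};
}

get_config('site') for (1 .. 3);
print "load_config ran $load_count time(s)\n";   # load_config ran 1 time(s)
```

One caveat: ||= tests truth, not definedness, so a cached value of 0 or an empty string would be recomputed every time. Perl 5.10 and later offer //= for that case.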
Bush and Blair ate my sig!
Many optimizations listed in the article are not pertinent to Perl, but to any programming language, and as such are inappropriate to be there. Like the part about avoiding calling functions inside loops, short-circuit logic, sorting, etc.
But there are some good tips there, too: the part about string handling, references, and the AutoLoader.
4. I disagree. The tab is useful. The fact that they display differently in various terminals/editors is a FEATURE! But I agree that mixing spaces and tabs is a bad idea.
6. bullshit. That's personal taste.
13. utter bullshit. What's confusing about 'unless'? It may be unique to perl, but it's a pretty obvious English word.
14. bah
16. good! good that you don't prohibit gotos
18. bullshit. The $_ variable has a nice semantic value. Of course it should be used only in small blocks.
21. hmm.. Sometimes it's nice to declare them near place they're used.
22. Sometimes it makes sense to declare many inter-related variables on one line. Like my ($display_width, $display_height);
23. nice. I prefer to use just g_
26. I like to use dashes instead #---------
29. huh?
I'm still working through it, but I cannot reproduce its purported effects.
First it has a syntactical error with the "x" operator; it puts the number on the left and the string on the right, but the actual syntax is the other way around. If the author had actually tried to run his examples, he would have noticed this.
Then the author says that putting as much text in a single-quoted string as possible is better, and says that something like:
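For reference, a small sketch of the operand order for the repetition operator. The variable names are made up for the example. As far as I can tell the swapped form does compile, but it produces an empty string rather than the intended output.

```perl
use strict;
use warnings;

my $rule = '-' x 10;      # string on the left, repeat count on the right
print "$rule\n";          # prints ----------

# With the operands swapped, Perl repeats the string "10" zero times,
# because '-' has no numeric value ('use warnings' would complain here).
my $swapped = do { no warnings 'numeric'; 10 x '-' };
print length($swapped), "\n";   # prints 0
```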
print 'aaaaaaaaaaaaaaaaaa',"\n"
is better than:
print "aaaaaaaaaaaaaaaaaa\n";
I just tested this, and not only could I find no difference between single and double-quoted strings with the same amount of text, the suggested "improvement" with two separate strings, above, was significantly SLOWER than the second version.
At this point I lost interest (and respect) and stopped checking. But don't take my word for it, try it yourself! Are you getting different results?
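For anyone who wants to try it, here is one way the comparison could be set up with the core Benchmark module. This is only a sketch: the iteration count and the in-memory filehandle are choices made for the example, and the numbers will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Write to an in-memory filehandle so terminal or disk speed
# does not swamp the difference being measured.
my $sink = '';
open(my $out, '>', \$sink) or die "cannot open in-memory handle: $!";

# 'none' suppresses timethese's own report; cmpthese prints the table.
my $results = timethese(500_000, {
    single_plus_newline => sub { print {$out} 'aaaaaaaaaaaaaaaaaa', "\n"; },
    double_quoted       => sub { print {$out} "aaaaaaaaaaaaaaaaaa\n"; },
}, 'none');

cmpthese($results);
close($out);
```

Pointing the prints at STDOUT instead would bring terminal or pipe buffering back into the measurement, which may explain why different people see different results.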
Data object 1: upddate = 111, updtime = 1100, itemid = 200
Data object 2: upddate = 1111, updtime = 100, itemid = 200
So both strings would have a sort value of 1111100200, but of course, data object 1 should be sorted before data object 2. Using delimiters in the sprintf statement will ensure that different fields are marked as different, but they will interfere with the sort order.
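The collision is easy to reproduce. In the sketch below, naive_key() mimics the unpadded concatenation described above, while padded_key() shows one possible workaround using zero-padded fixed-width fields; the widths (8, 6, 6) are assumptions picked for the example, not something from the article.

```perl
use strict;
use warnings;

# The two records from the example above.
my %object1 = (upddate => 111,  updtime => 1100, itemid => 200);
my %object2 = (upddate => 1111, updtime => 100,  itemid => 200);

# Unpadded key: fields run together, so the boundaries are lost.
sub naive_key
{
    my ($rec) = @_;
    return sprintf('%d%d%d', $rec->{upddate}, $rec->{updtime}, $rec->{itemid});
}

# Zero-padded key: hypothetical fixed widths chosen for illustration.
sub padded_key
{
    my ($rec) = @_;
    return sprintf('%08d%06d%06d', $rec->{upddate}, $rec->{updtime}, $rec->{itemid});
}

print naive_key(\%object1),  "\n";   # 1111100200
print naive_key(\%object2),  "\n";   # 1111100200 (same key, order lost)
print padded_key(\%object1), "\n";   # 00000111001100000200
print padded_key(\%object2), "\n";   # 00001111000100000200
```

Padded keys like these should be compared with "cmp" as strings. Twenty digits is more than a floating point number can hold exactly, so "<=>" would lose precision.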
Another problem is that if your sort string is too long, perl may convert it to a floating point number and thus lose the data from the later fields.
The more correct way to do this sort is to compare the records one field at a time.
The added benefit of this method is that it definitely won't have overflow problems (which may be the case in the above examples, because "<=>" is the numeric compare operator). Had the author used "cmp", there would then be a quantity of numeric comparisons proportional to the length of the sort string.
The other benefit of my sort is that it is more flexible. you can change the "<=>" operator to a "cmp" operator if one of your fields is string data.
The sort that I propose (one I've been using) may or may not be faster than the "faster" sort proposed by the author, but then again, speed is nothing without correctness.
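The code from this comment did not survive, so what follows is not necessarily what the poster used. It is a sketch of the common field-by-field idiom, using the field names from the example data plus one extra made-up record to show a tie being broken.

```perl
use strict;
use warnings;

my @records = (
    { upddate => 1111, updtime => 100,  itemid => 200 },
    { upddate => 111,  updtime => 1100, itemid => 200 },
    { upddate => 111,  updtime => 1100, itemid => 150 },
);

# Compare one field at a time; || falls through to the next field
# only when the earlier comparison returns 0 (a tie).
my @sorted = sort {
       ($a->{upddate} <=> $b->{upddate})
    || ($a->{updtime} <=> $b->{updtime})
    || ($a->{itemid}  <=> $b->{itemid})
} @records;

foreach my $rec (@sorted)
{
    print join(' ', $rec->{upddate}, $rec->{updtime}, $rec->{itemid}), "\n";
}
```

Swapping any one "<=>" for "cmp" handles a string field without touching the other comparisons.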
Look, this kind of "squeeze the last bit of performance" exercise can be nice fun for assembly, or possibly C, programmers, but when have you had something that was acceptable as a perl script, but only after extensive optimization?
Better yet, I would have liked pointers on how to test code snippets for performance (such as illustrating the use of Benchmark or Devel::SmallProf), and then possibly a few pointers like this. (and why was Memoize left out of an article like this?) This sounds like someone writing perl who'd rather be writing assembly code.
In optimizing my (and others') perl scripts, the best tools I've found are the profiler and an understanding of what the code is supposed to do. That, and changing the nature of deployment of the program - from a cgi script to mod_perl, for example. All these little techniques are chasing after grains of sand, when there's a big rock right in front of your face.
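Since Memoize came up, here is a minimal sketch of how it is typically applied. The fib() function and the call counter are hypothetical stand-ins for whatever slow, repeatable computation a real script has.

```perl
use strict;
use warnings;
use Memoize;

my $call_count = 0;

# Hypothetical slow function: naive recursive Fibonacci.
sub fib
{
    my ($n) = @_;
    $call_count++;
    return $n if ($n < 2);
    return fib($n - 1) + fib($n - 2);
}

memoize('fib');   # replaces fib() with a caching wrapper

print fib(25), "\n";                       # 75025
print "fib body ran $call_count times\n";  # 26 with the cache in place
```

Memoizing only makes sense for functions whose result depends solely on their arguments; anything that reads the clock, a file or a database will return stale answers.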
Incidentally, I am getting a slightly better speed on the singlequote example (as claimed). My times are 12s vs 14s.
The primary bottleneck here is in the IO of the print statement itself. I bet that the string interpolation is probably very fast compared to the buffering slowness induced start/stop-ness of the second print statement. Most likely you have a very fast CPU.
Buffering makes all the difference in the world. From some of the benchmarking I've done previously, I have a hypothesis that "\n" sent to a print statement will trigger a buffer flush after it finishes sending that string off to the print statement.
In other words, an exaggerated version of foreach(1..1000) { print "blah\n";} will be slower than foreach (1..50) { print "blah\ntimes ten" }
I have seriously changed the execution time of one script from 980ms to 98 ms just as a result of bad buffering from the print statement. I think that the process of splitting up the print statements probably made it wait (more often) for the I/O resources.
In my particular case, I had a CGI script running off localhost, that was outputting about 30kb of text, looping through 80 or 90 records of data. Rather than doing the print statements directly, I buffered everything to a string, and then printed it at the end of the loop. As a tradeoff, I think I may have actually settled on flushing my string buffer at the end of each loop cycle.
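A rough sketch of that buffering approach, with made-up records and markup standing in for the real CGI output:

```perl
use strict;
use warnings;

# Hypothetical records standing in for the 80 or 90 rows in the comment.
my @records;
foreach my $id (1 .. 90)
{
    push(@records, { id => $id, name => "item $id" });
}

# Build the page in memory...
my $page = '';
foreach my $rec (@records)
{
    $page .= "<tr><td>$rec->{id}</td><td>$rec->{name}</td></tr>\n";
}

# ...then hand it to the output layer in a single call.
print $page;
```

The tradeoff mentioned above would amount to moving the print inside the loop and resetting $page to an empty string after each record. Whether either variant helps depends on how the output handle is buffered, so it is worth benchmarking both.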
Just for the record, I think the article is good in the sense that if you didn't realize that what you were doing might be a performance issue, it's a good wakeup call to go and benchmark your code. But don't just follow the advice on faith.
test:
I have a friend who works at a big company that provides a lot of "utility" to its customers.
They run perl scripts all the time to crunch text files containing lots of data coming in from remote sensors and stuff like that. He told me that the more senior guys have the philosophy, "Optimize? nah, just let it run the extra 20 minutes."
And they're talking about scripts that get run in a cron job DAILY.
... where the optimized code is easier to read.
isn't unique to Perl. It exists, for example, in Common Lisp.
Perl compiles its code into an intermediary "tree" of logic nodes (Perl "opcodes"). Are there any topology strategies for optimizing that tree, in the graph itself? Any visualization tools that let Perl generate the tree, then let a programmer change the tree, then complete the compilation of the new tree to new code? Is Parrot/Perl6 making any of these strategies more feasible, or are they all going away?
--
make install -not war
DAY 1:
Manager: How many lines of code did you write today?
Developer: One.
DAY 2:
Manager: How many lines of code did you write today?
Developer: One.
Day 3:
Manager: How many lines of code did you write today?
Developer: One.
Manager: Are you telling me that in three days you've only managed to write three lines of code?
Developer: You don't understand -- I've been working on the same line of code all three days.
Manager: (pauses) You're writing in perl again, aren't you?
Software Wars
go back to your ivory tower, you useless person
My editor (http://ultraedit.com/), when I hit the tab key, inserts 4 spaces.
*THIS HITS THE NAIL ON THE HEAD*
You've got it configured to *insert* spaces when you hit tab. I don't recall ultraedit doing that by default (haven't used it in a few years tho). Most editors I use by default will *RENDER* a tab as X spaces, not actually *CREATE* X spaces. If it renders as X, you can easily change the rendering. But once they've become spaces, you can't go back (easily anyway).
creation science book
Well for me the greatest optimization is Perl itself, which allows you to quickly write potent code. It spares the programmer's time, which costs much more than machine time. And as to optimization - well I think that a good optimizing compiler should do the job - you know I couldn't recognize my inefficient C++ code after running it through the Intel C++ compiler - it has improved soooo much!
You can defy gravity... for a short time