Are There Perl Optimization Guides?
ara818 asks: "I have written a 4,000-line personal web assistant using Perl. After getting everything to work I am now working on making the code run faster. The problem I keep running into though is that there are so many ways to do the same thing in Perl that I don't know which is faster. Right now, I am working on intuition but I'd really like a site or book that could give me at least a few pointers or some guidelines. Is there any such resource available for Perl, or for that matter, other popular programming languages?"
http://perl.apache.org/guide/performance.html
Matt. Want XML + Apache + Stylesheets? Get AxKit.
Programming Perl has a section called "Efficiency" in the "Other Oddments" chapter, which contains many useful pointers for time and space efficiency.
Aside from Programming Perl, which was mentioned earlier, another book that focuses on some more obscure optimizations is "Advanced Perl Programming", by Sriram Srinivasan. O'Reilly, of course.
I'm not sure how well-developed the thing is yet, but if it's in a workable state, would compiling some parts of your program help? (At least a bit?)
There is a spellbook here; eat it? [ynq]
Also, be sure to avoid overloading the garbage collector. I know that a lot of Java (another garbage collected language) programs needlessly allocate String objects by using the concatenation operator. Find out a little more about how Perl's GC works and you'll probably see that your code is working it a bit too hard at least someplace.
~wog
I mean, Perl is reasonably fast most of the time. Is there a real need to optimize? That's the first question you need to ask yourself. If the answer is yes, figure out what's slowing it down. If the algorithms you're using are good, and yet the code is still too slow for acceptable performance in Perl, try to find a standard Perl module (or something on CPAN) that's written in C that does what you want. If that's not available either, write it in (C, C++, Ada95, Objective C, whatever floats your boat) and call it from the shell (be careful about tainted paths, though) - or if you're ambitious, learn SWIG or XS and make a Perl module (then submit it to CPAN!).
- Use hashes instead of linear searches. Instead of iterating over @keywords to see if $_ is a keyword, construct a hash with it:

    my %keywords;
    for (@keywords) {
        ++$keywords{$_};
    }

  Then test $keywords{$_} for a nonzero value to see if $_ is a keyword.
- Consider using foreach, shift, or splice rather than subscripting.
- Use the integer pragma (use integer).
- Avoid goto.
- Avoid printf if print will work.
- Avoid $&, $`, and $'.
- Avoid using eval on a string. Eval of a string forces recompilation every time it is run. In particular, use symbolic references instead of using eval to construct variable names:

    ${$pkg . '::' . $varname} = &{ "fix_" . $varname }($pkg);

- Avoid eval inside a loop. Put the loop into the eval instead, to avoid redundant recompilations of the code.
- Avoid run-time-compiled patterns, that is, /$pattern/. Use the /pattern/o (once only) pattern modifier to avoid pattern recompilations when the pattern doesn't change over the life of the process. For patterns that change occasionally, you can use the fact that a null pattern refers back to the previous pattern, like this:

    "foundstring" =~ /$currentpattern/;    # Dummy match (must succeed)
    while (<>) {
        print if //;
    }

- Short-circuit alternation is often faster than the corresponding regular expression. So:

    print if /one-hump/ || /two/;

  is likely to be faster than:

    print if /one-hump|two/;

  at least for certain values of one-hump and two. This is because the optimizer likes to hoist certain simple matching operations up into higher parts of the syntax tree and do very fast matching with a Boyer-Moore algorithm. Complicated patterns defeat this.
- Reject common cases early with next if inside a loop. As with simple regular expressions, the optimizer likes this. You can typically discard comment lines and blank lines even before you do a split or chop:

    while (<>) {
        next if /^#/;
        next if /^$/;
        chop;
        @line = split(/,/);
    }

- Avoid regular expressions with many quantifiers, or with big {m,n} numbers on parenthesized expressions.
- Maximize length of any non-optional literal strings in regular expressions. This is counterintuitive, but longer patterns often match faster than shorter patterns. That's because the optimizer looks for constant strings and hands them off to a Boyer-Moore search, which benefits from longer strings. Compile your pattern with the -Dr debugging switch to see what Perl thinks the longest literal is.
- Avoid expensive subroutine calls in tight loops.
- Avoid getc; use sysread instead (for single-character I/O only).
- To get all the non-dot files within a directory, use opendir and readdir; say something like this:

    opendir(DIR, ".");
    @files = sort grep(!/^\./, readdir(DIR));
    closedir(DIR);

- Avoid frequent substr on long strings.
- Use pack and unpack instead of multiple substr invocations.
- Use substr as an lvalue rather than concatenating substrings.
- Use s/// rather than concatenating substrings.
- Use modifiers and equivalent and and or, instead of full-blown conditionals. Statement modifiers and logical operators avoid the overhead of entering and leaving a block. They can often be more readable too.
- Use $foo = $a || $b || $c instead of:

    if ($a) { $foo = $a; }
    elsif ($b) { $foo = $b; }
    elsif ($c) { $foo = $c; }

- Set default values with $pi ||= 3;
- Don't test things you know won't match. Use last or elsif to avoid falling through to the next case in your switch statement.
- Use special operators like study, logical string operations, unpack 'u' and pack '%' formats.
- Beware of the tail wagging the dog. Misstatements resembling (<STDIN>)[0] and 0 .. 2000000 can cause Perl much unnecessary work. In accord with UNIX philosophy, Perl gives you enough rope to hang yourself.
- Factor operations out of loops.
- Slinging strings can be faster than slinging arrays.
- my variables are normally faster than local variables.
- tr/abc//d is faster than s/[abc]//g.
- Print with a comma separator may be faster than concatenating strings.
- Prefer join("", ...) to a series of concatenated strings.
- Split on a fixed string is generally faster than split on a pattern. That is, use split(/ /, ...) rather than split(/ +/, ...).
- system("mkdir ...") may be faster on multiple directories if mkdir(2) isn't available.
- Cache entries from passwd and group and so on.
- Avoid unnecessary system calls.
- Avoid unnecessary system() calls.
- Keep track of your working directory rather than calling pwd each time.
- Avoid shell metacharacters in commands -- pass lists to system and exec where appropriate.
- Set the sticky bit on the Perl interpreter on machines without demand paging:

    chmod +t /usr/bin/perl

- Using defaults doesn't make your program faster.

The same chapter also lists Space Efficiency, Programmer Efficiency, Maintainer Efficiency, Porter Efficiency, and User Efficiency. The sections contradict one another.
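Since many of these tips come down to "A is usually faster than B", it can help to measure rather than guess. Here is a minimal sketch using the core Benchmark module to check the tr/abc//d versus s/[abc]//g tip. The sample string and the sub names (strip_with_tr, strip_with_subst, run_comparison) are made up for illustration, and the numbers will depend on your Perl version and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input; any long string containing the target characters will do.
my $sample = join '', map { "abc-xyz-$_;" } 1 .. 200;

# Delete the characters a, b and c with tr///d ...
sub strip_with_tr {
    my ($text) = @_;
    $text =~ tr/abc//d;
    return $text;
}

# ... versus the equivalent global substitution.
sub strip_with_subst {
    my ($text) = @_;
    $text =~ s/[abc]//g;
    return $text;
}

# Run each candidate for at least one CPU second and print a comparison table.
sub run_comparison {
    cmpthese(-1, {
        tr_delete => sub { strip_with_tr($sample) },
        subst     => sub { strip_with_subst($sample) },
    });
}

run_comparison() unless caller;
```

The same pattern works for any pair of alternatives in the list: wrap each one in a sub, confirm they return the same result, then hand them to cmpthese.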
Do not shift and do not increment integers for cycles unless necessary (Perl integer math sucks and shift loads the garbage collector). Foreach is faster.
Do not compare and substr, regexp is faster. If anything can be formulated as a regexp, regexp it. That's what Perl is good at. Have fun.
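For reference, here are the three loop styles being compared, as a small sketch. The sub names are hypothetical, and this only shows that the forms are interchangeable for a simple sum; it does not by itself prove which one is faster on your Perl, so time them on your own data.

```perl
use strict;
use warnings;

# foreach-style loop: no index variable, no modification of the array.
sub sum_with_foreach {
    my @items = @_;
    my $total = 0;
    $total += $_ for @items;
    return $total;
}

# C-style loop that increments an integer index and subscripts the array.
sub sum_with_index {
    my @items = @_;
    my $total = 0;
    for (my $i = 0; $i < @items; $i++) {
        $total += $items[$i];
    }
    return $total;
}

# shift-based loop: consumes the (copied) array one element at a time.
sub sum_with_shift {
    my @items = @_;
    my $total = 0;
    while (@items) {
        $total += shift @items;
    }
    return $total;
}

my @numbers = (1 .. 1000);
print sum_with_foreach(@numbers), "\n";
```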
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
This book spends a lot of pages down in the hardware level of optimization, but covers a lot of optimizations in other programming languages, particularly C. The author also discusses some general principles of optimization: Concentrate your optimization efforts on the code which is used the most, such as inner loops; consider alternative representations of your data, etc.
Sprinkle liberally through the code.
Study the results. You'll find that often what you thought was slow isn't and what you thought was fast is slow. In one case a CGI I saw was taking 2 seconds to just load all the libs that were being required even though most were not needed!
In another case we found that the fastest way to build a free-text engine was to munge the input into a regex and use the builtin grep on the data (cut the search from 500ms to sub 100ms of CPU).
Overall it's very application dependent - most of the optimizations (apart from the regex stuff) are about good coding practice as much as anything else.
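One way to do this kind of timing is sketched below, assuming the core Time::HiRes module. The timed and report helpers and the %elapsed hash are hypothetical names invented for this example; wrap any suspect section in timed(...) and read the report at the end.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my %elapsed;    # label => total wall-clock seconds spent in that section

# Hypothetical helper: run a code block, add its wall-clock time to %elapsed,
# and pass the block's return value through.
sub timed {
    my ($label, $code) = @_;
    my $start  = [gettimeofday];
    my @result = $code->();
    $elapsed{$label} += tv_interval($start);
    return wantarray ? @result : $result[0];
}

# Print the slowest sections first, on STDERR so normal output is untouched.
sub report {
    for my $label (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
        printf STDERR "%-20s %.6f s\n", $label, $elapsed{$label};
    }
}

# Example use: time two sections, then print the totals.
my $sum = timed(sum_loop => sub { my $t = 0; $t += $_ for 1 .. 100_000; $t });
timed(short_sleep => sub { select(undef, undef, undef, 0.05) });
report();
```

For load-time problems like the CGI case, the same wrapper can go around each require to see which library is costing the most.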
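A rough sketch of that approach, under assumptions: the query is split on whitespace, each term is escaped with quotemeta, and a record matches if it contains any term. The original engine's exact matching rules are not given here, and the names query_to_regex, search and @docs are hypothetical.

```perl
use strict;
use warnings;

# Turn a whitespace-separated query into one case-insensitive regex that
# matches any of its terms. quotemeta keeps user input from acting as regex syntax.
sub query_to_regex {
    my ($query) = @_;
    my @terms = grep { length } split ' ', $query;
    return unless @terms;
    my $alternation = join '|', map { quotemeta($_) } @terms;
    return qr/(?:$alternation)/i;
}

# Compile the regex once, then let the builtin grep scan the records.
sub search {
    my ($query, @records) = @_;
    my $regex = query_to_regex($query) or return;
    return grep { /$regex/ } @records;
}

# Hypothetical data set.
my @docs = (
    'Perl optimization guide',
    'Writing XS modules in C++',
    'Apache configuration',
);
my @hits = search('perl c++', @docs);
print "$_\n" for @hits;
```

Compiling the pattern once with qr// before the grep keeps the regex from being rebuilt for every record.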
Never underestimate the bandwidth of a truck load of tapes