Optimizing Perl

Posted by michael on Friday October 22, 2004 @02:54AM from the other-way-to-do-it dept.

An anonymous reader writes "Perl is an incredibly flexible language, but its ease of use can lead to some sloppy and lazy programming habits. We're all guilty of them, but there are some quick steps you can take to improve the performance of your Perl applications. This article looks at the key areas of optimization, which solutions work and which don't, and how to continue to build and extend your applications with optimization and speed in mind."

Here's My Style Guide by justanyone · 2004-10-22 03:07 · Score: 5, Informative

I know, many people means many styles.
Here's my style guide, something I developed using Perl for over 5 years now.

Pardon the length, it's unavoidable.

Perl Coding Conventions and Style Guide
By Kevin J. Rice, Kevin@justanyone.com

General conventions:

Read the Perl style guide (http://theoryx5.uwinnipeg.ca/CPAN/perl/pod/perlstyle.html), and follow the conventions therein, especially the following:
4-column indent.
Blank lines between chunks that do different things.
Use mnemonic variable names; the names must mean something useful. No one-character names!
Variable naming conventions:
    $ALL_CAPS_HERE   constants only (beware clashes with perl vars!);
    $Some_Caps_Here  package-wide global/static (also prefix 'gv_', see below);
    $no_caps_here    function scope my() or local() variables.
Be consistent.
Be nice.

Specific Coding Practices:

1. Always do a 'use strict;' at the beginning of every module and script. This catches both subtle errors and bad coding practices.

2. Programs should run under 'use warnings;' with a minimum of warnings before going into production. Note: turning off warnings in production is sometimes required for security or stability purposes. Solve the root cause of each warning if possible; don't just eliminate the immediate cause.

3. Turn on Taint checking for all cgi / web enabled scripts. Invoke with "perl -T" or "#!/usr/bin/perl -T".

4. Use spaces for indenting, not tab characters. No file should contain any tab characters. These display differently in various terminals/editors, and mixing spaces and tabs makes code very messy. Most modern editors can be set to automatically insert spaces in place of tabs.

5. Each subroutine should perform one distinct task. Feel free to break down lengthy (i.e., more than 1-2 screenfuls of code) subs. This means almost all subroutines should be 120 lines or less; longer ones should be justified in code review.

6. Code blocks, when more than 1 or 2 lines, should have the block { } at the same indentation level to aid visual clarity of where that block starts/stops. Example:

    if ($condition == $value)
    {
        $another_var++;
        $two = $three + 12;
    }
    else
    {
        $four = $pi * 13;
    }

7. Fully parenthesize stuff like "if ($a >= 5 || $b > 4)" into "if (($a >= 5) || ($b > 4))" so the user has no need to know/get wrong the order-of-operations. This includes one line conditionals like, "if (a) {}" - don't do: "if a {}".

8. Evals: Always use evals when doing system calls. If otherwise using them, always comment/explain why. If you know something might 'die', explain it specifically, since it probably isn't obvious.

9. Explicitly 'return' values at the end of every sub. Don't EVER use the last statement's value as a default return value; someone modifying the code later might not know you're depending on that value.

10. All modules must explicitly end with '1;' to provide a return value for the module.

11. Minimize the use of map() due to its confusing nature.

12. Use parentheses around all function calls, such as sort($a, $b) instead of "sort $a $b;" to make it obvious a function call is occurring. Prefer not to use the Perl subroutine operator, as in "&subroutinename($arg1, $arg2);"; just write "subroutinename($arg1, $arg2);".

13. Don't use the 'unless' verb. Instead of, "unless($foo) {...}", code: "if (!$foo) {...}". The 'unless' verb is plenty confusing due to its uniqueness to Perl.

14. Modules and scripts, when over 200 or so lines, should have a logMessage() subroutine that allows for various levels of logging (0=silent, 1=minimal, 4=normal, 8=verbose): logMessage(1, "message");

15. Use a main() sub for all scripts, and include an explicit exit with an exit code appropriate to the platform you're on. Do not
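As a rough sketch of rules 1, 2, 9 and 14 taken together: the $gv_Log_Level variable, the message format and the choice of STDERR are assumptions made for illustration, not part of the guide itself.

```perl
use strict;
use warnings;

# Hypothetical package-wide setting; the 'gv_' prefix follows the naming rule above.
# 0=silent, 1=minimal, 4=normal, 8=verbose
my $gv_Log_Level = 4;

# Write $message to STDERR when its level is within the current threshold.
# Returns 1 if the message was written, 0 if it was suppressed.
sub logMessage
{
    my ($level, $message) = @_;

    if (($gv_Log_Level == 0) || ($level > $gv_Log_Level))
    {
        return 0;
    }

    print STDERR "[$level] $message\n";
    return 1;
}

logMessage(1, "starting up");          # written, because 1 <= 4
logMessage(8, "per-record detail");    # suppressed, because 8 > 4
```

Raising $gv_Log_Level to 8 would let the verbose message through without touching any of the call sites.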

--
Unitarian Church: Freethinkers Congregate!
A better way to Sort than in the article. by cryptor3 · 2004-10-22 05:26 · Score: 4, Informative

In the article, the author mentions that a faster way of implementing this sort:
    my @marksorted = sort {
        sprintf('%s%s%s',
            $marked_items->{$b}->{'upddate'},
            $marked_items->{$b}->{'updtime'},
            $marked_items->{$a}->{itemid})
        <=>
        sprintf('%s%s%s',
            $marked_items->{$a}->{'upddate'},
            $marked_items->{$a}->{'updtime'},
            $marked_items->{$a}->{itemid})
    } keys %{$marked_items};
is this sort, which pre-computes a "sort" field for each record (at the expense of memory, of course):
    map {
        $marked_items->{$_}->{sort} = sprintf('%s%s%s',
            $marked_items->{$_}->{'upddate'},
            $marked_items->{$_}->{'updtime'},
            $marked_items->{$_}->{itemid})
    } keys %{$marked_items};
    my @marksorted = sort {
        $marked_items->{$b}->{sort} <=> $marked_items->{$a}->{sort}
    } keys %{$marked_items};
I argue that this implementation is flawed, because the fields can run together, so for example, if you had the following data:

Data object 1: upddate = 111, updtime = 1100, itemid = 200
Data object 2: upddate = 1111, updtime = 100, itemid = 200

Both strings would then have a sort value of 1111100200, but of course the two data objects differ and should not sort as equal. Using delimiters in the sprintf statement will ensure that different fields are marked as different, but they will interfere with the sort order.
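The collision is easy to reproduce. In this small sketch the record names obj1 and obj2 are made up; the field values are the ones from the two data objects above, and the key is built with the same sprintf format as the article's code.

```perl
use strict;
use warnings;

# Sample records copied from the two data objects above (names are hypothetical).
my $marked_items = {
    obj1 => { upddate => 111,  updtime => 1100, itemid => 200 },
    obj2 => { upddate => 1111, updtime => 100,  itemid => 200 },
};

# Build the concatenated sort key for each record.
my %sort_key;
foreach my $id (sort(keys(%{$marked_items})))
{
    my $item = $marked_items->{$id};
    $sort_key{$id} = sprintf('%s%s%s', $item->{upddate}, $item->{updtime}, $item->{itemid});
    print "$id => $sort_key{$id}\n";
}

# The numeric comparison used by the sort cannot tell the two keys apart.
if ($sort_key{obj1} == $sort_key{obj2})
{
    print "keys collide\n";
}
```

Both records end up with the key 1111100200, so their relative order in the sorted list is left to chance.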

Another problem is that if your sort string is too long, perl may convert it to a floating point number and thus lose the data from the later fields.
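A sketch of that precision problem, assuming a perl built with ordinary 64-bit doubles (a long-double or quadmath build may behave differently). The field values are hypothetical: an 8-digit date, a 6-digit time and a 7-digit item id, which together make a 21-digit key.

```perl
use strict;
use warnings;

# Two hypothetical keys that differ only in the last digit of the item id.
my $key_a = sprintf('%s%s%s', '20041022', '025400', '1234561');
my $key_b = sprintf('%s%s%s', '20041022', '025400', '1234562');

# <=> converts both keys to numbers; 21 digits is too long for a native
# integer, so they become floating point values.
my $numeric = ($key_a <=> $key_b);

# cmp compares the keys as strings, character by character.
my $string = ($key_a cmp $key_b);

print "numeric: $numeric, string: $string\n";
```

On such a build the numeric comparison is expected to report the two keys as equal (0), while cmp still tells them apart (-1), so the item id no longer influences the numeric sort.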

The more correct way to do this sort is:
    my @marksorted = sort {
           $marked_items->{$b}->{'upddate'} <=> $marked_items->{$a}->{'upddate'}
        || $marked_items->{$b}->{'updtime'} <=> $marked_items->{$a}->{'updtime'}
        || $marked_items->{$b}->{'itemid'}  <=> $marked_items->{$a}->{'itemid'}
    } keys %{$marked_items};
The added benefit of this method is that it definitely won't have overflow problems, which may be the case in the above examples because "<=>" is the numeric compare operator. Had the author used "cmp", the number of character comparisons would be proportional to the length of the sort string.

The other benefit of my sort is that it is more flexible. You can change the "<=>" operator to a "cmp" operator if one of your fields is string data.
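To illustrate mixing the two operators, here is a minimal sketch. The 'title' field, the record names and the sample values are invented for the example and do not come from the article; the first two fields sort newest first with "<=>", and ties are broken alphabetically with "cmp".

```perl
use strict;
use warnings;

# Hypothetical records: 'upddate' and 'updtime' are numeric, 'title' is a string.
my $marked_items = {
    a1 => { upddate => 20041022, updtime => 254, title => 'pear'  },
    a2 => { upddate => 20041022, updtime => 254, title => 'apple' },
    a3 => { upddate => 20041021, updtime => 900, title => 'fig'   },
};

# Compare field by field; a later comparison only runs when the earlier ones tie.
my @marksorted = sort {
       ($marked_items->{$b}->{upddate} <=> $marked_items->{$a}->{upddate})
    || ($marked_items->{$b}->{updtime} <=> $marked_items->{$a}->{updtime})
    || ($marked_items->{$a}->{title}   cmp $marked_items->{$b}->{title})
} keys(%{$marked_items});

print join(' ', @marksorted), "\n";
```

With this data the result is a2 a1 a3: the two records from the later date come first, 'apple' precedes 'pear' on the string tie-break, and the older record comes last.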

The sort that I propose (one I've been using) may or may not be faster than the "faster" sort proposed by the author, but then again, speed is nothing without correctness.
unless by hding · 2004-10-22 08:38 · Score: 3, Informative

isn't unique to Perl. It exists, for example, in Common Lisp.
Re:Best PERL Optimization trick ever: by torpor · 2004-10-22 19:48 · Score: 2, Informative

why should a language be all about string handling? that's what good libs are for.

this article starts out with the 'some people program in Perl and use terrible habits' point. the problem is, Perl allows you into this bad habit territory, by design of the language.

string handling is just one use for a language. python has plenty of superb string handling libs. it's also very difficult to get into the same 'bad habit' territory that you can get into with Perl.

my original post was to make the point that if you don't want to have to 're-optimize' your badly-written Perl code, just use Python in the first place.

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --