Slashdot Mirror


Optimizing Perl

An anonymous reader writes "Perl is an incredibly flexible language, but its ease of use can lead to some sloppy and lazy programming habits. We're all guilty of them, but there are some quick steps you can take to improve the performance of your Perl applications. This article looks at the key areas of optimization, which solutions work and which don't, and how to continue to build and extend your applications with optimization and speed in mind."

7 of 68 comments

  1. Optimization Rules! by FooAtWFU · · Score: 4, Insightful
    The first rule of program optimization: Don't do it.
    The second rule of program optimization (FOR EXPERTS ONLY!): Don't do it yet.

    -- fortune

    --
    The World Wide Web is dying. Soon, we shall have only the Internet.
    1. Re:Optimization Rules! by Smallpond · · Score: 4, Insightful

      +1 insightful.

      Look at his first example, which is concatenating 1 million strings. His "bad" time is 5.2 seconds and the good time is 1.7. Who cares? Nobody uses perl to do high-performance computing. Imagine you are extracting 1M strings from a database and doing something with them. Would you care about a 3 second difference?

      It's OK to write good code, but it's better to make your code clear and not dependent on clever tricks.

  2. Here's My Style Guide by justanyone · · Score: 5, Informative
    I know, many people means many styles.
    Here's my style guide, something I developed using Perl for over 5 years now.

    Pardon the length, it's unavoidable.

    Perl Coding Conventions and Style Guide
    By Kevin J. Rice, Kevin@justanyone.com

    General conventions:

    Read the Perl style guide (http://theoryx5.uwinnipeg.ca/CPAN/perl/pod/perlstyle.html), and follow the conventions therein, especially the following:
    4-column indent
    Blank lines between chunks that do different things.
    Use mnemonic variables- the names must mean something useful. No one character names!
    Variable naming conventions:
    $ALL_CAPS_HERE constants only (beware clashes with perl vars!);
    $Some_Caps_Here package-wide global/static (also prefix 'gv_', see below);
    $no_caps_here function scope my() or local() variables;
    Be consistent.
    Be nice.

    Specific Coding Practices:

    1. Always do a 'use strict;' at the beginning of every module and script. This catches both subtle errors and bad coding practices.

    2. Programs should pass 'use warnings;' with a minimum of warnings before going into production. Note: turning off warnings in production is sometimes required for security or stability purposes. Solve root cause for all warnings if possible; don't just eliminate immediate cause.

    3. Turn on Taint checking for all cgi / web enabled scripts. Invoke with "perl -T" or "#!/usr/bin/perl -T".

    4. Use spaces for indenting, not tab characters. No file should contain any tab characters. These display differently in various terminals/editors, and mixing spaces and tabs makes code very messy. Most modern editors can be set to automatically insert spaces in place of tabs.

    5. Each subroutine should perform one distinct task. Feel free to break down lengthy (i.e., more than 1-2 screenfuls of code) subs. This means almost all subroutines should be 120 lines or less; longer ones should be justified in code review.

    6. Code blocks, when more than 1 or 2 lines, should have the block { } at the same indentation level to aid visual clarity of where that block starts/stops. Example:

    if ($condition == $value)
    {
        $another_var++;
        $two = $three + 12;
    }
    else
    {
        $four = $pi * 13;
    }

    7. Fully parenthesize stuff like "if ($a >= 5 || $b > 4)" into "if (($a >= 5) || ($b > 4))" so the user has no need to know/get wrong the order-of-operations. This includes one line conditionals like, "if (a) {}" - don't do: "if a {}".

    8. Evals: Always use evals when doing system calls. If otherwise using them, always comment/explain why. If you know something might 'die', explain it specifically, since it probably isn't obvious.

    9. Explicitly 'return' values at the end of every sub. Don't EVER use the last statement's value as a default return value; someone modifying the code later might not know you're depending on that value.

    10. All modules must explicitly end with '1;' to provide a return value for the module.

    11. Minimize the use of map() due to its confusing nature.

    12. Use parentheses around all function calls, such as sort($a, $b) instead of "sort $a $b;" to make it obvious a function call is occurring. Prefer not to use the Perl subroutine operator, as in "&subroutinename($arg1, $arg2);" just do "subroutinename($arg1, $arg2);".

    13. Don't use the 'unless' verb. Instead of, "unless($foo) {...}", code: "if (!$foo) {...}". The 'unless' verb is plenty confusing due to its uniqueness to Perl.

    14. Modules and scripts, when over 200 or so lines, should have a logMessage() subroutine that allows for various levels of logging (0=silent, 1=minimal, 4=normal, 8=verbose): logMessage(1, "message");

    15. Use a main() sub for all scripts, and include an explicit exit with an exit code appropriate to the platform you're on. Do not
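
    Point 14 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the guide names the levels (0=silent, 1=minimal, 4=normal, 8=verbose) and the call logMessage(1, "message"), but the threshold variable $gv_Log_Level, the STDERR destination and the 0/1 return value are made up here.

```perl
use strict;
use warnings;

# Hypothetical package-wide threshold (not from the guide):
# 0=silent, 1=minimal, 4=normal, 8=verbose.
my $gv_Log_Level = 4;

# Write $message to STDERR when $level is within the current threshold.
# Returns 1 if the message was written, 0 if it was suppressed.
sub logMessage
{
    my ($level, $message) = @_;

    if (($gv_Log_Level == 0) || ($level > $gv_Log_Level))
    {
        return(0);
    }

    my $timestamp = localtime();    # scalar context gives a readable date string
    print(STDERR "$timestamp [$level] $message\n");

    return(1);
}

logMessage(1, "starting up");       # written, because 1 <= 4
logMessage(8, "just did this, value=12");    # suppressed, because 8 > 4
```

    Raising $gv_Log_Level to 8 would let the verbose message through without touching any of the call sites.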

  3. And then stress-test with Slashdot ... by xmas2003 · · Score: 4, Interesting
    I use Perl for my halloween webcam - same code is used in the christmas webcam ... and thought I had it in pretty decent shape ... until the Slashdot thundering herd descended on it and gave it one heck of a stress test.

    For instance, flock is your friend ... and as I outline in my slashdot effect analysis you had better be prepared to handle race conditions. Ignoring the web server overload (mod_perl would have helped here), the code actually hung in there fairly well as I've learned from past "mistakes" when I've seen some pretty funky error messages crop up ... but even this time around, there were two minor corner cases I failed to account for (had never been "tickled" before) ... but those are fixed now so I'll be "more" ready if my christmas lights show up on Slashdot again ... but then again, you are never really "ready" for Slashdot! ;-)
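
    As a rough illustration of the flock point, here is a minimal sketch of a read-modify-write on a shared file under an exclusive lock. It is not the webcam code; the sub name bump_counter and the one-number-per-file format are assumptions made up for the example.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical helper: increment the number stored in $path and return it.
# The exclusive lock keeps two processes from reading the same old value.
sub bump_counter
{
    my ($path) = @_;

    # Open read/write and create if missing, without truncating before we hold the lock.
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die("cannot open $path: $!");
    flock($fh, LOCK_EX)                      or die("cannot lock $path: $!");

    my $line  = <$fh>;
    my $count = 0;
    if (defined($line) && ($line =~ /^(\d+)/))
    {
        $count = $1;
    }
    $count++;

    # Rewrite the file from the start while still holding the lock.
    seek($fh, 0, SEEK_SET)   or die("cannot seek $path: $!");
    truncate($fh, 0)         or die("cannot truncate $path: $!");
    print {$fh} "$count\n"   or die("cannot write $path: $!");
    close($fh)               or die("cannot close $path: $!");    # also releases the lock

    return($count);
}
```

    Without the lock, two requests arriving together could both read the same number and one increment would be lost, which is the kind of race that only shows up under heavy load.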

    --
    Hulk SMASH Celiac Disease
  4. A better way to Sort than in the article. by cryptor3 · · Score: 4, Informative
    In the article, the author mentions that a faster way of implementing this sort:
    my @marksorted = sort {sprintf('%s%s%s',
                               $marked_items->{$b}->{'upddate'},
                               $marked_items->{$b}->{'updtime'},
                               $marked_items->{$a}->{itemid}) <=>
                           sprintf('%s%s%s',
                               $marked_items->{$a}->{'upddate'},
                               $marked_items->{$a}->{'updtime'},
                               $marked_items->{$a}->{itemid}) } keys %{$marked_items};
    is this sort, which pre-computes a "sort" field for each record. (of course, at the expense of memory):
    map { $marked_items->{$_}->{sort} = sprintf('%s%s%s',
              $marked_items->{$_}->{'upddate'},
              $marked_items->{$_}->{'updtime'},
              $marked_items->{$_}->{itemid}) } keys %{$marked_items};
    my @marksorted = sort { $marked_items->{$b}->{sort} <=>
                            $marked_items->{$a}->{sort} } keys %{$marked_items};
    I argue that this implementation is flawed, because the fields can run together, so for example, if you had the following data:

    Data object 1: upddate = 111, updtime = 1100, itemid = 200
    Data object 2: upddate = 1111, updtime = 100, itemid = 200

    So both strings would have a sort value of 1111100200, but of course the two data objects have different dates and times and should not compare as equal. Using delimiters in the sprintf statement will ensure that different fields are marked as different, but they will interfere with the sort order.
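
    The collision is easy to reproduce. A minimal sketch, using the two data objects above; the hash keys one and two and the helper name sort_key are made up for the example:

```perl
use strict;
use warnings;

my $marked_items = {
    one => { upddate => 111,  updtime => 1100, itemid => 200 },
    two => { upddate => 1111, updtime => 100,  itemid => 200 },
};

# Build the concatenated sort field the same way the article's sprintf does.
sub sort_key
{
    my ($item) = @_;

    return(sprintf('%s%s%s', $item->{upddate}, $item->{updtime}, $item->{itemid}));
}

my $key_one = sort_key($marked_items->{one});
my $key_two = sort_key($marked_items->{two});

print("$key_one\n");    # 1111100200
print("$key_two\n");    # 1111100200
```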

    Another problem is that if your sort string is too long, perl may convert it to a floating point number and thus lose the data from the later fields.
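
    A small sketch of the precision problem, assuming a typical perl build where large numbers end up as IEEE double-precision floats. The field values are invented: an 8-digit date, a 6-digit time and a 10-digit item id, giving a 24-digit sort string.

```perl
use strict;
use warnings;

# Two sort strings that differ only in the last digit of the item id.
my $key_a = '20051019' . '123456' . '1000000001';
my $key_b = '20051019' . '123456' . '1000000002';

# Numeric compare: both strings are converted to floating point first,
# and a double cannot hold 24 significant digits.
my $numeric = ($key_a <=> $key_b);

# String compare looks at every character, so the difference survives.
my $string = ($key_a cmp $key_b);

printf("<=> gives %d, cmp gives %d\n", $numeric, $string);
```

    On such a build the numeric compare reports the two keys as equal (0) while cmp returns -1, so records that differ only in a late field lose their ordering.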

    The more correct way to do this sort is
    my @marksorted = sort {
        $marked_items->{$b}->{'upddate'} <=> $marked_items->{$a}->{'upddate'} ||
        $marked_items->{$b}->{'updtime'} <=> $marked_items->{$a}->{'updtime'} ||
        $marked_items->{$b}->{'itemid' } <=> $marked_items->{$a}->{'itemid'}
    } keys %{$marked_items};
    The added benefit of this method is that it definitely won't have overflow problems (which may be the case in the above examples, because "<=>" is the numeric compare operator). Had the author used "cmp", there would then be a quantity of character comparisons proportional to the length of the sort string.

    The other benefit of my sort is that it is more flexible. You can change the "<=>" operator to a "cmp" operator if one of your fields is string data.

    The sort that I propose (one I've been using) may or may not be faster than the "faster" sort proposed by the author, but then again, speed is nothing without correctness.
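
    For anyone who wants to try the chained comparison, here is a self-contained run on invented data (the keys k1 to k4 and the field values are made up). Each "||" only falls through to the next field when the previous comparison returned 0, i.e. a tie.

```perl
use strict;
use warnings;

my $marked_items = {
    k1 => { upddate => 20051018, updtime => 90000,  itemid => 7 },
    k2 => { upddate => 20051019, updtime => 80000,  itemid => 3 },
    k3 => { upddate => 20051019, updtime => 80000,  itemid => 9 },
    k4 => { upddate => 20051019, updtime => 120000, itemid => 1 },
};

# Newest date first, then latest time, then highest item id.
my @marksorted = sort {
    $marked_items->{$b}->{'upddate'} <=> $marked_items->{$a}->{'upddate'} ||
    $marked_items->{$b}->{'updtime'} <=> $marked_items->{$a}->{'updtime'} ||
    $marked_items->{$b}->{'itemid' } <=> $marked_items->{$a}->{'itemid'}
} keys %{$marked_items};

print("@marksorted\n");    # k4 k3 k2 k1
```

    k2 and k3 tie on date and time, so only that pair is decided by itemid.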
  5. Re:some comments by justanyone · · Score: 4, Insightful
    4. Tabs: Tab is a nasty character that is not visibly different from x number of spaces. Lots of people like tabs. That's fine. Lots of people don't. That's fine, too. But, when 2 people work on the same code, bad stuff happens. Spaces ALWAYS get mixed in. This is bad. The easiest method to elim this prob is No Tab Chars. This can get religious, but BADLY ALIGNED CODE LEADS TO CODING ERRORS! This is a frequent mistake and costs time (and therefore money and anger).

    6. The "bullshit... personal taste" aspect of brace alignment is both true and misleading. Really, it doesn't matter which way you do it, as long as you're consistent. But, with multiple people working on the same code, consistency is difficult. I've always done it with left brace on the left margin so I could easily see what lined up where. If your rule is opposite, fine, but USE ONLY ONE and code looks much nicer.

    13. UNLESS (pardon my french) = stronzino (a little piece of shit). It's in the language to assist removal of a single ! 'not'. This can really confuse people. I'm not the smartest guy, nor the dumbest, but sometimes I see it and just go, "huh?". I'm not used to it. Neither are many of the other Perl coders I know, judging by what they've said when we've spoken about it.

    14. I take it by "Bah" you don't like scripts to log their actions. I've fought this recently with a 'know-it-all' type who wanted to build something fancy to do logging "when I get around to it". Yuck. Keep it simple, log what's going on so you can trace it later. Simple text files with "just did this, value=12" can help tremendously in debugging production problems. Users never know what they did; error messages never can contain enough info about what happened before.

    16. GOTOs are evil. I admit to some brainwashing by CS profs on this, but have dealt with enough spaghetti code to agree with it. Yes, there are times when it's good. But, in my last 100,000 lines of Perl, I haven't had to use it yet. So, it must not be vital. My goal is simplicity of code, not speed, since who cares about speed most of the time anyway, unless it's really bad, in which case there's probably something you're doing wrong otherwise.

    18. $_ is valuable only until you need to know what's in it. Then, you need a real variable name. You also may need that var to stick around past the next function call. I say, use 'my $request = $_; ' or something to grab $_ and make it obvious.

    21. Declaring vars near use is good ONLY in subs. If you have:
    exit(main());
    sub main
    {
        do_jack($GV_DEF_ONE);
    }
    my $GV_DEF_ONE = 12;
    sub do_jack
    {
        ....
    }
    you'll get an error during parsing due to GV_DEF_ONE not being declared yet.

    Regardless, Global vars are hard enough to spot and should be rare, declare them all at the top of the module to make it bloody obvious you're using one.

    22. I can sometimes agree to my ($a, $b) = split(',', $inlist); but not disparate vars all crammed together on one line, it's not readable, the vars are hidden, not aligned and initialized, etc.

    29. Lines of hashes visually indicate end of file. I can always tell I have the last page of a printout when all my files end with 5 or so rows of hashes. Just convention and a good idea, not a hard-and-fast rule.
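
    To make point 18 above concrete, here is a minimal sketch of $_ not surviving a function call. The helper count_lines and the request strings are made up; the relevant behavior is that a while (<$fh>) loop assigns to the global $_ without localizing it.

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines by reading an in-memory filehandle.
# The while (<$fh>) loop overwrites the caller's $_ as a side effect.
sub count_lines
{
    my ($text) = @_;
    my $count = 0;

    open(my $fh, '<', \$text) or die("cannot open in-memory file: $!");
    while (<$fh>)
    {
        $count++;
    }
    close($fh);

    return($count);
}

my @requests = ("GET /a\nGET /b", "GET /c");
my @seen;

for (@requests)
{
    my $request = $_;                        # grab the value before calling anything
    my $lines   = count_lines($request);

    # $request is intact; $_ was overwritten by the read loop and is now undef.
    # Because $_ aliases the array element, that element of @requests is undef too.
    push(@seen, [$request, $_, $lines]);
}
```

    Writing the loop as for my $request (@requests) avoids the aliasing altogether, so the helper can no longer reach the array elements through $_.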
  6. My favorite perl joke: by mshiltonj · · Score: 4, Funny

    DAY 1:
    Manager: How many lines of code did you write today?
    Developer: One.

    DAY 2:
    Manager: How many lines of code did you write today?
    Developer: One.

    DAY 3:
    Manager: How many lines of code did you write today?
    Developer: One.
    Manager: Are you telling me that in three days you've only managed to write three lines of code?
    Developer: You don't understand -- I've been working on the same line of code all three days.
    Manager: (pauses) You're writing in perl again, aren't you?