Mastering Algorithms with Perl

← Back to Stories (view on slashdot.org)

Mastering Algorithms with Perl

Posted by Hemos on Wednesday December 8, 1999 @03:26AM from the getting-to-know-your-perl dept.

John Regehr sent us an excellent review of Mastering Algorithims with Perl, another O'Reilly & Associates effort. Written by Jon Orwant, Jarkko Hietaniemi, and John Macdonald, this is a book designed to take your Perl to a new level of wizardery. Mastering Algo author Jon Orwant, Jarkko Hietaniemi, and John Macdonald pages 704 publisher O'Reilly, 08/1999 rating 8/10 reviewer John Regehr ISBN 1-56592-398-7 summary The intended audience is programmers who don't have a background incomputer science, who know at least some Perl. However, experiencedprogrammers who don't know Perl should have no trouble picking up thebasics of the language with this book and a copy of ProgrammingPerl. In The New Hacker's Dictionary under "superprogrammer," we read that "productivity can vary from one programmer to another by three orders of magnitude." I would argue that at least one of these factors of ten comes from the ability to quickly recognize what algorithms should be used to solve different parts of a problem and to find or write implementations of those algorithms that will result in an efficient program, given the available time and the characteristics of the problem. This ability is developed through experience and by understanding the highlights of the large body of algorithms and analysis of algorithms that has been developed to solve problems that occur over and over again in computer programs.

Mastering Algorithms with Perl is designed to provide the necessary background. It's structured like a traditional algorithms textbook: after describing some basic and advanced data structures (linked lists, trees, heaps, etc.), it has chapters about searching, sorting, sets, matrices, graphs, strings, and some related topics. After the introduction and discussion of data structures, the chapters are relatively independent and could be read in any order. The authors provide plenty of cross-references as well as pointers to books that describe individual subjects in more detail.

The intended audience is programmers who don't have a background in computer science, who know at least some Perl. However, experienced programmers who don't know Perl should have no trouble picking up the basics of the language with this book and a copy of Programming Perl. Also, computer scientists can often use a review of algorithms, and the CPAN pointers are very useful. So, I would go so far as to say that this book would enrich any programmer's bookshelf. A stringent test of the merit of a new technical book is to ask if it adds some value, given the best existing books in its area? I think that Mastering Algorithms with Perl definitely does. It is a well-written introduction to algorithms that is more accessible, practical, and entertaining than standard algorithm books. It leverages off of the strengths of a powerful language and a large base of reusable code.

The rest of this review will evaluate the strengths and weaknesses of Mastering Algorithms with Perl in more depth. The central issue that I will consider is why the reader might or might not prefer an algorithms book that concentrates on a single language, as opposed to a general algorithms book. I will try to be up-front about my biases: as a computer scientist, I consider this book to be a compromise between an algorithms book and a how-to manual. This compromise makes it much more useful to Perl programmers, but it sometimes causes the algorithms content to be too watered down.

It is traditional in algorithms books to describe algorithms in pseudocode, which often superficially resembles Pascal. The difference between pseudocode and real code is that pseudocode is not compilable - it ignores implementation details that are not helpful to understanding a particular example. This is considered to be an advantage: without the clutter, the core of the algorithm is easier to see and understand. At the beginning of the book the authors make the point that the Perl code for a binary search is actually shorter than the corresponding pseudocode. And it's true! The advantage of the Perl program is that we have a readable description of the algorithm, and it's executable too. (Unfortunately, it's often nontrivial to convert pseudocode into real source code - the devil is in the details.) The binary search example is slightly misleading, however, because in this case a native Perl data structure (the array) matches the semantics of the problem extremely well, leading to a clear and concise implementation. Later in the book, particularly in the chapter on graphs, we see examples where Perl's built-in data structures are less well suited to the problems. The executable Perl code for graph operations are much longer than the corresponding pseudocode, and are often so syntactically cluttered that they are difficult to read. Is this a flaw in the book or in Perl? No - it's a consequence of giving examples in runnable code instead of pseudocode. Is the tradeoff worth it? Probably, but it depends on what you're trying to get out of the book.

Another consequence of basing an algorithms book on a real language is that the authors can point readers to existing implementations of the algorithms, in CPAN. It's hard to overstate how big of a win this is. Perl is a powerful language to begin with, but it becomes far more powerful when programmers are able to take advantage of the large body of existing code modules. An unfortunate side effect of the fact that the book talks about specific versions of Perl and about specific CPAN packages is that this information will become outdated much more quickly than the algorithms will. Unless the Perl language and CPAN are exceptionally stable in the future, I would not expect most of this information to be valid for more than a few years - hopefully a new version of the book will be available before this one becomes too out of date.

Because the book provides executable code for the algorithms, it's possible to evaluate the performance of the example code (which is available at the O'Reilly site). The authors benchmark a number of the algorithms that they present, and compare the results. This is a nice change from the discussion of asymptotic running times found in traditional algorithm books, which generally ignore the constant factors that often make the difference between an algorithm being useful in practice or not.

The design and analysis of algorithms is a highly mathematical discipline. A sophisticated set of tools has been developed to evaluate the tradeoffs between various algorithms: How efficiently do they use memory and processor cycles? What is the best, average, and worst case running time of various operations? How does the algorithm scale as the size of the input grows? As it turns out, programmers need to understand a few of these formalisms, particularly the "big O" notation for describing asymptotic running time. I think that Mastering Algorithms with Perl uses theory in just the right way: as an aid to programmers' intuition about algorithms, rather than beating us over the head with formulae and proofs. That said, I think there is one area of theory that this book should have spent more time on: NP completeness. NP-complete problems are solvable, but are believed to be inherently hard: no efficient algorithm has been discovered to solve them. There are a wide variety of NP-complete problems, and they do come up in practice. For programmers, the important thing is first to recognize that an NP-complete problem has been encountered, and that it cannot be solved exactly except in small instances. Then, a heuristic that comes up with a good enough approximation of the solution needs to be found and implemented. This is a practical and well-studied part of algorithm design, and in a 650-page book I would expect more than a page or two to be devoted to it.

Several chapters of Mastering Algorithms with Perl are too shallow to be considered good introductions to the associated areas of algorithms. For example, the chapter on matrices only shows code for some of the more trivial matrix operations; for complex tasks, it tells the reader how to use PDL - the Perl Data Language. Although PDL looks like a useful and powerful package, readers should not confuse knowing how to use it with understanding matrix algorithms. In other words, the matrix chapter is too much of a how-to manual. Other chapters such as the ones on searching and sorting are excellent and avoid falling into this trap. Algorithms is a huge area, and it can't all be covered well in 650 pages. The later chapters are a lot of fun to read, but some of them should probably have been scrapped in favor of more depth in core areas.

In conclusion, this is a well-written, useful book. Viewed as a Perl book it's superb; it complements the strengths of Programming Perl and The Perl Cookbook, and I think most or all Perl programmers would benefit from having a copy. Viewed as a computer science book, it has made a number of compromises in order to focus on a specific language; this is not necessarily a problem but it is something that readers should be aware of.

Acknowledgments: Thanks to Tom Christiansen, Dave Coppit, Bill Pearson, and Jamie Raymond for helpful comments on previous drafts of this review.

Purchase this book at fatbrain.

8 of 225 comments (clear)

Min score:

Reason:

Sort:

Re:Algothingies (having just forgotten how to spel by ZarKov · 1999-12-07 23:05 · Score: 3

If it's Perl you're looking to get into, the Great O'Reilly offers up a number of books, including Learning Perl, Programming Perl, Advanced Perl Programming, the Perl Cookbook, etc. Start out with Learning Perl. Some other posts mention Python, which is also good for CGI, and you can pick up O'Reilly's Learning Python and Programming Python. Be forewarned, though. I've used both for CGI programming. And when I'm using Python (powerful though it is), I find myself longing for the regexps of Perl.

If you'd like an online tutorial, you might want to check out The CGI Resource Index, which is made by the same guy as Matt's Script Archive. Between the tutorials on the Resource Index, looking at the source of Matt's script, and reading the O'Reilly books, you can learn just about anything you want to know about Perl.

Of course, if you get stuck, you can always go to ng's, irc, or your local Perl nut.
Excerpt available on line by Pope+Slackman · 1999-12-07 23:09 · Score: 4

Chapter 10, 'Geometric Algorithms' is available in PDF format, here.

--Kevin

=-=-=-=-=-=
"I think the P-Funk Mothership just landed in my back yard!"
No, YOU are wrong by FascDot+Killed+My+Pr · 1999-12-07 23:32 · Score: 4

Problems are NP-complete if it is proven that there is no algorithm which solves it in polynomial time.

Wrong. Problem A is NP-complete if there are no problems in the set NP that are harder than A.

Furthermore, no one has yet proved that NP problems cannot be solved in polynomial time, although it is widely suspected this is true.

What he describes is NP-hard.

The classes *I* took used "NP-complete", "NP-hard" and "hard for NP" synonymously.

...factorization being NP which has not been proven

Again, just plain wrong. Prime Factorization is known to be NP (NP-complete, in fact). What is NOT known is whether PF can be solved in polynomial time.

It sounds very much like you picked up your Computation Theory knowledge by reading posts on Slashdot. I don't recommend that for people who enjoy flaming.
---

--
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
Try reading "Object Oriented Perl" by Matts · 1999-12-07 23:53 · Score: 3

Seriously - read the subject. OOPerl is an amazing book - perhaps the only perl book you'll need. It's very concise, very clear, and the code you end up producing is clean code, not unlike Java or Python code.

Really though you're trolling. Sure there are things wrong with perl (like the fact that the second argument to bless isn't mandatory), but you can create crap code in any language. If you have experience of building a large app in perl and it all went horribly wrong because it got too large - you only have yourself to blame.

--

Matt. Want XML + Apache + Stylesheets? Get AxKit.
Another algorithms text recommendation by DiningPhilosopher · 1999-12-08 00:18 · Score: 3

Introduction to Algorithms, Cormen, Leiserson and Rivest (yes, Ron Rivest, the R in RSA).

Far more mathematically rigorous than the O'Reilly book (from what I read of the O'Reilly book in the bookstore - I didn't buy it because it didn't look like much I didn't already have). No actual code, just pseudocode. I think this is the book you want if you really want to learn about algorithms (but not if you just want to get stuff done in Perl).

It's expensive, but it's a tome (>1000 pages). It was a good class textbook and still makes a very good reference. Check out the Table of Contents.

--
/* The beatings will continue until morale improves. */
NP-completeness explained by raph · 1999-12-08 01:52 · Score: 5

Ok, this comment has a lot of inaccuracies. Let me try to clarify.

NP stands for "nondeterministic polynomial", and is probably most easily understood as the class of problems for which the solution can be verified in polynomial time. It includes P, the class of problems that can be solved in polynomial time.

A nice example of a problem that is in NP but not known to be in P is satisfiability. This problem is given as a list of predicates of the form (x1 or x2 or not x3). The problem is finding a set of xi such that all of the predicates are satisfied.

So, it should be obvious that you can verify a solution in polynomial time - just start with the values of xi and check that all the predicates turn out true. However, there is no known general technique for solving this problem than enumerating all the possibilities, which takes exponential time.

NP-completeness takes this idea one step further. It is a large an interesting class of problems that are basically equivalent in difficulty. If you solve one, you've solved them all. Thus, if a problem is NP-complete, there's no known efficient algorithm.

The way people analyze NP completeness is do define reductions, ie show how instances of one problem can be reduced into instances of another NP-complete problem, and vice versa. Maybe this takes "highly advanced skills," but it's actually fairly routine for algorithmicists.

The class of NP-complete problems includes the travelling salesman problem, the Hamiltonian path problem, the knapsack problem, determining collisions of 3D objects, and many others.

NP-hardness is when you can reduce an NP-complete problem into the NP-hard problem, but not necessarily vice versa. Many integer optimization problems are NP-hard.

Factoring is clearly NP, but is not known to be NP-hard. It's entirely plausible that someone (Arjen Lenstra, for example) will come up with a polynomial factoring algorithm, but leave the rest
of the NP pantheon untouched.

There are some crypto algorithms that are based on NP-completeness, but NP-completeness is not in and of itself enough for strong crypto. Even if the problem is hard in general, a specific instance may be easy to solve. Unless you can prove that this never happens, you're hosed. IBM has done some excellent work in this direction with randomized self-reductions for their lattice-based crypto algorithm.

Complexity theory is one of the most beautiful areas in computer science, and NP-completeness is one of the most striking results, as it illuminates a fundamental unity across many seemingly disparate subfields of computer science. It is indeed a shame that this book skimps on its coverage of NP-completeness.

--
LILO boot: linux init=/usr/bin/emacs
Algorithm Book by banfield · 1999-12-08 10:25 · Score: 3
If it is an algorithm book you are looking for, I highly recommend
- Computer Algorithms: Introduction to Design and Analysis 3rd Ed
by Baase and Gelder. I just took my intro to algo course in which this was used as a text; it is very much readable and the examples really do illustrate the principles they claim to. It spans every thing from a base level introduction to NP-complete.
--

Banfield
Cybernetic Epidemiological Report: BAD CODE SUCKS by Tom+Christiansen · 1999-12-10 01:07 · Score: 3
In http://slashdot.org/comments.pl?sid=99/12/08/10412 49&cid=224, zarkov@netnitco.net wrote:
read(STDIN, $bfr, $ENV{'CONTENT_LENGTH'}); foreach (split(/&/,$bfr)) { $kv = [split(/=/,$_)]; foreach (0...1) { $kv->[$_] =~ tr/+/ /; $kv->[$_] =~ s/%([0-9a-fA-F][0-9a-fA-F]) /pack("C", hex($1))/eg; } $form->[0]{$kv->[0]} = $kv->[1]; } foreach (split(/&/,$ENV{'QUERY_STRING'}) { $kv = [split(/=/,$_)]; foreach (0...1) { $kv->[$_] =~ tr/+/ /; $kv->[$_] =~ s/%([0-9a-fA-F][0-9a-fA-F]) /pack("C", hex($1))/eg; } $form->[1]{$kv->[0]} = $kv->[1]; }
Here's what you did that was dubious (nits) or wrong (bugs):
- You have a bug in your read: you failed to check the return value of your system call. That's a supermajor bug, an automatic disqualification.
- You have a bug in your split. You need to supply a third argument of 2, or else you fail on URLs such as http://somewhere/cgi-bin/dumpreq?this=good=stuff&t hat=bad=stuff
- Are you aware that the new CGI spec from W3C deprecates the use of & and insists on semicolon? In fact, the W3C validator now insists on semicolons. Your split doesn't know better. This could easily be a bug soon enough.
- You didn't test for whether you had a HEAD request and react accordingly. That means spiders will trigger your program's full effects. That's a bug.
- Your code can only handle trivial forms. It not only screws up on file uploads, it has no contingency for a name that occurs more than once, as occurs in related groups of related widgets--thing likes checkbox groups, select widgets, or multipart hidden data. This comes up all time times, as in http://somewhere/cgi-bin/pickit?cheese=swiss&chees e=cheddar&bread=rye.
  Without seeing the code for those important parts, I can't say for sure, but given the rest of the non-industrial strength code, one can easily imagine the worst.
- You didn't test for whether you had a GET or a POST request. You just forge ahead.
- You have duplicate code. That's a very bad. It means you might get an update problem.
- You didn't guard against a denial-of-service attack through too much data for your memory to hold coming from a huge POST.
- You never declared any of your variables. Is this code use strict and use warnings clean?
- Your use a magic numbers, 0 and 1, is confusing. Sometimes you use them for a key versus a value; other times you use them for the form data from STDIN versus the form data from the environment.
- You have duplicate code. That's a very bad. It means you might get an update problem. (Why yes, Virginia, this is a repeat. So is yours. See the problem? :-)
- This code:
  $kv = [split(/=/,$_)]; foreach (0...1) { $kv->[$_] =~ tr/+/ /; $kv->[$_] =~ s/%([0-9a-fA-F][0-9a-fA-F])/pack("C", hex($1) )/eg; }
  This very nonperlishly awkward. It doesn't have to be this bad. Why aren't you using the foreach better?
  That would read better like this:
  foreach (@$kv) { tr/+/ /; s/%([0-9a-fA-F][0-9a-fA-F])/pack("C", hex($1))/eg; }
  Because otherwise you have too much needlessly duplicated information.
  Better yet, you should just split into my ($key, $value) and loop across those in the same way. The anonymous array just seems to hurt legibility.
That should be far more than enough nits and gnats to keep you thinking for a while.
As I said, I write CGI programs all day long. That's how I make a living. And I've never had a problem with this.
Absence of evidence does not imply evidence of absence.
Here's my suggestion: read the CGI.pm source very, very carefully. There's a lot to learn. Good luck. Hopefully, you'll repent of your hackish ways that help give Perl a bad name by spreading bad CGI code around the world.
And since you're advocating we not show people how to do something because it might be "complicated", do you suppose we ought just to close the source of Linux?
I shan't be tilting at any straw windmills.
I shall, however be patiently awaiting your public apology and contrition.