Perl for Web Site Management
In his preface, Callender describes his own transition from a writer and editor to the kind of one-man band that, back in the '90s, we called a "webmaster". He characterizes himself and others in the same boat as "accidental programmers", and justly praises Larry Wall for creating a programming language that lets such novice coders do useful things right away. "Like natural languages, one of the ways in which Perl makes easy things easy is that it is designed to let you get by using only a small subset of the language. As Larry puts it, Perl lets you talk baby talk, and in Perl such baby talk is officially okay."
For non-programmers, this is a better Learning Perl than Learning Perl. The latter title, by Schwartz and Phoenix, is explicitly intended for established programmers seeking to add Perl to their existing tool belt of languages. Perl for Web Site Management is for the folks Apple used to call "the rest of us". Callender assumes no knowledge on the part of his reader beyond some familiarity with HTML and the web; this starting-from-zero approach makes the book maximally inclusive, while his ability to convey a lot in a small space brings newbies a long way in the space of a couple of chapters. He provides thorough redirection to the standard sources of Perl and Internet lore (the perl* man pages, the standard Perl programming texts, and others).
Virgin programmers, when they're through with Perl for Web Site Management, will find themselves able to make effective use of Perl programs to automate a plethora of tasks, including mass manipulation and modification of a site's files; server log analysis (using Perl's powerful regular expression facility); link checking (using the LWP module); and auto-generating an annotated site map from the <META> tags in the site's HTML files. The latter part of the book introduces server-side web application programming using CGI (examples include coding a site guestbook and integrating with the SWISH-E site search facility), along with more advanced lore like the CPAN code archive, Perl's object-oriented features, storing user data in DBM databases, and publishing modules for reuse by others. Along the way, the book teaches a respectable amount about UNIX as well; the main text and the many informative sidebars contain concise, clear explanations of necessities like stdin/stdout redirection, chmod and file permissions, shell filename globbing, tab completion in bash, network troubleshooting with traceroute, and much more.
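To give a flavor of the regular-expression log analysis mentioned above, here is a minimal sketch (not taken from the book) that pulls the fields out of one access-log line, assuming Apache's Common Log Format. The subroutine name parse_clf_line and the sample line are hypothetical, and real logs (which the book notes come in different formats) may need a more forgiving pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): split one Common Log Format
# line into named fields. Returns a hashref, or undef if the line
# doesn't look like CLF. Extra trailing fields (as in the Combined
# format) are ignored because the pattern isn't anchored at the end.
sub parse_clf_line {
    my ($line) = @_;
    my @fields = $line =~ m{
        ^(\S+)             # remote host or IP address
        \s+(\S+)           # identd user, usually "-"
        \s+(\S+)           # authenticated user, usually "-"
        \s+\[([^\]]+)\]    # timestamp in square brackets
        \s+"([^"]*)"       # request line, e.g. GET /index.html HTTP/1.0
        \s+(\d{3})         # HTTP status code
        \s+(\S+)           # bytes sent, or "-" for none
    }x or return;

    my %rec;
    @rec{qw(host ident user time request status bytes)} = @fields;
    $rec{bytes} = 0 if $rec{bytes} eq '-';
    return \%rec;
}

my $sample = '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
           . '"GET /index.html HTTP/1.0" 200 2326';
my $rec = parse_clf_line($sample);
print "$rec->{host} requested '$rec->{request}' (status $rec->{status})\n";
```

Returning a hash keyed by field name, rather than a bare list, keeps later reporting code readable when you start tallying hits per page or per host.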
Callender's writing style provides the right mix of hand-holding, humor, and clarity for the book's target audience. He simplifies without dumbing down, and he proves that he picked up a considerable amount of hacker culture on his own journey up the learning curve, which he shares with his pupils, citing sources from Neal Stephenson's In the Beginning Was the Command Line to Jon Udell's Practical Internet Groupware. He also does a good job of evangelizing the culture of sharing and open systems that created Perl, Apache, and the Internet as we know it, giving abundant proper credit to the authors and creators of all the tools and references to which he refers his readers. He concludes by listing, and providing jumping-off points for, the wide variety of logical "next steps" that go beyond the scope of the book: Python and other programming languages for the web, Apache configuration, mod_perl, system administration, and relational database integration.
As you may have guessed by now, I recommend this book highly, especially for anyone who finds him- or herself responsible for maintaining a web site but feels a bit underequipped to do so. The book has a limitation (which is not the same as a shortcoming): it's a tutorial, not a reference work. Though the index is quite serviceable, this isn't the book to turn to when you need to remember the order of the arguments to substr. This is a book to sit down and read through, once or multiple times, to help build a framework of knowledge and begin populating it with pearls of wisdom that can be put to immediate use.
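For the record, since the review uses it as its example: Perl's built-in substr takes the string first, then a zero-based offset, then an optional length, then an optional replacement string. A quick sketch (the URL and variable names are purely illustrative):

```perl
use strict;
use warnings;

my $url = "http://www.example.com/index.html";

# substr EXPR, OFFSET, LENGTH: the offset is zero-based
my $scheme = substr($url, 0, 4);

# A negative offset counts back from the end of the string;
# omitting LENGTH takes everything to the end
my $ext = substr($url, -4);

# The four-argument form replaces LENGTH characters in place
my $copy = $url;
substr($copy, 0, 4, "https");

print "$scheme $ext $copy\n";
```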
Additional information about the book, including code for the examples, is available on the author's web site, on O'Reilly's page for the book, and at the online bookseller site of your choice.
Table of Contents:
Preface
1. Getting Your Tools in Order
Open Source Versus Proprietary Software
Evaluating a Hosting Provider
Web Hosting Alternatives
Getting Started with SSH/Telnet
Meet the Unix Shell
Network Troubleshooting
A Suitable Text Editor
2. Getting Started with Perl
Finding Perl on Your System
Creating the "Hello, world!" Script
The Dot Slash Thing
Unix File Permissions
Running (and Debugging) the Script
Perl Documentation
Perl Variables
A Bit More About Quoting
"Hello, world!" as a CGI Script
3. Running a Form-to-Email Gateway
Checking for CGI.pm
Creating the HTML Form
The <FORM> Tag's ACTION Attribute
The mail_form.cgi Script
Warnings via Perl's -w Switch
The Configuration Section
Invoking CGI.pm
foreach Loops
if Statements
Filehandles and Piped Output
die Statements
Outputting the Message
Testing the Script
4. Power Editing with Perl
Being Careful
Renaming Files
Modifying HREF Attributes
Writing the Modified Files Back to Disk
5. Parsing Text Files
The "Dirty Data" Problem
Required Features
Obtaining the Data
Parsing the Data
Outputting Sample Data
Making the Script Smarter
Parsing the Category File
Testing the Script Again
6. Generating HTML
The Modified make_exhibit.plx Script
Changes to &parse_exhibitor
Adding Categories to the Company Listings
Creating Directories
Generating the HTML Pages
Generating the Top-level Page
7. Regular Expressions Demystified
Delimiters
Trailing Modifiers
The Search Pattern
Taking It for a Spin
Thinking Like a Computer
8. Parsing Web Access Logs
Log File Structure
Converting IP Addresses
The Log-Analysis Script
Different Log File Formats
Storing the Data
The "Visit" Data Structure
9. Date Arithmetic
Date/Time Conversions
Using the Time::Local Module
Caching Date Conversions
Scoping via Anonymous Blocks
Using a BEGIN Block
10. Generating a Web Access Report
The &new_visit and &add_to_visit Subroutines
Generating the Report
Showing the Details of Each Visit
Reporting the Most Popular Pages
Fancier Sorting
Mailing the Report
Using cron
11. Link Checking
Maintaining Links
Finding Files with File::Find
Looking for Links
Extracting
Putting It All Together
Using CPAN
Checking Remote Links
A Proper Link Checker
12. Running a CGI Guestbook
The Guestbook Script
Taint Mode
Guestbook Preliminaries
Untainting with Backreferences
File Locking
Guestbook File Permissions
13. Running a CGI Search Tool
Downloading and Compiling SWISH-E
Indexing with SWISH-E
Running SWISH-E from the Command Line
Running SWISH-E via a CGI Script
14. Using HTML Templates
Using Templates
Reading Fillings Back In
Rewriting an Entire Site
15. Generating Links
The Docbase Concept
The CyberFair Site's Architecture
The Script's Data Structure
Using Data::Dumper
Creating Anonymous Hashes and Arrays
Automatically Generating Links
Inserting the Links
16. Writing Perl Modules
A Simple Module Template
Installing the Module
The Cyberfair::Page Module
17. Adding Pages via CGI Script
Why Add Pages with a CGI Script?
A Script for Creating HTML Documents
Controlling a Multistage CGI Script
Using Parameterized Links
Building a Form
Posting Pages from the CGI Script
Running External Commands with system and Backticks
Race Conditions
File Locking
Adding Link Checking
18. Monitoring Search Engine Positioning
Installing WWW::Search
A Single-Search Results Tool
A Multisearch Results Tool
The map Function
19. Keeping Track of Users
Stateless Transactions
Identifying Individual Users
Basic Authentication
Automating User Registration
Storing Data on the Server
The Register Script
The Verification Script
20. Storing Data in DBM Files
Data Storage Options
The tie Function
A DBM Example Script
Blocking Versus Nonblocking Behavior
Storing Multilevel Data in DBM Files
An MLDBM-Using Registration Script
An MLDBM-Using Verification Script
21. Where to Go Next
Unix System Administration
Programming
Apache Server Administration and mod_perl
Relational Databases
Advocacy
Index
You can purchase Perl for Web Site Management from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
While Perl is certainly a much more powerful and flexible language, most website management functions can be carried out much more simply and in less time in PHP since it was designed with website management and database connectivity in mind.
PHP is roughly Perl-based, and is probably a more appropriate language for beginning coders than Perl, IMHO. Having written website management tools in both languages, if I had to do it again, I'd do it in PHP.
Learning Perl and System Administration at the same time.
Nothing could possibly go wrong there...
The opposite of progress is congress
...who have never programmed before, but who now find themselves with the need to create their own site-management tools, automated web clients, and web-based applications.
I hope I don't come off as an elitist, but don't we have enough "non-programmers" acting as programmers thanks to the .COM boom? I have full respect for a manager or web designer wishing to learn programming and web development. However, teaching them the tools first is not going to make them a good programmer. I'm afraid that books like this will lead towards more poorly designed and written programs. A web application is software and should be treated as such. Is it just me, or does anyone else share this feeling?
There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
Have you ever been asked to clean up really bad Perl? Many times "clean-up" means a total rewrite.
As a full-time "baby talk" to "real code" translator, I have found that "baby talk" usually doesn't have any documentation.
Perl is a kick-ass language, but if written badly it won't scale properly, and there are very few things in this world that suck more than a skyscraper built on top of a shanty.
huper
I sure hope this review is accurate. As a computer-savvy non-programmer, I've been having trouble wrapping my head around Learning Perl. I think a lack of formal education is really hindering me at this point (go to college, kids!). Maybe this book will help out; I used to be pretty good at C64 BASIC :)
jred
I'm not a mechanic but I play one in my garage...
I hope I don't come off as an elitist, but don't we have enough "non-mechanics" acting as mechanics? I have full respect for a farmer or tradesman wishing to learn auto repair and design. However, teaching them the tools first is not going to make them a good mechanic. I'm afraid that books like this will lead towards more poorly designed and built cars. A car is mechanical and should be treated as such. Is it just me, or does anyone else share this feeling?
Infuriate left and right
Programming is more than just a job or a hobby; it's a lifestyle. The more people that can be drawn into the programming world, the better. People who may not have otherwise experimented with programming may do so in order to develop web based applications.
Furthermore, giving content developers a better sense of how Unix operates can only be a good thing. As a system administrator dealing with content developers, I know the woes of having to fix things when, say, they can't get file permissions right from their FTP clients and don't know the first thing about fixing it themselves.
Also, the more people understand Unix/Linux, the better chance it has of faring well on the desktop.
Try this thread
UNIX/Linux Consulting
It just recently occurred to me: why are people always rolling their own instead of using production-quality stuff?
Sure, some people can't afford a $700 package like WebObjects, but then, if you're worth $20 an hour, that's only 35 hours' worth of time... or one week.
If you can get up in one hour what takes you one week 'learning from scratch'... as well as not having to write *or* maintain tools... isn't that money well spent?
GPL Deconstructed
There is also the book "Elements of Programming with Perl" which has generally been well received. It is not a web-specific book, and assumes no previous programming knowledge.
"Linux is only free if you don't value your time."
The same is true with any "free" software. It's only a bargain some of the time. Too many people here don't realize that.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
(* I'm the kind of person who prefers pragmatism over abstraction, .... *)
What would be an example of one not being the other (assuming "good" abstraction)? I don't see why they would be different.
"Abstraction" is sort of an abused word that is in the eye of the beholder. Sometimes it means "hiding the nitty-gritty details", but isn't that a pragmatic goal? (If not carried away.)
Table-ized A.I.
I bought this book a few months ago, and if I hadn't been down with pneumonia for a few weeks, I'd probably be done with it. But my experience so far has been very positive. I've gone through Learning Perl, and had a few classes in C and Fortran (Hooray for university programming requirements: Fortran. Sheesh.), and PfWSM is the best so far: it is teaching me useful bits while showing me the larger language.
Now if I had more content to put in the websites I'm managing, that would make things easier to automate. It's kinda hard to automate when your two test runs do all of the conversion you need. :)
(* Who cares about Perl? It's gonna die anyway soon, along with Python. *)
They probably will fall out of style. Almost every language does when new techniques or fads come along. It goes with the territory.
There are a few exceptions, though:
1. C/C++ - These have become the "new assembly language(s)". They are not really known for their software engineering and RAD qualities, and that is probably what saves them: they don't *have to* compete where things change all the time. But they have turned out to fill the niche of assembly languages with respect to speed, plus add some portability.
2. LISP - Never popular but never dead. While other languages tend to exaggerate and get carried away with the fads of their day, LISP's "meta" abilities have allowed it to adapt to the day's fads and trends. Its strategy (or ability) is to bend well rather than fit well. It was born in the 1950s as a math-like experiment and was not originally meant for application building.
Table-ized A.I.
Yes, more than once. In each case, a rewrite into a more appropriate language was the answer. I'm a professional developer, and I can write decent Perl, but what I use it for are admin scripts and text processing (I first began using Perl as a more powerful replacement for Awk).
The Perl systems I've been asked to look at have been entire applications where the coder started small and grew without any architecture. Once you hit about 4,000 lines of code or so, the problems start to appear. By the time you reach about 10,000 lines of code, the seams are creaking. Systems that need much more than this should not be written in Perl.
Perl is a kick-ass scripting language with incredibly powerful features and an enormous set of really useful libraries, but in any application large enough to need an "architecture", it's not appropriate.
Yes, you can use discipline and write well-structured Perl, but that's like writing object-oriented code in C - the language isn't helping you. For serious software development, use a language that helps you write clean, modular, readable, self-documenting code.
Sure, off-the-shelf products may be great for basic cookie-cutter style websites, but learning how to program everything yourself is much more useful.
Different tools for different jobs.
Sticking feathers up your butt does not make you a chicken - Tyler Durden
What's wrong with man perl?
I take it you're unaware that Perl 5 allows for modular, object-oriented code?
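use strict;
use warnings;

{
    package Foo;

    # Constructor: blesses the key/value pairs passed in into a hashref
    sub new {
        my $class = shift;
        return bless {@_}, $class;
    }

    # Accessor: sets bar when given an argument, always returns it
    sub bar {
        my $self = shift;
        if (@_) {
            $self->{bar} = shift;
        }
        return $self->{bar};
    }
}

my $foo = Foo->new(bar => "hello world!");
print $foo->bar, "\n";
# hello world!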
See, that wasn't so hard.
And in some ways the underlying approach is little different from Python's (and PHP's... I think) much-vaunted object model. Object-oriented code in C works in much the same way. In all of them the object itself is the first argument passed to the function/subroutine, and syntactic sugar allows this to happen automagically:
Foo::bar($foo) eq $foo->bar;
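To check the equivalence concretely, here is a small self-contained sketch that defines the same Foo class and compares the two call styles (with eq, since these are strings; == would compare them numerically). It also shows the one place the forms genuinely differ: $obj->bar follows the class's @ISA inheritance chain, while Foo::bar($obj) calls exactly that subroutine. The subclass name Baz is hypothetical.

```perl
use strict;
use warnings;

package Foo;

sub new {
    my $class = shift;
    return bless {@_}, $class;
}

sub bar {
    my $self = shift;
    $self->{bar} = shift if @_;
    return $self->{bar};
}

# Hypothetical subclass that inherits everything from Foo
package Baz;
our @ISA = ('Foo');

package main;

my $foo = Foo->new(bar => "hello world!");

# The arrow syntax passes $foo as the first argument behind the scenes,
# so both calls run the same subroutine with the same arguments.
my $via_method = $foo->bar;
my $via_sub    = Foo::bar($foo);
print "equal\n" if $via_method eq $via_sub;   # eq, not ==, for strings

# Where they differ: method lookup follows @ISA, a plain sub call doesn't.
my $baz = Baz->new(bar => "inherited");
print $baz->bar, "\n";                        # finds Foo::bar through @ISA
print "no Baz::bar\n" unless defined &Baz::bar;
```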
Parrot is not a language, but rather a virtual machine that will act as a runtime environment for Perl6, Python, Ruby, Tcl, Scheme, etc.
Although I suppose you could be talking about Parrot assembly language.
When you finish this one, pick up:
MySQL and Perl for the Web
Paul DuBois
New Riders
It is an excellent book.
This
You can write bad code in any language, and no language particularly encourages good coding practices. The only exceptions I can think of off the top of my head are Eiffel (http://smalleiffel.loria.fr) and perhaps Ada.
It isn't the job of a language syntax to make someone a good programmer. That's the job of the person teaching them. If it's that big a problem, get into the newbie communities and instill good practices from the very beginning.
Chapter 22 Writing A Maintainable And Readable Perl Script Longer Than 8 Lines
If there were such a thing as a readable and maintainable Perl script longer than 8 lines, they'd have to write a couple of chapters on that subject alone. Perl projects turn into unreadable spaghetti code faster than red meat rots in the summer sun.
Can you comment on such tools as WebObjects, and how application server + application/web-service development tools compare to something like roll your own via Perl?
There's obviously going to be learning-time-cost-flexibility-power tradeoffs when evaluating, but the question I'm wondering: when is it worth more to do it yourself, and when is it worth more to use someone else's tools?
At my stage in my life, DIY is cheaper and easier. If I were running a for profit website with customers, privacy and security concerns, reliability concerns... is DIY still feasible for one person? For 10 programmers?
Is deploying a commercial product something to consider? When? What circumstances? Or are they complementary?
GPL Deconstructed
Having more true designers (not FrontPage HTML hacks) who can program would be a big benefit to software development. There are way too many sites and apps out there that function the way they do simply because it was the easiest or cleanest way to program them. Sometimes this makes the app easy to use for other geeks, because they can guess how it works, but often it is absolutely horrible to use from the non-geek user's perspective. Even in cases where designers and info architects (anyone still use this job title? ha!) specify something that would be very easy to use, oftentimes the programmers will say it's just not possible: often it is possible, but the programmers just don't want to do it. If the designers know more about programming, the programmers won't be able to bail out anymore. The end result will be better applications for non-geek users.
There needs to be a certain amount of cross-pollination for the good of the gene pool: it's certainly got to be easier to turn designers into programmers than programmers into designers.
Unleashing Perl on the newbie? This may not be the best idea.
I honestly don't know if there is really "a better way" (though of course there's more than one way to do it), but I just spent two days in utter frustration with various Perl problems. One of which I never actually got to the bottom of, and ended up coding around. I am now officially thinking of giving up Perl for Python or Ruby or Java or LISP or something.
I am not a Perl newbie. I wouldn't consider myself an expert, but I've been using Perl and doing web development (C, C++, Java, Perl, PHP, and once, on a crazy night, even Prolog) since 1996, and developing software since 1994. And yet almost every time I use Perl for anything more than 30 lines or so, especially if it involves OO and packages, I get caught by some gotcha or another.
Take this, for example. Did you know that LWP::UserAgent traps non-HTTP errors and then gives you a Code 500? So if, like me, you didn't have HTML::HeadParser installed on your machine, LWP won't simply tell you that this is the problem. Instead, you might spend two hours trying to figure out why in the world fetching http://www.google.com with LWP::UserAgent gives you a 500 server error when the rest of your browsers can load it just fine.
That's the one I figured out. The one I couldn't figure out is why URI::URL worked just fine when I used it in the "main" namespace, but when I tried to use it inside a package I wrote myself (called from the "main" namespace, subclassing HTML::Parser), it kept telling me it couldn't find the ->host() method. After about 5 hours of scanning documentation, changing various @INC-related and other things, I decided it would be easier to just write my own URL package that did the subset of things I needed it to do (very glad Gisle Aas included that very nice regexp at the end of the perldoc documentation for URI that helped out, but still....). It was.
Too many hours spent trying to figure out why the straightforward behavior you expect isn't the behavior you get. That's the problem with Perl. The semantics of Perl are built to be easy in a few cases, and very expressive indeed if you know exactly what you're doing, but horribly full of traps for even the wary. Is there a language with the flexibility of Perl, and a greater clarity of expression? I don't know. But I'm looking. And in the meanwhile, I'm going to hold off suggesting perl to those I know who are looking to cross from content to code.
Libertarianism is rich wolves and poor sheep playing gambler's ruin for dinner.
if I hadn't been down with pneumonia for a few weeks
Listen to your god, don't do it!
In fact, Perl does reward quick hacks. Consider:
print "$_\n" foreach @foo;
This is very easy for Perl to figure out, and can be optimized. A general loop, however, is much, much harder for the bytecode compiler to optimize, since it introduces far greater potential complexity. The more concisely you can say something, the fewer different ways it can be taken.
A lot of the things that look ambiguous in Perl are in fact quite the opposite to the compiler/runtime. Using such things when appropriate is a good idea even in large projects.
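For readers new to the syntax, the one-liner above uses foreach as a statement modifier: it aliases each element of @foo to $_ in turn, exactly as the block form does. This minimal sketch (array contents made up) only demonstrates that the two spellings produce the same result; it says nothing either way about the optimization claim.

```perl
use strict;
use warnings;

my @foo = qw(alpha beta gamma);

# Statement-modifier form: $_ is aliased to each element in turn
my $out1 = '';
$out1 .= "$_\n" foreach @foo;

# Equivalent block form with an explicit loop variable
my $out2 = '';
foreach my $item (@foo) {
    $out2 .= "$item\n";
}

print $out1;
print "identical\n" if $out1 eq $out2;
```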
Perl 6 should help quite a bit as well: optional strong typing and proper OO facilities. I even made a comment on the mailing list that it might be worth investigating whether Perl 6 could transcend the simple public/private/protected scheme and do something along the lines of Eiffel. Say...
class Hello {
    method world is public {
        print "Hello world!\n";
    }
    method foo is public(Bar) {
        print "Hello foo!\n";
    }
}
class Foo {
    method main {
        Hello.new.foo;
        # throws an exception, since
        # method foo of class Hello
        # is visible only to class Bar
    }
}
Well.... I started with HTML and JavaScript, then moved on to some very basic Perl at the urging of a friend, and did some pretty complex (and really stupid!) things with flat-file databases.
:-)
At the urging of the same friend I learned Python, and grew to like it rather quickly as it helped me understand OO concepts.
I later moved on to Ruby, and for various reasons was alternating between the two languages heavily when I registered my username.
Since then, amid the excitement that Parrot has provided (it's the first time I haven't been "catching up" on something), I've returned to my Perl beginnings, doing it right this time around.
You'd be amazed the amount of programming theory you can soak up reading through the perl6 mailing lists.
When I help friends pick up Perl, I generally start by emphasizing the separation of interface and implementation in a program, then go over basic I/O, then variables, then subroutines, then aggregate and nested data structures, then objects. All of the different control structures (conditionals, loops, etc.) are gradually covered in the context of building a trivial sample class. I save things like closures for a bit later, as I don't like to mix the "here's a data structure with access to some subroutines" and "here's a subroutine with access to some persistent data" messages.
I know what you mean, since I've picked up a lot of theory in similar ways over the years. Along those lines, Lambda the Ultimate is a good place to get pointers to a variety of current research. It's worth getting at least some of the theory closer to its source. Have you read SICP, for example? That was one of the books that got me back into theory after many years out of CS at university (the CS I did was pretty lame - mainly learnt Pascal, very little real CS theory).
If you're into that sort of thing, though, SICP is just a gateway drug. Lambda calculus, type inferencing, type theory in general, and much, much more follows, and pretty soon all the mainstream languages are looking pretty pale... It all does give some good criteria by which to compare languages, though, and helps avoid being limited in one's thinking by the language one happens to be using.
BTW, I agree about not teaching closures in Perl to newbies. Perl and Python both have enough hardcoded ways to do things that you don't need to rely on closures, except to be perverse. The more important concept for useful programming is higher-order functions, since they provide a capability that's directly useful in Perl (or any language), and closures can be introduced in that context.
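A minimal sketch of that teaching order: first a higher-order function that takes a code reference, then a closure introduced naturally as a function that builds such code references. The names select_where and make_suffix_test are hypothetical, and select_where just reimplements what Perl's built-in grep already does, purely for illustration.

```perl
use strict;
use warnings;

# A higher-order function: takes a code reference and a list,
# and returns the elements for which the code returns true.
sub select_where {
    my ($test, @items) = @_;
    my @kept;
    for my $item (@items) {
        push @kept, $item if $test->($item);
    }
    return @kept;
}

my @files = qw(index.html logo.gif about.htm);

# Passing an anonymous sub is the higher-order part...
my @html = select_where(sub { $_[0] =~ /\.html?$/ }, @files);

# ...and a closure appears once the sub captures a lexical variable.
sub make_suffix_test {
    my ($suffix) = @_;
    return sub { $_[0] =~ /\Q$suffix\E$/ };   # $suffix is remembered
}
my @gifs = select_where(make_suffix_test('.gif'), @files);

print "@html | @gifs\n";
```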
You've probably come across this before, but here's a nice piece about ML's type system from a partly Perl perspective.
Actually, I'm just beginning to dip my toes into the Scheme/Haskell/*ML waters. It doesn't help that most of the compilers are a pain to install on Mac OS X.
I've found closures can be genuinely useful for complex looping behaviors that are used repeatedly throughout a program.
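As a sketch of the kind of repeated looping behavior described here, the closure below keeps its own position counter between calls, so each caller can walk a list in fixed-size batches without managing an index. The helper name make_batcher is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that walks a list in
# fixed-size batches. Each call returns the next batch as a list,
# or an empty list once everything has been handed out.
sub make_batcher {
    my ($size, @items) = @_;
    my $pos = 0;                       # private state captured by the closure
    return sub {
        return if $pos >= @items;
        my $end = $pos + $size - 1;
        $end = $#items if $end > $#items;
        my @batch = @items[$pos .. $end];
        $pos = $end + 1;
        return @batch;
    };
}

my $next = make_batcher(2, qw(a b c d e));
while (my @batch = $next->()) {
    print join(',', @batch), "\n";
}
```

Each call to make_batcher creates a fresh $pos, so two batchers over different lists never interfere with each other.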