Slashdot Mirror


Perl for Web Site Management

PerlDiver writes: "Perl for Web Site Management by John Callender is for web professionals -- designers, editors, HTML jockeys -- who have never programmed before, but who now find themselves with the need to create their own site-management tools, automated web clients, and web-based applications. The title is an understatement; the book covers not just Perl programming but the bulk of what a novice needs to learn to function in a UNIX environment, from pwd and man to installing software packages from source tarballs. If you or anyone you know wants to cross the chasm from 'content' to 'code,' get this book." Read on for the rest of his review.

Title: Perl for Web Site Management
Author: John Callender
Pages: 528
Publisher: O'Reilly and Associates
Rating: 8
Reviewer: perldiver
ISBN: 1565926471
Summary: Superb introduction to Perl for "accidental programmers"

In his preface, Callender describes his own transition from a writer and editor to the kind of one-man-band that, back in the '90s, we called a "webmaster". He characterizes himself and others in the same boat as "accidental programmers", and justly praises Larry Wall for creating a programming language that enables such novice coders to do useful things right away. "Like natural languages, one of the ways in which Perl makes easy things easy is that it is designed to let you get by using only a small subset of the language. As Larry puts it, Perl lets you talk baby talk, and in Perl such baby talk is officially okay."

For non-programmers, this is a better Learning Perl than Learning Perl. The latter title, by Schwartz and Phoenix, is explicitly intended for established programmers seeking to add Perl to their existing tool belt of languages. Perl for Web Site Management is for the folks Apple used to call "the rest of us". Callender assumes no knowledge on the part of his reader beyond some familiarity with HTML and the web; this starting-from-zero approach makes the book maximally inclusive, while his ability to convey a lot in a small space brings the newbies a long way in the space of a couple of chapters. He provides thorough redirection to the standard sources of Perl and Internet lore (the perl* man pages, the standard Perl programming texts, and others).

Virgin programmers, when they're through with Perl for Web Site Management, will find themselves able to make effective use of Perl programs to automate a plethora of tasks, including mass manipulation and modification of a site's files; server log analysis (using Perl's powerful regular expression facility); link checking (using the LWP module); and auto-generating an annotated site map from the <META> tags in the site's HTML files. The latter part of the book introduces server-side web application programming using CGI (examples include coding a site Guestbook and integrating with the SWISH-E site search facility), along with more advanced lore like the CPAN code archive, Perl's object-oriented features, storing user data in DBM databases, and publishing modules for reuse by others. Along the way, the book also teaches a respectable amount about UNIX; the main text and the many informative sidebars contain concise and clear explanations of necessities like stdin/stdout redirection; chmod and file permissions; shell filename globbing; tab completion in bash; network troubleshooting with traceroute; and much more.
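To give a flavor of the server log analysis mentioned above, here is a minimal sketch. It is not taken from the book: it assumes log lines in Apache's Common Log Format, the sample lines are made up, and the subroutine name count_pages is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the book): count successful requests per
# path, assuming each line is in Apache's Common Log Format.
sub count_pages {
    my @lines = @_;
    my %hits;
    for my $line (@lines) {
        # host ident user [timestamp] "METHOD path protocol" status bytes
        next unless $line =~ m/^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) /;
        my ($path, $status) = ($1, $2);
        next unless $status == 200;    # skip errors and redirects
        $hits{$path}++;
    }
    return %hits;
}

# Made-up sample data, for illustration only.
my @sample = (
    '192.0.2.1 - - [10/Oct/2001:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2001:13:56:02 -0700] "GET /about.html HTTP/1.0" 200 1045',
    '192.0.2.1 - - [10/Oct/2001:13:57:11 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.9 - - [10/Oct/2001:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
);

my %hits = count_pages(@sample);

# List pages from most to least requested; ties are sorted alphabetically.
for my $path (sort { $hits{$b} <=> $hits{$a} or $a cmp $b } keys %hits) {
    printf "%-15s %d\n", $path, $hits{$path};
}
```

With the sample data, /index.html is listed first with two hits and /about.html second with one; the 404 request is ignored. A real script would read lines from the server's log file instead of a hard-coded list, and would need a different pattern for other log formats.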

Callender's writing style provides the right mix of hand-holding, humor, and clarity for the book's target audience. He simplifies without dumbing down, and he proves that he picked up a considerable amount of hacker culture on his own journey up the learning curve, which he shares with his pupils, citing sources from Neal Stephenson's In the Beginning Was the Command Line to Jon Udell's Practical Internet Groupware. He also does a good job of evangelizing the culture of sharing and open systems that created Perl, Apache, and the Internet as we know it, giving abundant proper credit to the authors and creators of all the tools and references to which he refers his readers. He concludes by listing, and providing jumping-off points for, the wide variety of logical "next steps" that go beyond the scope of the book: Python and other programming languages for the web, Apache configuration, mod_perl, system administration, and relational database integration.

As you may have guessed by now, I recommend this book highly, especially for anyone who finds him- or herself with responsibility for maintaining a web site but feeling a bit underequipped to do so. The book has a limitation (which is not the same as a shortcoming): it's a tutorial, not a reference work; though the index is quite serviceable, this isn't the book to turn to when you need to remember the order of the arguments to substr. This is a book to sit down and read through, once or multiple times, to help build a framework of knowledge and begin populating it with pearls of wisdom that can be put to immediate use.
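For the record, the argument order in question is substr EXPR, OFFSET, LENGTH. A minimal sketch follows; the string and variable names are illustrative and not from the book.

```perl
use strict;
use warnings;

my $title = "Perl for Web Site Management";

# substr EXPR, OFFSET, LENGTH: offsets count from 0.
my $first = substr($title, 0, 4);    # "Perl"
my $site  = substr($title, 9, 8);    # "Web Site"

# A negative OFFSET counts back from the end of the string;
# with LENGTH omitted, everything to the end is returned.
my $last  = substr($title, -10);     # "Management"

print "$first | $site | $last\n";
```

This is exactly the kind of detail a reference such as perldoc -f substr answers faster than a tutorial can.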

Additional information about the book, including code for the examples given, is available on the web at the author's web site, O'Reilly's page for the book, and at the online bookseller site of your choice.

Table of Contents:

Preface

1. Getting Your Tools in Order
Open Source Versus Proprietary Software
Evaluating a Hosting Provider
Web Hosting Alternatives
Getting Started with SSH/Telnet
Meet the Unix Shell
Network Troubleshooting
A Suitable Text Editor

2. Getting Started with Perl
Finding Perl on Your System
Creating the "Hello, world!" Script
The Dot Slash Thing
Unix File Permissions
Running (and Debugging) the Script
Perl Documentation
Perl Variables
A Bit More About Quoting
"Hello, world!" as a CGI Script

3. Running a Form-to-Email Gateway
Checking for CGI.pm
Creating the HTML Form
The <FORM> Tag's ACTION Attribute
The mail_form.cgi Script
Warnings via Perl's -w Switch
The Configuration Section
Invoking CGI.pm
foreach Loops
if Statements
Filehandles and Piped Output
die Statements
Outputting the Message
Testing the Script

4. Power Editing with Perl
Being Careful
Renaming Files
Modifying HREF Attributes
Writing the Modified Files Back to Disk

5. Parsing Text Files
The "Dirty Data" Problem
Required Features
Obtaining the Data
Parsing the Data
Outputting Sample Data
Making the Script Smarter
Parsing the Category File
Testing the Script Again

6. Generating HTML
The Modified make_exhibit.plx Script
Changes to &parse_exhibitor
Adding Categories to the Company Listings
Creating Directories
Generating the HTML Pages
Generating the Top-level Page

7. Regular Expressions Demystified
Delimiters
Trailing Modifiers
The Search Pattern
Taking It for a Spin
Thinking Like a Computer

8. Parsing Web Access Logs
Log File Structure
Converting IP Addresses
The Log-Analysis Script
Different Log File Formats
Storing the Data
The "Visit" Data Structure

9. Date Arithmetic
Date/Time Conversions
Using the Time::Local Module
Caching Date Conversions
Scoping via Anonymous Blocks
Using a BEGIN Block

10. Generating a Web Access Report
The &new_visit and &add_to_visit Subroutines
Generating the Report
Showing the Details of Each Visit
Reporting the Most Popular Pages
Fancier Sorting
Mailing the Report
Using cron

11. Link Checking
Maintaining Links
Finding Files with File::Find
Looking for Links
Extracting
Putting It All Together
Using CPAN
Checking Remote Links
A Proper Link Checker

12. Running a CGI Guestbook
The Guestbook Script
Taint Mode
Guestbook Preliminaries
Untainting with Backreferences
File Locking
Guestbook File Permissions

13. Running a CGI Search Tool
Downloading and Compiling SWISH-E
Indexing with SWISH-E
Running SWISH-E from the Command Line
Running SWISH-E via a CGI Script

14. Using HTML Templates
Using Templates
Reading Fillings Back In
Rewriting an Entire Site

15. Generating Links
The Docbase Concept
The CyberFair Site's Architecture
The Script's Data Structure
Using Data::Dumper
Creating Anonymous Hashes and Arrays
Automatically Generating Links
Inserting the Links

16. Writing Perl Modules
A Simple Module Template
Installing the Module
The Cyberfair::Page Module

17. Adding Pages via CGI Script
Why Add Pages with a CGI Script?
A Script for Creating HTML Documents
Controlling a Multistage CGI Script
Using Parameterized Links
Building a Form
Posting Pages from the CGI Script
Running External Commands with system and Backticks
Race Conditions
File Locking
Adding Link Checking

18. Monitoring Search Engine Positioning
Installing WWW::Search
A Single-Search Results Tool
A Multisearch Results Tool
The map Function

19. Keeping Track of Users
Stateless Transactions
Identifying Individual Users
Basic Authentication
Automating User Registration
Storing Data on the Server
The Register Script
The Verification Script

20. Storing Data in DBM Files
Data Storage Options
The tie Function
A DBM Example Script
Blocking Versus Nonblocking Behavior
Storing Multilevel Data in DBM Files
An MLDBM-Using Registration Script
An MLDBM-Using Verification Script

21. Where to Go Next
Unix System Administration
Programming
Apache Server Administration and mod_perl
Relational Databases
Advocacy

Index

You can purchase Perl for Web Site Management from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

8 of 148 comments

  1. Cool by baldass_newbie · · Score: 5, Funny

    Learning Perl and System Administration at the same time.
    Nothing could possibly go wrong there...

    --
    The opposite of progress is congress
    1. Re:Cool by jbc · · Score: 5, Interesting
      When I started writing the book, I believed (naively) that I would indeed be able to cover some of those things. I at least thought I'd be able to get far enough to talk about some database administration and SQL.

      Didn't happen. Once I got into the specifics of everything I needed to explain to get my target reader up to speed, I realized that there was no way to get there while being true to the needs of my intended audience. So I didn't try. Things like Apache installation, DNS, HTML, and graphic design I assumed that the reader either had somebody else to take care of or knew enough about for their current purposes already.

      The TOC in Dave's review gives a pretty clear picture of what the book does and doesn't manage to cover. In the end, it's very much a beginner's book. It's a "See Dick code. Code, Dick, code!" type of book. It's about helping the reader make a good start, and doesn't pretend to take them all the way to the end of the journey. I like to think that that makes the book more honest, and more useful, at least for its intended audience, than all those brightly-colored ones crowding it off the bookstore shelves that do promise to teach all those things, but then fail to deliver.

      Of course, I may be biased. :-)

      John

  2. Is this a good thing? by tshak · · Score: 4, Insightful

    ...who have never programmed before, but who now find themselves with the need to create their own site-management tools, automated web clients, and web-based applications.

    I hope I don't come off as an elitist, but don't we have enough "non-programmers" acting as programmers thanks to the .COM boom? I have full respect for a manager or web designer wishing to learn programming and web development. However, teaching them the tools first is not going to make them a good programmer. I'm afraid that books like this will lead towards more poorly designed and written programs. A web application is software and should be treated as such. Is it just me, or does anyone else share this feeling?

    --

    There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
  3. Compare to Hayes auto repair books by A nonymous Coward · · Score: 5, Insightful

    I hope I don't come off as an elitist, but don't we have enough "non-mechanics" acting as mechanics? I have full respect for a farmer or tradesman wishing to learn auto repair and design. However, teaching them the tools first is not going to make them a good mechanic. I'm afraid that books like this will lead towards more poorly designed and built cars. A car is mechanical and should be treated as such. Is it just me, or does anyone else share this feeling?

  4. Re:Perl vs. PHP by jbc · · Score: 5, Interesting
    I talk about PHP (albeit very briefly) toward the end of the book, in the "Where to Go Next" chapter. While PHP's relative simplicity obviously makes it a great choice for a non-programmer needing to automate web stuff, I was driven to make Perl the focus of the book by several factors:
    • It's more flexible for the random data munging that makes up a big part of the book (things like mass-editing a collection of documents, generating reports from the server's access logs, and so on).
    • In the happy event that the non-programmers who are the book's target audience find themselves wanting to go beyond web-specific programming tasks, Perl will provide them a better platform for doing that.
    • CPAN.
    • I didn't know much (well, any) PHP when I started writing the book. When I started bugging my knowledgeable friends to tell me how to do web things more efficiently, PHP didn't exist yet. If it had, they might well have steered me toward it. As it was, they steered me towards Perl - and overall, I'm really happy that they did.
    A book like mine that focused on PHP rather than Perl could be really useful for non-programmers looking to automate their web development. Unfortunately, someone else would have to write it. In the meantime, for someone willing to take on the challenge of learning a more powerful language like Perl, the potential rewards make it, in my view at least, a viable alternative.

    John

  5. Website tools... by 2nd Post! · · Score: 4, Insightful

    It just recently occurred to me; why are people always rolling their own, instead of using production quality stuff?

    Sure, some people can't afford a $700 package like WebObjects, but then, if you're worth $20 an hour, that's only 35 hours worth of time... or one week.

    If you can get up and running in one hour on something that takes you one week 'learning from scratch'... and you don't have to write *or* maintain tools... isn't that money well spent?

    1. Re:Website tools... by denshi · · Score: 5, Insightful
      Ummm.. because WebObjects sucks?

      Ha, ha, only serious. No, really, work with me here.

      So you've got a collection of programmers. You spend $700 per progger on WebObjects to get 'production quality'. And you expect the site to be up, when, exactly?

      Let's see what now has to happen:

      1. The programmers have to learn or brush up on J2EE & Java.
      2. They need to learn how to interact with WebObjects, the APIs, the quirks, the bugs, the misfeatures.
      3. They need to learn to hack around WebObjects when it gets in the way of something they want to do, because they don't have the source code and even if they did they don't know it at all.
      4. The administrators need to learn how to keep it up and running, how to tune it, how to get specific logging, etc.
      5. Everyone has to wade through an enormous volume of typically low-quality documentation to filter out the 95% of the framework that they don't need.
      6. The admins need to figure out why WebObjects' "patented object-relational mapping" is absolutely destroying performance on the DB server and how to get around it.
      None of these things are free; you will invest substantial time in learning an app if you go that route.

      Now let's say you have a collection of programmers who 1) prefer things other than Java, 2) aren't afraid of SQL, 3) know the web, and 4) like to be intimate with their code? (Improbable, I know; there are only a few tens of thousands of us out there.) This team can roll from scratch their own system to do the smallest set of functionality they need, and work up from there. They can admin it in confidence because they wrote it and know how it operates. They can reliably say where performance hits are, and they can do it all in their language of choice.

      The question is really: spend time trying to understand someone else's product, or spend time writing a product you understand implicitly? For some (most) projects, the former is desirable; for some, the latter is a better choice. For writing an application in a rapidly evolving field, impenetrable closed-source middleware is frequently a loss in terms of time usage.

  6. Re:As Jamie Zawinski said: by warpSpeed · · Score: 5, Insightful
    The same is true with any "free" software. It's only a bargain some of the time. Too many people here don't realize that.

    I value my time a lot, thank you very much, and I use Linux almost exclusively.

    How long will it take you to learn WebObjects and be able to realize the $700 software investment in saved time? What about the upgrade treadmill? Why teach yourself a closed source tool set when you can just as easily use an open source tool set? You would arguably recoup the cost in time, and therefore money, much sooner.

    Also, if you choose to invest your time in a proprietary system, you are still bound to the whims of the company that developed it. Look at the people that invested time in Drumbeat 2000. Macromedia bought it, and promptly crushed it. If you had learned Drumbeat, you were screwed. I don't know about WebObjects, but can you guarantee that it will not have a similar fate?

    You can choose to learn something that gives you total control over the tool, and you are much less likely to be screwed with your time investment.