Slashdot Mirror


Writing Apache Modules with Perl and C

Thanks to darren chamberlain for an excellent review of the Lincoln Stein and Doug MacEachern book Writing Apache Modules with Perl and C. This is an excellent book for those considering working with Apache and mod_perl, and helpful for C programmers.

Writing Apache Modules with Perl and C
author: Li
pages: 724
publisher: O'Reilly
ISBN: 156592567X
rating: 9.5/10
reviewer: darren chamberlain
summary: Absolutely essential for anyone who is considering using Apache and mod_perl. C programmers may need more.

The Scenario

If you're like me, your first introduction to Perl was in the form of CGI scripts. A few years ago, I inherited a few dozen ancient CGI scripts (Perl and otherwise) that required Immediate Attention. CGI led to Perl, and to Apache; Perl and Apache led, naturally enough, to mod_perl, once I started hitting the performance bottlenecks inherent in CGI programming. After researching mod_perl, building a mod_perl-enabled Apache, and reading all the available online documentation, I got it up and running--and I was suitably impressed.

So, when O'Reilly announced a book devoted to programming Apache with Perl, I was extremely excited. The book starts with an introduction and history of web programming, introduces CGI and other types of web programming (server APIs, such as ISAPI and NSAPI; embedded processors, such as mod_perl, mod_dtcl, and mod_pyapache; FastCGI; Java servlets; ActiveX; and client-side scripting languages, such as VBScript and JavaScript), and then describes the Apache module architecture, using some simple examples ("Hello, World" in Perl and in C). Then it gets good, covering dynamically generated content; the hobgoblin of HTTP, state; and all the other stuff that gives CGI programmers nightmares (like authentication and authorization).

What's Bad?

Although the title reads '... with Perl and C', the emphasis is very obviously on Perl. The C API reference chapters (chapters 10 and 11, pages 505 through 631) are very thorough, but almost all the examples are in Perl only. In fact, the authors go so far as to recommend that almost all Apache modules be written in Perl, and not C, except for very small modules or modules that need that extra speed boost or small memory footprint of being compiled into the server (page 13: "Anything you can do with the C API you can do with mod_perl with less fuss and bother."). Their reasoning is sound: mod_perl modules and scripts require a server restart at most, and often not even that, while for C modules, Apache itself must be recompiled; but I was expecting more in this area, perhaps a larger section on using DSO. After the book was published, however, several of the Perl-only examples were ported to the C API, and are available for download.

A few of these examples have already been published, and in these cases the book is mostly redundant. Notably, the Apache::NavBar module (which Lincoln uses on the server in his lab) and the Apache::AdBlocker module (chapters 4 and 7), appeared in The Perl Journal last year (issues 12 and 11). This is not that big a deal, since both of these modules are incredibly useful and probably deserve to be published in a few more places, but two brand new modules would have been most welcome, especially since the book's target audience probably also reads The Perl Journal.

What's Good?

There's a lot to like here. Since I'm a Perl programmer by trade and disposition, I personally liked the fact that 99.9% of the examples were written in Perl. With only a few exceptions, the modules could be copied into the right locations and run immediately; the exceptions were the modules that made use of either other programs (Chapter 5's Hangman program which uses a relational database to store state information) or specialized Apache features (Chapter 7's Apache::AdBlocker module, which requires proxy functionality).

Much of the text and all of the source code is available on the web at www.modperl.com. Chapters 6, 7, 8, and 9 can be found on the web site for the book, as can all the Perl modules and some of the examples in functional form (Apache::Magic and hangman).

Chapter 9 is the key chapter, and the heart of the book. It describes in great detail all the Apache:: modules. If you use mod_perl at all, download and print this chapter. Memorize it. Use your favorite indexing script to make it searchable. Everything you need to know about mod_perl is here in this chapter.

The appendices are also excellent, although, because it is an Apache book, I would have figured that several of the sections would be regular chapters, and not relegated to the end. The appendices are divided pretty evenly between concentrating on Perl and on C, unlike most of the rest of the book.

So What's In It For Me?

Fortunately for people like me, there is a lot of information about mod_perl on the web: The Perl Journal has had several articles on it, WebMonkey has had an article or two, and so on. There is a comprehensive mod_perl developer's guide on the official Apache/Perl site. Lincoln Stein uses it a lot on his site and in his software. And, of course, we have the man pages and perldocs. So why do we need a book?

A few reasons. First and foremost, few of those sources go into the kind of detail that this book does, while still being approachable. Second, the book focuses on Apache, programming Apache, and (to a lesser extent) programming applications on the web; Perl and C are the means here, not the end. The in-depth technical discussions are about Apache: how it translates URIs to filenames, how it handles subrequests and internal redirects, how it maps files to MIME types. It then presents techniques for usurping these functions, customizing each phase of the response process, and explains when and why you would want to do this, instead of letting Apache do its own thing. Creating checksums on the fly, compressing and decompressing data, creating extremely flexible HTML preprocessors, and modifying outgoing and incoming headers are just some of the given examples.
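
As a rough illustration of the file-to-MIME-type mapping mentioned above, here is a minimal standalone C sketch. It is not taken from the book and does not use the Apache API; the table contents and the names mime_entry, mime_table, and guess_mime_type are hypothetical, chosen only to show the idea of looking up a content type by filename extension.

```c
#include <string.h>

/* Hypothetical extension-to-type table. A real server reads a much longer
   list from its configuration; these four entries are only for illustration. */
struct mime_entry {
    const char *ext;
    const char *type;
};

static const struct mime_entry mime_table[] = {
    { "html", "text/html" },
    { "txt",  "text/plain" },
    { "gif",  "image/gif" },
    { "jpg",  "image/jpeg" },
};

/* Return the MIME type for a filename, or a generic fallback when the
   extension is missing or not in the table. */
const char *guess_mime_type(const char *filename)
{
    /* Only look at the last path component, so a dot in a directory
       name is not mistaken for an extension. */
    const char *slash = strrchr(filename, '/');
    const char *base = slash ? slash + 1 : filename;
    const char *dot = strrchr(base, '.');
    size_t i;

    if (dot == NULL || dot[1] == '\0')
        return "application/octet-stream";

    for (i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++) {
        if (strcmp(dot + 1, mime_table[i].ext) == 0)
            return mime_table[i].type;
    }
    return "application/octet-stream";
}
```

Under these assumptions, guess_mime_type("/docs/index.html") returns "text/html", and a file with an unknown or missing extension falls back to "application/octet-stream". The comparison is case-sensitive, so a real handler would likely normalize the extension first.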

The reference chapters are probably the single most valuable thing about the book. If you are a Perl programmer on a budget, you can download chapter 9 from the web site, but the C programmers out there have to buy the book to get the C API reference. The C reference is 2 chapters (126 pages) long, and covers all the functions in precise detail.

For those among you who are using Microsoft operating systems, the book pays special attention to building, installing, and configuring mod_perl and Apache on Win32 systems, where it is different from Unix and Unix-like systems. Most of the actual modules are very similar (except for the obvious ones, such as scripts that call sendmail and the scripts that access MySQL), but the installation and building of mod_perl (or ApacheModulePerl.dll) are very different. The process is described in enough detail to make it possible, without boring those readers to whom it is irrelevant.

Conclusion

Programming Apache/mod_perl without this book is like writing Perl without the camel book. It can be done, but it is much easier and more enjoyable with the book. The writing is clear, informative, straightforward, and, at times, amusing. The authors are the definitive sources for information on mod_perl and CGI programming, and this is reflected in every aspect of the book. While not as definitive for C programmers, it is still the best Apache API reference out there, other than the actual source code itself.


Table of Contents
  1. Server-Side Programming with Apache
  2. A First Module
  3. The Apache Module Architecture and API
  4. Content Handlers
  5. Maintaining State
  6. Authentication and Authorization
  7. Other Request Phases
  8. Customizing the Apache Configuration Process
  9. Perl API Reference Guide
  10. C API Reference Guide, Part I
  11. C API Reference Guide, Part II
  A. Standard Noncore Modules
  B. Building and Installing mod_perl
  C. Building Multifile C API Modules
  D. Apache:: Modules Available on CPAN
  E. Third-Party C Modules
  F. HTML::Embperl--Embedding Perl Code in HTML

11 of 43 comments

  1. Book schmook! by Aqualung · · Score: 2

    I found the C API to be very well documented, and the examples I found on the web were fairly concise and illustrative. Why anyone would want to burden their server with modules written in Perl is beyond me though.
    ----
    Dave
    All hail Discordia!

    --

    - Dave
    1. Re:Book schmook! by Ed Avis · · Score: 4
      Why anyone would want to burden their server with modules written in Perl is beyond me though.

      I think the idea is that the Perl interpreter is loaded at startup as part of the Apache process. The Perl programs are also compiled just once at startup. Once you've done this, running modules written in Perl simply involves interpreting bytecode, which although not as fast as C, is probably fast enough for most applications. Process creation overhead and loading / compiling scripts is usually the real killer for performance, not executing them.

      Besides, how much time does the machine spend in the Perl script, and how much calling Apache API functions? And how relevant is any of this, given that the biggest bottleneck is often bandwidth, not CPU time?

      --
      -- Ed Avis ed@membled.com
  2. It was a good introduction by johnnyb · · Score: 3

    This book was really good for an introduction to modules for someone (like me) who had never done anything beyond fork/exec CGI scripts. However, as you learn more, and try to do more interesting stuff, you find that the book skimmed the surface on several areas. Basically, for anything very technical or sophisticated, take the book with a grain of salt. Don't assume the book to be 100% correct on every point. They make a lot of mistakes. However, it was definitely worth the read and the money, and I use the appendices quite often when trying to find the function I need.

  3. Conflict of interest? by Ed Avis · · Score: 2

    Has anyone else noticed that O'Reilly have their own web server which competes with Apache?

    I expect the publishing and software divisions are kept separate, to avoid the IBM syndrome of products being squashed / crippled to avoid 'cannibalizing' sales of products from another division. But it still seems a bit strange.

    --
    -- Ed Avis ed@membled.com
    1. Re:Conflict of interest? by raykt · · Score: 2

      not really a conflict of interest, since the software component is built in house - and the books are authored elsewhere.

      note that WebSite Pro only runs on NT/98/95, whereas Apache runs on whatever you can build it on. And O'Reilly use Solaris as the hardware for www.ora.com and linux.ora.com (the latter is definitely running Apache for the webserver; the former cannot be running WebSite), among others. check out the Netcraft details for the Ora sites.

      WebSite Pro does look to be quite a nice product, and should displace IIS as a good server for these platforms (NT etc).

  4. Great Book by Hoonis · · Score: 2

    This book is excellent. You can learn enough from it to get way into the internals of the server, and the focus on perl is warranted. With mod_perl + apache, there is a near perfect marriage of performance and development time.

    I knew many of the things discussed in it, but the added detail of the chapters taught me many new things. If you have access to a mod_perl server to develop on, this book will fill your head with great ideas for features, design strategies, and even does a great job of cataloging "fun" CPAN modules out there for the taking.

  5. heh by Anonymous Coward · · Score: 2

    This summer I was an intern at Cold Spring Harbor Biological Labs where Dr. Stein works as a bioinformatician! I got some help from him a bunch of times and worked with some of his postdocs.

    We also heard a presentation from him regarding his internet interface to the DB of the C. elegans genome. He's a nice guy and something of an interesting character, and definitely knows his perl!

    Respectfully,

    Kevin Christie

    kwchri@maila.wm.edu

    PS - Perl rules!!!

  6. To moderate, or not to moderate ... by LizardKing · · Score: 2

    Aqualung's post was not a troll - he has a point in that mod_perl is a slippery beast. If it's not used *just right* it leaks memory like a string vest.

    Perl is also a no-no (in mod_perl or straightforward standalone guise) for very heavily loaded sites. At Yahoo!, Perl is considered too resource hungry for use on the frontline webservers.

    This leaves you in the unenviable situation of writing leakless, bugless C or C++ code. Catch 22 time ...


    Chris Wareham

  7. Help the authors make money by Jeffrey Baker · · Score: 3
    If you want the authors to make a little more money when you buy this book, use one of the links on www.modperl.com.

    There are links to Amazon.com and O'Reilly.

    Cheers,
    -jwb

  8. My recommendations by LizardKing · · Score: 2

    I tend to be doing database driven stuff, mostly with MySQL, but occasionally with Oracle, Informix, etc. Apart from this it's information stored in memory mapped files, which are updated from live feeds.

    The actual web pages tend to be HTML hardcoded into C and C++ programs, with the dynamic stuff coming from the database or memmapped files. For instance, I am currently writing a reporting system. This is a C++ database load program that uploads the tables once every 24 hours. The searching is done by several C programs tailored to the individual search being performed - in other words one program for editors, another for authors. The nearest thing to 'templates' that it uses is a static library that has output routines for various headers, footers and standard menus.

    This is a little bit more laborious than using say PHP3, or mod_perl. However, it is blisteringly fast and efficient.

    One reason I tend to shy away from Perl besides the performance or resources issue, is the question of maintainability. It is very easy to get the job done quickly in Perl. It's also easy to write terribly unreadable code. One of the systems that I am replacing is simply line noise and a bunch of cron jobs. The other does absolutely no error checking, and has been missing many errors in the data feed for the last two years.

    You may argue that the issue of Perl code maintainability is down to the authors of the original systems, but Perl encourages quick hacks. When these hacks go into production they end up being a nightmare to maintain or enhance.


    Chris Wareham

  9. Re:Care to enlighten us with your recommendations by orabidoo · · Score: 2

    in MIPS assembly. seriously; C is way too high level, and perl is write-only line noise. MIPS assembly is the way to go. if you really must run x86 servers, you can always use the mod_mipsasm emulator.