Writing Apache Modules with Perl and C
If you're like me, your first introduction to Perl was in the form of CGI scripts. A few years ago, I inherited a few dozen ancient CGI scripts (Perl and otherwise) that required Immediate Attention. CGI led to Perl, and to Apache; Perl and Apache led, naturally enough, to mod_perl, once I started hitting the performance bottlenecks inherent in CGI programming. After researching mod_perl, building a mod_perl-enabled Apache, and reading all the available online documentation, I got it up and running--and I was suitably impressed.
So, when O'Reilly announced a book devoted to programming Apache with Perl, I was extremely excited. The book starts with an introduction to and history of web programming, introduces CGI and other types of web programming (server APIs, such as ISAPI and NSAPI; embedded processors, such as mod_perl, mod_dtcl, and mod_pyapache; FastCGI; Java servlets; ActiveX; and client-side scripting languages, such as VBScript and JavaScript), and then describes the Apache module architecture using some simple examples ("Hello, World" in Perl and in C). Then it gets good, covering dynamically generated content; the hobgoblin of HTTP, state; and all the other stuff that gives CGI programmers nightmares (like authentication and authorization).
What's Bad?
Although the title reads '... with Perl and C', the emphasis is very obviously on Perl. The C API reference chapters (chapters 10 and 11, pages 505 through 631) are very thorough, but almost all the examples are in Perl only. In fact, the authors go so far as to recommend that almost all Apache modules be written in Perl rather than C, except for very small modules or modules that need the extra speed boost or small memory footprint of being compiled into the server (page 13: "Anything you can do with the C API you can do with mod_perl with less fuss and bother."). Their reasoning is sound: mod_perl modules and scripts require a server restart at most, and often not even that, while for C modules, Apache itself must be recompiled. Still, I was expecting more in this area, perhaps a larger section on building modules as dynamic shared objects (DSOs). After the book was published, however, several of the Perl-only examples were ported to the C API, and are available for download.
A few of these examples have already been published elsewhere, and in these cases the book is mostly redundant. Notably, the Apache::NavBar module (which Lincoln uses on the server in his lab) and the Apache::AdBlocker module (chapters 4 and 7) appeared in The Perl Journal last year (issues 12 and 11). This is not that big a deal, since both of these modules are incredibly useful and probably deserve to be published in a few more places, but two brand-new modules would have been most welcome, especially since the book's target audience probably also reads The Perl Journal.
What's Good?
There's a lot to like here. Since I'm a Perl programmer by trade and disposition, I personally liked the fact that 99.9% of the examples were written in Perl. With only a few exceptions, the modules could be copied into the right locations and run immediately; the exceptions were the modules that depend on other programs (Chapter 5's Hangman program, which uses a relational database to store state information) or on specialized Apache features (Chapter 7's Apache::AdBlocker module, which requires proxy functionality).
Much of the text and all of the source code are available on the web at www.modperl.com. Chapters 6, 7, 8, and 9 can be found on the book's web site, as can all the Perl modules and some of the examples in working form (Apache::Magic and hangman).
Chapter 9 is the key chapter, and the heart of the book. It describes in great detail all the Apache:: modules. If you use mod_perl at all, download and print this chapter. Memorize it. Use your favorite indexing script to make it searchable. Everything you need to know about mod_perl is here in this chapter.
The appendices are also excellent, although, since this is an Apache book, I would have expected several of those sections to be regular chapters rather than relegated to the end. Unlike most of the rest of the book, the appendices are divided pretty evenly between Perl and C.
So What's In It For Me?
Fortunately for people like me, there is a lot of information about mod_perl on the web: The Perl Journal has had several articles on it, WebMonkey has had an article or two, and so on. There is a comprehensive mod_perl developer's guide on the official Apache/Perl site. Lincoln Stein uses mod_perl a lot on his site and in his software. And, of course, we have the man pages and perldocs. So why do we need a book?
A few reasons. First and foremost, few of those sources go into the kind of detail that this book does while still being approachable. Second, the book focuses on Apache, programming Apache, and (to a lesser extent) programming applications on the web; Perl and C are the means here, not the end. The in-depth technical discussions are about Apache: how it translates URIs to filenames, how it handles subrequests and internal redirects, how it maps files to MIME types. The book then presents techniques for usurping these functions and customizing each phase of the response process, and explains when and why you would want to do this instead of letting Apache do its own thing. Creating checksums on the fly, compressing and decompressing data, creating extremely flexible HTML preprocessors, and modifying outgoing and incoming headers are just some of the examples given.
The reference chapters are probably the single most valuable thing about the book. If you are a Perl programmer on a budget, you can download chapter 9 from the web site, but the C programmers out there have to buy the book to get the C API reference. The C reference is two chapters (126 pages) long and covers all the functions in precise detail.
For those of you using Microsoft operating systems, the book pays special attention to building, installing, and configuring mod_perl and Apache on Win32 systems, where the process differs from Unix and Unix-like systems. Most of the actual modules are very similar (except for the obvious ones, such as scripts that call sendmail and scripts that access MySQL), but building and installing mod_perl (or ApacheModulePerl.dll) is very different. The process is described in enough detail to make it possible, without boring the readers to whom it is irrelevant.
Conclusion
Programming Apache/mod_perl without this book is like writing Perl without the Camel book. It can be done, but it is much easier and more enjoyable with the book. The writing is clear, informative, straightforward, and, at times, amusing. The authors are the definitive sources of information on mod_perl and CGI programming, and this is reflected in every aspect of the book. While it is not as definitive for C programmers, it is still the best Apache API reference out there, other than the source code itself.
Purchase this book at Amazon.
Errata
Table of Contents
- Server-Side Programming with Apache
- A First Module
- The Apache Module Architecture and API
- Content Handlers
- Maintaining State
- Authentication and Authorization
- Other Request Phases
- Customizing the Apache Configuration Process
- Perl API Reference Guide
- C API Reference Guide, Part I
- C API Reference Guide, Part II
- Standard Noncore Modules
- Building and Installing mod_perl
- Building Multifile C API Modules
- Apache:: Modules Available on CPAN
- Third-Party C Modules
- HTML::Embperl--Embedding Perl Code in HTML
I was actually very disappointed with this book. I might suggest looking elsewhere.
I found the C API to be very well documented, and the examples I found on the web were fairly concise and illustrative. Why anyone would want to burden their server with modules written in Perl is beyond me, though.
----
Dave
All hail Discordia!
- Dave
This book was really good for an introduction to modules for someone (like me) who had never done anything beyond fork/exec CGI scripts. However, as you learn more, and try to do more interesting stuff, you find that the book skimmed the surface on several areas. Basically, for anything very technical or sophisticated, take the book with a grain of salt. Don't assume the book to be 100% correct on every point. They make a lot of mistakes. However, it was definitely worth the read and the money, and I use the appendices quite often when trying to find the function I need.
Engineering and the Ultimate
Has anyone else noticed that O'Reilly have their own web server which competes with Apache?
I expect the publishing and software divisions are kept separate, to avoid the IBM syndrome of products being squashed / crippled to avoid 'cannibalizing' sales of products from another division. But it still seems a bit strange.
-- Ed Avis ed@membled.com
This book is excellent. You can learn enough from it to get way into the internals of the server, and the focus on perl is warranted. With mod_perl + apache, there is a near perfect marriage of performance and development time.
I knew many of the things discussed in it, but the added detail of the chapters taught me many new things. If you have access to a mod_perl server to develop on, this book will fill your head with great ideas for features, design strategies, and even does a great job of cataloging "fun" CPAN modules out there for the taking.
I'm not sure how the book examples access MySQL, but I use DBI. Scripts run on NT or unix without modification. Otherwise, DBI would be pointless.
This summer I was an intern at Cold Spring Harbor Biological Labs, where Dr. Stein works as a bioinformatician! I got help from him a bunch of times and worked with some of his postdocs.
We also heard a presentation from him about his internet interface to the database of the C. elegans genome. He's a nice guy and something of an interesting character, and definitely knows his Perl!
Respectfully,
Kevin Christie
kwchri@maila.wm.edu
PS - Perl rules!!!
Aqualung's post was not a troll - he has a point in that mod_perl is a slippery beast. If it's not used *just right* it leaks memory like a string vest.
...
Perl is also a no-no (in mod_perl or straightforward standalone guise) for very heavily loaded sites. At Yahoo!, Perl is considered too resource hungry for use on the frontline webservers.
This leaves you in the unenviable situation of writing leakless, bugless C or C++ code. Catch-22 time.
Chris Wareham
Any C programmer worth his salt can figure out how to write a module by looking at other modules and the Apache source code.
;)
What kind of hand holding is next? Apache Module Wizard integrated into bash?
Let's face it; most computer books are written purely for profit. Particularly ones about dreary, passionless, narrowly-defined topics like writing extensions to a particular application.
This book makes a great complement to the online docs for C module writers. I'm also on the Apache module writers mailing list, and I happen to know that most of the other people on that list refer to this book often -- it is the de facto bible for Apache module writers who use C.
Maybe it's just me but I find the Everything links much more distracting than helpful. I see where they're a good idea in theory (ZDNet has something similar that's pretty good) but I think Everything is more of a facetious geek manifesto than a reference source. Is it useful for someone who doesn't know what ActiveX is to learn that it's "One of Bill Gates' and Microsoft's evil minions to take over the world. Life would be much better without it."?
Just my $0.02
What I'm listening to now on Pandora...
The first paragraph is a bit excessive on Everything linkage. I mean, how many people reading this need terms like Java and CGI defined for them?
There are links to Amazon.com and O'Reilly.
Cheers,
-jwb
Any thoughts anyone?
I think this book is even better than O'Reilly's "Definitive Guide to Apache" to learn about how Apache works internally. Of course, this is due in part to Lincoln Stein, who is a great author.
Seriously. This is *not* a flame.
How do you suggest people generate dynamic web pages?
-- Mike Greaves
It was supposed to be directed at Lizard King below. Mike Greaves
Seriously. This is *not* a flame.
How do you recommend people generate dynamic web pages?
-- Mike Greaves
I *REALLY* disagree with the author's assertion that apache modules should be written in perl. Many apache modules end up being glue into an existing system. Most of the benefit of being an apache module goes away if it consists of perl code that calls 'system' on existing programs. For peak performance, the existing code must be glued directly into apache, which means using C.
being primarily a c programmer, i was initially disappointed that this book seemed to focus almost entirely on mod_perl. (while i like perl for quick and dirty work, it never fails to infuriate me when i try to use it for anything significant) however, after reading through the book, i still found it to be extremely useful. the middle chapters give a very good explanation of how the apache api can be used to do what you want. basically, i skimmed through the rest of the book to get the basic concepts down, and since then i have lived in chapters 10 and 11, the c api reference. even without the rest of the book to explain how to use it, these two chapters are by far the most useful reference for the apache api that i have found.
If I don't put anything here, will anyone recognize me anymore?
I've seen what mod_perl does to the httpd daemon and it's not pretty. The compiled executable is over 1 MB in size and the memory footprint is over 5 MB per child! Ouch! I've written a fairly data-happy Apache module in C that added only 20K to the httpd daemon's executable size and memory footprint -- it's much nicer. Of course for complex tasks like what Slashdot is doing for each page request you may as well use mod_perl.
I tend to be doing database-driven stuff, mostly with MySQL, but occasionally with Oracle, Informix, etc. Apart from this, it's information stored in memory-mapped files, which are updated from live feeds.
The actual web pages tend to be HTML hardcoded into C and C++ programs, with the dynamic stuff coming from the database or memmapped files. For instance, I am currently writing a reporting system. This is a C++ database load program that uploads the tables once every 24 hours. The searching is done by several C programs tailored to the individual search being performed - in other words one program for editors, another for authors. The nearest thing to 'templates' that it uses is a static library that has output routines for various headers, footers and standard menus.
This is a little bit more laborious than using say PHP3, or mod_perl. However, it is blisteringly fast and efficient.
One reason I tend to shy away from Perl besides the performance or resources issue, is the question of maintainability. It is very easy to get the job done quickly in Perl. It's also easy to write terribly unreadable code. One of the systems that I am replacing is simply line noise and a bunch of cron jobs. The other does absolutely no error checking, and has been missing many errors in the data feed for the last two years.
You may argue that the issue of Perl code maintainability is down to the authors of the original systems, but Perl encourages quick hacks. When these hacks go into production they end up being a nightmare to maintain or enhance.
Chris Wareham
Also note that because a Perl script is interpreted, its code actually goes in the data segment, not the text segment. Hence there is no memory sharing of scripts among the httpds.
"Amazon.com has a market capitalization of $5.75 billion (August 10, 1998). They built their site with compiled C CGI scripts connecting to a relational database. You could not pick a tool with a less convenient development cycle. You could not pick a tool with lower performance (forking CGI then opening a connection to the RDBMS). They worked around the slow development cycle by hiring very talented programmers. They worked around the inefficiencies of CGI by purchasing massive Unix boxes ten times larger than necessary. Wasteful? Sure. But insignificant compared to the value of the company that they built by focusing on the application and not fighting bugs in some award-winning Web connectivity tool programmed by idiots and tested by no one."
cpeterso
I got Writing Apache Modules with Perl and C last week, and it is excellent!
I highly recommend it for beginners and experts.
Now I'm REALLY glad I posted AC...