HTTP: The Definitive Guide


Posted by timothy on Tuesday May 20, 2003 @04:00AM from the four-easy-letters dept.

Michael Palmer writes "OK, how well do you know HTTP? Here's a pop quiz: QUESTION: Did you know that the Keep-Alive header was valid in HTTP 1.0, but has been deprecated in HTTP 1.1? A) What does "deprecated" mean? B) What is the "Keep-Alive header?" C) That's too bad - I kind of thought Keep-Alive was handy! D) Get with the program... HTTP 1.1 came out in 1999. The Internet boom is over already! Persistent connections are the default in HTTP 1.1 anyway." Answer (not necessarily your answer) and the rest of Palmer's review follows.

HTTP: The Definitive Guide
author: David Gourley, Brian Totty
pages: 656
publisher: O'Reilly & Associates; 1st edition (September 2002)
rating: excellent overview, plus detail in core areas
reviewer: Michael Palmer
ISBN: 1565925092
summary: An overview of HTTP and related topics

OK, so I answered "C". I am going to make the bold claim that HTTP: The Definitive Guide, the long-awaited O'Reilly book on HTTP, is ambitious enough in breadth and depth that if you answered "B," "C," or "D," you will find this book useful and informative. This is primarily due to the clear organization of the book, as well as its friendly (even chummy) writing style.

Even if you are a technically-inclined sort from the Marketing department, and answered "A," you could get a good technical overview of the plumbing of the Web by skimming through this book; plus, having any O'Reilly book on the shelf in your cubicle would score you some street cred with the guys sitting over in Development -- this could be the one you've actually read. :-)

Breadth

Unless you answered "D," HTTP is more complicated than you think. This is especially true if, as the authors of a good technical book should do (and these authors do), one spends some time touching on matters one level down (to TCP/IP, and other areas, in this case), and one level up (to HTML, generally, in this case). Because the authors are particularly concerned with HTTP performance, details of the interactions between HTTP and adjacent levels can be important.

The book is divided into five main sections: 1) an overview of HTTP, URLs, and connection management; 2) HTTP Architecture, including Web servers, proxies, caches, gateways, tunnels, robots; 3) Identification, Authorization, and Security; 4) Entities, Encodings, and Internationalization; 5) Content Publishing and Distribution, including hosting, publishing, load balancing, logging. So, even if you classify yourself as a "D," or even if you are hacking on an extensible open-source router software platform (in that case, you are an "F"), you will find yourself pulling this book from the shelf from time to time to check on something in one of these areas. The modular organization of the book is good.

The full Table of Contents is available online.

Depth

One (unfortunate?) thing about the Web is that its "architecture" (if you can even call it that) evolved and grew piece by piece. The design goals people had in mind back in 1993, or even in 1999, have been blown away by what has happened on the ground. Inter-company politics have also been a big factor -- never helpful for promoting standardization, or sound design. (Perhaps another problem has been the lack of an O'Reilly book on HTTP to tie everything together!) Hence, not only do you have a confusing mass of obsolete and/or overlapping specifications documents, you also have major differences between how different browsers, servers, and proxies adhere to these specifications in practice. This is one place the book shines: sprinkled throughout the pages are little tidbits about compatibility or performance pitfalls, gleaned from much practical experience. (The authors were some of the architects of Inktomi's Traffic Server "enterprise class" Web cache. Think "proxy caching for all of AOL's Web traffic.") As one example: "Technically, any Connection header fields (including Connection: Keep-Alive) received from an HTTP/1.0 device should be ignored, because they may have been forwarded mistakenly by an older proxy server. In practice, some clients and servers bend this rule, although they run the risk of hanging on older proxies." I can just imagine the series of bug reports leading to the inclusion of that piece of advice in the book. There are many other such warnings and bits of advice, generally aimed at HTTP application developers, often with an eye to performance tuning.
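
To make the quoted rule concrete, here is a minimal Python sketch of how a server might decide whether to hold a connection open. It is an illustration under stated assumptions, not code from the book: the function name `wants_persistent`, the `strict` flag, and the lowercase-keyed `headers` dict are all hypothetical.

```python
def wants_persistent(version, headers, strict=True):
    """Decide whether to keep the connection open after this message.

    `headers` is assumed to be a dict with lowercase field names.
    `strict` (hypothetical flag) selects the by-the-book behaviour.
    """
    raw = headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}

    if version == "HTTP/1.1":
        # Persistent connections are the default in HTTP/1.1;
        # the peer opts out with "Connection: close".
        return "close" not in tokens

    if version == "HTTP/1.0":
        if strict:
            # Strict reading of the quoted advice: ignore Connection
            # fields from a 1.0 device, since an older proxy may have
            # forwarded them by mistake.
            return False
        # "Bending the rule": honour Keep-Alive anyway and accept the
        # risk of hanging on an older proxy.
        return "keep-alive" in tokens

    return False
```

With `strict=False` the sketch behaves like the clients and servers the authors describe as bending the rule.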

Here again, appropriate depth of discussion for a variety of readers is handled by clear organization of the book. The basic background material is laid out, and as the authors dive deeper into detail they may make a suggestion like, "If you are [not] writing high-performance HTTP software... feel free to skip ahead." Then, at the end of every chapter, there is a section labelled, "For More Information," which is a collection of relevant references and links, for those who want to dig into the source documents themselves.

Cautions

This book review is addressed to the Slashdot crowd, a very technically savvy audience, so it's appropriate to mention what this book is not. It's not a detailed technical reference on all the topics mentioned in the table of contents (above); it would be tough to fit all that material into the book's 650-plus pages. However, the book is a good overview of HTTP and many related topics. The book does dip down into the grungy detail in many areas, but this won't be your only reference if you are a Web application developer.

Conclusion

Overall, this is one of the more accessible O'Reilly books I own. While experts will certainly seek out greater depth in their particular area of expertise, few people are expert in the whole range of topics related to HTTP that this book covers. In addition, the book provides many tips drawn from practical experience, and references to more detailed material. HTTP, if not the heart and soul of the Web (perhaps that is Web content itself), could perhaps be called the Web's circulatory system. If you have a professional interest in Web content distribution, or Web application development, I believe this book deserves a spot on your shelf.

You can purchase HTTP: The Definitive Guide from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


problems with definitive guides by stonebeat.org · 2003-05-20 04:16 · Score: 4, Insightful

The problem with definitive guides is that they get outdated very quickly :)

so i wouldn't spend any money on them. instead i would just browse the W3C website or other reference web sites.

--

Consensus is good, but informed dictatorship is better
Re:RFCs have all the info you need by bwalling · 2003-05-20 04:18 · Score: 2, Insightful

Honestly, save yourself ~ $50 for an O'Reilly book and go directly to the source of the information:

HTTP 1.0
HTTP 1.1

Well, the organization of the RFCs isn't exactly what I'm looking for, there is useful commentary in the book, there is an index in the book, and I like having things in print. Sure, it's not too expensive to print the RFC, but if you shop around, the book isn't $50.
Re:RFCs have all the info you need by Anonymous Coward · 2003-05-20 04:23 · Score: 4, Insightful

No, RFCs don't have all the information you need. Specifications should contain a succinct description of the protocol - not advice, best practices, informative examples, and so on. That is what books like this are for.
Re:zeldman by Brummund · 2003-05-20 04:38 · Score: 3, Insightful

I don't know about you, but I'd rather die or work in the advertising business than buy a book about web design by someone who uses light grey on white background on their homepage. Come on, he should know better than "It's hardly readable, but it SURE looks nice."
Re:RFCs have all the info you need by Xerithane · 2003-05-20 04:58 · Score: 3, Insightful

...not advice, best practices, informative examples, and so on. That is what books like this are for.

HTTP 1.1 does tell you the best practice. It says, "You SHOULD do XYZ in case ABC." If you need help coding something, you shouldn't be implementing HTTP 1.1. HTTP is not that complex, it doesn't need informative examples. What examples can you possibly need? "When using this header, the values are X, Y, or Z." Well.. it tells you that.

I wrote a complete HTTP 1.1 implementation according to the RFC without issue. HTTP headers are remarkably easy to write and validate. The problem comes in from non-compliant browsers (which are non-compliant to handle non-compliant servers).

--
Dacels Jewelers can't be trusted.
Re:RFCs have all the info you need by kazrak · 2003-05-20 04:59 · Score: 2, Insightful

I've read the RFCs. I have the O'Reilly book as well. There is a lot of information in the O'Reilly book that is not in the RFCs. (Information on robots.txt, for example. A lot more proxy information than the RFCs contain. Some basic information on WebDAV. These are just a few things I found flipping through my copy.)
Sure, you can find all this stuff online. You buy a book so you have a well-organized place to find it all together, though. This book succeeds marvelously at this task.
Re:RFCs have all the info you need by iabervon · 2003-05-20 05:12 · Score: 2, Insightful

A compliant browser SHOULD handle non-compliant servers, and a compliant server SHOULD handle non-compliant browsers. An important property of a good specification is that old and broken programs may be handled gracefully without violating the standard.
/me too by DrSkwid · 2003-05-20 05:25 · Score: 2, Insightful

until divs will auto resize we'll be stuck with pages like this one (light orange on white for them menus ffs!) that only go 20% to the width of my browser window.

& his menus don't resize to fit the text if you turn up the size

still, never mind, im sure he makes $ from his book, but not from me

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Re:zeldman by Meeble · 2003-05-20 05:55 · Score: 2, Insightful

sure maybe at 7000 x 7000 resolution it doesn't take up everything on your screen - however his compatible design works in all browsers and WAP out there currently - including Safari.

--
Fear Breeds Knowledge
Re:If it's broken, fix it! by Anonymous Coward · 2003-05-20 06:14 · Score: 1, Insightful

By being permissive about nonstandards compliance (or is that standards noncompliance?) you are encouraging sloppy coding.

Postel's Prescription: "Be liberal in what you accept, and conservative in what you send."

In other words, try and understand the junk that you see, but only give out tidy stuff.
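
As a rough sketch of what that looks like for HTTP header lines (the function names `parse_header_line` and `format_header_line` are made up for this example, and how much leniency is actually safe is a judgment call):

```python
def parse_header_line(raw):
    """Liberal in what we accept: tolerate a bare LF line ending,
    extra whitespace, and any capitalisation of the field name."""
    text = raw.decode("latin-1").rstrip("\r\n")
    name, sep, value = text.partition(":")
    if not sep or not name.strip():
        raise ValueError("not a header line: %r" % (raw,))
    # Field names are case-insensitive, so normalise to lowercase.
    return name.strip().lower(), value.strip()


def format_header_line(name, value):
    """Conservative in what we send: canonical capitalisation,
    one space after the colon, and a proper CRLF terminator."""
    canonical = "-".join(part.capitalize() for part in name.split("-"))
    return ("%s: %s\r\n" % (canonical, value)).encode("latin-1")
```

For example, feeding the sloppy line `b"content-TYPE :  text/html \n"` through `parse_header_line` and then `format_header_line` yields the tidy `b"Content-Type: text/html\r\n"`.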
Lean vs Trivial by SnakeStu · 2003-05-20 06:27 · Score: 3, Insightful

Standards should be lean and so easy to understand and so trivial to implement that one undergrad student can implement it to full compliance in one afternoon.

I suppose that appeals to undergrads, and those who like extremely granular standards that only address small parts of a solution. Beyond that, it's an absurd overstatement. Standards should be lean in the sense that they should be focused, but to be trivial enough for full implementation by an undergrad in one afternoon ducks below the bar of general usefulness. It's somewhat analogous to what I've heard more than one teacher respond when asked by a student "how long" a paper should be: It should be like a skirt -- long enough to cover the important parts, short enough to keep it interesting. You're right that it should be lean (short enough to keep it interesting) but your criterion for that might not cover the important parts.

--
No Laughing Allowed!
Thou shalt not SHOULD? by fm6 · 2003-05-20 06:34 · Score: 2, Insightful

A standards document should never use the word SHOULD.
Don't you mean, "A standards document must never use the word SHOULD"? ;)
Strictly speaking, RFCs are not standards -- only government-sanctioned bodies can issue standards. Of course, that's a distinction only of interest to compulsive nit-pickers (aka Tech Writers).
In practical terms, I think a good RFC plays the role both of a standards document (MUST) and a best practices document (SHOULD). Given the ad hoc nature of the Internet, it makes a lot of sense to combine the two. It's the sort of informal process and documentation that has allowed the net to grow so quickly.
And (to bring us back to the real topic) that's a good reason to not waste money on a book if there's a good RFC at hand.
Re:When is HTTP 2.0 coming out? by shiflett · 2003-05-20 06:42 · Score: 4, Insightful

Never.

To quote the W3C:

Now that both HTTP extensions and HTTP/1.1 are stable specifications, W3C has closed the HTTP Activity. The Activity has achieved its goals of creating a successful standard that addresses the weaknesses of earlier HTTP versions.
Most overlooked HTTP feature by KjetilK · 2003-05-20 06:54 · Score: 2, Insightful

OK, so what are people's favorite overlooked HTTP features?
Mine is definitely content negotiation, specifically language negotiation, since I develop multilingual websites (yeah, English is not my first language).
I find that extremely useful, yet, nobody cares about it... It is really annoying when you get to a website and you have to choose the language, "Hey, I told you that in my accept-language header, just listen!"
Things are moving sooooo slowly...
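
For anyone who wants to actually listen to that header, a minimal server-side sketch follows. It assumes a simplified matching rule (exact tag first, then the primary subtag), and the names `parse_accept_language` and `pick_language` are invented for this example rather than taken from any framework.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q-value (default q=1.0)."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their header order.
    ranked = sorted(prefs, key=lambda p: -p[1])
    return [tag for tag, q in ranked if q > 0]


def pick_language(header, available, default="en"):
    """Pick the best language from `available` (a set of lowercase tags)."""
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "nb-no" falls back to "nb"
        if primary in available:
            return primary
    return default
```

A site offering English and Norwegian Bokmal could call `pick_language(request_header, {"en", "nb"})` and skip the language-chooser page entirely.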

--
Employee of Inrupt, Project Release Manager and Community Manager for Solid
Re:HTTP is amazingly badly engineered by Fefe · 2003-05-20 07:34 · Score: 2, Insightful

First of all, it's perfectly OK to serve the dynamic content without a content-length header.

Second of all, the whole point of the content-length header is so that the client knows how much data will come and is thus able to allocate memory, see whether it will be able to process the whole content and display a progress bar. All of these are not possible with chunked encoding, so you get none of the benefits from content-length. Why not drop it in the first place?

Not having a content-length header has only one drawback: it breaks keep-alive connections. But since sane sites are compressing their dynamic content anyway to save bandwidth cost and make it appear quicker on the client machine, and dynamic HTML pages of 100k typically compress down to below 10k because HTML is so bloated, there really is no point in not buffering those 10k. The system has larger buffers than that for TCP anyway, so memory consumption is not a valid excuse. Also, if you do the buffering, you can add the content-length header and get all the benefits.
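
A minimal sketch of that buffer-then-compress approach, using only the Python standard library. The function name `build_response` and the `compress` flag are hypothetical, and a real server would check the client's Accept-Encoding header before choosing gzip.

```python
import gzip


def build_response(chunks, compress=True):
    """Buffer dynamically generated output so Content-Length can be sent.

    `chunks` is any iterable of bytes produced by the page generator.
    Returns (headers, body) where headers is a list of (name, value) pairs.
    """
    body = b"".join(chunks)  # buffer the whole page in memory
    headers = [("Content-Type", "text/html")]
    if compress:
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    # The length is known only because the output was buffered first;
    # it must describe the bytes actually sent, i.e. the compressed body.
    headers.append(("Content-Length", str(len(body))))
    return headers, body
```

Because the length is known up front, the connection can stay open for the next request without resorting to chunked encoding.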

Oh, and one last point: we have had security problems caused by chunked encoding. We also have had a trillion security problems by idiots and static buffering, but so far nobody has been stupid enough to do compression and HTTP output buffering using a static buffer.
Re:Invalid Question by Cromac · 2003-05-20 08:25 · Score: 2, Insightful

The correct statement should be:
The Keep-Alive header was valid in HTTP 1.0, but has been deprecated in HTTP 1.1. True or False
By adding "did you know" there isn't a good answer since both True and False are correct depending on who answers the question.
Tests in school would have been much easier if they all started out with "Did you know...".
Re:Or... by Zeinfeld · 2003-05-20 10:02 · Score: 2, Insightful

HTTP was created long before it was handed off to be maintained by the IETF. It existed prior to the RFC that you claim to have co-wrote. The only reason that exchange was made is because HTTP is viewed as a piece of the Internet's infrastructure; in fact it is essentially where the Internet and the Web intersect.
Well yes, before there was HTTP 1.1 there was HTTP 1.0. There was also an HTTP 0.9 that was around before that...
HTTP was NOT handed off to the IETF by the W3C as your post appears to imply, there was no W3C at that time. HTTP was taken to the IETF to get recognition as a protocol standard. There was no 'handing off', the same people continued to work on the protocol as before. The only significant change was that the mailing list changed, www-talk had become very noisy by this time. The IETF has change control in a nominal sense, they can write new versions of the spec and call them HTTP, but so can anyone else, they just might have more difficulty getting others to recognise them...
That is the reason there are two sets of acknowledgements in the spec. The first set is the original authors, the second the set of people who worked on the draft after the IETF process started.
I don't seem to remember your name from any of the Web working groups I have been associated with. It is unlikely that if you know as much as you claim to about the Web that you don't know mine. I don't think that publishing a book about my work gives you the right to accuse me or for that matter anyone else of being a liar.
Perhaps if you actually read what I wrote rather than what you think I wrote you might not have made such a fool of yourself.

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/