Slashdot Mirror


HTTP: The Definitive Guide

Michael Palmer writes "OK, how well do you know HTTP? Here's a pop quiz: QUESTION: Did you know that the Keep-Alive header was valid in HTTP 1.0, but has been deprecated in HTTP 1.1?
A) What does "deprecated" mean?
B) What is the "Keep-Alive header"?
C) That's too bad - I kind of thought Keep-Alive was handy!
D) Get with the program... HTTP 1.1 came out in 1999. The Internet boom is over already! Persistent connections are the default in HTTP 1.1 anyway."
Answer (not necessarily your answer) and the rest of Palmer's review follows.

HTTP: The Definitive Guide
author: David Gourley, Brian Totty
pages: 656
publisher: O'Reilly & Associates; 1st edition (September 2002)
rating: excellent overview, plus detail in core areas
reviewer: Michael Palmer
ISBN: 1565925092
summary: An overview of HTTP and related topics

OK, so I answered "C". I am going to make the bold claim that HTTP: The Definitive Guide, the long-awaited O'Reilly book on HTTP, is ambitious enough in breadth and depth that if you answered "B," "C," or "D," you will find this book useful and informative. This is primarily due to the clear organization of the book, as well as its friendly (even chummy) writing style.

Even if you are a technically-inclined sort from the Marketing department, and answered "A," you could get a good technical overview of the plumbing of the Web by skimming through this book; plus, having any O'Reilly book on the shelf in your cubicle would score you some street cred with the guys sitting over in Development -- this could be the one you've actually read. :-)

Breadth

Unless you answered "D," HTTP is more complicated than you think. This is especially true if, as the authors of a good technical book should do (and these authors do), one spends some time touching on matters one level down (to TCP/IP, and other areas, in this case), and one level up (to HTML, generally, in this case). Because the authors are particularly concerned with HTTP performance, details of the interactions between HTTP and adjacent levels can be important.

The book is divided into five main sections: 1) an overview of HTTP, URLs, and connection management; 2) HTTP Architecture, including Web servers, proxies, caches, gateways, tunnels, robots; 3) Identification, Authorization, and Security; 4) Entities, Encodings, and Internationalization; 5) Content Publishing and Distribution, including hosting, publishing, load balancing, logging. So, even if you classify yourself as a "D," or even if you are hacking on an extensible open-source router software platform (in that case, you are an "F"), you will find yourself pulling this book from the shelf from time to time to check on something in one of these areas. The modular organization of the book is good.

The full Table of Contents is available online.

Depth

One (unfortunate?) thing about the Web is that its "architecture" (if you can even call it that) evolved and grew piece by piece. The design goals people had in mind back in 1993, or even in 1999, have been blown away by what has happened on the ground. Inter-company politics have also been a big factor -- never helpful for promoting standardization, or sound design. (Perhaps another problem has been the lack of an O'Reilly book on HTTP to tie everything together!) Hence, not only do you have a confusing mass of obsolete and/or overlapping specification documents, you also have major differences between how different browsers, servers, and proxies adhere to these specifications in practice.

This is one place the book shines: sprinkled throughout the pages are little tidbits about compatibility or performance pitfalls, gleaned from much practical experience. (The authors were some of the architects of Inktomi's Traffic Server "enterprise class" Web cache. Think "proxy caching for all of AOL's Web traffic.") As one example: "Technically, any Connection header fields (including Connection: Keep-Alive) received from an HTTP/1.0 device should be ignored, because they may have been forwarded mistakenly by an older proxy server. In practice, some clients and servers bend this rule, although they run the risk of hanging on older proxies." I can just imagine the series of bug reports leading to the inclusion of that piece of advice in the book. There are many other such warnings and bits of advice, generally aimed at HTTP application developers, often with an eye to performance tuning.
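The hang described in that quotation arises when an older proxy forwards Connection: Keep-Alive verbatim to the next hop. As a rough illustration (not taken from the book), a proxy that understands the Connection header would drop hop-by-hop headers before forwarding. The sketch below assumes headers arrive as a list of (name, value) pairs; the function name strip_hop_by_hop is hypothetical, and the fixed header set is the one listed in RFC 2616 section 13.5.1.

```python
# Hop-by-hop headers listed in RFC 2616, section 13.5.1.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers):
    """Return the (name, value) pairs a proxy may forward to the next hop.

    Hypothetical helper: `headers` is a list of (name, value) tuples.
    """
    # Every token listed in a Connection header names another header
    # that applies only to this hop and must not be forwarded either.
    named = set()
    for name, value in headers:
        if name.lower() == "connection":
            named.update(
                token.strip().lower()
                for token in value.split(",")
                if token.strip()
            )
    drop = HOP_BY_HOP | named
    return [(n, v) for n, v in headers if n.lower() not in drop]


request_headers = [
    ("Host", "example.com"),
    ("Connection", "Keep-Alive, X-Foo"),
    ("Keep-Alive", "300"),
    ("X-Foo", "1"),
    ("Accept", "*/*"),
]
forwarded = strip_hop_by_hop(request_headers)
```

Here only Host and Accept survive, so the downstream server never sees a Keep-Alive request that was meant for the proxy.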

Here again, appropriate depth of discussion for a variety of readers is handled by clear organization of the book. The basic background material is laid out, and as the authors dive deeper into detail they may make a suggestion like, "If you are [not] writing high-performance HTTP software... feel free to skip ahead." Then, at the end of every chapter, there is a section labelled, "For More Information," which is a collection of relevant references and links, for those who want to dig into the source documents themselves.

Cautions

This book review is addressed to the Slashdot crowd, a very technically savvy audience, so it's appropriate to mention what this book is not. It's not a detailed technical reference on all the topics mentioned in the table of contents (above); it would be tough to fit all that material into the book's 650-plus pages. However, the book is a good overview of HTTP and many related topics. The book does dip down into the grungy detail in many areas, but this won't be your only reference if you are a Web application developer.

Conclusion

Overall, this is one of the more accessible O'Reilly books I own. While experts will certainly seek out greater depth in their particular area of expertise, few people are expert in the whole range of topics related to HTTP that this book covers. The book also provides many tips drawn from practical experience, and references to more detailed material. HTTP, if not the heart and soul of the Web (perhaps that is Web content itself), could perhaps be called the Web's circulatory system. If you have a professional interest in Web content distribution, or Web application development, I believe this book deserves a spot on your shelf.

You can purchase HTTP: The Definitive Guide from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

21 of 283 comments

  1. Or... by cqpalzm · · Score: 1, Informative

    Or, you could just check out the W3C and read up on it without the need of someone making edits to the explanations of the actual specs.

    1. Re:Or... by Zeinfeld · · Score: 5, Informative
      Or, you could just check out the W3C and read up on it without the need of someone making edits to the explanations of the actual specs.

      Where do you think you can find HTTP on the W3C site?

      HTTP was standardized in the IETF process, not at the W3C. HTML started in the IETF process and then we yanked it out and did it in the W3C. The IETF process is not the place to work on something where there are religious wars, and the SGML folk were big on religious wars.

      The RFCs on HTTP are useful if you are writing a server or client; however, they are less useful as a guide to how what is out there works. One of the big problems with the IETF is that the RFCs look like shit: they are designed to be printed in a fixed-width font because that's the way they did things in Babbage's day. So not surprisingly engineers tend to go for documentation that is easier on the eye, even if it turns out to be wrong.

      The other issue with the specs is that they describe what the WG came up with. That does not necessarily represent reality; the group took seven years to complete. If you want to know what will work you need more information than is in the RFC.

      I wrote parts of the HTTP spec and even I would want more information than just the spec. I am not sure about the 'advice' about working around older broken proxies; I tend to think it's not a bad thing if folk running obsolete software lose every so often. But it is useful to know that it can be an issue.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    2. Re:Or... by shiflett · · Score: 3, Informative

      Your entire post could not be more untrue.

      HTTP was created long before it was handed off to be maintained by the IETF. It existed prior to the RFC that you claim to have co-written. The only reason that exchange was made is because HTTP is viewed as a piece of the Internet's infrastructure; in fact it is essentially where the Internet and the Web intersect.

      Also, HTTP is very useful as "a guide to how what is out there works." Check out a mailing list for mod_perl, PHP, etc. You will find countless questions being asked that would be answered by a simple understanding of HTTP - how the Web works. This is what real Web developers need; then maybe I can check my bank account balance or sell some stocks without having to interact with a poorly-constructed Web site.

      As the author of the HTTP Developer's Handbook, you might think that I would point out weaknesses in O'Reilly's effort. On the contrary, I think this work is very good, and I would highly recommend it to anyone involved in Web development. I think my book is more suited for the everyday reference that you carry with you that explains things specifically from a Web developer's perspective rather than focusing on clarifying the standards, and I think the two go well together.

      At any rate, I think this is a quality book on a very important topic.

  2. Keep-Alive... by Xerithane · · Score: 5, Informative
    The HTTP 1.1 specification does distinguish between Keep-Alive and Close. By default the connection is persistent (Keep-Alive), but you can still turn it off (Connection: close\n).

    Mozilla Sends:
    GET / HTTP/1.1
    ...
    Keep-Alive: 300
    Connection: keep-alive
    Which isn't necessarily a bad thing, but they have to be backwards compatible in case they hit a poorly implemented HTTP 1.1 server. Gets annoying to code hybrid httpd systems.
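The rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not code from any particular server: the function name is_persistent is hypothetical, the HTTP version is passed as a (major, minor) tuple, and headers are a list of (name, value) pairs. Treating "Connection: Keep-Alive" from an HTTP/1.0 peer as a request for persistence follows common practice rather than a requirement of the specification.

```python
def is_persistent(http_version, headers):
    """Decide whether to keep the connection open after this message.

    Hypothetical helper: `http_version` is a tuple such as (1, 1) and
    `headers` is a list of (name, value) tuples.
    """
    tokens = set()
    for name, value in headers:
        if name.lower() == "connection":
            tokens.update(t.strip().lower() for t in value.split(","))

    # "Connection: close" always wins, whatever the protocol version.
    if "close" in tokens:
        return False
    # HTTP/1.1 and later: persistent unless told otherwise.
    if http_version >= (1, 1):
        return True
    # HTTP/1.0: persistent only when the peer explicitly asks for it.
    return "keep-alive" in tokens
```

With this sketch, the Mozilla request shown above is persistent either way: the HTTP/1.1 default already covers it, and the explicit keep-alive token only matters to a peer that applies HTTP/1.0 rules.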

    HTTP isn't that complicated of a specification though, the RFC is easy enough to understand.
    --
    Dacels Jewelers can't be trusted.
  3. RFCs have all the info you need by Anonymous Coward · · Score: 5, Informative

    Honestly, save yourself ~ $50 for an O'Reilly book and go directly to the source of the information:

    HTTP 1.0
    HTTP 1.1

    It's remarkably easy to read for a technical document.

    1. Re:RFCs have all the info you need by statusbar · · Score: 3, Informative
      This is what matters in the RFC:


      1.2 Requirements

      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [34].


      RFC 2119 says:


      1. MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.

      2. MUST NOT This phrase, or the phrase "SHALL NOT", mean that the definition is an absolute prohibition of the specification.

      3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.


      So in this case should is not synonymous with must.

      --jeff++
      --
      ipv6 is my vpn
  4. zeldman by Meeble · · Score: 5, Informative

    > One (unfortunate?) thing about the Web is that its "architecture" (if you can even call it that) evolved and grew piece by piece. The design goals people had in mind back in 1993, or even in 1999, have been blown away by what has happened on the ground. Inter-company politics have also been a big factor - never helpful for promoting standardization, or sound design. >

    I couldn't agree with this more from a web development standpoint as well. So many designers are still using hack-and-slash methods from the early '90s that it's sad (although not always their fault!). It correlates to the same principles used to build the architecture itself.

    Side note: if you're interested in learning more about forward-compatible web design, you should check out Jeffrey Zeldman's new book 'Designing With Web Standards'. You can find him at www.zeldman.com. I just finished this book and it was well worth the $24.50. All you nested-table designers should pick this one up, as should those looking to bridge the gap away from table-based design. =)

    --
    Fear Breeds Knowledge
    1. Re:zeldman by Anonymous Coward · · Score: 1, Informative

      There's ways to get around tables. They're 3x as long, and not as nice.

      Rubbish. Maybe if you experiment with a page here and there, yes. But when you deal with lots of content, no. You only have to write the styles once for a whole set of pages.

      Example 1: I want to make a form. Forms look nice if the labels are all lined up and all the form elements are lined up. Let's assume I want a page that can have varying size text (which is what people say should be allowed with css).. Ah fuck, can't use divs here. Oh well. I tried. From what I've found, there's no way to link the size of one div to the size of the others in a way that makes logical sense to the document format and flow. I'll welcome any suggestions proving me wrong.

      1. Forms are tabular data. Use a table.
      2. CSS 2 handles this kind of layout with ease - Internet Explorer fucks it up though. Look into display: table

      Example 2: Multi-column layouts are annoying as hell.

      Subjective. I find them much less annoying than when using tables.

      Example 3: [...] Yes, this can't even come close to being done with tables, but it's something that pisses me off anyway.

      'Nuff said.

      Honestly, I don't know why the hell we're taking so long to get AWAY from HTML. HTML as originally designed was a way to structure content logically. Then along came graphical browsers, and it turned in to a presentation language. Then along came the W3C, who try and force it back to a structural language. But that's not what people want! XSL:FO are too verbose, imho, but are much nicer for what I want to do. (Bleh, just cuz I like examples: Give the damned web browser a clue as to what a slashbox is. Don't give it a hierarchy of divs. Don't give it a table. Give it a fucking . Tell it using some other method HOW to render a slashbox.

      Are you completely insane? That's the whole point of separating content (HTML) from presentation (CSS)! <div class="slashbox"> is all the HTML you should need.

      Some might argue that that's not a good idea, but holy shit I don't care! Limiting me to such small pieces of crap as divs and spans to build a decent webpage is retarded.

      You are right, it is retarded. Luckily, you seem to be completely unfamiliar with HTML, XHTML or any of the future plans for this family of markup languages. You aren't limited to divs and spans.

      Imagine slashboxes with curved corners.

      That would be the border-radius CSS 3 property. Educate yourself, really.

      Tables are limited, but damnit so is CSS, just in different ways. Within 5 years, people are going to be wondering what the hell we were thinking when using CSS. Especially 1, but 2 as well: they leave out such obvious things that are craved by people trying to make presentation.

      Well go and suggest them to the W3C then! It's not like they don't operate public mailing lists for exactly that purpose! It's not like you see them going backwards with CSS 1 -> 2 -> 3! You are pissing and moaning because it's not perfect, yet you acknowledge that CSS is getting better and that the only alternatives are not viable!

  5. Re:well by fryguy451 · · Score: 3, Informative

    The first and fully accepted meaning of deprecate is "to express disapproval of." But the word has steadily encroached on the meaning of depreciate. It is now used, almost to the exclusion of depreciate, in the sense "to belittle or mildly disparage."

    http://dictionary.reference.com/search?q=deprecated

  6. Re:Jesus Christ! Get with the program, grandpa! by Gibble · · Score: 2, Informative

    *psst*

    HTML != HTTP

    --
    Gibble: Descriptive of an emotional state in which one's mind is scrabbling for some purchase on reality
  7. Re:I do know this... by Anonymous Coward · · Score: 1, Informative

    Uhm... that has absolutely nothing to do with HTTP, does it?

  8. Re:When is HTTP 2.0 coming out? by leighklotz · · Score: 2, Informative

    An AC Writes:
    > I figure XHTML 2 is going to require a big re-design of everything anyway, ...
    XHTML 2 has been working in many browsers since August, 2002, even though it's still a draft. Part of the point of XHTML 2 is to cleanly re-seat HTML on top of the stack of stuff that browsers are supposed to implement already (CSS, XML, linking, etc.).

  9. deprecated by ap0stle · · Score: 3, Informative
    From w3.org :

    Deprecated

    A deprecated element or attribute is one that has been outdated by newer
    constructs. Deprecated elements are defined in the reference manual in
    appropriate locations, but are clearly marked as deprecated. Deprecated
    elements may become obsolete in future versions of HTML.

    User agents should continue to support deprecated
    elements for reasons of backward compatibility.


    Definitions of elements and attributes clearly indicate which are
    deprecated.


    This specification includes examples that illustrate how to avoid using
    deprecated elements. In most cases these depend on user agent support for style
    sheets. In general, authors should use style sheets to achieve stylistic and
    formatting effects rather than HTML presentational attributes. HTML
    presentational attributes have been deprecated when style sheet alternatives
    exist.


  10. Re:It even answers by _Bunny · · Score: 3, Informative

    Error 300 isn't as unusual as you might think.

    Apache's mod_speling module will correct small typos in URLs that are requested, and if it finds more than one possible match it returns an error 300 with the possible choices.

    For example:

    http://www.madriver.k12.oh.us/network/netware/wefs1

    - Bunny

  11. Re:HTTP is amazingly badly engineered by cdipierr · · Score: 5, Informative

    Um...chunked encoding is not useless.

    If you've got dynamic output, and don't want to buffer the entire content so you can generate a Content-Length header, then chunked encoding is for you. There's no reason for a server to be buffering up a potentially huge reply if the client can accept it piecemeal instead.
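To make the point concrete, here is a minimal sketch of the chunked transfer coding as a Python generator. The function name encode_chunked is hypothetical, and the sketch leaves out chunk extensions and trailers. Each piece of output is framed with its length in hexadecimal, so the sender never needs to know the total size in advance.

```python
def encode_chunked(chunks):
    """Yield the wire bytes of a chunked body, one frame per input chunk.

    Hypothetical helper: `chunks` is any iterable of bytes objects, for
    example the output of a dynamic page as it is generated.
    """
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk marks the end of the body, so skip
            # empty pieces rather than terminate the response early.
            continue
        # Frame: size in hex, CRLF, the data itself, CRLF.
        yield b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    # Last chunk (size 0) followed by the blank line ending the body.
    yield b"0\r\n\r\n"


body = b"".join(encode_chunked([b"Hello, ", b"world"]))
```

A server sending this body would use "Transfer-Encoding: chunked" in place of a Content-Length header.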

  12. Why link to bn.com? by RonBurk · · Score: 1, Informative

    I thought it was common knowledge by now that you always check (at least) buy.com for the cheapest price before pointing people to bn.com or amazon.com.

    bn.com price: $44.95
    buy.com price: $28.31

    I have no affiliation with buy.com, except I've saved a lot of money with them.

  13. The only book you need... by spazoid12 · · Score: 2, Informative

    For the full-featured HTTP server that I designed and implemented at my last job...I found just one book to be all the help a person needs:

    "HTTP Pocket Reference", O'Reilly, maybe 4 bucks at Bookpool.

    75 pages, of which about 65 aren't necessary.

    656 pages on HTTP??? It's not a detailed technical reference on all the topics mentioned in the table of contents (above); it would be tough to fit all that material into the book's 650-plus pages. ... good grief!!

  14. Re:HTTP is amazingly badly engineered by Anonymous Coward · · Score: 1, Informative

    Examples: chunked encoding -- absolutely superfluous! Amazingly useless.

    Nope - it allows for pipelining dynamic content without buffering it all beforehand.

    Did I mention the monstrosity that is content negotiation? It is impossible to write a proxy that can cache content in the face of content negotiation.

    Nope. It's called the Vary header, look it up.

    Luckly, nobody uses it on their servers, because it is a pig to implement and configure on the server.

    Nope, it's just a case of dropping a couple of files into the same directory with similar names, and switching on MultiViews in Apache. On the fly switching between GIF and PNG, for example:

    logo.png 15034 bytes
    logo.gif 18455 bytes

    <img src="logo" alt="[MyCo]" />
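As a rough sketch of how the Vary header lets a proxy cache negotiated content: the cache stores each variant under a key built from the URL plus the request headers that the response's Vary header names. The function name cache_key and the tuple layout are assumptions made for illustration; a real cache would also normalize header values and honor the usual freshness rules.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL and the headers named in Vary.

    Hypothetical helper: `request_headers` is a list of (name, value)
    tuples and `vary` is the Vary header value from the stored response.
    Returns None when the response cannot be matched from the cache.
    """
    if vary.strip() == "*":
        # "Vary: *" means the selection depended on something the cache
        # cannot see, so no later request can be matched to this entry.
        return None
    req = {name.lower(): value.strip() for name, value in request_headers}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # A missing request header is recorded as an empty string.
    return (url,) + tuple((n, req.get(n, "")) for n in names)


png_key = cache_key("/logo", [("Accept", "image/png")], "Accept")
gif_key = cache_key("/logo", [("Accept", "image/gif")], "Accept")
```

The two keys differ, so the PNG and GIF variants of the same URL are cached side by side instead of overwriting each other.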
  15. Re:HTTP is amazingly badly engineered by mmcshane · · Score: 5, Informative

    Troll city. I'll bite.

    Chunked encoding is useful to me every day. I use a protocol one level up from HTTP1.1 (AS2) where messages and their digests are transferred in the same request - in chunks.

    As for supporting ranges, this is why agents are encouraged to delegate difficult MIME handling to helper apps like a Flash plugin. Plenty of servers implement this, it's actually not even that hard. There is a separate issue related to what a range response actually represents (in the theoretical sense), but I won't touch that for now. Read www-tag @W3C for more info.

    Content negotiation works nicely. We serve French pages to agents that prefer French. We also serve unstyled xml to agents which we're sure are not browsers. It's not hard to do, we look at a header and then decide which representation to serve. Caches use the Vary header to choose which responses to serve from cache. It's not rocket science.
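The "look at a header and then decide" step can be sketched as follows. This is an illustration under assumptions, not the poster's code: the function names, the default of "en", and the simplified parsing are all hypothetical. A language range is taken to match an available tag when the two are equal or the range is a prefix of the tag followed by "-".

```python
def parse_accept_language(value):
    """Return (language, q) pairs sorted by descending preference."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0].lower()
        if not lang:
            continue
        q = 1.0  # the quality value defaults to 1 when omitted
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((lang, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(accept_language, available, default="en"):
    """Pick the best available language for an Accept-Language value."""
    for lang, q in parse_accept_language(accept_language):
        if q <= 0:
            continue
        if lang == "*":
            return default
        for candidate in available:
            tag = candidate.lower()
            if tag == lang or tag.startswith(lang + "-"):
                return candidate
    return default


choice = choose_language("fr-CA, fr;q=0.8, en;q=0.5", ["en", "fr"])
```

A response chosen this way would normally carry "Vary: Accept-Language" so that caches know the representation depends on that request header.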

    My favorite part: "HTTP needs to die quickly and be replaced by something sane"

    Yeah, it'll never catch on.

  16. Useful book by Anonymous Coward · · Score: 2, Informative

    I used this book in addition to the RFC when writing my webserver software.

    It's a good addition to the RFCs but not a substitute. The introductory stuff is a bit too basic but the rest of the chapters clarify several things about the RFCs. 2616 can be a bit ambiguous at times.

    All in all, it was worth the money if you are planning to do any serious work with HTTP.

  17. Re:Missing poll option by onomatomania · · Score: 3, Informative

    I don't know what is more pathetic: that you would make requests just to see X-Bender headers, or that I would know where to look in the slashcode CVS to see the list (scroll down to the end of that page.)