Slashdot Mirror


Speed Up Sites with htaccess Caching

produke writes "Increase your page load times and save bandwidth with easy and really effective methods using Apache .htaccess directives. The first method uses mod_headers to set Expires and Cache-Control (max-age) headers on certain filetypes. The second method employs mod_expires to do the same thing; together with FileETag, it makes for some very fast page loads!"

29 comments

  1. Increase page load times? by daranz · · Score: 5, Funny

    Why would I want to do that?

    --
    This is a sig. It is appended to the end of comments I post.
    1. Re:Increase page load times? by Anonymous Coward · · Score: 0

      Maybe if you're a slow reader?

    2. Re:Increase page load times? by Anonymous Coward · · Score: 0

      In life, you need to remember to stop and smell the roses. In our hectic, fast-paced web browsing lives, this could be just the respite we all need.

    3. Re:Increase page load times? by grommit · · Score: 2, Funny

      Maybe you'd like to increase page load times to make your site behave like a Web 2.0 (3.0?) site so you can get some investor to give you truckloads of money?

    4. Re:Increase page load times? by skiingyac · · Score: 1

      Not every Mr. Slowsky lives far from the DSL central office.

  2. I use it all the time, but be aware.. by slashkitty · · Score: 5, Interesting
    It works great for images. I remember when I first started using it: it cut the number of HTTP requests to the server in half and substantially reduced bandwidth usage.

    However, if you change images around, like swapping in a holiday logo, you have to change the image file name to force browsers to reload it.

    I'm sorta surprised that slashdot doesn't use this on their images:

    wget -S --spider http://images.slashdot.org/logo.png
    --08:31:01-- http://images.slashdot.org/logo.png
    => `logo.png'
    Resolving images.slashdot.org... 66.35.250.55
    Connecting to images.slashdot.org|66.35.250.55|:80... connected.
    HTTP request sent, awaiting response...
    HTTP/1.0 200 OK
    Date: Mon, 04 Dec 2006 14:30:12 GMT
    Server: Boa/0.94.14rc17
    Accept-Ranges: bytes
    Cache-Control: max-age=43200
    Connection: Keep-Alive
    Keep-Alive: timeout=10, max=1000
    Content-Length: 7256
    Last-Modified: Fri, 01 Dec 2006 03:02:14 GMT
    Content-Type: image/png
    Length: 7,256 (7.1K) [image/png]
    200 OK
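The headers above do include Cache-Control: max-age=43200, which is 12 hours. As a rough illustration, here is a minimal Python sketch (the helper names are hypothetical) that pulls max-age out of those headers and adds it to the Date header to estimate when a cache would have to revalidate. It is deliberately simplified: it ignores the Age header, clock skew, and other directives such as no-cache.

```python
import email.utils
from datetime import timedelta

# A subset of the response headers from the wget -S output above.
RAW_HEADERS = """\
Date: Mon, 04 Dec 2006 14:30:12 GMT
Cache-Control: max-age=43200
Last-Modified: Fri, 01 Dec 2006 03:02:14 GMT
Content-Type: image/png"""

def parse_headers(raw):
    """Split 'Name: value' lines into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

def max_age(cache_control):
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def fresh_until(headers):
    """Date + max-age: roughly when a cache must stop reusing its copy unchecked."""
    seconds = max_age(headers.get("cache-control", ""))
    if seconds is None:
        return None
    served = email.utils.parsedate_to_datetime(headers["date"])
    return email.utils.format_datetime(served + timedelta(seconds=seconds), usegmt=True)

headers = parse_headers(RAW_HEADERS)
print(max_age(headers["cache-control"]) / 3600)  # 12.0 (hours)
print(fresh_until(headers))                      # Tue, 05 Dec 2006 02:30:12 GMT
```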

    --
    -- these are only opinions and they might not be mine.
    1. Re:I use it all the time, but be aware.. by Fastolfe · · Score: 1

      You should need to change filenames. You just need to come up with a good age/expiration scheme for whatever content you want to see cached.

      If you're making regular changes to a particular piece of content, your max-age and/or expiration date needs to be set up to facilitate that. If you change something every day at 6am, set your expiration date for 6am. If you could change it at any time, and you want to see changes picked up within an hour, set a max-age=3600. Let your caching policies work with your content management policies, and you won't have to do invasive things like this.
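The "expire at 6am" policy can be computed per response. A minimal Python sketch, assuming the server works in UTC and publishes once a day at 06:00 (the function names are illustrative, not from any library):

```python
import email.utils
from datetime import datetime, timedelta, timezone

def next_publish(now, hour=6):
    """Next hour:00 strictly after `now`, in now's timezone (assumed UTC here)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

def daily_headers(now, hour=6):
    """Cache headers that make every cached copy expire at the next daily update."""
    expires = next_publish(now, hour)
    seconds_left = int((expires - now).total_seconds())
    return {
        "Cache-Control": f"max-age={seconds_left}",
        "Expires": email.utils.format_datetime(expires, usegmt=True),
    }

print(daily_headers(datetime(2006, 12, 4, 14, 30, 12, tzinfo=timezone.utc)))
# {'Cache-Control': 'max-age=55788', 'Expires': 'Tue, 05 Dec 2006 06:00:00 GMT'}
```

Because max-age is computed at request time, a request just before 6am gets a short lifetime and one just after gets nearly a full day.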

      If your standard caching policies are such that you normally are OK with things being cached days or even weeks at a time, and you anticipate making a change in a week, set the expiration date of the resource to be the date you anticipate making the change, and set the max-age of the cached resource low enough that anyone requesting the resource now knows to check back when the change occurs. Maybe add a must-revalidate directive to ensure everyone has the freshest copy. Then, make your change, and bump up the age/expiration date again.
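A sketch of the planned-change approach, assuming a hypothetical everyday policy of one week and a known change time: max-age is capped at the time remaining until the change, and must-revalidate is added so caches have to check back once that copy goes stale.

```python
import email.utils
from datetime import datetime, timedelta, timezone

NORMAL_MAX_AGE = 7 * 24 * 3600  # hypothetical everyday policy: one week

def headers_before_change(now, change_at, normal=NORMAL_MAX_AGE):
    """Cap freshness so no cache keeps the current copy past the planned change."""
    seconds_left = max(0, int((change_at - now).total_seconds()))
    max_age = min(normal, seconds_left)
    return {
        # must-revalidate: once stale, caches must revalidate before reusing it
        "Cache-Control": f"max-age={max_age}, must-revalidate",
        "Expires": email.utils.format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }

now = datetime(2006, 12, 4, 14, 30, 12, tzinfo=timezone.utc)
print(headers_before_change(now, datetime(2006, 12, 6, tzinfo=timezone.utc)))
# {'Cache-Control': 'max-age=120588, must-revalidate', 'Expires': 'Wed, 06 Dec 2006 00:00:00 GMT'}
```

After the change is published, calling it with a change time far in the future falls back to the normal one-week lifetime.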

      It really pays to read up about how HTTP caching works. Too often I see kludges where developers just want to work around caching when it doesn't work like they want it to, when they could save themselves a headache by just configuring it properly in the first place.

    2. Re:I use it all the time, but be aware.. by Fastolfe · · Score: 1

      Sorry, that should read: You shouldn't need to change filenames.

  3. bad title by Anonymous Coward · · Score: 0

    article is not about caching the htaccess file
    it's about caching via the htaccess file

    "Speed Up Sites with Caching via htaccess"

  4. caching htaccess? by oneiros27 · · Score: 4, Informative

    Here I was, thinking that someone had a solution for the slowdown caused by using htaccess files in the first place.

    They don't.

    If you're going to set caching in your server to decrease load time, make sure to set it in the main configuration files, and disable .htaccess, which can potentially increase the time of every page load. (The decreased hits and bandwidth may be an advantage to you; you'll have to benchmark to see whether this solution helps or hurts you for your given platform and usage patterns.)

    --
    Build it, and they will come^Hplain.
    1. Re:caching htaccess? by mwvdlee · · Score: 2, Interesting

      On shared hosting, which most (smaller) sites use, you typically don't have access to the server configuration files. I have a site on shared hosting, and I'm definitely going to implement this for images and other static files.

      What is the performance loss in htaccess files anyway? For instance, would it be faster to have htaccess redirect moved pages or would it be faster to have a server-side script (i.e. php, python, etc.) do redirecting?

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    2. Re:caching htaccess? by Mad Merlin · · Score: 1
      What is the performance loss in htaccess files anyway?

      The performance loss comes from Apache having to check the current directory and every directory above it up to the webroot for .htaccess files. This means that if you store your images in /foo/bar/etc/images/ and you have 50 images per page, Apache needs to check for 5*50 = 250 .htaccess files just to serve the images. A stat isn't that expensive, but they add up.
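The counting above can be checked with a small Python sketch that lists the candidate .htaccess paths for one request (the function name is hypothetical). It treats the URL path as relative to the webroot and assumes overrides are enabled from the webroot down; it also reproduces the /this/is/some/deep/file example given elsewhere in this thread.

```python
from pathlib import PurePosixPath

def htaccess_candidates(url_path):
    """List the .htaccess files checked for one request, webroot first.

    Assumes overrides are enabled from the webroot down and nothing above
    the webroot is checked."""
    directory = PurePosixPath(url_path).parent
    chain = list(directory.parents)[::-1] + [directory]  # "/" down to the file's dir
    return [str(d / ".htaccess") for d in chain]

per_image = htaccess_candidates("/foo/bar/etc/images/logo.png")
print(per_image)
print(len(per_image) * 50)  # 50 images per page -> 250 lookups
```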

      For instance, would it be faster to have htaccess redirect moved pages or would it be faster to have a server-side script (i.e. php, python, etc.) do redirecting?

      Maybe. It depends on the use case. Try it yourself.
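For the "try it yourself" route, one side of the comparison could be a script that does the redirecting. Here is a minimal Python WSGI sketch (the moved paths are made up for illustration) that could be benchmarked against the same redirects done with mod_alias in .htaccess. Keep in mind that if .htaccess support is on, Apache still does its per-directory lookups before the script ever runs.

```python
# Hypothetical map of moved pages; these paths are illustrative only.
MOVED = {
    "/old/page.html": "/new/page.html",
    "/about.htm": "/about/",
}

def redirect_app(environ, start_response):
    """Minimal WSGI app: 301 for known moved paths, 404 for everything else."""
    target = MOVED.get(environ.get("PATH_INFO", ""))
    if target is not None:
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    body = b"Not Found"
    start_response("404 Not Found",
                   [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))])
    return [body]
```

To serve it locally for a quick benchmark, the standard library's wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever() is enough.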

  5. httpd.conf by Nos. · · Score: 5, Informative

    It's in the comments on that site, but remember: you're always better off putting this kind of stuff in your httpd.conf rather than in .htaccess files. .htaccess files reduce performance on your webserver.

    1. Re:httpd.conf by bcat24 · · Score: 3, Informative

      Unless you're on a shared hosting plan. Then you kind of need to go with .htaccess.

  6. So using cache control headers is "news", huh? by xxxJonBoyxxx · · Score: 3, Informative

    So using cache control headers is "news", huh?

    Also, from the comment on this "innovative" article:

    1. DrBacchus said:
    Yes, these techniques *can* result in performance improvements, but should be put in your main server configuration file, rather than in .htaccess files. .htaccess files, by their very nature, cause performance degradation on your website, and so should be avoided whenever possible.

    1. Re:So using cache control headers is "news", huh? by FooAtWFU · · Score: 1
      This is news to a lot of people, especially amateur web site owners.

      Anyway, next thing to do is teach equivalent techniques to PHP programmers. You, too, can learn the wonders of the HTTP specification!

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    2. Re:So using cache control headers is "news", huh? by mrsbrisby · · Score: 3, Insightful
      .htaccess files, by their very nature, cause performance degradation on your website, and so should be avoided whenever possible.
      NO! .htaccess files cause performance degradation because of their implementation. The Apache group could have made that degradation zero, but thought that a Novell NetWare port was more important.

      Seriously, Linux's F_NOTIFY has been around since 2.4, and other operating systems have similar file-change notification mechanisms.
    3. Re:So using cache control headers is "news", huh? by Trogre · · Score: 2, Informative

      I'm interested in this. Can you please elaborate and point me to the relevant discussion or, even better, a patch?

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    4. Re:So using cache control headers is "news", huh? by Anonymous Coward · · Score: 0

      If you find his ideas fascinating, perhaps you should subscribe to his newsletter.

  7. Increase page load times? by tmh - The Mad Hacker · · Score: 1

    " Increase your page load times and save bandwidth"

    Probably not exactly what most people want to do, but yeah, if you can throttle your server so that page load time approaches infinity, bandwidth consumption will approach zero -- especially once people stop trying to use your site...

  8. htaccess performance loss by oneiros27 · · Score: 3, Informative

    If you're on a shared hosting site, and htaccess is already turned on, you're already affected.

    Basically, if someone were to request a file from your site: /this/is/some/deep/file

    Then apache has to look for, and if there, parse, each of the following files: /.htaccess /this/.htaccess /this/is/.htaccess /this/is/some/.htaccess /this/is/some/deep/.htaccess

    And then, should the rules allow the file to be served, it'll be sent to the requestor.

    So the problem isn't the .htaccess file itself (unless you have a whole bunch of unnecessary rules, increasing the size of the file), but just turning on support for .htaccess files. I think the parsing of the .htaccess files is cached, but the system still has to check for the files each time, and see if they've changed.

    As for the question about redirects: you have to tell the system how to process the 404s. I've seen lots of implementations, including one that set a template system to resolve all 404s and then used the requested path to drive the templates, which of course meant that _every_ page on the site was served as a 404. (I was given the task of figuring out what the person had done, as they had tried upgrading the site and wanted to archive the old one, and it took me much longer than expected to work out the horrible hack they had used. And of course, because it always served 404s, no service had cached the site, which would have let me see what it used to look like.)

    Unless you have some way of specifying a handler for 404 errors without .htaccess (which you don't, as you've mentioned it's shared hosting), the question about .htaccess makes no sense: it's still getting called, and you're still taking the performance hit, no matter what you pass off to.

    --
    Build it, and they will come^Hplain.
    1. Re:htaccess performance loss by abradsn · · Score: 1

      I think this is correct behaviour, but there is nothing stopping anyone from changing the code on their own servers. If this mattered to me on my server, I would just modify the 3 lines of code that do this, and only have it load on startup of the server.

      Or, it could be an added option to the already huge config file, during the next release. Maybe someone wants to add the feature?

      If code is not your expertise, then you can probably pay someone $100 to do it for you.

    2. Re:htaccess performance loss by NerveGas · · Score: 1

      The nice part, however, is that web-serving is so easily and cheaply scalable that it's almost pathetic. If your alternative is to buy an extra few megabits (at guaranteed bandwidth rates, not shared-connection rates), then for what you'd pay for bandwidth in a single month, you can throw in another Apache machine to help carry the load.

      I'm pretty excited for "hardware season" this year (making purchases to accommodate growth in our peak season). These are sexy, cheap, and compact. At 14" deep, I can double them up, mounting them in both the front and back of the rack.

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    3. Re:htaccess performance loss by Fastolfe · · Score: 1

      The contents of the .htaccess file are only parsed when the .htaccess file is changed. Each request will still cause a stat to occur, but chances are, for frequently-requested files, this will be handled out of memory without requiring a hit to the disk. There is a small performance hit for the stat invocations, but it's pretty small, especially compared with the IO that occurs to actually fetch and deliver the resource that was requested. (And, as you already note, a stat seeking a file that doesn't exist is probably just as expensive as one seeking to see when a .htaccess file was last updated, so you're already incurring the performance hit.)

      Unfortunately all of this is speculation; I'm not aware of any concrete numbers that show how performance is impacted.
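The "stat on every request, parse only when the file changes" idea described above can be sketched in Python with a hypothetical MtimeCache class. Treat it as an illustration of that idea only: the Apache httpd documentation itself says .htaccess files are loaded every time a document is requested, so this is not a description of httpd's internals.

```python
import os
import tempfile

class MtimeCache:
    """Re-parse a config file only when its modification time changes.

    Every lookup still costs one os.stat(), even when the file is missing."""

    def __init__(self, parse):
        self.parse = parse      # callable: file text -> parsed rules
        self.entries = {}       # path -> (mtime_ns, parsed rules)
        self.parse_count = 0    # how many times we actually parsed

    def get(self, path):
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            self.entries.pop(path, None)
            return None
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        with open(path, encoding="utf-8") as f:
            parsed = self.parse(f.read())
        self.parse_count += 1
        self.entries[path] = (mtime, parsed)
        return parsed

def demo():
    """Return (parse count after two reads, parse count after a change)."""
    cache = MtimeCache(lambda text: text.splitlines())
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, ".htaccess")
        with open(path, "w", encoding="utf-8") as f:
            f.write('Header set Cache-Control "max-age=600"\n')
        cache.get(path)
        cache.get(path)
        before = cache.parse_count
        # Bump the mtime explicitly so the change is visible even on coarse clocks.
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        cache.get(path)
        return before, cache.parse_count

print(demo())  # (1, 2)
```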

  9. _decrease_ page load times by wilton · · Score: 0, Redundant

    shouldn't that be _decrease_ page load times?

    --
    per mere, per terras
  10. Go to Apache 2.2 by tcopeland · · Score: 1

    I recently upgraded RubyForge to Apache 2.2 and it's been such an improvement. mod_cache is great, the worker MPM is solid, and now I can run ViewVC under mod_python. And there's mod_proxy and mod_proxy_balancer for making Rails apps work nicely with Mongrel. If you're still back on 1.3, I highly recommend 2.2.

  11. I meant Decrease page load times by produke · · Score: 1

    Ooops.. I meant

    Decrease page load times

    As an example of how I implement this caching scheme..

    # Comments must be on their own line; Apache has no trailing comments.
    # 2592000 seconds = 30 days
    <FilesMatch "\.(flv|gif|jpg|jpeg|png|ico)$">
    Header set Cache-Control "max-age=2592000"
    </FilesMatch>
    # 604800 seconds = 1 week
    <FilesMatch "\.(js|css|pdf|swf)$">
    Header set Cache-Control "max-age=604800"
    </FilesMatch>
    # 600 seconds = 10 minutes
    <FilesMatch "\.(html|htm|txt)$">
    Header set Cache-Control "max-age=600"
    </FilesMatch>
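To sanity-check which lifetime a given file gets, here is a small Python sketch (hypothetical helper, not part of any Apache tooling) that applies the same three patterns. It also makes the arithmetic explicit: 2592000 seconds is 30 days (a year would be 31536000), 604800 is one week, and 600 is ten minutes. Apache applies every FilesMatch section that matches, not just the first, but these three patterns don't overlap, so checking them in order gives the same answer.

```python
import re

# Same patterns and lifetimes as the FilesMatch blocks, checked in order.
RULES = [
    (re.compile(r"\.(flv|gif|jpg|jpeg|png|ico)$"), 2592000),  # 30 days
    (re.compile(r"\.(js|css|pdf|swf)$"), 604800),             # 1 week
    (re.compile(r"\.(html|htm|txt)$"), 600),                  # 10 minutes
]

def cache_control_for(filename):
    """Return the Cache-Control value these rules would set, or None."""
    for pattern, seconds in RULES:
        if pattern.search(filename):
            return f"max-age={seconds}"
    return None  # no rule matched: no Cache-Control header is added

for name in ("logo.png", "askapache.css", "index.html", "archive.zip"):
    print(name, cache_control_for(name))
```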

    So the js and css get cached for a week, but if I make a change to one of them my site visitors won't get the updated content!

    But there is a fix for that that I use on all my sites now.
    Because the .html file is the file that specifies the URI for the js, css, and thereby every file on the site, I can just make a change to the html file and all site visitors update their cache!

    <link rel="stylesheet" href="/z/c/askapache.css?v1008" type="text/css" />
    <script src="/z/c/askapache.js?v1008" type="text/javascript"></script>

    So when I make a change, I bump the version in the css and js URLs to ?v1009.
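A common variation on the manual ?v1008 to ?v1009 bump, swapped in here rather than taken from the post, is to derive the version from the file's contents so the URL changes automatically whenever the file does. A minimal Python sketch (function names are hypothetical; MD5 serves only as a cheap fingerprint, not for security):

```python
import hashlib

def version_token(content: bytes, length: int = 8) -> str:
    """Short fingerprint of the file contents."""
    return hashlib.md5(content).hexdigest()[:length]

def versioned_url(url_path: str, content: bytes) -> str:
    """Append the fingerprint as a query string, in the same ?v... style."""
    return f"{url_path}?v{version_token(content)}"

old = versioned_url("/z/c/askapache.css", b"body { color: black }")
new = versioned_url("/z/c/askapache.css", b"body { color: navy }")
print(old != new)  # True: editing the stylesheet changes its URL
```

In practice the content would be read from the file on disk when the HTML is generated, so nobody has to remember to bump the number by hand.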

    1. Re:I meant Decrease page load times by Fastolfe · · Score: 1

      Some suggestions:

      • You probably want to set an Expires header here too for HTTP 1.0 user agents.
      • Other users may want to consider using "Header add" rather than "Header set" so as not to overwrite other Cache-Control headers set elsewhere in the handling of the response.
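For the first suggestion, an Expires value equivalent to a given max-age is simply the response time plus that many seconds, formatted as an HTTP date. A minimal Python sketch of what an application or build script might emit (in Apache itself, mod_expires generates both headers; the function name here is hypothetical):

```python
import email.utils
from datetime import datetime, timedelta, timezone

def freshness_headers(max_age, now=None):
    """Cache-Control for HTTP/1.1 caches plus a matching Expires for HTTP/1.0 ones."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age)
    return [
        ("Cache-Control", f"max-age={max_age}"),
        ("Expires", email.utils.format_datetime(expires, usegmt=True)),
    ]

week = freshness_headers(604800, now=datetime(2006, 12, 4, 14, 30, 12, tzinfo=timezone.utc))
print(week)
# [('Cache-Control', 'max-age=604800'), ('Expires', 'Mon, 11 Dec 2006 14:30:12 GMT')]
```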

      It's important to be aware that the max-age cache-control directive is only one part of a site's caching strategy. The presence of Last-Modified and/or ETag headers allows for Conditional GETs, which are also great at improving a site's performance. Yes, browsers still have to make a request against the site, but it takes far less time to handle a 304 response than it does to retrieve and handle the resource's content under a 200 response. You should balance your freshness requirements against your need to have resources served out of local or shared caches. Your query string approach is fairly hackish and only works around undesirable caching policies rather than making the caching policies work for your content requirements.
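The conditional GET described above can be sketched as a minimal Python function (hypothetical name, not any framework's API) that chooses between a 304 and a full 200 from the request's If-None-Match and If-Modified-Since headers. It follows the HTTP/1.1 rule that If-None-Match takes precedence, and it leaves out byte ranges and other preconditions.

```python
import email.utils
from datetime import datetime, timezone

def _opaque(tag):
    """Drop a weak-validator prefix; If-None-Match uses weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    request_headers: dict with lowercase names. etag: quoted entity tag such
    as '"abc123"'. last_modified: timezone-aware datetime of the last change."""
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        tags = {_opaque(t) for t in if_none_match.split(",")}
        return 304 if "*" in tags or _opaque(etag) in tags else 200
    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = email.utils.parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: just send the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so compare whole seconds.
        return 304 if last_modified.replace(microsecond=0) <= since else 200
    return 200

modified = datetime(2006, 12, 1, 3, 2, 14, tzinfo=timezone.utc)
print(conditional_status({"if-modified-since": "Fri, 01 Dec 2006 03:02:14 GMT"}, '"abc123"', modified))  # 304
print(conditional_status({"if-none-match": '"old999"'}, '"abc123"', modified))  # 200
```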

      Keep in mind that it's entirely likely that browsers aren't even going to respect some of these Cache-Control headers over the lengths of time you're specifying here. You'll probably get more predictable results with age requirements on the order of hours or a day, at most. (Browsers will still make conditional GETs and your server can still respond with 304s even if a cached resource is no longer fresh.)

      Your workflow for planned content changes could involve changes to a page's Expires or Cache-Control headers well in advance of the change to ensure old content will be expired when the new content needs to be published.

  12. the benefit is a well-thought-out caching scheme by produke · · Score: 1

    To everyone making the point about the performance hit of running these kinds of directives from .htaccess rather than httpd.conf: yes, I don't think anyone would argue with that. But this is for those billions of people on some type of shared hosting environment. Besides, you can use the AllowOverride directive in httpd.conf to allow .htaccess in the /z/image/ folder but not in the /z/df/even/cgi-b/live/ folder. Just turn it off there; problem solved.

    Remember, the article is called "Speed Up Sites with htaccess Caching" for a specific reason: this is about htaccess. Power to the people! :)

    It's just like when a lot of major servers with major processing power and bandwidth all of a sudden need to diagnose why their services aren't performing as expected. In this example, the reason was that 99% of their customers were connecting to their bada$$ server every 3-7 minutes to check their email, and only 1% of their customers were using IMAP.

    So likewise, cutting a small site's bandwidth IN HALF by implementing a well-thought-out caching scheme may not seem like much of a performance gain, but think what it would be like if the giant web hosting providers implemented a watered-down version of this in httpd.conf. Man, that would be huge and would help everyone. The bandwidth savings are dramatic.