Optimizing Page Load Times

HTTP Pipelining by onion2k · 2006-10-29 22:13 · Score: 5, Informative

If the user were to enable pipelining in his browser (such as setting Firefox's network.http.pipelining in about:config), the number of hostnames we use wouldn't matter, and he'd make even more effective use of his available bandwidth. But we can't control that server-side.

For those that don't know what that means: http://www.mozilla.org/projects/netlib/http/pipelining-faq.html

I've had it switched on for ages. I sometimes wonder why it's off by default.

--
http://twitter.com/onion2k

Re:HTTP Pipelining by baadger · 2006-10-29 22:49 · Score: 4, Interesting

This is NOT just Opera fanboyism: Opera *does* do pipelining by default (with a safe fallback).

Opera pipelines by default - and uses heuristics to control the level of pipelining employed depending on the server Opera is connected to
Reference

HTTP/1.1 Design by keithmo · 2006-10-29 22:14 · Score: 5, Insightful

From TFA:

By default, IE allows only two outstanding connections per hostname when talking to HTTP/1.1 servers or eight-ish outstanding connections total. Firefox has similar limits.

And:

If your users regularly load a dozen or more uncached or uncachable objects per page load, consider evenly spreading those objects over four hostnames. Due to browser oddness, this usually means your users can have 4x as many outstanding connections to you.

From RFC 2616, section 8.1.4:

Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.

It's not a browser quirk, it's specified behavior.
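
For anyone who wants to see what the four-hostname suggestion quoted above looks like in practice, here is a minimal sketch in Python. The hostnames and the `asset_url` helper are hypothetical, not from TFA. The one design point that matters is that a given object must always map to the same hostname, otherwise the browser caches it once per hostname.

```python
import zlib

# Hypothetical hostnames; the article only says "four hostnames".
ASSET_HOSTS = [
    "static1.example.com",
    "static2.example.com",
    "static3.example.com",
    "static4.example.com",
]


def asset_url(path: str) -> str:
    """Map an asset path to one of the hostnames, always the same one."""
    # crc32 is stable across processes and restarts, unlike the built-in
    # hash() for strings, so the mapping never changes between page loads.
    index = zlib.crc32(path.encode("utf-8")) % len(ASSET_HOSTS)
    return f"http://{ASSET_HOSTS[index]}/{path.lstrip('/')}"
```

Whether this is polite towards the server is exactly the RFC 2616 question raised above; the sketch only shows the mechanics.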

Re:HTTP/1.1 Design by jakoz · 2006-10-29 23:24 · Score: 2, Informative

Then perhaps they need to invest in some modern systems. The following definitions are interesting:

3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

4. SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

They don't say DO NOT or MUST NOT. Like they say, the behavior can be useful... and they could see this would be the case IN 1997!

It is time we updated things. It's particularly funny that Microsoft found this RFC, of all things, to obey.

Re:HTTP/1.1 Design by x2A · 2006-10-29 23:29 · Score: 4, Interesting

The limit's not to do with your connection speed as such - it's to do with being polite and not putting too much drain on the server you're downloading from.

--
The revolution will not be televised... but it will have a page on Wikipedia

Re:HTTP/1.1 Design by x2A · 2006-10-29 23:48 · Score: 3, Insightful

Depends on server load; how many of the objects are static vs dynamic etc. 5-10 connections for images might be okay, but for dynamic objects it might not be. Perhaps it should be specifiable within the html page?

--
The revolution will not be televised... but it will have a page on Wikipedia

Re:HTTP/1.1 Design by hany · 2006-10-30 00:56 · Score: 2, Insightful

In the end you have just one pipe to push that data through, even if you have, say, 100 connections.

With one pipe of fixed capacity (i.e. bandwidth) but an increasing number of connections, you're wasting bandwidth on maintaining multiple connections.

You're also wasting the server's resources for the same reason.

In the end, you're slowing yourself down.

Yes, there are scenarios where using, for example, 4 connections instead of just 1 yields better download performance, but AFAIK almost all such scenarios are very specific to a given implementation of the webserver, the network, the browser, ...

So to sum up: I think the limit of 1-2 active connections per client mentioned in RFC 2616 was generally valid in 1997, is generally valid now, and will be generally valid in the future.

By contrast, "the hack" of using multiple connections to speed up downloads may sometimes have been valid, and may sometimes be valid now and in the future, but it generally degrades performance.

The pity is that Aaron Hopkins mentions the true solution (HTTP pipelining) only as "(Optional)" and at the very end of the article. But he correctly describes his previous propositions as "tricks". :)

--
hany

Re:Erm.. huh? by rf0 · 2006-10-29 22:15 · Score: 3, Informative

If you are on a fast broadband pipe you are correct, but there are still a lot of other people on small connections with low upload limits (64-256 kbit), and I can see why this could be a bottleneck, as the browser can't get the requests out fast enough. That said, there are things a user can do to help themselves.

Firstly, if the ISP has a proxy server, then using it will reduce the trip time for some stored content, meaning it only has to go over a few hops rather than perhaps all the way across the world. You can also look at something like Onspeed, which is a paid-for product that compresses images (though it makes them look worse) and content. It can give a decent boost on very slow (GPRS/3G) connections and also gets more out of your transfer quota.

--
Cheap UK and US VPS

Simulation software available? by leuk_he · 2006-10-29 22:17 · Score: 3, Informative

"Regularly use your site from a realistic net connection. Convincing the web developers on my project to use a "slow proxy" that simulates bad DSL in New Zealand (768Kbit down, 128Kbit up, 250ms RTT, 1% packet loss) rather than the gig ethernet a few milliseconds from the servers in the U.S. was a huge win. We found and fixed a number of usability and functional problems very quickly."

What (free) simulation is available for this? I only know dummynet which requires a linux server and some advanced routing. But surely there is more. Is there?

Re:Simulation software available? by Jussi K. Kojootti · 2006-10-29 22:34 · Score: 2, Interesting

Try trickle. It won't do fancy stuff like simulating packet loss, but a
trickle -d 100 -u 20 -L 50 firefox
should limit the download and upload rates and add some latency.

Re:Simulation software available? by ggvaidya · 2006-10-29 23:10 · Score: 4, Interesting

You could try using Sloppy. I've only ever heard about it because its programmer has a very nice page on getting a free Thawte FreeMail certificate to work with Java WebStart, so this isn't a recommendation or anything. Looks pretty decent, though.

Re:Erm.. huh? by mabinogi · 2006-10-29 22:27 · Score: 3, Interesting

1.5Mbps ADSL.
5 seconds to refresh the page on Slashdot. That's just getting the page to actually blank and refresh; there's still the time it takes to load all the comments.
Sometimes it's near instant, but most of the time it's around about that.
Most of the time is spent "Waiting for slashdot.org", or "connecting to images.slashdot.org".
It used to be a hell of a lot worse, but I installed Adblock to eliminate all the extra unnecessary connections (Google Analytics and the various ad servers). I didn't care about the ads or the tracking, it just bugged me that those things made my browsing experience slower.
I find it funny that this guy is suggesting spreading across multiple hosts. It's my completely unscientific and entirely anecdotal experience that the more hostnames the browser has to resolve to load the page, the longer it takes before you get to see anything.

I'm in Australia, so there's a minimum 200 ms latency on round trips: five round trips and you've added 1 second to the rendering time. Approaches that add extra DNS lookups really aren't going to help. (Though the DNS lookups themselves aren't necessarily going to take 200 ms. They could be much faster if they're in my ISP's DNS cache, or they could take longer if it has to query for them.)

--
Advanced users are users too!
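
The round-trip arithmetic in the comment above is easy to play with. A small sketch in Python; the function name and the separate DNS term are my own assumptions for illustration, and the only numbers taken from the comment are the 200 ms RTT and the five round trips.

```python
def added_latency_ms(round_trips, rtt_ms, dns_lookups=0, dns_ms=0.0):
    """Delay added by strictly sequential round trips plus DNS lookups.

    Assumes nothing overlaps: every round trip and every lookup waits
    for the previous one, which is the worst case.
    """
    return round_trips * rtt_ms + dns_lookups * dns_ms


# Five sequential round trips at 200 ms each, as in the comment.
one_second = added_latency_ms(5, 200)
```

If the extra lookups can overlap with downloads that are already running, they cost less than this worst case suggests.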

Css and Scripts by Gopal.V · 2006-10-29 22:36 · Score: 5, Informative

I've done some benchmarks and measurements in the past which will never be made public (I work for Yahoo!). And the most important bits in those have been CSS and Scripts. A lot of performance has been squeezed out of the HTTP layers (akamai, Expires headers), but not enough attention has been paid to the render section of the experience. You could possibly reproduce the benchmarks with a php script which does a sleep() for a few seconds to introduce delays at various points and with a weekend to waste.

The page does not start rendering till the last CSS stream is completed, which means if your CSS has @import url() entries, the delay before render increases (until that file is pulled & parsed too). It really pays to have the quickest load for the CSS data over anything else, because without it, all you'll get is a blank page for a while.

Scripts marked defer do not always defer, and so much inline code in <script> tags depends on such scripts that a lot of browsers just pull the scripts as and when they find them. There seem to be just two threads downloading data in parallel (from one hostname), which means a couple of large (but rarely used) scripts in the code will block the rest of the CSS/image fetches. See Flickr's Organizr for an example of that in action.

You should understand that these resources have different priorities in the render land and you should really only venture here after you've optimized the other bits (server and application).

All said and done, good tutorial by Aaron Hopkins - a lot of us have had to rediscover all that (& more) by ourselves.

--
Quidquid latine dictum sit, altum videtur
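
The comment above suggests reproducing the measurements with a PHP script that calls sleep(). The same idea can be sketched in Python with the standard library. The paths, delays and page content below are hypothetical, picked only so that the stylesheet and the script arrive late and you can watch when the browser starts rendering.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical per-path delays in seconds; tune these to move the delay around.
DELAYS = {"/style.css": 3.0, "/script.js": 2.0}

# Hypothetical test page with one stylesheet and one script in the head.
BODIES = {
    "/": (b"<html><head><link rel='stylesheet' href='/style.css'>"
          b"<script src='/script.js'></script></head>"
          b"<body><p>Hello</p></body></html>", "text/html"),
    "/style.css": (b"p { color: green; }", "text/css"),
    "/script.js": (b"/* empty */", "application/javascript"),
}


def delay_for(path):
    """Artificial delay for a path; paths without an entry are not delayed."""
    return DELAYS.get(path, 0.0)


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in BODIES:
            self.send_error(404)
            return
        time.sleep(delay_for(self.path))  # the sleep() trick from the comment
        body, content_type = BODIES[self.path]
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8000):
    """Build the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), SlowHandler)
```

Run it with `make_server().serve_forever()` and load http://127.0.0.1:8000/ in the browser you want to test. A threading server is used so that one delayed response does not hold up the others.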

Re:Css and Scripts by Evets · 2006-10-29 22:56 · Score: 2, Informative

I've found that once a page has layout it will begin rendering and not before. CSS imported in the body does not prevent rendering. CSS imported in the HEAD will. In fact, the CSS and JavaScript in the head section seem to need downloading prior to rendering.

I have also found that cached CSS and Javascript can play with you a little bit. When developing a site you tend to see an expected set of behaviors based on your own experience with a site - but you can find later that having the external files either cached or not cached can have an effect on things. (i.e. a cached javascript file with a load event may be triggered before the DOM is ready if you aren't checking for the readiness of the DOM itself)

ETag headers are very important as well. Running "tail -f access.log" while you browse your own site will show a lot of redundant calls to JavaScript, CSS, and image files that should be cached but aren't. IE has a setting of "Check for new content" or something like that that really fouls up CSS background images without proper expiration headers (lots of flickering).

There is still a significant portion of the web community that utilizes dialup connections. These users are seemingly ignored by many popular sites. I try to get pages to load in under 8 seconds for dialup users, but with any significant JavaScript or CSS it is sometimes a difficult task. It's much easier on consecutive page loads by forcing caching, but that doesn't matter one bit if the user goes elsewhere because the initial page load was too slow.

There are certainly a plethora of optimization techniques not even touched on in this article. I know that Google and Yahoo are very keen on these subjects and it's worth taking a look at the source of some of their pages for ideas. Last I checked, they couldn't care less about validation, though. But with the bandwidth they must utilize, saving a few bytes here and there can mean significant dollar differences at the end of the month, and what truly matters is whether or not the browser renders the page correctly.

Caching of dynamic content by baadger · 2006-10-29 22:43 · Score: 4, Insightful

This is a good place to start testing the 'cacheability' of your dynamic web pages. Quite frankly, it's appalling that even the big, common web apps in use today, like most forum or blog scripts, don't generate sensible Last-Modified, Vary, Expires and Cache-Control headers. With most of the metadata you need to generate this stuff already stored in the existing database schema, there's really no excuse for it.

Replacing nasty long query strings with nicer, more memorable URIs is also something we should be seeing more of in "Web 2.0." Use mod_rewrite, you'll feel better for it.
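
To make the point about headers concrete, here is a rough sketch in Python of how a forum or blog script might build them from a modification time it already has in its database. The function names, the 300-second max-age and the choice of Vary value are assumptions for illustration, and the If-None-Match check is simplified to an exact match against a single tag.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def cache_headers(body, last_modified, max_age=300):
    """Build validators and freshness headers for a dynamic page.

    last_modified must be a timezone-aware datetime in UTC.
    """
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "ETag": '"' + hashlib.sha1(body).hexdigest() + '"',
        "Cache-Control": f"max-age={max_age}",
        "Vary": "Accept-Encoding",
    }


def is_not_modified(request_headers, response_headers):
    """True if the client's cached copy is still good (answer 304)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Simplification: real clients may send a list of tags or "*".
        return if_none_match == response_headers["ETag"]
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            modified = parsedate_to_datetime(response_headers["Last-Modified"])
            return modified <= since
        except (TypeError, ValueError):
            return False  # unparseable date: just send the full page
    return False
```

When `is_not_modified` returns True, the script can send a 304 with no body and skip generating the page entirely, which is where most of the saving comes from.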

Those tenths of seconds add up by giafly · 2006-10-29 22:53 · Score: 4, Informative

If a big part of your job involves using a Web-based application, reducing page-load times really helps. My real job is writing one of these applications and getting the caching right is much more important than sexier topics like AJAX. There's some good advice in TFA.

--
Reduce, reuse, cycle

Connection Limits by RAMMS+EIN · 2006-10-29 23:04 · Score: 2, Interesting

``By default, IE allows only two outstanding connections per hostname when talking to HTTP/1.1 servers or eight-ish outstanding connections total. Firefox has similar limits.''

Anybody know why? This seems pretty dumb to me. Request a page with several linked objects (images, stylesheets, scripts, ...) in it (i.e., most web pages), and lots of these objects are going to be requested sequentially, costing you lots of round trip times.

--
Please correct me if I got my facts wrong.

Re:Connection Limits by MathFox · 2006-10-29 23:32 · Score: 2, Informative

The "max two connections per webserver" limit is there to keep resource usage on the webserver down; a single Apache thread can use 16 or 32 MB of RAM for dynamically generated webpages. If you get 5 page requests a second and it takes (on average) 10 seconds to handle a request and send back the results, you have about 50 requests in flight and need roughly 1 GB of RAM in the webserver, if you can ignore Slashdot (2-4 GB to handle peaks).
If you have a second webserver for all static data, that can be a simpler HTTP daemon needing 1 MB per connection or less. You can handle more parallel connections (and Akamai the setup if needed!).
Yes, it's best to avoid inline images, Google text ad objects, etc. But by allowing parallel loading of the objects (and that's the trick with using several separate hosts for images) you can have 8 or 16 round trips in flight at the same time; that is your perceived speedup.

--
extern warranty;
main()
{
(void)warranty;
}
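
The RAM estimate in the comment above is requests in flight times memory per worker. A quick sketch in Python, using only the numbers from the comment; the function name is made up.

```python
def webserver_ram_mb(requests_per_second, seconds_per_request, mb_per_worker):
    """Rough RAM needed for the workers that are busy at any moment."""
    # Requests in flight = arrival rate x time each one stays in the system.
    in_flight = requests_per_second * seconds_per_request
    return in_flight * mb_per_worker


low = webserver_ram_mb(5, 10, 16)   # 16 MB per Apache thread
high = webserver_ram_mb(5, 10, 32)  # 32 MB per Apache thread
```

That gives 800 to 1600 MB, which is where the "1 GB" figure comes from. With a static-only daemon at 1 MB per connection the same load needs about 50 MB.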

Requests Too Large by RAMMS+EIN · 2006-10-29 23:08 · Score: 2, Interesting

FTFA:

``Most DSL or cable Internet connections have asymmetric bandwidth, at rates like 1.5Mbit down/128Kbit up, 6Mbit down/512Kbit up, etc. Ratios of download to upload bandwidth are commonly in the 5:1 to 20:1 range. This means that for your users, a request takes the same amount of time to send as it takes to receive an object of 5 to 20 times the request size. Requests are commonly around 500 bytes, so this should significantly impact objects that are smaller than maybe 2.5k to 10k. This means that serving small objects might mean the page load is bottlenecked on the users' upload bandwidth, as strange as that may sound.''

I've said for years that HTTP requests are larger than they should be. It's good to hear it confirmed by someone who's taken seriously. This is even more of an issue when doing things like AJAX, where you send HTTP requests and receive HTTP responses + XML verbosity for what should be small and quick user interface actions.

--
Please correct me if I got my facts wrong.
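
The numbers in the quote from TFA can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up, Kbit is taken as 1000 bits, and latency and protocol overhead are ignored.

```python
def transfer_ms(size_bytes, kbit_per_s):
    """Time to push size_bytes through a link of the given speed."""
    return size_bytes * 8 / (kbit_per_s * 1000) * 1000


def break_even_object_bytes(request_bytes, down_kbit, up_kbit):
    """Object size whose download takes as long as the request's upload."""
    return request_bytes * down_kbit / up_kbit


request_upload_ms = transfer_ms(500, 128)         # 500-byte request, 128 Kbit up
small = break_even_object_bytes(500, 5, 1)        # 5:1 ratio
large = break_even_object_bytes(500, 20, 1)       # 20:1 ratio
```

A 500-byte request takes about 31 ms to send at 128 Kbit, and the break-even sizes come out at 2500 and 10000 bytes, matching the "2.5k to 10k" range in the article. Objects smaller than that spend longer being requested than being received.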

Re:Pipelining by smurfsurf · 2006-10-29 23:21 · Score: 2, Informative

Pipelining is not the same as keep-alive, although pipelining needs a keep-alive connection.
Pipelining means "multiple requests can be sent before any responses are received."

Re:Erm.. huh? by x2A · 2006-10-29 23:23 · Score: 3, Informative

There are other factors.

1 - with keepalive/pipelined connections only 1 DNS lookup is performed, and it is often cached on your local machine, so this delay is minimal.

2 - the DNS lookup can be happening for the second host while connections to the first host are still downloading, rather than stopping everything while the second host is looked up. This hides the latency of the second lookup.

3 - most browsers limit the number of connections to each server to 2. If you're loading loads of images, this means you can only be loading two at once (or one while the rest of the page is still downloading). If you put images on a different host, you can get extra connections to it. Also, cookies will usually stop an object from taking advantage of proxies/caches. Putting images on a different host is an easy way to make sure they're not cookied.

--
The revolution will not be televised... but it will have a page on Wikipedia

Re:Pipelining by TheThiefMaster · 2006-10-29 23:26 · Score: 4, Informative

Pipelining is not keep-alive. Keep alive means sending multiple requests down one connection, waiting for the response to the request before sending the next. Pipelining sends all the requests at once without waiting.

Keep-alive no:
Open connection
-Request
-Response
Close Connection
Open connection
-Request
-Response
Close Connection
-Repeat-

Keep-alive yes:
Open connection
-Request
-Response
-Request
-Response
-Repeat-
Close Connection

Pipe-lining yes:
Open connection
-Request
-Request
-Repeat-
-Response
-Response
-Repeat-
Close Connection
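
The three diagrams above can be turned into a rough round-trip count. This is a sketch under simplifying assumptions that are not in the comment: opening a connection costs one round trip, each request/response exchange costs one more, all pipelined requests fit in a single round trip, and transfer time, server time and connection teardown are ignored.

```python
def round_trips(n_requests, mode):
    """Round trips needed to fetch n_requests objects over one connection slot."""
    if n_requests == 0:
        return 0
    if mode == "no-keep-alive":
        # Open + request/response, repeated for every object.
        return 2 * n_requests
    if mode == "keep-alive":
        # Open once, then one request/response exchange per object.
        return 1 + n_requests
    if mode == "pipelining":
        # Open once, send everything, read everything back.
        return 1 + 1
    raise ValueError(f"unknown mode: {mode}")


# Ten objects over a 200 ms link, in milliseconds of pure latency.
latency_ms = {mode: round_trips(10, mode) * 200
              for mode in ("no-keep-alive", "keep-alive", "pipelining")}
```

Under these assumptions ten objects cost 4000 ms, 2200 ms and 400 ms of latency respectively. Real pipelining does worse than this ideal because responses must come back in order, so one slow response holds up the rest.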

Gmail by protomala · 2006-10-29 23:39 · Score: 2, Insightful

I hope they apply this study to Gmail. Using it on a non-broadband connection (plain 56k modem) is a pain unless you use the pure HTML view, which is crap compared to other HTML webmails.
The funny thing is that newer AJAX products from Google (like goffice) don't suffer from this behavior; they have much cleaner code (just pick "view source" in your favorite browser and see). Gmail's HTML/JavaScript is probably already showing its age, and paying the price for being the first of Google's AJAX apps.

Re:Pipelining by x2A · 2006-10-29 23:45 · Score: 2, Informative

Keep-alive sends the next request after the first has completed, but on the same connection. This requires the server to send a Content-Length: header (or, in HTTP/1.1, to use chunked transfer encoding), so the browser knows after how many bytes the page has finished loading. Without this, the server must close the connection so the browser knows it's done.

Pipelining sends requests out without having to wait for the previous one to complete. This also requires a Content-Length: header, which is fine for static files such as images, but many scripts that send output straight to the browser as it's being generated will break this, as the server won't know the content length until generation has completed.

--
The revolution will not be televised... but it will have a page on Wikipedia
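
To show why the client needs to know where each body ends, here is a minimal sketch in Python that splits several responses read back to back from one connection. It is not a real HTTP parser: the function name is made up, it only understands Content-Length, and chunked transfer encoding is deliberately left out.

```python
def split_responses(buffer):
    """Split back-to-back HTTP responses into (header block, body) pairs."""
    responses = []
    while buffer:
        head, sep, rest = buffer.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete header block")
        length = None
        # Skip the status line, then look for Content-Length.
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            # Without a length the only end marker is the connection
            # closing, so nothing further can share this connection.
            raise ValueError("no Content-Length: cannot tell where the body ends")
        if len(rest) < length:
            raise ValueError("incomplete body")
        responses.append((head, rest[:length]))
        buffer = rest[length:]
    return responses
```

Feeding it two responses glued together returns two bodies; dropping the Content-Length header from the first one makes it give up, which is the situation the comment describes for scripts that stream their output.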

Some reasons by harmonica · 2006-10-29 23:56 · Score: 2, Informative

I've had it switched on for ages. I sometimes wonder why it's off by default.

Some reasons against pipelining.

Re:Erm.. huh? by orasio · 2006-10-30 00:32 · Score: 3, Interesting

User perception of responsiveness on interfaces has a lower bound of 200 ms. Sometimes even lower.

Just because 1 second seems fast, it doesn't mean that it's fast enough to stop improving.
Once you reach that 200 ms barrier, the interface has perfect responsiveness; any longer interval can still be improved.

Re:4 hostnames and security by mrsbrisby · 2006-10-30 02:34 · Score: 2, Interesting

Nice trick with 4 hostnames, but this means 4 security contexts for your content, which may make a lot of development hard (especially client based with JavaScript).

Why? Doesn't your javascript explicitly state document.domain to the common root?

Not to mention the management issues of having to link to content on 4 different domains in an efficient enough manner.

You mean creating four hostnames for the same address? Or do you mean changing a few src="" attributes?

This leaves us with pipelining on the client, which could result in much worse load peaks on the servers though.

Wrong. It leaves us with nothing. Didn't you read the article? HTTP pipelining isn't enabled in the big two web browsers, so as far as "reality" is concerned it doesn't exist. It's like IPv6: who cares how much "better" it is if no one is using it?

Static content in multipart packages? by AkimAmaklav · 2006-10-30 03:24 · Score: 2, Interesting

Has anyone played around with multipart/mixed or such replies? These could reduce the number of requests but is there any support for them in browsers?

All the offsite stuff is ads anyway. Block them. by Animats · 2006-10-30 05:39 · Score: 2, Insightful

This is an excellent argument for ad blocking. The article never mentions the basic truth - almost all offsite content on web pages is ads. (Of course, this is someone from Google talking, and Google, after all, is an ad-delivery service which runs a search engine to boost their hits.) Web page load is choking on ads. I noted previously that some sites load ads from as many as six different sources. This saturates the number of connections the browser supports. Page load then bottlenecks on the slowest ad server.

So install AdBlock and FlashBlock in Firefox, and watch your browsing speed up.

Web-based advertising looks like a saturated market. Watch for some big bankruptcies among advertising-supported services.
