Gzip Encoding of Web Pages?
Both Brendan Quinn and msim were curious about the ability to send gzip-encoded Web pages. Brendan asks: "It's possible to make Apache detect the 'Accept-encoding: gzip' field sent by NS 4.7+, IE 4+ and Lynx, and send a gzip-encoded page, thus saving lots of bandwidth all over the place. So why don't people do it?
Here is a module written by the Mozilla guys a couple of years ago that -almost- does what I want, and I could change it pretty easily... but I thought someone else would have done it by now? eXcite do it, does anyone know of any other large-scale sites that use gzip encoding?"
"If you have LWP installed, you can check with:
GET -p '<my proxy>' -H 'Accept-encoding: gzip' -e http://www.site.com/ | less
Try that with 'www.excite.com' and you'll get binary (gzipped) data. That's what I want to do."
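For anyone without LWP, roughly the same check can be sketched in Python. This is only a sketch under assumptions: `build_request`, `looks_gzipped` and `check` are names made up for illustration, and the URL is the same placeholder used in the command above.

```python
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def build_request(url):
    """Build a GET request that advertises gzip support."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def looks_gzipped(body):
    """True if the raw body starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


def check(url):
    """Fetch url and return (Content-Encoding header, looks_gzipped)."""
    with urllib.request.urlopen(build_request(url)) as resp:
        raw = resp.read()  # urllib does not decompress the body for us
        return resp.headers.get("Content-Encoding"), looks_gzipped(raw)
```

Calling `check("http://www.site.com/")` against a real host should tell you whether that server answered with gzipped data; nothing here goes through a proxy, unlike the `-p` option above.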
- A.P.
--
* CmdrTaco is an idiot.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
This is a bit of a plug, but I found a really big win for the server side (not the client side) when I added this feature to AxKit (link in .sig). I'm behind a 64Kb line, and some of the AxKit pages are pure documentation. This feature reduced the outgoing page size by about 80% for many pages, which seriously helps me deliver more content to my users. And the gzipped content is cached, so it's just as fast as the non-gzipped content when using cacheable pages.
Yes, it's not much help for images, but then you just shouldn't enable this concept for images.
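A quick way to see both effects (big savings on markup, none on already-compressed data) is a sketch like the one below. The sample page is made up, and random bytes are used as a stand-in for image data, which is an assumption rather than a real GIF or JPEG.

```python
import gzip
import os


def saved_fraction(data):
    """Fraction of the original size removed by gzip (can be negative)."""
    return 1 - len(gzip.compress(data)) / len(data)


# Made-up documentation-style page: lots of repeated markup and words.
html = "".join(
    f"<tr><td class='name'>option{i}</td><td class='desc'>Sets option {i}.</td></tr>\n"
    for i in range(500)
).encode()

# Random bytes stand in for already-compressed image data (an assumption).
image_like = os.urandom(len(html))

print(f"html:       {saved_fraction(html):.0%} saved")
print(f"image-like: {saved_fraction(image_like):.0%} saved")
```

The exact percentages depend entirely on the content, so treat the printed numbers as illustrative only.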
Apache::GzipChain can also provide this option for people working with static pages on mod_perl enabled servers, but it has a serious memory leak in it that I found last week (and posted details of to the mod_perl mailing list).
Matt. Want XML + Apache + Stylesheets? Get AxKit.
Whenever I try to open a file that's been gzipped, Netscape (4.75 on Linux) automatically prompts me with a file dialog box. This is even if I'm reading it straight from the file system. Thanks
the good ground has been paved over by suicidal maniacs
You just made it so that pages can't incrementally load any more. The browser would have to wait until the whole .pak was downloaded before it could start laying out the page.
Yes, there are many places along the transmission lines where compression is attempted, but like the standard setting in most disk compression packages it's a little simple and typically does the worst job of compression in the system. Since compression in a modem is handled independently of any CPU, if you can do better somewhere else then it doesn't really matter if the modem's efforts are wasted.
In addition, people have been saying it isn't worth compressing .gif or .jpg files. While that's typically true with .gif files, .jpgs can usually have 10-15% of their bulk squeezed out even with the humble zip program.
I'm a huge fan of compression and I strongly believe that transmission of compressed HTML files will have a major positive impact on the 'Net. Don't just think of the lower serving overhead on the servers, think of all the (caching) proxies and other routers and gateways. HTML files seriously lose 80% of their bulk when compressed.
But we need to go further. We need to start bringing in a new highly compressed image format now so it's in popular use before 2005. There are a couple of nice fractal formats around that result in smaller files than the equivalent zipped .jpg -- we need to get at least one into the standard installation of the next IE or NS.
Actually, you can display the files in the order they're packed, you just can't parallel download so some of the multilink systems might be disadvantaged...
Something like;
- Client: I want http://blah.com/foo.html
- Server: That has files; foo.html, foopic1.gif, foopic2.jpg/foopic2.fractal, fooflash & adiframe10111.html
- C: I have adiframe10111.html and I support .fractal
- S: Here is foo.html.your.pak
Make any sense?
Doesn't Keep-Alive in HTTP/1.1 take care of the problem of sending multiple resources for one page?
Though I definitely agree with you about the whole multiple-version of a single resource thing (foopic2.jpg/foopic2.fractal)
pooptruck
Actually, I built GZIP compression into the core product at the company I'm working for (a web application) about a year ago. All HTML content coming out of our application passes through a layer which examines the browser and compresses the output if the browser supports it. The programmers never need to think about it. All the compression is done in real time though, so there is a minute CPU overhead associated with it. We average about 4% extra CPU time because of GZIP. However, we've been averaging about 75% compression of our HTML. That -triples- the speed of page loads on modems. It's really noticeable when I'm doing work from home. GZIP is a streaming format (it uses deflate, not run-length encoding), so if the page load stalls halfway through, the part that has already arrived still renders perfectly fine.
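The partial-load point can be checked with a small sketch: a gzip stream can be decoded incrementally, so whatever bytes have arrived already yield the start of the page. The page content below is made up for the demonstration.

```python
import gzip
import zlib

page = (
    "<html><body>"
    + "".join(f"<p>Paragraph number {i}</p>" for i in range(2000))
    + "</body></html>"
).encode()

compressed = gzip.compress(page)
first_half = compressed[: len(compressed) // 2]  # pretend the transfer stalled here

# wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
partial = decoder.decompress(first_half)

print(len(partial), "of", len(page), "bytes recovered from half the stream")
```

What a browser does with a truncated page is a separate question; this only shows that the decoder does not need the whole stream before producing output.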
GZIP compression is supported in NS4.5 and higher, IE4.01 and higher, and all versions of Mozilla. We have, in the past year, never had a reported problem with the GZIP compression. There are some known bugs if you try to compress MIME types other than HTML.
On a side note, in probably about a month or so, I will be releasing a Java servlet web application framework as open source. Included, among other goodies, is a layer which can automatically do GZIP encoding if the browser supports it. So anybody writing a web application using this automatically gets the benefits. Eventually coming to http://www.projectapollo.org
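To make the idea of such a layer concrete, here is a minimal sketch in Python (not the Java framework mentioned above, whose code is not shown here). The function names and the dict-based headers are assumptions for illustration; it compresses only `text/html`, in line with the warning about other MIME types.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Very rough check of an Accept-Encoding value (ignores q-values)."""
    tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
    return "gzip" in tokens


def encode_response(request_headers, content_type, body):
    """Return (extra_headers, body), gzipping HTML only when the client asks."""
    is_html = content_type.split(";")[0].strip().lower() == "text/html"
    if is_html and client_accepts_gzip(request_headers.get("Accept-Encoding", "")):
        extra = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return extra, gzip.compress(body)
    return {}, body  # everything else goes out untouched
```

The application code hands over a plain body and never needs to know whether it was compressed on the way out, which is the whole attraction of doing it in one layer.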
>Of course, for high-text, heavy traffic sites (for example, right here on /.), this may make some sense.
Ah, but (like I mentioned in another comment) when you have a page that is say 500k of text (a hundred or so comments), dynamically generated for each hit, the overhead of compression is rather dangerous, and if a server is already somewhat near capacity, it could slow it dramatically... if you can't cache it, and have high traffic, it's a big problem.
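How dangerous that overhead is depends on the machine, so it is worth measuring rather than guessing. A rough sketch, assuming a synthetic 500k block of text (real comment pages will compress differently) and a made-up helper named `time_compress`:

```python
import gzip
import random
import time

random.seed(0)  # fixed seed so the sample text is the same on every run
words = ["compression", "server", "comment", "bandwidth", "page", "the", "a", "of"]
text = " ".join(random.choice(words) for _ in range(90_000)).encode()[:500_000]


def time_compress(data, level):
    """Return (seconds taken, compressed size) for one gzip pass."""
    start = time.perf_counter()
    size = len(gzip.compress(data, compresslevel=level))
    return time.perf_counter() - start, size


for level in (1, 6, 9):
    seconds, size = time_compress(text, level)
    print(f"level {level}: {seconds * 1000:.1f} ms, {size} bytes")
```

Lower compression levels trade some size for less CPU time, which may matter on a server that is already near capacity.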
[Insert your own joke about Jon Katz wasting even more time with compression]
--
"It's tough to be bilingual when you get hit in the head."
Actually, there are the needed provisions to render those. For owners of 95(c), 98, 98SE, Millennium and W2K, the needed .dlls come with the OS. For MacOS, 95(a), and (b), they were supplied when you installed Internet Explorer 4+.
Also, IE4+ does work correctly with gzipped pages.
.sig: Now legally binding!
I know for a fact that Netscape 4.75 can handle gzip-compressed data.
I set up a program to listen on port 80 and told NS to browse to localhost. It sent the "Accept-encoding: gzip". I then telnetted to www.excite.com:80 and sent that data. I got gzipped data in return. I then browsed the site using Netscape, and it loaded properly; therefore, Netscape 4.75 can handle gzipped downloads.
I then tricked IE 5.5 into sending the same HTTP request; I connected to a proxy (127.0.0.1) which would transparently forward to excite.com, filtered out the HTTP request, pasted in Netscape's; it also loaded properly.
So yes, gzip downloads work fine under Windows systems using Netscape 4.75 or IE5.5 (not sure about older versions, though), though IE5.5 sends an odd "Accept-encoding: gzip, deflate" which results in some sites not compressing it at all.
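A site that only compresses when the header is exactly "gzip" would behave the way described. A server that splits the header into its listed codings handles both browsers; the following is a sketch of that idea, with made-up function names, and it deliberately ignores the `*` wildcard.

```python
def parse_accept_encoding(value):
    """Map each listed coding to its q-value (default 1.0)."""
    codings = {}
    for item in value.split(","):
        name, _, params = item.strip().partition(";")
        if not name:
            continue
        q = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # unparseable weight: treat the coding as refused
        codings[name.strip().lower()] = q
    return codings


def accepts_gzip(value):
    """True if gzip is listed with a non-zero weight."""
    return parse_accept_encoding(value).get("gzip", 0.0) > 0
```

With this, `accepts_gzip("gzip")` and `accepts_gzip("gzip, deflate")` are both true, while a header that lists only `deflate` or gives `gzip;q=0` is rejected.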
-- Sig (120 chars) --
Your friendly neighborhood mIRC scripter.
* Q
P.S. If you don't get this note, let me know and I'll write you another.
For conventional web pages, I agree. The slowness of most web sites is either due to graphics, or they are using some slow CGI on the server side. Compression of HTML wouldn't help them much.
There are also cases where the HTML is just plain resource-intensive for the browser to render (lots of nested tables, for example). Adding in the extra step of de-compressing wouldn't help there either.
However, I could see clients (not necessarily browsers) sucking down large chunks of XML in a gzipped form. It could be used for things like sending thousands of raw database records to a client application for further processing and presentation to the end user.
where there's fish, there's cats
http://perl.apache.org/guide/modules.html#Apache_GzipChain_compress_HTM
Here's what IE5.5 gives when I go to http://127.0.0.1/:
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Host: 127.0.0.1
Connection: Keep-Alive
In comparison, Netscape 4.75:
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.75 [en] (Win98; U)
Host: 127.0.0.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
The main points of interest are that IE5.5 speaks HTTP/1.1 while Netscape only requests HTTP/1.0, and that IE5.5 also claims to handle gzip AND deflate encoding, even though they're nearly the same thing (last time I checked, gzip used the deflate algorithm; only the header and checksum wrapped around the compressed data differ).
I also tried sending the IE5.5 HTTP request via telnet to www.excite.com; it returned plain text, whereas Netscape's HTTP request returned gzipped data.
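On the gzip-versus-deflate point, a short sketch with Python's standard library shows how the two relate: the compressed payload uses the same algorithm, but each coding wraps it in a different header. The sample data is made up.

```python
import gzip
import zlib

data = b"<html><body>hello hello hello</body></html>"

gz = gzip.compress(data)  # gzip wrapper (RFC 1952)
zl = zlib.compress(data)  # zlib wrapper (RFC 1950), what HTTP calls "deflate"

print(gz[:2].hex())  # gzip magic number, 1f8b
print(zl[:1].hex())  # first zlib header byte, 78 with the default window size

# Strip both wrappers and a raw deflate stream decodes to the same bytes.
raw = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_stream = raw.compress(data) + raw.flush()
assert zlib.decompress(raw_stream, wbits=-zlib.MAX_WBITS) == data
```

So a client listing both is not really repeating itself: it is saying it can unwrap either container.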
-- Sig (120 chars) --
Your friendly neighborhood mIRC scripter.
* Q
P.S. If you don't get this note, let me know and I'll write you another.
The page quoted in the article shows it's a pretty big win for some "typical use" sites on slower modems.
Incidentally, no extra load would be necessary on the server for static content if it were pre-compressed.
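Pre-compressing can be as simple as writing a `.gz` copy next to each static file ahead of time. A minimal sketch, assuming a hypothetical `precompress` helper and that the web server is separately configured to hand out the `.gz` copy to clients that accept gzip (that configuration is not shown):

```python
import gzip
from pathlib import Path


def precompress(root, suffixes=(".html", ".htm")):
    """Write a .gz sibling for each matching file under root; return the new paths."""
    written = []
    # sorted() takes a snapshot first, so the files we create are not revisited
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            target = path.with_name(path.name + ".gz")
            target.write_bytes(gzip.compress(path.read_bytes()))
            written.append(target)
    return written
```

Run once after each content update, this moves all the compression cost out of the request path.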