Google Publishes Zopfli, an Open-Source Compression Library
alphadogg writes "Google is open-sourcing a new general purpose data compression library called Zopfli that can be used to speed up Web downloads. The Zopfli Compression Algorithm, which got its name from a Swiss bread recipe, is an implementation of the Deflate compression algorithm that creates a smaller output size (PDF) compared to previous techniques, wrote Lode Vandevenne, a software engineer with Google's Compression Team, on the Google Open Source Blog on Thursday. 'The smaller compressed size allows for better space utilization, faster data transmission, and lower Web page load latencies. Furthermore, the smaller compressed size has additional benefits in mobile use, such as lower data transfer fees and reduced battery use,' Vandevenne wrote. The more exhaustive compression techniques achieve higher data density, but also make the compression a lot slower. This does not affect the decompression speed though, Vandevenne wrote."
This team is clearly just trying to make a name for themselves. It improves over gzip by a mere 3% or so, but takes an order of magnitude longer to compress.
Their underlying implementation might be cool research, but its practical merit is virtually nil.
Now, cue the people who are going to do some basic arithmetic to "prove" me wrong, yet who probably don't even bother using gzip content-encoding on their website right now, anyhow.
Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.
No sig. Move along - nothing to see here.
"Zopfli is a compression-only library, meaning that existing software can decompress the data." (source: http://littlegreenfootballs.com/page/294495_Google_Compression_Algorithm_Z). As long as the compression can be done on cached pages, hey, that's another 3-8% more people served with the same amount of bandwidth, without any additional requirements on the client side.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.
Pfft. Another blatant corporate shill for gzip early in the Slashdot comments. You can't trust anybody on the internet these...
Oh, wait, the data actually does say that. Huh. That's... a really weird feeling, having someone on the internet legitimately say something's good and have data to back it up.
For anything that is static enough to be downloaded many times in its lifetime, and not so time-sensitive that it needs to be instantly available when generated, very small gains in compression efficiency are worth paying a very large price in compression time.
If you, for just one of many Google-relevant examples, host a fair number of popular JavaScript libraries (used on both your own sites -- among the most popular in the world -- and vast numbers of third party sites that use your hosted versions) and commit, once you have accepted a particular stable version of a library, to hosting it indefinitely, you've got a bunch of assets that are going to be static for a very long time, and accessed very large numbers of times. The one-time cost to compress is going to be dwarfed by even a minuscule savings in transfer costs for those.
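A back-of-the-envelope sketch of that arithmetic, in Python. The function name and every number below are hypothetical placeholders, not figures from the paper or from Google; plug in your own CPU and bandwidth prices.

```python
import math


def break_even_downloads(extra_cpu_seconds, cpu_cost_per_second,
                         bytes_saved_per_download, cost_per_byte):
    """Downloads needed before the one-time extra compression cost is repaid.

    All four inputs are assumptions supplied by the caller.
    """
    one_time_cost = extra_cpu_seconds * cpu_cost_per_second
    saving_per_download = bytes_saved_per_download * cost_per_byte
    return math.ceil(one_time_cost / saving_per_download)


if __name__ == "__main__":
    # Hypothetical: 10 extra CPU-seconds to compress, 1500 bytes saved per
    # download, and made-up unit prices for CPU time and bandwidth.
    n = break_even_downloads(
        extra_cpu_seconds=10,
        cpu_cost_per_second=0.00001,
        bytes_saved_per_download=1500,
        cost_per_byte=5e-11,
    )
    print("break-even after", n, "downloads")
```

For an asset fetched millions of times, any break-even point in the thousands is noise, which is the whole argument.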
For server-hosted content, compression is obviously done on the server side, so that's not an additional requirement on the client side.
You probably wouldn't use this for time-sensitive, dynamic things like a cache page. You use it for completely static things, like, say, Google's hosted copies of stable versions of jQuery and other popular JavaScript libraries, and you do it once when you start hosting the content, not on the fly.
Why would you recompress static content every time it is accessed? For frequently-accessed, static content (like, for one example, the widely-used JavaScript libraries that Google hosts permanently), you compress it once, and then gain the benefit on every transfer.
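A minimal sketch of "compress once, serve many times", again in Python. It uses the standard library's gzip at its highest level as a stand-in, since Zopfli is not part of the Python standard library; `publish`, `serve`, and the in-memory cache are hypothetical names for illustration only.

```python
import gzip

# Asset name -> gzip-compressed bytes, filled once at publish time.
_cache = {}


def publish(name, raw):
    """Compress once, at the strongest setting available, and keep the result."""
    _cache[name] = gzip.compress(raw, compresslevel=9)


def serve(name, accept_encoding):
    """Return (headers, body) without recompressing anything per request."""
    body = _cache[name]
    if "gzip" in accept_encoding:
        return {"Content-Encoding": "gzip"}, body
    # Client did not advertise gzip support: fall back to the raw bytes.
    return {}, gzip.decompress(body)


if __name__ == "__main__":
    publish("lib.js", b"var x = 1;\n" * 1000)
    headers, body = serve("lib.js", "gzip, deflate")
    print(headers, len(body), "bytes on the wire")
```

The slow part runs exactly once in `publish`; `serve` only does a dictionary lookup, so a slower compressor costs nothing per request.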
For dynamic content, you probably don't want to do this, but if you're Google, you can afford to spend money getting people to research the best tool for very specific jobs.
The PDF linked in TFS is a paper which has detailed results of the testing, including time and size comparison with other DEFLATE implementations, including 7-Zip's.
I wonder how it compares to kzip (http://advsys.net/ken/utils.htm), which is trying to do the same, just better and faster. Also, Google is trying to save 3% on gzipped content, but they don't use optipng/pngout on their images... up to 10% gains... JPEGs? Never heard of jpegtran, Google? It saves 20% on my digicam pictures (leaving EXIF and all metadata intact).
Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.
So the obvious conclusion is that what we need is a gzip -11 option.
Those (except, naturally, gzip) are not compatible with gzip decompressors of the type found in virtually every browser, so they are useless for the main use case here: server-side compression of web content that is completely invisible to web clients compared to gzip. It requires no client changes and has decompression time similar to traditional gzip, while reducing bandwidth (and, assuming the content is precompressed, which given the speed it had better be, storage space for the host). That saves the host money and reduces client-side latency and bandwidth use.
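To illustrate the compatibility point: with Deflate, how hard the encoder works changes only the size of the output, not the format, so one stock decoder handles all of it. The sketch below uses zlib levels 1 and 9 as stand-ins for "fast" and "exhaustive" encoders (an assumption for illustration; Zopfli itself is not in the Python standard library), and `roundtrip_all_levels` is a hypothetical helper name.

```python
import zlib


def roundtrip_all_levels(data, levels=(1, 9)):
    """Compress at each effort level, then decode every result the same way.

    Returns a dict of level -> compressed size in bytes.
    """
    sizes = {}
    for level in levels:
        compressed = zlib.compress(data, level)
        # The decoder is never told which level produced the stream.
        assert zlib.decompress(compressed) == data
        sizes[level] = len(compressed)
    return sizes


if __name__ == "__main__":
    sample = b"function add(a, b) { return a + b; }\n" * 500
    for level, size in roundtrip_all_levels(sample).items():
        print("level", level, "->", size, "bytes")
```

A format that needs a new decoder on the client, however good its ratio, cannot be dropped in this way.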
I presume a different country of origin for the research ...
cf. http://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use
- Michael T. Babcock (Yes, I blog)
I didn't forget. It's just utterly irrelevant to the current discussion.
systemd is Roko's Basilisk.