Google Publishes Zopfli, an Open-Source Compression Library
alphadogg writes "Google is open-sourcing a new general purpose data compression library called Zopfli that can be used to speed up Web downloads. The Zopfli Compression Algorithm, which got its name from a Swiss bread recipe, is an implementation of the Deflate compression algorithm that creates a smaller output size (PDF) compared to previous techniques, wrote Lode Vandevenne, a software engineer with Google's Compression Team, on the Google Open Source Blog on Thursday. 'The smaller compressed size allows for better space utilization, faster data transmission, and lower Web page load latencies. Furthermore, the smaller compressed size has additional benefits in mobile use, such as lower data transfer fees and reduced battery use,' Vandevenne wrote. The more exhaustive compression techniques achieve higher data density, but also make the compression a lot slower. This does not affect the decompression speed though, Vandevenne wrote."
This team is clearly just trying to make a name for themselves. It improves over gzip by a mere 3% or so, but takes an order of magnitude longer to compress.
Their underlying implementation might be cool research. But its practical merit is virtually nil.
Now, cue the people who are going to do some basic arithmetic to "prove" me wrong, yet who probably don't even bother using gzip content-encoding on their website right now, anyhow.
Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.
No sig. Move along - nothing to see here.
I implement server software and a very important factor to me is how fast the library performs. Is this new one faster than zlib?
"Zopfli is a compression-only library, meaning that existing software can decompress the data." (source: http://littlegreenfootballs.com/page/294495_Google_Compression_Algorithm_Z). As long as the compression can be done on cached pages, hey- that's another 3-8% more people served with the same amount of bandwidth, without any additional requirements on the client side.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.
Pfft. Another blatant corporate shill for gzip early in the Slashdot comments. You can't trust anybody on the internet these...
Oh, wait, the data actually does say that. Huh. That's... a really weird feeling, having someone on the internet legitimately say something's good and have data to back it up.
In anything that is static enough that it will be downloaded many times in its lifetime, and not time sensitive enough that it needs to be instantly available when generated, very small gains in compression efficiency are worth paying very large prices in compression.
If you, for just one of many Google-relevant examples, host a fair number of popular JavaScript libraries (used on both your own sites -- among the most popular in the world -- and vast numbers of third party sites that use your hosted versions) and commit, once you have accepted a particular stable version of a library, to hosting it indefinitely, you've got a bunch of assets that are going to be static for a very long time, and accessed very large numbers of times. The one-time cost to compress is going to be dwarfed by even a minuscule saving in transfer costs for those.
Yes, and gzip isn't so slow that it can only be used on static content. Even if you always generate into a cached version, do you really want to spend 81x the CPU time to gain a few percent in compression, and delay the content load on the client each time that happens?
It's far slower, but it could be worth the extra CPU cost for rarely-changing data served to all users, such as big blobs of CSS or JavaScript or public Atom feeds. Compress when it changes, cache the resulting .gz file, serve that.
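A minimal sketch of that "compress when it changes, cache the .gz, serve that" workflow. It uses Python's standard gzip module as a stand-in encoder, since any Deflate encoder (Zopfli included) produces output a normal gzip decoder can read. The function name precompress and the lib.js file are made up for illustration.

```python
import gzip
import os
import shutil
import tempfile

def precompress(path, level=9):
    """Write path + '.gz' if it is missing or older than the source.

    Returns True when the file was (re)compressed, False when the cached
    copy was already up to date.
    """
    gz_path = path + ".gz"
    if os.path.exists(gz_path) and os.path.getmtime(gz_path) >= os.path.getmtime(path):
        return False
    # Write to a temp file first so the server never sees a half-written .gz.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as raw, open(path, "rb") as src:
        # mtime=0 keeps the gzip header identical across runs on the same input.
        with gzip.GzipFile(filename="", mode="wb", compresslevel=level,
                           fileobj=raw, mtime=0) as gz:
            shutil.copyfileobj(src, gz)
    os.replace(tmp, gz_path)
    return True

if __name__ == "__main__":
    # Hypothetical static asset; run this from a deploy script or cron job.
    if os.path.exists("lib.js"):
        print("recompressed" if precompress("lib.js") else "up to date")
```

The slow encoder only ever runs in the deploy step, so its cost is paid once per change rather than once per request.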
Never ran actual tests myself, but I've been told 7-zip's encoder already does DEFLATE better than zlib. I wonder how Zopfli looks compared to it.
For server-hosted content, compression is obviously done on the server side, so that's not an additional requirement on the client side.
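To illustrate why the client side needs nothing new, here is a small sketch using Python's zlib module. The two compression levels stand in for "cheap encoder" and "expensive encoder"; Zopfli itself is not used here, and the sample data is made up.

```python
import zlib

data = b"function hello() { return 'hello world'; }\n" * 200

fast = zlib.compress(data, 1)   # low effort
small = zlib.compress(data, 9)  # high effort

# The same decoder handles both streams. It does not need to know
# how hard the encoder worked to produce them.
assert zlib.decompress(fast) == data
assert zlib.decompress(small) == data
print(len(data), len(fast), len(small))
```

The extra work is all in the encoder's search for a better encoding; the format, and therefore the decoder, stays the same.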
You probably wouldn't use this for time-sensitive, dynamic things like a cache page. You use it for completely static things, like, say, Google's hosted copies of stable versions of jQuery and other popular JavaScript libraries, and you do it once when you start hosting the content, not on the fly.
Why would you recompress static content every time it is accessed? For frequently-accessed, static content (like, for one example, the widely-used JavaScript libraries that Google hosts permanently), you compress it once, and then gain the benefit on every transfer.
For dynamic content, you probably don't want to do this, but if you're Google, you can afford to spend money getting people to research the best tool for very specific jobs.
I'll stick with compacted wrappers for JS functions plus LZMA binaries encoded in base91 and decoded on browser end for anything needing serious compression for the web.
So far I haven't even needed to really use that anyway, base91 and compacted code works fine.
On top of server-browser compression as well.
Don't forget those vector textures. Screw pixels, get with the times.
This looks interesting though. LZMA JS Decoder
There seems to be a bunch of different projects for this.
I wonder how it compares to kzip (http://advsys.net/ken/utils.htm) which is trying to do the same, just better and faster. Also, Google is trying to save 3% on gzipped content, but they don't use optipng/pngout on their images... up to 10% gains... JPEGs, never heard of jpegtran, Google? It saves 20% on my digicam pictures (leaving EXIF and all meta intact).
Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.
So the obvious conclusion is that what we need is a gzip -11 option.
There is zero need for this. There are a number of free compressors available that already cover the spectrum well, for example lzop, gzip, bzip2, xz (in order of better compression and more resource consumption). The stated 3-8% better compression relative to zlib is not enough to make this worth considering. Also, anything new will have bugs and unexpected problems.
This is over-hyped and basically a complete non-event.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Most systems for image optimization, content reordering, inlining, linearization, etc. already work on models where the worst-case performance is "no impact" because the heavy-lifting is done asynchronously.
The first client doesn't get the compressed version -- it just gets whatever the origin server sent. But if caching is triggered by the first request the content queues for further out-of-band processing, and when the compressed version is ready subsequent requests get that version instead. And you can take it even further than that -- you could always compress with deflate -3 when you write to the cache, then when CPU load is low go back and re-compress the most frequently requested documents with this more expensive algorithm.
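A toy sketch of that two-stage idea, assuming an in-memory cache and using zlib levels 3 and 9 as stand-ins for the cheap and the expensive encoder. The class name TieredCache and its methods are hypothetical, and a real proxy would do the first compression out-of-band rather than inline as done here.

```python
import zlib
from collections import Counter

class TieredCache:
    """Cheap compression on cache write, expensive recompression later."""

    def __init__(self):
        self.blobs = {}        # url -> (highest level tried, compressed bytes)
        self.hits = Counter()  # url -> number of requests seen

    def get(self, url, fetch_origin):
        self.hits[url] += 1
        if url in self.blobs:
            return ("compressed", self.blobs[url][1])
        body = fetch_origin(url)
        self.blobs[url] = (3, zlib.compress(body, 3))
        # The first client just gets whatever the origin sent.
        return ("raw", body)

    def recompress_hot(self, top_n=10, level=9):
        """Call when CPU load is low: redo the most requested entries."""
        for url, _count in self.hits.most_common(top_n):
            entry = self.blobs.get(url)
            if entry is None or entry[0] >= level:
                continue
            blob = entry[1]
            better = zlib.compress(zlib.decompress(blob), level)
            # Keep the old bytes if the expensive pass did not actually help.
            self.blobs[url] = (level, better if len(better) < len(blob) else blob)
```

Worst case for any single request is "no impact": a client gets either the origin response or the best version finished so far, and never waits on the expensive pass.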
Look, I'm all for discussion, but it's obvious that you never configured a web server before or that you're pretty horrible at it.
Compression for static content is done -once- per file. That's it. If the file changes, you recompress it, and it's done for the rest of your life.
The client hardly even feels the decompression delay, so that's pretty irrelevant.
What's the name of this illness exactly? It sounds related to schizophrenia but I can't put my finger on it. Anyway, good luck.
re Looking at the data presented in the pdf,... ;>) )???
Benchmark Corpus size gzip-9 7-zip kzip Zopfli
.
One obvious truth that is apparent from looking at the data presented in the pdf is that those in the googleborg don't know how to format data or text in their documents. (they've scrubbed all doc-generation info from the document before pdf'ing it, but considering that the fonts are all Arial family [Arial-BoldMT, Arial-ItalicMT, Arial-MT, fully embedded truetype fonts] it's possible to guess what word processor they used)
:>p
The other thing that is obvious from looking at their data (table at the bottom of page 2) is that the google team does not know how to right-align numerical integer values so as to allow easy visual comparison and what the hell is the deal with using apostrophes as comma separators for integers at thousands and millions ??? I kept trying to parse it for thirty seconds before I figured out how fucked up their data printing is. Use commas, use periods like the europeans, but why the hell use apostrophes which mean "minutes" (which are 1/60th of a degree) or "feet" (which are 1/5280th of a mile, eh,
Alexa-top-10k 693'108'837 128'498'665 125'599'259 125'163'521 123'755'118
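For anyone who wants the percentages rather than the raw byte counts, here is a quick sketch that works them out from the Alexa-top-10k row as quoted above. The helper name saving_vs is made up; the numbers are copied from the row and nothing else is assumed.

```python
# Alexa-top-10k row as quoted above (sizes in bytes).
corpus = 693_108_837
sizes = {
    "gzip-9": 128_498_665,
    "7-zip": 125_599_259,
    "kzip": 125_163_521,
    "Zopfli": 123_755_118,
}

def saving_vs(baseline, size):
    """Fraction by which `size` is smaller than `baseline`."""
    return (baseline - size) / baseline

baseline = sizes["gzip-9"]
for name, size in sizes.items():
    ratio = size / corpus              # compressed size as a share of the corpus
    saved = saving_vs(baseline, size)  # improvement over gzip-9
    # Right-aligned, comma-separated, for easy visual comparison.
    print(f"{name:>7} {size:>13,} {ratio:7.2%} {saved:7.2%}")
```

On this row Zopfli comes out roughly 3.7% smaller than gzip-9, which is at the low end of the 3-8% range quoted elsewhere in the thread.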
Just in case you forgot, there are a lot of patents behind the H.264 standard, and a lot of patent trolls owning those patents
Muchas Gracias, Señor Edward Snowden !
I use advdef (7zip deflate implementation from MAME) to improve compression for both gz encoded static files and PNG images (after pngcrush / optipng). advcomp actually includes a tool called advpng that doesn't handle grayscale PNG images whereas advdef will work fine.
Can this tool do that to an already compressed file?
> This team is clearly just trying to make a name for themselves ...
Shut Up !!!
I presume a different country of origin for the research ...
cf. http://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use
- Michael T. Babcock (Yes, I blog)
Just add another compression level and merge the code.
Everything and everyone reaps the benefits automatically as soon as they update.
Ah, thank you very much for the extra info. My sincere apologies for not knowing about that particular formatting option. I've played with internationalization settings before, but I had never seen that one.
.
You might have to concede, however, that the bizarre use of flush left justification or left-alignment of integer values does not make much sense. Numbers are easier to parse and perceive the relative log-magnitude of when they are presented as decimal aligned for integers or floating point values or as right-aligned columns of text for integer values.
Your corporate shill quip made me remember a passing comment I once overheard that went something like:
I almost cried.
"Save the whales, feed the hungry, free the mallocs" -- author unknown
Even as a native German speaker I misread the name until I saw that it was supposed to be a Swiss word.
"Zopf" means braid and "li" is the diminutive suffix that is used (IMHO way too much) in German speaking parts of Switzerland.
It is pronounced tsopf-lee.
logorrhea
Oh totally, in fact, right-aligned is also incorrect. Using a decimal tab-stop is the correct option.
- Michael T. Babcock (Yes, I blog)