Slashdot Mirror


Google Publishes Zopfli, an Open-Source Compression Library

alphadogg writes "Google is open-sourcing a new general purpose data compression library called Zopfli that can be used to speed up Web downloads. The Zopfli Compression Algorithm, which got its name from a Swiss bread recipe, is an implementation of the Deflate compression algorithm that creates a smaller output size (PDF) compared to previous techniques, wrote Lode Vandevenne, a software engineer with Google's Compression Team, on the Google Open Source Blog on Thursday. 'The smaller compressed size allows for better space utilization, faster data transmission, and lower Web page load latencies. Furthermore, the smaller compressed size has additional benefits in mobile use, such as lower data transfer fees and reduced battery use,' Vandevenne wrote. The more exhaustive compression techniques achieve higher data density, but also make the compression a lot slower. This does not affect the decompression speed though, Vandevenne wrote."

34 of 124 comments

  1. Overhyped by Anonymous Coward · · Score: 4, Informative

    This team is clearly just trying to make a name for themselves. It improves over gzip by a mere 3% or so, but takes an order of magnitude longer to compress.

    Their underlying implementation might be cool research. But its practical merit is virtually nil.

    Now, cue the people who are going to do some basic arithmetic to "prove" me wrong, yet who probably don't even bother using gzip content-encoding on their website right now, anyhow.

    1. Re:Overhyped by TeknoHog · · Score: 5, Interesting

      If I understand this correctly, the point is to be compatible with zlib decompression. Obviously, you can get much better compression with xz/lzma, for example, but that would be out of range for most browsers.
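
      A minimal sketch of that compatibility point, using Python's standard zlib module as a stand-in (Zopfli itself is not called here, and the sample payload is made up): however much effort the compressor spends, the same unmodified decompressor recovers identical bytes.

```python
import zlib

# Made-up sample payload standing in for a static web asset.
payload = b"<li class=\"item\">Zopfli is a Deflate-compatible compressor.</li>\n" * 200

# Same Deflate format, different amounts of compressor effort.
fast = zlib.compress(payload, 1)
thorough = zlib.compress(payload, 9)

# One decompressor handles both streams and returns the same bytes.
restored_fast = zlib.decompress(fast)
restored_thorough = zlib.decompress(thorough)

print(len(payload), len(fast), len(thorough))
```

      A Zopfli-produced stream would presumably slot in where `thorough` is, since the format on the wire is unchanged; only the time spent producing it differs.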

      --
      Escher was the first MC and Giger invented the HR department.
    2. Re:Overhyped by Baloroth · · Score: 5, Insightful

      Actually, they state that the 3-8% better maximum compression than zlib is 2-3 orders of magnitude longer to compress.

      I can't imagine what kind of content you're hosting that'd justify 3 orders of magnitude compression time to gain 3% compression.

      Static content that only has to be compressed once, yet is downloaded hundreds of thousands or millions of times. 3-8% is a pretty significant savings in that case.

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    3. Re:Overhyped by sideslash · · Score: 5, Informative

      It improves over gzip by a mere 3% or so, but takes an order of magnitude longer to compress [...] its practical merit is virtually nil.

      Maybe it's useless to you as a developer(?), and to most people. However, you benefit from this kind of technology all the time. Compare this to video encoding, where powerful machines spend a heck of a lot of time and CPU power to gain extra 3%'s of compression to save bandwidth and give you a smooth viewing experience.

      This tool could have many useful applications for any kind of static content that is frequently served, including web services, as well as embedded content in mobile games and other apps. Every little bit of space savings helps (as long as it isn't proportionally slower to expand; the article says decompression speed stays comparable).

    4. Re:Overhyped by Trepidity · · Score: 4, Informative

      One example that comes to mind: Android APKs use the zip format.

    5. Re:Overhyped by K.+S.+Kyosuke · · Score: 4, Interesting

      But its practical merit is virtually nil.

      ...unless you're a large web-based company serving terabytes of identical textual files to end users using deflated HTTP streams.

      --
      Ezekiel 23:20
    6. Re:Overhyped by Goaway · · Score: 2

      In addition to all the other explanations of how you missed the point, Deflate is also used in PNG. This will allow you to make smaller PNG files, too, which can be quite a significant part of your bandwidth.
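
      To illustrate the PNG point, here is a rough sketch that writes a tiny grayscale PNG by hand with Python's struct and zlib modules. The helper names and the gradient test image are made up for illustration, and zlib levels 1 and 9 stand in for "ordinary" versus "more thorough" Deflate; Zopfli is not used.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def make_png(width, height, rows, level):
    """Encode 8-bit grayscale rows as a PNG, deflating at the given zlib level."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + row for row in rows)  # filter type 0 on every scanline
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw, level)) + chunk(b"IEND", b""))

def idat_payload(png):
    """Return the concatenated IDAT data, which is a zlib stream."""
    pos, out = len(PNG_SIGNATURE), b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        if kind == b"IDAT":
            out += png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

# Hypothetical test image: a 64x64 horizontal gradient.
rows = [bytes(range(64)) for _ in range(64)]
quick = make_png(64, 64, rows, 1)
small = make_png(64, 64, rows, 9)

print(len(quick), len(small))
```

      Both files inflate to the same scanlines, so any PNG decoder sees the same image; only the size of the IDAT chunk depends on how hard the compressor worked.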

    7. Re:Overhyped by K.+S.+Kyosuke · · Score: 4, Informative

      In addition to all the other explanations of how you missed the point, Deflate is also used in PNG. This will allow you to make smaller PNG files, too, which can be quite a significant part of your bandwidth.

      Well, if you're Google and you detect Chrome on the client side, it might be even better for you to serve a WebP version instead. Out of a random sample of 1,000 PNG files, a lossless WebP version was at least 20% smaller in more than 50% of the cases (link).

      --
      Ezekiel 23:20
    8. Re:Overhyped by AliasMarlowe · · Score: 4, Funny

      Android APK

      The hosts troll is a robot? Somehow, I'm not surprised.

      --
      Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    9. Re:Overhyped by nabsltd · · Score: 4, Insightful

      For example, assuming browsers incorporate the capability to decompress it, lowering the bandwidth of Youtube by ~3% is an achievement.

      I don't know why people keep mentioning Youtube, since all videos are already compressed in such a way that pretty much no external compression is going to gain anything.

      Although when compressing a video Zopfli might result in a smaller file compared to gzip, that doesn't mean either will be smaller than the original. All H.264 files should be using CABAC after the motion, macroblock, psychovisual, DCT, etc. stages, and that pretty much means that the resulting files have as much entropy per bit as possible. At that point, nothing can compress them further.
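
      The same effect is easy to see without any video at all. A small sketch, assuming Python's zlib as the compressor and a made-up pseudo-random text as input: once data has been through an entropy coder, a second pass finds almost nothing left to squeeze.

```python
import random
import zlib

# Hypothetical input: pseudo-random text built from a small vocabulary.
rng = random.Random(0)
vocabulary = ["deflate", "huffman", "zopfli", "gzip", "stream",
              "block", "window", "match", "literal", "symbol"]
text = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass removes the redundancy
twice = zlib.compress(once, 9)  # second pass has almost nothing left to remove

print(len(text), len(once), len(twice))
```

      The first pass shrinks the text considerably; the second pass should come out at roughly the same size as its input. H.264 output behaves like `once` here, which is why wrapping it in gzip or Zopfli gains essentially nothing.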

    10. Re:Overhyped by n7ytd · · Score: 3, Interesting

      If I understand this correctly, the point is to be compatible with zlib decompression. Obviously, you can get much better compression with xz/lzma, for example, but that would be out of range for most browsers.

      Odd that Google doesn't just push to extend the supported compression formats to include more of these more modern compression libraries if this is a serious concern for them. This sounds like two guys using their 20% time to figure out a way to optimize the deflate algorithm. Kudos to them, but this is not comparable to releasing a royalty-free video codec or other large Googly-type project.

      According to the article, "Zopfli is 81 times slower than the fastest measured algorithm gzip -9." That is almost two orders of magnitude more time, in return for a compression gain of 3%-8%. It would have been informative to know how much working memory was used vs. what gzip requires. This is a small gain of network bandwidth; trivial, even. But, if you're Google and already have millions of CPUs and petabytes of RAM running at less than 100% capacity, this is the type of small gain you might implement.

    11. Re:Overhyped by Anonymous Coward · · Score: 3, Informative

      But the decompressors for those algorithms are not available in most web browsers, making them totally unusable for the stated use case.

      But hey, why read the article when you can whine about it blindly on /.?

    12. Re:Overhyped by Pausanias · · Score: 3, Interesting

      The numbers cited are for gzip. The improvement over 7-zip is much less than 3%; it's more like 1%, at the cost of a factor of four slowdown with respect to 7-zip. Note that this is for 7-zip when restricted to deflate-compatible formats only.

      Here's the paper:
      https://code.google.com/p/zopfli/downloads/list

    13. Re:Overhyped by SuricouRaven · · Score: 3, Interesting

      Wrong field. For general-purpose compression formats, rar is already far more capable than this, and 7z is better still. But neither of these is suitable for web browsers to decompress transparently; there, gzip and DEFLATE still reign supreme. Zopfli is backwards-compatible: browsers that support gzip/DEFLATE will work with it, no updates required.

      Personally I think Google should have worked on increasing the number of decompressors browsers support - bzip would be nice, at least. The Accept-Encoding negotiation is already there, very easy to extend. But this will have to do.
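
      A rough sketch of what that negotiation looks like on the server side. The function names are made up, the parsing is simplified (only the `q` parameter is handled), and `bzip2` is used purely as a hypothetical content-coding token that a browser would have to advertise first.

```python
def parse_accept_encoding(header):
    """Map each coding named in an Accept-Encoding header to its q-value."""
    prefs = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        prefs[name.strip().lower()] = quality
    return prefs

def choose_encoding(header, available=("bzip2", "gzip")):
    """Pick the first server-preferred coding the client accepts, else identity."""
    prefs = parse_accept_encoding(header)
    wildcard = prefs.get("*", 0.0)
    for coding in available:  # server preference order
        if prefs.get(coding, wildcard) > 0:
            return coding
    return "identity"

print(choose_encoding("gzip, deflate"))       # a browser without the new coding
print(choose_encoding("bzip2;q=0.8, gzip"))   # a browser that advertises it
```

      Clients that never list the new token simply keep getting gzip, which is why extending the list is backwards-compatible in principle; the hard part is shipping the decompressor in browsers.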

    14. Re:Overhyped by SuricouRaven · · Score: 2

      It's a matter of where. The extra resources are required on the server - even if the content is dynamic, it's quite possible that power and processor time will be cheap there. The corresponding savings are achieved on the clients, which includes smartphones - where connection quality ranges from 'none' to 'crap,' and the user will begrudge every last joule you need to display the page. It's worth throwing a lot of resources away on the server if it can save even a much smaller amount on the more-constrained client.

    15. Re:Overhyped by SuricouRaven · · Score: 3, Informative

      There are tricks to that h264 encoding to squeeze a bit more. You can improve the motion estimation by just throwing power at it, though the gains are asymptotic. Or increase the frame reference limit - that does great things on animation, if you don't mind losing profile compliance. Things like that. Changing the source is also often of great benefit - if it's a noisy image, a bit of noise-removal filtering before compression can not just improve subjective quality but also allow for much more efficient compression. Interlaced footage can be converted to progressive, bad frame rate conversions undone - progressive video just compresses better. It's something of a hobby of mine.

      I wrote a guide on the subject: http://birds-are-nice.me/publications/Optimising%20x264%20encodes.htm

      You're right about Zopfli though. Regarding h264, it changes nothing.

    16. Re:Overhyped by citizenr · · Score: 4, Insightful

      Word, when I'm downloading the latest pirated release of a 1080p movie

      "word", and intend to download zipped h.264 files leads me to believe you are retarded.

      --
      Who logs in to gdm? Not I, said the duck.
    17. Re:Overhyped by SuricouRaven · · Score: 2

      Interlacing is good if you need to use analog electronics. But that 'annoying' goes beyond just annoying: It over-complicates everything. The compression benefits are more than offset by the reduced efficiency of the more modern encoding, plus almost every stage in the process - every filter, as well as the encoder and decoder - needs to be interlacing-aware. It's an awkward, obsolete technology and I eagerly await the day it is no longer to be found outside of historical video.

      The link looks very interesting indeed. I've done a few restorations before, but you can't see any of them other than http://birds-are-nice.me/video/restorations.shtml - all the rest are of various copyrighted videos. I did one of Steamboat Willie to test some filters that was the most popular version on youtube for a time, until Disney DMCAed it.

    18. Re:Overhyped by bzipitidoo · · Score: 2

      Yes, I also wonder what they did. They don't say in their article, and I didn't want to spend time just now wading through the source code to find out for sure. But I suspect it's just throwing more CPU cycles at the compression problem so it can look further ahead.

      In this pointer compression, greedy often isn't best. Here's an example text to illustrate: "resident prevent president". The greedy approach is to always make a maximum length match. It would compress the example text as follows: "resident p(re)v(ent pre)(sident)", where the bracketed text is represented with pointers plus lengths, that is, the "(re)" would actually be encoded as a pointer to the start of the text with an instruction that 2 letters should be copied from that location. But often, not trying to match to the maximum possible length leads to greater compression. A compressor that looks at more options might come up with this: "resident p(re)v(ent p)(resident)". This is, I believe, what 7-zip, kzip, and all those other decompressor compatible improvements do. If you're willing to have the computer take more time, it can exhaustively try more possibilities to see which way produces the best compression. The decompressor will not know the difference, all it does is follow instructions about how many letters to copy from various locations in text it decompressed earlier. Zopfli is simply pushing this even further than 7-zip, so of course its compression will be a little better. Hardly worth dressing the technique up with a new name, as if it really was a new idea.
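
      The example can be checked with a toy model. This is only a sketch: tokens are either a literal character or a (distance, length) copy, the minimum match length is 2 as in the example (real Deflate uses 3), and no Huffman stage is modelled, so it says nothing about which parse ends up smaller in bits. It shows that greedy matching yields the first parse and that the decoder reproduces the same text from either one.

```python
def decode(tokens):
    """Rebuild text from literal characters and (distance, length) copies."""
    out = []
    for token in tokens:
        if isinstance(token, str):
            out.append(token)
        else:
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # copy one character from earlier output
    return "".join(out)

def greedy_parse(text, min_len=2):
    """Always take the longest earlier match at the current position."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for j in range(i):  # try every earlier start position
            k = 0
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens

TEXT = "resident prevent president"

# "resident p(re)v(ent pre)(sident)"
greedy = greedy_parse(TEXT)
# "resident p(re)v(ent p)(resident)", written out by hand
alternative = list("resident p") + [(10, 2), "v", (8, 5), (18, 8)]

print(greedy)
print(decode(greedy) == decode(alternative) == TEXT)
```

      In this toy model both parses use the same number of tokens, so any size difference would come from how the lengths and distances are entropy-coded afterwards. A compressor that searches more parses needs a cost model for that stage, which the sketch leaves out.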

      --
      Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    19. Re:Overhyped by pjt33 · · Score: 2

      That would be the optimal parse approach. Of course, the well-known problem with optimal parsing is that sometimes a sub-optimal parse turns out to be better once you take into account the Huffman step. It could be that they're focussing on the feedback between those two steps.

  2. Wow, gzip -9 is very competitive for most usages by Antony+T+Curtis · · Score: 4, Insightful

    Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.

    --
    No sig. Move along - nothing to see here.
  3. The interesting bit is this: by mrjb · · Score: 4, Insightful

    "Zopfli is a compression-only library, meaning that existing software can decompress the data." (source: http://littlegreenfootballs.com/page/294495_Google_Compression_Algorithm_Z). As long as the compression can be done on cached pages, hey- that's another 3-8% more people served with the same amount of bandwidth, without any additional requirements on the client side.

    --
    Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
    1. Re:The interesting bit is this: by Trepidity · · Score: 2

      Considering how slow it is (~100x slower than zlib), I doubt anyone will be using it for on-the-fly compression of web content. It'd only really make sense for one-time compression, e.g. Google might use this to slim Android APKs down a little bit.

  4. Re:Wow, gzip -9 is very competitive for most usage by Anonymous Coward · · Score: 5, Funny

    Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.

    Pfft. Another blatant corporate shill for gzip early in the Slashdot comments. You can't trust anybody on the internet these...

    Oh, wait, the data actually does say that. Huh. That's... a really weird feeling, having someone on the internet legitimately say something's good and have data to back it up.

  5. JavaScript libraries, for one thing by DragonWriter · · Score: 4, Insightful

    I can't imagine what kind of content you're hosting that'd justify 3 orders of magnitude compression time to gain 3% compression.

    In anything that is static enough that it will be downloaded many times in its lifetime, and not time sensitive enough that it needs to be instantly available when generated, very small gains in compression efficiency are worth paying very large prices in compression.

    If you, for just one of many Google-relevant examples, host a fair number of popular JavaScript libraries (used on both your own sites -- among the most popular in the world -- and vast numbers of third party sites that use your hosted versions) and commit, once you have accepted a particular stable version of a library, to hosting it indefinitely, you've got a bunch of assets that are going to be static for a very long time, and accessed very large numbers of times. The one-time cost to compress is going to be dwarfed by even a minuscule savings in transfer costs for those.
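
    A back-of-the-envelope sketch of that trade-off. Every number below is a made-up placeholder (CPU price, transfer price, file size and the 5% saving are assumptions, not figures from the article); the point is only the shape of the calculation.

```python
import math

def break_even_downloads(extra_cpu_seconds, cpu_cost_per_second,
                         bytes_saved_per_download, transfer_cost_per_byte):
    """Downloads needed before transfer savings repay the one-time extra compression cost."""
    one_time_cost = extra_cpu_seconds * cpu_cost_per_second
    saving_per_download = bytes_saved_per_download * transfer_cost_per_byte
    return math.ceil(one_time_cost / saving_per_download)

# Hypothetical figures: 60 extra CPU-seconds, a 100 kB gzip'd library shrunk by 5%.
downloads = break_even_downloads(
    extra_cpu_seconds=60,
    cpu_cost_per_second=0.00002,
    bytes_saved_per_download=5_000,
    transfer_cost_per_byte=0.05 / 1e9,
)
print(downloads)
```

    With placeholder prices like these the break-even point lands in the thousands of downloads, which a widely used hosted library would pass almost immediately. Content that is regenerated on every request never reaches it.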

  6. Understanding the use case for Zopfli by DragonWriter · · Score: 2

    "... without any additional requirements on the client side."

    Except for the 2-3 orders of magnitude longer to compress.

    For server-hosted content, compression is obviously done on the server side, so that's not an additional requirement on the client side.

    If it takes you 5 seconds to compress that cache page, with zopfli it could take you up to 8 minutes to compress.

    You probably wouldn't use this for time-sensitive, dynamic things like a cache page. You use it for completely static things, like, say, Google's hosted copies of stable versions of jQuery and other popular JavaScript libraries, and you do it once when you start hosting the content, not on the fly.

  7. Re:Wow, gzip -9 is very competitive for most usage by DragonWriter · · Score: 4, Insightful

    Yes, and gzip isn't so slow that it can only be used on static content. Even if you always generate into a cached version, do you really want to spend 81x the CPU time to gain a few percent in compression, and delay the content load on the client each time that happens?

    Why would you recompress static content every time it is accessed? For frequently-accessed, static content (like, for one example, the widely-used JavaScript libraries that Google hosts permanently), you compress it once, and then gain the benefit on every transfer.

    For dynamic content, you probably don't want to do this, but if you're Google, you can afford to spend money getting people to research the best tool for very specific jobs.

  8. Re:how about 7-zip? by DragonWriter · · Score: 2

    Never ran actual tests myself, but I've been told 7-zip's encoder already does DEFLATE better than zlib. I wonder how Zopfli looks compared to it.

    The PDF linked in TFS is a paper which has detailed results of the testing, including time and size comparison with other DEFLATE implementations, including 7-zip's.

  9. how does it compare to kzip? by xrmb · · Score: 2

    I wonder how it compares to kzip (http://advsys.net/ken/utils.htm) which is trying to do the same just better and faster. Also Google is trying to save 3% on gzipped content, but they don't use optipng/pngout on their images... up to 10% gains... jpegs, never heard of jpegtran, Google? It saves 20% on my digicam pictures (leaving exif and all meta intact).

    1. Re:how does it compare to kzip? by serviscope_minor · · Score: 2

      I wonder how it compares to kzip

      RTFA. No seriously. RTFA. It's in there.

      --
      SJW n. One who posts facts.
  10. Re:Wow, gzip -9 is very competitive for most usage by n7ytd · · Score: 4, Funny

    Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.

    So the obvious conclusion is that what we need is a gzip -11 option.

  11. Re:Redundant and not even good... by DragonWriter · · Score: 2

    There are a number of free compressors available that already cover the spectrum well, for example lzop, gzip, bzip2, xz (in order of better compression and more resource consumption).

    Those (except, naturally, gzip) are not compatible with gzip decompressors (of the type found in virtually every browser), so they are useless for the main use case here: server-side compression of web content that is completely invisible to web clients compared to gzip (requiring no changes and having similar-to-traditional-gzip decompression time). That allows reduced bandwidth (and, assuming the content is precompressed, which given the speed it had better be, reduced storage space for the host), saving the host money and reducing client-side latency and bandwidth use.

  12. Re:googleborgs don't know how to format data or te by MikeBabcock · · Score: 2

    I presume a different country of origin for the research ...

    cf. http://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use

    In Brazil, Germany, Netherlands, Denmark, Italy, Portugal, Romania, Sweden, Slovenia, Greece and much of Europe: 1 234 567,89 or 1.234.567,89. In handwriting, 1234567,89 is also seen, but never in Denmark, the Netherlands, Portugal, Sweden or Slovenia. In Italy a straight apostrophe is also used in handwriting: 1'234'567,89.

    In Switzerland: There are two cases. 1'234'567.89 is used for currency values. An apostrophe as thousands separator along with a "." as decimal symbol. For other values the SI style 1 234 567,89 is used with a "," as decimal symbol. When handwriting, a straight apostrophe is often used as the thousands separator for non-currency values: 1'234'567,89.

    --
    - Michael T. Babcock (Yes, I blog)
  13. Re:H.264 isn't open-sourced by wonkey_monkey · · Score: 2

    I didn't forget. It's just utterly irrelevant to the current discussion.

    --
    systemd is Roko's Basilisk.