Slashdot Mirror


Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com)

Dropbox announced on Thursday that it is releasing its image compression algorithm dubbed Lepton under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and for the latter, it can work while streaming. Lepton offers a 22% savings reductions for existing JPEG images, and preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to utilize the technology to recode its older images. You can find more technical details here.

5 of 135 comments (clear)

  1. Re:Regardless of CPU clock speed? by darkain · · Score: 4, Informative

    From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"

  2. Huffman alternative by hsa · · Score: 1, Informative

    I worked as a part-time assistant in Data Structures and Algorithms course 10 years ago in Helsinki University of Technology.

    JPEG is a lossy compression algorithm. It does not preserve the image. It creates these blocks of image data and then compresses them using Huffmann encoding. Same encoding is used in zip-files.

    Dropbox's algorithm uses these same blocks JPEG algorithm produces (meaning, that the information is still lost in compression), but uses a clever way to compress them and ditches Huffmann encoding entirely.

    So, the old process was:
    1. Encode image into coefficients (lossy)
    2. Encode coefficient blocks with Huffmann encoding

    The new process is:
    1. Encode image into coefficients (lossy)
    2. Encode coefficient blocks with Lepton

    Pfft.. too little, too late. JPEG is "good enough" and I don't want a huge clusterfuck of incompatibility problems with my libraries.

    1. Re:Huffman alternative by B1 · · Score: 4, Informative

      This isn't about restoring a JPEG file back into its original RAW format. The information lost from converting RAW to JPEG is gone. There is no way to get that back.

      This is about storing JPEG files more efficiently. DropBox is in the business of providing cloud storage, and it is in their best interest to keep their costs as low as possible. The more they can compress data for their customers, the more efficiently they use their infrastructure. Some files such as text documents are easy to compress. Some files such as JPEG files are difficult to compress, especially with lossless algorithms.

      For DropBox, this allows them to store the LEP representation of a JPEG file instead of the actual JPEG file. This saves them approximately 22% of their storage needs. They can then decompress it on the fly whenever a user tries to read the original JPEG file, essentially trading savings in storage costs for a bit of extra CPU demand. As long as the compression is lossless and the user sees acceptable performance, there is no user impact.

      Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.

  3. Re:comparison by miknix · · Score: 3, Informative

    They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason why the DWT was used instead of DCT on JPEG2000, it is because it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).

  4. Re:Regardless of CPU clock speed? by retchdog · · Score: 5, Informative

    PR? The code is on github, and imho a very nice accessible explanation of their algorithm is in the linked article. They developed some neat software to save money by essentially modernizing JPEG to compress beyond the 8x8 blocks it was designed to use and, having done that, are now letting other people use it too. What is with your crabby, paranoid attitude? Instead of being an asshole, you could just, you know, build the code yourself and experiment with it, rather than sneering at a gift horse. This is exactly the use case for open source software.

    Although I would prefer if they explained the sampling methodology for their images, they do present a few simple scatterplots of (de-)compression performance as a function of original JPEG file size. It's not as in-depth as xiph.org foundation's stuff, but it's a hell of a lot more than a PR piece.

    --
    "They were pure niggers." – Noam Chomsky