Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com)
Dropbox announced on Thursday that it is releasing its image compression algorithm dubbed Lepton under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and for the latter, it can work while streaming. Lepton offers a 22% savings reductions for existing JPEG images, and preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to utilize the technology to recode its older images. You can find more technical details here.
From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"
I worked as a part-time assistant in Data Structures and Algorithms course 10 years ago in Helsinki University of Technology.
JPEG is a lossy compression algorithm. It does not preserve the image. It creates these blocks of image data and then compresses them using Huffmann encoding. Same encoding is used in zip-files.
Dropbox's algorithm uses these same blocks JPEG algorithm produces (meaning, that the information is still lost in compression), but uses a clever way to compress them and ditches Huffmann encoding entirely.
So, the old process was:
1. Encode image into coefficients (lossy)
2. Encode coefficient blocks with Huffmann encoding
The new process is:
1. Encode image into coefficients (lossy)
2. Encode coefficient blocks with Lepton
Pfft.. too little, too late. JPEG is "good enough" and I don't want a huge clusterfuck of incompatibility problems with my libraries.
They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason why the DWT was used instead of DCT on JPEG2000, it is because it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).
PR? The code is on github, and imho a very nice accessible explanation of their algorithm is in the linked article. They developed some neat software to save money by essentially modernizing JPEG to compress beyond the 8x8 blocks it was designed to use and, having done that, are now letting other people use it too. What is with your crabby, paranoid attitude? Instead of being an asshole, you could just, you know, build the code yourself and experiment with it, rather than sneering at a gift horse. This is exactly the use case for open source software.
Although I would prefer if they explained the sampling methodology for their images, they do present a few simple scatterplots of (de-)compression performance as a function of original JPEG file size. It's not as in-depth as xiph.org foundation's stuff, but it's a hell of a lot more than a PR piece.
"They were pure niggers." – Noam Chomsky