Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com)
Dropbox announced on Thursday that it is releasing its image compression algorithm, dubbed Lepton, under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and decompression can work while streaming. Lepton shrinks existing JPEG images by 22% and preserves the original file bit-for-bit. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bits at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to use the technology to recode its older images. You can find more technical details here.
From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"
They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason why JPEG2000 uses the DWT instead of the DCT: it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).
This isn't about restoring a JPEG file back into its original RAW format. The information lost from converting RAW to JPEG is gone. There is no way to get that back.
This is about storing JPEG files more efficiently. Dropbox is in the business of providing cloud storage, and it is in their best interest to keep their costs as low as possible. The more they can compress their customers' data, the more efficiently they use their infrastructure. Some files, such as text documents, are easy to compress. Others, such as JPEG files, are already compressed and are difficult to shrink further, especially with lossless algorithms.
For Dropbox, this allows them to store the LEP representation of a JPEG file instead of the actual JPEG file. This cuts the storage needed for those JPEG files by approximately 22%. They can then decompress it on the fly whenever a user tries to read the original JPEG file, essentially trading savings in storage costs for a bit of extra CPU demand. As long as the compression is lossless and the user sees acceptable performance, there is no user impact.
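To make the "store the compressed form, hand back the original on read" idea concrete, here is a minimal sketch in Python. It is not Lepton and not Dropbox's code: zlib stands in for the real codec purely for illustration, and the `store` and `fetch` names are hypothetical. The point is only the contract: what comes back must match the original bit for bit.

```python
import hashlib
import zlib


def store(original: bytes) -> dict:
    """Keep only the compressed blob plus a checksum of the original bytes.

    zlib is a stand-in codec here (an assumption for illustration);
    a real deployment would call the Lepton encoder instead.
    """
    return {
        "blob": zlib.compress(original, 9),
        "sha256": hashlib.sha256(original).hexdigest(),
    }


def fetch(record: dict) -> bytes:
    """Decompress on read and refuse to serve anything that is not bit-exact."""
    restored = zlib.decompress(record["blob"])
    if hashlib.sha256(restored).hexdigest() != record["sha256"]:
        raise ValueError("round trip was not bit-exact")
    return restored
```

A caller would do something like `fetch(store(jpeg_bytes))` and get the identical bytes back. The checksum is there so a codec bug shows up as an error instead of a silently altered file.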
Depending on the cost of extra CPU cycles vs. the cost of the storage saved, and the relative mix of JPEG files vs. other data files, this could save Dropbox quite a bit of money.
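A back-of-the-envelope way to look at that trade-off, as a rough Python sketch. The 22% savings and 15MB/s decode rate come from the summary; every price and volume passed in is a made-up placeholder, the function name is hypothetical, and the model assumes the decode rate is measured against original JPEG bytes on one core. It also ignores the one-time cost of encoding.

```python
def monthly_net_savings(
    jpeg_tb: float,                 # TB of JPEGs stored (hypothetical input)
    savings_ratio: float,           # e.g. 0.22, from the summary
    storage_cost_per_tb: float,     # $/TB/month (hypothetical input)
    reads_tb: float,                # TB of JPEGs read back per month (hypothetical)
    decode_mb_per_s: float,         # e.g. 15.0, from the summary
    cpu_cost_per_core_hour: float,  # $/core-hour (hypothetical input)
) -> float:
    """Storage dollars saved minus CPU dollars spent decoding on reads."""
    storage_saved = jpeg_tb * savings_ratio * storage_cost_per_tb
    # 1 TB = 1,000,000 MB; decode time scales with the bytes read back.
    decode_seconds = reads_tb * 1_000_000 / decode_mb_per_s
    cpu_cost = decode_seconds / 3600 * cpu_cost_per_core_hour
    return storage_saved - cpu_cost
```

A positive result means the storage savings outweigh the decode cost for that mix of stored vs. read data. The more often the same images are read back, the more the CPU side matters.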
PR? The code is on GitHub, and IMHO a very nice, accessible explanation of their algorithm is in the linked article. They developed some neat software to save money by essentially modernizing JPEG to compress beyond the 8x8 blocks it was designed to use and, having done that, are now letting other people use it too. What is with your crabby, paranoid attitude? Instead of being an asshole, you could just, you know, build the code yourself and experiment with it, rather than looking a gift horse in the mouth. This is exactly the use case for open source software.
Although I would prefer that they explain the sampling methodology for their images, they do present a few simple scatterplots of (de-)compression performance as a function of original JPEG file size. It's not as in-depth as the Xiph.Org Foundation's stuff, but it's a hell of a lot more than a PR piece.