Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com)

Posted by msmash on Thursday July 14, 2016 @04:40AM from the affinity-for-open-source dept.

Dropbox announced on Thursday that it is releasing its image compression algorithm, dubbed Lepton, under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and for the latter, it can work while streaming. Lepton offers a 22% size reduction for existing JPEG images, and preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original, bit for bit, at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to utilize the technology to recode its older images. You can find more technical details here.

Where am I? by freeze128 · 2016-07-14 04:46 · Score: 3, Funny

This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".
1. Re:Where am I? by AmiMoJo · 2016-07-14 04:56 · Score: 2
  
  I think whoever wrote that was confused by the screen cap of Silicon Valley (TV show on HBO) in the article, which is of the fictional "Pied Piper" company and not of Dropbox.
  It's been known that you can compress JPEGs losslessly by about 20% for many years, because JPEG only uses run length encoding rather than say Huffman encoding after the DCT stage. In fact I seem to recall an app called StuffIt that could do this in the late 90s. Their improvements seem to be some kind of prediction to make the coding more efficient and the ability to do it on the fly with a stream (i.e. without having to read the whole file first).
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
2. Re:Where am I? by nyet · 2016-07-14 05:09 · Score: 2
  
  This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".
  derp
3. Re:Where am I? by FrostedWheat · 2016-07-14 05:31 · Score: 2
  
  JPEG does do Huffman coding, or less commonly arithmetic coding.
4. Re:Where am I? by Anonymous Coward · 2016-07-14 05:46 · Score: 2, Funny
  
  I'm confused... is this a box, or is this a platform?
5. Re:Where am I? by 110010001000 · 2016-07-14 06:00 · Score: 3, Funny
  
  It is a webscale cloud service written in Angular JS using Agile techniques in a Docker container. That should be obvious.
6. Re:Where am I? by wonkey_monkey · 2016-07-14 07:02 · Score: 2
  
  How about BPG? Looks better than JPEG2000 to me.
  
  --
  systemd is Roko's Basilisk.
Re:Regardless of CPU clock speed? by darkain · 2016-07-14 04:49 · Score: 4, Informative

From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"
Middle-out? by nine-times · 2016-07-14 04:49 · Score: 2

Is this actually a "middle-out" compression, or is that just a joke? Do we know what the Weissman score is?
Wow by tylersoze · 2016-07-14 05:08 · Score: 2

It can both compress *and* decompress.
Great Name... Everyone is using it. by geek111 · 2016-07-14 05:19 · Score: 2

I'm all for companies open-sourcing cool algorithms. But not a great choice on the name. There are already several products out there called 'Lepton'. There's a software CMS, and also FLIR's thermal sensors are branded 'Lepton'. (Worth noting - Lepton IS an actual word so it probably won't qualify for Trademark protection. But an Apple Music vs. Apple Computer like scenario is not impossible to conceive.)
Re:Huffman alternative by DreadPiratePizz · 2016-07-14 05:22 · Score: 2

Encoding the image with the coefficients is not the lossy part. The lossy part is when you ditch the coefficients which contribute little to the image, and when you downsample the chroma.
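To make the point above concrete, here is a minimal sketch of the quantization step, which is where small coefficients get ditched. It assumes a single step size for every coefficient (real JPEG uses a per-frequency quantization table), and the coefficient values and the names `quantize` and `dequantize` are made up for illustration.

```python
def quantize(coefficients, step):
    """Divide each DCT coefficient by the step size and round to an integer.

    This is the lossy part: anything much smaller than the step becomes 0.
    """
    return [round(c / step) for c in coefficients]


def dequantize(levels, step):
    """Scale the integer levels back up; the rounding error is not recoverable."""
    return [q * step for q in levels]


# Hypothetical coefficients for part of one block, largest first.
coeffs = [-415.0, 30.0, -61.0, 27.0, 7.0, -3.0, 2.0, -1.0]
step = 16  # assumed uniform step, for illustration only

levels = quantize(coeffs, step)
restored = dequantize(levels, step)

print(levels)    # small coefficients collapse to 0
print(restored)  # close to, but not equal to, the input
```

Everything after this step (the entropy coding of `levels`) is lossless, which is why a tool can recode that stage and still hand back the exact same JPEG.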
Re:About time by the_povinator · 2016-07-14 05:30 · Score: 2

[I understand compression algorithms and watch Silicon Valley].
After reading their blurb, it looks like the middle-out thing was a bit of a joke. Their use of the term 'middle-out' is not unreasonable but refers to something much more specific, and less fundamental, than what seemed to be depicted in the TV show. Their 'middle' is just the place where two squares of the image meet.

--
The .sig is dead, and I believe I had a hand in killing it.
Re:comparison by miknix · 2016-07-14 05:49 · Score: 3, Informative

They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason why the DWT was used instead of DCT on JPEG2000, it is because it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).
Re:Huffman alternative by Anonymous Coward · 2016-07-14 05:49 · Score: 3, Insightful

So you are an idiot. If you run a JPEG through Lepton the ORIGINAL file (from Lepton's point of view) is the JPEG. Not the Nikon raw file, which it has no knowledge of.
Re:Again. by Anonymous Coward · 2016-07-14 05:50 · Score: 4, Insightful

This isn't a "better than JPEG" format. It's a "store existing JPEG files your users upload & use more efficiently" format. Flickr, for instance, could theoretically save 22% of its disk space using this.
Re:Huffman alternative by B1 · 2016-07-14 05:57 · Score: 4, Informative

This isn't about restoring a JPEG file back into its original RAW format. The information lost from converting RAW to JPEG is gone. There is no way to get that back.
This is about storing JPEG files more efficiently. DropBox is in the business of providing cloud storage, and it is in their best interest to keep their costs as low as possible. The more they can compress data for their customers, the more efficiently they use their infrastructure. Some files such as text documents are easy to compress. Some files such as JPEG files are difficult to compress, especially with lossless algorithms.
For DropBox, this allows them to store the LEP representation of a JPEG file instead of the actual JPEG file. This saves them approximately 22% of their storage needs. They can then decompress it on the fly whenever a user tries to read the original JPEG file, essentially trading savings in storage costs for a bit of extra CPU demand. As long as the compression is lossless and the user sees acceptable performance, there is no user impact.
Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
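A rough back-of-the-envelope sketch of that trade-off, using only the figures from the summary (about 22% savings, about 15MB/s decode). The function names are made up, and reading "15MB/s" as megabytes of JPEG output per second on one core is an assumption; actual prices for disk and CPU are not given anywhere, so they are left out.

```python
def storage_saved_bytes(jpeg_bytes, savings_fraction=0.22):
    """Bytes of disk no longer needed if JPEGs are stored in compressed form."""
    return jpeg_bytes * savings_fraction


def decode_cpu_seconds(bytes_read, decode_rate_bytes_per_s=15 * 10**6):
    """CPU time spent turning stored files back into JPEGs when users read them.

    Assumes the quoted 15MB/s refers to decoded JPEG bytes per second.
    """
    return bytes_read / decode_rate_bytes_per_s


one_petabyte = 10**15
one_terabyte = 10**12

print(storage_saved_bytes(one_petabyte) / 10**12, "TB saved per PB of JPEGs")
print(decode_cpu_seconds(one_terabyte) / 3600, "CPU-hours per TB of JPEGs read")
```

Whether that comes out ahead depends on how often files are read back compared with how long they sit on disk, which is the mix the comment is pointing at.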
Re:Again. by Ramze · 2016-07-14 05:59 · Score: 4, Insightful

It's not a file format, it's a compression algorithm that happens at the data storage level. This is similar to compressing a hard drive -- the files are individually compressed, but the file formats are the same, and the OS handles the compression/decompression seamlessly so that the applications don't even know they're accessing compressed versions of the file formats they normally use.
You can keep all your JPEGs, and with the open-source license, compress the contents of a drive or partition with this algorithm and save maybe 20% or so of the space the JPEG files took up. Not worth it for most people but photographers and image sites might save a lot of money using this.
Re:ZIP rules them a!! by 110010001000 · 2016-07-14 06:02 · Score: 2

I think he meant Phil Katz who wrote PKZIP.
Re:Huffman alternative by Anonymous Coward · 2016-07-14 06:13 · Score: 2, Interesting

If they're smart (and they are) the decompression will happen on the user's computer, in the web browser/native client.
Making it (almost) a free lunch for dropbox.
Re:Huffman alternative by virve · 2016-07-14 06:39 · Score: 4, Insightful

Look, they clearly state that they operate at the level of JPEG files. So, where is the confusion coming from? They are analyzing JPEG files and using features of that format to compress the already compressed files further.
Which I, honestly, find very impressive.
They reproduce JPEG files in a bit-by-bit faithful fashion. And they have tested it on 16 million (or was it billion) files, where it worked without problems; plus, they don't replace user files unless they have checked that it decodes correctly. I presume that the process is actually transparent to the Dropbox user.
I don't see the problem that you have with this, sorry.
Good work lads!
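The "check that it decodes correctly before replacing" idea can be sketched in a few lines. This is not Lepton's code or API: `zlib` stands in for the codec purely so the example runs, and `safe_recode` is a made-up name.

```python
import zlib


def safe_recode(original: bytes, compress=zlib.compress, decompress=zlib.decompress):
    """Return the compressed form only if it decodes back to the exact same bytes.

    Returns None when the round trip fails, so the caller keeps the original.
    zlib is a stand-in here; a real deployment would plug in the JPEG recoder.
    """
    candidate = compress(original)
    if decompress(candidate) != original:
        return None
    return candidate


data = b"not really a JPEG, just sample bytes " * 100

stored = safe_recode(data)
print(stored is not None and zlib.decompress(stored) == data)

# A decoder that returns the wrong bytes is rejected, so nothing gets replaced.
print(safe_recode(data, decompress=lambda blob: b"") is None)
```

The comparison is on raw bytes, not on decoded pixels, which is what "bit-by-bit faithful" requires.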
Re:Huffman alternative by MGalactis · 2016-07-14 07:05 · Score: 2

Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
Or more likely they'll build it into their clients and do the compression on the user's side, saving them on both disk space and bandwidth.
Re:Huffman alternative by AikonMGB · 2016-07-14 07:17 · Score: 2

Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
Better yet, do it in the client at no CPU cycle cost to Dropbox, and also reducing data transport. Dropbox controls the desktop, mobile, and web clients, so this would be easy to do, and could revert to server-side translation from LEP to JPG for e.g. API clients etc.
Re:Regardless of CPU clock speed? by sexconker · 2016-07-14 07:19 · Score: 2

What resolution, bit depth, and JPEG encoding/compression methods did those images have? How compressible were those images with something generic like LZMA2?
Re:comparison by Solandri · 2016-07-14 07:34 · Score: 2

JPEG2000 suffered from the same problem JPEG initially did - it was slow. I remember downloading the first sample JPEG images in the early 1990s. An 800x600 image took about 20 seconds to decode on my PC back then. JPEG2000 had a similar problem, though it was asymmetric. Over 1 min to encode a 3504x2336 image from my DSLR, about 5-15 seconds to decode.

JPEG didn't have any competitors, and the growth of the Internet and Web made smaller-size picture files very important in the coming years. Couple that with the rapid development of digital cameras in the late 1990s, and the use of photos on the web exploded. So as computers became faster, the decode time approached zero and JPEG eventually became the standard.

JPEG2000 faced rapidly increasing network speeds, decreasing storage costs, and a large growth in digital video. So even though computers became faster, it didn't really matter, since it took less time to transmit the larger image file over the now-faster network than it took to decompress a JPEG2000 image on the now-faster computer. You could easily buy a bigger HDD to compensate for the larger size of other image formats, and you did so anyway to store all the videos you were recording.

This is largely the reason JPEG has hung on despite being over 20 years old (even MPEG2 was displaced by MPEG4 and now by h.264/h.265). Compared to modern storage capacities and network speeds, JPEGs are small enough that making it 22% smaller just doesn't matter. Unless you're a massive storage company dealing with billions of JPEG files.
Evil patents by Ilgaz · 2016-07-14 09:01 · Score: 2

It is because the idiots on the JPEG 2000 committee did everything to keep people, especially web browser development teams, away from that excellent format.
Now, with 4K monitors and ultra-high-resolution phones around, watching web developers struggle with 5-6 different files of the same photo, I really feel pity. That was a solved problem: both multiple bandwidths and resolutions, and the compression rate.
There is a reason we deal with JPEG files today; ask the JP2 committee. Even MS stayed away from it, fearing the patents.
Re:Regardless of CPU clock speed? by retchdog · 2016-07-14 09:26 · Score: 5, Informative

PR? The code is on github, and imho a very nice accessible explanation of their algorithm is in the linked article. They developed some neat software to save money by essentially modernizing JPEG to compress beyond the 8x8 blocks it was designed to use and, having done that, are now letting other people use it too. What is with your crabby, paranoid attitude? Instead of being an asshole, you could just, you know, build the code yourself and experiment with it, rather than sneering at a gift horse. This is exactly the use case for open source software.
Although I would prefer if they explained the sampling methodology for their images, they do present a few simple scatterplots of (de-)compression performance as a function of original JPEG file size. It's not as in-depth as xiph.org foundation's stuff, but it's a hell of a lot more than a PR piece.

--
"They were pure niggers." – Noam Chomsky