Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com)

Posted by msmash on Thursday July 14, 2016 @04:40AM from the affinity-for-open-source dept.

Dropbox announced on Thursday that it is releasing its image compression algorithm dubbed Lepton under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and for the latter, it can work while streaming. Lepton offers 22% savings for existing JPEG images, and preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to utilize the technology to recode its older images. You can find more technical details here.

135 comments

Regardless of CPU clock speed? by jabberw0k · 2016-07-14 04:43 · Score: 1, Troll

It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s
Even if I cross-compile for my 2MHz TRS-80? Amazing!
1. Re:Regardless of CPU clock speed? by darkain · 2016-07-14 04:49 · Score: 4, Informative
  
  From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"
2. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 04:50 · Score: 0
  
  No, that's ridiculous. Don't be an idiot. Read beyond the summary if you don't want to make a fool of yourself.
  FTFA: Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz
3. Re:Regardless of CPU clock speed? by bstag · 2016-07-14 05:04 · Score: 1
  
  pot meet kettle.
4. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 05:30 · Score: 0
  
  Not as fast as my ZX81 in 'Fast Mode'.
5. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 05:30 · Score: 0
  
  It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s
  Even if I cross-compile for my 2MHz TRS-80? Amazing!
  While obviously said in jest, sometimes I wish more people would understand not everything can be solved with a fucking Pi board.
6. Re:Regardless of CPU clock speed? by 110010001000 · 2016-07-14 05:49 · Score: 1
  
  What kind of image? All white? Why 10,000 images and not 500 images or 1 image? Is the speed dependent on the number of images being decompressed? That is meaningless. These PR articles are foolish.
7. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 05:50 · Score: 0
  
  Yup, I know my issues. Do you know yours? Do you want me to tell you? I can read you pretty clearly based on your response.
8. Re:Regardless of CPU clock speed? by Yvan256 · 2016-07-14 06:22 · Score: 1
  
  Sometimes I wish more people would understand not everything can be solved with a fucking Pi board.
  Indeed. Sometimes it takes two fucking Pi boards and an Arduino!
9. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 06:26 · Score: 1
  
  "Random things that people upload to dropbox" I assume. This would be why they quoted a number based on a large number of images, a small number of images would be more susceptible to bias.
10. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 06:54 · Score: 0
  
  It's called an average, Einstein.
11. Re:Regardless of CPU clock speed? by sexconker · 2016-07-14 07:19 · Score: 2
  
  What resolution, bit depth, and JPEG encoding/compression methods did those images have? How compressible were those images with something generic like LZMA2?
12. Re:Regardless of CPU clock speed? by sexconker · 2016-07-14 07:20 · Score: 0
  
  For starters, he's a liberal cuck.
13. Re:Regardless of CPU clock speed? by Bengie · 2016-07-14 09:24 · Score: 1
  
  Good questions, but in lue of facts, compressibility usually goes up as bit depth and resolution go up, and it seems cell phones are taking greater than 4k resolutions.
14. Re:Regardless of CPU clock speed? by retchdog · 2016-07-14 09:26 · Score: 5, Informative
  
  PR? The code is on github, and imho a very nice accessible explanation of their algorithm is in the linked article. They developed some neat software to save money by essentially modernizing JPEG to compress beyond the 8x8 blocks it was designed to use and, having done that, are now letting other people use it too. What is with your crabby, paranoid attitude? Instead of being an asshole, you could just, you know, build the code yourself and experiment with it, rather than sneering at a gift horse. This is exactly the use case for open source software.
  Although I would prefer if they explained the sampling methodology for their images, they do present a few simple scatterplots of (de-)compression performance as a function of original JPEG file size. It's not as in-depth as xiph.org foundation's stuff, but it's a hell of a lot more than a PR piece.
  
  --
  "They were pure niggers." – Noam Chomsky
15. Re: Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 10:37 · Score: 0
  
  *lieu
16. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 11:39 · Score: 0
  
  in lieu of spelling
17. Re:Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 13:30 · Score: 0
  
  Hey, this is what Google autocorrected to, and the internet is never wrong.
18. Re: Regardless of CPU clock speed? by Anonymous Coward · 2016-07-14 23:31 · Score: 0
  
  But zx81 fast mode is so blinky.
19. Re: Regardless of CPU clock speed? by Anonymous Coward · 2016-07-15 05:30 · Score: 0
  
  Still not really correct, though, since that would mean "substituting for facts."
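On the "why 10,000 images" question: a batch benchmark like the one quoted from TFA can report either aggregate throughput (total bytes over total time) or the mean of per-image rates, and the two differ when image sizes vary. The quoted line does not say which one Dropbox used, so this is only a minimal sketch with made-up timings, not Dropbox's benchmark code; both function names are hypothetical.

```python
def aggregate_mb_per_s(samples):
    """Total megabytes divided by total seconds across all images."""
    total_mb = sum(size_mb for size_mb, _ in samples)
    total_s = sum(seconds for _, seconds in samples)
    return total_mb / total_s


def mean_per_image_mb_per_s(samples):
    """Average of each image's own rate; a small image weighs as much as a large one."""
    return sum(size_mb / seconds for size_mb, seconds in samples) / len(samples)


# Hypothetical (size in MB, decode time in seconds) pairs, not measured data.
timings = [(1.0, 0.1), (3.0, 0.2)]
print(aggregate_mb_per_s(timings))       # about 13.33
print(mean_per_image_mb_per_s(timings))  # 12.5
```

With a large, varied batch the aggregate figure is dominated by the big files, which is one reason a rate quoted for 10,000 images need not match what you see on any single image.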
Where am I? by freeze128 · 2016-07-14 04:46 · Score: 3, Funny

This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".
1. Re:Where am I? by phishybongwaters · 2016-07-14 04:50 · Score: 1
  
  Beat me to the punch. "Middle out"? But the thing is, the tech isn't complete BS on that show; the terms they use are real and the application is actually possible, likely not to the extent of pied piper in the show though.
2. Re:Where am I? by AmiMoJo · 2016-07-14 04:56 · Score: 2
  
  I think whoever wrote that was confused by the screen cap of Silicon Valley (TV show on HBO) in the article, which is of the fictional "Pied Piper" company and not of Dropbox.
  It's been known that you can compress JPEGs losslessly by about 20% for many years, because JPEG only uses run length encoding rather than say Huffman encoding after the DCT stage. In fact I seem to recall an app called StuffIt that could do this in the late 90s. Their improvements seem to be some kind of prediction to make the coding more efficient and the ability to do it on the fly with a stream (i.e. without having to read the whole file first).
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
3. Re:Where am I? by Fwipp · 2016-07-14 04:58 · Score: 1
  
  From TFA:
  
  For those familiar with Season 1 of Silicon Valley, this is essentially a “middle-out” algorithm.
4. Re:Where am I? by nyet · 2016-07-14 05:09 · Score: 2
  
  This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".
  derp
5. Re:Where am I? by imgod2u · 2016-07-14 05:16 · Score: 1
  
  Except it isn't. "Middle-out" was a made-up name until the show came along. Nobody researching compression had a "middle-out" algorithm.
  Also, the Pied Piper algorithm offered lossless compression of just about anything at a ridiculously high rate (something like 10x what HEVC is capable of with no loss). They also had a distributed storage platform that used drive space of everyone's phone to store files.
6. Re:Where am I? by AmiMoJo · 2016-07-14 05:20 · Score: 1
  
  That's what I mean, the writer seems to think that was some kind of documentary and that "middle-out" is a real thing. It's just a meaningless phrase they came up with for the show.
  
7. Re:Where am I? by FrostedWheat · 2016-07-14 05:31 · Score: 2
  
  JPEG does do Huffman coding, or less commonly arithmetic coding.
8. Re:Where am I? by Anonymous Coward · 2016-07-14 05:41 · Score: 0
  
  From TFA:
  
  Thus, we can compute the gradient from the second row of the current block to the edge, and from the neighbors back to the edge, meeting in the middle, as illustrated:
  [picture]
  Where these two gradients meet in the middle, between these two pixels is the prediction point that Lepton uses to predict the DC of the current 8×8 block. The delta of this prediction is written in the same manner as the AC’s, using length, followed by sign and residual.
  For those familiar with Season 1 of Silicon Valley, this is essentially a “middle-out” algorithm.
9. Re:Where am I? by Anonymous Coward · 2016-07-14 05:46 · Score: 2, Funny
  
  I'm confused... is this a box, or is this a platform?
10. Re:Where am I? by miknix · 2016-07-14 05:52 · Score: 1
  
  Why even use JPEG?? JPEG2000 has been out there for a while, professional photographers and digital cinema use it for a reason..
11. Re: Where am I? by Anonymous Coward · 2016-07-14 05:54 · Score: 0
  
  The older I get the more middle-out seems to be the default state.
12. Re:Where am I? by 110010001000 · 2016-07-14 06:00 · Score: 3, Funny
  
  It is a webscale cloud service written in Angular JS using Agile techniques in a Docker container. That should be obvious.
13. Re:Where am I? by Anonymous Coward · 2016-07-14 06:23 · Score: 0
  
  Why even use JPEG?? JPEG2000 has been out there for a while, professional photographers and digital cinema use it for a reason..
  My DSLR spits out JPGs... it could spit out RAW as well, but then I need to develop it in, say, Canon Digital Photo Professional, which, well, spits out JPG.
14. Re:Where am I? by wonkey_monkey · 2016-07-14 07:02 · Score: 2
  
  How about BPG? Looks better than JPEG2000 to me.
  
  --
  systemd is Roko's Basilisk.
15. Re:Where am I? by gerddie · 2016-07-14 07:47 · Score: 1
  
  Why even use JPEG?? JPEG2000 has been out there for a while, professional photographers and digital cinema use it for a reason..
  My DSLR spits out JPGs... it could spit out RAW as well, but then I need to develop it in, say, Canon Digital Photo Professional, which, well, spits out JPG.
  But if you use Darktable then you can also output JPEG2000, PNG, OpenEXR, TIFF, and a few more file formats when developing your RAW photo.
16. Re:Where am I? by Anonymous Coward · 2016-07-14 08:08 · Score: 0
  
  It makes me super sad that this describes exactly what I am working on now :-(
17. Re:Where am I? by DMJC · 2016-07-14 08:41 · Score: 1
  
  Actually this is the team that wrote the Pied Piper algorithm which featured on Slashdot a few months ago. A good friend of mine is the person who actually created the algorithm. He was the lead developer on Vegastrike. Really great guy. It's great to see him achieving success in his career.
18. Re:Where am I? by Anonymous Coward · 2016-07-14 09:05 · Score: 0
  
  JPEG *does* usually use Huffman - see http://www.digicamsoft.com/itu/itu-t81-54.html
  It's usually the alternate embedded segments that may or may not use compression; in digital cameras, the thumbnails are often stored as part of the JPEG as an uncompressed or RLE bitmap.
19. Re:Where am I? by AmiMoJo · 2016-07-14 09:06 · Score: 1
  
  JPEG2000 never really took off outside certain niches because the processing overhead was too high. WebM is a better general purpose option for web and JPEG is so universal nothing else has made any inroads.
  
20. Re:Where am I? by Bengie · 2016-07-14 09:30 · Score: 1
  
  You can make your pictures look better when you have the raw 48bit color depth and several extra dimensions of brightness to play with. My wife's wedding dress outside was causing white-out from what seemed to be over-exposure. But once you opened up the RAW in gimp, you can change the curve and suddenly it looked normal while still having a gradient. All of the detail was there, but my monitor couldn't handle the range. Even when I was there in person it was intensely bright, so I could say my eyes couldn't even handle the dynamic range.
21. Re:Where am I? by Anonymous Coward · 2016-07-14 10:17 · Score: 0
  
  There is also WebP, though its adoption has been hindered by Mozilla. Many, many patches were written for multiple versions of Firefox, but Mozilla refused to add the support even though they already have most of the code in Firefox, because WebP is based on VP8, which Firefox supports. They refuse to add support because the person in charge of the media formats is butt hurt because he had a competing image format which no one adopted.
22. Re:Where am I? by mattack2 · 2016-07-14 11:03 · Score: 1
  
  likely not to the extent of pied piper in the show though
  Not "likely". Absolutely. In the show, they have a compression algorithm that compresses _ANY_ data some ridiculously high percentage.
  Real world example: Put data through compression.. then put the resulting compressed data through compression again... and so on and so on.. To get impossibly good compression...
23. Re:Where am I? by Anonymous Coward · 2016-07-14 14:00 · Score: 0
  
  BPG is not viable due to the licensing situation around HEVC. Don't waste your time on formats which require patent royalty payments. AV1 is the future of web video, so a new still image format based on that (similar to WebP) is a better option.
24. Re:Where am I? by Anonymous Coward · 2016-07-14 16:52 · Score: 0
  
  Ha! I totally thought the same thing. I heard Erlich's voice in my head reading it.
25. Re:Where am I? by Anonymous Coward · 2016-07-14 20:09 · Score: 0
  
  That can be gotten around with a JavaScript shim.
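FrostedWheat and the AC citing ITU-T T.81 are right that JPEG's usual entropy coder is Huffman (arithmetic coding being the rarer option); run lengths enter because baseline JPEG Huffman-codes symbols that pair a run of zero coefficients with the size of the next nonzero one. A minimal sketch of plain Huffman code construction over a toy list of quantized coefficient values; it deliberately skips JPEG's run/size symbols and its code tables stored in the file, and the function names are illustrative only.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    # Prefix property: no codeword is a prefix of another, so greedy matching works.
    reverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return out


coeffs = [0, 0, 0, 0, 0, 0, 3, 0, 0, -1, 0, 0, 0, 0, 1, 0]  # toy quantized values
code = huffman_code(coeffs)
bits = encode(coeffs, code)
print(code)
print(len(bits), "bits vs", len(coeffs) * 8, "bits at one byte per value")
```

The frequent zero gets a 1-bit code and the rare values get longer ones, and decoding returns the exact input: the entropy-coding stage loses nothing, which is why Lepton can swap it out and still reproduce the file.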
About time by Anonymous Coward · 2016-07-14 04:48 · Score: 0

Nice to see Pied Piper is finally getting their tech out there. Their startup has been so rocky, especially since they started feuding with Hooli.
1. Re:About time by the_povinator · 2016-07-14 05:30 · Score: 2
  
  [I understand compression algorithms and watch Silicon Valley].
  After reading their blurb, it looks like the middle-out thing was a bit of a joke. Their use of the term 'middle-out' is not unreasonable but refers to something much more specific, and less fundamental, than what seemed to be depicted in the TV show. Their 'middle' is just the place where two squares of the image meet.
  
  --
  The .sig is dead, and I believe I had a hand in killing it.
2. Re: About time by slappynipsy · 2016-07-14 08:26 · Score: 1
  
  Sounds cool, but what's the Weissman score on it?
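the_povinator's reading matches the TFA excerpt quoted under "Where am I?": Lepton extrapolates a gradient from the neighboring block and one from the current block toward their shared edge, and predicts the current block's DC where the two meet; the prediction error is then written as a length, a sign and a residual. This is a heavily simplified sketch, assuming only the block above is used, working on pixel values rather than quantized DCT coefficients (JPEG's DC coefficient is proportional to the block average), and using a plain average along the edge. Lepton's real predictor is more involved, and `predict_dc_offset` and `length_sign_residual` are hypothetical names.

```python
def predict_dc_offset(above, current_ac_only):
    """Predict the flat brightness offset (the DC part) of the current 8x8 block.

    above: 8 rows of 8 fully decoded pixels from the block directly above.
    current_ac_only: the current block's pixels rebuilt with its DC term set to 0,
    so the true pixels are current_ac_only plus an unknown constant offset.
    """
    gaps = []
    for x in range(8):
        # Continue the above block's vertical gradient half a pixel down to the shared edge.
        from_above = above[7][x] + (above[7][x] - above[6][x]) / 2
        # Continue the current block's gradient half a pixel up to the same edge.
        from_current = current_ac_only[0][x] + (current_ac_only[0][x] - current_ac_only[1][x]) / 2
        # The two extrapolations should meet in the middle; the gap is the missing offset.
        gaps.append(from_above - from_current)
    return round(sum(gaps) / len(gaps))


def length_sign_residual(delta):
    """Split an integer prediction error into (bit length, sign bit, low-order bits)."""
    magnitude = abs(delta)
    length = magnitude.bit_length()  # 0 when the prediction was exact
    sign = 1 if delta < 0 else 0
    # A nonzero magnitude always has its top bit set, so only the bits below it are stored.
    residual = magnitude - (1 << (length - 1)) if length else 0
    return length, sign, residual


# Toy example: a smooth vertical ramp where pixel value = 10 * global row index.
above = [[10 * r] * 8 for r in range(8)]                # image rows 0..7
true_current = [[10 * r] * 8 for r in range(8, 16)]     # image rows 8..15
offset = sum(true_current[r][0] for r in range(8)) / 8  # block average, 115.0
current_ac_only = [[v - offset for v in row] for row in true_current]

predicted = predict_dc_offset(above, current_ac_only)
print(predicted)                              # 115
print(length_sign_residual(117 - predicted))  # (2, 0, 0)
```

On a smooth ramp the prediction is exact, so the stored delta is tiny; the closer the prediction, the fewer bits the length/sign/residual form needs.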
comparison by zlives · 2016-07-14 04:48 · Score: 1

does any one have knowledge about how this compares to other compression algorithms? also wonder if they are releasing this because they have lepton2 or whatever now?
1. Re:comparison by Anonymous Coward · 2016-07-14 05:00 · Score: 0
  
  does any one have knowledge about how this compares to other compression algorithms? also wonder if they are releasing this because they have lepton2 or whatever now?
  Two answers:
  a) yes, I do
  b) that is possible
  Any other questions?
2. Re:comparison by Anonymous Coward · 2016-07-14 05:21 · Score: 0
  
  There's a lepton version 0
  https://github.com/sirikata/sirikata/tree/master/libcore/src/jpeg-arhc
3. Re:comparison by Anonymous Coward · 2016-07-14 05:22 · Score: 0
  
  They're releasing it because it has no commercial value. Probably costs them more in energy doing all the compression and decompression than it would to just put more storage in their datacenters. Nice technically, but the niche of useful applications is probably pretty small.
4. Re:comparison by Black+LED · 2016-07-14 05:33 · Score: 1
  
  This is specifically for compressing JPEG (lossy) with an extra layer of lossless compression to bring file sizes down further. It would only be useful if you have a large collection of JPEG images to archive and not enough disk space. In my own quickie test:
  Source image was 2560x1440 TGA at 32MB
  PNG (lossless, level 9) took that down to 6,912KB
  WebP (lossless) took it down to 5,868KB
  JPEG (lossy, quality 100) took it down to 3,402KB
  JPEG (lossy, quality 95) took it down to 1,995KB
  They are claiming a 22% further reduction in file size on JPEG, so the above two JPEGs should come down to roughly 2,654KB and 1,556KB respectively. Not a whole lot individually, but it can add up if you're storing a lot of JPEG images.
  Still, it's based on a lossy compression method so I don't have any real interest in it. WebP lossless (and PNG when absolutely needed for compatibility) is my preferred format.
5. Re:comparison by Anonymous Coward · 2016-07-14 05:38 · Score: 0
  
  The current state-of-the-art in JPEG recompression seems to be this:
  http://encode.ru/threads/2459-EMMA-Context-Mixing-Compressor
6. Re:comparison by miknix · 2016-07-14 05:49 · Score: 3, Informative
  
  They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason why the DWT was used instead of DCT on JPEG2000, it is because it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).
7. Re:comparison by Anonymous Coward · 2016-07-14 06:40 · Score: 0
  
  Because when users go to retrieve their images, they would no longer be in JPG format? And converting back to JPG would mean double lossy compression artefacts.
8. Re:comparison by Anonymous Coward · 2016-07-14 06:56 · Score: 0
  
  JPEG2000 is NOT a good lossy compression format; across a large range of compression ratios and content it actually performs WORSE than JPEG.
  It is mostly useful if you want higher quality where JPEG is rather bad. Or in the case of digital cinema possibly for no good reason at all.
9. Re:comparison by Solandri · 2016-07-14 07:34 · Score: 2
  
  JPEG2000 suffered from the same problem JPEG initially did - it was slow. I remember downloading the first sample JPEG images in the early 1990s. An 800x600 image took about 20 seconds to decode on my PC back then. JPEG2000 had a similar problem, though it was asymmetric. Over 1 min to encode a 3504x2336 image from my DSLR, about 5-15 seconds to decode.
  
  JPEG didn't have any competitors, and the growth of the Internet and Web made smaller-size picture files very important in the coming years. Couple that with the rapid development of digital cameras in the late 1990s, and the use of photos on the web exploded. So as computers became faster, the decode time approached zero and JPEG eventually became the standard.
  
  JPEG2000 faced rapidly increasing network speeds, decreasing storage costs, and a large growth in digital video. So even though computers became faster, it didn't really matter, since it took less time to transmit the larger image file over the now-faster network than it took to decompress a JPEG2000 image on the now-faster computer. You could easily buy a bigger HDD to compensate for the larger size of other image formats, and you did so anyway to store all the videos you were recording.
  
  This is largely the reason JPEG has hung on despite being over 20 years old (even MPEG2 was displaced by MPEG4 and now by h.264/h.265). Compared to modern storage capacities and network speeds, JPEGs are small enough that making it 22% smaller just doesn't matter. Unless you're a massive storage company dealing with billions of JPEG files.
10. Re:comparison by Lieutenant_Dan · 2016-07-14 09:17 · Score: 1
  
  They're releasing it because it has no commercial value. Probably costs them more in energy doing all the compression and decompression than it would to just put more storage in their datacenters. Nice technically, but the niche of useful applications is probably pretty small.
  That's a very valid point; what's the cost in cpu-power versus storage costs?
  Now, the issue is that storage is permanent, in the sense that you're using your disk/SAN/tape storage space with the file. Compression happens only once, the quicker decompression only happens when someone accesses it. So the 22% storage savings of JPGs across TBs may be worthwhile.
  It's not totally clear how much of their space is being used up by JPGs? Also tiered storage may have been an option? Generic compression using already established libraries for other file types, etc, etc.
  
  --
  Wearing pants should always be optional.
11. Re:comparison by Anonymous Coward · 2016-07-14 09:41 · Score: 0
  
  are you blind? parent was talking about lossless jpeg 2000. stop pulling words out of your ass; jpeg 2000 is superior to jpeg. go search for the subjective comparison curves, it is an old discussion.
  and clearly you know nothing about digital cinema [1]:
  
  DCI created standard evaluation material (the ASC/DCI StEM material) for testing of 2K and 4K playback and compression technologies. DCI selected JPEG2000 as the basis for the compression in the system the same year.
  what a dumb post, I don't even know why I am wasting my time with you.
  [1] https://en.wikipedia.org/wiki/...
12. Re:comparison by Anonymous Coward · 2016-07-14 12:08 · Score: 0
  
  It's DropBox we are talking about here. What they want is to gain space from JPEG files their users upload to their service. If they recompressed to JPEG2000, then users would upload a JPEG file and download a JPEG2000 file. I don't think there are any users who wouldn't be pissed if they did this.
13. Re:comparison by Anonymous Coward · 2016-07-14 12:14 · Score: 0
  
  Bits also cost energy to transmit. Even if the upfront cost of compression puts you in the red, if the data is being requested multiple times it can make sense for the sender to implement more aggressive encoding and pass the energy cost to the receivers.
14. Re:comparison by Anonymous Coward · 2016-07-14 12:51 · Score: 0
  
  There is a good reason why the DWT was used instead of DCT on JPEG2000, it is because it yields higher coding efficiency.
  DWT is better than DCT, but after the transform there are many more things that can be done much more easily when DCT is used than with DWT. For example intra prediction, which is just too complex when you do DWT. There is a good reason why the current best video codecs don't use DWT, and the current best image codecs don't either. [1] is a good read from an x264 developer on why DWT is not the way to go. From that I found a link [2] to a picture which shows how JPEG2000 visually loses to JPEG at the same file size, even when it wins on metrics and is generally a superior format.
  [1] http://web.archive.org/web/20100228145846/http://x264dev.multimedia.cx/?p=317
  [2] http://web.archive.org/web/20070421122445/http://upload.wikimedia.org/wikipedia/commons/5/51/JPEG_JFIF_and_2000_Comparison.png
15. Re:comparison by Anonymous Coward · 2016-07-14 17:49 · Score: 0
  
  Remember, they are in the file storage business. They don't get to tell their customers which format to use. The customers are using JPEG and they expect to get back the same file they saved. Lossless JPEG2000 would save the exact same pixels, but not the exact same file - and it would be bigger than the lossy compression anyway.
16. Re:comparison by Anonymous Coward · 2016-07-15 17:29 · Score: 0
  
  Their use case is taking JPEG and then losslessly compressing it further. If you decompress JPEG, re-compress as JPEG2000, decompress, and re-compress as JPEG, do you end up with exactly the same file? That's the game they're playing.
17. Re:comparison by miknix · 2016-07-19 12:31 · Score: 1
  
  The article you point out is a bit misinformed; there are several papers out there specifying how to do (multi-resolution) motion compensation in a lifting scheme, and the implementation is not more complex. From my point of view, the main reason why the DWT is not attractive and only remains a topic of interest in academia is all the patents surrounding it.
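Black+LED's 22% arithmetic and Lieutenant_Dan's "compress once, decompress on every access" point can be combined into one back-of-the-envelope estimate. A minimal sketch, assuming the summary's figures (22% average savings, 5MB/s encode, 15MB/s decode, measured per TFA on a Xeon E5-2650 v2) hold for a given file and using decimal KB/MB; real savings vary per image, and `lepton_estimate` is a hypothetical helper.

```python
SAVINGS = 0.22          # average size reduction quoted in the summary
ENCODE_MB_PER_S = 5.0   # encode rate quoted in the summary
DECODE_MB_PER_S = 15.0  # decode rate quoted in the summary


def lepton_estimate(jpeg_kb, reads):
    """Estimate (stored KB, KB saved, CPU seconds) for one JPEG read `reads` times.

    Encoding is paid once; decoding is paid again on every read.
    Uses decimal units: 1 MB = 1000 KB.
    """
    stored_kb = jpeg_kb * (1 - SAVINGS)
    saved_kb = jpeg_kb - stored_kb
    size_mb = jpeg_kb / 1000
    cpu_seconds = size_mb / ENCODE_MB_PER_S + reads * size_mb / DECODE_MB_PER_S
    return stored_kb, saved_kb, cpu_seconds


for kb in (3402, 1995):  # Black+LED's quality-100 and quality-95 JPEGs
    stored, saved, cpu = lepton_estimate(kb, reads=10)
    print(f"{kb}KB -> ~{stored:,.0f}KB stored, {saved:,.0f}KB saved, {cpu:.2f} CPU-s")
```

For the two test JPEGs this gives roughly 2,654KB and 1,556KB stored; the CPU column grows with the read count while the savings do not, which is why the answer to "CPU versus storage" depends on how often files are fetched.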
Middle-out? by nine-times · 2016-07-14 04:49 · Score: 2

Is this actually a "middle-out" compression, or is that just a joke? Do we know what the Weissman score is?
1. Re:Middle-out? by Anonymous Coward · 2016-07-14 04:50 · Score: 0, Insightful
  
  Another fucking moron unable to read TFA!!!!!!!!!! Shoot yourself.
2. Re:Middle-out? by jittles · 2016-07-14 04:57 · Score: 1
  
  Is this actually a "middle-out" compression, or is that just a joke? Do we know what the Weissman score is?
  Meh. I'm just going to wait for Pied Piper to hit open beta. Their Weissman scores are unbelievable.
3. Re:Middle-out? by xuvetyn · 2016-07-15 11:18 · Score: 1
  
  ah. beat me to it. =)
  
  --
  alive to the universe, dead to the world
Who would have thought... by Anonymous Coward · 2016-07-14 04:52 · Score: 0

Tip-To-Tip compression really works!
Open Sources is NOT TWO WORDS! by Anonymous Coward · 2016-07-14 04:57 · Score: 0

No wonder everyone thinks that is silly.
1. Re: Open Sources is NOT TWO WORDS! by cc1984_ · 2016-07-14 06:35 · Score: 1
  
  It is two words, but most people apply middle-out compression to make it one word.
Wow by tylersoze · 2016-07-14 05:08 · Score: 2

It can both compress *and* decompress.
1. Re: Wow by Anonymous Coward · 2016-07-14 05:18 · Score: 1
  
  Just the other day, I developed a slightly lossy compression algorithm with an infinite Weissman score and 100% compression.
  Still working on the decompression step.
2. Re:Wow by Anonymous Coward · 2016-07-14 05:19 · Score: 0
  
  I've created an algorithm that can compress any JPEG down to a single byte. Tomorrow I'm quitting my job so I can start working on the decompression algorithm.
3. Re: Wow by NotInHere · 2016-07-14 05:33 · Score: 1
  
  Your awesome compression algorithm works so well, I can paste all of the decompressor's code into this post. Maybe it's helpful. The code (without the " of course): ""
  All you need to do is to decompress it once manually.
4. Re:Wow by 110010001000 · 2016-07-14 05:46 · Score: 1
  
  According to the summary it can decode back to the original bit. Slashdot 2016.
5. Re: Wow by Anonymous Coward · 2016-07-14 06:34 · Score: 0
  
  Pigzip?
6. Re:Wow by ShanghaiBill · 2016-07-14 09:00 · Score: 1
  
  It can both compress *and* decompress.
  That is actually very important. I know from first hand experience that compression can be much faster if later decompression is not a requirement.
7. Re:Wow by Anonymous Coward · 2016-07-14 09:48 · Score: 0
  
  I am mesmerized by that too (still glowing from excitement). what a great innovation! compression was no problem for ages. it's the decompression that always gets us. thank you dropbox I feel so relieved.
8. Re: Wow by Anonymous Coward · 2016-07-14 09:50 · Score: 0
  
  I have the perfect compression algorithm. decompression will likely use some probability calculation, not sure yet.
  mv /path/to/image.jpg /dev/null && touch /path/to/image.jpg
  compresses at 1GB/s or more, and leaves the file at size 0.
Great Name... Everyone is using it. by geek111 · 2016-07-14 05:19 · Score: 2

I'm all for companies open-sourcing cool algorithms, but the name is not a great choice. There are already several products out there called 'Lepton': there's a software CMS, and FLIR's thermal sensors are also branded 'Lepton'. (Worth noting: Lepton IS an actual word, so it probably won't qualify for trademark protection. But an Apple Corps vs. Apple Computer-like scenario is not impossible to conceive.)
Huffman alternative by hsa · 2016-07-14 05:19 · Score: 1, Informative

I worked as a part-time assistant in the Data Structures and Algorithms course 10 years ago at Helsinki University of Technology. JPEG is a lossy compression algorithm. It does not preserve the image. It creates these blocks of image data and then compresses them using Huffman encoding. The same encoding is used in zip files. Dropbox's algorithm uses the same blocks the JPEG algorithm produces (meaning that the information is still lost in compression), but uses a clever way to compress them and ditches Huffman encoding entirely. So, the old process was: 1. Encode image into coefficients (lossy). 2. Encode coefficient blocks with Huffman encoding. The new process is: 1. Encode image into coefficients (lossy). 2. Encode coefficient blocks with Lepton. Pfft.. too little, too late. JPEG is "good enough" and I don't want a huge clusterfuck of incompatibility problems with my libraries.
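Where exactly the loss happens in hsa's step 1 can be checked on a toy example. A minimal sketch, assuming a single 8-sample row, an orthonormal 1-D DCT and one uniform quantization step of 10 (JPEG itself uses a 2-D DCT on 8x8 blocks and a per-coefficient quantization table): the transform alone round-trips to floating-point precision, and information is lost when the coefficients are rounded to integers.

```python
import math

N = 8  # samples per row, as in JPEG's 8x8 blocks


def _scale(k):
    # Orthonormal DCT scaling factors.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(row):
    """Orthonormal 1-D DCT-II of an 8-sample row."""
    return [
        _scale(k) * sum(row[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [
        sum(_scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k in range(N))
        for n in range(N)
    ]


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of sample pixel values
coeffs = dct(row)

# Transform only: reconstruction matches up to float rounding.
print(max_error(idct(coeffs), row))

# Quantize (these integers are what the entropy coder sees), then dequantize.
step = 10
quantized = [round(c / step) for c in coeffs]
restored = idct([q * step for q in quantized])
print(quantized)
print(max_error(restored, row))  # clearly nonzero: the rounding threw information away
```

Everything after the rounding (Huffman in JPEG, Lepton's coder in Dropbox's case) operates on those integers and can be exactly reversed.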
1. Re:Huffman alternative by DreadPiratePizz · 2016-07-14 05:22 · Score: 2
  
  Encoding the image with the coefficients is not the lossy part. The lossy part is when you ditch the coefficients which contribute little to the image, and when you downsample the chroma.
2. Re:Huffman alternative by Anonymous Coward · 2016-07-14 05:49 · Score: 3, Insightful
  
  So you are an idiot. If you run a JPEG through Lepton the ORIGINAL file (from Lepton's point of view) is the JPEG. Not the Nikon raw file which it has no knowledge of.
3. Re:Huffman alternative by Anonymous Coward · 2016-07-14 05:55 · Score: 0
  
  Obviously you will not get the nikon RAW bit for bit, as the information was thrown away in the JPEG quantization step. If you would use the JPEG compression without quantisation (a lossless JPEG like this is standardized, but AFAIK not used very often) you would get the RAW back, bit for bit.
  This algorithm undoes the Huffman compression on JPEG to get the original coefficients back, and then compresses those in a more effective way.
  FYI: dropbox also compresses and dedupes your non-picture files, it is in the EULA
4. Re:Huffman alternative by B1 · 2016-07-14 05:57 · Score: 4, Informative
  
  This isn't about restoring a JPEG file back into its original RAW format. The information lost from converting RAW to JPEG is gone. There is no way to get that back.
  This is about storing JPEG files more efficiently. DropBox is in the business of providing cloud storage, and it is in their best interest to keep their costs as low as possible. The more they can compress data for their customers, the more efficiently they use their infrastructure. Some files such as text documents are easy to compress. Some files such as JPEG files are difficult to compress, especially with lossless algorithms.
  For DropBox, this allows them to store the LEP representation of a JPEG file instead of the actual JPEG file. This saves them approximately 22% of their storage needs. They can then decompress it on the fly whenever a user tries to read the original JPEG file, essentially trading savings in storage costs for a bit of extra CPU demand. As long as the compression is lossless and the user sees acceptable performance, there is no user impact.
  Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
5. Re:Huffman alternative by Anonymous Coward · 2016-07-14 05:58 · Score: 0
  
  Yep. Two things strike me as significant about this press release. First, the claim that it reproduces the original file "bit-for-bit perfectly". So, if I take a Nikon raw file and convert it to JPEG, then Lepton it, I can get back the Nikon raw file "bit-for-bit perfectly"? I don't think so.
  No, you get the converted JPEG file, bit-for-bit perfect. The moment you convert a raw file to another format (even if visually lossless) you may be throwing away information (EXIF data, proprietary vendor extensions for white balance, etc). You can obviously compress the raw file instead of converting it (ex: zip or rar) and be sure to get the original file back, but the compression ratio would most likely be poor, though there are some compressors that can get 30 to 50% compression on some raw formats.
6. Re:Huffman alternative by Anonymous Coward · 2016-07-14 06:09 · Score: 0
  
  You MUST have autism.
7. Re:Huffman alternative by Anonymous Coward · 2016-07-14 06:10 · Score: 1
  
  "Encoding the image into coefficients" is imprecise, but if you want to split JPEG encoding into two steps, the lossless Huffman coding and the part before that, then the first part is the lossy step in which the majority of the compression is achieved. The technical term for the conversion from the (in principle arbitrary precision) floating point coefficients to the more compact integer coefficients is "quantization". The quality factor controls the amount of information which is lost by setting the granularity of this quantization. After you've computed the integer coefficients, there is no further loss of information in JPEG coding.
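  A minimal Python sketch of the quantization step described above. The coefficient values and step sizes are hypothetical (real JPEG uses 8x8 tables scaled by the quality factor). Dividing and rounding is where information is lost; once you have the integers, re-quantizing the decoder's output gives the same integers back, so nothing further is lost.

```python
# Hypothetical DCT coefficients for a few frequencies of one block (floats).
coefficients = [-415.4, -30.2, -61.2, 27.2, 56.1, -20.1, -2.4, 0.5]
# Hypothetical quantization steps: larger steps mean lower quality, more loss.
steps = [16, 11, 10, 16, 24, 40, 51, 61]

def quantize(coefs, steps):
    """Lossy step: map each float coefficient to a small integer.

    Note: Python's round() rounds exact .5 ties to even; real encoders
    may round ties differently.
    """
    return [round(c / q) for c, q in zip(coefs, steps)]

def dequantize(levels, steps):
    """Decoder side: multiply back; the rounding error cannot be undone."""
    return [lvl * q for lvl, q in zip(levels, steps)]

levels = quantize(coefficients, steps)
restored = dequantize(levels, steps)
print(levels)
print(restored)
# Re-quantizing the restored values reproduces the same integers:
# after quantization, the remaining steps are lossless.
print(quantize(restored, steps) == levels)
```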
8. Re:Huffman alternative by Anonymous Coward · 2016-07-14 06:13 · Score: 2, Interesting
  
  If they're smart (and they are) the decompression will happen on the user's computer, in the web browser/native client.
  Making it (almost) a free lunch for dropbox.
9. Re:Huffman alternative by kylemonger · 2016-07-14 06:19 · Score: 1
  
  But second, they claim they've been doing this to images uploaded to Dropbox. [...] But what happens when they find out their new algorithm -- which compresses AND decompresses! -- has a bug when it hits a certain data condition, and sorry, all your images are corrupted because the EXIF data common to them all triggered the bug.
  Assume that the engineers behind this aren't morons. Failing that, read the article. For every newly compressed image, Dropbox does a decompression and a bit-for-bit comparison with the original before replacing the original. If there's an image that triggers a bug that corrupts the image for whatever reason, their test will catch it before the original image is replaced.
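  The verify-before-replace safeguard can be sketched in a few lines of Python. This is a minimal sketch, not Dropbox's code: zlib stands in for the Lepton encoder/decoder, and the function names are hypothetical.

```python
import zlib

def compress(data: bytes) -> bytes:
    # Stand-in codec: zlib, NOT Lepton. Swap in the real encoder here.
    return zlib.compress(data, 9)

def decompress(blob: bytes) -> bytes:
    # Stand-in decoder matching compress() above.
    return zlib.decompress(blob)

def recompress_if_safe(original: bytes):
    """Return the compressed blob only if it round-trips bit-for-bit.

    Returns None when encoding/decoding fails or the output differs,
    so the caller keeps the original file untouched.
    """
    try:
        blob = compress(original)
        roundtrip = decompress(blob)
    except Exception:
        return None  # codec crashed on this input: keep the original
    if roundtrip != original:
        return None  # mismatch caught: keep the original
    return blob

# Made-up bytes loosely shaped like a JPEG (SOI ... EOI markers).
data = b"\xff\xd8" + b"example jpeg-ish bytes " * 50 + b"\xff\xd9"
blob = recompress_if_safe(data)
print(blob is not None and len(blob) < len(data))
```

  Only when the function returns a blob would the original be replaced; any bug that corrupts output shows up as a mismatch and leaves the stored file as it was.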
10. Re:Huffman alternative by virve · 2016-07-14 06:39 · Score: 4, Insightful
  
  Look, they clearly state that they operate at the level of JPEG files. So, where is the confusion coming from? They are analyzing JPEG files and using features of that format to compress the already compressed files further.
  Which I, honestly, find very impressive.
  They reproduce JPEG files in a bit-by-bit faithful fashion. And they have tested it on 16 million (or was it billion?) files, where it worked without problems, plus they don't replace user files unless they have checked that they decode correctly. I presume that the process is actually transparent to the Dropbox user.
  I don't see the problem that you have with this, sorry.
  Good work lads!
11. Re:Huffman alternative by MGalactis · 2016-07-14 07:05 · Score: 2
  
  Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
  Or more likely they'll build it into their clients and do the compression on the user's side, saving them on both disk space and bandwidth.
12. Re:Huffman alternative by wonkey_monkey · 2016-07-14 07:07 · Score: 1
  
  First, the claim that it reproduces the original file "bit-for-bit perfectly".
  By "original file," they mean a JPEG.
  
  --
  systemd is Roko's Basilisk.
13. Re:Huffman alternative by AikonMGB · 2016-07-14 07:17 · Score: 2
  
  Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
  Better yet, do it in the client at no CPU cycle cost to Dropbox, and also reducing data transport. Dropbox controls the desktop, mobile, and web clients, so this would be easy to do, and could revert to server-side translation from LEP to JPG for e.g. API clients etc.
14. Re:Huffman alternative by sexconker · 2016-07-14 07:30 · Score: 1
  
  Just about every step in JPEG quantization is lossy, even if you're using floating point DCT.
  If you're subsampling your chroma, you need to be shot.
15. Re:Huffman alternative by organgtool · 2016-07-14 08:01 · Score: 1
  
  You can compress the rendered area of your posts by avoiding the unnecessary use of pre-formatted text. :)
16. Re:Huffman alternative by chispito · 2016-07-14 08:21 · Score: 1
  
  This saves them approximately 22% of their storage needs.
  Correction: It saves them 22% of the storage taken up by jpegs.
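  The correction is simple arithmetic: the overall saving is the 22% scaled by the share of storage that JPEGs occupy. A small Python sketch; the JPEG shares below are made-up inputs, not Dropbox figures.

```python
def overall_savings(jpeg_share: float, jpeg_savings: float = 0.22) -> float:
    """Fraction of total storage saved when only JPEG bytes shrink.

    jpeg_share: fraction of all stored bytes that are JPEGs (assumed input).
    jpeg_savings: fraction by which each JPEG shrinks (22% per the summary).
    """
    return jpeg_share * jpeg_savings

# Hypothetical mixes of JPEG vs. other data.
for share in (1.0, 0.5, 0.25):
    print(f"JPEGs = {share:.0%} of storage -> {overall_savings(share):.1%} saved overall")
```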
  
  --
  The Daddy casts sleep on the Baby. The Baby resists!
17. Re:Huffman alternative by Motherfucking+Shit · 2016-07-14 08:41 · Score: 1
  
  Pfft.. too little, too late. JPEG is "good enough" and I don't want a huge clusterfuck of incompatibility problems with my libraries.
  
  In terms of widespread adoption, I think you're right, Joe's Image Viewer is unlikely to ever come with Lepton support. But I wouldn't dismiss this so quickly, as large sites might force the issue into the browser space.
  Take Facebook as an example, think of the trillions of photos they store (they claim 2 billion are uploaded each day). Facebook archives older, infrequently-accessed photos to Blu-Ray and has an army of jukeboxes ready to swap in discs when someone actually tries to load that family reunion pic from 8 years ago. Gaining another 20% on compression means not just 20% less live storage, but also 20% fewer optical discs, 20% smaller backups, 20% fewer disc-swapping robots, 20% less square footage to lease and cool... We're talking millions and millions of dollars in savings. Facebook would be stupid not to hand Mozilla a chunk of that money and say "Lepton, implement it." Google and Microsoft would realize their own enormous cost savings by putting Lepton capability into their respective browsers.
  
  --
  "BSD: Free as in speech. Linux: Free as in beer. Windows 10: Free as in herpes." --Man On Pink Corner in #52607549.
18. Re:Huffman alternative by Anonymous Coward · 2016-07-14 09:10 · Score: 1
  
  The primary use case for JPEG compression is storing digital photos. With few exceptions, the chroma information in digital photos is interpolated from Bayer pattern sensors, so the chroma information is naturally lower resolution than the luma information. The interpolated information is often reduced in the camera hardware, before the data is even written to main memory, where it is stored in a subsampled YUV format. You would first have to interpolate it in order to expensively store it at "full resolution" in a lossy JPEG. If you shoot people for not doing that, you're the one who needs to be shot.
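  For reference, 4:2:0 chroma subsampling keeps luma at full resolution and stores one chroma sample per 2x2 block of pixels. A minimal Python sketch using plain averaging (real encoders and cameras may filter differently); the Cb values are made up, and the plane is assumed to have even width and height.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (4:2:0 subsampling).

    plane: list of rows of numbers; width and height assumed even.
    """
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

# Hypothetical 4x4 Cb (blue-difference) plane.
cb = [
    [100, 102, 110, 112],
    [ 98, 100, 108, 110],
    [ 90,  90, 120, 124],
    [ 92,  92, 122, 126],
]
print(subsample_420(cb))  # 2x2 result: a quarter of the chroma samples
```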
19. Re:Huffman alternative by Obfuscant · 2016-07-14 09:14 · Score: 1
  
  This isn't about restoring a JPEG file back into its original RAW format.
  
  I know what it is really about, thanks. What I pointed out is that "bit-for-bit perfectly" of "the original file" is nonsense and is just marketing hype.
  ANY lossless compression will return "the original file" "bit-for-bit perfectly" when "the original file" is considered to be what the lossless compressor starts with. That's a tautology. It's a useless statement. When someone says the result of their lossless compression/decompression is a "bit-for-bit perfect" copy of "the original file", it is reasonable to assume they meant "the original file" to be something other than what their compressor starts with, otherwise they're just wasting words and time.
  There is another "original file" that they could refer to that doesn't create a useless statement on their part -- the original from which the JPEG was created. If you do high-level photographic work, for example, your "original file" will be the raw file from the camera image sensor. You process that and then save a JPEG. You would NEVER claim that the JPEG was "the original file", because it is not.
  So, I think the point has been made, they are speaking nonsense trying to impress people who don't know better. "Oh my, how great this lossless compression system is -- it can return a bit-for-bit copy of the file it started with!" Yes, that's what "lossless" means, thank you. If that's all you're saying, why bother?
  
  The information lost from converting RAW to JPEG is gone. There is no way to get that back.
  That was my point. The ORIGINAL is not recoverable, when you use a meaningful definition of "original".
20. Re:Huffman alternative by Anonymous Coward · 2016-07-14 11:53 · Score: 0
  
  nobody ever said the original file was recoverable. please argue about something someone said outside your head.
21. Re:Huffman alternative by thegarbz · 2016-07-14 19:39 · Score: 1
  
  Which I, honestly, find very impressive.
  Not to belittle their achievement, but what do you find impressive about someone beating a compression algorithm that is 23 years old? In terms of image storage, JPEG is old hat, beaten by many formats in absolute terms.
  JPEG also screws up the image quite a lot (but does so in an eye-pleasing way), which certainly leads to better lossless compression afterwards than on the original image. Take a look at the blue channel of a JPEG image, for instance, and you'll see why that channel in particular would be trivial to compress well losslessly. The same cannot be said about the RAW original data.
22. Re: Huffman alternative by Anonymous Coward · 2016-07-15 00:34 · Score: 0
  
  "Yes, that's what "lossless" means, thank you. If that's all you're saying, why bother?"
  It is called "emphasis."
doesn't flif by Anonymous Coward · 2016-07-14 05:24 · Score: 0

already compress much better?
Again. by SuricouRaven · 2016-07-14 05:31 · Score: 1

We've been here before. JPEG2000, webp, BPG, JPEG XR. There are many formats that are superior to JPEG. And look - none of them caught on!
Why? Because JPEG, though far from the best modern algorithms could offer, is still 'good enough' for most purposes. It's also supported by every web browser, photo viewer, image editor, mobile phone, camera, digital picture frame, slideshow maker and every other thing that might need to process an image. A new format, no matter how superior, cannot offer the same ubiquitous support - and without that support it will never become widely used enough for developers to spend time including support for it.
We can't even get rid of MP3, and there are more formats than I can count, both open and proprietary, that could do everything MP3 does but better.
1. Re:Again. by Anonymous Coward · 2016-07-14 05:50 · Score: 4, Insightful
  
  This isn't a "better than JPEG" format. It's a "store existing JPEG files your users upload & use more efficiently" format. Flickr, for instance, could theoretically save 22% of its disk space using this.
2. Re:Again. by Ramze · 2016-07-14 05:59 · Score: 4, Insightful
  
  It's not a file format, it's a compression algorithm that happens at the data storage level. This is similar to compressing a hard drive -- the files are individually compressed, but the file formats are the same, and the OS handles the compression/decompression seamlessly so that the applications don't even know they're accessing compressed versions of the file formats they normally use.
  You can keep all your JPEGs, and with the open-source license, compress the contents of a drive or partition with this algorithm and save maybe 20% or so of the space the JPEG files took up. Not worth it for most people but photographers and image sites might save a lot of money using this.
3. Re:Again. by im_thatoneguy · 2016-07-14 06:08 · Score: 1
  
  The point is that this can be implemented server side to save storage of existing JPEGs. This is a better way to store existing images, not a better way to compress images.
4. Re:Again. by swb · 2016-07-14 06:14 · Score: 1
  
  I wonder if there's a way to come up with a format that decoders would process as a JPEG but only containing a preview-quality image but have the rest of the file be some higher quality version of the image in a more advanced format for a format-aware decoder. And do it all in a total file size better than high quality JPEG.
  You'd get backwards compatibility (albeit with degraded quality) but higher quality than existing JPEG.
  Although with storage continually getting better and cheaper, you have to work miracles in terms of size and quality to make changing from JPEG worthwhile. Like high bitrate MP3s, they are extremely good enough for most people.
5. Re:Again. by Kjella · 2016-07-14 09:22 · Score: 1
  
  Not worth it for most people but photographers and image sites might save a lot of money using this.
  I would think most serious photographers keep the RAW files which are much bigger and will dominate their storage. And even MP monsters only produce ~20MB jpegs so ~200,000 photos on a $99 4TB drive. Pretty sure you won't bother with this unless you're Dropbox, Facebook or some other big image site with many, many millions of photos.
  
  --
  Live today, because you never know what tomorrow brings
6. Re:Again. by squeeze69 · 2016-07-14 10:12 · Score: 1
  
  And it's not really new, there is a similar program (actually, it's a mix of algorithm and heuristics) called packJPG, it also achieves similar results.
7. Re:Again. by Anonymous Coward · 2016-07-15 04:15 · Score: 0
  
  Photographer here; yes, I keep the RAW files. I also keep .jpg versions of many photos; for daily viewing or showing them to people, the .jpg is smaller, so it loads much faster and is more convenient. But absolutely the RAW file is kept for everything!
Lepton under an Apache by allquixotic · 2016-07-14 05:33 · Score: 1

So a program named literally "Lepton under an Apache" that happens to also, confusingly, be an open source license (*and* a program)?
Okaaaaaaay.... ...Took me like a minute to figure out it was saying
"...dubbed 'Lepton,' under an Apache open-source license..."
ZIP rules them a!! by Anonymous Coward · 2016-07-14 05:35 · Score: 0

and from a drunk, no less. A dead drunk, at that.
Phil Zimmerman.
Look it up.
1. Re:ZIP rules them a!! by Anonymous Coward · 2016-07-14 05:54 · Score: 0
  
  what the hell does this post mean?
  zip compression has nothing to do with Phil Zimmerman, he invented PGP.
  And he's not dead, he works at Silent Circle.
  I couldn't find any evidence of a drinking habit.
  citations plz.
2. Re:ZIP rules them a!! by 110010001000 · 2016-07-14 06:02 · Score: 2
  
  I think he meant Phil Katz who wrote PKZIP.
3. Re:ZIP rules them a!! by Anonymous Coward · 2016-07-14 06:19 · Score: 0
  
  Leaving Las Vegas was based on Katz's life.
4. Re:ZIP rules them a!! by Anonymous Coward · 2016-07-14 15:29 · Score: 0
  
  So he confused both - the technology and the name of the creator :) fail
Everybody should use DOKAN and ENCFS by Anonymous Coward · 2016-07-14 05:37 · Score: 0

Everybody should use DOKAN and ENCFS on cloud shares such as DropBox.
Compress that :)
So they beat PiedPiper to market? by Anonymous Coward · 2016-07-14 05:44 · Score: 0

I feel bad for our boys....
Lepton vs. Leptonica by mi · 2016-07-14 06:56 · Score: 0

Dropbox' software is called "lepton". There is an image-processing library called Leptonica — could someone comment on the relationship, if any?

--
In Soviet Washington the swamp drains you.
Pied piperer lives! by Anonymous Coward · 2016-07-14 07:03 · Score: 0

All hail pied piperer.
Main committer even looks like... by Anonymous Coward · 2016-07-14 07:19 · Score: 0

The guy from Pied Piper
https://github.com/danielrh
So... by Dan+East · 2016-07-14 08:34 · Score: 1

So this means instead of getting 5 GB free storage, I should get 22% more if I'm storing JPEGs, so I get 6.1 GB free storage now? ;)

--
Better known as 318230.
Evil patents by Ilgaz · 2016-07-14 09:01 · Score: 2

It is because the idiots in the JPEG 2000 committee did everything to keep people, especially web browser development teams, away from that excellent format.
Now, with 4K monitors and ultra-high-resolution phones around, watching web developers struggle with 5-6 different files of the same photo, I really feel pity. That was a solved problem: both multiple bandwidths & resolutions and the compression rate.
There is a reason we deal with JPEG files today: ask the JP2 committee. Even MS stayed away from it, fearing the patents.
1. Re:Evil patents by miknix · 2016-07-14 09:30 · Score: 1
  
  Exactly! The patents are not just underlying JPEG 2000; they affect everything from multi-resolution analysis to the algorithms of DWTs. I speculate it's the reason why MPEG stayed away from it, even though the DWT is clearly superior to the DCT.
  Today it is not just UHD resolutions that would benefit from JPEG 2000. JPEG 2000 also supports arbitrary-precision coding, which suits the HDR content that is starting to pop up in consumer devices and cinema.
  Anyway, this is a lost cause because everybody is moving to MPEG/AVC which addresses all of that.
abandon standard format numbers by peter303 · 2016-07-14 09:26 · Score: 1

Much of their compression comes from the fact that they don't use full 32-bit floats or integers to store the discrete cosine transform coefficients, but variable-bit-length numbers which can be squished more tightly. I didn't read the paper deeply enough to study how efficient this bit hacking is in machine operations. There might be a few clever tricks there. Bit hacking was more common in the early days of computers, when core memory was very expensive. I recall Woz had some clever way of compressing color and shape graphics in the Apple II.
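The general idea of spending fewer bits on small numbers can be illustrated with the scheme baseline JPEG itself uses for quantized coefficients: a size category (the number of bits needed for the magnitude, which is then entropy-coded) plus just enough raw bits for the value. A minimal Python sketch of that split; it shows JPEG's own representation, not Lepton's coder, and the sample values are made up.

```python
def jpeg_category_and_bits(value: int):
    """Split a quantized coefficient into (size category, amplitude bits).

    Mirrors baseline JPEG: the category is the bit length of |value|,
    and negative values are stored as value + 2**size - 1.
    """
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""  # zero needs no amplitude bits at all
    amplitude = value if value > 0 else value + (1 << size) - 1
    return size, format(amplitude, f"0{size}b")

values = [0, 1, -1, 5, -5, 300]  # hypothetical quantized coefficients
for v in values:
    size, bits = jpeg_category_and_bits(v)
    print(v, size, bits or "-")

# Raw amplitude bits used, versus 32 bits per value in a fixed-width int.
# (The size categories themselves still have to be entropy-coded on top.)
print(sum(jpeg_category_and_bits(v)[0] for v in values), "vs", 32 * len(values))
```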
1. Re:abandon standard format numbers by Anonymous Coward · 2016-07-14 09:54 · Score: 0
  
  His nickname is a testament to his great compression skills indeed. Decompression is distributed to the recipient(s). Very clever indeed.