Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm

← Back to Stories (view on slashdot.org)

Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm

Posted by Soulskill on Friday August 28, 2015 @08:15AM from the what-if-i-don't-have-a-wind-instrument-handy dept.

An anonymous reader writes: In Dropbox's "Hack Week" this year, a team of ten engineers built the fantasy Pied Piper algorithm from HBO's Silicon Valley, achieving 13% lossless compression on Mobile-recorded H.264 videos and 22% on arbitrary JPEG files. Their algorithm can return the compressed files to their bit-exact values. According to FastCompany, "Its ability to compress file sizes could actually have tangible, real-world benefits for Dropbox, whose core business is storing files in the cloud."The code is available on GitHub under a BSD license for people interested in advancing the compression or archiving their movie files.

8 of 174 comments (clear)

Min score:

Reason:

Sort:

naysayers are missing the point by Ionized · 2015-08-28 08:35 · Score: 4, Informative

comparing this to PNG or h.265 is missing the point - this is not a compression algorithm for creating new files. this is a way to take files you already have and make them smaller. users are going to upload JPG and h.264 files to dropbox, that is a given - so saying PNG is better is moot.
Re:From TFA: bit-exact or not? by danielreiterhorn · 2015-08-28 08:42 · Score: 5, Informative

I'm the author of the algorithm and it's bit-exact. It has no quality loss. I just committed a description of the algorithm https://raw.githubusercontent.... It is bit exact and lossless: you can get the exact bits of the file back :-)
Re:No description by danielreiterhorn · 2015-08-28 08:44 · Score: 3, Informative

Link to a layman's description of the algorithm here: https://raw.githubusercontent.... It's bit exact and lossless. We haven't done comprehensive studies, but on the included test files it gets 13% compression on H.264 movies. Similarly the not-committed, but similar JPEG algorithm gets 22% on a comprehensive sample set of photos from a variety of devices.
Re:Hard to believe by unrtst · 2015-08-28 08:54 · Score: 3, Informative

H.264 and JPEG are supposed to output random-looking bytes, by definitions.
If you can compress those, something is very wrong.
Where'd you get that idea?
$ bzip2 test.jpg
$ gzip -9 test.jpg
$ ls -la
-rw-r--r-- 1 me me 1519279 Feb 7 2012 test.jpg
-rw-r--r-- 1 me me 1430059 Aug 28 16:42 test.jpg.bz2
-rw-r--r-- 1 me me 1427872 Aug 28 16:44 test.jpg.gz ... I also tried it on a max-compressed file. Opened that test.jpg up in gimp, then saved with quality at 0 (lowest), and re-did the compressing on both:
-rw-rw-r-- 1 me me 189230 Aug 28 16:50 test2.jpg
-rw-rw-r-- 1 me me 111623 Aug 28 16:50 test2.jpg.bz2
-rw-rw-r-- 1 me me 117971 Aug 28 16:51 test2.jpg.gz
Feel free to try the same experiment yourself on random jpg's you find online, or your own.
The goal of H.264 and JPEG isn't minimum file size at all costs. It's also not encryption. Your premise is wrong, and even old tech can compress this stuff further than it may already be.
Re:From TFA: bit-exact or not? by danielreiterhorn · 2015-08-28 08:57 · Score: 3, Informative

This is an excellent summary and spot on! Our movie reduction claims are still early on. We'll need to find a more comprehensive set of H.264 movies to test on--and that requires the algorithm to understand B-slices and CABAC. These are both very close, but the code was only very recently developed. We're confident about the JPEG size reduction, however. If you want to learn more about how the JPEG stuff works, you can start with the open source repository from Matthias Stirner here http://www.matthiasstirner.com... Our work on JPEGs is very similarly inspired, but is completely streaming and works on partial JPEGs as well
Re:From TFA: bit-exact or not? by danielreiterhorn · 2015-08-28 09:49 · Score: 5, Informative

We also use arithmetic coding...but the gist of the improvement is that we have a much better model and a much better arithmetic coder (the one that VP8 uses) than JPEG did back then. I tried putting the JPEG arithmetic coder into the algorithm and compression got several percent worse, because that table-driven Arithmetic Coder just isn't quite as accurate as keeping counts as the VP8 one.
Re:From TFA: bit-exact or not? by sexconker · 2015-08-28 09:51 · Score: 1, Informative

Even cheap TN monitors use FRC to interpolate to 8-bit, which is better than nothing. IPS monitors can be had for $120, with an 8-bit color panel. Several gaming monitors use native 8-bit with FRC to 10-bit for less than $800, and a few even use native 10-bit.
What? Interpolation is WORSE than nothing. you're discarding signal then adding noise in the hopes that it matches up with what should've been there kinda okay.
Re:From TFA: bit-exact or not? by Megol · 2015-08-28 11:47 · Score: 4, Informative

Interpolation isn't about adding noise.
6 bit (per component) LCDs have for at least 10 years and probably much longer used dithering techniques to produce effective 16.2M colors (compared to a true 8 bit panel with 16.7M colors). This works very well for almost all use cases and provides smooth gradients but have the disadvantage that some image patters can produce flashing due to interference with the dithering algorithm.
Dithering isn't about adding noise either BTW.