Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm
An anonymous reader writes: In Dropbox's "Hack Week" this year, a team of ten engineers built the fantasy Pied Piper algorithm from HBO's Silicon Valley, achieving 13% lossless compression on Mobile-recorded H.264 videos and 22% on arbitrary JPEG files. Their algorithm can return the compressed files to their bit-exact values. According to FastCompany, "Its ability to compress file sizes could actually have tangible, real-world benefits for Dropbox, whose core business is storing files in the cloud."The code is available on GitHub under a BSD license for people interested in advancing the compression or archiving their movie files.
I'm the author of the algorithm and it's bit-exact. It has no quality loss. I just committed a description of the algorithm https://raw.githubusercontent.... It is bit exact and lossless: you can get the exact bits of the file back :-)
Very insightful comments... let me go into detail
I would say we have several advantages over H.264
a) Pied Piper has more memory to work with than an embedded device (bigger model)
b) Pied Piper does not need to seek within a 4 Megabyte block (though it must be able to stream through that block on decode) whereas H.264 requires second-by-second seekability (more samples in model).
c) Pied Piper does not need to reset the decoder state on every few macroblocks (known as a slice), whereas H.264 requires this for hardware encoders (again, more samples per model).
d) As opposed to a committee that designed H.264, Pied Piper had a team of 10 creative Dropboxers and guests, spending a whole week identifying correlations between the decoder state and the future stream. That is a big source of creativity! (design by commit, not committee)
Our algorithm is, however streaming---and it's happiest to work with 4 MB videos or bigger
Our decode window is a single previous frame--so we can pull in past information about the same macroblock-- but we only work in frequency space right now (there are some pixel space branches just to play with, but none has yielded any fruit so far) so the memory requirements are quite small.
We are doing this streaming with just the previous frame as our state--- and it may matter--but we have a lot of work to do to get very big wins on CABAC... but given that we're not limited by the very small window and encoding parallelization requirements that CABAC is tied to, Pied Piper could well be useful soon!
We also use arithmetic coding...but the gist of the improvement is that we have a much better model and a much better arithmetic coder (the one that VP8 uses) than JPEG did back then. I tried putting the JPEG arithmetic coder into the algorithm and compression got several percent worse, because that table-driven Arithmetic Coder just isn't quite as accurate as keeping counts as the VP8 one.
No one has tried to undo and redo compression of video files before. There are still doom9 forum posts asking for this feature from 12 years ago. I would say that saving lossless percentage points off of real world files is novel and important. And, since it's open source, if someone else gets more %age improvement than what we have, it could become as transformative as you describe. :-)
But the point is that we have something that's currently useful. It's out there and ready to be improved. It's lossless. And it has never before been tried.
Also we did the entire algorithm in a week and aren't out of ideas!
Besides we never claimed it was a revolution--leave that sort of spin to the marketeers...
we're engineers trying to make things more efficient, a few percentage points at a time
Rather than abuse every commenter who has not joined your specialty on Slashdot, please take the source and write about what you find.
Given that CPU and memory get less expensive over time, it is no surprise that algorithms work practically today that would not have when various standards groups started meeting. Ultimately, someone like you can state what the trade-offs are in clear English, and indeed whether they work at all, which is more productive than trading naah-naahs.
Bruce Perens.