Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm
An anonymous reader writes: In Dropbox's "Hack Week" this year, a team of ten engineers built the fantasy Pied Piper algorithm from HBO's Silicon Valley, achieving 13% lossless compression on mobile-recorded H.264 videos and 22% on arbitrary JPEG files. Their algorithm can return the compressed files to their bit-exact values. According to FastCompany, "Its ability to compress file sizes could actually have tangible, real-world benefits for Dropbox, whose core business is storing files in the cloud." The code is available on GitHub under a BSD license for people interested in advancing the compression or archiving their movie files.
...Horn and his team have managed to achieve a 22% reduction in file size for JPEG images without any notable loss in image quality....
Without any notable loss in image quality.
Hmmm... that does not sound like "bit-exact" to me.
What are the real numbers? 13% compression is negligible, really. But then, that is compressing already-compressed data (H.264 and JPEG).
What compression ratio can they achieve on the original uncompressed data? How does this new compression compare to H.265 compression of MPEG data?
No benchmarks vs GIF or PNG? Article fail.
What's the Weissman score for this algorithm on interesting and representative media?
No description of the algorithm. No performance measurements. No solid data. No useful information. No story.
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
"22% better compression" without "notable" quality loss on files which are ALREADY compressed in formats in which loss may be apparent is a far cry from their ultimate "goal" of "lossless" compression.
Comparing this to PNG or H.265 is missing the point: this is not a compression algorithm for creating new files. This is a way to take files you already have and make them smaller. Users are going to upload JPEG and H.264 files to Dropbox, that is a given, so saying PNG is better is moot.
H.264 and JPEG are supposed to output random-looking bytes, by definition.
If you can compress those, something is very wrong.
Time for the new wave of Stacker clones, maybe a new DoubleSpace err DriveSpace?
Twinstiq, game news
We put a spoiler on a Prius.
Have gnu, will travel.
Can it compress 3d videos? That seems to be a real challenge.
I wonder if somebody can develop this into a transparent kernel-module.
13-22% of a video library could mean saving several hundred GB on a multi-terabyte collection. Depending on if it decompresses on-the-fly and how hard it is on a CPU, it may also reduce disk I/O somewhat.
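For anyone who wants to plug in their own numbers, here is a back-of-the-envelope sketch. It assumes the 13% and 22% figures are reductions off the original size (which is how the quoted article phrases the JPEG result), and the 3 TB library size is a made-up example, not a figure from the story.

```python
def saved_gb(library_gb, percent_reduction):
    """Space saved (in GB) when files shrink by `percent_reduction` percent."""
    return library_gb * percent_reduction / 100


def remaining_gb(library_gb, percent_reduction):
    """Size left on disk (in GB) after the same reduction."""
    return library_gb - saved_gb(library_gb, percent_reduction)


library_gb = 3000  # hypothetical 3 TB collection

for percent in (13, 22):
    print(f"{percent}% off {library_gb} GB: "
          f"saves {saved_gb(library_gb, percent):.0f} GB, "
          f"leaves {remaining_gb(library_gb, percent):.0f} GB")
```

Real savings would depend on the mix of files, since the two percentages were reported for different formats.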
compression expert on Slashdot, how long would that take.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
The only thing that matters is the wiseman score they achieved.
Are you a racist that likes little girls?
I am currently merging this into the EXT4 master branch at kernel.org
8zip?
when they have made Nip Alert a reality.
Lossy codecs typically have two major stages -- the lossy parts (e.g. a DCT while throwing out some component frequencies, motion prediction, etc.) -- followed by lossless entropy coding (e.g. Huffman in JPEG) to further compress the resultant data.
These compression algorithms just decompress the lossless part of the process and then recompress it with a more efficient lossless algorithm. On decompression, it then recompresses with the standard algorithm. In some cases (e.g. JPEG) you can keep a copy of the Huffman table that lets you recompress the data into a bit-accurate copy of the original file (you can include a small bit of extra information to make sure any remaining metadata matches up exactly).
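A toy sketch of that idea, for illustration only. This is not Dropbox's code and not the JPEG bitstream format: the symbol stream, the function names, and the use of the entropy bound as a stand-in for "a more efficient lossless algorithm" are all assumptions. It undoes a Huffman layer, estimates how few bits an ideal entropy coder would need for the same symbols, and then re-encodes with the saved Huffman table to show the original bits come back exactly.

```python
import heapq
import math
from collections import Counter


def huffman_table(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 / 1.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix code: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


def ideal_bits(symbols):
    """Entropy bound in bits: what an ideal arithmetic/range coder approaches."""
    freq = Counter(symbols)
    n = len(symbols)
    return sum(-c * math.log2(c / n) for c in freq.values())


# Hypothetical, heavily skewed stream (think: mostly-zero quantized values).
symbols = [0] * 900 + [1] * 60 + [2] * 30 + [3] * 10
table = huffman_table(symbols)

original = huffman_encode(symbols, table)    # what the "file" stores
recovered = huffman_decode(original, table)  # step 1: undo the Huffman layer
restored = huffman_encode(recovered, table)  # on the way out: redo it

print("Huffman bits:       ", len(original))
print("Entropy bound bits: ", round(ideal_bits(symbols)))
print("Bit-exact restore:  ", restored == original)
```

Huffman has to spend at least one whole bit per symbol, so a very common symbol is where it loses the most ground to an arithmetic or range coder. A real tool would also have to carry the table and any stray metadata along, as noted above.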
The MacOS compression software StuffIt did this years ago.
After reducing all this dropbox grandstanding filler and chest thumping (is that corporate policy or something? this is certainly not the first time), it all boils down to:
You took frequency-space-transformed H.264 (pre-CABAC) and wrote a better range coder for it.
Yes/No?
Still pretty impressive, but for the love of god, please use succinct _technical_ descriptions. - https://raw.githubusercontent.... - is god awful, as it just describes general operation of a range coder.
Beating JPEG entropy coding is not that impressive, as that's just Huffman, which is really awful. CABAC is better, but still a decade behind top-of-the-line research (I suppose you're encode.ru regulars).
I think the poster mixed up his compression. Saying bit-exact compression is useful for cloud services is .... DUH.. Though a little late to the playing field. Any on-disk compression will be lossless by definition. Otherwise you'd be screwed anytime you zip a file.
Now if he found a better streaming compression for video that keeps H.264 size but ups the quality.. COOL! But on-disk bit-exact compression is pretty mature now. See ZFS/BTRFS. Or Stacker/DoubleSpace if you're over 35.
To clarify: do they actually have to get out of their chairs? If so, it would be pretty quick.
I may be stupid, but is that to 13% or 22% of the total, or is that 13% or 22% off the total?
Dropbox appear to be in the business of storing the existing files of clients and not forcing them to upgrade their hardware or software to support a new standard. That's where a bit of reversible compression on top instead of a complete re-encode makes sense.
On a personal scale maybe it makes sense for a user to completely re-encode all of their video files to a new standard, but I don't think many people will be doing that. On an "industrial" scale with many users it makes even less sense, so the reversible hack that saves space seems a better fit than a full, unasked-for re-encode of clients' video files.