Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm
An anonymous reader writes: In Dropbox's "Hack Week" this year, a team of ten engineers built the fantasy Pied Piper algorithm from HBO's Silicon Valley, achieving a 13% lossless size reduction on mobile-recorded H.264 videos and 22% on arbitrary JPEG files. Their algorithm can return the compressed files to their bit-exact values. According to Fast Company, "Its ability to compress file sizes could actually have tangible, real-world benefits for Dropbox, whose core business is storing files in the cloud." The code is available on GitHub under a BSD license for people interested in advancing the compression or archiving their movie files.
...Horn and his team have managed to achieve a 22% reduction in file size for JPEG images without any notable loss in image quality....
Without any notable loss in image quality.
Hmmm... that does not sound like "bit-exact" to me.
How much CPU time does it take to compress and decompress? Standard compression is hardly the best; it is just a good compromise between compression ratio and usability.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
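The CPU-time question can be measured directly. Below is a minimal sketch, assuming Python 3.9+ and using only the standard-library zlib, bz2 and lzma codecs as stand-ins (it does not call the Dropbox tool); the function name benchmark is hypothetical. It reports compressed size as a fraction of the original, plus the CPU seconds spent compressing and decompressing:

```python
import bz2
import lzma
import time
import zlib

# Standard-library codecs used as stand-ins; none of this calls the Dropbox tool.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def benchmark(data: bytes) -> dict[str, tuple[float, float, float]]:
    """Map codec name -> (compressed/original size, compress CPU s, decompress CPU s)."""
    if not data:
        raise ValueError("need non-empty input")
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.process_time()  # process CPU time, not wall-clock time
        packed = compress(data)
        t1 = time.process_time()
        restored = decompress(packed)
        t2 = time.process_time()
        if restored != data:
            raise ValueError(f"{name} did not reproduce the input")
        results[name] = (len(packed) / len(data), t1 - t0, t2 - t1)
    return results
```

Passing it the bytes of a real JPEG or H.264 file shows how little these generic codecs gain on already-compressed media, and what that gain costs in CPU time.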
Comparing this to PNG or H.265 misses the point: this is not a compression algorithm for creating new files. It is a way to take files you already have and make them smaller. Users are going to upload JPG and H.264 files to Dropbox; that is a given. So saying PNG is better is moot.
It would be nice to compare it against PNG, but the context here is storing other people's data, where you have no control over what format they use.
Meh, doesn't matter. Any processing load will be moved to an unoptimized JavaScript implementation that runs in the end user's browser.
And yet you can download the source code yourself and compile it.
"-1 Troll" is apparently the same as "-1 I disagree with you."
Link to a layman's description of the algorithm here: https://raw.githubusercontent.... It's bit-exact and lossless. We haven't done comprehensive studies, but on the included test files it gets 13% compression on H.264 movies. Similarly, the not-yet-committed but similar JPEG algorithm gets 22% on a comprehensive sample set of photos from a variety of devices.
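A "bit-exact" claim is easy to check mechanically: compress, decompress, and compare the result against the original. Below is a minimal sketch, assuming Python 3, with zlib standing in for whatever codec is under test; round_trips_exactly is a hypothetical helper name, not part of the Dropbox code:

```python
import hashlib
import zlib
from typing import Callable

def round_trips_exactly(original: bytes,
                        compress: Callable[[bytes], bytes],
                        decompress: Callable[[bytes], bytes]) -> bool:
    """True if decompress(compress(original)) reproduces every bit of the input."""
    restored = decompress(compress(original))
    # Compare SHA-256 digests: the same check you could run against a digest
    # recorded when the file was first uploaded.
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()

sample = bytes(range(256)) * 64
print(round_trips_exactly(sample, zlib.compress, zlib.decompress))  # True
# A "codec" that silently drops the last byte fails the check.
print(round_trips_exactly(sample, lambda b: b[:-1], lambda b: b))   # False
```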
Can it compress 3D videos? That seems to be a real challenge.
I wonder if somebody can develop this into a transparent kernel module.
Saving 13-22% on a video library could mean several hundred GB on a multi-terabyte collection (for example, 13% to 22% of 2 TB is roughly 260 to 440 GB). Depending on whether it decompresses on the fly and how hard it is on the CPU, it may also reduce disk I/O somewhat.
H.264 and JPEG are supposed to output random-looking bytes, by definition.
If you can compress those, something is very wrong.
Where'd you get that idea?
I also tried it on a max-compressed file: I opened test.jpg in GIMP, saved it with quality at 0 (lowest) as test2.jpg, and re-ran the compression on both:
$ bzip2 test.jpg ...
$ gzip -9 test.jpg
$ ls -la
-rw-r--r-- 1 me me 1519279 Feb 7 2012 test.jpg
-rw-r--r-- 1 me me 1430059 Aug 28 16:42 test.jpg.bz2
-rw-r--r-- 1 me me 1427872 Aug 28 16:44 test.jpg.gz
-rw-rw-r-- 1 me me 189230 Aug 28 16:50 test2.jpg
-rw-rw-r-- 1 me me 111623 Aug 28 16:50 test2.jpg.bz2
-rw-rw-r-- 1 me me 117971 Aug 28 16:51 test2.jpg.gz
Feel free to try the same experiment yourself on random jpg's you find online, or your own.
The goal of H.264 and JPEG isn't minimum file size at all costs. It's also not encryption. Your premise is wrong, and even old tech can compress this stuff further than it may already be.
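The ls listing above is easier to compare as percentages. A small sketch, assuming Python 3, with the byte counts copied from that listing; the savings helper is a hypothetical name:

```python
# Byte counts copied from the `ls -la` listing above.
SIZES = {
    "test.jpg": (1519279, {"bz2": 1430059, "gz": 1427872}),
    "test2.jpg": (189230, {"bz2": 111623, "gz": 117971}),
}

def savings(original: int, compressed: int) -> float:
    """Percentage of the original size removed by compression."""
    return 100.0 * (original - compressed) / original

for name, (original, variants) in SIZES.items():
    for codec, size in variants.items():
        print(f"{name} -> {codec}: {savings(original, size):.1f}% smaller")
# test.jpg -> bz2: 5.9% smaller
# test.jpg -> gz: 6.0% smaller
# test2.jpg -> bz2: 41.0% smaller
# test2.jpg -> gz: 37.7% smaller
```

So the ordinary JPEG shrinks by about 6% under either tool, while the quality-0 file shrinks by roughly 38 to 41%.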
H.264 and JPEG are supposed to output random-looking bytes, by definition. If you can compress those, something is very wrong.
Well, it seems to be applied per codec, not as a general-purpose compression algorithm like zip. And they probably say mobile-encoded for a reason: simple encoders have to work on low power and in real time, and random JPGs from the Internet are probably the same. From what I can gather, the algorithm basically takes a global scan of the whole media file and applies an optimized variable-length transformation, making commonly used values shorter at the expense of making less commonly used values longer. That is nothing you couldn't do with a proper two-pass encoding in the codec itself; the neat trick is doing it to someone else's already-compressed media afterwards, in a bit-reversible way. Very nice when you're a third-party host, assuming the increase in CPU time is worth it, but not so useful for everyone else.
Live today, because you never know what tomorrow brings
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
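The variable-length scheme described in the comment above (a first pass to gather symbol statistics, then shorter codes for common values and longer codes for rare ones) is the classic Huffman-coding idea. The sketch below illustrates only that general idea, assuming Python 3.9+; it is not Dropbox's algorithm, and the function names are hypothetical:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(data: bytes) -> dict[int, str]:
    """First pass: count byte frequencies and build a prefix-free code."""
    freq = Counter(data)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Only one distinct symbol: give it a 1-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; their symbols get one bit longer.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

def encode(data: bytes, code: dict[int, str]) -> str:
    """Second pass: replace each byte with its codeword (bits as '0'/'1' text)."""
    return "".join(code[b] for b in data)

def decode(bits: str, code: dict[int, str]) -> bytes:
    """Walk the bit string; the prefix-free property makes each match unambiguous."""
    inverse = {c: s for s, c in code.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)
```

On b"aaaaaaaabbbbccd", the code gives a 1 bit, b 2 bits, and c and d 3 bits each, so the 15 bytes encode to 25 bits instead of 120, and decode restores the input exactly.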
Try the -h option for ls; calculating is for computers.
Not really useful in this context, because it truncates significant digits.
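To see the point about lost digits, here is a rough approximation of how ls -h prints sizes, assuming GNU-style behavior (powers of 1024, rounding up, one decimal place below 10). human_size is a hypothetical name and this is not the coreutils implementation:

```python
import math

def human_size(n: int) -> str:
    """Approximate `ls -h` output: scale by 1024 and round up."""
    units = ["", "K", "M", "G", "T"]
    size, unit = float(n), 0
    while size >= 1024 and unit < len(units) - 1:
        size /= 1024
        unit += 1
    if unit == 0:
        return str(n)  # plain byte count
    if size < 10:
        tenths = math.ceil(size * 10)  # round up to one decimal place
        if tenths < 100:
            return f"{tenths / 10:.1f}{units[unit]}"
    return f"{math.ceil(size)}{units[unit]}"
```

Both compressed versions of test.jpg (1430059 and 1427872 bytes) come out as 1.4M, so the 2187-byte difference between bzip2 and gzip disappears.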
The tiny bit of the Slashdot community that is left still talks about the actual things. If this were on Reddit, it would just be a stream of lame, overused references to the Silicon Valley show. Somebody would say "This guy fucks". Somebody else would make a joke about "Optimal tip-to-tip efficiency". Then somebody would ask "Do you know what tres commas means?"
Those things were hilarious when put forth by a group of comedic actors. They are incredibly lame when they are overused every single time something even comes tangentially close to referencing them.
So while this particular story still sucks... it could be a lot worse.
Bottles.
H.264 and JPEG are supposed to output random-looking bytes, by definition.
Bullshit. JPEG, *by its definition*, uses a fairly modest and inefficient compression algorithm after the quantization step, because it was designed to run on embedded systems with very modest processing power.
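One rough way to test the "random-looking bytes" claim is to measure a file's order-0 byte entropy: uniformly random bytes sit at 8 bits per byte, and anything lower leaves room even for a simple memoryless coder. A sketch assuming Python 3; byte_entropy and order0_size_bound are hypothetical names. A result close to 8.0 does not prove a file is incompressible, since a model that understands the format (as a JPEG-specific recompressor does) can exploit structure this per-byte statistic ignores:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def order0_size_bound(data: bytes) -> float:
    """Smallest size in bytes a memoryless per-byte coder could reach
    (ignoring the cost of storing its code table)."""
    return byte_entropy(data) * len(data) / 8
```

Running byte_entropy on the bytes of test.jpg and test2.jpg from the listing earlier in the thread would show how far each sits below the 8-bit ceiling.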
It depends on whether the goal is to (a) market a hip algorithm or (b) store movies more efficiently.
Open source makes it easy for anyone to contribute to the algorithm.
The more people contribute, the better the code will be at compressing movies.
The better it is at compressing movies, the fewer resources it will take to store them.
This isn't a zero-sum game we're talking about: it's about making the world a more efficient place, one bit at a time.
But the bottom line is that it's a lot easier for many organizations to contribute to a code base if there are no strings attached.
Interest from an article like this can get people playing around with compression.
Maybe another 10% gain is right around the corner.