Slashdot Mirror


Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm

An anonymous reader writes: In Dropbox's "Hack Week" this year, a team of ten engineers built the fantasy Pied Piper algorithm from HBO's Silicon Valley, achieving 13% lossless compression on mobile-recorded H.264 videos and 22% on arbitrary JPEG files. Their algorithm can return the compressed files to their bit-exact values. According to FastCompany, "Its ability to compress file sizes could actually have tangible, real-world benefits for Dropbox, whose core business is storing files in the cloud." The code is available on GitHub under a BSD license for people interested in advancing the compression or archiving their movie files.

174 comments

  1. From TFA: bit-exact or not? by QuietLagoon · · Score: 4, Interesting

    ...Horn and his team have managed to achieve a 22% reduction in file size for JPEG images without any notable loss in image quality....

    Without any notable loss in image quality.

    .
    Hmmm... that does not sound like "bit-exact" to me.

    1. Re:From TFA: bit-exact or not? by suutar · · Score: 1

      bit-exact is easier to test than "image quality". I suspect a less than tech-savvy reporter heard "no loss" and stuck in "notable".

    2. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      On one hand, a bit-exact compressor/embiggenor would have no notable loss in image quality because there would be no difference in the binaries.

      On the other, that is a really lame low bar to use in a report that elsewhere claims to be lossless.

    3. Re:From TFA: bit-exact or not? by JoeMerchant · · Score: 2, Interesting

      If you are viewing images on an LCD monitor, the first thing you can do is strip them down from 24bit color to 18bit color, because your sub-$1000 monitors don't display more than 6 bits per color channel.

    4. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      And they named their invention: gzip

    5. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 5, Informative

      I'm the author of the algorithm and it's bit-exact. It has no quality loss. I just committed a description of the algorithm https://raw.githubusercontent.... It is bit exact and lossless: you can get the exact bits of the file back :-)

    6. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      I'm the author of the algorithm and it's bit-exact. It has no quality loss. I just committed a description of the algorithm

      https://raw.githubusercontent....

      It is bit exact and lossless: you can get the exact bits of the file back :-)

      It seems to be redundant, even...!

    7. Re:From TFA: bit-exact or not? by gtwrek · · Score: 2

      Both the summary and the article are a little light on details, however the article mentions replacing, (or extending) the arithmetic (lossless) encoder - i.e. Huffman - used within the JPEG and H264 standards.

      This would result in a lossless reduction in size of those files.

      Again, short on details. Any size reduction claims are sorta hand wavy without more details.

      But I'd think the loss-less label (or bit-exact) are ok in this context. Loss less from Jpeg -> DropJpeg.

    8. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      So, if you put in a low quality jpeg, then you get out a low quality jpeg :-) :-)

    9. Re:From TFA: bit-exact or not? by dskoll · · Score: 4, Funny

      Compress his comment and all the redundancy will be gone.

    10. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 3, Informative

      This is an excellent summary and spot on! Our movie reduction claims are still early on. We'll need to find a more comprehensive set of H.264 movies to test on--and that requires the algorithm to understand B-slices and CABAC. These are both very close, but the code was only very recently developed. We're confident about the JPEG size reduction, however. If you want to learn more about how the JPEG stuff works, you can start with the open source repository from Matthias Stirner here http://www.matthiasstirner.com... Our work on JPEGs is very similarly inspired, but is completely streaming and works on partial JPEGs as well

    11. Re:From TFA: bit-exact or not? by lq_x_pl · · Score: 1

      SSIM does a pretty decent job though

      --
      An internal system operation returned the error "The operation completed successfully.".
    12. Re:From TFA: bit-exact or not? by mentil · · Score: 2

      Even cheap TN monitors use FRC to interpolate to 8-bit, which is better than nothing. IPS monitors can be had for $120, with an 8-bit color panel. Several gaming monitors use native 8-bit with FRC to 10-bit for less than $800, and a few even use native 10-bit.

      --
      Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
    13. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      No, you get out a low quality jpg.

    14. Re:From TFA: bit-exact or not? by thesupraman · · Score: 2

      OK, As you are the author..
      Care to comment to the performance and window length of your encode/decode?

      As of course there is an innate difference between algorithms that must run streaming (for example... h264) and ones
      that can consider all of the content - the same for computational complexity - for video to be useful it must decode in
      real time on 'normal' machines.. Memory footprint for the compression window also matters a lot..

      My guess is that your decode overhead is not high, but you need a LOT of memory resource to hold your decode window,
      and that encode performance is horrific as you need to search a long way for matches.

      If that is true, then it's (unfortunately) just a case of not much to see here - as I am sure you know, longer windows = better
      compression. If it is not true (you are doing this with short windows and low encode overhead), then congratulations, it
      may well matter.

      jpeg is well known these days as leaving significant lossless rate on the table due to computational limitations when it was
      created. h264 does the same because of its need to support reasonably live streaming latency and be implemented in hardware.

    15. Re:From TFA: bit-exact or not? by UnknownSoldier · · Score: 1

      That's complete nonsense and easy to disprove.

      You are claiming only 18-bit color - a total of (2^6)^3 = 262144 colors; or 2^6 = 64 colors for primary RGB gradients. That would mean every 4 colors out of 256 primaries you wouldn't be able to tell the difference! Since one can easily tell the difference between:

      0xFF, 0xFE, 0xFD, 0xFC, 0xFB

      That means your claim is complete bullshit.

      QED.
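The arithmetic in this comment can be checked in a few lines of Python. This is only a sketch: it assumes a hypothetical panel that truncates each 8-bit channel value to 6 bits with no dithering or FRC, and the names `truncate_to_6bit` and `shades` are illustrative.

```python
# Color counts for a 6-bit-per-channel (18-bit) panel.
bits_per_channel = 6
levels_per_channel = 2 ** bits_per_channel       # levels per primary
total_colors = levels_per_channel ** 3           # all RGB combinations
codes_per_level = 256 // levels_per_channel      # 8-bit codes sharing one 6-bit level


def truncate_to_6bit(value_8bit: int) -> int:
    """Drop the two low bits of an 8-bit channel value (assumed: no dithering)."""
    return value_8bit >> 2


# The five adjacent shades listed in the comment.
shades = [0xFF, 0xFE, 0xFD, 0xFC, 0xFB]
truncated = [truncate_to_6bit(v) for v in shades]

print(levels_per_channel, total_colors, codes_per_level)
print(truncated)
```

Under that truncation assumption, the first four shades collapse onto a single 6-bit level and only the fifth differs, which is the "every 4 colors out of 256" effect described above.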

    16. Re:From TFA: bit-exact or not? by thesupraman · · Score: 1

      And just to reply to myself.. it is generally a BAD idea to imply you have an encoding method better than arithmetic (let's hope
      the article horribly misquoted you there)..
      'yet it is well known that applying an additional arithmetic coder to existing JPEG files brings a further 10% reduction in file size at no cost to the file," he says. "Our Pied Piper algorithm aims to go even further with a more efficient encoding algorithm that maps perfectly back to existing formats."'
      As of course it is a numerical impossibility to be more efficient than correctly implemented arithmetic encoding. But of course you know that, right?
      It's the modeling of the distribution per token before the arith that matters, of course.
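The point about modeling can be illustrated with a small sketch. An ideal arithmetic coder spends about -log2(p) bits on a symbol the model assigned probability p, so the total size is decided by the model. The code below compares a fixed 50/50 model with a simple adaptive count-based model on the same skewed data. The function names and the toy data are assumptions for illustration, not anything from the Pied Piper code.

```python
import math


def ideal_bits_static(data, probs):
    """Ideal code length in bits when every symbol uses a fixed probability table."""
    return sum(-math.log2(probs[symbol]) for symbol in data)


def ideal_bits_adaptive(data, alphabet):
    """Ideal code length when probabilities come from running symbol counts.

    Each symbol starts with a count of 1 so that no probability is ever zero.
    """
    counts = {symbol: 1 for symbol in alphabet}
    total = len(alphabet)
    bits = 0.0
    for symbol in data:
        bits += -math.log2(counts[symbol] / total)
        counts[symbol] += 1   # learn from the symbol just coded
        total += 1
    return bits


# Toy input: heavily skewed toward "a".
data = "a" * 90 + "b" * 10

uniform_bits = ideal_bits_static(data, {"a": 0.5, "b": 0.5})
adaptive_bits = ideal_bits_adaptive(data, "ab")

print(f"fixed 50/50 model: {uniform_bits:.1f} bits")
print(f"adaptive counts:   {adaptive_bits:.1f} bits")
```

The coder is the same in both cases; only the probability estimates change, and the adaptive model needs noticeably fewer bits on this input.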

    17. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 5, Interesting

      Very insightful comments... let me go into detail
      I would say we have several advantages over H.264
      a) Pied Piper has more memory to work with than an embedded device (bigger model)
      b) Pied Piper does not need to seek within a 4 Megabyte block (though it must be able to stream through that block on decode) whereas H.264 requires second-by-second seekability (more samples in model).
      c) Pied Piper does not need to reset the decoder state on every few macroblocks (known as a slice), whereas H.264 requires this for hardware encoders (again, more samples per model).
      d) As opposed to a committee that designed H.264, Pied Piper had a team of 10 creative Dropboxers and guests, spending a whole week identifying correlations between the decoder state and the future stream. That is a big source of creativity! (design by commit, not committee)
      Our algorithm is, however, streaming---and it's happiest to work with 4 MB videos or bigger
      Our decode window is a single previous frame--so we can pull in past information about the same macroblock-- but we only work in frequency space right now (there are some pixel space branches just to play with, but none has yielded any fruit so far) so the memory requirements are quite small.
      We are doing this streaming with just the previous frame as our state--- and it may matter--but we have a lot of work to do to get very big wins on CABAC... but given that we're not limited by the very small window and encoding parallelization requirements that CABAC is tied to, Pied Piper could well be useful soon!

    18. Re:From TFA: bit-exact or not? by Punto · · Score: 2

      That's nice but did you ever find out what is the optimal way to jerk off all those people?

      --
      Stay tuned for some shock and awe coming right up after this messages!

    19. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 5, Informative

      We also use arithmetic coding...but the gist of the improvement is that we have a much better model and a much better arithmetic coder (the one that VP8 uses) than JPEG did back then. I tried putting the JPEG arithmetic coder into the algorithm and compression got several percent worse, because that table-driven Arithmetic Coder just isn't quite as accurate as keeping counts as the VP8 one.

    20. Re:From TFA: bit-exact or not? by sexconker · · Score: 1, Informative

      Even cheap TN monitors use FRC to interpolate to 8-bit, which is better than nothing. IPS monitors can be had for $120, with an 8-bit color panel. Several gaming monitors use native 8-bit with FRC to 10-bit for less than $800, and a few even use native 10-bit.

      What? Interpolation is WORSE than nothing. You're discarding signal then adding noise in the hopes that it matches up with what should've been there kinda okay.

    21. Re:From TFA: bit-exact or not? by AmiMoJo · · Score: 1

      So, correct me if I'm wrong, but you are basically fixing a few known limitations of JPEG and mobile recorded video files.

      For example JPEG uses RLE, and for decades we have been able to shave about the same as you do off their size by replacing that with a more efficient compression scheme in a lossless way. Mobile recorded video makes similar compromises to reduce processing overhead.

      To be clear, you have not invented a really new, revolutionary compression algorithm like the TV show. No 4k uncompressed video streaming etc. Not really anything like Pied Piper at all.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    22. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 5, Interesting

      No one has tried to undo and redo compression of video files before. There are still doom9 forum posts asking for this feature from 12 years ago. I would say that saving lossless percentage points off of real world files is novel and important. And, since it's open source, if someone else gets more %age improvement than what we have, it could become as transformative as you describe.
      But the point is that we have something that's currently useful. It's out there and ready to be improved. It's lossless. And it has never before been tried.
      Also we did the entire algorithm in a week and aren't out of ideas!
      Besides we never claimed it was a revolution--leave that sort of spin to the marketeers...
      we're engineers trying to make things more efficient, a few percentage points at a time :-)

    23. Re:From TFA: bit-exact or not? by Cassini2 · · Score: 3, Interesting

      The grandparent poster is talking about compressing videos. If something is known about the data being encoded, then it is trivial to show that you can exceed the performance of arithmetic coding, because arithmetic coding makes no assumptions about the underlying message.

      For instance, suppose I was encoding short sequences of positions that are random integer multiples of pi. Expressed as decimal or binary numbers, the message will seem highly random, because of the multiplication by an irrational number (pi). However, if I can back out the randomness introduced by pi, then the compression of the resulting algorithm can be huge.

      The same applies to video. If it is possible to bring more knowledge of the problem domain to the application, then it is possible to do better on encoding. Especially with real-life video, there are endless cheats to optimize compression. Also, Dropbox may not be limited by real-time encoding. Drop-box might not even need intermediate frames to deal with fast-forward and out-of-order viewing. Dropbox may be solely interested in creating an exact image of the original file. Knowing the application affects compression dramatically.

      Lastly, application specific cheats can save real-world companies and individuals money and time. Practical improvements count as advancements too.
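The multiples-of-pi example can be sketched as follows. This is an illustration under stated assumptions: `zlib` stands in for a general-purpose compressor, the multipliers are small random integers, and the names `ks`, `raw`, and `modeled` are made up for the example. The gain comes from a transform applied before the entropy coder, that is, from knowing how the data was produced.

```python
import math
import random
import struct
import zlib

# Assumed data: 1000 positions, each an integer multiple of pi.
rng = random.Random(0)
ks = [rng.randrange(1, 1000) for _ in range(1000)]
positions = [k * math.pi for k in ks]

# Naive approach: compress the raw 64-bit floats.
raw = struct.pack(f"<{len(positions)}d", *positions)

# Domain-aware approach: back out the factor of pi and store the multipliers.
recovered = [round(x / math.pi) for x in positions]
modeled = struct.pack(f"<{len(recovered)}H", *recovered)

raw_size = len(zlib.compress(raw, 9))
modeled_size = len(zlib.compress(modeled, 9))

# The transform is reversible: the original floats come back bit-exact.
restored = [k * math.pi for k in recovered]

print(f"raw floats:  {raw_size} bytes")
print(f"multipliers: {modeled_size} bytes")
print("bit-exact round trip:", restored == positions)
```

The compressor itself is unchanged between the two runs; the smaller output comes entirely from feeding it a representation that matches how the data was generated.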

    24. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Since one can easily tell the difference between

      Can you? If the monitor flickered between 0xFF and 0xFD at 60Hz would you be able to tell? Or would you see 0xFE?

    25. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      even less for the blue channel if you target human eyes.. You can probably remove one bit or two

    26. Re:From TFA: bit-exact or not? by JoeMerchant · · Score: 1

      Depends on your monitor, of course, but a whole (recent) generation of "LCD gaming screens" only showed 6 bits of color depth:

      http://compreviews.about.com/o...

      Also, even when you show people the bottom 2 bits, they usually don't perceive them:

      http://rahuldotgarg.appspot.co...

    27. Re:From TFA: bit-exact or not? by JoeMerchant · · Score: 1

      Sorry, I was probably off on the price point - technology has moved on. Still, it wasn't a widely advertised fact that almost all "gaming" LCD monitors sold before IPS were 6 bit, or 6 bit with "dithering" which is not really much better.

    28. Re:From TFA: bit-exact or not? by JWSmythe · · Score: 1

      I believe this was covered in the documentary.

      --
      Serious? Seriousness is well above my pay grade.
    29. Re:From TFA: bit-exact or not? by pla · · Score: 3, Interesting

      Interpolation is WORSE than nothing. you're discarding signal then adding noise in the hopes that it matches up with what should've been there kinda okay.

      1, 2, 3, X, 5, 6. Guess the value of X... Congratulations, you just interpolated the right answer.

      In the case of what the GP described, though, it works out even better than that, because the panel actually "knows" the right answer, so it hasn't "thrown away" information; it just lacks the luminance resolution to display it. It can, however, interpolate in the temporal domain way, way faster than the human eye can tell, to create a color we perceive as the correct value.

      / Go ahead, twitch gamers, tell us all about your ability to resolve sub-millisecond 1.5% color changes. XD
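A simplified model of the temporal interpolation (FRC) described here: a 6-bit panel alternates between two adjacent levels over a few frames so that the time average matches the 8-bit target. The 4-frame cycle and the function name `frc_frames` are assumptions for illustration, and real panels use more elaborate patterns.

```python
def frc_frames(value_8bit: int) -> list[int]:
    """Return four 6-bit levels whose sum equals the 8-bit target when possible.

    The top 6 bits give the base level and the low 2 bits say in how many
    of the four frames the next level up is shown instead.
    """
    base = value_8bit >> 2
    extra_frames = value_8bit & 3
    # Level 63 is the brightest a 6-bit panel can show, so clamp there.
    return [min(base + 1, 63) if i < extra_frames else base for i in range(4)]


# Which 8-bit values can be reproduced exactly as a 4-frame average?
representable = [v for v in range(256) if sum(frc_frames(v)) == v]
levels = len(representable)

print(frc_frames(0xFE))   # top-of-range value that hits the clamp
print(frc_frames(0x81))   # mid-range value: one frame at the higher level
print(levels, levels ** 3)
```

In this model the top three 8-bit values cannot be reached because there is no level above 63 to alternate with, leaving 253 levels per channel. Cubing that gives roughly 16.2 million colors, which is one commonly cited explanation for that figure on 6-bit panels with FRC.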

    30. Re:From TFA: bit-exact or not? by AmiMoJo · · Score: 1

      Sure, I'm not saying it isn't useful, it certainly is... But Pied Piper, really?

      It's a good project, no need to over-sell it.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    31. Re:From TFA: bit-exact or not? by thesupraman · · Score: 1

      You seem to be confused as to what arithmetic coding is..
      What you seem to be talking about is the accuracy of the token counts being used to drive the arith coder.. arithmetic coding says nothing about those, except that they have to exist.
      Beating a given implementation? of course, there are several ways..
      But claiming to have better arithmetic coding itself is silly, what you have is better token distribution figures.

      Want to pony up some estimates on performance and memory requirements?

    32. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Have you tried running it on raw video streams or bitmap images?
      Will this work for audio, or text for that matter?

    33. Re:From TFA: bit-exact or not? by thesupraman · · Score: 0

      Sigh, another person who doesn't actually know what arithmetic coding IS.
      Your first statement is completely false. Arithmetic coding is demonstrably perfect within 1 bit over the entire stream at optimally representing the token distributions you give it. Of course you are confusing it with the distribution model and tokeniser preceding it.

      Your second statement is also completely wrong, unless the application is 'a system to compress pi'. If it is a system to compress arbitrary length decimal numbers, then good luck compressing pi.. There have been attempts at algorithmic source derivation compressors.. none have even come close to working.

      Your third statement is of course true, however what on earth that has to do with the subject of arithmetic coding I would love to know. It is true only in the context of the tokeniser and statistical model driving the arith coder.

      Your fourth statement is just wasting everyone's time, as it has nothing to do with the problem at hand.

      Let me throw another one in for you, just for fun.
      You will notice I have concerns about the runtime (therefore run energy) and memory footprint costs of this full implementation. There is a damn good reason for that which IS related to your last statement. To develop a system that actually saves organisations something, the cost in extra time, energy, and resource must significantly beat the cost of the extra storage required to store it without such treatment.
      Yes, reducing, let's say, Dropbox's total storage requirements by 20% would be a saving, but not if it doubled their computational costs..

      Welcome to the real world. This has been looked at many times, and the questions that matter are well established.

    34. Re:From TFA: bit-exact or not? by tobiasly · · Score: 0

      That's nice but did you ever find out what is the optimal way to jerk off all those people?

      Haha nice.. too bad it'll be moderated as trolling by people who didn't watch the show.

    35. Re:From TFA: bit-exact or not? by thesupraman · · Score: 2

      It's good that you understand that bold claims require clear evidence.. Thank you for replying.

      It is not surprising you can compress h264 using a 4mb block and token decode/recode, because of course that means you are using more resources than it (as you state) and removing functionality..
      I refer you to the following, hopefully you are aware of it..
      http://mattmahoney.net/dc/text.html
      Perhaps you should try your core modeling/tokenising against that, then consider how the ones that beat you do so.. not as an insult to your systems
      but as a guide to current advanced techniques. IF you cannot match them, perhaps you should consider why and if using some of those techniques
      would help... (hint: they will)

      BTW, by your description your system is not useful for streaming - streaming requires the ability to both recover from errors rapidly and to enter a live
      stream at any point within a small window - that is pretty much WHY h264 has to reset state with great regularity. If you cannot do that then you do
      not support streaming.

      Towards the end you seem to be talking at odds to your 4MB block.. you claim you only need a single previous frame for decode, and that your memory requirements are small.. If that is so then I would suggest that there is other memory also being used.. or you are not fully utilising your 4MB block.

      Just a suggestion, you should compare yourselves with h264 that is extended to use similar resources - that can still be beaten (as it must support streaming and you don't), but you will find its compression goes up significantly - even though you are going 'off book' with respect to its standards.

      What you seem to be doing in effect is decoding the h264 token stream and then recompressing that without some of the functional demands that cause h264 to be structured as it is - that works, just be aware of the limitations you create - they are not just there because of 'committee'.

    36. Re:From TFA: bit-exact or not? by fredgiblet · · Score: 1

      I'm curious to know if you've tried this on a video compressed using the H.264 lossless compression settings.

    37. Re:From TFA: bit-exact or not? by UnknownSoldier · · Score: 1

      And the specific manufacturers and models listed with 18-bit are listed where again???

      Oh wait, they aren't.

      Stop spouting bullshit. At least link to an article with hard facts and a timestamp.

    38. Re:From TFA: bit-exact or not? by Megol · · Score: 4, Informative

      Interpolation isn't about adding noise.

      6 bit (per component) LCDs have for at least 10 years and probably much longer used dithering techniques to produce effective 16.2M colors (compared to a true 8 bit panel with 16.7M colors). This works very well for almost all use cases and provides smooth gradients but has the disadvantage that some image patterns can produce flashing due to interference with the dithering algorithm.

      Dithering isn't about adding noise either BTW.

    39. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Want to pony up some estimates on performance and memory requirements?

      It's open source, do it yourself.

    40. Re:From TFA: bit-exact or not? by davester666 · · Score: 2

      There's always somebody who will say they can tell the difference between the original file and the compressed/decompressed copy...

      --
      Sleep your way to a whiter smile...date a dentist!
    41. Re:From TFA: bit-exact or not? by Bruce+Perens · · Score: 5, Insightful

      Rather than abuse every commenter who has not joined your specialty on Slashdot, please take the source and write about what you find.

      Given that CPU and memory get less expensive over time, it is no surprise that algorithms work practically today that would not have when various standards groups started meeting. Ultimately, someone like you can state what the trade-offs are in clear English, and indeed whether they work at all, which is more productive than trading naah-naahs.

    42. Re: From TFA: bit-exact or not? by sonicmerlin · · Score: 1

      Stop whining jesus.

    43. Re:From TFA: bit-exact or not? by Bruce+Perens · · Score: 1

      There used to be a web page called "Your Eyes Suck at Blue". You might find it on the Wayback machine.

      You can tell the luminance of each individual channel more precisely than you can perceive differences in mixed color. This is due to the difference between rod and cone cells. Your perception of the color gamut is, sorry, imprecise. I'm sure that you really can't discriminate 256 levels of blue in the presence of other, varying, colors.

    44. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      What bullshit? The 18 bit LVDS interface is well-defined for a reason. Why are you such an idiot?

    45. Re:From TFA: bit-exact or not? by AaronW · · Score: 1

      There are a couple settings in the nVidia tool for Linux to turn on the temporal dithering so it can be done in the graphics card when the monitor doesn't do it. It's easy to turn on in nvidia-settings.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    46. Re:From TFA: bit-exact or not? by AaronW · · Score: 1

      This information is often transmitted over EDID from the monitor to the host computer. Some graphics cards can use this information to automatically turn on and configure temporal dithering. The Linux nVidia driver can do this with the nvidia-settings utility. It will also report what the monitor is actually capable of.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    47. Re:From TFA: bit-exact or not? by UnknownSoldier · · Score: 2

      Look, it is real easy to prove whether a monitor is 24-bit or 18-bit:

      * https://imgur.com/XF3LBOz

      Do you see Mach banding in the rows? (Easiest to tell in the greens and grays)

      Yes - your monitor is 24-bit
      No - your monitor is 18-bit

      Show me proof of _any_ LCD monitors that are 18-bit.

    48. Re:From TFA: bit-exact or not? by AaronW · · Score: 1

      That jibes with my experience when I took a class that covered compression back in college. The professor, Glen Langdon, held a bunch of patents at the time on arithmetic coding. Encoding efficiency could be improved by having it forget old data and making it more dynamic, as I recall.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    49. Re:From TFA: bit-exact or not? by UnknownSoldier · · Score: 1

      I said **primary RGB gradients**, as in, Red, Green, Blue or White for a reason.

      Quit changing the topic to steganography which is an apples to oranges comparison and no one gives a shit about _that_ to tell if your monitor is 24-bit.

    50. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 1

      Lossless PNG images just lack the warmth and vibrance of raw BMP images. Even RLE introduces noticeable artifacts.

    51. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Twitch gamers also need high end cables to ensure that the extra 2 bits in 10 bit colour are transmitted with warmth and timbre.

    52. Re:From TFA: bit-exact or not? by stridebird · · Score: 1

      ...with hard facts and a timestamp

      Very nice phrase, very succinct. You could walk into a bank and announce a robbery with it.

    53. Re:From TFA: bit-exact or not? by arglebargle_xiv · · Score: 1

      Lossless PNG images just lack the warmth and vibrance of raw BMP images. Even RLE introduces noticeable artifacts.

      That only works if you use a tube-based computer to do the decoding.

    54. Re:From TFA: bit-exact or not? by viperidaenz · · Score: 1

      Cool, my laptop is 24 bit

    55. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 2

      Fair, but not all streaming use cases require seeking within a 4MB block (depends on the application). For those applications that require sub-4MB seeking, this won't be a good algorithm.

      Also there is a branch off the master repo that is exactly "h264 that is extended to use similar resources." (branch name h264priors) So yes--great idea.
      h264priors does pretty well, but not quite as good as the master branch--we're still getting to the bottom of how it does on a representative set of videos-- this is a week's work so far, not a dissertation :-)

      This won't work on text data since it uses state deep within a video decoder that doesn't apply to text streams (like what above-neighbor color presence bits are set).

    56. Re:From TFA: bit-exact or not? by subreality · · Score: 1

      Dithering isn't about adding noise either BTW.

      "Dither is an intentionally applied form of noise used to randomize quantization error..." -- https://en.wikipedia.org/wiki/...

      It's also unrelated to interpolation ("...a method of constructing new data points within the range of a discrete set of known data points..."). No new data points are generated - the monitor knows the exact RGB values it wants to display; instead, it's about doing the best job presenting them within the limits of the hardware.

      Regardless, your original point is still correct: 6-bit panels do make meaningful use of 8-bit input. Throwing away two bits per channel will visibly degrade an image.

    57. Re:From TFA: bit-exact or not? by phantomfive · · Score: 1

      Well said.

      --
      "First they came for the slanderers and i said nothing."
    58. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Jesus Christ, armchair engineering at its worst.

    59. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Not interpolation, not dithering... maybe pulse width modulation?

    60. Re:From TFA: bit-exact or not? by thegarbz · · Score: 1

      One of the great ways Wikipedia can contradict itself.

      They call dithering noise
      They describe the process of dithering as adding a pattern.
      They describe noise as a stochastic and therefore random phenomenon.

      All three can't be right, and in this case it's the first. Dithering can be achieved by adding noise or by adding engineered patterns and in visual processing it's usually the latter. In audio processing it's usually the former.

    61. Re:From TFA: bit-exact or not? by Solandri · · Score: 1

      Given that CPU and memory get less expensive over time, it is no surprise that algorithms work practically today that would not have when various standards groups started meeting.

      I remember when the preliminary JPEG standard first showed up in the early 1990s, a 640x480 8-bit GIF would decode and display in about a second on my PC. A 640x480 24-bit JPEG took about 30 seconds. JPEG's strength back then was its much smaller file size. Aforementioned GIF was about 200 kB, while the JPEG was about 35 kB with better colors (if your video card could do 24-bit color). That was a huge deal when most of us were still using 14.4 kbps modems and hard drives were around 500MB - 2GB.

    62. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 1

      I guess that the problem is that this sounds like a PR stunt: Ten engineers at Dropbox do what many more engineers have been doing for years and only in a week! And they aren't even specialized in this! Screw those Qualcomm (for example) engineers that don't know shit about video compression after full PhDs on the topic.

      Either back your claims staying compliant, or compare with the state of the art. With unsubstantiated claims, you are pissing off large teams of engineers that spent lots of time and entire PhD theses improving current video standards. This is why you get such lovely replies.

      No one has tried to undo and redo compression of video files before

      Really? The transcoding concept is pretty old and a quick query to ieeexplore shows a few items:

      http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=413496
      http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6889252
      http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7080596
      http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4380010
      http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5235388

      As for the JPEG improvements, the inefficiency of the non-AC entropy coding stage is well known. Last time I checked nobody was using it due to patents. I.e., no support -> nobody uses it -> no support -> ... You know there are more efficient image coding algorithms than classic JPEG (e.g., JPEG2000), and using them would fall in the same category as modifying classic JPEG.

    63. Re:From TFA: bit-exact or not? by sexconker · · Score: 1

      You have no idea what X is. You think 4 looks nice, but it could be anything.

      The panel doesn't know what the right answer is. The display controller might see it in the signal, but it can't send that information to the panel if it's processing 6 bpp.

      For 6-bit panels, you simply can't do anything to recover the signal in any correct way. Interpolation doesn't do anything but make everything look worse. Temporal interpolation is fucking terrible, and no, it's not "way faster than the human eye can tell". Most panels have a hard enough time with ghosting and motion blur already, even when adding timed strobing to the mix.

      I don't think gamers are the ones who are concerned with color accuracy - gamers tend to buy the TN or other trash panels that absolutely fuck color in favor of speed. As for sub millisecond color changes? Please show me an LCD panel with a true response time under 1 ms. As for 1.5% color changes, that's a huge difference for anyone with normal color vision. A good panel will show this difference quite readily. http://i.imgur.com/w8qQ7Kg.gif for an example. If you can't see it flashing, get a better monitor or get your eyes checked.

    64. Re:From TFA: bit-exact or not? by Lisandro · · Score: 1

      1, 2, 3, X, 5, 6. Guess the value of X... Congratulations, you just interpolated the right answer.

      Cool. Was it 3782?

    65. Re:From TFA: bit-exact or not? by sexconker · · Score: 0

      Interpolation isn't about adding noise.

      6 bit (per component) LCDs have for at least 10 years and probably much longer used dithering techniques to produce effective 16.2M colors (compared to a true 8 bit panel with 16.7M colors). This works very well for almost all use cases and provides smooth gradients but has the disadvantage that some image patterns can produce flashing due to interference with the dithering algorithm.

      Dithering isn't about adding noise either BTW.

      Interpolation is about adding noise by attempting to recreate / create data that was lost / never in the original signal.
      You CANNOT assure that the data is correct. It is therefore not signal. It is therefore noise, however much you try to make it subjectively look like it isn't, it mathematically is noise.

      Dithering is all about adding noise that looks like noise to various degrees (random dithering, ordered dithering) to achieve a subjective aesthetic, often to achieve a smoothing effect to mask limited resolution (dithering in a 255-color GIF, for example) or to add a noisy effect to mask noise in the original signal (digital film grain effects, for example).

      Both interpolation and dithering are adding noise, by definition. Whether or not you find them acceptable is your personal problem.
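
      The 16.2M versus 16.7M figures quoted above can be reproduced with a small sketch. It assumes a simple four-frame temporal dithering scheme; the function names are made up for illustration, and real panel controllers may work differently.

```python
def frc_frames(value8):
    """Split an 8-bit level into four 6-bit frame levels.

    Hypothetical illustration of temporal dithering (FRC),
    not any vendor's actual algorithm.
    """
    base, remainder = divmod(value8, 4)  # 6-bit floor and the 2 bits dropped
    # Show the next-higher 6-bit level in `remainder` of the four frames,
    # clipping at the panel maximum of 63.
    return [min(base + 1, 63) if i < remainder else base for i in range(4)]


def perceived_level(frames6):
    """Time-averaged level, scaled back to the 8-bit range."""
    return 4 * sum(frames6) / len(frames6)


# Count how many distinct levels per channel the averaging can reach.
levels = {perceived_level(frc_frames(v)) for v in range(256)}
print(len(levels), len(levels) ** 3, 256 ** 3, 64 ** 3)
```

      Under these assumptions the top three 8-bit levels clip to the same value, which leaves 253 levels per channel and roughly 16.2M combinations, against 16.7M for true 8-bit and 262,144 for undithered 6-bit.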

    66. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Who is Whining Jesus and why must he be stopped?

    67. Re:From TFA: bit-exact or not? by serviscope_minor · · Score: 1

      I get x=10

      My interpolating formula is roughly:

              0.10323x^6 + -1.66775x^5 + 9.56461x^4 + -22.37135x^3 + 14.13956x^2 + 16.90876 x -15.67704
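
      The point that X can be made to be anything can be checked with a short sketch. It uses plain Lagrange interpolation with exact fractions, which is a stand-in and not the formula quoted above; the helper names are invented for the example.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate the lowest-degree polynomial through `points` at x.

    `x` and the point coordinates should be ints or Fractions so the
    arithmetic stays exact.
    """
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def sequence_with(x_value):
    """Points for the sequence 1, 2, 3, X, 5, 6 with X chosen freely."""
    return [(1, 1), (2, 2), (3, 3), (4, x_value), (5, 5), (6, 6)]


for guess in (4, 10, 3782):
    pts = sequence_with(guess)
    # Each polynomial reproduces all six values, including the chosen X.
    print(guess, [int(lagrange_eval(pts, k)) for k in range(1, 7)])
```

      Every choice of X yields a polynomial that fits all six values exactly, so the five known values alone do not single out 4.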

      --
      SJW n. One who posts facts.
    68. Re:From TFA: bit-exact or not? by thesupraman · · Score: 1

      I would really REALLY suggest you spend a little more time researching those other compressors you so easily consider to be 'text streams', they are not.
      for example, one of them also happens to hold the current record for non lossy image compression..

      It's all a matter of feeding them the right models, and I can guarantee that a good PPM or CM set of models will do much better than a week's worth
      of model development - but of course the reason they WILL is because they take care of the downstream details - the work you have done in finding
      context is exactly what they do need.

      Remember, there are three stages to compression, and using 'state deep within a video decoder that doesn't apply to text streams (like what above-neighbor color presence bits are set)' is the top level - finding context to model. What I would suggest is that the decades of research as to how best to utilise that context
      could be of use... then again perhaps you have done better than they can - and that is what testing against the corpus will show.
      When it comes to non lossy compression, there is no such thing as a text compressor, there is no such thing as an exe compressor, there are just different
      models of data, and different ways of using those models.

      You are not the first, or I would suspect the last to look at bitstream detokenisation and recompression in its many forms..

      If you don't read up on this, you are missing something that matters, for example:
      https://en.wikipedia.org/wiki/PAQ
      http://www.squeezechart.com/bitmap.html
      http://mattmahoney.net/dc/dce.html
      http://www.maximumcompression.com/data/jpg.php
      But then perhaps you are aware of all that.

      Don't get me wrong, 22% is VERY respectable on jpeg.. but why not try to do better?

    69. Re:From TFA: bit-exact or not? by SirSlud · · Score: 1

      May I interest you in a tall glass of perspective?

      --
      "Old man yells at systemd"
    70. Re:From TFA: bit-exact or not? by serviscope_minor · · Score: 1

      Quantization is also noise. Quantization + dithering doesn't necessarily add more noise overall.

      Interpolation is something else entirely and can replicate the exact fourier spectrum (i.e. be noiseless).

      --
      SJW n. One who posts facts.
    71. Re:From TFA: bit-exact or not? by serviscope_minor · · Score: 1

      No one has tried to undo and redo compression of video files before.

      Great job! I've thought about it before, as clearly have others given those doom9 posts. I'm glad to hear it works well and that someone's done it! It sounds like you were going for 4mb blocks if I gather correctly?

      How much do you gain/lose going to 8 or 2mb blocks?

      --
      SJW n. One who posts facts.
    72. Re:From TFA: bit-exact or not? by Khyber · · Score: 1

      "Quantization + dithering doesn't necessarily add more noise overall."

      Have you ever used node-based texture generation? Quantization + dithering = INSANE FUCKING AMOUNTS OF NOISE.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    73. Re:From TFA: bit-exact or not? by Khyber · · Score: 1

      Sweet. I see the mach banding all over the fucking place.

      Perhaps you should ditch your old 6-bit Apple monitors, sonny, and catch up with current technology.

      Even my MONOCHROME CRT shows all the mach banding. 24-bit GREYSCALE FTW.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    74. Re:From TFA: bit-exact or not? by Khyber · · Score: 1

      "no one gives a shit about _that_ to tell if your monitor is 24-bit."

      As I said before, get with real fucking technology. Catch up, you're way the fuck behind. 10-bit (that's 30-bit A-RGB colorspace) 4K monitors, S-IPS, 28" for $600.

      Yawn. As I told you before, we're not in the days of your shitty Apple displays.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    75. Re:From TFA: bit-exact or not? by Khyber · · Score: 1

      "Welcome to the real world. This has been looked at many times, and the questions that matter are well established."

      And were answered about ten years ago when we got better fucking technology.

      I've bothered to compile the source package and look at the straight-forward code. You very fucking obviously have not.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    76. Re:From TFA: bit-exact or not? by Khyber · · Score: 1

      "No one has tried to undo and redo compression of video files before."

      I'm sorry, that's just nonsense. What do you think a format converter does?

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    77. Re:From TFA: bit-exact or not? by Khyber · · Score: 1

      "I guess that the problem is that this sounds like a PR stunt: Ten engineers at Dropbox do what many more engineers have been doing for years and only in a week!"

      Technology advances. What took a dedicated team of people a couple of years to do 20 years ago (2D game design) takes maybe a month or two at most with a couple of people nowadays. MY current 2D game is proof of that. There are two bits of code to fix and then it's ready to release.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    78. Re:From TFA: bit-exact or not? by gustygolf · · Score: 1

      So, how come you're only getting 20 % compression with JPEGs? StuffIt (a Mac-only compression program/format) could do around 50 % thanks to a specialised algorithm, or so I heard. And that was ten years ago.

      --
      "Slow Down Cowboy! It's been 58 minutes since you last successfully posted a comment" -- slashdot, driving users away.
    79. Re:From TFA: bit-exact or not? by Bengie · · Score: 1

      I can immediately see the mach banding if I'm about 1.5' away, but if I sit back in my chair, about 2.5' away, I don't see it at all at first; it only becomes visible after a few seconds. I do have a cheap $130 LED LCD 2ms gaming monitor. Very power efficient though, 23 watts.

    80. Re:From TFA: bit-exact or not? by serviscope_minor · · Score: 1

      No I haven't (I don't know what node-based texture generation is) but my comment was "not necessarily", not "never".

      Here's some code (octave) which generates a signal, quantizes it to 10 levels with and without dithering. If you run it, you'll see that you start getting substantial extra noise from dithering below about 1E-3. I've put noise in too, and 1e-3 corresponds to noise with a scale of 0.03, which is much less than the quantization error of 0.1

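      % Sample positions
      xs = -1000:1000;
      % A signal: a parabola clipped so that it stays within [0, 1]
      ys = min((xs/50).^2, 100) / 100;
       
      % Power spectrum of the clean signal (f holds the frequency axis)
      [ys_p, f] = periodogram(ys);
       
      % Reference Gaussian noise with a scale of 0.03
      noise=randn(size(ys))*0.03;
      n_p=periodogram(noise);
       
      % Quantize to steps of 0.1, no dithering
      ysq = round(ys*10)/10;
       
      ysq_p = periodogram(ysq);
       
      % Quantize again, adding uniform dither of just under one step before rounding
      ysqn = round(ys*10 + (rand(size(ys))-.5)*0.99)/10;
       
      ysqn_p = periodogram(ysqn);
       
      % Plot all four spectra on a logarithmic power axis
      clf
      hold on
      semilogy(f, n_p);
      semilogy(f, ysqn_p, 'g');
      semilogy(f, ysq_p, 'r');
      semilogy(f, ys_p, 'k');
      legend('Noise', 'Dithered', 'Quantized', 'Signal');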

      --
      SJW n. One who posts facts.
    81. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      How did you get 16.x million colors from 8 bits? The best the rest of us can manage is 256. I think you're referring to 24 bit.

    82. Re:From TFA: bit-exact or not? by fnj · · Score: 0

      ...compress h264 using a 4mb block...

      Really? Four millibits?

    83. Re:From TFA: bit-exact or not? by fnj · · Score: 1

      That jives with my experience

      It is always jarring to find a college grad who is not fluent in the difference between such common words as jive and jibe.
      "Harlem jive is the argot of jazz."
      "Your belief does not jibe with reality."

    84. Re:From TFA: bit-exact or not? by hackwrench · · Score: 1

      So, if I understand correctly, what you're saying is a panel that doesn't know what the right answer is, is for cows?

    85. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      My very-sub-$1000 UP2414Q does more than 6-bits just fine, thanks.

    86. Re:From TFA: bit-exact or not? by Chelloveck · · Score: 1

      The original poster wasn't clear. In the linked image the top half of each row of color is 24-bit. The bottom half of each row is 18-bit. So on a 24-bit display you should see color banding in the bottom half of each row, but not in the top half. I had to zoom way in before I realized that each row was split in half.

      --
      Chelloveck
      I give up on debugging. From now on, SIGSEGV is a feature.
    87. Re:From TFA: bit-exact or not? by Existential+Wombat · · Score: 1

      You need to use a high performance digital video cable for best results. I recommend Monster.

    88. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Using a sequence of triplets of x, x*2, x*2+1, I got 2.5.

    89. Re: From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      Make sure it is coated in 24k gold first. If it ain't gold, it won't hold.

    90. Re:From TFA: bit-exact or not? by UnknownSoldier · · Score: 1

      Thanks for helping out with the instructions. That is exactly right !

    91. Re:From TFA: bit-exact or not? by UnknownSoldier · · Score: 1

      I think you have me confused with someone else ...

      > As I told you before, we're not in the days of your shitty Apple displays

      1. Who said anything about Apple displays??
      2. When?

      > 10-bit (that's 30-bit A-RGB colorspace) 4K monitors, S-IPS, 28" for $600.

      You're talking about 0.001% of users. We're not talking about 10-bit -- we're talking about the claim of 6-bit/channel monitors and wanting proof of _actual_ monitors.

      Thanks for the heads up that 10-bit displays are finally south of $500 though!

      * http://www.amazon.com/Asus-PA2...

      I'm still holding out for the "holy grail" of monitors: 10-bit, 120+ Hz refresh rate, 2560x1440, 28"+.

    92. Re:From TFA: bit-exact or not? by EETech1 · · Score: 1

      I see the same pattern X 4 on the top half if I zoom in?!?

      Any Ideas what that is all about?

    93. Re:From TFA: bit-exact or not? by RockDoctor · · Score: 1
      Where he should, of course, have stuck in "noticeable."

      wrong on so many levels.

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    94. Re:From TFA: bit-exact or not? by RockDoctor · · Score: 1

      Given that CPU and memory get less expensive over time,

      Ouch!

      Certainly that has happened in the past (I too remember paying £200 extra for the 4MB version of a computer instead of the 1MB version) ; there are processes in the production pipeline that should keep the trend downwards for a decade or so.

      Beyond that ... much thinner ice. And looking into the immediate future (speaking as a geologist: say, doubling our species' age to ~100,000 years) ... well, you'll need sub-gluon storage, and a hard drive failure could locally cause heat-death of the visible universe.

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    95. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      What I really want to know is: are you going to sell out to Gavin Belson?

    96. Re:From TFA: bit-exact or not? by Bruce+Perens · · Score: 1

      Someday we really will hit physical limits. But every forecast that we were close to them that I've heard since 1980 has failed.

    97. Re:From TFA: bit-exact or not? by danielreiterhorn · · Score: 1

      Format converters usually modify the metadata of the file, often inadvertently.
      The goal here is doing something lossless, bit exact, and reversible.
      I haven't found any format converters that can do *that*.
      If you're willing to make it approximately the same, or have the pixels the same in the videos, that's one thing. But actually keeping every bit the same after a round trip is not something I know about, other than the losslessh264 pied piper project.
      If you know of any, please do link here--because it might support a wider range of container formats, etc.
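
      The "every bit the same after a round trip" requirement can be written down as a check. This is only a sketch: zlib stands in for the real codec, and the helper and the deliberately lossy "converter" are invented for illustration, not taken from the losslessh264 code.

```python
import hashlib
import zlib


def round_trip_is_bit_exact(original, compress, decompress):
    """Return True only if decompress(compress(x)) reproduces x byte for byte."""
    restored = decompress(compress(original))
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()


def metadata_dropping_compress(data):
    """Hypothetical converter that discards a 16-byte header before compressing."""
    return zlib.compress(data[16:], 9)


# Placeholder bytes standing in for a file with a header and a payload.
sample = b"HEADER-METADATA!" + bytes(range(256)) * 64

print(round_trip_is_bit_exact(sample, lambda d: zlib.compress(d, 9), zlib.decompress))
print(round_trip_is_bit_exact(sample, metadata_dropping_compress, zlib.decompress))
```

      A converter that keeps the pixels but rewrites or drops metadata fails this check, which is the distinction being drawn above.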

    98. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0
    99. Re:From TFA: bit-exact or not? by Anonymous Coward · · Score: 0

      And some LCD panels just use old style bayer dithering. It's usually very obvious, even for still images. The worst offenders I've seen are Sony laptops which combine a 6-bit panel with such a non-linear response that the result often has about the same image quality a 5-bit panel would have.

    100. Re:From TFA: bit-exact or not? by metrix007 · · Score: 1

      You have little credibility here. You are coming off as someone desperate to prove what they are talking about.

      Except you are doing it without showing anything you have accomplished, while talking down to and criticizing without reason someone who HAS shown what they have accomplished. They likely know more than you do on this subject, so just shut the fuck up. Please.

      --
      If you ignore ACs because they are anonymous - you're an idiot.
    101. Re:From TFA: bit-exact or not? by metrix007 · · Score: 1

      No one uses that retarded unit of measurement.

      --
      If you ignore ACs because they are anonymous - you're an idiot.
  2. Real Numbers? by Anonymous Coward · · Score: 0

    What are the real numbers? 13% compression is negligible really. But, that is compressing compressed data (H.264 and JPEG).

    What compression ratio can they achieve on the original uncompressed data? How does this new compression compare to h.265 compression of MPEG data?

    1. Re:Real Numbers? by HornWumpus · · Score: 3, Insightful

      How much CPU time to compress/decompress. Standard compression is hardly the best, just a good compromise between compression and usability.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    2. Re:Real Numbers? by Anonymous Coward · · Score: 3, Funny

      Meh, doesn't matter. Any processing load will be moved to an unoptimized javascript implementation that runs in the end users browser.

    3. Re:Real Numbers? by Dracos · · Score: 1

      Exactly. We need to see the Weissman score.

  3. Benchmarks by Anonymous Coward · · Score: 0

    No benchmarks vs GIF or PNG? Article fail.

    1. Re:Benchmarks by Bengie · · Score: 2

      Would be nice to compare it against PNG, but the context is if you're storing other people's data and you have no control of what format they use.

  4. what's the score on this algorithm? by tanimislam · · Score: 0

    What's the Weissman score for this algorithm on interesting and representative media?

  5. No description by Iamthecheese · · Score: 0

    No description of the algorithm. No performance measurements. No solid data. No useful information. No story.

    --
    If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
    1. Re:No description by harrkev · · Score: 2, Insightful

      And yet you can download the source code yourself and compile it.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    2. Re:No description by danielreiterhorn · · Score: 3, Informative

      Link to a layman's description of the algorithm here: https://raw.githubusercontent.... It's bit exact and lossless. We haven't done comprehensive studies, but on the included test files it gets 13% compression on H.264 movies. Similarly the not-committed, but similar JPEG algorithm gets 22% on a comprehensive sample set of photos from a variety of devices.

    3. Re:No description by ottothecow · · Score: 3, Insightful
      Yeah, but I've got to say that it is nice to see a bunch of comments actually talking about the compression algorithm.

      The tiny bit of slashdot community that is left still talks about the actual things. If this were on Reddit, it would just be a stream of lame, overused references to the Silicon Valley show. Somebody would say "This guy fucks". Somebody else would make a joke about "Optimal tip-to-tip efficiency". Then somebody would ask "Do you know what tres commas means".

      Those things were hilarious when put forth by a group of comedic actors. They are incredibly lame when they are overused every single time something even comes tangentially close to referencing them.

      So while this particular story still sucks...it could be a lot worse.

      --
      Bottles.
    4. Re:No description by AuMatar · · Score: 1

      This guy fucks.

      Had to be done.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    5. Re: No description by thePig · · Score: 1

      Superb work, Danielreiterhorn . Amazing work, and amazing, providing it as open source.

      Would you mind if I ask for the motivation to put it as open source?
      When it provides 10-20% compression, it would be worth a bit of money, right? In such a case why are you keeping it under BSD licence?
      I am in awe of people who do great things without expecting anything in return. Because try as I may, I can never be truly altruistic. So, I try to pick the brains of the ones who are to really understand their motivations.

      Are there any hidden selfish motivations, or is it purely altruistic? If I can understand, I will be able to understand a bit more about people. And not me alone, many others in the forum too. Will you be able to help, Danielreiterhorn?

      --
      rajmohan_h@yahoo.com
    6. Re: No description by danielreiterhorn · · Score: 3, Interesting

      It depends if the goal is to a) market a hip algorithm or b) store movies more efficiently.

      Open source makes it easy for anyone to contribute to the algorithm.
      The more people contribute, the better the code will be at compressing movies.

      The better it is at compressing movies, the fewer resources it will take to store them.
      This isn't a zero-sum game we're talking about: it's about making the world a more efficient place, one bit at a time.

      But the bottom line is that it's a lot easier for many organizations to contribute to a code base if there are no strings attached.

      Interest from an article like this can get people playing around with compression.
      Maybe another 10% gain is right around the corner.

  6. "Lossless" on lossy encodes? by bradgoodman · · Score: 1

    "22% better compression" without "notable" quality loss on files which are ALREADY compressed in formats in which loss may be apparent is a far cry from their ultimate "goal" of "lossless" compression.

    1. Re:"Lossless" on lossy encodes? by Anonymous Coward · · Score: 0

      It's fully lossless and bit exact. The algorithm for H.264 is public on github

    2. Re:"Lossless" on lossy encodes? by SirSlud · · Score: 1

      There is nothing confusing here. And yet you've managed to be completely confused.

      --
      "Old man yells at systemd"
  7. naysayers are missing the point by Ionized · · Score: 4, Informative

    comparing this to PNG or h.265 is missing the point - this is not a compression algorithm for creating new files. this is a way to take files you already have and make them smaller. users are going to upload JPG and h.264 files to dropbox, that is a given - so saying PNG is better is moot.

    1. Re:naysayers are missing the point by Dahamma · · Score: 1

      Except unless standard DECODERS can handle them it's fairly useless in practice.

      From what I can tell from the source & description posted it does NOT conform to H.264, so what's the point? SOMETHING has to decode it, and it's clearly not going to be standard hardware decoders. So it's useless as CDN storage. Same applies to PNGs for most usage.

      And besides, H.265 implements everything they did and MUCH more. And if you want even further lossless compression that humans can't notice there are proprietary solutions like Beamr that can get you even better compression.

    2. Re:naysayers are missing the point by mobby_6kl · · Score: 2

      It's not useless - it can be decoded by dropbox when serving the files.

      Seriously, it's that simple: users upload existing files to dropbox, they get losslessly compressed by this algorithm, and decompressed on access. Bam.

    3. Re:naysayers are missing the point by SirSlud · · Score: 1

      Jesus dude. Dropbox controls the in and the out of the pipe. So their client can compress further on upload and decompress when downloading/streaming. I don't understand how a simple business case can be so confusing for people.

      --
      "Old man yells at systemd"
  8. Hard to believe by Anonymous Coward · · Score: 0

    H.264 and JPEG are supposed to output random-looking bytes, by definitions.

    If you can compress those, something is very wrong.

    1. Re:Hard to believe by unrtst · · Score: 3, Informative

      H.264 and JPEG are supposed to output random-looking bytes, by definitions.

      If you can compress those, something is very wrong.

      Where'd you get that idea?

      $ bzip2 -k test.jpg
      $ gzip -9 -k test.jpg
      $ ls -la
      -rw-r--r-- 1 me me 1519279 Feb 7 2012 test.jpg
      -rw-r--r-- 1 me me 1430059 Aug 28 16:42 test.jpg.bz2
      -rw-r--r-- 1 me me 1427872 Aug 28 16:44 test.jpg.gz

      ... I also tried it on a max-compressed file. Opened that test.jpg up in gimp, then saved with quality at 0 (lowest), and re-did the compressing on both:
      -rw-rw-r-- 1 me me 189230 Aug 28 16:50 test2.jpg
      -rw-rw-r-- 1 me me 111623 Aug 28 16:50 test2.jpg.bz2
      -rw-rw-r-- 1 me me 117971 Aug 28 16:51 test2.jpg.gz

      Feel free to try the same experiment yourself on random jpg's you find online, or your own.

      The goal of H.264 and JPEG isn't minimum file size at all costs. It's also not encryption. Your premise is wrong, and even old tech can compress this stuff further than it may already be.
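
      For contrast, here is what a general-purpose compressor does with bytes that really are random-looking. This is a sketch on synthetic data, not on real JPEG or H.264 files, and the buffer contents are arbitrary choices for the example.

```python
import bz2
import random
import zlib


def compressed_sizes(data):
    """Sizes in bytes of the raw buffer and of its zlib and bz2 encodings."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
random_like = bytes(rng.getrandbits(8) for _ in range(100_000))
structured = bytes((i // 64) % 256 for i in range(100_000))

print("random-looking:", compressed_sizes(random_like))
print("structured:    ", compressed_sizes(structured))
```

      The random-looking buffer should not shrink, while the structured one shrinks a lot. Real JPEGs landing in between, as in the listings above, suggests their bytes still carry some structure.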

    2. Re:Hard to believe by Kjella · · Score: 3, Interesting

      H.264 and JPEG are supposed to output random-looking bytes, by definitions. If you can compress those, something is very wrong.

      Well, it seems to be applied per codec, not a general compression algorithm like zip. And they probably say mobile-encoded for a reason, simple encoders have to work on low power and in real time, random JPGs from the Internet are probably the same. From what I can gather the algorithm basically takes a global scan of the whole media and applies an optimized variable-length transformation making commonly used values shorter at the expense of making less commonly used values longer. Nothing you couldn't do with a proper two-pass encoding in the codec itself, the neat trick is doing it to someone else's already compressed media afterwards in a bit-reversible way. Very nice when you're a third party host, assuming the increase in CPU time is worth it but not so useful for everyone else.
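
      The "shorter codes for common values, longer codes for rare ones" idea is the one behind Huffman coding. The sketch below shows that general technique on a toy input; it is not taken from the Dropbox code, and the function names are made up for the example.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Map each symbol in `data` to the bit length of its Huffman code."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tiebreak, {symbol: depth}). The unique tiebreak
    # keeps the dicts from ever being compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encoded_bits(data):
    """Total bits needed to encode `data` with its own Huffman code."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[s] for s in data)


text = "a" * 90 + "b" * 6 + "c" * 3 + "d"
print(huffman_code_lengths(text), encoded_bits(text))
```

      On this toy input a fixed two-bit code would need 200 bits, while the variable-length code needs fewer because the dominant symbol gets a one-bit code.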

      --
      Live today, because you never know what tomorrow brings
    3. Re: Hard to believe by Anonymous Coward · · Score: 0

      Try the -h param for ls, calculating is for computers.

    4. Re: Hard to believe by ttucker · · Score: 2

      Try the -h param for ls, calculating is for computers.

      Not really useful in this context, because it truncates significant digits.

    5. Re: Hard to believe by Anonymous Coward · · Score: 0

      Learn your ls options people. I always religiously do ls -allah

    6. Re:Hard to believe by sjames · · Score: 1

      Actually, perfect compression WILL make the output resemble random bits if looked at statistically. What OP was missing is that perfect compression is hard and that most compression features a number of compromises for CPU speed, memory requirements, seekability, resilience, etc.

    7. Re:Hard to believe by sribe · · Score: 2

      H.264 and JPEG are supposed to output random-looking bytes, by definitions.

      Bullshit. JPEG, *by its definition*, after the quantization step, uses a fairly modest & inefficient compression algorithm, because it was designed to be run on embedded systems with very modest processing power.

    8. Re:Hard to believe by Anonymous Coward · · Score: 0

      Since this is Dropbox, it wouldn't be any trouble at all to add this functionality to the client, meaning the additional compression occurs on the users machine instead of the Dropbox servers, saving not only space on their cloud but cutting their bandwidth consumption.

    9. Re:Hard to believe by Dahamma · · Score: 1

      The goal of H.264 and JPEG isn't minimum file size at all costs. It's also not encryption. Your premise is wrong, and even old tech can compress this stuff further than it may already be.

      True, but that's obvious to you and me - which does reinforce the point that the article & Dropbox "innovation" is pretty stupid.

      Not to mention JPEG and H.264 are old news - if you want to compare "new" development JPEG2000 and H.265 are the benchmarks...

    10. Re:Hard to believe by Dahamma · · Score: 1

      And they probably say mobile-encoded for a reason, simple encoders have to work on low power and in real time,

      Actually, the encoders are rarely limited by power or CPU cycles. The decoders are, but the great thing about lossy encoding like JPEG/H.264/H.265 is the encoders can continually be improved without affecting the decoders.

      That said, the reason this article is pointless is you can't USE the results - it breaks H.264 standards so HW decoders can't handle it, and no one wants to decode some proprietary format on the fly to stream to standard H.264 decoders...

    11. Re:Hard to believe by dbIII · · Score: 1

      it breaks H.264 standards so HW decoders can't handle it

      It's not pointless for dropbox, since they can store it compressed a bit more and decompress it when a user asks for it. It's also not pointless for software decoders such as VLC that have access to a bit more memory and CPU capability to deal with it.
      Two points is a bit more than pointless by my count.

      Along those lines it doesn't seem that long ago that arguments about using floating point in mp3 decoding was seen as a flaw.

    12. Re:Hard to believe by Dahamma · · Score: 1

      It's also not pointless for software decoders such as VLC that have access to a bit more memory and CPU capability to deal with it.

      If you are talking flexibility of decoders and extra CPU, why make a non-compatible file based on a decade+ old codec when you could just re-encode to H.265? Same with JPEG92 vs JPEG2000.

      Along those lines it doesn't seem that long ago that arguments about using floating point in mp3 decoding was seen as a flaw.

      Totally different issue, since it wasn't about making the codec non-compatible, but how it's decoded.

      So, not entirely pointless, but not nearly as interesting as the article pretends it is, which is typical of business journalists who don't really understand tech. Many of the "inefficiencies" have already been solved with more recent codecs. Retrofitting things like variable macroblock sizes and alternate compression strategies onto old formats is not particularly revolutionary...

    13. Re:Hard to believe by SuricouRaven · · Score: 1

      Perfect compression is also noncomputable.

  9. Stacker by HalAtWork · · Score: 1

    Time for the new wave of Stacker clones, maybe a new DoubleSpace err DriveSpace?

    1. Re:Stacker by denis-The-menace · · Score: 1

      I'd settle for Mr. 7-Zip (Igor Pavlov) to add this method to his program.

      With a BSD license, he should be able to do it. (Time permitting)

      --
      Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
  10. Bad Car Analogy by PPH · · Score: 1

    We put a spoiler on a Prius.

    --
    Have gnu, will travel.
    1. Re:Bad Car Analogy by tepples · · Score: 1

      We put a spoiler on a Prius.

      Which one? "The Lone Gunmen are dead"? "Snape kills Dumbledore"?

  11. Can it compress 3d videos? by leipzig3 · · Score: 4, Funny

    Can it compress 3d videos? That seems to be a real challenge.

    1. Re:Can it compress 3d videos? by Anonymous Coward · · Score: 0

      The best way I found to compress 3D videos was a hammer and repeated hits with it.

      Then I just shovelled it into the bin and moved on with my life.

      Worked excellent if I do say so myself.

    2. Re:Can it compress 3d videos? by Anne+Thwacks · · Score: 1

      I have a more efficient system: put them on VHS tapes and shove them under a road roller!

      --
      Sent from my ASR33 using ASCII
    3. Re:Can it compress 3d videos? by danielreiterhorn · · Score: 2

      If you organize the pixel data of the 3d movie into a spiral, then the algorithm my 9 colleagues and I put together will operate on it "middle-out". This can allow us to compress movies with a Weissman score that's off the chart!

    4. Re:Can it compress 3d videos? by SuricouRaven · · Score: 1

      Have you tried a Hilbert curve rather than a spiral? It might mean fewer edge transitions, better prediction.
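
      A minimal sketch of the Hilbert-curve ordering suggested above, using the standard index-to-coordinate conversion. The function name `hilbert_d2xy` and the 8x8 grid size are illustrative assumptions, not anything from the posted code. The point it shows is that consecutive positions on the curve are always neighbouring pixels, which is the locality a predictor would benefit from.

```python
def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two; d runs from 0 to n*n - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Visit every pixel of a hypothetical 8x8 block in Hilbert order.
order = [hilbert_d2xy(8, d) for d in range(64)]
print(order[:8])
```

      A raster scan jumps a full row width at the end of every line; here each step moves exactly one pixel horizontally or vertically.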

  12. Image/Video libraries by phorm · · Score: 2

    I wonder if somebody can develop this into a transparent kernel-module.
    13-22% of a video library could mean saving several hundred GB on a multi-terabyte collection. Depending on whether it decompresses on the fly and how hard it is on the CPU, it may also reduce disk I/O somewhat.
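
    A quick back-of-the-envelope check of that estimate. It assumes the 13% and 22% figures are reductions in file size and that the whole library consists of files the algorithm can handle; the library sizes and the helper name `savings_gb` are made up for illustration.

```python
def savings_gb(library_gb: float, percent_saved: float) -> float:
    """Space reclaimed if every file shrinks by percent_saved percent."""
    return library_gb * percent_saved / 100


for library_tb in (2, 4):  # hypothetical library sizes in TB
    low = savings_gb(library_tb * 1000, 13)   # H.264 figure from the summary
    high = savings_gb(library_tb * 1000, 22)  # JPEG figure from the summary
    print(f"{library_tb} TB library: {low:.0f}-{high:.0f} GB saved")
```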

    1. Re:Image/Video libraries by godrik · · Score: 1

      Indeed, CPU cost is a real problem. The key for a compression algorithm on a network storage server is to perform compression and decompression without much impact on latency and bandwidth. Also, in these days of massive server farms, the impact on energy consumption might be interesting to see, but it is difficult to measure directly, since compression might mean fewer disks spinning and fewer machines kept on, but more CPU usage.
      An interesting engineering problem overall.

    2. Re:Image/Video libraries by Anonymous Coward · · Score: 0

      That's why you would run dedupe on your SAN/NAS filesystem?

    3. Re:Image/Video libraries by AaronW · · Score: 1

      This would be better suited for FUSE.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
  13. But if you had to give a handjob to every armchair by Anonymous Coward · · Score: 0

    compression expert on Slashdot, how long would that take.

  14. bah, I've got it down to 50% compression: by Thud457 · · Score: 2

    switch (rand() % 2)
    {
    case 0 : /* here's some pr0n */;
    break;
    case 1 : /* here's a funny cat pikshur */;
    /* falls through */
    default : /* are you really sure you aren't looking for some pr0n? */;
    }

    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  15. What was the wisemen score by Anonymous Coward · · Score: 0

    The only thing that matters is the wiseman score they achieved.

    1. Re:What was the wisemen score by Anonymous Coward · · Score: 0

      It's off the chart.

    2. Re:What was the wisemen score by Anonymous Coward · · Score: 0

      Weissman Score you fucking prick!

    3. Re:What was the wisemen score by knorthern+knight · · Score: 1

      > Weissman Score you fucking prick!

      You have a +10 Wiseguy Score.

      --

      I'm not repeating myself
      I'm an X window user; I'm an ex-Windows user
    4. Re:What was the wisemen score by Anonymous Coward · · Score: 0

      I wasn't actually insulting the AC, just trying to sound like Erlich :)

  16. About yourself? by Anonymous Coward · · Score: 0

    Are you a racist that likes little girls?

    1. Re:About yourself? by Anonymous Coward · · Score: 0

      Are there racists that like fat girls?

  17. EXT4 by Sam36 · · Score: 0

    I am currently merging this into the EXT4 master branch at kernel.org

  18. Re: Adding it to 7zip - would that make it ... by Anonymous Coward · · Score: 0

    8zip?

  19. Call me back ... by quenda · · Score: 1

    when they have made Nip Alert a reality.

  20. This isn't really a new thing. by An+Ominous+Cow+Erred · · Score: 1

    Lossy codecs typically have two major stages: the lossy parts (e.g. DCT while throwing out some component frequencies, motion prediction, etc.), followed by lossless entropy coding (e.g. Huffman in JPEG) to further compress the resultant data.

    These recompression tools just undo the lossless entropy-coding stage and then recode the result with a more efficient lossless algorithm. On decompression, they re-encode it with the format's standard algorithm. In some cases (e.g. JPEG) you can keep a copy of the Huffman table that lets you recompress the data into a bit-accurate copy of the original file (you can include a small bit of extra information to make sure any remaining metadata matches up exactly).

    The MacOS compression software StuffIt did this years ago.
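
    The round trip described above can be sketched with stand-ins from the Python standard library. This is not the Dropbox code or StuffIt's method: zlib restricted to Huffman-only coding plays the role of the format's standard entropy coder, lzma plays the "more efficient lossless algorithm", and the symbol data is synthetic. Bit-exact restoration works here only because the standard encoder is deterministic and its settings are known, which is the role the stored Huffman table plays for JPEG.

```python
import lzma
import zlib


def standard_encode(symbols: bytes) -> bytes:
    """Stand-in for a format's entropy coder: Huffman coding only."""
    enc = zlib.compressobj(level=9, strategy=zlib.Z_HUFFMAN_ONLY)
    return enc.compress(symbols) + enc.flush()


def standard_decode(stream: bytes) -> bytes:
    """Undo the standard entropy coding to recover the raw symbols."""
    return zlib.decompress(stream)


def repack(stream: bytes) -> bytes:
    """Decode the standard stream and recode it with a stronger coder."""
    return lzma.compress(standard_decode(stream))


def restore(packed: bytes) -> bytes:
    """Decode the stronger coder's output and re-apply the standard coder."""
    return standard_encode(lzma.decompress(packed))


# Synthetic, highly structured "coefficient" data (an assumption for the demo).
symbols = bytes((i * i) % 7 for i in range(20000))
original = standard_encode(symbols)
packed = repack(original)

print(len(original), len(packed))
print(restore(packed) == original)
```

    Real codecs need more care than this, since the original encoder's choices must be reproducible from what is stored alongside the repacked data.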

    1. Re:This isn't really a new thing. by Anonymous Coward · · Score: 0

      ahhhh so that's why stuffit worked so well

    2. Re:This isn't really a new thing. by Dahamma · · Score: 1

      Postprocessing software like Beamr (look it up yourself...) can often do even better for video. Basically the H.264 codecs are fairly conservative on their quantizers, with a minimum that's way above what they could get away with. Way better off throwing away useless data than figuring out how to compress it.

  21. To the devs by ezdiy · · Score: 1

    After reducing all this Dropbox grandstanding filler and chest thumping (is that corporate policy or something? this is certainly not the first time), it all boils down to:

    You took frequency-space-transformed H.264 (pre-CABAC) and wrote a better range coder for it.

    Yes/No?

    Still pretty impressive, but for the love of god, please use succinct _technical_ descriptions. - https://raw.githubusercontent.... - is god awful, as it just describes general operation of a range coder.

    Beating JPEG entropy coding is not that impressive, as that's just Huffman, which is really awful. CABAC is better, but still a decade behind top-of-the-line research (I suppose you're encode.ru regulars).

    1. Re:To the devs by danielreiterhorn · · Score: 1

      If you can find something that does better compression for H.264 videos please point me to it so the ideas can be merged--maybe something even better will come out of it!

      I haven't even located a program that undoes entropy coding and redoes it for h.264 videos exactly--our code does that, and it wasn't super trivial--but clearly not rocket science either.

      The idea was to make the algorithm fast by only looking at the decoder state to be able to compress (and decompress) things in a single pass.

      Again if you have a link to a better algorithm I'd love to hear it-- I learned so much by reading through the packjpg source and conversing with its brilliant author who hangs out on encode.ru, but I'm sure there are other valuable insights to be had--I just haven't located them yet. Feedback is useful, but I'd like some constructive feedback with links to real, open, resources.

  22. Dropbox and Cloud??? Hello Zip/bzip/lzjb.... by Anonymous Coward · · Score: 1

    I think the poster mixed up his compression. Saying bit-exact compression is useful for cloud services is... DUH. Though a little late to the playing field. Any on-disk compression will be lossless by definition. Otherwise you'd be screwed any time you zip a file.

    Now if he found a better streaming compression for video that keeps H.264 size but ups the quality... COOL! But on-disk bit-exact compression is pretty mature now. See ZFS/BTRFS. Or Stacker/DoubleSpace if you're over 35.

  23. Re:But if you had to give a handjob to every armch by Anonymous Coward · · Score: 0

    To clarify: do they actually have to get out of their chairs? If so, it would be pretty quick.

  24. 13% ? 22% ? by Anonymous Coward · · Score: 0

    I may be stupid, but is that to 13% or 22% of the total, or is that 13% or 22% off the total?

  25. Yes, but that's not a solution for existing files by dbIII · · Score: 1

    Many of the "inefficiencies" have already been solved with more recent codecs.

    Dropbox appear to be in the business of storing the existing files of clients and not forcing them to upgrade their hardware or software to support a new standard. That's where a bit of reversible compression on top instead of a complete re-encode makes sense.

    On a personal scale maybe it makes sense for a user to completely re-encode all of their video files to a new standard, but I don't think many people will be doing that. On an "industrial" scale with many users it makes even less sense, so the reversible hack that saves space seems a better fit than a full, unasked-for re-encode of clients' video files.