Slashdot Mirror


How Stanford Engineers Created a Fictitious Compression For HBO

Tekla Perry (3034735) writes Professor Tsachy Weissman and Ph.D student Vinith Misra came up with (almost) believable compression algorithms for HBO's Silicon Valley. Some constraints -- they had to seem plausible, look good when illustrated on a whiteboard, and work with the punchline, "middle out." Next season the engineers may encourage producers to tackle the challenge of local decodability.

18 of 90 comments (clear)

  1. Stanford as a buzzword factory by hax4bux · · Score: 3, Funny

    Now they can admit it.

  2. Meh by ShakaUVM · · Score: 3, Insightful

    Anyone who knows anything about compression knows that universal lossless compression is impossible to always do, because if such an algorithm existed, you could run it repeatedly on a data source until you were down to a single bit. And uncompresing a single bit that could be literally anything is problematic.

    I sort of wish they'd picked some other sort of woo.

    1. Re:Meh by Carewolf · · Score: 3

      I don't think they mean univeral that way, I believe they mean universal lossless compression as gzip, bzip2 or 7zip. They will work on almost any data, but not all kinds of data. The idea here is that the show has a new way to do this that is supposed to be even better. The method they use remind me though of FLAC.

    2. Re:Meh by hankwang · · Score: 4, Funny

      "you could run it repeatedly on a data source until you were down to a single bit."

      That's why you need two distinct compression algorithms. Sometimes one will work better, sometimes the other. While repeatedly compressing, don't forget to write down in which sequence you need to apply the decompression. I believe this can compress abitrary data down to zero bits, if you are patient enough.

    3. Re:Meh by Anonymous Coward · · Score: 2, Funny

      You do the same thing you did the first time: two algorithms, write down the order. ;)

    4. Re:Meh by AYeomans · · Score: 3, Funny

      Metadata? You just let the NSA store it for you.

      --
      Andrew Yeomans
    5. Re:Meh by StripedCow · · Score: 2

      Meh. You only need the basic rules of physics to compute the universe from scratch, including all possible movies.

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    6. Re:Meh by WoOS · · Score: 2

      > if such an algorithm existed, you could run it repeatedly on a data source until you were down to a single bit.

      Ah, but you are not describing universal lossless compression but universal lossless compression with a guaranteed compression ratio of better than 1:1.
      That indeed isn't possible but I can't see it claimed in TFA.

    7. Re: Meh by toxcspdrmn · · Score: 2

      You mean 1 (you forgot the parity bit).

      --
      "E pur si muove!" - attributed to Galileo Galilei, 1564-1642
    8. Re:Meh by serviscope_minor · · Score: 3, Funny

      While repeatedly compressing, don't forget to write down in which sequence you need to apply the decompression.

      Pretty much. I've found that I can do this. Essentially for N bits, I've got a large family (2^N) of compression algorithms. I pick the best one and write down it's number. The resulting data is 0 bits long, but there's a little metadata to store.

      --
      SJW n. One who posts facts.
    9. Re:Meh by TeknoHog · · Score: 3, Interesting

      Or if you're into math, you invoke the pigeonhole principle.

      --
      Escher was the first MC and Giger invented the HR department.
    10. Re:Meh by pla · · Score: 5, Insightful

      Or if you're into math, you invoke the pigeonhole principle

      Though technically true, in fairness we need to differentiate between meaningful data and noise. Yes, a universal compressor doesn't care. Human users of compression algorithms, for the most part, do care.

      So the limit of useful compression (Shannon aside) comes down to how well we can model the data. As a simple example, I can give you two 64 bit floats as parameters to a quadratic iterator, and you can fill your latest 6TB HDD with conventionally "incompressible" data as the output. If, however, you know the right model, you can recreate that data with a mere 16 bytes of input. Now extend that to more complex functions - Our entire understanding of "random" means nothing more than "more complex than we know how to model". As another example, the delay between decays in a sample of radioactive material - We currently consider that "random", but someday may discover that god doesn't play dice with the universe, and an entirely deterministic process underlies every blip on the ol' Geiger counter.


      So while I agree with you technically, for the purposes of a TV show? Lighten up. :)

    11. Re:Meh by Anonymous Coward · · Score: 2, Interesting

      I haven't seen the show, but I have experience in dinking around with lossless compression, and suffice it to say, the problem would be solved if time travel existed, because then we could compress data that doesn't yet exist.

      Basically to do lossless, you have to compress data linearly. You can't compress the chunk of data it will get to in 10 seconds now on another core, because the precursor symbols do not yet exist. Now there is one way around this, but it's even more horribly inefficient, and that is by compressing from each end (or in the case of HBO's codec... the middle) so instead of a single "dictionary" for the compressor to operate from, it uses two. At the end could then throw away the duplicate dictionary entries on a second pass. That's why it's inefficient. In order to split compression accross cores, you have to do some inefficient compression by duplicating efforts.

      If I have a 16 core processor and I want to compress it using all 16 cores, I'd be doing it putting each scanline of an image on a separate core, so effectively every 16 pixels, wrap around to another core. At the end of the the compression scheme, a second pass is run over the dictionary to remove duplicates. In video, this won't really exist, because video doesn't have a lot of duplication unless it's animation (eg Anime, Video games) This is why lossy compression is always used for video/still captures from CMOS/CCD cameras. That data has data that can be lost because there is inherent noise in the capture process.

      That second pass is still going to be stuck to one core.

      The ideal way to solve lossless compression problems is by not trying to make the algorithm more efficient, but by intentionally trading off efficiency for speed. So to go back to the previous example. Instead of having 1 progressive stream, you instead have 16 progressive streams divided horizontally. This will work fine for compression, but decompression will have a synchronization problem. You may have seen this when you watch h.264 videos and some parts of the I frames aren't rendered, resulting in "colorful tiles" in the missing spaces if your CPU is too slow. This is because the 16 parts of the frame won't all decompress at the same speed, because they will have different complexity. So the end result is you end up having to buffer enough for two I frames, so that you can still seek the video. At UHD resolution, this means 33554432 bytes per frame. So if you have a 120fps video, you need 4GB of RAM just to buffer 1 second. Our current technology can't even read data off a SSD this fast. The fastest you can get is 1500MB/sec and even then it costs you 4000$. Hence why we use lossy compression, so the disk can keep up.

    12. Re:Meh by AmiMoJo · · Score: 2

      16 bytes plus the model.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  3. Re:Why call them "engineers"? by Horshu · · Score: 3, Interesting

    I wasn't even aware that programmers in Cali could even legally call themselves "engineers". I worked for a company out of college HQed in California, and I was told coming in that we used the term "Programmer/Analyst" because California required "engineers" to have a true engineering degree (with the requisite certifications et al)

  4. Recompress the coefficients by tepples · · Score: 2

    JPEG is a lossy compression and it's impossible for an archiving utility using a lossless compression to best that.

    Of course it's possible. JPEG encoding has three steps: cosine transform of each block (DCT), then quantization (where the loss happens), then coding. In JPEG, the coding involves a zig-zag order and a Huffman/RLE structure, and this isn't necessarily optimal. A lossless compressor specially tuned for JPEG files could decode the quantized coefficients and losslessly encode them in a more efficient manner, producing a file that saves a few percent compared to the equivalent JPEG bitstream. Then on decompression, it would decode these coefficients and reencode them back into a JPEG file.

  5. JPEG recompression by Sits · · Score: 2

    ACT has a JPEG recompression test which clearly shows a bunch of compressors making a JPEG smaller. Even better - there's a great paper by the author of packJPG talking about how to compress a JPEG losslessly using the technique teppples described...

  6. Shannon's source coding theorem by eyebits · · Score: 2

    http://en.wikipedia.org/wiki/S... "In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy."