How Stanford Engineers Created a Fictitious Compression For HBO

Posted by timothy on Friday July 25, 2014 @07:50PM from the buzzword-bingo dept.

Tekla Perry (3034735) writes: Professor Tsachy Weissman and Ph.D. student Vinith Misra came up with (almost) believable compression algorithms for HBO's Silicon Valley. Some constraints -- they had to seem plausible, look good when illustrated on a whiteboard, and work with the punchline, "middle out." Next season the engineers may encourage producers to tackle the challenge of local decodability.

90 comments

Stanford as a buzzword factory by hax4bux · 2014-07-25 20:04 · Score: 3, Funny

Now they can admit it.
Meh by ShakaUVM · 2014-07-25 20:08 · Score: 3, Insightful

Anyone who knows anything about compression knows that universal lossless compression is impossible to always do, because if such an algorithm existed, you could run it repeatedly on a data source until you were down to a single bit. And uncompressing a single bit that could be literally anything is problematic.
I sort of wish they'd picked some other sort of woo.
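ShakaUVM's argument is the pigeonhole principle in disguise: there are 2^n bit strings of exactly n bits but only 2^n - 1 strings that are strictly shorter (counting the empty string), so no lossless, and therefore injective, scheme can shrink every n-bit input. A brute-force Python sketch for small n; the helper names are illustrative, not from any library:

```python
from itertools import product


def strings_of_length(n):
    """Every bit string of exactly n bits, as text like '0110'."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def count_shorter(n):
    """How many bit strings have length 0..n-1 (empty string included)."""
    return sum(2 ** k for k in range(n))  # geometric series: 2**n - 1


for n in range(1, 9):
    inputs = len(strings_of_length(n))
    outputs = count_shorter(n)
    # A lossless compressor must map distinct inputs to distinct outputs.
    # With one more input than there are shorter outputs, at least one
    # n-bit input cannot be given a strictly shorter encoding.
    assert inputs - outputs == 1
    print(f"n={n}: {inputs} inputs vs {outputs} shorter bit strings")
```

The gap is always exactly one string, so for every n at least one input must map to an output that is no shorter than itself.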
1. Re:Meh by Carewolf · 2014-07-25 20:35 · Score: 3
  
  I don't think they mean universal that way. I believe they mean universal lossless compression in the sense of gzip, bzip2 or 7-Zip: they work on almost any data, but not on all kinds of data. The idea here is that the show has a new way to do this that is supposed to be even better. The method they use reminds me of FLAC, though.
2. Re:Meh by hankwang · 2014-07-25 20:56 · Score: 4, Funny
  
  "you could run it repeatedly on a data source until you were down to a single bit."
  That's why you need two distinct compression algorithms. Sometimes one will work better, sometimes the other. While repeatedly compressing, don't forget to write down in which sequence you need to apply the decompression. I believe this can compress arbitrary data down to zero bits, if you are patient enough.
  
  --
  Avantslash: low-bandwidth mobile slashdot.
3. Re:Meh by Anonymous Coward · 2014-07-25 21:10 · Score: 0
  
  Old news. That's the technology behind /dev/null.
4. Re:Meh by Anonymous Coward · 2014-07-25 21:11 · Score: 0
  
  Yeah, but isn't the order of decompression itself metadata? How do you compress that down to nothing?
5. Re:Meh by Anonymous Coward · 2014-07-25 21:17 · Score: 2, Funny
  
  You do the same thing you did the first time: two algorithms, write down the order. ;)
6. Re:Meh by Anonymous Coward · 2014-07-25 21:30 · Score: 0
  
  Okay, calm down tiny, there's no need to bash people over your inadequacy issues.
7. Re:Meh by AYeomans · 2014-07-25 21:44 · Score: 3, Funny
  
  Metadata? You just let the NSA store it for you.
  
  --
  Andrew Yeomans
8. Re:Meh by StripedCow · 2014-07-25 21:57 · Score: 2
  
  Meh. You only need the basic rules of physics to compute the universe from scratch, including all possible movies.
  
  --
  If Pandora's box is destined to be opened, *I* want to be the one to open it.
9. Re:Meh by Anonymous Coward · 2014-07-25 22:01 · Score: 0, Troll
  
  It's called the quantum wave compression algorithm: it can be anything until you decompress it into what you want.
  In real life it's a bit like the American dream: overhyped, collapsing, and full of shit when realized.
10. Re:Meh by Anonymous Coward · 2014-07-25 22:32 · Score: 0
  
  Meh. You only need the basic rules of physics to compute the universe from scratch, including all possible movies.
  No, that is not possible, the laws of quantum mechanics are probabilistic.
11. Re:Meh by Anonymous Coward · 2014-07-25 23:00 · Score: 0
  
  0
  (run universal lossless decompression algorithm to get full answer)
12. Re:Meh by WoOS · 2014-07-25 23:16 · Score: 2
  
  > if such an algorithm existed, you could run it repeatedly on a data source until you were down to a single bit.
  Ah, but you are not describing universal lossless compression but universal lossless compression with a guaranteed compression ratio of better than 1:1.
  That indeed isn't possible but I can't see it claimed in TFA.
13. Re: Meh by toxcspdrmn · 2014-07-25 23:23 · Score: 2
  
  You mean 1 (you forgot the parity bit).
  
  --
  "E pur si muove!" - attributed to Galileo Galilei, 1564-1642
14. Re:Meh by serviscope_minor · 2014-07-26 00:29 · Score: 3, Funny
  
  While repeatedly compressing, don't forget to write down in which sequence you need to apply the decompression.
  Pretty much. I've found that I can do this. Essentially, for N bits, I've got a large family (2^N) of compression algorithms. I pick the best one and write down its number. The resulting data is 0 bits long, but there's a little metadata to store.
  
  --
  SJW n. One who posts facts.
15. Re:Meh by TeknoHog · 2014-07-26 00:49 · Score: 3, Interesting
  
  Or if you're into math, you invoke the pigeonhole principle.
  
  --
  Escher was the first MC and Giger invented the HR department.
16. Re:Meh by pla · 2014-07-26 01:23 · Score: 5, Insightful
  
  Or if you're into math, you invoke the pigeonhole principle
  
  Though technically true, in fairness we need to differentiate between meaningful data and noise. Yes, a universal compressor doesn't care. Human users of compression algorithms, for the most part, do care.
  
  So the limit of useful compression (Shannon aside) comes down to how well we can model the data. As a simple example, I can give you two 64 bit floats as parameters to a quadratic iterator, and you can fill your latest 6TB HDD with conventionally "incompressible" data as the output. If, however, you know the right model, you can recreate that data with a mere 16 bytes of input. Now extend that to more complex functions - Our entire understanding of "random" means nothing more than "more complex than we know how to model". As another example, the delay between decays in a sample of radioactive material - We currently consider that "random", but someday may discover that god doesn't play dice with the universe, and an entirely deterministic process underlies every blip on the ol' Geiger counter.
  
  So while I agree with you technically, for the purposes of a TV show? Lighten up. :)
17. Re:Meh by tepples · 2014-07-26 01:49 · Score: 1
  
  I'm not sure you understand. Prepending the order to the compressed data would still increase the length of some files.
  (In before whoosh.)
18. Re:Meh by MightyYar · 2014-07-26 01:52 · Score: 1
  
  are probabilistic
  I'm sorry, but that can't be right. If we relied on probability, even in an infinite universe we'd never see the likes of Mariah Carey's "Glitter".
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
19. Re: Meh by Anonymous Coward · 2014-07-26 02:07 · Score: 0
  
  Compressing everything down to a single bit, then uncompressing, is what started the universe... so it's a great compression algorithm, but decompressing is hard on system resources.
20. Re:Meh by TeknoHog · 2014-07-26 02:09 · Score: 1
  
  Or if you're into math, you invoke the pigeonhole principle
  So the limit of useful compression (Shannon aside) comes down to how well we can model the data. As a simple example, I can give you two 64 bit floats as parameters to a quadratic iterator, and you can fill your latest 6TB HDD with conventionally "incompressible" data as the output. If, however, you know the right model, you can recreate that data with a mere 16 bytes of input. Now extend that to more complex functions - Our entire understanding of "random" means nothing more than "more complex than we know how to model". As another example, the delay between decays in a sample of radioactive material - We currently consider that "random", but someday may discover that god doesn't play dice with the universe, and an entirely deterministic process underlies every blip on the ol' Geiger counter.
  IOW, Kolmogorov complexity. For example, tracker and MIDI files are a great way to "compress" music, as they contain the actual notation/composition rather than the resulting sound. Of course, that doesn't account for all the redundancy in instruments/samples.
  
  So while I agree with you technically, for the purposes of a TV show? Lighten up. :)
  IMHO, half the fun of such TV shows is exactly in discussions like this -- what it got right, where it went wrong, how could we use the ideas in some real-world innovation. I find that deeper understanding only makes me enjoy things more, not less, and I enjoy "lightening up" my brain cells.
  
  --
  Escher was the first MC and Giger invented the HR department.
21. Re:Meh by Anonymous Coward · 2014-07-26 02:10 · Score: 0
  
  What kind of claim is that? Here's a method that doesn't increase the length of some file:
  Algorithm 0: compress by discarding the first bit, uncompress by guessing "0" for the first bit.
  Algorithm 1: compress by discarding the first bit, uncompress by guessing "1" for the first bit.
  After n passes of this algorithm, the compressed data has size "length - n", and the metadata has size n. Prepending the order to the compressed data gives n + (length - n) = length. The length didn't increase.
22. Re:Meh by tepples · 2014-07-26 02:18 · Score: 1
  
  The algorithm you describe is not compression, as there does not exist an input for which choice-of-algorithm metadata plus compressed data is smaller than the input data.
23. Re:Meh by Nemyst · 2014-07-26 02:44 · Score: 1
  
  If we "someday" discover that radioactive decay is not inherently random and unpredictable to an atomic level, it'd mean we suddenly contradict a hundred years of scientific research, models and theories. While not impossible, your post implies that there's a model and we just don't know it; the truth is that it's extremely unlikely to be the case.
24. Re:Meh by Anonymous Coward · 2014-07-26 02:54 · Score: 2, Interesting
  
  I haven't seen the show, but I have experience in dinking around with lossless compression, and suffice it to say, the problem would be solved if time travel existed, because then we could compress data that doesn't yet exist.
  Basically, to do lossless compression you have to compress data linearly. You can't compress the chunk of data the compressor will reach in 10 seconds right now on another core, because the preceding symbols do not yet exist. There is one way around this, but it's even more horribly inefficient: compress from each end (or, in the case of HBO's codec, from the middle), so that instead of a single "dictionary" for the compressor to operate from, it uses two. At the end it could then throw away the duplicate dictionary entries in a second pass. That's why it's inefficient: in order to split compression across cores, you have to do some inefficient compression by duplicating efforts.
  If I have a 16-core processor and I want to compress an image using all 16 cores, I'd do it by putting each scanline of the image on a separate core, so effectively, every 16 pixels, you wrap around to another core. At the end of the compression scheme, a second pass is run over the dictionary to remove duplicates. In video, such duplicates won't really exist, because video doesn't have a lot of duplication unless it's animation (e.g. anime, video games). This is why lossy compression is always used for video/still captures from CMOS/CCD cameras: that data can afford to lose detail because there is inherent noise in the capture process.
  That second pass is still going to be stuck on one core.
  The ideal way to solve lossless compression problems is not by trying to make the algorithm more efficient, but by intentionally trading off efficiency for speed. So, to go back to the previous example: instead of having one progressive stream, you have 16 progressive streams divided horizontally. This will work fine for compression, but decompression will have a synchronization problem. You may have seen this when you watch H.264 videos and some parts of the I-frames aren't rendered, resulting in "colorful tiles" in the missing spaces if your CPU is too slow. This is because the 16 parts of the frame won't all decompress at the same speed, because they have different complexity. So the end result is that you have to buffer enough for two I-frames so that you can still seek the video. At UHD resolution, this means 33554432 bytes (32 MiB) per frame. So if you have a 120fps video, you need 4GB of RAM just to buffer 1 second. Our current technology can't even read data off an SSD this fast. The fastest you can get is 1500MB/sec, and even then it costs you $4,000. Hence we use lossy compression, so the disk can keep up.
25. Re:Meh by Anonymous Coward · 2014-07-26 03:16 · Score: 0
  
  None of you guys get the joke.
  It's a JOKE. Funny; ha ha.
  I will explain: You use a single bit to record which of the two algorithms you used, a 0 for the first, and a 1 for the second...
26. Re:Meh by iggymanz · 2014-07-26 03:24 · Score: 1
  
  physics is a manmade endeavor, the "laws of physics" are inventions of man. Reality may work another way, not according to any model that man's mind could devise.
27. Re:Meh by AmiMoJo · 2014-07-26 04:03 · Score: 2
  
  16 bytes plus the model.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
28. Re:Meh by JeffAtl · 2014-07-26 04:23 · Score: 1
  
  It may not have been claimed in the article, but it was claimed on the show itself.
29. Re:Meh by Anonymous Coward · 2014-07-26 05:33 · Score: 0
  
  Uhh, WRONG!!!!
  It can only compress down to (a value arbitrarily close to) the entropy of the source, no matter how many times you apply a compression algorithm, and no matter which algorithm you use. (Shannon proved this in 1948.)
30. Re:Meh by Josh+Coalson · 2014-07-26 05:47 · Score: 1
  
  The method they use remind me though of FLAC.
  FLAC is actually in the first episode for a few seconds; it was the baseline they were comparing against.
  
  --
  FLAC - Free Lossless Audio Codec
31. Re: Meh by Anonymous Coward · 2014-07-26 05:51 · Score: 0
  
  They never talk about compression ratios in the show. They always talk about a fictitious "Weissman score" to judge the compression algorithm.
32. Re:Meh by UnknownSoldier · 2014-07-26 06:17 · Score: 1
  
  Exactly. The first step in ANY compression algorithm is:
  Know Thy Data
  Your mention of FLAC is a perfect example.
33. Re:Meh by flargleblarg · 2014-07-26 06:30 · Score: 1
  
  Anyone who knows anything about compression knows that universal lossless compression is impossible to always do, because if such an algorithm existed, you could run it repeatedly on a data source until you were down to a single bit. And uncompresing a single bit that could be literally anything is problematic.
  Actually, anyone who knows anything about compression knows that universal lossless compression is always possible to do. Sometimes you get good compression, sometimes you get shitty compression, and sometimes you get 0% compression. Note that 0% compression is still compression — it's just a degenerate case.
  You are right, of course, that you can't repeatedly compress something down to a single bit — there is always a point of diminishing returns. But just because you can't compress something down to a single bit doesn't mean that no universal lossless compression algorithm exists. For example, f(x) = x is a universal lossless compression algorithm. It's a shitty algorithm, but it is lossless and it does compress.
34. Re:Meh by Anonymous Coward · 2014-07-26 07:54 · Score: 0
  
  Why stop there - plus the operating system, all associated firmware, application, keyboard, monitor, mouse, cables, power, etc.
35. Re:Meh by Anonymous Coward · 2014-07-26 08:57 · Score: 0
  
  Specifically, we have ruled out local hidden variables. If there is something non-random going on, it's something universal, with state distributed over at least visible space. That means your "model" needs an entire universe as input. Do you have the entire universe? No? Then it doesn't matter whether it's actually random or entirely predictable given the entire universe.
36. Re:Meh by Junior+J.+Junior+III · 2014-07-26 15:43 · Score: 1
  
  1
  Apparently the /. crapfilter doesn't like my compressed comment.
  
  --
  You see? You see? Your stupid minds! Stupid! Stupid!
37. Re:Meh by Anonymous Coward · 2014-07-27 09:40 · Score: 0
  
  ... because if such an algorithm existed, you could run it repeatedly on a data source until you were down to a single bit.
  Nothing should be impossible with the appropriate amount of processor power. Over a decade ago I designed a compression routine which I thought should be able to compress almost any file to around a kilobyte in size. It worked on paper as a theory, but I (still) lack the mathematics skills to even make a prediction of such a routine working reliably in real life. The fact that it required a ridiculous amount of processor power (at least back then) as well as reading funny stories on the internet about novices coming up with "new" unbreakable encryption systems that professionals found trivial to crack made it impractical to pursue. But that does not mean that such an algorithm is not out there waiting to be discovered.
38. Re:Meh by Anonymous Coward · 2014-07-27 12:19 · Score: 0
  
  16 bytes plus the model.
  See there's the problem. They're geeks, they'll never be able to get a model!
39. Re:Meh by Anonymous Coward · 2014-07-27 19:53 · Score: 0
  
  lets just call it a compression issue. It sounds like he holds it too tight.
40. Re:Meh by brantondaveperson · 2014-07-28 12:30 · Score: 1
  
  Troll? Joke? Fundamental misunderstanding regarding the nature of information?
  Insufficient information to tell.
41. Re:Meh by werepants · 2014-08-01 08:22 · Score: 1
  
  Your definition of "random" and your understanding of quantum mechanics aren't quite right, although the rest of your post was quite interesting. A thought experiment known as EPR (Einstein-Podolsky-Rosen), followed up by the Stern-Gerlach apparatus and Bell's theorem, actually proved that certain attributes of particles in no way persist between measurements, and that they "choose" outcomes based on a mechanism that is not only unknown, but proven by experiment to be non-describable by deterministic theories.
  
  I was going to spend a while typing up an explanation, but this is a very thorough discussion of all the possibilities that is much better than I could hope to do myself: http://www.scienceclarified.co...
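pla's quadratic-iterator example (TeknoHog's Kolmogorov-complexity point) can be sketched in Python. The logistic map x -> r*x*(1-x) is assumed as the quadratic iterator, and the parameter values and the rule for turning each iterate into a byte are arbitrary choices for illustration. zlib sees the output as near-noise, yet the whole stream regenerates from 16 bytes once the model is known; as AmiMoJo notes, the model itself is not free.

```python
import struct
import zlib


def logistic_bytes(r, x0, count):
    """Iterate x -> r*x*(1-x) and emit one byte per step.

    The byte is taken from bits well below the leading digits of x (an
    arbitrary choice for this sketch), which makes it look like noise.
    """
    out = bytearray()
    x = x0
    for _ in range(count):
        x = r * x * (1.0 - x)
        out.append(int(x * 2**32) & 0xFF)
    return bytes(out)


# The entire "model input": two IEEE 754 doubles packed into 16 bytes.
seed = struct.pack("<dd", 3.99, 0.123456789)
r, x0 = struct.unpack("<dd", seed)

data = logistic_bytes(r, x0, 100_000)
packed = zlib.compress(data, 9)
print(f"seed: {len(seed)} bytes, data: {len(data)} bytes, zlib: {len(packed)} bytes")

# Knowing the model, the 16-byte seed reproduces every byte exactly.
assert logistic_bytes(*struct.unpack("<dd", seed), 100_000) == data
```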
Torrent by Anonymous Coward · 2014-07-25 21:17 · Score: 0, Informative

Here's the torrent link to Season 1 for anyone who hasn't yet checked out this show.
Why call them "engineers"? by Anonymous Coward · 2014-07-25 21:44 · Score: 0

To me it appears they are professors, programmers and comp sci people in general.
1. Re:Why call them "engineers"? by Horshu · 2014-07-25 22:03 · Score: 3, Interesting
  
  I wasn't even aware that programmers in Cali could even legally call themselves "engineers". I worked for a company out of college HQed in California, and I was told coming in that we used the term "Programmer/Analyst" because California required "engineers" to have a true engineering degree (with the requisite certifications et al)
2. Re:Why call them "engineers"? by Quirkz · 2014-07-28 02:52 · Score: 1
  
  I don't think that's true. Fully 50% of Silicon Valley job postings are for "XXX Engineer" and most of those are programming positions.
  
  --
  The Quirkz Handbook of Self-Improvement for People Who Are Already Pretty Okay
The cast by Anonymous Coward · 2014-07-25 22:17 · Score: 0

I dunno, mebbe it's me, but I find the Hollywood casting of hackers and geeks somewhat off
Having been in the Valley for decades and mixed with at least 2.5 generations of geeks/hackers, I don't find Hollywood's portrayal of geeks believable.
1. Re:The cast by thrillseeker · 2014-07-26 01:28 · Score: 1
  
  It doesn't have to be totally accurate - only humorous to a sufficiently large audience.
2. Re:The cast by JockTroll · 2014-07-26 02:45 · Score: 0, Insightful
  
  Of course it's not believable. Geeks and nerds are usually portrayed by talented, good-looking actors who play their characters as slightly eccentric enthusiasts who happen to be socially awkward in an endearing way. Nobody pays money or wastes time to see ugly, repulsive abhumans behaving in socially unacceptable ways, stalking women, being rude and obnoxious and talking about crap nobody cares about. Perhaps the most accurate portrayal of a nerd in movies was Scott Weidemeyer from "Zero Charisma", and even there it was radically toned down. Face it, the only way to portray that subculture in media is to simply NOT portray it at all and show a falsified and viewer-friendly version. Your typical nerd is not the cast from "The Big Bang Theory". Your typical nerd is Elliot Rodger: a lifelong loser obsessed with niche interests and hollow pursuits, good at nothing, with at most average intelligence and yet cultivating delusions of superiority and seething with rage towards the Real, Beautiful People. Nerds are shunned and derided for good reasons.
  
  --
  Geeks are so full of shit that "beating the crap out of them" takes a whole new meaning.
3. Re:The cast by Anonymous Coward · 2014-07-26 03:29 · Score: 0
  
  Glad to see you worked out all your anger and self-hatred issues!
4. Re:The cast by Anonymous Coward · 2014-07-26 04:24 · Score: 0
  
  The Supreme Gentleman was not a nerd. The guy was a loser, sure, but not all losers are nerds.
5. Re: The cast by Anonymous Coward · 2014-07-28 06:32 · Score: 1
  
  I concur: the directors don't extract the essence, or haven't coached the actors to make it believable enough to be funny. For instance, I once worked with an RF guru from Quebec who, instead of putting serial numbers on each aluminum casting he designed and tested for a CATV WAN to the curb with T1 in the '80s, put the name of a different girlfriend in his life on each pole-mounted unit.
  Another knew how to tweak another 0.1 dB less loss out of a microwave 3.4 dB splitter, and also how to transmit a burst that could burn out the front end of any cop radar in range of ticketing him; he was also a potato farmer and an expert in Celtic and Scottish history in Canada.
  Another would never violate his religious rules by working overtime past sunset or on weekends, yet could design a fully automated self-test on a motherboard in a few days and have it work the first time; he was studying to be a rabbi.
  Another software manager was so humble at managing 100 tasks with 20 developers, and then at lunch would make me a graphical analysis data-entry program before Excel, Multiplan and VisiCalc were even invented. Some of the staff would play tricks on his boss, who forced him to accept super-aggressive schedules, like drilling a hole in his often-closed door.
  Or a couple of guys were forced to ship product before it was ready, so the Texan boss could send an invoice early and get paid immediately, then redirect the shipment back to finish it; he sometimes issued rubber paycheques, so the guys would play golf, putting in his office, until he paid them.
Regarding compression by Anonymous Coward · 2014-07-25 22:28 · Score: 1

Some 20 years ago, when there were several choices of compression software, I ran some tests and found that a utility called "WinAce" was the only compression utility that produced a smaller compressed file than the original when compressing .avi, .mov, and .jpg files.
All the rest produced larger "compressed" files than the original
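One plausible explanation for results like these is that most archivers re-compressed already-compressed payloads and paid the overhead, while a more careful one can detect that and store such data raw. A hypothetical Python sketch of that stored-block fallback, using zlib and an invented one-byte method flag (this is not WinAce's actual format):

```python
import os
import zlib

STORED, DEFLATED = 0, 1  # hypothetical one-byte method flag


def pack(data: bytes) -> bytes:
    """Compress, but never expand the payload by more than the 1-byte flag."""
    candidate = zlib.compress(data, 9)
    if len(candidate) < len(data):
        return bytes([DEFLATED]) + candidate
    return bytes([STORED]) + data  # already-compressed input: keep it raw


def unpack(blob: bytes) -> bytes:
    """Read the flag byte and undo whichever method pack() chose."""
    method, payload = blob[0], blob[1:]
    return zlib.decompress(payload) if method == DEFLATED else payload


text = b"the quick brown fox jumps over the lazy dog\n" * 200
noise = os.urandom(4096)  # stands in for JPEG/AVI payload data

for sample in (text, noise):
    blob = pack(sample)
    assert unpack(blob) == sample
    print(len(sample), "->", len(blob))
```

Even the fallback costs one flag byte on incompressible input, so it narrows the loss rather than eliminating it.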
1. Re: Regarding compression by Anonymous Coward · 2014-07-26 00:15 · Score: 0
  
  Bullshit. JPEG is a lossy compression and it's impossible for an archiving utility using a lossless compression to best that. Now if you're claiming to have a lossy archiving solution, you need to get your head checked.
2. Re: Regarding compression by Anonymous Coward · 2014-07-26 00:33 · Score: 0
  
  You don't know what you're talking about. There is no reason that the output of a lossy compression algorithm cannot be compressed any further. It may be the case, but is not, as you suggest, necessarily so.
3. Re:Regarding compression by Anonymous Coward · 2014-07-26 00:55 · Score: 0
  
  .avi, .mov and .jpg files all have some header data that is compressible, and if the algorithm doesn't make the compressed parts worse (perhaps by recognizing that and not re-"compressing" those parts at all), there are some small gains to be had. My guess is that most of the programs you tried didn't recognize the parts where they are making things worse and just re-compressed everything, averaging out as slightly worse.
4. Re: Regarding compression by Anonymous Coward · 2014-07-28 08:52 · Score: 0
  
  So... assuming he has to pleasure every guy in the audience to make it to the next round, what all does he have to consider?
  say 1000 guys for a 15 minute presentation, have to adjust for height differences, ergonomics, angles etc... (goes on as the guys discuss the details of how to maximize for time as the other guy runs off and re-writes the whole compression algorithm overnight, getting to the astonishingly high rating that won the deal...)
  Funny episode, but sadly no more HBO so I won't be watching the show anymore...
You want believability? by stonedead · 2014-07-25 22:29 · Score: 1

Ask CSI. GUI interface using Visual Basic to track the killer --> http://www.youtube.com/watch?v... Seriously, the average Joe who watches TV doesn't care what algorithm is used, let alone know what an algorithm is.
1. Re:You want believability? by jones_supa · 2014-07-25 22:45 · Score: 1
  
  I disagree. We should not be dumbing down the GUIs just because some Average Joe does not get extra enjoyment from seeing a realistic one. I greatly appreciated the whiteboards showing some believable discrete cosine mathematics in Silicon Valley.
2. Re:You want believability? by Anonymous Coward · 2014-07-25 23:08 · Score: 0
  
  Exactly. To Average Joe, "nonsensical technobabble" and "actually correct terminology used properly" are indistinguishable and functionally equivalent. To a slightly more informed viewer, the former is infinitely more grating than the latter, and worse, it demonstrates that the creators are too lazy to spend 5 minutes on Wikipedia, and that does not reflect well on the rest of the book/show/movie.
  In exchange for opening a new tab and spending a minute or two looking up the subject matter on Google, you get something that results in people saying "hey, show X actually did the research!". Don't bother, and you get to be a laughing stock. Guess which results in a cult following (the guys who will spend the most on your franchise)?
3. Re:You want believability? by Anonymous Coward · 2014-07-26 02:07 · Score: 0
  
  "Zoom. Enhance. Rotate."
  "Um...we can't, that's not how security tapes work."
  "But that's how they do it on CSI?"
4. Re:You want believability? by Anonymous Coward · 2014-07-26 02:49 · Score: 0
  
  CSI also has their Crime Scene professional do things like look for evidence with tiny flashlights trampling over real evidence which does not happen in the real world. The producers lazily put together a show based on what they believed their writers could get away with and they struck gold. Those producers annoy me like a yak in heat.
5. Re:You want believability? by Anonymous Coward · 2014-07-26 10:47 · Score: 0
  
  The one reply I like was from the person who told their boss, "Give me the same budget as the CSI show and I will give you the same results." Truthfully, it is bad that people want the impossible and near-impossible, but worse, they want it for free.
Universal? by Anonymous Coward · 2014-07-25 23:06 · Score: 0

I watched the show twice and what still isn't clear to me is if the algorithm is universal for all data files or just *media* files (audio, video, pictures, 3D). If it is a universal algorithm then it shouldn't matter if the file is a movie or an application binary executable. But in the show he fretted about being given a 3D movie file. So it is just for media files?
In any case to write "almost believable" is a stretch. I knew from episode one that the algorithm is "totally unbelievable." That is fine because I understand that I have to suspend disbelief. The show was awesome and had some true moments of the trials and tribulations of life in software development. Mike Judge does not disappoint. It makes me wonder at the potential if he and Scott Adams collaborated on technology satire.
1. Re:Universal? by barlevg · 2014-07-26 00:25 · Score: 1
  
  Presumably there's a general compression algorithm that works on all data types, but Richard wrote specific optimizations for certain media types (the original goal of the project was to process music, so he probably started there).
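barlevg's guess, a general-purpose core plus media-specific tweaks, is roughly how dedicated codecs are built. FLAC, for example, predicts each audio sample from earlier ones and stores only the prediction residual. The Python sketch below is a heavily simplified, hypothetical version of that idea: an order-1 (delta) predictor in front of zlib, on a synthetic tone. FLAC itself uses higher-order predictors and Rice coding rather than zlib.

```python
import math
import random
import struct
import zlib


def synth_audio(n=20_000, rate=44_100, seed=42):
    """A 50 Hz tone plus slight noise as 16-bit samples (stand-in for real audio)."""
    rng = random.Random(seed)
    return [
        int(10_000 * math.sin(2 * math.pi * 50 * i / rate)) + rng.randint(-2, 2)
        for i in range(n)
    ]


def to_bytes(samples):
    """Pack samples as little-endian signed 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


def delta_encode(samples):
    """Order-1 prediction: keep the first sample, then store each difference."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(residuals):
    """Undo delta_encode with a running sum."""
    out = [residuals[0]]
    for r in residuals[1:]:
        out.append(out[-1] + r)
    return out


pcm = synth_audio()
residual = delta_encode(pcm)
assert delta_decode(residual) == pcm  # the prediction step alone loses nothing

plain = len(zlib.compress(to_bytes(pcm), 9))
predicted = len(zlib.compress(to_bytes(residual), 9))
print(f"zlib on raw samples: {plain} bytes; zlib on residuals: {predicted} bytes")
```

Because the residuals of a smooth signal cluster near zero, the same general-purpose coder has far fewer distinct byte values to deal with. On data without that structure (an executable, say) the predictor would buy nothing, which is one reason a media-tuned compressor might care what kind of file it is handed.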
Future troubles by WoOS · 2014-07-25 23:29 · Score: 1

Now let's just hope that no aliens listening in to our broadcasting and no far future humans actually believe this and try to recreate the "groundbreaking compression algorithm" the whiz humans of the digital age came up with.
1. Re:Future troubles by Anonymous Coward · 2014-07-26 00:00 · Score: 0
  
  Assuming they could recognize it as a broadcast at all... the thing about compression is that it increases entropy, making the resulting signal indistinguishable from background noise.
2. Re:Future troubles by MightyYar · 2014-07-26 01:56 · Score: 1
  
  Aren't you thinking of encryption?
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
3. Re:Future troubles by Anonymous Coward · 2014-07-26 08:28 · Score: 0
  
  No
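The question of whether compressed output "looks like noise" can be checked empirically. This Python sketch estimates zeroth-order Shannon entropy (bits per byte, from the byte histogram only) of some made-up word salad before and after zlib. Per-byte entropy rises toward the 8-bit maximum because the same information is packed into fewer bytes; that is not the same as being designed to be indistinguishable from noise, which is encryption's job, and real compressed formats still carry recognizable headers.

```python
import math
import random
import zlib
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Zeroth-order Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Assumed sample "broadcast": random words from a small vocabulary.
rng = random.Random(0)
words = b"middle out compression weissman score pied piper hooli".split()
text = b" ".join(rng.choice(words) for _ in range(20_000))
packed = zlib.compress(text, 9)

print(f"text: {len(text)} bytes at {bits_per_byte(text):.2f} bits/byte")
print(f"zlib: {len(packed)} bytes at {bits_per_byte(packed):.2f} bits/byte")
```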
Correlation != Causation by Anonymous Coward · 2014-07-25 23:38 · Score: 0

That's another good insight for the wrong article.
Next on Slashdot by Anonymous Coward · 2014-07-26 00:27 · Score: 0

How the Unix command line that flashed onscreen in "Tron Legacy" was developed, including rejected alternatives.
Recompress the coefficients by tepples · 2014-07-26 01:47 · Score: 2

JPEG is a lossy compression and it's impossible for an archiving utility using a lossless compression to best that.
Of course it's possible. JPEG encoding has three steps: cosine transform of each block (DCT), then quantization (where the loss happens), then coding. In JPEG, the coding involves a zig-zag order and a Huffman/RLE structure, and this isn't necessarily optimal. A lossless compressor specially tuned for JPEG files could decode the quantized coefficients and losslessly encode them in a more efficient manner, producing a file that saves a few percent compared to the equivalent JPEG bitstream. Then on decompression, it would decode these coefficients and reencode them back into a JPEG file.
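The zig-zag order tepples mentions walks each 8x8 block of quantized DCT coefficients along its anti-diagonals, alternating direction, so low-frequency coefficients come first and the zeros left by quantization bunch up into one long trailing run. A small Python sketch that generates that scan order and applies it to a hypothetical quantized block; only the scan is shown, not JPEG's Huffman/RLE stage or any particular recompressor's coder:

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col picks one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals are walked from bottom-left up to top-right
        order.extend(cells)
    return order


# Hypothetical quantized 8x8 block: only a few low-frequency coefficients survive.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0], block[1][1] = 52, -3, 4, 1, -1

scanned = [block[r][c] for r, c in zigzag_order()]
last_nonzero = max(i for i, v in enumerate(scanned) if v != 0)
print(scanned[:6], f"... then {63 - last_nonzero} trailing zeros")
# [52, -3, 4, 1, -1, 0] ... then 59 trailing zeros
```

A JPEG-aware recompressor of the kind described would decode the Huffman stage back to coefficient sequences like `scanned` and model them with a stronger lossless coder, then rebuild the identical bitstream on extraction.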
1. Re:Recompress the coefficients by Fnord666 · 2014-07-26 12:04 · Score: 1
  
  Of course it's possible. JPEG encoding has three steps: cosine transform of each block (DCT), then quantization (where the loss happens), then coding. In JPEG, the coding involves a zig-zag order and a Huffman/RLE structure, and this isn't necessarily optimal. A lossless compressor specially tuned for JPEG files could decode the quantized coefficients and losslessly encode them in a more efficient manner, producing a file that saves a few percent compared to the equivalent JPEG bitstream. Then on decompression, it would decode these coefficients and reencode them back into a JPEG file.
  I believe what they meant was that you would not be able to apply a lossless algorithm to the original data stream and achieve greater compression than applying a lossy algorithm. Your composite algorithm is just a more efficient lossy algorithm.
  If we look at the original statement from an information theoretic point of view, the GP's statement should be easily understood. With a lossless algorithm, you have to encode all of the original information and restore it. Assuming an optimal encoding, it will still take a minimum number of bits to fully realize all of the original data on decompression. With a lossy encoding scheme, I can reduce the number of bits in the original stream before using the same optimal encoding. With fewer bits to represent it should be obvious that the encoded bitstream will always be smaller.
  
  --
  'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
2. Re:Recompress the coefficients by Anonymous Coward · 2014-07-26 19:39 · Score: 0
  
  A lossless compressor specially tuned for JPEG files could decode the quantized coefficients and losslessly encode them in a more efficient manner, producing a file that saves a few percent compared to the equivalent JPEG bitstream. Then on decompression, it would decode these coefficients and reencode them back into a JPEG file.
  There's actually a compressor that does that. And it does better than 'a few percent'; closer to fifty per cent.
  It's called StuffIt, and it used to be 'the' file compressor on Macs.
  Not sure what happened, but back in 2006, when I first got my hands on Apple devices and tried the software, it was just plain icky. I deleted it shortly thereafter.
  Unfortunately, a lot of companies still supply "Mac" files as .sit files and AFAIK that format is proprietary and nothing else can open it...
Pilot signal and FEC by tepples · 2014-07-26 02:02 · Score: 1

Any digital broadcast will include two things that are recognizable as a broadcast. One is a pilot signal, used to communicate the existence of a signal to the receiver. Another is forward error correction, used to reconstruct a signal partially obscured by noise. Analog television included both: the pilot signal was the sync pulses during horizontal and vertical blanking, and the error correction was the presence of a double sideband in the bottom 1 MHz of the video.
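Forward error correction in general can be illustrated with the classic Hamming(7,4) code. This is a textbook sketch, not the scheme any particular broadcast standard uses (real systems rely on much stronger codes): 3 parity bits per 4 data bits let the receiver locate and flip any single corrupted bit without asking for a retransmission. The function names are hypothetical.

```python
from typing import List

# Hypothetical helper names, for illustration only.
def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
sent = hamming74_encode(data)
received = list(sent)
received[4] ^= 1  # noise flips one bit in transit
assert hamming74_decode(received) == data
```

The redundancy costs 75% extra bits here; broadcast-grade codes trade that overhead against how much noise they can repair.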
Professional Engineer here by Anonymous Coward · 2014-07-26 02:06 · Score: 1

I am a PE in California.
One can do engineering in California without a license under the "industrial exemption", and even be called an engineer on your business card.
What you can't do is hang up a shingle and run your own business as Joe Bloggs, Engineer, unless you have a license.
A lot of companies have an HR policy that being an "engineer" requires a 4-year degree; otherwise you are a "technician". To a certain extent, this is an "exempt" vs "non-exempt" (overtime) distinction. Engineers are exempt from overtime because they are "professional" (having conducted a course of advanced study), Technicians are not.
Recently, most places treat "fresh out of college" engineers as non-exempt (i.e., they get overtime), because the Dept of Labor is looking at other factors (independent work, etc.) and the "bar" for advanced study has moved up. Back when the Fair Labor Standards Act was written, a high school diploma and some on-the-job work counted as "advanced study".
1. Re:Professional Engineer here by k6mfw · 2014-07-26 02:37 · Score: 1
  
  What you can't do is hang up a shingle and run your own business as Joe Bloggs, Engineer, unless you have a license.
  True, but I've seen unlicensed people who call themselves consulting engineers instead of consultants. Plenty of people use "engineer" anyway, but whaddaya gonna do, place them under citizen's arrest? Civil engineers, however, are very strict about licensing, unlike the vast number of Silicon Valley engineers.
  
  Engineers are exempt from overtime because they are "professional" (having conducted a course of advanced study), Technicians are not.
  Reminds me of the Dilbert cartoon where he is working lots of unpaid overtime while the hardhat maintenance technician either gets to go home at the end of the day or gets 1.5 or 2 times the normal wage.
  
  --
  mfwright@batnet.com
Meh by Anonymous Coward · 2014-07-26 03:06 · Score: 0

"Universal lossless compression" doesn't refer to always reducing the file size. It refers to always being lossless and approaching asymptotically optimal compression levels for any piece of data. Data that is already compressed will generally be slightly expanded by such algorithms (e.g. gzip), so the paradox doesn't come into play. (As a side note, universal compression is actually a pretty big field; cf. Wikipedia.)
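Both halves of this are easy to check: the pigeonhole count behind the "compress it down to one bit" paradox, and a real universal compressor (zlib's DEFLATE, the same algorithm gzip uses) slightly expanding input that is already incompressible. The 100,000-byte random buffer is an arbitrary choice, and `shorter_strings` is a hypothetical helper name.

```python
import os
import zlib

# Hypothetical helper name, for illustration only.
def shorter_strings(n: int) -> int:
    """Number of bit strings strictly shorter than n bits: 2^0 + ... + 2^(n-1)."""
    return sum(2 ** k for k in range(n))

# Pigeonhole: there are 2^n inputs of length n but only 2^n - 1 shorter outputs,
# so no lossless scheme can shrink every n-bit input.
for n in (1, 8, 16):
    assert shorter_strings(n) == 2 ** n - 1 < 2 ** n

# Random bytes are (with overwhelming probability) incompressible, so DEFLATE
# falls back to stored blocks and the output grows by a few header bytes.
noise = os.urandom(100_000)
packed = zlib.compress(noise, 9)
print(len(noise), "->", len(packed))

# Lossless means the round trip is exact regardless of the size change.
assert zlib.decompress(packed) == noise
```

The expansion is tiny, which is the practical meaning of "universal": bounded loss on bad inputs, near-optimal results on the inputs the model fits.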
There is almost lossless compression by Anonymous Coward · 2014-07-26 03:07 · Score: 0

For the video, it's called AVC-Intra. Basically it works like "everything is preserved", but in some extreme scenarios some details are lost.
First of all, some colour is lost because the human eye can see levels of gray better, but pixels are also lost when there is a lot of noise in the picture.
As for the genetics scheme, where the encoder and decoder both hold some static information and this shared data is used to shrink what is transmitted between them: that approach is already very well known and well investigated, by the crypto community among others.
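The shared-static-data idea can be tried with zlib's preset-dictionary feature, which Python exposes as the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. Both ends hold the same reference bytes in advance, so only the differences need to travel. The synthetic "genome", the single-base mutation, and the helper names are made up for illustration; this is not a real genomics format.

```python
import random
import zlib
from typing import Optional

# Hypothetical helper names and data, for illustration only.
def make_reference(length: int, seed: int = 1) -> bytes:
    """A pseudo-random DNA-like string that both encoder and decoder already have."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"ACGT") for _ in range(length))

def pack(message: bytes, shared: Optional[bytes] = None) -> bytes:
    """Compress, optionally priming the LZ77 window with the shared data."""
    comp = zlib.compressobj(9, zdict=shared) if shared else zlib.compressobj(9)
    return comp.compress(message) + comp.flush()

def unpack(blob: bytes, shared: Optional[bytes] = None) -> bytes:
    """Decompress; the decoder must be primed with the same shared data."""
    decomp = zlib.decompressobj(zdict=shared) if shared else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

reference = make_reference(8000)          # fits inside DEFLATE's 32 KiB window
mutated = bytearray(reference[2000:6000])  # a "new" sequence: a slice of the reference...
mutated[1234] = ord("T") if mutated[1234] != ord("T") else ord("G")  # ...with one change
sample = bytes(mutated)

plain = pack(sample)
primed = pack(sample, reference)
print(len(sample), "bytes ->", len(plain), "without dictionary,", len(primed), "with")
assert unpack(primed, reference) == sample
```

Without the dictionary the coder can only exploit the 4-letter alphabet; with it, most of the message becomes back-references into data the receiver already has.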
For example, when there is only text on the screen, OCR can be performed and the text rendered as vector graphics, reconstructing the original shapes in much more detail with huge bandwidth savings, as is done in DjVu's multi-layer compression.
This multi-layer method would be very good for encoding video: on a news channel, for instance, the text could be encoded as metadata while the rest of the picture is standard video. The text could then always be rendered in hi-res even if the background picture is fuzzy.
No by Anonymous Coward · 2014-07-26 05:22 · Score: 0

No, you would run the compression and it would not compress, like when you try to zip a zip file.
Mod parent up - applicable to gzip/deflate by Sits · 2014-07-26 06:28 · Score: 1

Sometimes you don't even need to change the file format: already-compressed gzip/deflate data (which PNG uses) can be optimized to produce a smaller, still valid deflate/gzip file. See tools like DeflOpt and defluff (DeflOpt can sometimes make even zopfli-encoded files smaller).
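A much cruder approach than dedicated tools like DeflOpt and defluff is to inflate the stream and deflate it again at a higher effort level, keeping whichever result is smaller. The sketch assumes a zlib-wrapped stream (the flavour PNG's image data uses) and a synthetic text payload; `recompress_zlib` is a hypothetical name. The output is still an ordinary zlib stream, so any decoder can read it.

```python
import zlib

# Hypothetical helper name, for illustration only.
def recompress_zlib(stream: bytes) -> bytes:
    """Losslessly re-encode a zlib stream, returning whichever version is smaller."""
    raw = zlib.decompress(stream)
    candidate = zlib.compress(raw, 9)  # zlib's maximum effort level
    return candidate if len(candidate) < len(stream) else stream

text = b"".join(b"line %d: the quick brown fox jumps over the lazy dog\n" % i
                for i in range(2000))
fast = zlib.compress(text, 1)  # what a speed-oriented encoder might have written
best = recompress_zlib(fast)
print(len(fast), "->", len(best))
assert zlib.decompress(best) == text  # the payload is unchanged
```

Keeping the original when re-encoding does not help guarantees the file never grows, which is the same rule the dedicated optimizers follow for the user's benefit.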
JPEG recompression by Sits · 2014-07-26 08:08 · Score: 2

ACT has a JPEG recompression test which clearly shows a bunch of compressors making a JPEG smaller. Even better, there's a great paper by the author of packJPG about how to compress a JPEG losslessly using the technique tepples described...
He just described how MPEG works (sort of) by paradigm82 · 2014-07-26 09:30 · Score: 1

He came up with the idea of using lossy compression techniques to compress the original file, then calculating the difference between the approximate reconstruction and the original file and compressing that data; by combining the two pieces, you have a lossless compressor.
This type of approach can work, Misra said, but, in most cases, would not be more efficient than a standard lossless compression algorithm, because coding the error usually isn’t any easier than coding the data.
Well, this is almost how MPEG movie compression works, and it really does work! MPEG works by partly describing the next picture from the previous one using motion vectors. These vectors describe how the next picture will look based on movements of small-ish macroblocks of the original picture. Now, if that were the only element of the algorithm, movies would look kind of strange (like paper-doll characters being moved about)! So the secret to making it work is to send extra information that allows the client to calculate the final picture. This is done by subtracting the predicted picture from the actual next frame. This difference picture will (if the difference between the two frames was indeed mostly due to movement) be mostly black, but it will contain some information due to noise, things that have become visible due to the movement, rotation, etc. So MPEG also contains an algorithm that can very efficiently encode this "difference picture": basically, an algorithm that is very efficient at encoding an almost black picture.
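The predict-then-correct loop can be sketched on a toy 1-D "frame". A real codec works on 2-D macroblocks and then transforms and quantizes the residual, which is where the loss comes in. Here the encoder tries a few shifts, picks the one whose prediction leaves the fewest nonzero differences, and sends that shift plus the residual. Because the residual is kept exactly, this toy is lossless. All names and frame values are hypothetical.

```python
from typing import List, Tuple

Frame = List[int]

# Hypothetical helper names and data, for illustration only.
def shift(frame: Frame, mv: int) -> Frame:
    """Predict a frame by moving the previous one mv pixels right (edges clamp)."""
    n = len(frame)
    return [frame[min(max(i - mv, 0), n - 1)] for i in range(n)]

def encode(prev: Frame, cur: Frame, search: int = 4) -> Tuple[int, Frame]:
    """Pick the motion vector with the sparsest residual; return (mv, residual)."""
    def mismatches(mv: int) -> int:
        return sum(1 for p, c in zip(shift(prev, mv), cur) if p != c)
    best_mv = min(range(-search, search + 1), key=mismatches)
    residual = [c - p for p, c in zip(shift(prev, best_mv), cur)]
    return best_mv, residual

def decode(prev: Frame, mv: int, residual: Frame) -> Frame:
    """Rebuild the frame from the prediction plus the transmitted correction."""
    return [p + r for p, r in zip(shift(prev, mv), residual)]

background = [10] * 32
prev = background[:]
prev[5:9] = [200, 210, 220, 230]  # an object...
cur = background[:]
cur[8:12] = [200, 210, 220, 230]  # ...that moved 3 pixels right
cur[20] = 13                      # plus a little noise

mv, residual = encode(prev, cur)
print("motion vector:", mv)
print("nonzero residual entries:", sum(1 for r in residual if r))
assert decode(prev, mv, residual) == cur
```

The residual is almost all zeros (the "mostly black" difference picture), which is exactly the kind of input a run-length or entropy coder handles cheaply.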

So there you have it - MPEG works by applying a very crude, lossy compression (only describing the picture difference in terms of motion vectors) and in addition transmitting the information required to correct the errors and allow reconstruction of the actual frame.

The only part where the comparison breaks down is that MPEG is not lossless. Even when transmitting the difference picture, further compression and reduction take place (depending on bandwidth), so the actual frame cannot be reconstructed 100%. Still, MPEG is evidence that the basic principle, a crude lossy compressor combined with a transmitted compensation, works.
1. Re:He just described how MPEG works (sort of) by Anonymous Coward · 2014-07-26 20:21 · Score: 0
  
  Uh, no. What are we talking about specifically? MPEG is a variety of compression algorithms, most of which are not that good. MPEG-4 part 10, known as H.264 or AVC, is good. Did I say good? I meant great. It is way better than the previous MPEG algorithms. That said, the funny thing about the family of MPEG algorithms is that they are not inherently lossy. The core math behind the things is actually lossless. The lossy part comes in two ways (really one, but it can be argued two, which I'll do).
  First, it's the quantization step that introduces the lossy characteristic. They quantize after the core math is all said and done. Why do they quantize? Because it throws away information by reducing the calculated values to homogeneous values (for example, a lot of 0s, 3s, 15s, etc.). That's where you get your compression opportunities. The reverse quantization cannot restore all of the original values. That's your data loss. This is controllable in the encoding process: the bigger the quantization weight, the better the compression, but also the crappier the image quality.
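The trade-off described above (a larger quantization step leaves fewer distinct values to code but more reconstruction error) shows up even with a plain uniform scalar quantizer. This is a sketch, not H.264's actual quantizer, which derives its step size from the QP parameter; the random "coefficients" and the helper names are hypothetical.

```python
import random

# Hypothetical helper names and data, for illustration only.
def quantize(coeffs, step):
    """Map each coefficient to an integer level: this is where information is lost."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Decoder side: scale the levels back up; the rounding error cannot be undone."""
    return [q * step for q in levels]

def stats(coeffs, step):
    """Return (number of distinct levels, maximum reconstruction error)."""
    levels = quantize(coeffs, step)
    recon = dequantize(levels, step)
    return len(set(levels)), max(abs(a - b) for a, b in zip(coeffs, recon))

rng = random.Random(7)
coeffs = [rng.randint(-255, 255) for _ in range(1000)]  # stand-in transform output

for step in (1, 4, 16, 64):
    distinct, max_err = stats(coeffs, step)
    print(f"step={step:3d}  distinct levels={distinct:4d}  max error={max_err}")
```

With a step of 1 nothing is lost; each larger step shrinks the alphabet the entropy coder must handle while the worst-case error grows to half the step.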
  Second, and it's a stretch, if we are talking about a movie, then information is thrown away during sampling of the video. Specifically, color information is thrown away, and at an alarming rate. However, it turns out that we are quite bad at distinguishing colors, so we don't notice. The most common sampling is to record one Cr/Cb sample per 4 intensity samples! (When I say intensity, think of a grayscale image.) You'd need eagle-eye vision to spot the color discrepancies, and few people have it (they are the ones who can tell the difference between high color and true color images).
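The sample counting behind 4:2:0 subsampling (one Cb and one Cr sample shared by each 2x2 block of intensity samples) is easy to check. This sketch averages each 2x2 block of a chroma plane; real encoders may use other filters and chroma positions, and the frame and plane sizes are arbitrary. The helper names are hypothetical.

```python
from typing import List, Tuple

Plane = List[List[int]]

# Hypothetical helper names, for illustration only.
def subsample_420(chroma: Plane) -> Plane:
    """Average each 2x2 block of a chroma plane (width and height assumed even)."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1] + 2) // 4  # rounded mean
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def sample_count(width: int, height: int) -> Tuple[int, int]:
    """(samples at full-resolution 4:4:4, samples at 4:2:0) for one frame."""
    full = 3 * width * height  # Y, Cb and Cr at every pixel
    sub = width * height + 2 * (width // 2) * (height // 2)
    return full, sub

full, sub = sample_count(1920, 1080)
print(f"4:4:4 {full} samples, 4:2:0 {sub} samples ({sub / full:.0%})")

cb = [[(x + y) % 256 for x in range(8)] for y in range(4)]
small = subsample_420(cb)
print(len(small), "x", len(small[0]))
```

Half of the raw samples disappear before any transform or quantization happens, which is why this step is so attractive.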
  Anyways, what am I trying to say? I'm saying that you can encode a video stream in H.264 with a quant parameter that effectively does no quantization (preserves all values). You will have a lossless video sequence, although your bandwidth costs will go up as you don't maximize compression (you still have some compression due to the tricks of the prediction/bi-prediction scheme that you described).
2. Re:He just described how MPEG works (sort of) by paradigm82 · 2014-07-27 01:25 · Score: 1
  
  I don't think any of what you wrote goes against what I wrote. I made the exact same distinction: some components of MPEG movie compression (and my post applies to MPEG-2 too, btw) are lossy, which is compensated for by other steps, but yet other steps make it lossy again.
Shannon's source coding theorem by eyebits · 2014-07-26 10:47 · Score: 2

http://en.wikipedia.org/wiki/S... "In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy."
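One concrete consequence of the theorem: an optimal symbol-by-symbol prefix code such as a Huffman code has an average length L within one bit of the entropy, H <= L < H + 1 bits per symbol. The sketch below builds Huffman code lengths with `heapq` and checks that bound; the symbol probabilities and the helper name are made up for illustration.

```python
import heapq
import math
from typing import Dict

# Hypothetical helper name, for illustration only.
def huffman_lengths(probs: Dict[str, float]) -> Dict[str, int]:
    """Return the Huffman codeword length for each symbol."""
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)  # two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
lengths = huffman_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(lengths)
print(f"H = {entropy:.3f} bits, L = {avg_len:.3f} bits")
assert entropy <= avg_len < entropy + 1
```

The gap to H comes from whole-bit codeword lengths; arithmetic coding removes most of it, but nothing lossless gets below H on average.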
Powerful compression of movies by Anonymous Coward · 2014-07-28 01:18 · Score: 0

Forget streaming the movie. Just send the UPC code for the DVD...
Trends.... by Anonymous Coward · 2014-07-28 09:08 · Score: 0

Odd, because I'm designing something that I believe to be a new type of compression which directly targets seemingly random streams.
Funny how the world keeps following these little endeavors I partake in.....
The last 4 major things I've done where I intentionally went against the grain and tried something new have resulted in a sudden shift to that idea in the industry a year later.
Perhaps I'm either onto something or thinking similarly to others who are breaking new ground. Definitely more motivation to continue forwards.
In 2000 I imported an AWD turbo 4-cylinder rally car from overseas due to the lack of them in the USA. I wanted to show how a little 2.0 L engine could make well over 600 horsepower and, with four-wheel drive, simply destroy a "drag-racing" V8. In 2003 they finally started selling a detuned imported model from Japan. In 2001 The Fast and the Furious came out and, blam, everyone had to have one. Now, in 2012+, everyone, including non-sports-car enthusiasts, wants turbo 4 engines.
I started using tricks in OpenGL in 2004-2005, using vector coordinates as a shorthand for getting the GPU to do math. Then in 2007 here comes CUDA, and suddenly everyone is doing it.
I disliked the P4, which everyone loved, and decided to stick with a Tualatin pin-modded dual-CPU board running dual P3s, preferring SMP. As a developer, I thought two slower processors were more useful than one faster one. Everyone else said I was wrong. Fast-forward: the Core 2 architecture abandoned the P4 for a P3-based design, which also moved to dual processors. AMD's X2 came out too, and suddenly *everyone* had to have 2 processors.
I decided to focus on crazy fuel economy and bought a 3-cylinder Geo Metro, since no one had produced a 3-banger since 2001. Now we have Ford's recently released 1.0 L EcoBoost 3-cylinder turbo and the new Mitsubishi Mirage with a 1.2 L non-turbo 3-cylinder.
I gotta say I keep trying to think outside the box and do something fresh and different only to find that society then moves in my direction a year or two later no matter how odd my path was.
Now I'm building a compression algorithm based on the idea of iteration (and I won't say a thing more), and now some show I haven't even seen is based entirely around the idea of building compression algorithms (which has been super uncool to everyone I've tried talking to about it for about the past 2 years).
What an odd set of coincidences, at the very least.