Slashdot Mirror


A Fictional Compression Metric Moves Into the Real World

Tekla Perry (3034735) writes The 'Weissman Score' — created for HBO's "Silicon Valley" to add dramatic flair to the show's race to build the best compression algorithm — creates a single score by considering both the compression ratio and the compression speed. While it was created for a TV show, it does really work, and it's quickly migrating into academia. Computer science and engineering students will begin to encounter the Weissman Score in the classroom this fall."

28 of 133 comments (clear)

  1. Bullshit.... by gweihir · · Score: 4, Interesting

    A "combined score" for speed and ratio is useless, as that relation is not linear.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Bullshit.... by i+kan+reed · · Score: 3, Insightful

      Well then write a paper called "an improved single metric for video compression" and submit it to a compsci journal. Anyone can dump opinions on slashdot comments, but if you're right, then you can get it in writing that you're right.

    2. Re:Bullshit.... by gweihir · · Score: 4, Insightful

      There is no possibility for a useful single metric. The question does obviously not apply to the problem. Unfortunately, most journals do not accept negative results, which is one of the reasons for the sad state of affairs in CS. For those that do, the reviewers would call this one very likely "trivially obvious", which it is.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    3. Re:Bullshit.... by nine-times · · Score: 4, Insightful

      Can you explain in more detail?

      I'm not an expert here, but I think the idea is to come up with a single quantifying number that represents the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.

      Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect-- different algorithms might score differently on different sized files, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance. It just provides a method for estimating performance.

      But maybe you have something in mind that I'm not seeing.

    4. Re:Bullshit.... by jsepeta · · Score: 2

      That's kind of like the Microsoft Windows Experience Index that is provided by Windows Vista / Windows 7 which gives a score based on CPU, RAM, GPU, and hard disk speed. Not entirely useful but gives beta-level nerds something to talk about at the water cooler.
      http://windows.microsoft.com/e...

      At work my desktop computer is a Pentium E6300 with a 6.3 rating on the CPU and an overall 4.8 rating due to the crappy graphics chipset.
      At work my laptop computer is an i3-2010M with a 6.4 rating on the CPU and an overall 4.6 rating due to the crappy graphics chipset.

      A compression algorithm rated by speed and compression ability would have to weight the speed vs. the compression, right?

      --
      Remember kids, if you're not paying for the service, YOU ARE THE PRODUCT THAT IS BEING SOLD.
    5. Re:Bullshit.... by Darinbob · · Score: 2

      I don't think this metric is really in any computer science journal, it's only in IEEE Spectrum.

    6. Re:Bullshit.... by mrchaotica · · Score: 5, Informative

      Can you explain in more detail?

      If you have a multi-dimensional set of factors of things and you design a metric to collapse them down into a single dimension, what you're really measuring is a combination of the values of the factors and your weighting of them. Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful.

      This goes for any situation where you're picking the "best" among a set of choices, not just for compression algorithms, by the way.

      Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one.

      User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best. User B is trying to shove the entire contents of Wikipedia into a disc to send on a space probe, so for him, the third algorithm is the best.

      You gave a really extreme[ly contrived] example, so in that case you might be able to say that "reasonable" use cases would prefer the middle algorithm. But differences between actual algorithms would not be nearly so extreme.

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    7. Re:Bullshit.... by nine-times · · Score: 3, Insightful

      Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful...[snip] User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best.

      And these are very good arguments why such a metric should not be taken as an end-all be-all. Isn't that generally the case with metrics and benchmarks?

      For example, you might use a benchmark to gauge the relative performance between two video cards. I test Card A and it gets 700. I test Card B and it gets a 680. However, in running a specific game that I like, Card B gets slightly faster framerates. Meanwhile, some other guy wants to use the video cards to mine Bitcoin, and maybe these specific benchmarks test entirely the wrong thing, and Card C, which scores 300 on the benchmark, is the best choice. Is the benchmark therefore useless?

      No, not necessarily. if the benchmark is supposed to test general game performance, and generally faster benchmark tests correlate with faster game performance, then it helps shoppers figure out what to buy. If you want to shop based on a specific game or a specific use, then you use a different benchmark.

    8. Re:Bullshit.... by Beck_Neard · · Score: 2

      Uhm, do you really think that something as important as assessing the performance of compression algorithms wouldn't have attracted the attention of thousands (or, more likely, hundreds of thousands) of computer scientists over the years? Open up any academic journal that deals with this stuff even tangentially and you find many examples of different metrics for assessing compression performance. And there's nothing new about this 'score'. Dividing ratio by the logarithm of the compression time is a very widely-used theoretical scoring function; I can find references to it from the 90's. This particular form of that score may be new, but gweihir is right; such a score doesn't give much information and has very little use.

      --
      A fool and his hard drive are soon parted.
    9. Re:Bullshit.... by sg_oneill · · Score: 2

      A "combined score" for speed and ratio is useless, as that relation is not linear.

      Typing at 70 words per minute, slashdot poster declares quantity over time measurements meaningless.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    10. Re:Bullshit.... by nine-times · · Score: 2

      I find it surprising and almost funny how much ire this has drawn from people with some kind of weird "purist" attitude about the whole thing.

      It doesn't seem "generally useless" to me, but it would be more appropriate to say that it's "useful only in general cases". I would say that in most circumstances, I'd want compression algorithms that balance speed and compression. I often don't zip my files to maximum compression, for example, because I don't want to sit around waiting for a long time in order to save a very small amount of space. I also don't zip without compression, because speed is not that *that* important. I look for compression that's balanced. "Compress it as much as you can without making me sit around and wait for it."

      Similarly, if I were ripping CDs to MP3, and you offered me a different format that would save me 1MB per song, I'd jump on board. If you told me that it would save me that space by requiring 1 hour to compress, and then another hour to decompress before I could play it, I'd tell you to fuck off. If you told me it would drain my battery life on my phone to play it, I'd say it's not worth the trouble.

      So I don't know if this is the right metric or the most useful metric, but certainly there could be a metric for compression that deals with "total space savings" vs. "time and complexity in compressing and decompressing". Such a metric could actually be a solid indicator of which compression is useful in a vague general sense.

  2. freemasons run the country by retchdog · · Score: 4, Interesting

    The so-called Weissman score is just proportional to (compression ratio)/log(time to compress).

    I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time. For example, taking a day to compress is much worse than taking an hour, but taking 24 days to compress is only somewhat worse than taking one day since you're talking offline/parallel processing anyway.

    The log() seems kind of an arbitrary choice, but whatever. It's no better or worse than any other made-up metric, as long as you're not taking it too seriously.

    --
    "They were pure niggers." – Noam Chomsky
    1. Re:freemasons run the country by AsmCoder8088 · · Score: 2

      The formula is not too bad, although I would suggest a minor tweak, mainly that one should change it from:

      (compression ratio)/log(time to compress)

      to:

      (compression ratio)/log(10+time to compress).

      This will ensure that no divide by zero occurs, specifically if the time to compress is 1 second, then you would have been dividing by zero in the original formula.

  3. Re:Useless without measure of lossiness/distortion by bill_mcgonigle · · Score: 2, Funny

    hey, "print 0" runs in O(1)!

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  4. Inadequate by Are+You+Kidding · · Score: 2

    Not only does it fail to account for loss or distortion, but also fails to consider the time to decompress. If a compression algorithm with a high Weissman score is applied to a video, it is useless if it cannot be decompressed fast enough to show the video at an appropriate frame rate.

  5. Compression and decompression ratios would help by JoeyRox · · Score: 2

    Two scores would be useful, one for compression_time:size and decompression_time:size, since for many applications the latter is more important in compress-once consume-many applications.

  6. Re:It really works? by phoenix_rizzen · · Score: 5, Informative

    They're talking about the Score, not the compression algorithm. And your link doesn't mention anything about the Score.

  7. Sounds like the Drake equation all over again. by mmell · · Score: 2

    IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.

  8. Re:I thought it wasnt possible by Travis+Mansbridge · · Score: 2

    The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (insofar as more compressed/faster algorithms achieve a better rating).

  9. circle jerk by Anonymous Coward · · Score: 2, Funny

    Show About Self-Absorbed Assholes Who Think Their Stupid Ideas Are The Bees Knees Gains Popularity By Making Their Stupid Idea Sound Like Its The Bees Knees

  10. Re:Useless without measure of lossiness/distortion by retchdog · · Score: 4, Informative

    it's for lossless compression only.

    anyway, you can just add a term representing the lost information and throw it into this "score". hey, why not? just figure out how important the lossiness is relative to compression rate. if it's very important, take the exp() of the loss metric; if it's unimportant (like time is), take the log(); finally, if it's just kind of important, leave it linear, or maybe square or square root. whatever.

    seriously, just make some shit up and throw it in. you won't compromise anything. it's already just made-up shit.

    --
    "They were pure niggers." – Noam Chomsky
  11. Re:Useless without measure of lossiness/distortion by viperidaenz · · Score: 4, Insightful

    In the TV show only lossless compression was being considered, so MP3 would fail.

  12. Re:Trivial observation by fnj · · Score: 3, Insightful

    The reason the Score is utter bullshit is that the scale is completely arbitrary and useless. It says that 2:1 compression that takes 1 second should have the same score as 4:1 compression that takes log(2) seconds, or 1 million to 1 compression that takes log(1 million) seconds.

    WHY? State why log time is a better measure than straight time, or time squared, or square root of time. And look at the units of the ratio: reciprocal log seconds. What the hell is the significance of that? It also conveniently sidesteps the variability with different architectures. Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).

    In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

    It's bullshit.

  13. Re:It really works? by vux984 · · Score: 2

    Claiming that a score "works" has no meaning,

    I could easily devise a cpu scoring methodology that scores CPU based on chip area / cost * clock speed / register width.

    Such a score "works" in the sense that the function can be evaluated, but it wouldn't tell you anything about whether to buy an i7 vs a xeon vs a pentium 2.

    The suggestion in the article is that the particular scoring methodology that was created for the show is useful for comparing compression algorithms, to the point that it may well be adopted by industry.

    Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.

    The writing was perfectly fine, your reading comprehension is what failed here.

  14. Re:I thought it wasnt possible by khellendros1984 · · Score: 3, Informative
    FTA:

    And Jerry Gibson, a professor at the University of California at Santa Barbara, says he's going to introduce the metric into two classes this year. For a winter quarter class on information theory, he will ask students to use the score to evaluate lossless compression algorithms. In a spring quarter class on multimedia compression, he will use the score in a similar way, but in this case, because the Weissman Score doesn't consider distortion introduced in lossy compression, he will expect the students to weight that factor as well.

    The scoring method as stated is only useful for evaluating lossless compression. One could also take into account the resemblance of the output to the input to allow a modified version of the score to evaluate lossy compression.

    --
    It is pitch black. You are likely to be eaten by a grue.
  15. Re:The Misra Score by DoofusOfDeath · · Score: 4, Funny

    From the article:

    Misra came up with a formula

    So, now Jar Jar Binks does C.S.? Shit...

  16. Slashvertisement for HBO? by Gothmolly · · Score: 2

    Given that only a subset of Slashdot users are HBO subscribers, how is this relevant?

    --
    I want to delete my account but Slashdot doesn't allow it.
  17. Re:Useless without measure of lossiness/distortion by viperidaenz · · Score: 2

    That's correct. So what?

    So, comment I was replying to

    Using the "Weissman Score", MP3 is always better than FLAC

    MP3 wouldn't even have a "Weissman Score" because it's not a lossless compression algorithm.