Slashdot Mirror


A Fictional Compression Metric Moves Into the Real World

Tekla Perry (3034735) writes "The 'Weissman Score' — created for HBO's "Silicon Valley" to add dramatic flair to the show's race to build the best compression algorithm — creates a single score by considering both the compression ratio and the compression speed. While it was created for a TV show, it does really work, and it's quickly migrating into academia. Computer science and engineering students will begin to encounter the Weissman Score in the classroom this fall."

9 of 133 comments

  1. Bullshit.... by gweihir · · Score: 4, Interesting

    A "combined score" for speed and ratio is useless, as that relation is not linear.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Bullshit.... by gweihir · · Score: 4, Insightful

      There is no possibility for a useful single metric. The question obviously does not apply to the problem. Unfortunately, most journals do not accept negative results, which is one of the reasons for the sad state of affairs in CS. For those that do, the reviewers would very likely call this one "trivially obvious", which it is.

    2. Re:Bullshit.... by nine-times · · Score: 4, Insightful

      Can you explain in more detail?

      I'm not an expert here, but I think the idea is to come up with a single number that captures the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.

      Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect: different algorithms might score differently on files of different sizes, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance. They just provide a method for estimating performance.

      But maybe you have something in mind that I'm not seeing.

    3. Re:Bullshit.... by mrchaotica · · Score: 5, Informative

      > Can you explain in more detail?

      If you have a multi-dimensional set of factors and you design a metric to collapse them down into a single dimension, what you're really measuring is a combination of the values of the factors and your weighting of them. Since the "correct" weighting is a matter of opinion and everybody's use case is different, a single-dimension metric isn't very useful.

      This goes for any situation where you're picking the "best" among a set of choices, not just for compression algorithms, by the way.
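The weighting argument above can be sketched in a few lines. This is only an illustration: the two compressors, their numbers, the linear scoring rule, and the names `weighted_score` and `winner` are all assumptions made up for the example, not anything from the thread or the article.

```python
# Two hypothetical compressors; the numbers are made up for illustration.
ALGORITHMS = {
    "quick": {"ratio": 3.0, "seconds": 2.0},
    "thorough": {"ratio": 4.0, "seconds": 30.0},
}


def weighted_score(stats, ratio_weight, time_weight):
    """Collapse (ratio, seconds) into one number: reward ratio, penalize time."""
    return ratio_weight * stats["ratio"] - time_weight * stats["seconds"]


def winner(ratio_weight, time_weight):
    """Return the name of the algorithm with the highest weighted score."""
    return max(
        ALGORITHMS,
        key=lambda name: weighted_score(ALGORITHMS[name], ratio_weight, time_weight),
    )


# Same measurements, different weightings, different "best" algorithm.
patient_user = winner(ratio_weight=1.0, time_weight=0.01)
impatient_user = winner(ratio_weight=1.0, time_weight=0.1)
print(patient_user, impatient_user)
```

The measurements never change between the two calls; only the weights do, and the ranking flips with them.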

      > Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one.

      User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best. User B is trying to shove the entire contents of Wikipedia onto a disc to send on a space probe, so for him, the third algorithm is the best.

      You gave a really extreme[ly contrived] example, so in that case you might be able to say that "reasonable" use cases would prefer the middle algorithm. But differences between actual algorithms would not be nearly so extreme.
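As a rough sketch of the User A / User B point, the three algorithms from the quoted example can be run through two different selection rules. The labels `fast`, `middle`, and `slow`, the `Candidate` type, and both selection functions are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    saved_percent: float  # how much smaller the compressed file is
    seconds: float        # time taken to compress


# The three algorithms from the quoted example (29 hours expressed in seconds).
CANDIDATES = [
    Candidate("fast", 0.00001, 14),
    Candidate("middle", 15.0, 20),
    Candidate("slow", 15.1, 29 * 3600),
]


def best_under_deadline(candidates, max_seconds):
    """User A: most space saved among algorithms that finish in time."""
    eligible = [c for c in candidates if c.seconds < max_seconds]
    return max(eligible, key=lambda c: c.saved_percent, default=None)


def best_compression(candidates):
    """User B: time does not matter, so pick the most space saved."""
    return max(candidates, key=lambda c: c.saved_percent)


streaming_choice = best_under_deadline(CANDIDATES, max_seconds=15)
archive_choice = best_compression(CANDIDATES)
print(streaming_choice.name, archive_choice.name)
```

With a 15-second limit only the first algorithm qualifies, while the archival rule picks the third; relaxing the limit to a minute would pick the middle one. No single ranking serves all three cases.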

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

  2. freemasons run the country by retchdog · · Score: 4, Interesting

    The so-called Weissman score is just proportional to (compression ratio)/log(time to compress).

    I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time. For example, taking a day to compress is much worse than taking an hour, but taking 24 days to compress is only somewhat worse than taking one day since you're talking offline/parallel processing anyway.

    The log() seems kind of an arbitrary choice, but whatever. It's no better or worse than any other made-up metric, as long as you're not taking it too seriously.
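For reference, here is a minimal sketch of the score as it is commonly stated: W = alpha * (r / r_ref) * (log T_ref / log T), where r and T are the candidate's compression ratio and time, r_ref and T_ref are those of a reference compressor, and alpha is a scaling constant. The function name, the reference values, and the sample numbers below are assumptions chosen for illustration.

```python
import math


def weissman_score(ratio, seconds, ref_ratio, ref_seconds, alpha=1.0):
    """alpha * (ratio / ref_ratio) * (log(ref_seconds) / log(seconds))."""
    # log(t) is zero at t == 1 and negative below it, so this sketch
    # only accepts times greater than 1 in the chosen unit.
    if seconds <= 1 or ref_seconds <= 1:
        raise ValueError("times must be greater than 1 in the chosen unit")
    return alpha * (ratio / ref_ratio) * (math.log(ref_seconds) / math.log(seconds))


# Hypothetical reference compressor: ratio 2.0 in 10 seconds.
REF_RATIO, REF_SECONDS = 2.0, 10.0

same_as_ref = weissman_score(2.0, 10.0, REF_RATIO, REF_SECONDS)
double_ratio = weissman_score(4.0, 10.0, REF_RATIO, REF_SECONDS)

# Same ratio, increasingly slow: one hour, one day, 24 days.
hour = weissman_score(2.0, 3600.0, REF_RATIO, REF_SECONDS)
day = weissman_score(2.0, 86400.0, REF_RATIO, REF_SECONDS)
days_24 = weissman_score(2.0, 24 * 86400.0, REF_RATIO, REF_SECONDS)

print(same_as_ref, double_ratio)
print(hour / day, day / days_24)
```

Doubling the ratio doubles the score, while each further 24x slowdown costs proportionally less, which matches the intuition described above. Note that because of the log, the result also depends on the time unit you pick.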

    --
    "They were pure niggers." – Noam Chomsky
  3. Re:It really works? by phoenix_rizzen · · Score: 5, Informative

    They're talking about the Score, not the compression algorithm. And your link doesn't mention anything about the Score.

  4. Re:Useless without measure of lossiness/distortion by retchdog · · Score: 4, Informative

    it's for lossless compression only.

    anyway, you can just add a term representing the lost information and throw it into this "score". hey, why not? just figure out how important the lossiness is relative to compression rate. if it's very important, take the exp() of the loss metric; if it's unimportant (like time is), take the log(); finally, if it's just kind of important, leave it linear, or maybe square or square root. whatever.

    seriously, just make some shit up and throw it in. you won't compromise anything. it's already just made-up shit.

  5. Re:Useless without measure of lossiness/distortion by viperidaenz · · Score: 4, Insightful

    In the TV show only lossless compression was being considered, so MP3 would fail.

  6. Re:The Misra Score by DoofusOfDeath · · Score: 4, Funny

    From the article:

    Misra came up with a formula

    So, now Jar Jar Binks does C.S.? Shit...