A Fictional Compression Metric Moves Into the Real World
Tekla Perry (3034735) writes: "The 'Weissman Score', created for HBO's "Silicon Valley" to add dramatic flair to the show's race to build the best compression algorithm, creates a single score by considering both the compression ratio and the compression speed. While it was created for a TV show, it does really work, and it's quickly migrating into academia. Computer science and engineering students will begin to encounter the Weissman Score in the classroom this fall."
A "combined score" for speed and ratio is useless, as that relation is not linear.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I thought I read an article the other day that said their algorithm seemed plausible on the surface but would eventually begin to fall apart?
The so-called Weissman score is just proportional to (compression ratio)/log(time to compress).
I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time. For example, taking a day to compress is much worse than taking an hour, but taking 24 days to compress is only somewhat worse than taking one day since you're talking offline/parallel processing anyway.
The log() seems kind of an arbitrary choice, but whatever. It's no better or worse than any other made-up metric, as long as you're not taking it too seriously.
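For anyone who wants to poke at the numbers, here is a rough Python sketch of the score in the form it is usually published: the ratio and the log of the time are each normalized against a reference compressor run on the same data. The function and variable names are my own, alpha is left at 1, and the timings are hypothetical rather than measured.

```python
import math

def weissman_score(ratio, seconds, ref_ratio, ref_seconds, alpha=1.0):
    """Score a compressor against a reference compressor on the same data.

    W = alpha * (ratio / ref_ratio) * (log(ref_seconds) / log(seconds))
    Both times must be greater than 1 in the chosen unit; otherwise the
    logs are zero or negative and the score stops making sense.
    """
    if seconds <= 1 or ref_seconds <= 1:
        raise ValueError("times must be > 1 in the chosen unit")
    return alpha * (ratio / ref_ratio) * (math.log(ref_seconds) / math.log(seconds))

HOUR = 3600.0
DAY = 24 * HOUR

# Hypothetical case: same 2.0 ratio as a reference that takes one hour.
same_speed = weissman_score(2.0, HOUR, 2.0, HOUR)
one_day = weissman_score(2.0, DAY, 2.0, HOUR)
many_days = weissman_score(2.0, 24 * DAY, 2.0, HOUR)

print(same_speed, one_day, many_days)
```

Going from an hour to a day costs more score than going from a day to 24 days, even though both steps multiply the time by 24. That matches the diminishing-penalty reading above.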
"They were pure niggers." – Noam Chomsky
From the article:
> Misra came up with a formula
hey, "print 0" runs in O(1)!
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
He said it did work, it's just not as effective as other existing compression solutions.
Not only does it fail to account for loss or distortion, it also fails to consider the time to decompress. If a compression algorithm with a high Weissman score is applied to a video, it is useless if the video cannot be decompressed fast enough to play at an appropriate frame rate.
Aside from centering around Silicon Valley, I don't see how these stories are related. That one is about a fictional compression algorithm, while this one is about a method for rating compression algorithms which is becoming nonfiction.
Two scores would be useful, one for compression_time:size and one for decompression_time:size, since the latter matters more in compress-once, consume-many applications.
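A minimal sketch of collecting the two timings separately, using Python's standard zlib module purely as a stand-in compressor. The helper name `measure` and the sample data are made up for illustration, and single-run wall-clock timings like these are noisy.

```python
import time
import zlib

def measure(data, level=6):
    """Return (ratio, compress_seconds, decompress_seconds) for zlib."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    assert unpacked == data  # lossless round trip
    return len(data) / len(packed), compress_seconds, decompress_seconds

# Highly repetitive sample data, so the ratio will be unrealistically good.
sample = b"compress once, consume many " * 50_000
ratio, c_time, d_time = measure(sample)
print(f"ratio={ratio:.1f} compress={c_time:.4f}s decompress={d_time:.4f}s")
```

Either timing could then be fed into whatever score you like. The point is only that the two are measured independently.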
They're talking about the Score, not the compression algorithm. And your link doesn't mention anything about the Score.
The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (inasmuch as more compressed/faster algorithms achieve a better rating).
IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.
Exactly. The compression algorithm is fictional; the score, while created for the show, can actually be calculated. Whether it will catch on as a metric remains to be seen.
Show About Self-Absorbed Assholes Who Think Their Stupid Ideas Are The Bee's Knees Gains Popularity By Making Their Stupid Idea Sound Like It's The Bee's Knees
it's for lossless compression only.
anyway, you can just add a term representing the lost information and throw it into this "score". hey, why not? just figure out how important the lossiness is relative to compression rate. if it's very important, take the exp() of the loss metric; if it's unimportant (like time is), take the log(); finally, if it's just kind of important, leave it linear, or maybe square or square root. whatever.
seriously, just make some shit up and throw it in. you won't compromise anything. it's already just made-up shit.
No metric is adequate for all purposes. This one is adequate for the task it was designed for, and is adequate for some other purposes as well. That's the best that can be expected of any tool. Always use the appropriate tools for the task at hand, of course.
"Convictions are more dangerous enemies of truth than lies."
In the TV show only lossless compression was being considered, so MP3 would fail.
“We had to come up with an approach that isn’t possible today, but it isn’t immediately obvious that it isn’t possible,” says Misra.
Please explain why you think that means he said "it does work".
> so MP3 would fail.
That's correct. So what?
MP3 was never a good compression algorithm. It's an audio format that uses a normalization that causes SOME audio to be lossy. It's a great demonstration of how a negligible loss across a wide range of audio can result in a more useful algorithm for sound (it's quite compact). MP3 is not a good compression algorithm and doesn't see a lot of use outside of commodity audio, where you can afford to throw away data.
Often wrong but never in doubt.
I am Jack9.
Everyone knows me.
Holy shit! Math works! Somehow, I don't think you can have a discussion about whether a formula really returns a result or not. I now see that the idiot who wrote the summary was trying to say that the algorithm doesn't work, but math does. Alas, that idiot has no ability to write. ... oh wait, it was you! Never mind.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Yes. That's the point, isn't it? They didn't invent math for the show. Claiming that a score "works" has no meaning, other than to say that math "works". Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.
Sounds a bit like the F1 measure used in classification systems, where the F-score is the harmonic mean of precision and recall (and where pushing for higher precision tends to yield lower recall, and vice versa). ;-)
However, I'm wondering how stable this Weissman score is. Compression algorithms might not all run in O(n), where n is the size of the data to compress.
Or it may actually give a very high score to something that doesn't compress at all.
public byte[] compress( byte[] input) { return input;}
I bet this gets a high Weissman score
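Whether that bet pays off seems to depend mostly on the time unit. Here is a rough Python sketch using the proportional form ratio / log(time) described earlier in the thread. The timings are made-up numbers, not measurements, and `identity_compress` and `proportional_score` are illustrative names.

```python
import math

def identity_compress(data: bytes) -> bytes:
    """Python analogue of the one-liner above: returns the input unchanged."""
    return data

def proportional_score(ratio, time_value):
    """ratio / log(time), ignoring any constant or reference compressor."""
    return ratio / math.log(time_value)

data = b"x" * 1_000_000
ratio = len(data) / len(identity_compress(data))  # always 1.0

# Hypothetical timings, chosen for illustration only.
identity_ms = 1.001               # the no-op finishes almost instantly
real_ratio, real_ms = 3.0, 50.0   # a made-up "real" compressor

# Timed in milliseconds, log(1.001) is nearly zero, so the no-op's score explodes.
noop_in_ms = proportional_score(ratio, identity_ms)
real_in_ms = proportional_score(real_ratio, real_ms)

# The same two timings expressed in microseconds flip the ranking.
noop_in_us = proportional_score(ratio, identity_ms * 1000)
real_in_us = proportional_score(real_ratio, real_ms * 1000)

print(noop_in_ms, real_in_ms)
print(noop_in_us, real_in_us)
```

So the do-nothing compressor can come out on top or at the bottom depending on the time unit, which says more about the log(time) term than about the compressor.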
> Claiming that a score "works" has no meaning,
I could easily devise a CPU scoring methodology that scores CPUs based on chip area / cost * clock speed / register width.
Such a score "works" in the sense that the function can be evaluated, but it wouldn't tell you anything about whether to buy an i7 vs a xeon vs a pentium 2.
The suggestion in the article is that the particular scoring methodology that was created for the show is useful for comparing compression algorithms, to the point that it may well be adopted by industry.
> Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.
The writing was perfectly fine, your reading comprehension is what failed here.
Yes. He failed to comprehend that the submitter was pointing out that math really works, and a ratio of compression over time really does express a ratio.
Oh boy. A useless metric!
Compression ratio: Sure. But the problem is, it's possible to increase compression ratio by "losing" data. So you can obtain a high ratio, but the images as rendered will be blurry/damaged.
Compression Speed: This is just as dumb since compression speed is partially a function of the compression ratio, partially a function of the efficiency of the algorithm and partially a function of the amount of "grunt power" hardware you throw at it. So one portion of this is a nebulous "hardware norm" factor that can be gamed. The other is a function of the other factor (compression ratio) which can ALSO be gamed (and creates a bias towards lossy compression).
Basically, something with a high Weissman number would be extremely lossy compression on high-powered hardware. That basically negates the point of high-resolution viewing, as any idiot can reduce a 1920x1080 frame to 19px by 11px and then compress it. I can already take precompressed (and lossy) JPEG files, resample down to 19x11, then back up to 1920x1080. I can wind up reducing a 930K file down to 40K (basically a 95+% savings). And the image is completely indecipherable.
Take a look at an original image versus the same image after the above-described UCCT (UltraCrappyCompressionTechnique).
http://cox-supergroups.com/The...
The above image is a PNG to prevent further compression artifacts from creeping into the sample.
The top portion of the image is the original 930K JPEG file.
The bottom portion is the resampled 40K JPEG file.
Chas - The one, the only.
THANK GOD!!!
MP3 was never a compression algorithm.
FTFY
Given that only a subset of Slashdot users are HBO subscribers, how is this relevant?
I want to delete my account but Slashdot doesn't allow it.
"Algorithm" is the distinction. Otherwise you're basically saying "What's my algorithm for doing X? I just demand X be done." Perhaps you could call it The King's Algorithm.
> That's correct. So what?
So, the comment I was replying to said:
> Using the "Weissman Score", MP3 is always better than FLAC
MP3 wouldn't even have a "Weissman Score" because it's not a lossless compression algorithm.
Because. Everything is immediately obvious to slashdotters. QED.
No, he failed to comprehend that people are finding that particular method of calculating the ratio of compression over time to be *useful*.
I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?
C'mon now, equal rights for AMD here.
It's not the years, honey, it's the mileage. - Colonel Henry Walton Jones, Jr., Ph.D.