Slashdot Mirror


A Fictional Compression Metric Moves Into the Real World

Tekla Perry (3034735) writes "The 'Weissman Score' (created for HBO's "Silicon Valley" to add dramatic flair to the show's race to build the best compression algorithm) creates a single score by considering both the compression ratio and the compression speed. While it was created for a TV show, it does really work, and it's quickly migrating into academia. Computer science and engineering students will begin to encounter the Weissman Score in the classroom this fall."

133 comments

  1. Bullshit.... by gweihir · · Score: 4, Interesting

    A "combined score" for speed and ratio is useless, as that relation is not linear.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Bullshit.... by i+kan+reed · · Score: 3, Insightful

      Well then write a paper called "an improved single metric for video compression" and submit it to a compsci journal. Anyone can dump opinions on slashdot comments, but if you're right, then you can get it in writing that you're right.

    2. Re:Bullshit.... by gweihir · · Score: 4, Insightful

      There is no possibility for a useful single metric. The question does obviously not apply to the problem. Unfortunately, most journals do not accept negative results, which is one of the reasons for the sad state of affairs in CS. For those that do, the reviewers would call this one very likely "trivially obvious", which it is.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    3. Re:Bullshit.... by nine-times · · Score: 4, Insightful

      Can you explain in more detail?

      I'm not an expert here, but I think the idea is to come up with a single quantifying number that represents the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.

      Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect-- different algorithms might score differently on different sized files, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance. It just provides a method for estimating performance.

      But maybe you have something in mind that I'm not seeing.

    4. Re:Bullshit.... by jsepeta · · Score: 2

      That's kind of like the Microsoft Windows Experience Index that is provided by Windows Vista / Windows 7 which gives a score based on CPU, RAM, GPU, and hard disk speed. Not entirely useful but gives beta-level nerds something to talk about at the water cooler.
      http://windows.microsoft.com/e...

      At work my desktop computer is a Pentium E6300 with a 6.3 rating on the CPU and an overall 4.8 rating due to the crappy graphics chipset.
      At work my laptop computer is an i3-2010M with a 6.4 rating on the CPU and an overall 4.6 rating due to the crappy graphics chipset.

      A compression algorithm rated by speed and compression ability would have to weight the speed vs. the compression, right?

      --
      Remember kids, if you're not paying for the service, YOU ARE THE PRODUCT THAT IS BEING SOLD.
    5. Re:Bullshit.... by Darinbob · · Score: 2

      I don't think this metric is really in any computer science journal, it's only in IEEE Spectrum.

    6. Re:Bullshit.... by mrchaotica · · Score: 5, Informative

      Can you explain in more detail?

      If you have a multi-dimensional set of factors of things and you design a metric to collapse them down into a single dimension, what you're really measuring is a combination of the values of the factors and your weighting of them. Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful.

      This goes for any situation where you're picking the "best" among a set of choices, not just for compression algorithms, by the way.

      Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one.

      User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best. User B is trying to shove the entire contents of Wikipedia into a disc to send on a space probe, so for him, the third algorithm is the best.

      You gave a really extreme[ly contrived] example, so in that case you might be able to say that "reasonable" use cases would prefer the middle algorithm. But differences between actual algorithms would not be nearly so extreme.
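
      The weighting point can be sketched with the three algorithms from the quoted example. This is only an illustration: the linear scoring function and the weights are assumptions made up here, not anything proposed in the thread.

```python
# Numbers from the three-algorithm example quoted above.
# Each entry: (name, fraction of space saved, seconds to compress).
ALGORITHMS = [
    ("first", 0.0000001, 14.0),      # 0.00001% saved in 14 seconds
    ("middle", 0.15, 20.0),          # 15% saved in 20 seconds
    ("third", 0.151, 29 * 3600.0),   # 15.1% saved in 29 hours
]


def weighted_score(saved, seconds, w_space, w_time):
    """Collapse two factors into one number (hypothetical linear form)."""
    return w_space * saved - w_time * seconds


def best(w_space, w_time):
    """Return the name of the top-scoring algorithm for the given weights."""
    return max(
        ALGORITHMS,
        key=lambda a: weighted_score(a[1], a[2], w_space, w_time),
    )[0]


# Three opinions about what matters, three different winners.
for weights in ((1.0, 0.0), (0.0, 1.0), (1000.0, 1.0)):
    print(weights, "->", best(*weights))
```

      With space-only weights the third algorithm wins, with time-only weights the first wins, and a mixed weighting picks the middle one, so the ranking reflects the weights as much as the algorithms.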

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    7. Re:Bullshit.... by Anonymous Coward · · Score: 0

      Well, you sure could easily detect these extreme cases automatically... But other than that, choosing a compression algorithm seriously is generally a significantly more complex decision process...

      You notably have to factor in:

      - Decompression speed (which with some algorithms can differ greatly from compression speed; an algorithm can, for example, be optimized for compression speed, notably for large mostly write-only files/backups, yet be comparatively slow to decompress the archives);

      - CPU/GPU and memory usage (it can be very important for servers and large data sets);

      - Possible data losses and their precise nature (a very fundamental and common subject for audio and image/video compression notably, with some subjective aspects);

      - Implementation complexity and code quality (particularly if you rely on it for backups).

      The first three items can have perfectly intended and useful variations, which you will have to select depending on your specific needs. Most algorithms provide options for various of these needs, and there are also more specialized algorithms.

      Even for desktop use, needs may vary a lot, even for basic lossless compression of miscellaneous files... Some people might want maximum compression whatever the speed beside possibly some extremes... Some others might want a somewhat balanced result (which might depend on their current computer, and thus evolve with time...). Some might prefer algorithms optimized for specific file types (you won't see many desktop users zipping BMP and WAV files directly, for example... well, beside the few ignorance cases we probably all know about here... and the few far more technical usages with less concern for size... and even then, modern zip format implementations will use different algorithms depending on the file type anyway...), and some others a more generalized algorithm. Some might want integrated encryption. Some might want redundancy options (e.g., for newsgroups, or important backups). Some might want various other algorithm or UI options which might only be implemented by specific implementations of specific algorithms...

      It's hard to summarize all this with a single number, even for common use cases... and it's really not needed at all... if you want a simple comparison base, just search for one of the numerous algorithm and software reviews on the web, and check the main points in comparison to your needs... And you'll need to check more than one, because of the different (and sometimes erroneous) testing methods, newer algorithm/implementation/UI versions, etc.

      It is currently impossible to summarize all this with a single number for all use cases... Maybe one day, with more perfected algorithms (and even then, it will probably always be a set of perfected algorithms, at least for different file formats... but maybe they will have some common bases...), but then most current concerns of speed, energy, cost, and size, will probably not be valid anymore...

    8. Re:Bullshit.... by nine-times · · Score: 3, Insightful

      Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful...[snip] User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best.

      And these are very good arguments why such a metric should not be taken as an end-all be-all. Isn't that generally the case with metrics and benchmarks?

      For example, you might use a benchmark to gauge the relative performance between two video cards. I test Card A and it gets 700. I test Card B and it gets a 680. However, in running a specific game that I like, Card B gets slightly faster framerates. Meanwhile, some other guy wants to use the video cards to mine Bitcoin, and maybe these specific benchmarks test entirely the wrong thing, and Card C, which scores 300 on the benchmark, is the best choice. Is the benchmark therefore useless?

      No, not necessarily. If the benchmark is supposed to test general game performance, and generally faster benchmark tests correlate with faster game performance, then it helps shoppers figure out what to buy. If you want to shop based on a specific game or a specific use, then you use a different benchmark.

    9. Re:Bullshit.... by Ardyvee · · Score: 1

      Why generate a score in the first place, when you can just provide compression ratio, compression speed, or in the case of the card: fps (at settings), energy used, consistency of the fps (at settings), along with any other characteristic you know or can test that doesn't combine two other things and let the user decide which of those things are better instead of trying to boil it all down to a single number?

      --
      I don't care if I'm wrong. I only care about everyone obtaining something from the discussion.
    10. Re:Bullshit.... by Beck_Neard · · Score: 2

      Uhm, do you really think that something as important as assessing the performance of compression algorithms wouldn't have attracted the attention of thousands (or, more likely, hundreds of thousands) of computer scientists over the years? Open up any academic journal that deals with this stuff even tangentially and you find many examples of different metrics for assessing compression performance. And there's nothing new about this 'score'. Dividing ratio by the logarithm of the compression time is a very widely-used theoretical scoring function; I can find references to it from the 90's. This particular form of that score may be new, but gweihir is right; such a score doesn't give much information and has very little use.
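
      For reference, a ratio-over-log-time score of the kind described above can be sketched as follows. The form used here, alpha * (r / r_ref) * (log T_ref / log T), is the one usually given for the Weissman Score; the function name, the default alpha and the guard on small times are assumptions added for this sketch.

```python
import math


def weissman_score(ratio, seconds, ref_ratio, ref_seconds, alpha=1.0):
    """Score a compressor against a reference compressor on the same data.

    ratio / ref_ratio     : compression ratios (uncompressed / compressed)
    seconds / ref_seconds : compression times, in the same unit
    alpha                 : arbitrary scaling constant
    """
    # log(1) is 0 and logs of values below 1 are negative, so the score
    # only behaves sensibly when both times exceed 1 in the chosen unit.
    if seconds <= 1.0 or ref_seconds <= 1.0:
        raise ValueError("times must be greater than 1 in the chosen unit")
    return alpha * (ratio / ref_ratio) * (math.log(ref_seconds) / math.log(seconds))


print(weissman_score(2.0, 10.0, 2.0, 10.0))   # same as the reference
print(weissman_score(4.0, 10.0, 2.0, 10.0))   # twice the ratio, same time
print(weissman_score(2.0, 100.0, 2.0, 10.0))  # same ratio, ten times slower
```

      Because time enters only through a logarithm, a tenfold slowdown in this example costs as much as halving the ratio, and the result changes if the same times are expressed in a different unit.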

      --
      A fool and his hard drive are soon parted.
    11. Re:Bullshit.... by gweihir · · Score: 1

      It depends far too much on your boundary conditions. For example, LZO does not compress very well, but it is fast and has only a 64kB footprint. Hence it gets used in space-probes where the choice is to compress with this or throw the data away. On the other hand, if you distribute pre-compressed software or data to multiple targets, even the difference between 15.0% and 15.1% can matter, if it is, say, 15.0% in 20 seconds and 15.1% in 10 minutes.

      Hence a single score is completely unsuitable to address the "quality" of the algorithm, because there is no single benchmark scenario.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    12. Re:Bullshit.... by gweihir · · Score: 1

      Good comparison.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    13. Re:Bullshit.... by gweihir · · Score: 1

      The uses for that single number are as follows:

      a) Some class of people like to claim "mine is bigger", which requires a single number. While that is stupid, most people "understand" this type of reasoning.
      b) Anything beyond a single number is far too complicated for the average person watching TV.

      In reality, things are even more complicated, as speed and compression ratio both depend on the data being compressed, and do so independently to some degree. This means some data may compress really well and do that fast, while other data may compress exceedingly badly, but also fast, while a third data set may compress well, but slowly, and a 4th may compress badly and slowly. So in reality, you need to state several numbers (speed, ratio, memory consumption) for benchmark data and in addition describe the benchmark data itself to get an idea of an algorithm's performance. If it is a lossy algorithm, it gets even more murky as then you typically need several quality measures. For video, you may get things like color accuracy, sharpness of lines, accuracy of contrast, behavior for fast moving parts, etc.
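
      The dependence on input data is easy to observe with a standard-library compressor. The sketch below uses Python's zlib at one fixed level; the two sample inputs and the helper name `measure` are made up for illustration, and the timings will differ from machine to machine.

```python
import os
import time
import zlib


def measure(data, level=6):
    """Return (compression ratio, elapsed seconds) for one zlib run."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data) / len(packed), elapsed


# Same algorithm, same settings, two very different inputs of equal size.
repetitive = b"slashdot " * 100_000
random_bytes = os.urandom(len(repetitive))

for label, sample in (("repetitive text", repetitive), ("random bytes", random_bytes)):
    ratio, seconds = measure(sample)
    print(f"{label}: ratio {ratio:.2f}, {seconds * 1000:.1f} ms")
```

      The repetitive input shrinks by orders of magnitude while the random input does not shrink at all, which is why a reported number means little without a description of the benchmark data.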

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    14. Re:Bullshit.... by sootman · · Score: 1

      I'd just say it's useless because no two people can agree on what's important, so what's the point of giving a single score? And even something as seemingly simple as a compression algorithm has more than just two characteristics:
      1) speed of compression
      2) file size
      3) speed of decompression
      4) does it handle corrupt files well? (or at all?)

      Even just looking at 1 & 2, everyone has different needs. Some people value 1 above all others, some people value 2, and most people are somewhere in between, and "somewhere" is a pretty big area. Yes, your examples are pretty far apart and most people would agree that "best" is somewhere in the middle, but the middle is bigger than you think. Hence, there can simply never be a "best". So why bother trying to score one?

      > So why can't you create some kind of rating system to give
      > you at least a vague quantifiable score of that concept?

      Because it would just be too vague to be useful. I mean, yeah, it can sort out the great ones from the horrible ones, but that's easy anyway, so if you're just trying to compare a few really good ones, the difference isn't enough.

      A car that goes 200 mph is great, but not if it gets 2 mpg. Likewise, 100 mpg and a top speed of 30 mph isn't useful either. If you're comparing a bunch of cars that get 32-35 mpg and go 130-140 mph, there's not a meaningful way to pick the "best" in that group that everyone will agree on, unless one has the highest speed and the best mileage, but then, again, that's an obvious winner and you don't need an algorithm's help to pick it out of the pack.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    15. Re: Bullshit.... by jrumney · · Score: 1

      It depends on the situation where it is used. If your data almost but not quite fits on your available media at 15%, and you're not pressed for time, you might still go for 15.1%. And if you only have 15 seconds to compress it, strictly no more, you might settle for significantly less compression than would be possible in 20 seconds.

    16. Re:Bullshit.... by nine-times · · Score: 1

      there's not a meaningful way to pick the "best" in that group that everyone will agree on

      Metrics often don't provide a definitive answer about what the best thing is, with universal agreement. If I tell you Apple scores highest in customer satisfaction for smartphones last year, does that mean everyone will agree that the iPhone is the best phone? If a bunch of people are working at a helpdesk, and one closes the most tickets per hour, does that necessarily mean that he's the best helpdesk tech?

      It's true that a lot of people misuse metrics, thinking that they always provide an easy answer, without understanding what they actually mean. That doesn't mean that metrics are useless.

      If you're comparing a bunch of cars that get 32-35 mpg and go 130-140 mph, there's not a meaningful way to pick the "best" in that group that everyone will agree on

      Yeah, but that's a really dumb metric since most people don't actually care what the top speed of a car is. Or to be more truthful, only morons care about top speed unless it's below 80mph, since you basically shouldn't be driving your car that fast. So really, in a metric like this, the "top speed" isn't a metric of "faster is better". It's a metric of "fast enough is good enough".

      But if you were in the habit of doing car reviews, it might make sense to take a bunch of assessments, qualitative and quantitative, like acceleration and handling, MPG, physical attractiveness, additional features, and price (lower is better), and then weigh and average each score. That would enable you to come up with a final score which, while subjective, makes some attempt to enable an overall ranking of the cars. In fact, this is the sort of thing that reviewers sometimes do.

    17. Re:Bullshit.... by nine-times · · Score: 1

      Depending on what you're talking about, providing a huge table of every possible test doesn't make for easy comparisons. In the case of graphics cards, I suppose you could provide a list of every single game, including framerates on every single setting on every single game. It would be hard to gather all that data, and the result would be information overload, and it still wouldn't allow you to make a good comparison between cards. Even assuming you had such a table, it would probably be more helpful to add or average the results somehow, providing a cumulative score. Of course, then you might want to weight the scores, possibly based on how popular the game is, or how representative it is of the actual capabilities of the card. But if that's the result that's actually helpful, why not design a single benchmark that's representative of what games do, rather than having to test so many games?

    18. Re:Bullshit.... by nine-times · · Score: 1

      Hence a single score is completely unsuitable to address the "quality" of the algorithm, because there is no single benchmark scenario.

      So you're saying that no benchmark is meaningful because no single benchmark can be relied upon to be the final word under all circumstances? By that logic, measuring speed is not meaningful, because it's not the final word in all circumstances. Measuring the compression ratio is meaningless because it's not the final word in all circumstances. The footprint of the code is meaningless because it's not the final word in all circumstances.

      Isn't it possible that a benchmark could be useful for some purposes other than being the final word in all circumstances?

    19. Re:Bullshit.... by ultranova · · Score: 1

      A "combined score" for speed and ratio is useless, as that relation is not linear.

      A combined score could be quite useful when implementing, for example, compressed swap. Obviously you'd need to calibrate it for the specifics of a case.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    20. Re:Bullshit.... by sg_oneill · · Score: 2

      A "combined score" for speed and ratio is useless, as that relation is not linear.

      Typing at 70 words per minute, slashdot poster declares quantity over time measurements meaningless.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    21. Re:Bullshit.... by gweihir · · Score: 1

      Whether measuring speed is a meaningful benchmark depends on what you measure the speed of, relatively to what and what the circumstances are. There are many situations where "speed" is not meaningful, and others that are limited enough that it is.

      However, the metric under discussion will not be meaningful in any but the most bizarre and specific circumstances, hence it is generally useless. For the special situations where it could be useful, it is much saner to adapt another metric than define a specific one as this pollutes the terminology.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    22. Re:Bullshit.... by gweihir · · Score: 1

      Other Slashdot poster adds meaningless posturing as that is the limit of what he can do.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    23. Re:Bullshit.... by gweihir · · Score: 1

      When you "calibrate" swap for specific uses, it becomes non-general. In that situation it is far better to let the application use on-disk storage, because _it_ knows the data profile. Sorry, but fail to understand swap.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    24. Re:Bullshit.... by loufoque · · Score: 1

      very high compression has limited utility if it takes an extremely long time

      I don't see how the utility is limited.
      Most content is mastered once and viewed millions of times.

      How much time it takes to compress is irrelevant, even if you get diminishing returns the longer you take. What's important is to save space when broadcasting the content.

    25. Re:Bullshit.... by Anonymous Coward · · Score: 0

      Wrong. Only if they are statistically independent, can you not replace them with a combined metric. If they have a linear relation, or any deterministic or statistical correlation for that matter, then one can be predicted from the other. Therefore only one of them will be useful and the other will be redundant. So if and only if they *have* a relation, can you replace the two metrics with one.

      Have you heard of the PCA technique? PCA stands for Principal Components Analysis. It is used in statistics and in modern times in machine learning. It replaces multiple correlated variables with one combined single variable that in a combined fashion explains the statistical variation of all the original correlated variables. In machine learning this is called dimensionality reduction or feature extraction.
      In general what the PCA does is that if you have an original mix of correlated and uncorrelated variables, it removes the correlations. In other words, it replaces the original variables with a new set of variables that are uncorrelated. These new variables are basically weighted linear combinations of the original variables. The weighting vectors are simply the eigenvectors of the correlation matrix of the original variables.

      So in our compression problem, based on my knowledge of PCA, my suggestion is to use PCA on the compression speed and ratio to create two new metrics as two linear combinations of the speed and ratios. If the speed/ratio are correlated then one of the new metrics will be small in variance and can be discarded. The other having higher variance is the one to be used as the combined metric.
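
      A minimal sketch of that suggestion for two metrics, in plain Python. For two standardized variables the correlation matrix is [[1, rho], [rho, 1]], whose eigenvalues are 1 + rho and 1 - rho with eigenvectors along (1, 1) and (1, -1), so no linear algebra library is needed. The measurements below are invented for illustration and are not real benchmark results.

```python
import math
import statistics


def pca_two_metrics(xs, ys):
    """PCA of two metrics via their 2x2 correlation matrix.

    Returns (rho, variance of first component, variance of second
    component, first-component value for each sample).
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    zx = [(x - mx) / sx for x in xs]  # standardized first metric
    zy = [(y - my) / sy for y in ys]  # standardized second metric
    rho = statistics.fmean([a * b for a, b in zip(zx, zy)])
    # The dominant eigenvector is (1, 1)/sqrt(2) when rho >= 0 and
    # (1, -1)/sqrt(2) when rho < 0; its eigenvalue is 1 + |rho|.
    sign = 1.0 if rho >= 0 else -1.0
    first = [(a + sign * b) / math.sqrt(2) for a, b in zip(zx, zy)]
    return rho, 1 + abs(rho), 1 - abs(rho), first


# Hypothetical compressors: faster ones tend to compress less.
speeds_mb_s = [400.0, 200.0, 100.0, 50.0, 10.0]
ratios = [2.0, 2.5, 2.8, 3.0, 3.6]

rho, var_main, var_minor, combined = pca_two_metrics(speeds_mb_s, ratios)
print(f"rho={rho:.3f} main variance={var_main:.3f} minor variance={var_minor:.3f}")
```

      Note that the first component describes the direction along which the measured compressors vary most; it says nothing about which trade-off a particular user prefers.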

    26. Re:Bullshit.... by buchner.johannes · · Score: 1

      This point comes up often in genetic algorithms, when more than one quantity should be optimized for. A common solution is to build a Pareto frontier, and declare them the best.

      A combination between two quantities is always a personal weighting. It may be useful, but it may also be limited in application. In the case here, the balance between compression speed and achieved size is too personal to be general-purpose, but perhaps the metric is useful for the use case of TV streaming content providers.
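
      A Pareto frontier for the two quantities can be sketched in a few lines. The candidate names and numbers are hypothetical; higher ratio and lower time are taken as better.

```python
def dominates(a, b):
    """True if a is at least as good as b on both axes and better on one.

    Each candidate is (name, compression ratio, seconds).
    """
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_frontier(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical compressors, not real measurements.
candidates = [
    ("fast", 2.0, 1.0),
    ("balanced", 3.0, 5.0),
    ("max", 3.5, 60.0),
    ("bad", 1.8, 8.0),   # slower and weaker than "fast"
]

print([name for name, _, _ in pareto_frontier(candidates)])
```

      Only the dominated candidate drops out; choosing among the remaining three still requires a personal weighting.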

      --
      NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
    27. Re:Bullshit.... by ultranova · · Score: 1

      When you "calibrate" swap for specific uses, it becomes non-general.

      Metric, not swap. I'm talking about compressing memory pages before swapping out, possibly to another memory region, and calibrating the metric to balance between CPU cycles used vs. disk traffic saved, possibly dynamically.

      In that situation it is far better to let the application use on-disk storage, because _it_ knows the data profile.

      And the OS knows the general state of the system. Also, virtual memory systems are far from trivial to create, and can't really be done via libraries or such since every memory access could potentially require swapping data in first so your algorithms get littered with calls to swap_in and swap_out. On the other hand, the OS can use hardware features to do this transparently.

      Sorry, but fail to understand swap.

      Yes, you do. And English too.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    28. Re:Bullshit.... by nine-times · · Score: 1

      How much time it takes to compress is irrelevant, even if you get diminishing returns the longer you take. What's important is to save space when broadcasting the content.

      Well, and also that it can be decompressed quickly and with little processing power, or else with enough hardware support that it doesn't matter. Otherwise, it'd take a long time to access and drain power on mobile devices.

    29. Re:Bullshit.... by nine-times · · Score: 2

      I find it surprising and almost funny how much ire this has drawn from people with some kind of weird "purist" attitude about the whole thing.

      It doesn't seem "generally useless" to me, but it would be more appropriate to say that it's "useful only in general cases". I would say that in most circumstances, I'd want compression algorithms that balance speed and compression. I often don't zip my files to maximum compression, for example, because I don't want to sit around waiting for a long time in order to save a very small amount of space. I also don't zip without compression, because speed is not that *that* important. I look for compression that's balanced. "Compress it as much as you can without making me sit around and wait for it."

      Similarly, if I were ripping CDs to MP3, and you offered me a different format that would save me 1MB per song, I'd jump on board. If you told me that it would save me that space by requiring 1 hour to compress, and then another hour to decompress before I could play it, I'd tell you to fuck off. If you told me it would drain my battery life on my phone to play it, I'd say it's not worth the trouble.

      So I don't know if this is the right metric or the most useful metric, but certainly there could be a metric for compression that deals with "total space savings" vs. "time and complexity in compressing and decompressing". Such a metric could actually be a solid indicator of which compression is useful in a vague general sense.

    30. Re: Bullshit.... by loufoque · · Score: 1

      Compression and decompression are different things.

    31. Re:Bullshit.... by Anonymous Coward · · Score: 0

      Moreover, every single case would rate the balance between speed and compression differently. So this number needs to be reevaluated for every case if it is to be useful.

    32. Re: Bullshit.... by Anonymous Coward · · Score: 0

      Depends on the compression algorithm used in the first place.

      Adaptive and multi-context modelling have to rebuild the same model during decompression as they did during compression to get all the probabilities correct. A simpler flag-length-offset compression scheme doesn't need to know anything beyond the last 64kb or 128kb window.

      So decompression *can* (but not always) be just as intensive as compression.
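
      To illustrate why the simpler scheme is cheap to decode, here is a toy decoder for a literal/copy token stream. The token layout is hypothetical and does not match the bitstream of any real format; the point is only that the decoder needs nothing beyond the bytes already written inside the window.

```python
def lz_decode(tokens, window=64 * 1024):
    """Decode a toy LZ77-style stream.

    Tokens are ("lit", byte_value) or ("copy", offset, length), where
    offset counts back from the end of the output produced so far.
    """
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, offset, length = token
            if not 0 < offset <= min(len(out), window):
                raise ValueError("offset points outside the window")
            # Copy one byte at a time so a copy may overlap its own output.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)


tokens = [("lit", ord("a")), ("lit", ord("b")), ("copy", 2, 6)]
print(lz_decode(tokens))
```

      No probability model is rebuilt here, which is the contrast with the adaptive and context-modelling schemes mentioned above.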

    33. Re:Bullshit.... by Anonymous Coward · · Score: 0

      "Can you explain in more detail? "

      The problem is that every application has its own optimum range both for speed and compression ratio.
      Some applications don't care about speed but do care about compression.
      Some applications do care about speed but don't care about compression.
      Some applications don't care about speed and don't care about compression.
      Some applications do care about speed and do care about compression.

      And all of these can freely define both the wanted time range and compression ratio range.
      A lot of the time the speed only starts to become an issue if the process becomes non real-time.
      If something decodes within the allotted time, then why care about speed at all?

      Both compression ratio requirements and speed requirements are application dependent. It seems pointless to have a metric calculating a single score that does not take into account these requirements.
      This formula basically says that an algorithm that compresses very well but takes 100.000.000 years to complete is just as good as something that hardly compresses but does so very very fast.
      This is, of course, nonsense as no one will want to wait 100.000.000 years for anything.

      It's an interesting, but practically useless metric.

    34. Re: Bullshit.... by nine-times · · Score: 1

      Ok, so let's start from where you're wrong that "What's important is to save space when broadcasting the content." There are other important things.

      Next, what would you like to do then? Change this benchmark to measure decompression speed rather than compression speed? Sure, fine. Let's do that.

    35. Re: Bullshit.... by loufoque · · Score: 1

      Decompression time is always real time. That's obvious.
      Compression is a whole different beast. Some applications need real-time encoding (such as video-conferencing), but most do not.
      Have you even ever written an encoder?

    36. Re:Bullshit.... by hey! · · Score: 1

      It doesn't have to be linear to be useful. It simply has to be able to sort a set of choices into order -- like movie reviews. Nobody thinks a four star movie is "twice as good" as a two star movie, but people generally find the rank ordering of movies by stars useful provided they don't read too much into the rating. In fact the ordering needn't be unique; there can be other equally useful metrics which order the choices in a slightly different way. *Over certain domains of values* minor differences in orderings may not matter very much, especially as your understanding of your future requirements is always somewhat fuzzy (e.g. the future cost of bandwidth or computing power).

      The problem with any metric occurs outside those domains; some parameters may have discontinuities in their marginal utility. A parameter's value may be good enough and further improvements yield no benefit; or the parameter's value may be poor enough to disqualify a choice altogether. In such cases a metric based on continuous functions will objectively misorder choices.

      For example, suppose A is fast enough but has poor compression ratios; B is not quite fast enough but has excellent compression ratios. There's really only one viable choice: A; but the metric may order the choices B,A.

      On the other hand suppose A has better compression ratios than B; B is faster than A, but A is already so fast that it makes no practical difference. The rational ordering of choices is A,B but the metric might order them B,A.
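
      The first case can be made concrete with a small sketch. The composite function below is a stand-in for any continuous metric and is not the Weissman Score; the throughput requirement and the two candidates are invented for illustration.

```python
import math


def composite(ratio, mb_per_s):
    """A hypothetical continuous metric: ratio times log of throughput."""
    return ratio * math.log(mb_per_s)


def viable(candidates, min_mb_per_s):
    """Apply the hard requirement first: too slow means disqualified."""
    return [c for c in candidates if c[2] >= min_mb_per_s]


# (name, compression ratio, throughput in MB/s); hypothetical numbers.
candidates = [
    ("A", 1.5, 100.0),  # fast enough, poor ratio
    ("B", 4.0, 90.0),   # not quite fast enough, excellent ratio
]

ranked = sorted(candidates, key=lambda c: composite(c[1], c[2]), reverse=True)
print("metric order:", [name for name, _, _ in ranked])
print("viable at 100 MB/s:", [name for name, _, _ in viable(candidates, 100.0)])
```

      The continuous metric puts B first, while the hard throughput requirement leaves A as the only usable choice.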

      This kind of thing is always a problem with boiling choices down to a single composite number. You have to understand what goes into that number and how those things relate to your needs. You have to avoid making your decisions on one number alone. But some people *will* fasten on a single number because it makes the job of choosing seem easier than it is. Just don't be one of those people.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    37. Re:Bullshit.... by Samizdata · · Score: 1

      Yet another Slashdot poster sits back with popcorn and watches the fracas.

      --
      It's not the years, honey, it's the mileage. - Colonel Henry Walton Jones, Jr., Ph.D.
    38. Re:Bullshit.... by thunderclap · · Score: 1

      Can you explain in more detail?

      I'm not an expert here, but I think the idea is to come up with a single quantifying number that represents the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.

      Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect-- different algorithms might score differently on different sized files, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance. It just provides a method for estimating performance.

      But maybe you have something in mind that I'm not seeing.

      A compsci sacred cow being slaughtered. See, there is nothing wrong with what you suggested. That's the reason the idea was inserted into Silicon Valley to begin with. So why the bitching about its usefulness? People who spend time in computing are, as a whole, a fairly rigid lot. A lot of them have Asperger's syndrome, which gives them a leg up on coding while taking away their socialization skills. Others think it's useless because they would prefer terms that dig deeper into the compression and its velocity. They don't believe any single term could do what they want. Finally, the rest just want to stay where they are in terms of terms, and anything new muddies the water for them. However, the article itself gives the best explanation of why others hate it.

      It’s hard to convey to a lay audience that one compression algorithm is better than another: you could compress and decompress images, say, with some loss and look for glitches in the resulting image, but they are hard to spot. But metrics for compression algorithms that rate not only the amount of compression but also the processing speed are hard to find. So the show asked the consultants it brought in to help develop the original algorithm (Stanford Professor Tsachy Weissman and then-PhD student Vinith Misra) to come up with a metric that could be used to score multiple algorithms and find a winner. (Misra recently graduated and will soon be working on IBM’s Watson project.) It seems that someone would have come up with such a metric by now. But, says Weissman, “there are two communities: the practitioners, who care about running time, and the theoreticians, who care about how succinctly you can represent the data and don’t worry about the complexity of the implementation.” As a result of this split, he says, no one had yet combined, in a single number, a means of rating both how fast and how tightly an algorithm compresses. Misra came up with a formula incorporating both. Along with existing benchmarks the formula creates a metric that the show writers tagged the “Weissman Score.” It's not a fictional metric: although it didn’t exist before Misra created it for the show, it works and may soon find use in the real world. “It is essentially the compression ratio and the ratio of the log of the compression time,” Misra explains, “but it then normalizes that number against an industry standard compressor used for the same data. For music, say, we might use FLAC.” (FLAC, or Free Lossless Audio Codec, is an open format from the Xiph.org foundation.)
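
      For anyone who wants to try the quoted description on real numbers, here is a minimal sketch. It assumes the commonly cited form W = alpha * (r / r_ref) * (log t_ref / log t); the quote above only describes the formula in words, so that exact form, the function name `weissman_score`, and the parameter names are assumptions rather than anything taken from the article.

```python
import math


def weissman_score(r: float, t: float, r_ref: float, t_ref: float, alpha: float = 1.0) -> float:
    """Score a compressor against a reference compressor run on the same data.

    r, r_ref: compression ratios of the candidate and the reference.
    t, t_ref: compression times of the candidate and the reference (same unit).
    alpha:    scaling constant (assumed to default to 1 here).
    """
    if t <= 0 or t_ref <= 0:
        raise ValueError("times must be positive")
    if t == 1:
        raise ValueError("log(t) is zero when t == 1, so the score is undefined")
    return alpha * (r / r_ref) * (math.log(t_ref) / math.log(t))


# A candidate identical to the reference scores alpha.
print(weissman_score(r=2.0, t=10.0, r_ref=2.0, t_ref=10.0))  # 1.0

# Twice the ratio in the same time doubles the score.
print(weissman_score(r=4.0, t=10.0, r_ref=2.0, t_ref=10.0))  # 2.0

# The same measurements expressed in seconds and in milliseconds give different scores.
in_seconds = weissman_score(r=2.0, t=10.0, r_ref=2.0, t_ref=20.0)
in_millis = weissman_score(r=2.0, t=10_000.0, r_ref=2.0, t_ref=20_000.0)
print(in_seconds, in_millis)
```

      The last two lines show that the ratio of logs is not invariant under a change of time unit, so any reported score needs the unit stated alongside it.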

      The saddest thing is that it took media to force it out there. People here might say it's meaningless, but if Stanford teaches it, it will mean something, and when the friends of the people here have grandchildren (because the ratio of posting on Slashdot divided by the propensity of being a neck-beard also leads to a number used to determine your reproductive likelihood) who graduate in compsci, it will be as common as SLOC, Halstead complexity measures, and cyclomatic complexity.

    39. Re:Bullshit.... by gweihir · · Score: 1

      Really, you do not understand what makes swap slow or fast. Go play somewhere else.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    40. Re:Bullshit.... by gweihir · · Score: 1

      The ire is because quite a few people cannot distinguish fake TV science and engineering from the real thing anymore. This "metric" is a high-quality fake and completely useless.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    41. Re: Bullshit.... by nine-times · · Score: 1

      Decompression time is always real time? So it doesn't matter what computer, what processor, the size of the file, the complexity of the file, or even what kind of file it is? Or do you mean that it needs to be able to be done in real-time (or faster) for some particular use of a particular kind of file on a particular platform that you have in mind?

    42. Re:Bullshit.... by nine-times · · Score: 1
      Well no, the metric is real. The question would be whether it's useful or meaningful. You originally implied that it wasn't because:

      A "combined score" for speed and ratio is useless, as that relation is not linear.

      It seems now that it's not about the relation being linear, but about something else that you won't say. I'm afraid I'm not closer to understanding.

    43. Re:Bullshit.... by gweihir · · Score: 1

      You really do not get it, I agree. This metric is useless. It follows the definition of a metric, true, but it has no reasonable practical use, hence it does not deserve any special distinction, like being given a name. That is what is fake here.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    44. Re:Bullshit.... by Anonymous Coward · · Score: 0

      Well, that's completely ridiculous.

      Being on both sides, the speed only matters to the extent the algorithm of choice finishes the task before the deadline, which is application-specific.

      It's a black-and-white thing. Either the algorithm is usable, or not. If it takes just 0.05% more than the available amount of time to finish the job, it's unusable. If it takes less, it's usable. As binary as that.

      There are tasks that have no deadline, and a preference is possible, but that's not TV (realtime deadline there), nor network-related stuff (throughput deadline there). Not even archival (again, throughput-related deadline, added to latency-related deadline, i.e., how fast does the to-be-archived data come in, and how long can I wait to have it safely archived).

      On most serious applications of compression, the matter is a binary matter of usable or not.

    45. Re:Bullshit.... by nine-times · · Score: 1

      Ok, I was giving you the benefit of the doubt, but it seems your argument boils down to "It's useless because I say it's useless. Never mind that you earlier pointed out that it could be useful, because I decided that it's useless."

      Glad we got that sorted out.

    46. Re:Bullshit.... by webmadman · · Score: 1

      I suspect another variable may be at play here: "Ambiguity Tolerance".

      This Weissman Score may provide a great test to determine where someone's tolerance for ambiguity is, based on how useful or useless they think a metric like this might be, but then, if the WS becomes useful for determining AT it will then become less useful for determining AT because its perceived usefulness will have increased, which will then make it more useful... I think I feel my AT falling.

  2. I thought it wasnt possible by Anonymous Coward · · Score: 1

    I thought I read an article the other day that said their algorithm seemed plausible on the surface but would eventually begin to fall apart?

    1. Re:I thought it wasnt possible by Travis+Mansbridge · · Score: 2

      The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (insofar as more compressed/faster algorithms achieve a better rating).

    2. Re:I thought it wasnt possible by silas_moeckel · · Score: 0

      When talking about lossy compression for video, it might technically work, but it's still worthless. For example, my highly proprietary, heavily patented postage stamp algorithm reduces all video down to '90s-era dialup-rate MPEG-2, aka a blurry postage stamp. This means it's massively compressed and very quick, so it scores high on both metrics. It also looks like crap. Output quality and ratio are generally the metrics that matter, and output quality is a subjective factor that needs to be determined by humans. How long it takes to encode is generally a non-factor, as outside of live encoding it's a one-time event. The other factor is how hard it is to decode, which is generally not an issue right now.

      --
      No sir I dont like it.
    3. Re:I thought it wasnt possible by Anonymous Coward · · Score: 0, Flamebait

      Maybe you can tell us why, champ.

    4. Re:I thought it wasnt possible by Anonymous Coward · · Score: 1

      Please tell us more about how compressing is not compression.

    5. Re:I thought it wasnt possible by Anonymous Coward · · Score: 0

      compressing it to 1 is not compression.

      LOL

    6. Re:I thought it wasnt possible by khellendros1984 · · Score: 3, Informative
      FTA:

      And Jerry Gibson, a professor at the University of California at Santa Barbara, says he's going to introduce the metric into two classes this year. For a winter quarter class on information theory, he will ask students to use the score to evaluate lossless compression algorithms. In a spring quarter class on multimedia compression, he will use the score in a similar way, but in this case, because the Weissman Score doesn't consider distortion introduced in lossy compression, he will expect the students to weight that factor as well.

      The scoring method as stated is only useful for evaluating lossless compression. One could also take into account the resemblance of the output to the input to allow a modified version of the score to evaluate lossy compression.

      --
      It is pitch black. You are likely to be eaten by a grue.
    7. Re:I thought it wasnt possible by Anonymous Coward · · Score: 0

      FTA:

      And Jerry Gibson, a professor at the University of California at Santa Barbara, says he's going to introduce the metric into two classes this year. For a winter quarter class on information theory, he will ask students to use the score to evaluate lossless compression algorithms. In a spring quarter class on multimedia compression, he will use the score in a similar way, but in this case, because the Weissman Score doesn't consider distortion introduced in lossy compression, he will expect the students to weight that factor as well.

      The scoring method as stated is only useful for evaluating lossless compression. One could also take into account the resemblance of the output to the input to allow a modified version of the score to evaluate lossy compression.

      Posting AC because I've modded in this thread and because I work at UCSB.
      We really need to stop hiring people who are so clueless and useless, we really need to start firing the ones who abuse their positions, we really need to stop illegally hiring people's spouses because they are people's spouses, we really need to stop paying people extra to not teach, etc.

  3. freemasons run the country by retchdog · · Score: 4, Interesting

    The so-called Weissman score is just proportional to (compression ratio)/log(time to compress).

    I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time. For example, taking a day to compress is much worse than taking an hour, but taking 24 days to compress is only somewhat worse than taking one day since you're talking offline/parallel processing anyway.

    The log() seems kind of an arbitrary choice, but whatever. It's no better or worse than any other made-up metric, as long as you're not taking it too seriously.

    --
    "They were pure niggers." – Noam Chomsky
    1. Re:freemasons run the country by AsmCoder8088 · · Score: 2

      The formula is not too bad, although I would suggest a minor tweak, mainly that one should change it from:

      (compression ratio)/log(time to compress)

      to:

      (compression ratio)/log(10+time to compress).

      This will ensure that no divide by zero occurs, specifically if the time to compress is 1 second, then you would have been dividing by zero in the original formula.
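
      The edge case is easy to check. A small sketch, assuming a base-10 log and time measured in seconds (the thread does not say which base is meant), with `raw_figure` and `shifted_figure` as made-up names for the two forms:

```python
import math


def raw_figure(ratio: float, seconds: float) -> float:
    # Original form: ratio / log(time). Undefined at exactly 1 second.
    return ratio / math.log10(seconds)


def shifted_figure(ratio: float, seconds: float) -> float:
    # Suggested tweak: ratio / log(10 + time). With a base-10 log the
    # denominator is at least 1 for any non-negative time.
    return ratio / math.log10(10 + seconds)


try:
    raw_figure(2.0, 1.0)
except ZeroDivisionError:
    print("raw form divides by zero at t = 1 s")

print(raw_figure(2.0, 0.5) < 0)   # True: sub-second times flip the sign
print(shifted_figure(2.0, 0.0))   # 2.0
print(shifted_figure(2.0, 90.0))  # 1.0
```

      Note that the raw form also goes negative for any time under one second, which the shift fixes as a side effect.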

    2. Re:freemasons run the country by grep+-v+'.*'+* · · Score: 1

      (compression ratio)/log(time)

      I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time.

      Yeah, I guess I empirically decided this for myself way back with DOS PKZip v0.92: either FAST because I want it now, or MAXIMIZE because I'm somehow space limited and don't care how long it takes. The intermediate ones (and for WinZip, WinRAR, 7z, and the others) are useless for me; either SIZE or SPEED, there IS nothing else.

      (Unless you can somehow delete or omit it; nothing's faster than not doing it to start with.)

      And look -- they're using logs! Now when someone on the show talks about some curve being exponential, they're actually correct!

      --
      If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
    3. Re:freemasons run the country by Anonymous Coward · · Score: 0

      (Unless you can somehow delete or omit it; nothing's faster than not doing it to start with.)

      Responding as AC because I moderated the thread already. You should look into real time compression techniques (LZ4 for example) for examples of how compressing data on disks can actually lead to faster read times than not compressing. These algorithms can decompress up to 2GB/s, well over what any hard drive can offer you. If there is any compression, there is a benefit.

      Also, transmitting data over other slow layers (network for example) can benefit from compression. As a matter of fact, I'm sure the data from the slashdot pages was compressed between their servers and your browser.

      There is more to compression than what you seem to think there is.

    4. Re:freemasons run the country by Anonymous Coward · · Score: 0

      Umm... No. Logarithmic and exponential are not the same.

      See the graph at the following URL:
      http://science.larouchepac.com/gauss/ceres/InterimII/Arithmetic/Primes/Log_Exp_inverts.jpg

    5. Re: freemasons run the country by Anonymous Coward · · Score: 0

      you are a true nigger.

  4. The Misra Score by mfwitten · · Score: 1

    From the article:

    Misra came up with a formula

    1. Re:The Misra Score by Anonymous Coward · · Score: 0

      Except that the formula happens to work.

    2. Re:The Misra Score by Anonymous Coward · · Score: 0

      And the results are often plotted on a Misra bell curve.

    3. Re:The Misra Score by DoofusOfDeath · · Score: 4, Funny

      From the article:

      Misra came up with a formula

      So, now Jar Jar Binks does C.S.? Shit...

  5. Useless without measure of lossiness/distortion by Anonymous Coward · · Score: 0

    An algorithm can compress data quickly and fit it into a small number of bytes, but that doesn't mean what comes out the other end is recognizable. Without adding a weighting for lossiness, this "Weissman Score" has no merit whatsoever. Using the "Weissman Score", MP3 is always better than FLAC, and that's completely untrue for anyone who cares about audio.

    Additionally, new generations of video encoders would arguably be "worse" under this weighting system compared to older generations, as improvements in video encoding are currently rather incremental, generally with massive speed penalties as they require significantly higher numbers of CPU cycles to burn through the algorithms required to compress efficiently at low bitrates while maintaining very little distortion/lossiness.

    Again, this score doesn't matter because in the end, a compression algorithm is only as good as what comes out the other side.

    1. Re:Useless without measure of lossiness/distortion by bill_mcgonigle · · Score: 2, Funny

      hey, "print 0" runs in O(1)!

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    2. Re:Useless without measure of lossiness/distortion by retchdog · · Score: 4, Informative

      it's for lossless compression only.

      anyway, you can just add a term representing the lost information and throw it into this "score". hey, why not? just figure out how important the lossiness is relative to compression rate. if it's very important, take the exp() of the loss metric; if it's unimportant (like time is), take the log(); finally, if it's just kind of important, leave it linear, or maybe square or square root. whatever.

      seriously, just make some shit up and throw it in. you won't compromise anything. it's already just made-up shit.

      --
      "They were pure niggers." – Noam Chomsky
    3. Re:Useless without measure of lossiness/distortion by viperidaenz · · Score: 4, Insightful

      In the TV show only lossless compression was being considered, so MP3 would fail.

    4. Re:Useless without measure of lossiness/distortion by Jack9 · · Score: 1

      > so MP3 would fail.

      That's correct. So what?

      MP3 was never a good compression algorithm. It's an audio format that uses a normalization that causes SOME audio to be lossy. It's a great demonstration of how a negligible loss across a wide range of audio could result in a more useful algorithm for sound (it's quite compact). MP3 is not a good compression algorithm and doesn't see a lot of use outside of commodity audio, where you can afford to throw away data.

      --

      Often wrong but never in doubt.
      I am Jack9.
      Everyone knows me.
    5. Re:Useless without measure of lossiness/distortion by vakuona · · Score: 1

      MP3 was never a compression algorithm.

      FTFY

    6. Re:Useless without measure of lossiness/distortion by Anonymous Coward · · Score: 0

      > MP3 was never a compression algorithm.

      I'm not sure that's true. If a standard mandates a compression, but there are many ways to do that compression, what's the distinction?

    7. Re:Useless without measure of lossiness/distortion by Paradise+Pete · · Score: 1

      "Algorithm" is the distinction. Otherwise you're basically saying "What's my algorithm for doing X? I just demand X be done." Perhaps you could call it The King's Algorithm.

    8. Re:Useless without measure of lossiness/distortion by viperidaenz · · Score: 2

      That's correct. So what?

      So, comment I was replying to

      Using the "Weissman Score", MP3 is always better than FLAC

      MP3 wouldn't even have a "Weissman Score" because it's not a lossless compression algorithm.

    9. Re:Useless without measure of lossiness/distortion by Anonymous Coward · · Score: 0

      > and that's completely untrue for anyone who cares about audio.

      If I remember correctly, MP3 cuts out frequencies higher than 20 kHz because humans can't hear them anyway. Anyone who claims otherwise is probably the same kind of nutter who pays $1000 for gold speaker cables because "it sounds better".

      There is "good enough" audio (i.e. MP3) for 99.999% of the people, and then there's 192 kHz, 48-bit sampled FLAC for the other 0.001% who think they know better.

  6. Re:It really works? by TheSunborn · · Score: 1

    He said it did work, it's just not as effective as other existing compression solutions.

  7. Inadequate by Are+You+Kidding · · Score: 2

    Not only does it fail to account for loss or distortion, but also fails to consider the time to decompress. If a compression algorithm with a high Weissman score is applied to a video, it is useless if it cannot be decompressed fast enough to show the video at an appropriate frame rate.

    1. Re:Inadequate by Anonymous Coward · · Score: 0

      Adding that would make the formula too complex.

  8. Re:Dupe by Travis+Mansbridge · · Score: 1

    Aside from centering around Silicon Valley, I don't see how these stories are related. That one is about a fictional compression algorithm, while this one is about a method for rating compression algorithms which is becoming nonfiction.

  9. Compression and decompression ratios would help by JoeyRox · · Score: 2

    Two scores would be useful, one for compression_time:size and one for decompression_time:size, since the latter is more important in compress-once, consume-many applications.
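
    A rough sketch of collecting both figures separately, using Python's standard `zlib` module as a stand-in compressor. The function name `measure` and the returned keys are made up for illustration, and a real benchmark would repeat the runs and use representative data rather than a single timing.

```python
import time
import zlib


def measure(data: bytes, level: int = 6) -> dict:
    # Time compression and decompression separately on the same input.
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    if restored != data:
        raise RuntimeError("round trip failed")

    return {
        "ratio": len(data) / len(packed),
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


sample = b"compress once, consume many times. " * 20_000
for level in (1, 9):
    print(level, measure(sample, level))
```

    Reporting the two timings side by side lets a compress-once, consume-many application weigh decompression on its own.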

  10. Re:It really works? by phoenix_rizzen · · Score: 5, Informative

    They're talking about the Score, not the compression algorithm. And your link doesn't mention anything about the Score.

  11. Re:It really works? by Travis+Mansbridge · · Score: 1

    The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (inasmuch as more compressed/faster algorithms achieve a better rating).

  12. Sounds like the Drake equation all over again. by mmell · · Score: 2

    IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.

    1. Re:Sounds like the Drake equation all over again. by Anonymous Coward · · Score: 0

      I'm not so sure it actually is entirely useless: the more of it you can fill out, the more you can deduce about the remaining factors based on our current score of only 1 found intelligent, technologically capable species (as in ourselves).

      Seeing which part of the equation the lack of further finds comes from would gain us a small bit of extra knowledge. It's perhaps not some amazing discovery, but it still narrows things down a little.

      Of course, if someone can think of something that lets us glean even more knowledge from our sparse data, I'm all ears.

    2. Re:Sounds like the Drake equation all over again. by Rockoon · · Score: 1

      IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.

      At least the Drake equation attempts to count something. I think people are missing this important fact about this bullshit compression rating: It isn't counting anything.

      --
      "His name was James Damore."
  13. Re:It really works? by TeklaPerry · · Score: 1

    exactly. The compression algorithm is fictional; the score, while created for the show, can actually be calculated. Whether it will catch on as a metric remains to be seen.

  14. circle jerk by Anonymous Coward · · Score: 2, Funny

    Show About Self-Absorbed Assholes Who Think Their Stupid Ideas Are The Bees Knees Gains Popularity By Making Their Stupid Idea Sound Like Its The Bees Knees

    1. Re:circle jerk by Anonymous Coward · · Score: 1

      Or simply SASAAWTTSIATBKGPBMTSISLITBK for short. What, are you some kind of pompous jerk who tries to sound smart saying it in full when all of us know it by the acronym?

  15. score works not algorithm - Re:It really works? by Anonymous Coward · · Score: 0

    "While it was created for a TV show, it does really work, and it's quickly migrating into academia."

    Somebody should explain that to Professor Tsachy Weissman and Ph.D student Vinith Misra, who specifically stated it doesn't really work, and then school them on it then.

    The compression algorithm is fictional and does not work. That is what your linked article discusses.

    This is about the Weissman Score.

  16. Trivial observation by osu-neko · · Score: 1

    No metric is adequate for all purposes. This one is adequate for the task it was designed for, and is adequate for some other purposes as well. That's the best that can be expected of any tool. Always use the appropriate tools for the task at hand, of course.

    --
    "Convictions are more dangerous enemies of truth than lies."
    1. Re:Trivial observation by retchdog · · Score: 1

      It was designed as a background prop for a TV show. Not a very high bar.

      It might be adequate as an artificial evaluation metric for homework in an "Intro to Data Compression" class. It might be, because it hasn't even been used for that yet.

      I wouldn't exactly call this a tool. For example, it would be really easy to game this 'score' if there were any significant incentive for doing so. That's usually a bad thing.

      --
      "They were pure niggers." – Noam Chomsky
    2. Re:Trivial observation by fnj · · Score: 3, Insightful

      The reason the Score is utter bullshit is that the scale is completely arbitrary and useless. It says that 2:1 compression that takes 1 second should have the same score as 4:1 compression that takes log(2) seconds, or 1 million to 1 compression that takes log(1 million) seconds.

      WHY? State why log time is a better measure than straight time, or time squared, or square root of time. And look at the units of the ratio: reciprocal log seconds. What the hell is the significance of that? It also conveniently sidesteps the variability with different architectures. Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).

      In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

      It's bullshit.

    3. Re:Trivial observation by Obfuscant · · Score: 1

      And look at the units of the ratio: reciprocal log seconds.

      The Weissman score is actually unitless. When one divides "log seconds" by "log seconds" the units cancel.

      It also conveniently sidesteps the variability with different architectures.

      If one measures the compression ratios and times for the same data on different architectures, one is measuring the score of the different architecture, not "sidestepping" it.

      Maybe SSE helps algorithm A much more than it does algorithm B.

      Then algorithm A compared to B would have a higher Weissman score on a system with SSE.

      Or B outperforms A on AMD, but not on Intel.

      Then the score would favor B over A when comparing the two processors. That's what the score is supposed to do. It compares two things.

      In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much.

      Then for the former you would not care what the Weissman score is, and for the latter you would care.

      Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

      That's not what the score measures. It also doesn't measure price (for commercial implementations of code), executable size, or whether the software salesman has BO or not.

    4. Re:Trivial observation by Lehk228 · · Score: 1

      Decompression speed is unimportant for general-purpose compression; it is either adequate or not adequate. If decompression speed is not adequate, it does not matter how well the algorithm scores on other metrics: it is unusable for your use case. If decompression speed is adequate, it really does not matter whether it's just barely adequate or insanely fast.

      --
      Snowden and Manning are heroes.
    5. Re:Trivial observation by fnj · · Score: 1

      The Weissman score is actually unitless. When one divides "log seconds" by "log seconds" the units cancel.

      That is because it is presented as the ratio of the figure of merit of the candidate algorithm to the figure of merit of some bullshit "universal compressor", times a completely useless "scaling constant". To strip away the obscuration, all you have to do is see that for a completely transparent effectless compressor, r is unity and log t is log 0, or unity. 1/1, and it drops out.

      The underlying figure of merit once you cut through the bullshit is r / log t. r is the compression ratio (unitless) and log t is log seconds. So yes, the units of the underlying figure of merit are reciprocal log seconds.

      You need to learn to cut through the hocus pocus and analyze the actual underlying equation before the Oz Sauce is ladled on. You can well imagine that those who actually understand programming metrics are holding their sides laughing at those who are taking it seriously.

    6. Re:Trivial observation by TubeSteak · · Score: 1

      Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).

      In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

      The things you mention have always been left as an exercise for the reader.
      What benchmark isn't tagged with qualifiers that explain what it does and doesn't mean?

      Marketing literature in computing has always been littered with metrics that are completely useless unless you know how to interpret them in the context of what you want to be doing.

      --
      [Fuck Beta]
      o0t!
    7. Re:Trivial observation by swillden · · Score: 1

      some bullshit "universal compressor"

      Not a universal compressor, a standard compressor, such as gzip. The metric is ultimately just a comparison between the compressor being evaluated and the compressor chosen as the standard, and it is unitless.

      That said, I agree with you that the scaling constant has no reason to be present. As for using the logs of times... I don't know. It's essentially a base change, expressing the time of the compressor being evaluated in the base of the standard compressor, which is then multiplied by the ratio of the compression ratios. Handling the time relationship as a base change may have some useful properties, but I can't see what they would be.
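
      Written out, the base-change reading looks like this. The symbols are assumptions, since the thread never spells the formula out: r and T are the candidate's compression ratio and time, \bar{r} and \bar{T} are the standard compressor's, and \alpha is the scaling constant.

```latex
W \;=\; \alpha \,\frac{r}{\bar{r}}\,\frac{\log \bar{T}}{\log T}
  \;=\; \alpha \,\frac{r}{\bar{r}}\,\log_{T}\bar{T}
  \;=\; \alpha \,\frac{r}{\bar{r}}\cdot\frac{1}{\log_{\bar{T}} T}
```

      So, strictly, the ratio of compression ratios is multiplied by the standard's time expressed in the base of the candidate's time, or equivalently divided by the candidate's time expressed in the base of the standard's.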

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    8. Re:Trivial observation by Anonymous Coward · · Score: 0

      No. The fictional compression *algorithm* was a background prop for a TV show. The *metric* which shares the same name is something entirely different from said fictional algorithm, and (as osu-neko said) is useful for the task for which it was designed.

      Reading comprehension is fun. You should try it some time.

    9. Re:Trivial observation by retchdog · · Score: 1

      yes, the metric is obviously a different thing, but they were both designed for the show.

      a few mediocre professors are thinking of using the metric in their courses, but are pretty open about it mostly being a gimmick.

      in conclusion, shove it up your bloated ass.

      --
      "They were pure niggers." – Noam Chomsky
    10. Re:Trivial observation by Obfuscant · · Score: 1

      The underlying figure of merit once you cut through the bullshit is r / log t. r is the compression ratio (unitless) and log t is log seconds. So yes, the units of the underlying figure of merit are reciprocal log seconds.

      The fact that the actual equation is a ratio between a proposed compression implementation and a reference is a hint that it is not a "figure of merit" in absolute terms, but only with respect to some common standard. Yeah, you get to pick your standard, but simply reporting r/log(t) is meaningless. The actual measurement is unitless simply because, as you point out, units of 1/log(s) are meaningless.

      It's done that way so things can be repeatable. If I create a compressor and report a Weissman of 3, then you should be able to repeat that on your computer, even if you've got a 3GHz i7 and mine is only 2.7GHz. Here's the data I used, here are the executables for both compressors, you run it. Now you can play with the source and see what happens. But first you need to be able to reproduce my results. That's called the "scientific method".

      You need learn to cut through the hocus pocus and analyze the actual underlying equation

      The underlying equation is a ratio between two compression implementations, not an absolute measure of one.

      You can well imagine that those who actually understand programming metrics are holding their sides laughing at those who are taking it seriously.

      I'm not here to argue whether the metric is meaningful or not. The value of a metric is in the eye of the beholder. You don't think it has any value, and I really don't care if it does or not. I'm just pointing out that the actual metric is unitless. You can't throw half an equation out and then complain that the units on the result don't mean anything. Sometimes equations are constructed the way they are so that the units DO come out right, and there are many examples of equations that have empirical constants with meaningless units just so the equation they are used in comes out right. That's especially true in physical modeling where someone sees a relationship between the data and tries to create an equation to represent that. If the fit is best with some variable taken to the 8/3 power, that's how it winds up, and the constants get units to make it all come out right. A more common example is R in PV=nRT.

      Now, you're correct, the alpha constant is useless because the only purpose it could serve is to correct the units, and since it is unitless it doesn't even do that. So laugh at that part of it, but don't throw out the important parts and laugh at what's left when the units don't work anymore.
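
      To make the "ratio between two implementations" point concrete, here is a minimal Python sketch of the full score measured against a reference compressor. Everything in it is an assumption for illustration: zlib stands in as the standard, bz2 as the candidate, times are taken in microseconds so the logs stay positive, and alpha defaults to 1. The function names weissman and measure are made up for this example.

```python
import bz2
import math
import time
import zlib


def weissman(r, t, r_std, t_std, alpha=1.0):
    """Score = alpha * (r / r_std) * (log t_std / log t).

    Both times must be in the same unit and must not equal 1 in that
    unit, because log(1) == 0.
    """
    return alpha * (r / r_std) * (math.log(t_std) / math.log(t))


def measure(compress, data):
    """Return (compression ratio, elapsed time in microseconds)."""
    start = time.perf_counter()
    out = compress(data)
    elapsed_us = (time.perf_counter() - start) * 1e6
    return len(data) / len(out), elapsed_us


if __name__ == "__main__":
    # Highly repetitive sample data; real benchmarks would use a shared corpus.
    data = b"the quick brown fox jumps over the lazy dog\n" * 50_000
    r_std, t_std = measure(zlib.compress, data)  # reference compressor
    r, t = measure(bz2.compress, data)           # candidate compressor
    print(f"zlib: ratio={r_std:.1f} time={t_std:.0f}us")
    print(f"bz2 : ratio={r:.1f} time={t:.0f}us")
    print(f"score(bz2 vs zlib) = {weissman(r, t, r_std, t_std):.3f}")
```

      A candidate identical to the reference scores exactly alpha, which is the sense in which the number only means something relative to the chosen standard and the shared test data.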

    11. Re:Trivial observation by anarchyboy · · Score: 1
      You should never take the logarithm of a dimensionful quantity like seconds. Clearly some choice of units is implied and really we should have log(t/1s) or log(t/1ms) or something which would then make the score unitless.
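
      A quick way to see why the implied unit matters: the time part of the score, log(t_std) / log(t), changes when the same two measurements are expressed in a different unit. The timings below are hypothetical, chosen only to keep the arithmetic easy, and time_factor is a name invented for this sketch.

```python
import math


def time_factor(t_std, t):
    """The time part of the score: log(t_std) / log(t)."""
    return math.log(t_std) / math.log(t)


# Hypothetical timings: the standard takes 20 s, the candidate 10 s.
t_std_s, t_s = 20.0, 10.0

in_seconds = time_factor(t_std_s, t_s)
in_millis = time_factor(t_std_s * 1000, t_s * 1000)

print(f"seconds:      {in_seconds:.4f}")   # about 1.30
print(f"milliseconds: {in_millis:.4f}")    # about 1.08
```

      Same two runs, same stopwatch, different score. The plain ratio t_std / t would be 2 in either unit; it is the logs that make the choice of unit leak into the result.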

      You need learn to cut through the hocus pocus and analyze the actual underlying equation before the Oz Sauce is ladeled on. You can well imagine that those who actually understand programming metrics are holding their sides laughing at those who are taking it seriously.

      and you need to go take some remedial math lessons if you think log(0) = 1.

  17. Re:It really works? by Anonymous Coward · · Score: 0

    What doesn't "really work" is the fictitious compression algorithm
    developed on the show.
    The "Weissman Score" metric, however, does work in assigning
    a compression algorithm a somewhat valid score.

  18. Re:It really works? by fnj · · Score: 1

    “We had to come up with an approach that isn’t possible today, but it isn’t immediately obvious that it isn’t possible,” says Misra.

    Please explain why you think that means he said "it does work".

  19. Not quite as useful as the Slashdot score by Anonymous Coward · · Score: 0

    Where's our TV show?

    1. Re:Not quite as useful as the Slashdot score by Anonymous Coward · · Score: 0

      And what's next? A TV show to send common people to Mars?

  20. Re:It really works? by Anonymous Coward · · Score: 0

    The compression algorithm doesn't work, the compression and speed metric does. It does give arbitrary amounts of importance to compression and to speed, but Americans are used to arbitrary metrics.

  21. Re:It really works? by Zero__Kelvin · · Score: 1

    Holy shit! Math works! Somehow, I don't think you can have a discussion about if a formula really returns a result or not. I now see that the idiot who wrote the summary was trying to say that the algorithm doesn't work, but math does. Alas that idiot has no ability to write. ... oh wait, it was you! Never mind.

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  22. Re:It really works? by Zero__Kelvin · · Score: 1

    Yes. That's the point, isn't it. They didn't invent math for the show. Claiming that a score "works" has no meaning, other than to say that math "works". Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  23. Re:It really works? by Anonymous Coward · · Score: 0

    Yes. Math "works". News at 11!

  24. Why not both? by Anonymous Coward · · Score: 0

    Why am I reminded of this Mexican ad when I read this?

    https://www.youtube.com/watch?v=vqgSO8_cRio

  25. F1 score, precision and recall by tommeke100 · · Score: 1

    Sounds a bit like the F1 measure used in classification systems, where the F-score is the harmonic mean of precision and recall (and where pushing for higher precision tends to lower recall, and vice versa).
    However, I'm wondering how stable this Weissman score is. Compression algorithms might not all perform O(n) where n is size of data to compress.
    Or it may actually give a very high score to something that doesn't compress at all.
    public byte[] compress( byte[] input) { return input;}
    I bet this gets a high Weissman score ;-)
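
    A rough Python check of that hunch, under assumptions that are entirely made up for illustration: the score is alpha * (r / r_std) * (log t_std / log t) with alpha = 1, times are in seconds, and the reference compressor has ratio 3.0 and takes 5 s. The do-nothing compressor always has ratio 1.0, so only its run time varies.

```python
import math


def weissman(r, t, r_std, t_std, alpha=1.0):
    """Score = alpha * (r / r_std) * (log t_std / log t)."""
    return alpha * (r / r_std) * (math.log(t_std) / math.log(t))


# Hypothetical reference compressor: ratio 3.0, run time 5 s.
R_STD, T_STD = 3.0, 5.0

# Identity "compressor": ratio is always 1.0, only the time changes.
for t in (2.0, 1.1, 1.001, 0.5):
    score = weissman(1.0, t, R_STD, T_STD)
    print(f"t = {t:>5} s  ->  score = {score:9.2f}")
```

    Under these assumptions the score blows up as t approaches 1 s (where log t goes to zero) and turns negative below 1 s, so whether "return input" scores high depends mostly on the time unit and how close the run time lands to 1 in that unit.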

  26. Re:It really works? by vux984 · · Score: 2

    Claiming that a score "works" has no meaning,

    I could easily devise a cpu scoring methodology that scores CPU based on chip area / cost * clock speed / register width.

    Such a score "works" in the sense that the function can be evaluated, but it wouldn't tell you anything about whether to buy an i7 vs a xeon vs a pentium 2.

    The suggestion in the article is that the particular scoring methodology that was created for the show is useful for comparing compression algorithms, to the point that it may well be adopted by industry.

    Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.

    The writing was perfectly fine, your reading comprehension is what failed here.

  27. Re: It really works? by Anonymous Coward · · Score: 1

    Yes. He failed to comprehend that the submitter was pointing out that math really works, and a ratio of compression over time really does express a ratio.

  28. Idiots born every day. by Chas · · Score: 1

    Oh boy. A useless metric!

    Compression ratio: Sure. But the problem is, it's possible to increase compression ratio by "losing" data. So you can obtain a high ratio, but the images as rendered will be blurry/damaged.

    Compression Speed: This is just as dumb since compression speed is partially a function of the compression ratio, partially a function of the efficiency of the algorithm and partially a function of the amount of "grunt power" hardware you throw at it. So one portion of this is a nebulous "hardware norm" factor that can be gamed. The other is a function of the other factor (compression ratio) which can ALSO be gamed (and creates a bias towards lossy compression).

    Basically something with a high Weissman number would be extremely lossy compression on high power hardware. Which basically negates the point of high resolution viewing, as any idiot can reduce a 1920x1080 frame to 19px by 11px, and then compress it. I can already take precompressed (and lossy) JPEG files, resample down to 19x11, then back up to 1920x1080. I can wind up reducing a 930K file down to 40K (basically a 95+% savings). And the image is completely indecipherable.

    Take a look at an original image versus the same image on the above-described UCCT (UltraCrappyCompressionTechnique).

    http://cox-supergroups.com/The...

    The above image is a PNG to prevent further compression artifacts from creeping into the sample.

    The top portion of the image is the original 930K JPEG file.
    The bottom portion is the resampled 40K JPEG file.

    --


    Chas - The one, the only.
    THANK GOD!!!
    1. Re:Idiots born every day. by Chas · · Score: 1

      Actually replaced with a better example.

      Took an 8.1MB TGA file and did three things.

      1: Saved the first off as a PNG file. Resulted in a 1.7MB file with lossless compression.
      2: Saved the file off as a high-compression JPEG. Resulted in a 46K file that's noticeably blurry and indistinct.
      3: Downsampled to 19x11 and back up to 1920x1080 and saved as a high compression JPEG (36K file) or a lossless compression PNG (114K file). Labelled this method UCCT (Ultra Crappy Compression Technique).

      Amalgamated the three images into a single PNG file to eliminate/reduce further compression issues.
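
      For anyone who wants the ratios spelled out, a small Python sketch using the file sizes quoted above. The only assumption is 1 MB = 1024 K; the dictionary layout and names are just for this example.

```python
ORIGINAL_K = 8.1 * 1024  # 8.1 MB TGA, assuming 1 MB = 1024 K

# name -> (size in K, whether the original image data survives intact)
RESULTS = {
    "PNG":       (1.7 * 1024, True),
    "JPEG":      (46.0, False),
    "UCCT JPEG": (36.0, False),
    "UCCT PNG":  (114.0, False),
}

for name, (size_k, intact) in RESULTS.items():
    ratio = ORIGINAL_K / size_k
    kind = "lossless" if intact else "lossy"
    print(f"{name:10s} {ratio:7.1f}:1  ({kind})")
```

      Ranked by ratio alone, the only method that preserves the image comes dead last, which is the bias being complained about.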

      --


      Chas - The one, the only.
      THANK GOD!!!
    2. Re:Idiots born every day. by Renozuken · · Score: 1

      This would be correct if the score wasn't being used for lossless compression where the only two variables that really matter are time/size.

  29. Slashvertisement for HBO? by Gothmolly · · Score: 2

    Given that only a subset of Slashdot users are HBO subscribers, how is this relevant?

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Slashvertisement for HBO? by wilson_c · · Score: 1

      Because a much larger, non-overlapping subset also steal HBO services.

    2. Re:Slashvertisement for HBO? by Anonymous Coward · · Score: 0

      And some of us just downloaded it to see if it was good.

  30. bandwidth is not constant by Anonymous Coward · · Score: 0

    The reason there's no single metric available is because bandwidth isn't constant.
    I'll try to solve for a "best algorithm" given some different bandwidths, ignoring decompression time.

    F1(X): 14 + X*(1- 0.00001%)
    F2(X): 20 + X*(1-15%)
    F3(X): 29*60*60 + X*(1-15.1%)

    solving pairwise:
    F1(40 seconds) = F2(40 seconds)
    F1(8 days) = F3(8 days)
    F2(3.31 years) = F3(3.31 years)

    If the file can be transferred in 7 seconds, algorithm 1 is the clear winner (23.6% faster than algorithm 2, and nearly 5000x faster than algorithm 3).
    If the file can be transferred in 7 days, algorithm 2 is the clear winner (17.6% faster than algorithm 1, and 20.2% faster than algorithm 3).
    If the file can be transferred in 7 years, algorithm 3 is a marginal winner (0.062% faster than algorithm 2, and it's 17.8% faster than algorithm 1); also note that 0.062% is in the 30-40 hours range (you can get different answers depending on the number of seconds you use to compute 7 years).
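
    The numbers above can be re-derived with a few lines of Python. This is only a sketch of the same model: each algorithm is a fixed compression time plus the transfer time of whatever is left, with X being the uncompressed transfer time in seconds. The helper names and the 365-day year are my assumptions.

```python
# name -> (fixed compression time in seconds, fraction of size saved)
ALGOS = {
    "F1": (14.0, 0.0000001),       # 0.00001% saving
    "F2": (20.0, 0.15),            # 15% saving
    "F3": (29 * 60 * 60.0, 0.151), # 29 hours, 15.1% saving
}

DAY = 24 * 60 * 60
YEAR = 365 * DAY  # assumption: 365-day year


def total_time(name, x):
    """Compression time plus transfer time of the compressed file."""
    fixed, saving = ALGOS[name]
    return fixed + x * (1.0 - saving)


def crossover(a, b):
    """Uncompressed transfer time x at which algorithms a and b tie."""
    fixed_a, saving_a = ALGOS[a]
    fixed_b, saving_b = ALGOS[b]
    return (fixed_b - fixed_a) / (saving_b - saving_a)


def best(x):
    """Algorithm with the smallest total time for transfer time x."""
    return min(ALGOS, key=lambda name: total_time(name, x))


print(f"F1 = F2 at {crossover('F1', 'F2'):.1f} seconds")
print(f"F1 = F3 at {crossover('F1', 'F3') / DAY:.2f} days")
print(f"F2 = F3 at {crossover('F2', 'F3') / YEAR:.2f} years")
for label, x in (("7 seconds", 7), ("7 days", 7 * DAY), ("7 years", 7 * YEAR)):
    print(f"{label}: best is {best(x)}")
```

    The winner flips twice as X grows, so any single number that ranks the three has to bake in an assumed bandwidth.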

  31. Re:It really works? by martin-boundary · · Score: 1

    Because. Everything is immediately obvious to slashdotters. QED.

  32. Re: It really works? by vux984 · · Score: 1

    No, he failed to comprehend that people have found that this particular method of calculating the ratio of compression over time is proving to be *useful*.

  33. Is the show any good? by ed1park · · Score: 1

    I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?

    1. Re:Is the show any good? by lippydude · · Score: 1

      "I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?"

      I stayed with it and watched a number of episodes, I thought it caught the techie zeitgeist brilliantly. There's even a semi-aspie tech tycoon in there, just like you-know-who.

    2. Re:Is the show any good? by Anonymous Coward · · Score: 0

      I watched the first four shows.

      There was enough stuff to make two good shows.

      While the worse I have seen come out of hollywood, I don't see why all the praise for it in the links I followed.

      The people were closer to real-life than past TV shows, but still were not close to the computer geeks/nerds I grew up with 1980s - present. Maybe it is because I live in Canada, most computer people I have met here would not agree to go along with the nonsense I see being pushed in this show.

      Example: Not one of the people I knew who became millionaires threw rock parties in expensive rental locations. Instead they bought a great water-front property up North (usually in the Muskokas) and threw their own parties (rarely with rock stars) at the cottage.

    3. Re:Is the show any good? by Anonymous Coward · · Score: 0

      That should read "While not the worse I have seen".

  34. Re:It really works? by Samizdata · · Score: 1

    C'mon now, equal rights for AMD here.

    --
    It's not the years, honey, it's the mileage. - Colonel Henry Walton Jones, Jr., Ph.D.