Slashdot Mirror


How I Completed The $5000 Compression Challenge

Patrick Craig points to this page which details (in email) the strange life of an online challenge as posed, clarified and (at least to some eyes) met successfully. This should give pause to anyone willing to pose -- or accept -- seemingly impossible problems. In short, Patrick's not $5,000 richer, but you may think he deserves to be.

11 of 414 comments (clear)

  1. Re:That wasn't in the challenge by Surak · · Score: 4

    Again, that depends on what you mean by "paper airplane." Admittedly, I won a similar contest involving rubber-band driven cars. The contest rules said that CDs or DVDs could not be used for tires due to the fact that they make for very narrow tires, thus having reduced friction. So I used hard drive platters. :-)

  2. Re:Mike defined total file size himself by ChadN · · Score: 3

    However, the decompressor DOES depend on the filename numbering in order to work; it is part of the algorithm (and hence the "decompressor"), and those bytes (the bytes used to number the files) should be charged against the decompressor size. Then Patrick loses.

    While not explicitly stated, those bytes ARE an integral part of the "decompressor" that was supplied (it can't work without them), and so including their size is consistent with the agreed upon rules, IMO.

    --
    "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
  3. Intriquing by PhiRatE · · Score: 3

    Its an intriquing problem. I certainly wouldn't stake $5000 on nobody being able to compress the data I supplied, good random number generator or not. Why? because while the concept that random data in general cannot be compressed is correct, there are no patterns, it is not necessarily true that a given set of random data cannot be compressed.

    For example, a file full of 0's, is perfectly possible from a true random number generator, and just as likely as any other set of numbers. As any good geek knows, a file full of 0's compresses very very well.

    In this case, I think the challenge has one major problem, and that is that there is no real time limit on the entry, and no requirement that the decompressor work effectively in general, it can be entirely specialised.

    This means that the problem *may* be solvable, simply by throwing ungodly amounts of CPU power at it. With the appropriate code, it would be possible to simply hunt for unintentional patterns in the data and hope. If you're lucky, there will be enough indexable patterns present that it will outweigh the cost of the indexing, and the code necessary to replace the index points.

    If you're unlucky, the data will have no patterns in the space you searched, and therefore you're out of luck.

    This is why the guy noted that if he got a big enough file, he could win the competition. As the filesize increases, it becomes increasingly likely that there are, accidentally, enough patterns in the data to cover the space required for the decompressor and the dictionary. Of course, as the filesize increases, you also have to burn considerably more CPU in order to locate the patterns in the first place.

    Personally I think it would make an excellent project for a research team, not particularly for the money, but in order to create some really fast, specialised code for locating patterns in assumedly random data.

    I am not aware of much effort in that area, most compression research goes into identifying patterns inherent in the data type, waveforms in audio, visual patterns in images, duplicate strings in text and binary files. It would be very interesting to see just how effective fast analysis code and heavy CPU is against the problem.

    I suspect there may well be some crossover here with Cryptoanalysis, since one of the primary jobs in breaking crypto is finding the patterns that differentiate the ciphered data from pure randomness.

    Any links appreciated.

    --
    You can't win a fight.
  4. Are you for lawyers or against? by mjh · · Score: 5
    I think that a person's stance on this issue, of who was right and who was wrong, will indicate that same person's approval of lawyers and much of the practice of the law today.

    If you are for the challengee, then you believe that exact specificity is more important than intent. And you inadvertantly empower the legal profession to profit from being better able to specify rules, etc. Which, in turn, gives those with money even more power. They can write the rules by which your interaction with them will be governed. You can be certain that the rules will be written to the advantage of the guy with the money.

    If, on the other hand, you are for the challenger, then you believe that his intent was more important. (Considering that the challenge came from the compression FAQ, the exact definition of "compression" should have not been vague. It seems clear to me that his intent was obvious, even though his challenge wasn't very specific.) Still there is a problem with valuing intent over specificity. That is it's easy to misrepresent intent after the fact. Thus the need to clearly specify our intentions prior to anything we do. That being the case, with accurate documentation, it seems possible to me that we can discern the original intent w/out having to specify everything legalistically. For example, I think that I was able to clearly determine the challenger's intent well after the fact, and I did this having read the challengee's representation of the events.

    Now you may say, in rebuttal, "Who cares who I choose? I don't get to make the law." But think of yourself as the jury trying to determine to whom you would award the money. You may one day find yourself on a jury and trying to determine who should be awarded money. I see this as a microcosm of exactly how our court system has gotten into the state that it's in. Specificity is clearly more important than intent. And it's that way because juries award that way. Since juries are comprised of members of society, it's not a far stretch to say that society as a whole regards specificity as more important than intent.

    Is there any chance that we can change this? And if we did, would it be better?

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
  5. PAY UP MIKE by selectspec · · Score: 3

    .

    Patrick I meant can I send you a compressor and several compressed files whose total file size is less than the original uncompressed file and from which I can regenerate the original uncompressed file

    Mike Sure.

    Pay up Mike. Pay up. You were too careless to plan out the rules and think about your intent before you threw 5k into the sink. Cough up the cash or loose all honor.

    --

    Someone you trust is one of us.

  6. Re:Bar room bets? by Apotsy · · Score: 3
    Exactly. "Pompus ass" were the first words to come to my mind as well.

    When Goldman gleefully posted to the newsgroup that someone had accepted his challenge, he said "some people just don't understand information theory too well". Yeah, and some people just don't understand issuing challenges very well, either!

    When you make a challenge like that you absolutely must be sure to word the rules carefully and try to avoid leaving any loopholes. In this case, all Goldman really needed to do was tack a sentence at the end of his original challenge saying "the data must actually be compressed according to the definition of 'compression' contained in this FAQ. Challengers who merely point out loopholes or exploits in the rules of the challenge will not be rewarded".

    The moment Mr. Craig started iquiring about about the specifics of the rules and saying "wait and see" regarding his methods, that should have been a warning sign to Goldman that Craig had some trick up his sleeve. But no, Goldman was just so in love with the idea that he found some "impossible" task which suckers might try to undertake that he couldn't be bothered to think about such things.

    What's even more embarassing to Goldman is his initial reaction to Craig's submission:

    I have accessed your server and downloaded exactly two files, the "decompressor" (dfi) and the "compressed file" (comp). The rest of the files in the directory (comp.0 through comp.217 inclusive) I have not touched.
    Even though Craig had already asked if more than one file was okay and Goldman had said yes!

    And Goldman's following point:

    In further point of fact, the 218 parts together must occupy more space than the original file (original.dat) that you were given. Each file requires space for a directory entry, and each such directory entry requires in excess of one byte of space. Thus you have in actuality expanded the data which you were given to compress.
    is even more ridiculous given that Goldman had already said that "sure" in response to Craig's question about multiple files whose "total file size" (which does not include directory entries) are less than the size of the original data file. Goldman tried to change the rules rather than admit he was sloppy in setting them up.

    Goldman also says "I think that I did not make a mistake", even though he admits in the same message that he needs to restate the rules:

    Perhaps it will be necessary to restate the terms in a legalistic form which is not open to this sort of miscommunication...
    "Perhaps"? How about "definitely"! And any "miscommunication" was entirely one-way. Craig was very sly and clever, and Goldman was too full of himself to realize what the fuck was going on until after the fact. A "pompus ass" indeed.
  7. The page has been removed by GeoCities by locutus074 · · Score: 5
    There is now a mirror available.

    --

    --

    --
    We have fought the AC's, and they have won.

  8. Compression by clare-ents · · Score: 4

    Ok, I've done some information theory, I know it's impossible to create a perfect compressor. Boxes arguments and all that.

    However, to make money, you don't have to.

    All that is required is a compresser that saves 1 bit on better than 1 in 50 attempts.

    For example, if we can save 1 bit on 50% of the attempts - even if the other 50% of the attempts results it catasrohpically large files. On average we've lost the compression war.

    However, on 100 attempts, we've paid $100 x 100 or $10000 but we've won on half of these - netting us $250000 or a cool $240000 profit.

    Anyone want to demonstrate this?

    --
    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
    1. Re:Compression by edp · · Score: 3

      "Since this is not the case, ..."

      Sorry, not so. The proportion of compression-resistant files necessarily decreases as file length increases. To see this, consider that if a file had a nice block of 1024 bytes of zeroes somewhere in it, we could easily write a simple program that wrote the bytes before that block, then 1024 bytes of zeroes, then the bytes after that block. This program would easily fit in less than 1024 bytes, plus the bytes before and after the block of zeroes. So such a file is compressible. I will show that the proportion of files that are not compressible like this decreases as file length increases.

      To make this easy, I'll count for compressible files just the ones where a block of zeroes begins at a multiple of 1024 bytes from the beginning of a file. The chance that a single 1024-byte block (selected randomly from a uniform distribution) is not all zeroes is (2^8192-1)/2^8192, a very tiny bit less than one. The chance that all n blocks of n 1024-byte blocks are not zeroes is (2^8192-1)/2^8192)^n. Obviously this approaches zero as n increases. Thus the proportion of files that are not compressible decreases as file length increases. Pick a large enough length, and you are virtually guaranteed there is a 1024-byte block of zeroes in it somewhere.

      When we open compression up to a wider range of techniques, the same effect occurs; the proportion of files that are not compressible under any given, reasonable technique decreases as file length increases. This makes Goldman's challenge a bad offer for him -- the fact that the average file size change must be positive under any compression technique in the limit does not mean the probability of a positive change in a randomly selected file is large enough to give him a positive return on the challenge.

  9. Re:IF I'm not mistaken. by dachshund · · Score: 3
    > I meant can I send you a compressor and several compressed files whose
    > total file size is less than the original uncompressed file and from
    > which I can regenerate the original uncompressed file
    >
    > Patrick

    Sure -- but you send me a *decompressor*, I don't need the compressor.

    Sure means Sure. As far as I can tell this is acceptance of Patrick's request. As Mike is the guy inventing the rules, I don't see why this wouldn't be binding.

  10. That wasn't in the challenge by drew_kime · · Score: 4

    If the guy had read the FAQ, he would know that such tricks are not compression

    Did the rules as stated say that the solution had to comply with the principle laid out in the FAQ? Not that I can see. If you're going to pup up a $5,000 challenge, you damn well better post all the rules.

    For instance: I had a friend who entered a paper airplane contest. He signed up for the "maximum distance" part. The only rules were that the contest organizers supplied the single piece of paper, you had five minutes to fold or tear it however you want to, and then everyone threw them at the same time. The one that stops moving farthest from the launch line wins.

    He waited until 4:45 into the time, then crumpled the paper up into a ball and threw it. While everyone else's planes were circling lazily around, his went straight, landed, and rolled several more feet.

    While you can debate all you want about whether his wad of paper counted as a "paper airplane" -- and we did debate (hey, it was college) -- he did saitsfy the rules of the contest. Had there been enough money on the line for someone to threaten legal action, I suspect he would have won.

    --
    Nope, no sig