Slashdot Mirror


How I Completed The $5000 Compression Challenge

Patrick Craig points to this page which details (in email) the strange life of an online challenge as posed, clarified and (at least to some eyes) met successfully. This should give pause to anyone willing to pose -- or accept -- seemingly impossible problems. In short, Patrick's not $5,000 richer, but you may think he deserves to be.

414 comments

  1. From the FAQ. by Anonymous Coward · · Score: 1
    When the FAQ explains why it doesn't work, you'd assume he wouldn't try it. Goldman is correct.
    In short, the counting argument says that if a lossless compression program compresses some files, it must expand others, *regardless* of the compression method, because otherwise there are simply not enough bits to enumerate all possible output files. Despite the extreme simplicity of this theorem and its proof, some people still fail to grasp it and waste a lot of time trying to find a counter-example. This assumes of course that the information available to the decompressor is only the bit sequence of the compressed data. If external information such as a file name, a number of iterations, or a bit length is necessary to decompress the data, the bits necessary to provide the extra information must be included in the bit count of the compressed data. Otherwise, it would be sufficient to consider any input data as a number, use this as the file name, iteration count or bit length, and pretend that the compressed size is zero. For an example of storing information in the file name, see the program lmfjyh in the 1993 International Obfuscated C Code Contest, available on all comp.sources.misc archives (Volume 39, Issue 104).
  2. Re:sorry! by Stephan+Schulz · · Score: 1
    Right. I think a decent solution would be for Mike to match Patrick's $100 and donate the total to the EFF or some other related charity.

    And Mike should probably both reword the challenge and, to avoid the appearance of conflicting interests, offer to donate future processing fees (after the callenger admits defeat) in any case.

    --

    Stephan

  3. Re:Intriquing by sjames · · Score: 2

    which isn't true because if this were true every number would be a random number (because every number could potentially be generated by this number-generator).

    But every number CAN potentially be produced by a random number generator. The hypothetical file full of zeros doesn't violate this because you only know you got all 0s after the fact. Watching the bitstream, each bit that came out had a 50% probability of being a 1, it just happens that none of them were over that particular sample period.

    Thus, given 2000 (or any arbitarry number) if 1 Mbit samples (also arbitrary) from a true random number generator,upon examination, one of them MAY contain all 0s. You still have no way to predict the contents of the others (the all 0s file has exactly the same probability as any other arrangement of bits), and you have no basis to predict the content of the remaining files.

    If you flip a coin 100 times yielding all heads (assuming a non-biased coin), the odds of heads on the next flip is 1/2.

  4. Re:sorry! by sjames · · Score: 2

    It sounds to me like the person offering the $5000 was scamming everyone because it's theoretically impossible to win.

    The only place I've seen that challenge is in the compression FAQ. It's in the same section of the FAQ that explains the impossability. It's hard to call something a scam when the 'brochure' explains that you will certainly loose your money.

    It's even more understandable when you consider that the USPTO has issued several IMPOSSIBLE patents for compressors that work on random data (recursivly no less)!

  5. Re:Intriquing by sjames · · Score: 2

    I don't know about you, but if I came across a coin that flipped heads 100 times in a row, I'd strongly suspect a biased coin.

    Given no other information, I would suspect the same. That's why I had to specify. Replace coin with random bit generator if you prefer.

  6. What is "total file size"? by CaseyB · · Score: 2
    The problem is that the definition of "total file size" is subject to interpretation. If the guy holding the contest had said "total file size as reported by ls -l", he'd be screwed. But he could argue that "total file size" means the number of bytes of disk used to store the file.

    Although, by both providing a certain file AND specifying the file size (3145728 bytes), one could argue that he *implicitly* defined file size as the number of bytes within the file. Hence, he loses.

    1. Re:What is "total file size"? by malfunct · · Score: 1
      If the person holding the challenged had defined it as the "total amount the file took on my filesystem including filesystem overhead and such" all the person taking the challenge would have to do to win is put the file on a more efficient file system. The way to stop all the trickery in winning seems to be to specify the file name in advance and specifying only one final file and specifying a sandbox to work in that had a defined (yet Turing complete) set of operators to work in the data.

      Even given these restrictions the contest has the possibility to be won because the chances of the person holding the challenge being able to create a file of a size requested by the challenger that cannot be compressed by legitimate ways is very low. This is due to the fact that given any piece of data generated at random there is a decent chance of finding an algorithm for representing that data specific piece of data in a smaller form.

      Of course your algorithm would only work on creating that one piece of data.

      An impossible task would be for the holder of the challenge to have a file of fixed size that the challenger must compress (given the tigher restrictions mentioned in this post) because then the person holding the challenge has much more control.

      --

      "You can now flame me, I am full of love,"

  7. Re:"to be disbursed at my sole discretion" by CaseyB · · Score: 2
    He didn't put a phrase like "to be disbursed at my sole discretion" at the start of the description of the prize award.

    However, he was also asking for $100 up front, to encourage only serious submissions. I don't think he'd get many submissions if he tried both tactics. :)

  8. No, and it's provably false by hawk · · Score: 2
    There are 2^n ways to arrange bits. The number of ways to arrange all possible sequences of bits ranging from since 1 to (n-1) is 2^(n-1).


    There is *at least* one sequence that cannot, by an arbitrary method, be reduced to a shorter length. On top of that, you need to use some bits for the compressor.


    OTOH, if you can include "external" information in your compressor (either the algorithm or a function), you can have an algorithm that will compress any stream, and compress your favorite stream to a single bit. As an example, the first bit is 1 (and the only bit) if it is your stream, and 0 followed by the entire stream for any other stream.


    hawk

  9. Re:Compression by AxelBoldt · · Score: 2
    As the length of the file increases, the number of bits needed to *specify the position* of this run of zeroes also increases. And you're going to have to record that position somewhere in your decompressor or compressed file.

    Nope, you don't have to record it, simply attach the part of the original file following the block of zeros to your decompressor program. The "compressed file" is the part of the original file preceding the block of zeros. Your decompressor prints out the contents of the compressed file, then 1024 zeros, then it reads from itself the remainder of the file and prints that out.

    The bet can be won; not in every instance of course, but in the long run.

    --

  10. Re:Compression by AxelBoldt · · Score: 2
    Yes, or simply require that the contestant submit a single program (with prescribed file name) which can regenerate original.dat and is shorter than original.dat.

    In that case, the outcome of the bet clearly depends on the machine model. If the machine model is "complete Debian Linux running on Pentium", I would conjecture that the bet can't be won, but it is far from clear (obviously, a succinct powerful language is needed, maybe A+). On many machine models, the bet can be trivially won.

    --

  11. Re:Compression by AxelBoldt · · Score: 2
    No, the machine architecture doesn't matter. The bet can be won only if you are very astronomically lucky.
    I'd gladly stake large amounts of money on this, though it would feel like I'm running a scam.

    If you allow me to specify the machine model ahead of time, you'd lose. This was observed by Anonymous Coward in article 466.

    --

  12. Re:Compression by AxelBoldt · · Score: 2
    As understood the challenge, there was no requirement that the data I give you be random, though I might ordinarily give random data for such a challenge. If that was the machine model, however, I'd just give you a 0 followed by random data, and you would lose.

    Good point, the proposed machine model is not Turing complete and you can analyze it ahead of time.

    If I remember my Kolmogorov complexity right, the probability that a random file can be generated by a program that's n bits shorter is roughly a*b^(-n), where a and b depend on the (Turing complete) machine model. If the model is chosen wisely to make the constants small, it should be possible to compress more than 1/50 of files by one bit. But it's not nearly as simple as I thought.

    --

  13. Re:Compression by AxelBoldt · · Score: 2
    the probability that a random file can be generated by a program that's n bits shorter is roughly a*b^(-n)

    It's a*2^(-n), sorry.

    --

  14. I'd comment... by Enahs · · Score: 2

    ...but Yahoo! seems to have taken the site down. Thank you, Yahoo!

    --
    Stating on Slashdot that I like cheese since 1997.
  15. An ingenious solution... by jd · · Score: 1
    Technically, -both- guys are correct, because English is a horribly ambiguous language.

    The challanger is correct in that the problem of compressing an arbritary amount of truly random data is insoluble, and I'm not going to question that this is the problem as the challanger believed he had set.

    The challangee is correct in that the challange did not specify any particular form of compression. Compression is simply the removal of redundancy. If, by splitting the file and removing a number, redundancy is removed, he HAS compressed the file.

    On the flip-side, the challenger looses karma and brownie points for trying to fiddle the figures. You can't argue on the basis of inodes, as the challange doesn't refer to a specific OS or filing system. (MSDOS, for example, uses fixed-sized FAT tables, rather than inodes.)

    Also on the flip-side, the challangee almost certainly knew that the problem =as written= was NOT the problem as the challanger intended. True, it wasn't his job to second-guess, but if you enter a lion's den, you can still expect to get bitten.

    My personal opinion -- BOTH sides should withdraw their various claims, and the sum total of all monies involved ($5,100) should be donated to an appropriate charity.

    Overall, this should be a lesson to ALL -- Don't Assume. It can burn, and burn bad.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:An ingenious solution... by jd · · Score: 2
      No. There is a difference between random DATA and a random number GENERATOR. To be random, data must follow some mathematical rules, one of which is that the probability distribution is equal at all points. Another is that the next point's probability is NOT affected by the prior point's.

      You are correct, however, in saying that -any- sequence can be generated by a random number generator. However, there is no general solution to the compression of a sequence S into a smaller sequence C, where a unique 1:1 mapping exists.

      HAVING SAID THAT, if you compressed S into C, and could manually determine which of the reverse mappings was S, you -would- be able to "compress" generic random data of any size and of any degree of randomness.

      (This is because the user posesses some additional information, I, which is therefore redundant and can be ignored, for the purpose of the compression.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:An ingenious solution... by jd · · Score: 2
      The 218 files would have taken no additional space on the FAT, as it's fixed size.

      Further, you can format tracks down to 128 bytes per sector, leaving an average of 64 bytes of dead space per file. That's 13,592 bytes of dead space, if we store the files "conventionally".

      If, however, we store all the files in a single indexed file, there's no dead space to account for.

      The filenames could be considered another issue. However, again, DOS' root directory is fixed-size and therefore file entries within it take no additional space.

      Lastly, it doesn't matter if the guy "admits" to exploiting vague wording. Within the wording of the rules as given, he won. Within the intent of the rules, as given, he lost.

      Ergo, BOTH people have a claim to "victory". They can either split the purse, or they can shake hands and donate it to some worthy cause.

      In the end, this comes down to something I've been asked all too often myself. If you had to choose, would you rather be happy or right?

      IMHO, the guys should cut out the righteous act, and stick with being happy. The contestant that he "defeated" the challange, and the guy who set the contest up, that he achieved his goal of reducing the snake oil surplus.

      Let's face it, those are worthy achievements in themselves. Be satisfied! The money is just causing grief. Neither probably needs it. So why not clear the air, and give the cash to someone who does?!

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:An ingenious solution... by jd · · Score: 2
      Yawn. I know perfectly well that the FAT uses clusters of sectors. So? You can configure the disk geometry in the boot sector.

      Hex disk editors are fun tools. You might want to try them, some day, when you're finished trolling and want to get some work done.

      The dead space (NOBODY calls it "slack space"! Slack is what people do on Fridays) is a function of sector size, and you can define whatever sector size you like. (The defaults are the "sensible" geometry, but by no means the only one.)

      I suggest you also get Peter Norton's books on MSDOS, and print out a copy of the Interrupt listings, which you can find on the Internet. The Norton Guides are extremely handy, too. Oh, and you'll want a copy of Flight Simulator 2.0, which came on a 5.25" floppy. The "copy protection" scheme is to use 1 Kb sectors, which not only stopped DOS reading them directly, it also gave the disk about 256 extra bytes per track.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    4. Re:An ingenious solution... by Ed+Avis · · Score: 2

      In this sort of contest, it's traditional that finding ingenious ways to exploit loopholes in the rules is a perfectly valid way to win. Then the rules are tightened up if ever the contest is re-run.

      Mike Goldman didn't make it clear what 'total size' meant - I think most programmers if you asked them would assume the sum of file sizes, if you didn't specify otherwise. His fault, I feel. He should pay the $5000 and write the rules more carefully next time.

      BTW, if you do assume that 'total size' is just the sum of file sizes, you can compress anything down to zero bytes. Number each possible sequence of binary digits - the empty string is zero, '0' is one, '1' is two, '00' is three, '01' is four, '10' is five, '11' is six, and so on. Then just create the required number of zero-length files.

      --
      -- Ed Avis ed@membled.com
    5. Re:An ingenious solution... by ethereal · · Score: 1

      It's not cheating if you follow the rules of the contest as explained to you, and create a winning entry as defined by those rules. The creator of the challenge is at fault for not creating a challenge that would rigorously prove the point he was trying to make. Shame on Mike, if anything - he should know better how to construct a correct challenge, since as he said he has a FAQ about exactly this sort of thing.

      I agree that in the real world, no actual compression occurred, and the compression gurus seem to be correct that a general solution to the compression challenge is impossible. Under the logic of a challenge which allowed tailoring of the compressor to the input and multiple files for the output, however, compression did occur. Unfortunately for the challenge provider, he allowed the problem space to be sufficiently dissimilar to reality for the problem to be solvable.

      --

      Your right to not believe: Americans United for Separation of Church and

    6. Re:An ingenious solution... by Surak · · Score: 2

      No way...the guy didn't compress shit. All he did was some data shifting. If the guy had read the FAQ, he would know that such tricks are not compression...you mention an MSDOS FAT filesystem. Even on a FAT filesystem, the 218 (or whatever) files and the decompressor would have taken up more space than the original file due to slack space...although the amount would vary from hard drive to hard drive, if you figure an average of 512 bytes of slack space per file thats 111,616 bytes of slack space, far more space than Patrick claims to have "saved."

      And anyways, the guy basically admits that he cheated by exploiting the vague wording of the challenge.

      I've got to hand it to Patrick, though, because if you do take the challenge out of context, he basically won...it depends on what the meaning of the words "compressor" and "decompressor" mean.

    7. Re:An ingenious solution... by Surak · · Score: 2

      The 218 files would have taken no additional space on the FAT, as it's fixed size.

      Further, you can format tracks down to 128 bytes per sector, leaving an average of 64 bytes of dead space per file. That's 13,592 bytes of dead space, if we store the files "conventionally".


      I never said that it would take additional FAT space, I'm talking slack space...if you knew anything about the FAT filesystem (which you obviously don't), you would know that its not sectors you have to worry about, it's clusters (GROUPS of sectors). there are only so many clusters because of FAT's fixed size... I was being QUITE generous in saying that 512 bytes is a practical average. On a 1.2 GB MS-DOS formatted drive (not FAT32, mind you, we're talking DOS here, not Windows), the cluster size ends up being something like 16K, leaving the average slack space to be 8K. so I was being VERY generous there.

    8. Re:An ingenious solution... by ChadN · · Score: 2
      In particular, the filename encodes information about the order in which the files need to be reassembled; so for each byte saved by the "compressor", several bytes are needed (on average) to encode the ordering as an ascii integer in the filename.

      The challenger should have made it clear that these costs would be counted against the reconstructed filesize (or simply not allowed any deviation from the "one file" rule) The decompressor should operate by having all the files piped into stdin, thus eliminating filesystem state info entirely.

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    9. Re:An ingenious solution... by MikeBabcock · · Score: 2
      The quote everyone seems to avoid is:
      I am curious, though, and I am not trying to get any sort of angle on you here, what brilliant idea have you come up with that makes you think you can compress arbitrary data? Whether or whatever you disclose won't affect this challenge and I will live up t my end of the bargain regardless.
      That sums it up for me.
      --
      - Michael T. Babcock (Yes, I blog)
    10. Re:An ingenious solution... by supersnail · · Score: 1

      If he was serious he should have worded the competition this way:-

      "I will send you a 1.44 MB file on a standard DOS formatted HDD diskette".

      "To earn 5,000 you must return the compressed file AND the decompressor on a similar standard 1.44MB standard formatted HDD diskette".

      Tricky but still possable.

      --
      Old COBOL programmers never die. They just code in C.
    11. Re:An ingenious solution... by steveheath · · Score: 1

      The FAT file system is interesting. On a FAT file system this actually did perform compression. The FAT (table) is fixed size and takes up the same disk space however many entries are in it (up to it's limit). Thus the hiding mechanism takes advantage of an area of the file system that isn't normally used.
      Still, I don't think he ever expected the $5000, and expected to loose the $100 too. Fair enough I guess.. some ppl are willing to pay a price to prove a point (look at OSS :)

    12. Re:An ingenious solution... by susano_otter · · Score: 2

      Ingenious, but not ingeneous enough. The FAT table is a limited resource, and his solution uses up more of that resource than the original file. It's still not a valid compression - not to mention the fact that it's totally unportable anyway. Other OS's won't even concede this marginal point. . .

      --

      Any sufficiently well-organized community is indistinguishable from Government.

    13. Re:An ingenious solution... by arfy · · Score: 1

      The only problem with getting them to donate the money is that only Patrick has ponied up any cash. Looks like Mike Goldman is more concerned with money than honor; Patrick met the challenge as written and Mike weaseled out by saying that it wasn't what he meant.

    14. Re:An ingenious solution... by EllisDees · · Score: 1

      When one makes this sort of challenge, he should be sure that the wording is unambiguous. Mike really should have seen it coming when he was asked about splitting up the file. To threaten legal action was just asinine.

      --
      -- Give me ambiguity or give me something else!
  16. Re:Are you for lawyers or against? by iabervon · · Score: 1

    If he had not specifically accepted the possibility of more than two files, it would be a case of someone exploiting a loophole in his challenge. But the challenger actually asked about the loophole, and was told that it was okay. If you're making bets like this, you ought to be sufficiently attentive to not permit loopholes if the person asks about them before using them.

    In this case, the challengee specifically asked if the challenger would permit multiple files, and the challenger said yes. It's hard to argue that, simply because he *intended* to say either no, or "yes, but you have to include metadata size on all files after the first" or something else that would actually require compression, that he should be let off the hook. He was asked a straightforward question and gave a straight and wrong answer, and should lose because of that.

  17. Easy way to show he lost by dmahurin · · Score: 1

    Create a partition of 3,145,728 bytes on hardisk.
    Put the data file there.
    Now try to put the solution in that same partition some how ( as file system,tar, or a big exe).
    It will not fit.

    1. Re:Easy way to show he lost by Chandon+Seldon · · Score: 1

      That's a different problem with a different solution.

      --
      -- The act of censorship is always worse than whatever is being censored. Always.
  18. Omega by Zooko · · Score: 2

    He should have used Gregory Chaitin's Omega number to generate the challenge file.

    Actually I really don't understand Chaitin's work well enough to know if that would have saved him the $5000, but at least he (and the challenger) would have learned something about algorithmic complexity theory.

    Zooko

  19. no genius required by slew · · Score: 2

    As people have already mentioned, the numbers are probably not pseudo-random as they were
    retrieved from random.org...

    However, if someone decides to be cute and repeat this challenge with a pseudorandom sequence,
    you don't need to be a good mathmetician, you just need to know how to read a paper written
    by a good mathmetician... Look for the paper...

    Massey, J. 1969. Shift-Register Synthesis and BCH Decoding. IEEE Transactions on Information Theory.
    IT-15(1): 122-127.

    Abstract -- It is shown in this paper that the iterative algorithm introduced by Berlekamp
    for decoding BCH codes actually provides a general solution to the problem of synthesizing the
    shortest linear feedback shift register capable of generating the prescribed finite sequence of
    digits. The equivalence of the decoding problem for BCH codes to a shift-register synthesis
    problem is demonstrated...

    This is probably in most textbooks on linear codes for encryption and/or error detection and
    allows you to recreate the shortest LFSR (linear feedback shift register, a common component
    of pseudo random number generators), given a sequence of digits. Of course nobody would
    use a non-cryptographically secure PRNG like random() would they?

    So you take the number, pass it through this algorithm, find the shortest LFSR, and the
    decompressor just takes the LFSR initial state and reproduces the sequence according to the LFSR.

    Then again, the shortest LFSR can be thought of as the linear complexity of the number and who
    knows it might just accidentally compress the sequence the challenger gives you (not likely,
    but who knows?)...

    Always remember... "stand on the backs of giants" whenever possible ;^)

  20. Re:Compression by Genom · · Score: 2

    ahh...but would THAT 1000 bit number be random, or compressible on it's own?

    Once you can drop 1000 bits from the file, you can play with the offset itself, which has a good chance of being compressible or at the very least expressible in a smaller format.

  21. A variation which might work by stevelinton · · Score: 2

    As several people have observed, the attacker in this case used his "split into multi-files" trick to hide the necessary information in the directory structure of the disk. This is just barely ruled out by the "one compressed file" rule. However, suppose you hid the data in the filename of one file. For example write a program like

    echo $1 | cat - $1

    (about 20 bytes) and use the first 21 bytes of the raw file as the name of the compressed file.

    This is in the same spirit as this attempt (hide the data in the directory) but avoids breaking the one compressed file rule.

    Steve

  22. Re:Nice try but wrong by jamiemccarthy · · Score: 1
    "any number can be defined as n=a^2 + b"

    That doesn't get you any closer to winning the contest or overturning the laws of information theory.

    Because squaring it gets you close to n, your variable a will have approximately half the bits of n. But there's no guarantee that b will be smaller than (the other) half of n's size. In fact, with large n, the probability is overwhelming that b will be larger than half n's size, so together a and b will occupy more bits than n.

    I just tried this for fun with a 15-digit number I pecked in at random (482837578298375), and got an 8-digit a (21973565) and a 9-digit b (19489150).

    Because the gap between primes increases according to a simple formula, there's probably a simple proof of the average size of b, given n. But I don't feel like calculating it. :)

    Jamie McCarthy

    --

    Jamie McCarthy
    jamie.mccarthy.vg

  23. Re:Nice try but wrong by jamiemccarthy · · Score: 1

    I wrote:

    "Because the gap between primes increases according to a simple formula"

    Typo, for "primes" read "squares." Sigh.

    Jamie McCarthy

    --

    Jamie McCarthy
    jamie.mccarthy.vg

  24. Re:Mike's right by markb · · Score: 1

    "Sure" is not straight forward enough???

  25. Re:There is no solution... by ocie · · Score: 1

    How about this for a compression algorithm:

    1) if the input is original.dat, output 1 byte 0xff

    2) otherwise, output 1 byte 0x00, followed by the input

    This algorithm doesn't work that well in general, but for compressing the given data file, it should be great.

    --
    JET Program: see Japan, meet intere
  26. Re:There is no solution... by ocie · · Score: 1

    As soon as I was able to find the mirror of the original site, I figured this out. Seems like there are all sorts of tricks like having a script say:

    gunzip $1

    gives you all the power of the gunzip executable (72kb on my system) for only 10 bytes. Also, is opening a network connection back to a web server that has the original data forbidden:)

    --
    JET Program: see Japan, meet intere
  27. Mike defined total file size himself by A+nonymous+Coward · · Score: 2

    Patrick specified a file size, and Mike generated a file of that specified size. Mike did not include any OS filesystem overhead in his file size. It seems rather hypocritial of Mike to claim OS file system overhead counts for Patrick's files but not Mike's. Mike defined FILE SIZE, not DISK SPACE USED.

    --

    1. Re:Mike defined total file size himself by MO! · · Score: 1
      While not explicitly stated, those bytes ARE an integral part of the "decompressor" that was supplied (it can't work without them), and so including their size is consistent with the agreed upon rules, IMO.

      Because the decompressor relied on the file numbering to work does not make it an "integral part" any more than the OS needed to run the decompression program on is "integral".

      As well, things such as sector size, blocking factor, filesystem paging size, etc can cause, for example, a 1K file to take anywhere from 1K - (many)MB to store. For this reason, the filesystem utilization should not be considered in the validity of the challenge. The source file generated was a set byte size which did not factor in filesystem and storage overhead. Therefor, the "solution" presented did meet the stated requirements.

      --
      I AM, therefore I THINK!
    2. Re:Mike defined total file size himself by ChadN · · Score: 3

      However, the decompressor DOES depend on the filename numbering in order to work; it is part of the algorithm (and hence the "decompressor"), and those bytes (the bytes used to number the files) should be charged against the decompressor size. Then Patrick loses.

      While not explicitly stated, those bytes ARE an integral part of the "decompressor" that was supplied (it can't work without them), and so including their size is consistent with the agreed upon rules, IMO.

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    3. Re:Mike defined total file size himself by spoon42 · · Score: 1

      While not explicitly stated, those bytes ARE an integral part of the "decompressor" that was supplied (it can't work without them), and so including their size is consistent with the agreed upon rules, IMO.

      IMO, too. Patrick said he wouldn't use "cheats" like saving information in command-line options, user input, or filenames:

      It's probably also possible to meet the challenge with smaller file sizes by storing information in the filename of the compressed file or the decompressor, but I think most people would consider this to be cheating.

      However, the number identifying each file and its place in relation to the others is *information*, stored in the filename, and is cheating by his own admission. He only gained one byte for each file by splitting it up as he did, and it would require at least one byte each to encode each file's position in the sequence. So he loses, as I see it. He fought the law (information theory) and the law won. ;-)
      --
      --- this comment is presented in WIDE SCREEN STEREO!!!
    4. Re:Mike defined total file size himself by RedAlert99 · · Score: 1

      Absolutely Right! I can't believe none of the other posts I read (many) mentioned this. I'm sure the file that the Challenge-maker created was exactly 3Mb (or whatever it was supposed to be), not 3Mb less the overhead for the file-storage system. How am I sure of this? Because if he were going to send a file that had less than 3 Mb of data + the filename and whatnot, it *would* depend what OS he was using, and the challengee specifically asked if it had to run on multiple platforms or not. The defense Mike made was that that information (like filenames) is part of the data, and so in total, it's not smaller after the "compression" routine. However, that means, by his own definition, he didn't send the right "size" file to Patrick. Patrick may have violated the spirit of the challenge, but Mike VALIDATED his violation by using the same *wrong* definition of "file size."

      --
      Cats know what you're thinking. They don't care, but they know.
    5. Re:Mike defined total file size himself by bigwig10001 · · Score: 1

      I agree with ChadN.

      The filename and EOF flags are vital to the algorithm. While these pieces of information won't show up as part of the "file size", they will take up disk space.

      Reducing the memory footprint is the whole point of the comp.compression group.

    6. Re:Mike defined total file size himself by dinivin · · Score: 1

      and so including their size is consistent with the agreed upon rules, IMO.

      Why? Mike didn't say that the total files had to take up less room on disk, just that their total file size had to be less. Total the file sizes and what do you find? It's less than the file size of the original file. Patric fulfilled the requirement. Period.

      Dinivin

    7. Re:Mike defined total file size himself by dinivin · · Score: 1


      But reducing memory footprint obviously wasn't the whole point of the challenge. The point of the challenge was to make a fool of whoever accepted it. In the end, it's Mike who looks like the fool.

      Dinivin

  28. Re:Intriquing by Firehawk · · Score: 1
    If you flip a coin 100 times yielding all heads (assuming a non-biased coin), the odds of heads on the next flip is 1/2.

    I don't know about you, but if I came across a coin that flipped heads 100 times in a row, I'd strongly suspect a biased coin.


  29. Re:Intriquing by llywrch · · Score: 2

    > What I'm thinking of, is wouldn't it be possible to create a minimalist compression algorithm that tended to not affect the file size,
    > but might deviate it in either direction by about 1% ?(enough to cover the overhead of the decompressor).

    All purely random strings of numbers (e.g. pi, e or the square root of 2), tend to have short strings of digits that repeat randomly. I believe this behavior is called ``clustering."

    Assuming a large enough file, say 3MB in size, we could find up to 26 strings of two or more digits that repeat enough times so that the compressor would be a series of substitution commands. For example:

    s/a/311/g
    s/b/47/g
    s/c/9724/g
    ...
    etc.

    That comes out to an average of 10 characters per line (including end-of-line character), add another 20 characters for overhead, & the decompressing script could be kept to less than 300 characters. As long as each substitution applied to more than 3 places in the original, then there would be net decrease in total file. And since we are looking at the 26 most common strings of digits, this inductively should be possible.

    Hmm. I seem to have reinvented pkzip.

    Geoff

    --
    I think I see a trend here. Maybe for them it really would be easier to muzzle the entire internet than to produce p
  30. Re:Another tack... by scrytch · · Score: 2

    > My question is, has anyone put any efforts towards seeing if other, larger pieces of data could be represented in this way?

    Sure: any. It was just the decss source padded out to a prime. You could do the same for mozilla.
    --

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
  31. *sigh* Here's why. by Jeremy+Lee · · Score: 1

    Since a couple of people have asked the same thing, I'll have a go. My mathematics is reasonable, but I'm not a pro. IANAM.

    PI isn't actually random, and will skew this analysis in ways that doubtlessly earned someone a PhD, but let's ignore that for the moment. Let's pretend PI is an infinite-length random bitstream. It doesn't matter much.

    Being infinite in length, it will therefore (eventually) contain any finite length random bitstring that you care to look for.

    The chance of finding a match at any location is one in 2 to the power of the length of the bitstring. So, the chance of finding a 100 bit match at any index is 2^100.

    You will therefore find a match, ON AVERAGE, at a distance of about 2^99 bits into PI.

    Hurrah! you say. We only need a 99 bit index to represent a 100 bit number. We've made a saving!

    First, let's just diverge slightly and apply a sanity check for why this can't possibly be right. If that logic holds, then we can compress any N bit number to N-1 by looking for it in PI. Therefore, we can compress the 99 bit index to a 98 bit index, then to 97... down to 1 bit. Something's wrong.

    The issue is that you get a 99 bit index ON AVERAGE. Sometimes it will be 101 bits. Sometimes it will be 98. There's even a small chance (exactly 2^100) it will be right at the beginning. So, you can't just have a fixed 99 bit block to store the index, you need another number to store the length of the index number.

    Since the number "99" requires 7 bits of storage, your average case will be to represent a 100 bit string as a 106 bit index+length code.

    So, actually, you end up expanding the data, if the law of averages holds.

    There's no way around this, no matter how clever you get.

    If you say "we'll use a 99 bit block only if it fits, otherwise we'll just store the original 100 bit number" then you need at least a 1 bit switch to indicate this.. so on average, half will be 99+1 bit blocks, and the other half will be 100+1 bits. 100.5 bits per original 100 bit block.

    No, Huffman coding the index length doesn't help either. But if you asked that question, you probably already knew the answer.

    Of course, you might get lucky on individual cases. That's always a possibility when dealing with truly random data. But then, that's not compression, that's gambling.

    If you really want to learn about this stuff, read Claude E. Shannon's work on Entropy and coding theory. (also Turing, Huffman, and Godel) Doing so will also give you the grounding necessary for thermodynamics and quantum computing.

    --
    Jeremy Lee | Orinoco
    1. Re:*sigh* Here's why. by Jeremy+Lee · · Score: 1

      > And hence *on average* you win the competition.

      Only if you can devise a 1 bit decompression algorithm. Good luck.

      --
      Jeremy Lee | Orinoco
    2. Re:*sigh* Here's why. by Jeremy+Lee · · Score: 1

      No.

      --
      Jeremy Lee | Orinoco
    3. Re:*sigh* Here's why. by Jeremy+Lee · · Score: 1

      No.

      I'll restate. The chance of matching a 100 bit string is so remote - you need to search through so much of PI - that to simply record WHERE in PI you found the match would require about 99 bits just to store a number that big.

      Incidentally, 2^99 is about 10^97, which is up in the number-of-subatomic-particles-in-the-universe range. If you actually try to compute 2^99 bits of PI, you'll be here long past when the sun burns out. You have to do that both during the compression and the decompression steps BTW. But I digress...

      Any number you store requires at least a little metadata about it - it's length, usually - which will require more than 1 bit. Thus, there is no saving after all.

      Bottom line. You cannot compress truly random data. That's the definition. If you can compress it, it wasn't random.

      --
      Jeremy Lee | Orinoco
    4. Re:*sigh* Here's why. by clare-ents · · Score: 2

      "
      You will therefore find a match, ON AVERAGE, at a distance of about 2^99 bits into PI
      "

      And hence *on average* you win the competition.

      In n entries you will win n/2 times and lose n/2 times - admittedly you will tend to expand the data - the amount the data expands on average in the losing cases is greater than the average saving in the winning cases. However, that doesn't matter here. Providing we can shrink the data in better than 1 in every 50 attempts we will end up making a profit.

      If the guy wanted to have a winning money source he should have made the prize $199.

      --
      Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
    5. Re:*sigh* Here's why. by Drakantus · · Score: 1

      Interesting stuff. But what if instead of pi, it ALSO used primes. You could encode various strings in sets of three numbers. First number indicates the prime, 0 would be used for pi. Second is the starting bit, third the ending bit. Such as 99,21,99 would mean bits 21-99 of the 99th prime number. There would be cases of course where you couldn't save any space, but wouldn't it increase the "average" by having more places to look?

      --
      I love going down to the elementary school, watching all the kids jump and shout, but they dont know I'm using blanks.
    6. Re:*sigh* Here's why. by Drakantus · · Score: 1

      If I understand correctly, there is some reasonable chance that a 99bit number can be found early enough in pi to really "save" space. Couldn't the algorithm simply look for different strings if the first one doesn't actually save space? Okay, the 99 bit sequence may happen to be at a bit requireing 99 bits to record, so it doesn't save space- instead the program searches for a 98bit sequences, and so on untill it finds one that results in an actuall savings. Or could even search for OTHER 99 bit sequences within the file, instead of the first which didn't save space. By the law of averages isn't it pretty close to 100% likely that one possible sequence within the random data is somewhere in pi which can be described in less bits?

      --
      I love going down to the elementary school, watching all the kids jump and shout, but they dont know I'm using blanks.
    7. Re:*sigh* Here's why. by Drakantus · · Score: 1

      Your bottom line sucks. You can most certainly compress some random data, because the set of all random possibilities include every number ever written, a few of which have already been compressed. The definition of random data is not "data that can't be compressed" Besides, you are the one who decided to make an example with a 99bit string. Just for the record, I could easily be wrong about how often random strings can be found in other numbers, but not for the reason you gave.

      --
      I love going down to the elementary school, watching all the kids jump and shout, but they dont know I'm using blanks.
    8. Re:*sigh* Here's why. by matrix29 · · Score: 1

      The problem is in the average.

      Higher levels of randomness above 50% (as ordered files are much below the 10% random mark) along the area of 90% random should be easily compressable. The trick is in find something SIMILAR to the original random generation function. Any pseudo-random intersection SQRT(prime number) and Pi should be close enough to get a space-saving gain on location bit-strings(otherwise we couldn't have irrational numbers - think about it). The big problem is TIME. These functions waste huge amounts of time & computation for something with no practical worth (except the $5000).

      The implications of the counting theory being absolute extend to physics, quantum theory, pointlessly delving into Pi, and the structure of the universe. Try not to overlook the big picture for small-scale pigeon holes.

      --
      "Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality.
    9. Re:*sigh* Here's why. by Guy+Smith · · Score: 1

      Well, no. 5,000 is actually a bargain, given that we're measuring file lengths in BYTES instead of bits. In order to technically compress the data, you have to reduce the file length by EIGHT bits, not one. With truly random data, you'll only do that 1/256 of the time. Hence, the break-even prize offer is 25,600 dollars. So 5,000 is really lucrative. And then add in the fact that the nature of the contest requires ADDING data to the original file; the decompressor size is also counted. So in order to profit, you have to be able to reduce the file size by 8 bits + the bitlength of the decompressor. The odds of winning are extremely remote.

  32. Re:PAY UP MIKE by edhall · · Score: 2

    If you want to be picky, Patrick utterly failed the challenge by not supplying compressed files. They were simply unaltered pieces of the original file. Sure, there were missing bits of the files that had been moved to the file names, but the fact that this slight of hand failed to compress the data can be proved simply by renaming the files. Even if the order of files supplid to the "decompressor" is preserved, renaming them renders the original data unrecoverable. Sure, moving data into the filename is a cute trick, but it isn't compression.

    The goal of the challenge is to produce compression, not to win via some semantic shell game. It's Patrick who shed his honor by resorting to a semantic smoke screen and attempting to win by wordplay with no intention of actually producing any real compression.

    -Ed
  33. Re:The page has been removed by GeoCities by Geek+In+Training · · Score: 1

    My understanding is that GeoCities IMMEDIATELY shuts down pages that induce X hits per hour, because they are likely porn/warez/MP3 sites.

    They THEN do investigation, and re-enable TOS-compliant sites.

    Kind of sad that they have to do this...

    --
    SlashSigTheorem: Humorous, Political, Critical, Constructive- If you have a .sig, someone WILL complai
  34. Re:Compression by ethereal · · Score: 1

    Although you submitted it as plain text, the browser might still interpret it as a tag. /. doesn't seem to insert [pre] tags around plain old text postings, I'm not sure why.

    --

    Your right to not believe: Americans United for Separation of Church and

  35. Re:According to the rules... by ethereal · · Score: 1

    Which article were you reading? He had 218 compressed files! (Admittedly without real-world compression, but within the rules of the contest there was compression.)

    --

    Your right to not believe: Americans United for Separation of Church and

  36. Re:Petty by ethereal · · Score: 1

    I think the moral is that if you think you're so smart, and you want to set up a public challenge to prove how smart you are and/or how inviolable a certain concept is, you'd better be smart enough to word the challenge so as to prevent fairly obvious technicalities/loopholes in your challenge. This wasn't a brand-new loophole, Mike even knew about it from the FAQ to begin with and he still couldn't phrase a challenge that avoided this problem.

    If you're going to put your money where your mouth is, make sure you know what you're talking about.

    --

    Your right to not believe: Americans United for Separation of Church and

  37. Re:Some things can be taken for granted. by ethereal · · Score: 1

    The stated metric of compression was "file size", as reported by 'wc -b', not "disk space". This metric was used by both challenger and challengee. The challenge wasn't intended to cause people to write better code, because the provider of the challenge knew that the problem was unsolvable. The only purpose of the challenge was to provide a bit of sport for the elite of the compression usenet world. If the whole thing was planned around "let's laugh at people who think they've found the Philosopher's Stone", then I don't see how turning the tables on the so-called "experts" is really contrary to the spirit of the contest. Either way, somebody thinks they've got a sure thing and that there's no way the other side can win.

    I understand the argument that you're making that certain things should be able to be taken for granted in such challenges, but I don't think it's a worthwhile challenge if you don't specify everything up front and then as people solve the problem you say to them "no, you can't do it that way, everybody knows that's just cheating". The provider of the challenge supposedly had much more of a background in the topic of compression and should have been able to easily word a challenge without loopholes (just a reference to avoiding techniques which had been proved to fail from the FAQ would have been sufficient, for example).

    Since Mike was willing to put up money based on a challenge which didn't really prove what he thought it did, he should hand over the money to Patrick for providing a worthwhile education in specifying your problem clearly.

    I suspect Patrick knew about this hole all along, but chose to go through the challenge to find out if Mike would really admit the error in the specification of the challenge, or if he would try to back out. I only wish I'd thought of it first!

    --

    Your right to not believe: Americans United for Separation of Church and

  38. Re:I have an idea... by Sloppy · · Score: 1

    Yes, but the offset number probably has more digits than the length of the file you're trying to compress.


    ---
    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  39. Good deal for contestant by Sloppy · · Score: 2

    The payoff matrix is highly in favor of the contestant. You can't just assume that a random stream won't compress. Sure, it might not, but your odds are about even, not nearly as bad as a 50-to-1 shot.


    ---
    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    1. Re:Good deal for contestant by dasmegabyte · · Score: 2

      Correct. He no doubt weeds his files for the normal patterns of random similarity, because even ONE of these would make the contest a farce (you could replace the bytes very easily). But there must be other alternatives to simple dictionary replacement, alternatives like dictionary on a bit shift (you're guaranteed to find some stuff that way, though the compression will be slow as hell), fractal decompression (match a sequence of the random to some buildable equation and save that equation) or possibly compression of a matrix of the file itself. Of course, for each of these methods I mention, Mike has probably generated a file that's taken these into account, and that's the point of this: he wants to see the magical next step in compression, which doesn't rely on these simple notions that things are repeated or can be rerepresented. There's some elusive technology he wants us to uncover, and hence the $5000 (and why he wouldn't pay up on that sham of a compression scheme Pat generated...converting "5" to an EOF does not constitute compressing.)

      --
      Hey freaks: now you're ju
  40. *Read* the article before you post. by arcade · · Score: 2

    Read the damn article before you post - he didn't use 'gunzip' in his proposition.


    --

    --
    "Rune Kristian Viken" - http://www.nwo.no - arca
    1. Re:*Read* the article before you post. by weis3w3 · · Score: 1

      Can't read it - yahoo geocities says it's unavailable.

      --
      -- We all get heavier as we get older because there's a lot more information in our heads.
  41. Re:The page has been removed by GeoCities by Skapare · · Score: 2

    Their 486 got overloaded

    --
    now we need to go OSS in diesel cars
  42. Re:Almost. by th0m · · Score: 1

    in this case it wasn't entirely random, since, although the file was sourced from random.org, goldman got the chance to review it before delivery to craig, which (presumably) gave him the chance to check that he didn't fortuitously generate three meg of pi or the decss source - or any of the number of less obvious cases which may have inadvertently yielded the very slight compressability required to win the challenge.

    --

    -- in china, chinese food is just called food.

  43. Re:One born every minute by th0m · · Score: 1

    it's more of a thought experiment than a scam - the idea being that you can't win the $5k, however bad you want it. and it's a win-win situation for goldman, since he either gets a bunch of $100 checks from a bunch of loonies who think they can compress his random data, or he ends up paying out the not-impossibly-large sum of $5k in exchange for playing a part in what would be one of the greatest mathematical revelations in recent memory, which is probably more than good enough value as far as he's concerned.

    --

    -- in china, chinese food is just called food.

  44. Re:Compression by th0m · · Score: 1
    probably statistically true in the general case of "compress a truly random file", but not so accurate in this challenge where the rules are "compress this specific file which has been carefully generated to resist compression".

    of course, there's always the chance that your algorithm would fortuitously perform wonders on any given file, but i think the odds are stacked against you.

    --

    -- in china, chinese food is just called food.

  45. Re:Another tack... by th0m · · Score: 1

    is this a troll? you can represent a gzip of any data in this way.

    --

    -- in china, chinese food is just called food.

  46. Re:Another tack... by th0m · · Score: 1
    1. gzip your data
    2. represent it as a large integer
    3. find the next prime number that can be represented by appending garbage bytes to your integer
    so, just regular prime discovery afaik.
    --

    -- in china, chinese food is just called food.

  47. Re:Compression by th0m · · Score: 1

    clare-ents was talking about one compressor that would win 2% of the time

    --

    -- in china, chinese food is just called food.

  48. Re:Compression by th0m · · Score: 1

    good point; at the level that craig was discussing, though, something as desperately simple as ensuring that you had no repeated or related sequences (given a suitably large random file) would probably be a first step towards eliminating the most obvious candidates.

    --

    -- in china, chinese food is just called food.

  49. Re:There is no solution... by th0m · · Score: 1

    this is a troll, right?

    --

    -- in china, chinese food is just called food.

  50. Re:Compression by th0m · · Score: 1
    yeah, nice one mate, looks like you know much more about information theory than any of the people on the compression newsgroups doesn't it

    you've got them over a barrel! go claim that 5k!

    --

    -- in china, chinese food is just called food.

  51. Re:This is why we need lawyes by th0m · · Score: 1

    it's not as ambiguous as you seem to think it is. can you represent any integer quantity between 0 and 10 using only an integer quantity between 0 and 9? no, you can't; there are more things to represent than you have things to represent them with. i'd say that's pretty inviolate.

    --

    -- in china, chinese food is just called food.

  52. Re:There is no solution... by th0m · · Score: 1

    it does not work period. storing the factors of a number takes more space than storing the original number.

    --

    -- in china, chinese food is just called food.

  53. Re:Unless I'm mistaken...(which I probably am) by th0m · · Score: 1

    it'd work for numbers that were close to a prime. unfortunately, there are far more numbers that are a long way away from their nearest prime - so storing the prime index and the difference would eat more space than storing the original number. your recursive tactics don't solve that problem.

    --

    -- in china, chinese food is just called food.

  54. Re:Petty by th0m · · Score: 1

    ultimately it's just a bit of fun. nobody got hurt, nobody got scammed, and it entertained slashdot readers for a minute. where's the harm in that?

    --

    -- in china, chinese food is just called food.

  55. Re:Maybe he should have used the "lossy" lzip algo by th0m · · Score: 1
    do you ever get the strange feeling that you're missing a joke?

    look at the date of the article he linked to.

    --

    -- in china, chinese food is just called food.

  56. Re:This is why we need lawyes by th0m · · Score: 1

    i guess you just don't get it. some things are highly unlikely, and some things are quite literally impossible. read section 9 of the faq; maybe that'll clarify it for you.

    --

    -- in china, chinese food is just called food.

  57. Re:Someone correct me please... by th0m · · Score: 1
    i think you've answered your own question there. the "only problem [you] can see" is what makes your scheme fail.

    it'd work for *some* numbers (14159265 compresses wonderfully!) but not for others, which is true of every other compression scheme ever.

    --

    -- in china, chinese food is just called food.

  58. Re:A Hypothetical Question by th0m · · Score: 1

    he wouldn't void the entry if it genuinely wasn't hiding data in the filesystem, ie. the decompressor would accept the input file on stdin. that wasn't the case in patrick's attempt.

    --

    -- in china, chinese food is just called food.

  59. Re:sorry! by th0m · · Score: 2

    there's no scamming involved. goldman required the (suitably small) entry fee to deter time-wasters; craig wasn't after the money in the first place. they were both just out to prove a point (goldman, that truly random data can't be compressed in the general case; craig, that goldman's challenge was amusingly open to abuse), not to make any money.

    --

    -- in china, chinese food is just called food.

  60. Re:Compression by platypus · · Score: 2

    You could scan the file for the smallest possible number whose binary representation isn't contained in the file,
    i.e. let's say 1001010111001010101 (I know this number is too short) is not there replace the 1000 zero-bits with that number.

    But I fear now it only get's more difficult to calculate the probabilties of a "win", but the outcome will stay the same.

  61. Re:PAY UP MIKE by G-Force · · Score: 1

    Agreed. Mike clearly stated, quite boastfully in the newgroup before he even sent the file that Patrick obviously didn't read the FAQ because it was an impossible task. so who's frauding whom? He believes a task to be impossible, and alludes quite obviously to it, then sets up a challenge to meet it?

    Come on, Mike, in my opinion you were just trying to make fools of people and maybe make a quick buck on the side. As you obviously were positive this was an impossible task, then you had no intentions of paying $5000, though I can't really speculate, you probably would propose the challenge whether you even had the $5000 or not. Then when he actually used your own rules and resolved the challenge in a way you didn't see coming, you quickly turned lawyer in an attempt to scare him from even trying to collect.

    He met the challenge that you yourself called impossible. Pay up (assuming you even can)

    --
    Once I thought I was wrong...I was mistaken.
  62. Re:That wasn't in the challenge by Surak · · Score: 2

    The hard drive platters were free. The coffee can lids might have worked better though because they are lighter in weight, although they wouldn't have been as rigid, so I'm not sure how well they would have supported the weight of the vehicle...

    I guess I could try it, though. :)

  63. Re:That wasn't in the challenge by Surak · · Score: 4

    Again, that depends on what you mean by "paper airplane." Admittedly, I won a similar contest involving rubber-band driven cars. The contest rules said that CDs or DVDs could not be used for tires due to the fact that they make for very narrow tires, thus having reduced friction. So I used hard drive platters. :-)

  64. IF I'm not mistaken. by mindstrm · · Score: 2

    He never said it was okay to deliver a whole bunch of 'compressed' files. He said ONE decompressor, and ONE compressed file. Though Patrick asked about this, all the guy did was restate his terms.

    What Patrick did was take a small bit of information out of the file, and, rather than having a table of 'offsets' telling his decompressor where to insert 5's, he broke it up into a bunch of files, effectively using the directory/inote tables to store this information.

    Had he not insisited on a single file, I'd say Patrick here would have technically won, though not at all in the spirit of the competition.

    1. Re:IF I'm not mistaken. by mindstrm · · Score: 2

      Hmm. okay.. point taken.
      Guy should cough up 5 big ones then.

    2. Re:IF I'm not mistaken. by ChadN · · Score: 2

      I would argue that the decompressor needs the filename numbers in order to work (for ordering) It CANNOT work on a bitstream of the "compressed" files alone (even if the ordering is preserved), so it is dependent on the numbering in the filename, and thus, their size needs to be counted as part of the decompressor. If you add in these bytes, then the decompressor (file plus filename numbering bytes) and datafiles are larger than the original file.

      So, after some consideration, I would say that Patrick didn't succeed, even considering the specific wording of the challenge (and the discussion in the emails).

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    3. Re:IF I'm not mistaken. by selectspec · · Score: 2

      At first, when I was reading the page, I felt that Patrick clearly cheated, given that he exploited the filesystem to store data. However, after reading your comment, I must agree. Mike should have clarified the rules. Clearly, Patrick deserves the cash.

      --

      Someone you trust is one of us.

    4. Re:IF I'm not mistaken. by LAI · · Score: 1
      As Mike is the guy inventing the rules, I don't see why this wouldn't be binding.

      "binding?" This is not a legal issue. This is a case of a hacker newsgroup discussing issues with compression. Then someone writes code he thinks has found a loophole, and finds out it has in fact not. Mike's point in saying that Patrick did not meet the challenge is valid.

      Sure, it would have been more sportsmanlike for Mike to have said at some point, "sure, you can give me multiple files, but remember that filenames take up space." But either way, Patrick did not meet the challenge. His files took up more room on disk than the original.

      LAI

      "How do I type "for i in *.dvi do xdvi $i done" in a GUI?"

      --
      :eof
    5. Re:IF I'm not mistaken. by mathemagicianX · · Score: 1

      With a masters degree in the mathematics of Kolmogorov complexity theory (on which the theory behind this challenged is based) I have a few comments. Yes, it is impossible to compress a random string, in fact if you accept the Kolmogorov notion of randomness then it's true just by definition. However you have to be really really careful in defining the problem not to let any additional information slip in from anywhere -- even the mathematicians made a mistake here when they first did it in that they didn't properly account of the information in the length of the compressed string(s) (Solomonoff's original definition didn't account for this - not to detract from his genius!). Patrick used exactly the same trick here and Mike was just too careless and let Patrick's trick slip by. Mike had a basic idea right, but in agreeing to allow multiple files without explaining in great detail how he was going to measure total file size he slipped up. Well done Patrick! :)

    6. Re:IF I'm not mistaken. by dachshund · · Score: 1
      "binding?" This is not a legal issue.

      Of course it's not, and nobody said it was. But even in a gentleman's agreement, you can agree to be bound by the rules.

      I'm not saying that Patrick should collect the money, I am saying that Mike is an idiot for agreeing to Patrick's request. When he saw that request, what could he possibly have been thinking? It seems that he was either asleep at the switch, or he deliberately neglected to inform Patrick of the limitations of the challenge. Patrick for his part did not try to sneak this by, he was very up front about it from the start.

      Mike does profess to have some knowledge of these issues, enough to issue the challenge, so it was really up to him to define the rules in such a way that obvious data-hiding techniques were excluded (they weren't, if you read the article.) It seemed odd that Mike would agree to the request, and then seem stunned by the fact that Patrick actually went ahead and generated multiple files (to the point that Mike refused to even download them.)

    7. Re:IF I'm not mistaken. by dachshund · · Score: 3
      > I meant can I send you a compressor and several compressed files whose
      > total file size is less than the original uncompressed file and from
      > which I can regenerate the original uncompressed file
      >
      > Patrick

      Sure -- but you send me a *decompressor*, I don't need the compressor.

      Sure means Sure. As far as I can tell this is acceptance of Patrick's request. As Mike is the guy inventing the rules, I don't see why this wouldn't be binding.

    8. Re:IF I'm not mistaken. by dinivin · · Score: 1


      He didn't say the files need to take up less room on disk, just that they needed to total less in size. There's a huge difference. Mike left the requirements very open ended and Pat met them. End of story.

      Dinivin

  65. Re:Already compressed data? by mindstrm · · Score: 2

    The data has to come from the guy accepting the challenge; you get to tell him how much data to compress, he gives you the data.

  66. Re:I have an idea... (an old idea) by ChadN · · Score: 1

    The indices into Pi will take more room to represent (in terms of bits) than the data you are trying to represent (in general).

    But it is a commonly suggested technique. Good try!

    --
    "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
  67. Re:Another tack... by ChadN · · Score: 2

    I've heard this idea conjectured a lot by various people; but although the mandelbrot set is in some sense infinite, the set of images that can be produced has bounds. Basically, choosing any fractal, and sampling at any resolution, with any slice through the set, still won't produce the entire possible set of data that could be represented in the finite space that you've chosen. Due to the sampling, you would repeat some data and never produce others. In fact, the self-similarity properties of fractals would seem to indicate that this would be a rather poor (or inefficient) way to represent general data.

    --
    "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
  68. Re:it's "file size", and that's all it is by ChadN · · Score: 2
    NEVER does Mike say "file size". He uses the phrase "total in size". In fact, here is a quote:

    In order to meet the challenge you must provide to me a compressed file and a decompressor which together total less than the size of this original.dat, and which can generate a file identical to original.dat.

    He doesn't say "file size" in this or any other passage. It is Patrick who says (and assumes) this.

    Also, Patrick specifically addresses the issue of filename based compression in this passage.

    It's probably also possible to meet the challenge with smaller file sizes by storing information in the filename of the compressed file or the decompressor, but I think most people would consider this to be cheating.

    Thus, I think it is Patrick who hadn't considered the fact that his "decompressor" is not only just the 'dfi' script, but also the filename information used to indicate file boundaries and ordering. Like I said, his algorithm (much less his actual implementation) could not properly reassemble the file if it were fed all the properly ordered "compressed" files as an input stream (through a pipe, say), since it wouldn't know where to place the missing fives. So given that Mike said "total size", in all cases, when discussing and offering the challenge, I think his position is safe.

    Now, I do think that Patrick was reasonably clever, and that Mike was overly smug in both the challenge and the initial news posting. And in the exchanges Patrick did explictly ask about file sizes, but I do not fault Mike for not clarifying further, since he had no a priori knowledge of the scheme, and therefore didn't know he would have to be so specific. But his use of terminology was consistent, and I think, appropriate, to protect against this type of scheme.

    Anyway, that's just my take on it. Mike can hopefully learn some humility from this, and Patrick can learn that his "decompressor" is more than just the sum of its file bytes.

    --
    "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
  69. Exactly. by slashkitty · · Score: 1
    Mod this up. It actually takes up more space with all the files. I do not think that this was the intent of the challenge. Hell, if you were allowed to not include the file names in the size, I could compress gigs and gigs of data down to a few hundred bytes!!! How? put the data in the file names, and have EMPTY files! My files would have looked like this:
    decompress -- 543 bytes
    1.AFDAGWRWERF234234r -- 0 bytes
    2.redsofiu325jfiu52f -- 0 bytes
    3.wlekfjsdf345edslfj -- 0 bytes
    .....
    100000.sdfsdf---end--- -- 0 bytes
    Give me $5000
    --
    -- these are only opinions and they might not be mine.
  70. Re:Are you for lawyers or against? by Moofie · · Score: 2

    As soon as Mike accepted Patrick's money, it was no longer a rhetorical challenge. It was a bet. Patrick, at that point, had caught Mike proffering a reverse sucker bet. Mike says "I bet you $100 at 50:1 odds that you can't do X." Patrick, confident that he could do X, accepted the bet and then did X. Mike says "Oh, I really meant for you to do X.1. I only tricked you into trying to do X because it's impossible." Patrick, in a really amazing display of good sportsmanship, declined to have his money refunded. (Which Mike wisely proffered...otherwise, he'd be in serious danger of fraud...)

    My momma told me never to take a sucker bet. Mike got caught by his OWN sucker bet...which takes a) greed and b) foolishness. He got lucky that the person who took him was such a good sport about it.

    It'd be really nice to see Mike donate the money to ACM or EFF or somebody like that...that'd be a pretty classy way of taking care of this.

    --
    Why yes, I AM a rocket scientist!
  71. Re:Are you for lawyers or against? by Moofie · · Score: 2

    I'm not necessarily arguing a point of law, it's more of a point of honor. Wagers are well-understood in many cultures, and in this case it seems clear that Mike proffered an arguably dishonest wager (a bet where he challenged people to do something that was not possible to do to his satisfaction). Somebody hoist Mike by his own petard. Good for Patrick.

    --
    Why yes, I AM a rocket scientist!
  72. According to the rules... by BWing · · Score: 1

    "Last, the contestant will send me a decompressor and a compressed file, which will together total in size less than the original data file, and which will be able to restore the compressed file to the original state."

    At the bottom of the page:

    "There's almost universal agreement that I didn't compress anything..."

    Since he didn't have any compressed file (or files), it seems he didn't meet the rules of the challenge.

    --

    Dad always thought laughter was the best medicine, which I guess is why several of us died of tuberculosis.
    1. Re:According to the rules... by BWing · · Score: 1

      Perhaps this is semantics, but how would compression within the rules be any different than real-world compression?

      --

      Dad always thought laughter was the best medicine, which I guess is why several of us died of tuberculosis.
  73. Re:Compression & Time by angst_ridden_hipster · · Score: 2

    Yeah, my exceptionally slow compressor counts the number of "one" bits and the number of "zero" bits. It stores those two numbers, along with several position-dependent checksums. Compression is very fast.

    Decompression can take roughly the lifetime of the universe (for large files).

    Interestingly, my brilliant strategy is also a encryption algorithm ;)
    bukra fil mish mish
    -
    Monitor the Web, or Track your site!

    --
    Eloi, Eloi, lema sabachtani?
    www.fogbound.net
  74. Re:Compression & Time by angst_ridden_hipster · · Score: 2

    (should have said "slow de-compressor" but you get the picture)
    bukra fil mish mish
    -
    Monitor the Web, or Track your site!

    --
    Eloi, Eloi, lema sabachtani?
    www.fogbound.net
  75. Play the odds by elstumpo · · Score: 1

    It should be possible to come up with a scheme that would work more often than 1 in 50, which is all you need to make money at this game, pretending the backer would actually let you play over and over again.

  76. Re:Compression by horace · · Score: 1

    How about this:
    take the file,
    1 look for the longest sequence of 1s. (length x)
    2 record its starting point (position y).
    3 remove the ones and write a program to insert x 1s at position y.
    4 choose the file size to ensure a high enough (>50% say given the terms of the bet) probability of finding enough 1s in a row to be longer than the code to reinstate the ones.
    Of course the decompressor is not general but there will be one such that will work on any sufficiently large randon number and hte length of the program will only increase logarithmically with the size of the file.

  77. Re:You just described RLE. by horace · · Score: 1

    I was trying to avoid using EOF as a cop out.
    Is it also the case that I could not win through choice of run (zeroes, digits of pi, exponentials of small numbers)? I imagine that this wins just a few bits and the length of the program bounds the maximum number of different checks. and each method only wins about a bit.
    Also the files would get very large.
    Still since I only have to do better than one fiftieth it ought to be possible to get quite close.

  78. Do EOF characters count? by MacBoy · · Score: 1

    In theory, and in practice, the actual length of the "compressed" data files is equal to the length of the original files. Why? because all that happened is that a character ("5") was replaced by another character ("\D" or EOF). The tool wc obviously does not include EOF characters in its byte counts, so it does appear that the "compressed" files are smaller, in sum, than the original. In actuallity, the compressed files each have an extra byte tacked onto the end, where that "5" used to be -- an EOF character.

    Does that matter? I think so.

    Has Patrick, albiet perhaps unwittingly, used a "filesystem exploit" to (try to) win? Yes. The fact that the computer removes and ignores the EOF character when processing the file data is of no concern -- the EOF is there, stored in the file, regardless.

  79. Lzip! by PovRayMan · · Score: 2

    Lzip should have gotten $5000 for being the best compression program.

    ----------

  80. Re:Compression by jfunk · · Score: 2

    Too bad the Python doesn't get executed.

    Maybe in Squishdot, as long as rexec is used, of course. You could import StructuredText from Zope and your post could act like a Wiki.

    - *this* would look like this

    - **this** would look like this

    The above would become an actual HTML bulleted list.

    http://this.would.be/a/link

    || Instant Table ||
    || 1 || 2 ||

    Ah, well. :-)*

  81. Re:Are you for lawyers or against? by ajs · · Score: 2

    I think that a person's stance on this issue, of who was right and who was wrong, will indicate that same person's approval of lawyers and much of the practice of the law today.

    Data point: The corolation for me is: modern US law as practiced is harmful, and anti-productive; this contest was clearly stated, and clearly won. Mike should pay up.

  82. Re:Are you for lawyers or against? by ajs · · Score: 2

    But it was clear. The contest was in terms of total file size of the compressed output and decompressor program.

    The proviso given (which should not have been) was that the compressed file could be provided in multiple parts. What he forgot is that an ordered sequence of files has more information than just the sum of their contents. Ooops.

    This is not a wording problem, but one of a poorly thought out bet. This guy gave 1:50 odds that no one could compress truely random data. He chose to allow conditions that made the bet doable, and then did not want to pay up when the bet was won.

    Actually, even without using multiple files, I think the bet's doable for any single given data file (though you cannot write a generalized compressor for all truely random data). I should come up with the submission to solve for this guy's challenge.

  83. It doesn't count because... by tlight · · Score: 1

    there's implicit information present in the split files that is not present in the whole file such as the sizes of the files and the order of the files (which is implicitly included in the filenames). That's why the split files tarred and gzipped together produce a larger archive than the orginal file (because this information will have to be stored also in the archive). That's why mr. Goldman asks for "a decompressor and a compressed file" and not multiple files. Now if mr. Craig delivered ONE file that is smaller than the original which dearchives into the 218 files which in turn decompress to the original file, he might have won (but it would never work).

    1. Re:It doesn't count because... by EllisDees · · Score: 1
      as someone posted above:
      > I meant can I send you a compressor and several compressed files whose > total file size is less than the original uncompressed file and from > which I can regenerate the original uncompressed file > > Patrick Sure -- but you send me a *decompressor*, I don't need the compressor.

      --
      -- Give me ambiguity or give me something else!
  84. Re:Intriquing by tlight · · Score: 1
    For example, a file full of 0's, is perfectly possible from a true random number generator, and just as likely as any other set of numbers.

    which isn't true because if this were true every number would be a random number (because every number could potentially be generated by this number-generator). This obviously isn't true. A true random number generator doesn't randomly generate numbers but generates numbers that are truly random, that is, you can't predict what digit (x+1) will be based on the previous x digits. That is why the only true random numbers can be generated if the source is completely random such as the background noise in space, or timing nuclear disintegrations.

  85. Re:Intriquing by tlight · · Score: 1
    Hmmm okay... after reconsideration I take that one back... have to remind myself to think before I write not after ;-)

    (or to do it anonymously ofcourse). Now please don't flame me ;-)

  86. Sucker Bets and Real Proposition Gambling by werdna · · Score: 2
    This was a great story. It reminds me of Sky Masterson's soliloquy from Guys and Dolls, in rejecting a proposed sucker bet about the number of sales for various pastries at a deli:


    "Let me tell you a story. When I was a young man about to go out into the world, my father says to me a very valuable thing. 'Son,' the old guy says, 'I'm sorry that I am not able to bankroll you a very large start. But not having any potatoes to give you, I am going to stake you to some very valuable advice. One of these days in your travels, you are going to come across a guy with a nice brand new deck of cards, with the wrap and seal unbroken, and this guy is going to bet you that he can make the Jack of Spades jump out of the deck and squirt cider in your ear. But, son, you do not take this bet. For if you do, as sure as you are standing there, you are going to end up with an ear full of cider.' "


    Sky, an extraordinary proposition gambler, responded by countering. He covered the offeror's eyes and bet he couldn't tell what color was Sky's tie. "No bet."

    I say this to point out the difference between sucker betting and proposition gambling. It is a sucker bet to offer a bet challenging a mark to do something that is, in fact, impossible. In proposition gambling, the offeror gives the mark a proposition they are unlikely to win, or that is worded in such a clever way that the winning of it leaves the mark too embarassed, entertained or impressed by the cleverness to attempt to challenge having lost.

    In this case, the first offer (compressing high-entropy data) was a sucker bet. Shame on him. The response, doing the sucker bet using multiple files, was a sweet proposition gamble. Shame on him. Once taken, the offeror-now-mark did weasel, but as I see it, ultimately welched.

    I admit that I, too, didn't consider the import of the multiple files on first reading (but, hey, I didn't have $5K against $100 riding on the result).

    At the end of the day, the sucker-bettor was the mark, and he had an earful of cider.

    No sympathy.
  87. A Hypothetical Question by gnfnrf · · Score: 2

    It is my understanding of the situation that Patrick "solved" the challenge by exploiting the fact that multiple files store more implicit data in the filesystem than a single file does. I believe the only such data he used was the EOF.

    This is considered "cheating" because he used 220 EOFs that are not counted in the filesize.

    So, my question is this. If a contestant submits a single compressed file and a single decompressor file which are just 1 byte under the original file, does Mike void the entry because of the extra filesystem space of two files? If not, why not? If so, shouldn't the rules say that specifically, as two is the minimum total number of files one can submit?

    --
    gnfnrf

    1. Re:A Hypothetical Question by dinivin · · Score: 1


      And you know this how?

      Dinivin

  88. Intriquing by PhiRatE · · Score: 3

    Its an intriquing problem. I certainly wouldn't stake $5000 on nobody being able to compress the data I supplied, good random number generator or not. Why? because while the concept that random data in general cannot be compressed is correct, there are no patterns, it is not necessarily true that a given set of random data cannot be compressed.

    For example, a file full of 0's, is perfectly possible from a true random number generator, and just as likely as any other set of numbers. As any good geek knows, a file full of 0's compresses very very well.

    In this case, I think the challenge has one major problem, and that is that there is no real time limit on the entry, and no requirement that the decompressor work effectively in general, it can be entirely specialised.

    This means that the problem *may* be solvable, simply by throwing ungodly amounts of CPU power at it. With the appropriate code, it would be possible to simply hunt for unintentional patterns in the data and hope. If you're lucky, there will be enough indexable patterns present that it will outweigh the cost of the indexing, and the code necessary to replace the index points.

    If you're unlucky, the data will have no patterns in the space you searched, and therefore you're out of luck.

    This is why the guy noted that if he got a big enough file, he could win the competition. As the filesize increases, it becomes increasingly likely that there are, accidentally, enough patterns in the data to cover the space required for the decompressor and the dictionary. Of course, as the filesize increases, you also have to burn considerably more CPU in order to locate the patterns in the first place.

    Personally I think it would make an excellent project for a research team, not particularly for the money, but in order to create some really fast, specialised code for locating patterns in assumedly random data.

    I am not aware of much effort in that area, most compression research goes into identifying patterns inherent in the data type, waveforms in audio, visual patterns in images, duplicate strings in text and binary files. It would be very interesting to see just how effective fast analysis code and heavy CPU is against the problem.

    I suspect there may well be some crossover here with Cryptoanalysis, since one of the primary jobs in breaking crypto is finding the patterns that differentiate the ciphered data from pure randomness.

    Any links appreciated.

    --
    You can't win a fight.
    1. Re:Intriquing by gotan · · Score: 2

      Since you have to provide the algorithm for uncompressing too you can regard the machine (OS/shell+tools) as one huge decompressor. It does f(compressed data+algorithm)= decompressed data. Simple counting yields that since there are at least as many (compressed data+algorithm) pairs, as there are sets of decompresssed data. Even by choosing only the shortest versions at least some of those pairs have to have a larger total length than the decompressed data (not accounting for tricks with hiding data in EOFs and filelengths).

      --
      "By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
    2. Re:Intriquing by PurpleBob · · Score: 1

      If the data consists of nothing but digits and leaves you the rest of the ASCII set (or even a few letters like your "a", "b", and "c") to work with, of course you can compress it.

      That is not the case in this problem. If you made 'c' mean '9724', you'd have to make something else mean 'c', and something else mean that something else... and in random data you will save nothing.

      --

      --
      Win dain a lotica, en vai tu ri silota
    3. Re:Intriquing by jidar · · Score: 1

      This has nothing to do with anything because that isn't how he completed the challenege in the first place.

      --
      Sigs are awesome huh?
    4. Re:Intriquing by Tungz10 · · Score: 1

      What I'm thinking of, is wouldn't it be possible to create a minimalist compression algorithm that tended to not affect the file size, but might deviate it in either direction by about 1% ?(enough to cover the overhead of the decompressor).

      Of course, on average it would increase the size of the file, but as someone posted much earlier, all you have to do is decrease the total length by a single bit 1 out of 50 times.

    5. Re:Intriquing by Erasmus+Darwin · · Score: 2
      because while the concept that random data in general cannot be compressed is correct, there are no patterns, it is not necessarily true that a given set of random data cannot be compressed.

      If I'm not mistaken, during the email exchange, someone mentioned that the data was probably skew-corrected. So not only would you have to hope for a nice pattern in the data, but it would have to be a pattern that the person issuing the contest didn't notice.

    6. Re:Intriquing by vidarh · · Score: 2
      Your argument is fatally flawed.

      For it to be impossible to predict the next digit,to be generated, that digit needs to be totally random and chosen without any relation to the preceeding digits. If that is true, then any digit can follow any digit with the same probability, otherwise you would be able to make predictions better than pure chance by looking at the preceeding digit.

      In any sequence of n digits, where every digit is choosen totally on random, every sequence of n digits are just as likely to occur. That means that 0,0,0,0,0 is just as likely as 5,2,7,3,1, or any other sequence of the same number of digit chosen.

      If you're given a sequence of 10.000 digits, that are totally random, and you get to look at the first 9.999 of them, and they are all zero, the chance is still just 1 in 10 that the final digit is zero.

      A number isn't random in itself, it is random if it is chosen at random. You may confuse random numbers with numbers such as Pi, which aren't chosen at random, and thus aren't random numbers, but where you can't predict digit n+1 by looking at any of the digits 1 to n, and where there are no patterns. These numbers are transcendental numbers, not random.

      Obviously, in this case, a randomly generated number isn't what one should be looking for, but a number where there are no patterns to be exploited.

  89. Almost. by rjh · · Score: 2

    Depends entirely on how random the random data is. There's a surprising lot of random numbers which can be compressed--this almost always indicates a critical flaw in the random selection process. Even radioisotope-based RNGs aren't immune; by knowing the isotope, its age, and the sort of radiation detector being used, it's often possible to get a faint hint of determinism out of the RNG. That faint hint of determinism can be exploited to result in compressability... sometimes.

    By and large you're right that random data can't be compressed. But getting really random data is a far-from-trivial undertaking.

    1. Re:Almost. by Betcour · · Score: 1

      How about reverse-engineering randomness ? Make a file of random datas, then compress it using various algorithm, then try to make it less compressable, and repeat. Then call the results "pure randomness" (whatever that can be ;)

    2. Re:Almost. by pallex · · Score: 1

      >But getting really random data is a far-from-trivial undertaking.

      Lavarand? http://lavarand.sgi.com/cgi-bin/how.cgi

    3. Re:Almost. by mendepie · · Score: 1

      Lavarand has taken a life of it's own outside of SGI. The new website is not done yet, but you can check out http://www.lavarand.org to check out the new lavarand site. Isn't it great that SGI supplies engineers with beer :-)

      --

      Are you paranoid if you know that they just want to know everything you say and do?

    4. Re:Almost. by onepoint · · Score: 1

      the web site mentioned above http://lavarand.sgi.com/ is the best start if you need a true random number to start. I have not been there for a while but I bet that they still have the output from the lavba lamps for free.

      onepoint


      spambait e-mail
      my web site artistcorner.tv hip-hop news
      please help me make it better

      --
      if you see me, smile and say hello.
  90. Re:Another tack... by Alanzilla · · Score: 1


    1. gzip your data
    2. represent it as a large integer
    3. find the next prime number that can be represented by appending garbage bytes to your integer


    Why bother with primes at all? Find the closest n ^ k. Specify n, k, and the difference (positive or negative).

  91. Re:file size by Chandon+Seldon · · Score: 1

    ASCII charactor EOF? What the hell kind of ASCII EOF would be in a binary file?

    --
    -- The act of censorship is always worse than whatever is being censored. Always.
  92. Re:Another tack... by wct · · Score: 1

    Understood. I wanted to know if any research has been done to determine the discovery time and whether this would make the scheme feasible...although I'm pretty sure no, for the same reason cryptography is secure these days.

    Just spouting random thoughts, jeez :)

  93. Another tack... by wct · · Score: 2

    A while ago on slashdot.org an article was posted about a prime number that had the "magical" property of being able to be g'unzip'd to the DeCSS source. The number was represented as a (relatively short) arithmetic expression.

    My question is, has anyone put any efforts towards seeing if other, larger pieces of data could be represented in this way? Of course the chances of finding these type of numbers are very slim, but they *are* infinitely large spaces, and the compression would be ultra-compact.

    Maybe something for a quantum computer to chew on anyway...

    1. Re:Another tack... by PurpleBob · · Score: 1

      The answer is no. There is no way to compress random data without putting that data somewhere else. It is mathematically impossible. Even a quantum computer could not do something impossible.
      --

      --
      Win dain a lotica, en vai tu ri silota
    2. Re:Another tack... by waynem77 · · Score: 1

      Find a copy of Fractal Image Compression by Michael Barnsley et al. Barnsley describes a method of fitting a fractal to an existing bitmap and reducing the fractal to its generating equation. It's a lossy method of compression, but it produces tiny files (for some images). Barnsley started (or joined, I'm not sure) a company, Iterated Systems, that tried to make money producing these files but it didn't work out for whatever reason.

    3. Re:Another tack... by Paradise_Pete · · Score: 1
      There are some prime numbers which can be represented by (2^n)-1 (e.g. 3,7,31 but not 15 or 63)

      2^4-1 = 15
      2^6-1 = 63

      But then they're not prime, either.

    4. Re:Another tack... by Tungz10 · · Score: 1

      The trick to that one is that there are infinitely many equivalent versions of the DeCSS source. You can add/remove whitespace, change the order of a few commands, etc.

      So you take a whole bunch of mathematical numbers, and you take a whole bunch of versions of the source code, there's very likely to be a match.

      BTW, Mathematics is an extremely high level language. I'll bet that if you were to actaully write a program to efficiently generate that prime number, it would be larger than (or very close to) the size of the DeCSS source itself.

    5. Re:Another tack... by Relic+of+the+Future · · Score: 2
      You mean this story? Ummm, it was never represented by any "arithmetic expression", only in its full (hex) glory and and like this:

      48565...29443 (1401-digits)

      I'm afraid using '...' doesn't count as compression. ;)

      There are some prime numbers which can be represented by (2^n)-1 (e.g. 3,7,31 but not 15 or 63), but I think you're right that your chances of finding such numbers would be slim.

      God does not play dice with the universe. Albert Einstein

      --
      Those who fail to understand communication protocols, are doomed to repeat them over port 80.
    6. Re:Another tack... by BMazurek · · Score: 1
      You would also need to either encode or otherwise store the fractal function, coordinates and resolution.

      But given a bit, and a series of equations, coordinates and expansion resolutions, it should be possible.

      I'm in no way thinking this would be easy to calculate the data necessary to do it. Is it possible? Yes. Are we likely to see such a mechanism? No. Unless this is something a quantum computer could do. But I have no idea.

    7. Re:Another tack... by BMazurek · · Score: 2
      One thought that has always intrigued me is the infinitely large nature of fractals.

      Given a set of data to represent, construct a fractal equation, and a set of coordinates and a resolution that when expanded reproduce the original data.

      If you could find them, that would be some mighty cool compression....

    8. Re:Another tack... by J+x · · Score: 1

      Wouldn't that allow for almost-instant binary transmissions? Instead of sending Metallica.mp3, for example.. one could simply send a mathematical equation f(x)=something. It would then be up to the client machine to evalute the equation and execute/decompress/etc the file. The transmission size would only be the size of the equation, something under 100k. Excessive amounts of data (1GB+?) could be represented given the correct formula. ..And the RIAA/MPAA can't trademark mathematic equations.. can they? :)

    9. Re:Another tack... by vidarh · · Score: 2
      The problem with this method is that it doesn't only require you to find a set of coordinates and a resolution, but to find a set of coordinates and resolution that can be expressed in less space than the original data.

      Which may work for some input sets, but can't possibly work for all input sets. If you believe it can, imagine this scenario: I take an arbitrarily sized block, and compress it. Then I compress the result. I keep going until the data set is one bit. No matter how large your decoder, how would you decode more than two states from one bit?

      You can't.

      The tradeoff for all lossless compression for "generic" data, is to find an algorithm that compresses well for a reasonable set of inputs.

  94. Here is a Real Challenge by superid · · Score: 2
    There is another numerical analysis challenge thats been around for several years and has only had one winner of the $100 prize. Check out the Weak Signal Challenge. You need to recover a morse code signal embedded in noise. And you don't need to send any $$$ to enter, just download the WAV and start number crunching :)


    SuperID

    Free Database Hosting

    1. Re:Here is a Real Challenge by cetan · · Score: 1

      Just out of curiousity, did you scroll to the bottom of the page? :)

      --
      In Soviet Russia...michael would be rotting in Siberia!
    2. Re:Here is a Real Challenge by Tungz10 · · Score: 1

      Out of curiosity is the offer still valid now that someone has won it? Not that I think I'll solve it anyway...but this looks like fun.

  95. Re:engodelization discussed by an amateur by cyanoacrylate · · Score: 1

    This works well for highly composite numbers, eg.

    4096 => 2^12

    However for less composite numbers,

    p*q = p^1, q^1, p & q prime

    Its not so good. For prime numbers, you're worse off (trivially):

    p = p^1

    Note that I've ignored the delta - eg. for the (can't remember the name) special primes that appear as 2^n -1. Here, you run into trouble finding the delta (you have to factor p, then p-1, then p+1, then p-2... factoring being a hard problem (in NP), you're not going to have much fun implementing this ;-). You also have the problem of deciding when you have achieved 'enough' compression.

    By the counting arguments expressed in the other replies to your post, there will be points where, by the time you have factored the number, counted the factors & their exponents, and the delta, you're worse off than when you started.

    I would expect this to be true for very small
    numbers. Note that no matter what we do, we're effectively in non-computable space because we have to factor prime numbers. So as reasonable as this algorithm is for compressing numbers, its going to take far too long to do it to be practical, at least, without quantum computers.

    Cyano

    --
    Don't like my sig? I don't either.
  96. The problem by Hard_Code · · Score: 2

    The problem is that Mike said it was OK to use multiple files, but did not specify whether that meant that the actual additional filesystem space that they took up would be counted against Patrick anyway (with the idea that Patrick could use many files, just so long as the eventual information still added up smaller than the original), or whether he meant that it shouldn't make a difference, and didn't care.

    The information that could be considered "removed" by the files, would basically be the start and end offsets of the chunks, if they were all in one file...which would naturally add up to more than the original (219 * 2 additional info). The "missing" information escaped into the unix shell and filesystem, through the facility of "cat" being able to print out a block of bytes given a single piece of info.

    --

    It's 10 PM. Do you know if you're un-American?
    1. Re:The problem by Tungz10 · · Score: 1

      He said "file size". He did not say "disk usage"

      If I took a piece of paper and cut it into pieces, the sum of the parts is equal to the whole, and that's what the challengee did IMO. It doesn't matter if the representation just so happens to use up more space, the challenge specified that OS was not important.

  97. Re:Compression by Hard_Code · · Score: 2

    POT appears not to mean that Slashdot will *interpret* your post as POT, but instead that Slashdot *assumes* your post to be text that is not meaningful HTML. They should changed it so any HTML is appropriately escaped (entitied) from POT submissions. Angle brackets, etc, are very annoying to have to fix.

    --

    It's 10 PM. Do you know if you're un-American?
  98. Re:One born every minute by QuantumG · · Score: 2

    ie, it would confirm what I say every day.

    --
    How we know is more important than what we know.
  99. Re:There is no solution... by pollo2 · · Score: 1

    Wrong.
    If you could compress everything, you could just take any set of data and keep compressing until you have nothing left. This is quite clearly nonsense.

    --
    This is my sig. Hooray !
  100. But this solution WOULD work by Cy+Guy · · Score: 2

    Since the challenge allows you to see the file before building the compressor (and the neither the size or speed of the compressor is relevant), all you need to do is find the single array of bytes (I'd say string, but presumably the 'arbitrary file' would be binary) greater than three bytes, that occupies the greatest percentage of file space in the compressed file. Then replace that array wherever it occurs with an EOF character and name each new section of the file sequentially using a 1 byte name.

    (As noted in another post) As long as you pick an initial file length long enough that 2% of the time there statistically should be enough repeats of some 3 byte (or longer) array to equal the number of bytes in your decompressor (which would be roughly the same size as Patrick's), then you will win the challenge enough times to make it profitable.

  101. Re:Compression by Cy+Guy · · Score: 2

    Nice try, but won't work. You have to keep the position of that serie of '1'. That position will use about log2(position) bits. Unsuprisingly, 2^n is the size the file needs to be to have a good chance to have a serie of n '1's.

    Not if you expand on Patrick's original idea replace the series of 1's (or zeros, your compressor should determine the longest continues series of each and then use the longer) and replace it with EOF. Since EOF can't appear in the original file except at the end, your de-compressor will know that this is the position to insert the original series. You could also repeat this as many times as like, provided the replaced series is at least as long as length of the EOF character plus the length of each new file name created plus memory space needed to store the value of the length of the series.

  102. Re:Random data... by jguthrie · · Score: 1

    Do I get to pick the number base I represent the result in?

  103. Re:Are you for lawyers or against? by mjh · · Score: 1
    You make some good points, and I stand corrected.

    First of all, only naive idiots are universally pro-lawyer or anti-lawyer.

    This can't be anything other than flamebait. I'll try and overlook it by saying that I never meant to suggest that I was universally pro- or anti-lawyer. I meant to discuss how a person's stance on this issue might lead to unintended consequences. Specifically that you might be enabling the powerful to be more powerful, by giving less value to intent and only sticking to the specificity that was set forth in his challenge.

    But my assumptions about how the law does and does not work are well corrected by you. (Obviously) IANAL, so anything I said about how the law works is left to better people than me.

    Cheers!

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
  104. Re:Are you for lawyers or against? by mjh · · Score: 1

    What's funny about that is that I was thinking that exact thing right after I pushed submit on my last post! Nice catch.

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
  105. Re:Are you for lawyers or against? by mjh · · Score: 1
    You are probably correct in saying that my post is ignorant. However I think you may be missing my point. My point is that we as members of a society have an influence in determining what is more important, specificity or intent. If you like specificity, then that's something that will probably have to be paid for, further empowering those with money. If you prefer intent, then how do you discern the intent a priori without good specificity?

    My only intent (ironic) was to start a discussion of which side you think you'd rather have, or how you combine the two into something that is acceptable. Of course, no such discussion started, and discussion about how little I understand the law is what ensued. Oh well!

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
  106. Re:Are you for lawyers or against? by mjh · · Score: 2
    I think that specifity is important. Any software engineer will tell you that

    [ ... ]

    The english language (in common use) is suitably ambiguous to make that impossible

    Specificity is critically important when you're communicating with a computer. But over specificity is actually a deterrant to communicating with another human. Ambiguity is a common mechanism by which we humans learn. So, for example, when I say "cat" you have one image in your mind and I have another image in my mind. But when we both discover that the thing that I was talking about is different than the thing you were thinking, we'll have expanded our understanding of what the word "cat" comprises. Just now, I was thinking of a cheetah. Do you remember when you learned that cheetahs and lions and tigers were also cats?

    I'm overstating some things here becuase it's been about 8 years since I took my philosophy of language class. But to sum up, generally human communication works much more efficiently when there's some ambiguity in it.

    Which leads us to where we are - I don't really like it when a silly case wins in court due to a loophole, but I see no alternative

    But it's not just in the event of silly cases. You are cornered by the silliness everyday. Clearly you use a computer connected to the Internet. Do you think that the terms of service that govern your connection protect you? The point is that if we can't figure out a better way to deal with specificity and intent, then we consign ourselves to the rules that are written by those who can afford to write them.

    I don't have an answer for this either, but I'd like to think that the discussion isn't so easily dismissed.

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
  107. Are you for lawyers or against? by mjh · · Score: 5
    I think that a person's stance on this issue, of who was right and who was wrong, will indicate that same person's approval of lawyers and much of the practice of the law today.

    If you are for the challengee, then you believe that exact specificity is more important than intent. And you inadvertantly empower the legal profession to profit from being better able to specify rules, etc. Which, in turn, gives those with money even more power. They can write the rules by which your interaction with them will be governed. You can be certain that the rules will be written to the advantage of the guy with the money.

    If, on the other hand, you are for the challenger, then you believe that his intent was more important. (Considering that the challenge came from the compression FAQ, the exact definition of "compression" should have not been vague. It seems clear to me that his intent was obvious, even though his challenge wasn't very specific.) Still there is a problem with valuing intent over specificity. That is it's easy to misrepresent intent after the fact. Thus the need to clearly specify our intentions prior to anything we do. That being the case, with accurate documentation, it seems possible to me that we can discern the original intent w/out having to specify everything legalistically. For example, I think that I was able to clearly determine the challenger's intent well after the fact, and I did this having read the challengee's representation of the events.

    Now you may say, in rebuttal, "Who cares who I choose? I don't get to make the law." But think of yourself as the jury trying to determine to whom you would award the money. You may one day find yourself on a jury and trying to determine who should be awarded money. I see this as a microcosm of exactly how our court system has gotten into the state that it's in. Specificity is clearly more important than intent. And it's that way because juries award that way. Since juries are comprised of members of society, it's not a far stretch to say that society as a whole regards specificity as more important than intent.

    Is there any chance that we can change this? And if we did, would it be better?

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
    1. Re:Are you for lawyers or against? by flounder99 · · Score: 1
      Are you for stupidity or against?

      This has less to do with lawyers and everything to do with stupidity. If you are stupid enough to make a stupid bet you deserve to loose. This was a gamble. The challenger though it was a sure thing, but it was still a gamble. He gambled, he lost. That simple. It was a stupid bet. I am less afraid of lawyers than I am of stupidity. BTW, Who was the first to bring up legal action???

      (Actually the most frightening thing is when lawyers or anyone else defends stupidity)

      my $0.02

      flounder

      --
      I don't like .spam. in my email address, neither should you
    2. Re:Are you for lawyers or against? by radish · · Score: 2


      I was very interested by your comments regarding the difference between specification and intent. However, I think there are some other factors in play here.

      You mention intent - let's look at the challenger's intent. It was NOT to find an amazing new compression scheme, in other words this contest was not meant to sponsor research in the field. He set the challenge PURELY to get one over on everyone else, to shout "told you so - ha ha ha" like some kiddie cracker. As he's essentially trying to catch people out, I think the onus is on HIM to make sure he specifies things EXACTLY. What has happened is the smartass challenger has had his whole scheme flipped up and fired back at him, which I personally relish and applaud.

      I think that specifity is important. Any software engineer will tell you that - the number one quote from a user when you deliver your system?? "Oh, I didn't MEAN I wanted it to do that...". You may think you know what the other party intended, but how can you be sure? The english language (in common use) is suitably ambiguous to make that impossible. (An aside - I'd love to know if developers in non-english speaking countries have the same problems, is english particularly ambiguous??). A lawyer would tell you that's exactly what they are trying to do, deduce intent from the actual written law (or contract or whatever). The two sides have differing interpretations, which are simply two different models of the original intent, deduced from the same written document. Where do you draw the line between the "quest for truth" and 2 bit scumbags just trying to pick holes in perfectly good laws? The simple fact is I don't think you can.

      Which leads us to where we are - I don't really like it when a silly case wins in court due to a loophole, but I see no alternative. People will disagree, they will always disagree, so how do you settle it? If you don't go with exact interpretation of the letter of the law, you end up having to go with an interpretation of the spirit of the law, which puts total (and to me unacceptable) power in the hands of whoever happens to be making that interpretation. At least an argument over english linguistics is transparent and explainable. Hopefully the inclusion of a jury in many cases helps to add the "human touch" to the process - at least I know it did when I served recently.

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    3. Re:Are you for lawyers or against? by aozilla · · Score: 1

      The challenger though [sic] it was a sure thing, but it was still a gamble.

      RTFA. "Obviously, I would like you to send me the $5000 as promised, but I never really expected you to. I'm surprised we went as far as we did. I thought you were going to pull out the morning after the original emails after realising your mistake. You can keep the $100."

      --
      ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
    4. Re:Are you for lawyers or against? by aozilla · · Score: 1

      I'm a stupid idiot... you meant challenger meaning Mike... Unsubmit, Unsubmit...

      --
      ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
    5. Re:Are you for lawyers or against? by Theodore+Logan · · Score: 2
      In this case, it is clearly obvious that the challengers intent is either to make a fool out of someone, or to make money. Not only does this person (Mike) believe the challenge will not be successfully met, he also believes it cannot be successfully met. This is not good sportmanship to me, and the only right thing to do is to look for the little things in the way the challenge was stated, that may allow bending the rules just enough to "win". This is exactly what Partick did.

      Had Mike had a different intent with this challenge, or stated it so that he thought it would be possible, but very unlikely, that he would have to cough up, the kind of approach Patrick took may be considered wrong. But this was not the case, and it seems to me as if you're trying to make this into something that it isn't.

      --

      "If you think education is expensive, try ignorance" - Derek Bok

    6. Re:Are you for lawyers or against? by startled · · Score: 1

      I wasn't actually aiming the "naive idiots" bit at you-- your original statement was quite different: "approval of lawyers and much of the practice of the law today". My statement was misleading, but I intended to refer to the lawyer-bashing contingent, and not to people who are unhappy with the current state of the courts and/or law. Sorry for the confusion.

    7. Re:Are you for lawyers or against? by startled · · Score: 2

      I believe you're under several misconceptions, which might be remedied by sitting in on a few trials, or reading up on a few cases.

      First of all, only naive idiots are universally pro-lawyer or anti-lawyer. Lawyers, like any other large group of people, are diverse. Often, people who say they're anti-lawyer mean the asshole working for the insurance company who cheated them out of a claim; they don't mean the guy suing to protect national forests, or defending the latest Leonard Peltier.

      Next, intent plays a much larger role than you'd believe. Law is typically not a game of "gotcha!" on contract wording. Intent determines whether you're charged with manslaughter or murder. Intent quite often is used to clarify, or even override, portions of contracts. If the challengee went to court, and attempted to base his entire case on the "sure" response to his "compressor" e-mail, he'd likely get tossed out-- the challenger could simply say, "oh, I thought he meant decompressor", or, "I was just attempting to clarify-- that wasn't intended to be a modification of the contract", or some other such thing. Again, the law is not a simple game of "gotcha".

      Finally, juries are another reason specificity is often trumped. If they see a person getting cheated out of their money because they missed one line in a quite long, and very misleading, contract, they'll tend to side with that person.

      You should sit in on a case or two some time; they can be quite enlightening. Court rooms are generally open to the public, and while some of the cases may be quite dry, you'll get a lot of insight in just a few hours.

    8. Re:Are you for lawyers or against? by TekPolitik · · Score: 1
      If you are for the [challenger], then you believe that exact specificity is more important than intent. And you inadvertantly empower the legal profession to profit from being better able to specify rules, etc. Which, in turn, gives those with money even more power. They can write the rules by which your interaction with them will be governed. You can be certain that the rules will be written to the advantage of the guy with the money.

      Actually, this result wouldn't necessarily benefit the challenger. A lawyer with any technical ability would identify the directory itself as one of the files necessary to complete the decompression in this case - you certainly couldn't complete the decompression without it, and in every file system I've ever seen (except perhaps MSDOS 1.0 or Apple DOS 3.3), a directory was in fact just a special type of file.

      Same thing if it was wrapped in a tar file - the tar file would be a necessary file to complete the decompression.

      The challenge was not won - neither by "exact specificity" nor by "intent."

      On the other hand, the challenge as specified appears to be a private lottery, which is illegal in most jurisdictions that matter here anyway.

    9. Re:Are you for lawyers or against? by YKnot · · Score: 1

      I think he doesn't deserve to get the $5000, because he didn't compress the data (which he realized himself when he pointed out that using the filename would be cheating). He does however deserve to get the $100 entrance fee back, because he played by the rules. The important thing is that if you choose to create a set of rules for something, especially if they sound like you have thought about all implications, you have to expect that they will be seen as an exact description of what is allowed and what is not. Loopholes found will be loopholes exploited. On the other hand, if you state your intention and point out that you want people to act with that intention in mind, you open yourself up to ambiguousity which has its advantages and disadvantages.

      In this case (The challengee already explained this): the filename and other fileheader information was not counted as part of the orginal.dat file, therefore logic dictates that filesystem information was not to be counted. The rules don't explain this, so logic deduction and asking for clarification are the only ways to resolve the ambiguousity. The challengee did both.

      Why do rules exist? Because people usually have different opinions on one hand and need to agree on certain things on the other hand. To really understand someone else's intentions, you need to know his context. For relatively homogeneous groups, that is not a problem. Everyone in the field of information theory would probably understand that several files were against the intention of the challenger. The more the contexts differ, the harder becomes gaining certainty about someone else's intentions. That's why rulesets in heterogeneous groups become bloated and look like they deal with every special case. But context is never really obliterated, just reduced. As with many things, balance is more desirable than the extremes.

    10. Re:Are you for lawyers or against? by YKnot · · Score: 1

      the challengers intent is either to make a fool out of someone, or to make money. Not only does this person (Mike) believe the challenge will not be successfully met, he also believes it cannot be successfully met.

      He also didn't expect anyone to expect anyone to take that challenge. It was a rhetorical challenge. He put it in the FAQ to underline the futility of writing a compressor for random data. See the problem with taking guesses at other people's intentions?

    11. Re:Are you for lawyers or against? by YKnot · · Score: 1

      Accuracy in the presence of context is unnecessary overhead. Licenses are not blindly ok-clicked because they are too complicated to understand but because they are too long.

    12. Re:Are you for lawyers or against? by sulli · · Score: 2
      Which is precisely why they are made that long - so people won't read them and object.

      Back on topic, I think Mike should return the $100 (or Y12200) in a nice picture frame. Patrick nailed him good, and even though he didn't disprove the mathematical principle, he did poke some appropriate holes in his hubris. Exact intent of the contest is less important than the specifics of their exchange; Mike (who really should have known better) took the bait, hook, line, and sinker!

      --

      sulli
      RTFJ.
    13. Re:Are you for lawyers or against? by odin53 · · Score: 1

      Wow, this is surprisingly ignorant, no offense. I would say, in rebuttal, that for every lawyer representing a person who would win based on "specificity," the other side's lawyer -- assuming they have no "specificity"-based arguments -- will argue for "intent." This is what you see ALL THE TIME. In any event, intent is ALWAYS relevant in a "specificity"-heavy case. Why do you think the current Supreme Court -- remember that judges/justices are all lawyers -- split all the time? At least with respect to challenging the constitutionality of statutes, it's quite often a conflict between specific words and congressional intent.

  108. Million monkeys at a million typewriters Solution by sirket · · Score: 1

    There is a solution to this problem, but it takes a _long_ time to work :)

    Write a script that:

    1. outputs random data to a file

    2. gen's an md5 sum on the file and checks it against the original md5

    3. gen's another checksum that does not overlap with md5 and check this against the original files checksum

    4. When the two checksum's match you have recreated the original file.

    This script should be easy to make small. You then simply ask for a data set one byte larger. Eventually, you will generate the original file, but it will take a long F***ing time.

    The only other problem is finding two checksum programs with large collision domains and that do not overlap. md5 is one, simply find another.if you can do this, then you can probably win the challenge, although I would hardly call this compression.

    -sirket

  109. Re:A million monkeys will get a million solutions by sirket · · Score: 1

    The point of using multiple non-overlapping check-sums on a file is that the collision domains will minimally overlap.

    The chances of finding two files with the same md5, the same sum (Alg 2), and the same crc32 are miniscule, and the more checks you use the smaller the odds. If neither program collides with the other you are garanteed a unique file.

    md5 is a 128 bit checksum. crc32 is 32 bit. sum (alg 2 is 32 bit). Are there times in which these algortihms will produce the same answer for two different files? Yes. Are there times when they will _ALL_ produce the same answers for two different files? Yes but the chances have become very remote. The question is how many will produce useful input for a program. Probaly only one.

    You could also check some of the byte sequences that you know from the original file to make sure that the randomly produced file is the correct one. If you know some small part of the original data as well as the checksums you can reduce the collsion domain to zero and garantee you have the original file.

    Eventually you should be able to pick a set of check-sum programs that produce somewhat unique results, and then choose a subset of the original data, such that the two taken together produce the original data uniquely, and do so in a smaller space than the original file.

    -sirket

  110. Re:But then what about "compressible" files. by PurpleBob · · Score: 1

    And you would have destroyed anything anyone knew about information theory in the process! Yay!

    Say you had an algorithm that would take a "compressed" file of all but the last 4 bits of information, and guess at the last 4 bits. It'll get it right 1/16 of the time. That's great! You can spend $1600 and win $5000! So then all you need is a program to do this that fits in the remaining half a byte! That's not too hard, right? :P

    Say you're a really l33t assembly coder and you can make an algorithm that would do this sort of thing in 20 bytes. You'd then have to guess at least 20 bytes of information on each attempt. The prize for the contest would have to be approximately 1.5 quindecillion dollars (1.5 x 10^48) for you to have a chance to make a profit.
    --

    --
    Win dain a lotica, en vai tu ri silota
  111. Re:Compression by PurpleBob · · Score: 1

    The flaw is that, for ANY file size, the probability of finding enough 1s in a row to be longer than the code to reinstate them is damn near close to 0. It will never get close to 50%. It won't even get close to 2%, the break even point. It won't even get close to a tiny fraction of a percent. You'd be better off going out and looking for a $5000 bill on the street.
    --

    --
    Win dain a lotica, en vai tu ri silota
  112. Re:One born every minute by PurpleBob · · Score: 1

    If random data could be compressed in a useful way, it would certainly be a mathematical revelation. You would be discovering that (say) 1000000 bits of pure information was equal to 999999 bits of pure information, and it follows directly from there that 1000000 = 999999, 1 = 0, TRUE = FALSE, "the universe" = "gone", etc.
    --

    --
    Win dain a lotica, en vai tu ri silota
  113. Re:The page has been removed by GeoCities by MikeBabcock · · Score: 2

    And here if needed.

    --
    - Michael T. Babcock (Yes, I blog)
  114. Dangerous game it is to issue vague challenges ! by fisman · · Score: 1

    According to me the challenge was aimed at proving that it was impossible to build a perfect decompressor and that all compressors make use of redundency in data. This is a dangerous assumption to make for this challenge ! If the challenge required 2 files to be compressed it would have been much closer to being safe.

    Consider this:
    0x23 ^ 0x34 = 0x77B352C55E214991

    Thus we could "compress" the 64bit binary data to 24bit binary data by storing only 0x239434. This indicates 2x8 bit numbers separated with a mathematical symbol. This leaves 56 bits for code, etc/64 bits of data -> 87.5% of data size for code in the decompressor.

    The point being that if the decompressor need only work on a single dataset there are usually cunning methods of representing the data. Such a decompressor will not be usable to compress day-to-day data (as is well known from compression theory) but the number of possibilities to find a pattern that is known to exist in a file increases the probablity of meeting such a challenge greatly.

    Am I just crazy or is this a possibilty?

  115. Re:There is no solution... by ZahrGnosis · · Score: 2

    I have to agree with this. The trick is Given a sufficiently large file. I can't imagine anyone coming up with a 2GB file off of which a modified RLE program couldn't snip a few kB. By "modified", I mean spend all the time you want analyzing the file (this is allowed), and find strings of similar bytes in arithmetic progression, or in strange but calculable patterns. Remember, the compressor can be as complicated and large as possible; it's the decompressor that needs to be small.

  116. Re:There is no solution... by MadHobbit · · Score: 1
    You should be able to compress any individual piece of data that you choose. Of course the decompression script that you write will not necessarily work for all such data.

    True enough...if you don't consider the decompressor part of the output. Similar to what Patrick did, I could check the first byte of the file, strip it, and write a "decompressor" that sticks the byte back. However, if you include the decompressor, then the decompressor is part of the output, and the "decompression algorithm" is actually "run this program through an x86 (or whatever)". Like it says (all over the place, here), you can't do it generically. That's not to say it can't be done for a given file...maybe I can find an LCG pseudo-random number generator that creates the first 50k of the file.

    The technique which was used in this case would not work for instane if there was a massive under representation of 5's in the data set.

    Of course, in that case, you could use the lack of fives to your advantage. Maybe a Huffman encoding?

  117. One born every minute by wowbagger · · Score: 1

    My BS and scam detectors burned out about three sentences into the guy's offer. "Give me $100 and I *might* give you $5000"? That is a classic scam, a variant of the "Give me $1000 ernest money and I'll split this $10000" scam.

    Sorry, but technical details aside, anyone who responded to this should avoid timeshare offers, Ginsu knife commercials, and Three Card Monty players.

    1. Re:One born every minute by waynem77 · · Score: 1
      My BS and scam detectors burned out about three sentences into the guy's offer. "Give me $100 and I *might* give you $5000"? That is a classic scam, a variant of the "Give me $1000 ernest money and I'll split this $10000" scam.

      Would you please produce a quote from the offer confirming this? I don't see anything of the sort on the page, and I'm curious as to what you're seeing.

    2. Re:One born every minute by TobyWong · · Score: 1

      And all he has to do to pull it off is act the part of the slimy cheat!

      --
      - Toby
  118. Re:There is no solution... by Stonehand · · Score: 1

    That only compresses if a, b and c combined use fewer bits. That's not necessarily the case. For instance, if n=5:

    n = 101 : 3 bits
    n = (10 * 10) + 1 : 5 bits
    n = (100 * 1) + 1 : 5 bits

    And that's ignoring overhead bytes to specify how many bits each number has.

    --
    Only the dead have seen the end of war.
  119. Re:Nice try but wrong by Stonehand · · Score: 1

    You've clearly never heard of the pigeon-hole principle, or considered applying it to compression theory.

    You can't squeeze the 2^n unique possible strings of 1s and 0s of length n into the (2^n)-1 unique possible strings of 1s and 0s of length 0..(n-1) without loss.

    --
    Only the dead have seen the end of war.
  120. Re:engodelization discussed by an amateur by Stonehand · · Score: 1

    Not in the general case. No method will in the general problem of being ONE algorithm that will losslessly, strictly compress/uncompress every possible bit string.

    (strictly compress => must reduce in size)

    The usual proof --

    * For arbitrary $n$ within $\mathcal{Z}^{+}$, there are $2^{n}$ unique binary strings of that length.

    * There are only $\sum_{i=0}^{n-1} 2^{i}$ unique binary strings that are *fewer* than $n$ bits long.

    * That summation adds up to... $2^{n}-1$.

    * Therefore, the pigeonhole principle applies; if all $2^{n}$ strings map to strings of length $n-1$ or smaller, there must be at least one output (compressed) string which is now ambiguous since it could mean either of at least two strings of length $n$. Ergo you've lost information. Conversely, if there is no ambiguity, then at least on string of length $n$ was mapped to a string of at least length $n$ -- IOW, not compressed. And this holds for every positive $n$...

    And when you combine different methods, you have to remember to account for the bits that tell you *which* decompression method to use.

    --
    Only the dead have seen the end of war.
  121. Re:Someone correct me please... by Stonehand · · Score: 1

    There's a fair chance it wouldn't work. Every algorithm will have bit strings it cannot compress... so, basically, it's a search through the infinite space of possible algorithms to find one that does apply.

    --
    Only the dead have seen the end of war.
  122. Silly little man! by Doubting+Thomas · · Score: 1

    Mr Goldman is really pushing his luck here. He's bet, at 50:1 odds, that someone can't compress truly random data. That's great. Except Random.org -isn't- truly random data.

    If you look at the stats, you'll see that the ent program suggests that the data being generated is at least superficially compressible, in the neighborhood of one quarter of one percent. All Mike said was that you have to compress it 1 byte. Given a large enough file, and a good statistical compressor, it's entirely possible that he'd have to cough up the dough.

    It may still be possible to honestly 'trick the trickster', here.

    -

    --
    Just because it works, doesn't mean it isn't broken.
    1. Re:Silly little man! by Mads+Haahr · · Score: 1

      The real-time statistics are computed on fairly small blocks of data (8 Kb). The more data you pull from random.org, the higher the accumulated entropy gets. (A new statistics module will be online in a couple of months and will feature accumulated entropy.) I therefore doubt that taking larger amounts of data will make it any easier to compress it. Compare the statistics for the 8 Kb blocks with those for the 1 Mb file mentioned in my essay.

      When you look at such small amounts of data, entropy isn't a very good test for randomness. Even for large amounts of data, it is by no means exhaustive. For example, a typical gzipped file has a high level of entropy but is not very random. Other tests (such as chi square) can be used to reveal this.

      (Note that I'm not involved in this challenge. I just operate random.org.)

  123. Re:But then what about "compressible" files. by BlueUnderwear · · Score: 2
    > however you have no way to represent non-compression resistant files.

    The challenge specifies that you can choose your compressor after you had a peek at the datafile. If it turns out to be compressible enough by a known algorithm, just ship that compressor. If it isn't, use your meta-compressor.

    However, I have a gut feeling that the uncompressible files are too numerous for this meta-compressor to work: if 99% of all possible files are uncompressible, then the index number of the file would not be that much smaller than the file itself, and you couldn't fit the meta-compressor's code (which would be rather complex...) in the small space saving.

    As others have pointed out, a better alternative would be to simply "play the odds": make an algorithm which only works in 10% of the cases. If the file can't be compressed, resubmit an entry until you get one that works. On average you would have spent $1000, and won $5000!

    --
    Say no to software patents.
  124. Silly Contest by selectspec · · Score: 1

    First of all, the decompressor was aloud to be written in shell scripts. There are all sorts of hacks one could do if you are able to link to other code for your decompressor. It should have have to have been written in C, and should have have to have relied on stdin.

    --

    Someone you trust is one of us.

  125. Re:Compression by selectspec · · Score: 1

    reverse crc compressions:

    For every block of real data:
    data[blocksize]

    compressed data:
    data[blocksize - 2] + crc

    The crc is only 1 byte and can be defined as the crc of this block and the previous crc. The decompressor then chops through the data, when a crc comes up incorrect, it goes back and chooses a different possible value for a proceeding crc until the entire series of datablocks validates.

    --

    Someone you trust is one of us.

  126. Re:PAY UP MIKE by selectspec · · Score: 2

    Clearly, this is the crux of the arguement, and intellegent people can waiver here. In a sense the fragments were compressed in that the 5's were chopped off. I would actually argue that this is a form of compression, however impractical in this case. For example, if you create a dictionary of "phrases" and install the dictionary within the decompressing client, and send the "compressed" files you are really doing the same thing that Patrick did. The difference, is that Patrick exploited the filesystem to store his dictionary data, as opposed to actually putting the dictionary inside the decompressor. In fact what he did IS compression, and the files were compressed in size. Patrick could have done even more devious techniques by moving all of the data into the filenames themselves. Because Mike did not spesify that external linkage to datasources was illegal, what Patrick did in my view was within the confines of the contest.

    --

    Someone you trust is one of us.

  127. A Solution by selectspec · · Score: 2

    One solution however impractical an imperfect is to reverse-checksums. Essentially, you take the data block and perform a series of running checksums (like CRCs). You store only the checksums in the "compressed" file plus tidbits of "hint" real data. What you have is a lossy form of compression. However, with enough processing power (a few dozen E10,000s) you can reverse calcuate the checksums which will give you accurate results with 99.999+% of data blocks (some data blocks will return false solutions). It is simply a matter of home much processing time you are willing to throw at the problem.

    --

    Someone you trust is one of us.

    1. Re:A Solution by Animats · · Score: 2

      MD5 is supposed to be resistant to that kind of attack. Generating data with a given CRC, though, is not all that hard.

  128. Re:Compression by selectspec · · Score: 2

    Aside for not clarifying the use of external dictionaries as being illegal (Mike let Patrick use the filesystem filenames to store data), Mike also didn't clarify processing time. If processing time is not a factor, you can create reverse CRC/Checksum compression that is acurate almost everytime. Reverse-crc compression is accurate inversely porportionate to the rate of compression. So, you simply need to choose an large data block (1 gig or so) to compress with a very low rate of compression, say 1000/999 or something like that. Then you let the reverse crc decompressor work on it for a few days, and eventually you'll come up with the right data 999/1000 times.

    --

    Someone you trust is one of us.

  129. PAY UP MIKE by selectspec · · Score: 3

    .

    Patrick I meant can I send you a compressor and several compressed files whose total file size is less than the original uncompressed file and from which I can regenerate the original uncompressed file

    Mike Sure.

    Pay up Mike. Pay up. You were too careless to plan out the rules and think about your intent before you threw 5k into the sink. Cough up the cash or loose all honor.

    --

    Someone you trust is one of us.

    1. Re:PAY UP MIKE by CloudWarrior · · Score: 1

      OK - run the fragments through gzip then, if it makes you feel better.

      CloudWarrior .o. "I may be in the gutter but I look to the stars"

    2. Re:PAY UP MIKE by Paradise_Pete · · Score: 1
      They weren't "compressed files". That's the only problem. They were just literal fragments of the original file.

      Fragments smaller than the original. In other words, compressed. As soon as this guy started asking for that rule change it should have caused the challenger to become wary of a trick. He didn't, and his arrogant challenge was met.

      He thought he was dealing with a kook, and having come to that conclusion, stopped thinking. He was drooling over the prospect of having found a fool for his personal entertainment. No doubt he was looking forward to sharing the tales of his fool's efforts with the rest of those "in the know," and all having a good laugh at this ignorant kook trying to square the circle. Instead, he was the fool.

    3. Re:PAY UP MIKE by Tungz10 · · Score: 1

      Grant? It was not "philanthropic" so to speak. Mike accepted the $100 he should be prepared to pay up.

    4. Re:PAY UP MIKE by jgerman · · Score: 2
      EVERYBODY doesn't know anything. Science is NOT a world where everything is fair.

      Let's start with the term EVERYBODY. Your use of the word has implicit assumptions that you consider self evident truths. Does EVERYBODY include the five year old down the street? I guess it's not EVERYBODY then is it. Your assumption on the meaning of everybody is vague and ambiguous. Thus the need for explicitly written rules and any loophole in the agreed upon rules is valid. Regardless of whether EVERYBODY knew what the intent was. This is a case of smug idiot screwing up the rules to his challenge.

      As far as science being fair. Fair to whom? That's a completely meaningless statement. why don't you elaborate and show how science is FAIR?

      --
      I'm the big fish in the big pond bitch.
    5. Re:PAY UP MIKE by FTL · · Score: 2
      > I meant can I send you a compressor and several compressed files...

      They weren't "compressed files". That's the only problem. They were just literal fragments of the original file.
      --

      --
      Slashdot monitor for your Mozilla sidebar or Active Desktop.
    6. Re:PAY UP MIKE by landaker · · Score: 1

      Go read the page again:

      He didn't move any data into the filenames. He split the file, yes, and removed '5' from them, yes, but none of that was stored in the filenames.

      He talked about how he *could* do this, but didn't because he (Patrick) considered it cheating.

      >Even if the order of files supplid to the
      >"decompressor" is preserved, renaming them
      >renders the original data unrecoverable.

      No, actually if the file order was perserved, even if the files were named anything, it would have worked. =)

    7. Re:PAY UP MIKE by david+duncan+scott · · Score: 1
      No, it's nothing like a research grant. He wanted to see somebody make a fool of himself.

      The weird thing about this case is that it was the victim who asked the crowd, "I bet you can't pick up this dime without touching the glass!"

      Here's an interesting question: does he have the $5K?

      --

      This next song is very sad. Please clap along. -- Robin Zander

    8. Re:PAY UP MIKE by spongebob · · Score: 1

      Science is Fair huh? Ask Telsa about that one!

    9. Re:PAY UP MIKE by dasmegabyte · · Score: 2

      Well, this isn't just a bet. It's like a research grant. Mike wants to see compression -- meaning, the file takes less disc space -- and that isn't what he got. He got a bunch of files that no doubt took up more disc space due to the file information headers...basically, he did the same thing as those cheater programs back in the '80s that would "compress" files by moving the EOF up to the front of the file, and storing the real bits in the free space area. The trick here was in the file system, and on a file system that included the size of the header in the file size (like MacOS) it would be obvious he failed. Don't try and milk Mike's grant money for a shuyster who did nothing other than lie in an attempt to win(which is what this was...even if it wasn't especially shiftless). EVERYBODY knows the purpose of this challenge is compression, not rhetorical trickery, and the advancement of computer science is a bit more important than a scientist's offhanded approach to definition. I agree that he was careless, but science *IS* a world where things are fair...and in this case, fairness precludes and should preclude honor.

      --
      Hey freaks: now you're ju
    10. Re:PAY UP MIKE by 3am · · Score: 1

      this wasn't science, it was the CS equivalent of a bad joke. neither of them deserves anything, and the whole situation is pathetic.

      --

      A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
    11. Re:PAY UP MIKE by YAZZO · · Score: 1
      "Patrick I meant can I send you a compressor and several compressed files whose total file size is less than the original uncompressed file and from which I can regenerate the original uncompressed file

      Mike Sure."

      Actually, this would be entirely viable in court. Had Patrick had any desire to do so whatsoever, he could have collected. Further, depending upon several specifics HE could have pinned fraud charges on Mike.

      Mike was an arrogant fool. It was quite laughable how he immediately threatened to sue upon seeing that Patrick had indeed, technically (all that matters) won.

      There's nothing that I like more than seeing an arrogant bastard beat at his own game.

    12. Re:PAY UP MIKE by dinivin · · Score: 1


      Nothing in life is fair including science... Mike fucked up, plain and simple. He wasn't clear in what he wanted, and Pat fulfilled the rules of the contest, though maybe not the spirit of the contest.

      BTW, if you think this was seriously an attempt by Mike to spur research into compression, you are either incredibly naive or incredibly stupid.

      Dinivin

  130. This is easy by __aahlyu4518 · · Score: 1

    Just gzip the file and send it together with the gunzip program. The total of the 2 files bigger than the original ? No way !! It's just, for example, minus 20kb smaller :-)

    OTOH... same goes the other way around... If you would solve this challenge... the guy could always say : 'Nope... it's minus 20kb bigger' and will keep his 5k in his pocket...

    1. Re:This is easy by BradleyUffner · · Score: 1

      think about this... what happens when you try to zip a zip file? Compressed files make pretty good random numbers.
      =\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\ =\=\=\=\=\

  131. original.dat mirror? by Burgr · · Score: 1

    I don't know about anyone else, but I'd be interested in taking a gander at the original.dat file, but the link is in Japan and very slow. Anyone else mirroring it?

  132. it's a bad challenge regardless of wording. by daniell · · Score: 2

    Let us disregard the issue of money and the issue of a specific answer's acceptability and analyze the question

    In my opinion any person, given enough time to come up to speed on the subject, and given that that time, plus the time it takes to come up with and implement their solution all multiplied by the factor of time it takes to maintain themselves (eating, resting etc.) during that time, and given that they live that long, can come up with a winning solution to Mike's challenge due to a flaw, not in wording of the challenge, but in Mike's reasoning behind the challenge.

    To put a lot of discussion to rest (in my mind) just read the relevent portion of the FAQ in which the challenge is found. i.e. the section entitled "9.2 The counting argument"

    Here you'll find, clearly stated, the flaw in trying to find any one algorithm which will always compress data of a fixed length to less than that length. Also you'll see the decent challege by Steve Tate which paraphrased says:

    "give me your best algorithm and the size of data it works on, and I'll give you one counter example of data in that size which will not compress with your algorithm."
    Mike Goldman's addendum challenge effectivly states:
    "I'll give you data in any size you request, you give me data and a program which in total are smaller than the original size and for which the program applied to this data produce my original data."

    Reading it in its full context, which aparently a lot of slashdotters didn't, it immediately gives itself up as a challenge which does not hinge on on the mathematical theorem it is presented with. i.e.

    Given, there is no one algorithm which can compress all data, nor a subset of data selected by being a fixed length.
    Therefore, if you give me an algorithm which claims to compress all data or all data of such a subset, I can give you at least one counter example.
    It does not follow that I can give you one counter example to all compression algorithms for a given length (of your choice).

    The implication of the first challenge is that the poser is confident in both the mathematical theory and either his analytical abilities when studying the given algorithm, or in his ability to test and check against the given algorithm in a sufficient period of time.

    Conversely, the implication in the second challenge is that Mike in fact is the possessor of a set of super-data, of which any sub-set cannot be compressed by any algorithm.

    Since we live in a time when any data sufficiently limited in size yet sufficiently large enough to offer decent gains when compressed (the challenger may pick that size) can be analyzed fully and with great speed, I think Mike has made a bit of a fool of himself.

    I am not saying that Mike can't supply <<the uncompressable data from hell>&gt, but he will have to go to some lengths to get it. It will likely be selected to be large enough so Mike cannot generate it himself without an algorithm. It cannot be generated with a small input set or that set could be discovered and when coupled with the generator would defeat the challenge. It may be generated from a large set, but the result must show no patterns that can be bitstuffed (see networking) against or more simply generated and indexed for, nor may the source set contain any of the same properties. It may be possible that Mike is collecting this data from a source, i.e. digitizing random radio signals; we may all be interested in any source data which cannot by any reversable algorithm be represented/stored more succinctly particularly without any regard for retaining its meaning while compressed.

    In summary, if Mike doesn't want to be giving away money, then he doesn't understand his own challenge's failure to be based on any mathematical certainty (unless I'm wrong and he has some super-data). However Patrick's solution was clearly not compression, though it was the start of a decent form of compression. Had concatenated his file-set and headed it with information as to each file's length in order, he just might have saved space enough for the decompression, particularly if he picked a suffiently large recuring patern that could be generated rather than having to be stored verbatim.

    However, I might just be able to claim, that if the universe is deterministic, if the input conditions for the creation of the universe could be represented in less data than a generated block of data which can be represented in that universe, and if I were given enough resources in which to recreate the universe, I should be able to offer the input conditions of the universe and an index into it's ongoing generation which could point-to a generated block of data which exactly matches one provided. I just might be able to give you the process, its inputs, and your data before you've finished generating the data for my participation in the challenge, if in my generated universe I could significantly warp time without affecting the results.

    -Daniel

    That last part's an odd joke

  133. Re:Compression by daniell · · Score: 2
    I tend to agree that it should be always possible to get the $5000 from this "challenge". You can see my extended thoughts at: article 403

    I'm surprised at the number of people using the FAQ's "pigeon hole" theorem backwards. The theorem states that a reversable algorithm cannot be used to represent every piece of data as a piece of data smaller than itself. It does not address being able to pick an algorithm well suited to just one specific piece of data. If there is any data to which no compression alogrithm can be applied to, tailored to, or made for, that's a pretty amazing piece of data, and we should all be made aware of it Mike.

    -Daniel

  134. "to be disbursed at my sole discretion" by Baldrson · · Score: 2
    The reason I said "to be disbursed at my sole discretion" in the first sentence of my rocketry prize was to make it clear that I was acting as the arbitrator of my own "contract" with the public and that therefore if anyone wanted to play "games" with me, they would quickly discover they were playing with themselves.

    Goldman's mistake was two fold:

    1) He didn't put a phrase like "to be disbursed at my sole discretion" at the start of the description of the prize award.

    2) This he failed to do so within an arena where the distinction between "tricks" and "solutions" is frequently nonexistant: mathematics.

    1. Re:"to be disbursed at my sole discretion" by Baldrson · · Score: 2
      And of course, I had at least a one-fold mistake in my comment on Goldman's second of his two fold mistake -- which comment should have read:

      2) This he failed to do within an arena where the distinction between "tricks" and "solutions" is frequently nonexistant: mathematics.

      Such mental typos being made at my decaffinated 6:54 AM brain's sole discretion.

    2. Re:"to be disbursed at my sole discretion" by Baldrson · · Score: 2

      If someone had a genuine solution -- something that actually fulfilled the intent of the challenge -- they would be likely to submit the $100 because the odds that Goldman was not scamming the compression group would be vastly higher than 1/50. :)

  135. The Lesson to be Learned by MegaFur · · Score: 1
    Patrick has this to say at the bottom of the page:
    My main motivation was to "out-trick the tricker". I thought the chances of me making any money were very remote. ... [As soon as ... it was obvious he wasn't going to pay up,] I was just trying to avoid the situation where he said that I had made up either the whole thing or the emails where he agreed to multiple files.

    I think the only reason Mike offered the $5000 prize was because he was convinced that the challenge was impossible. He probably thought that no one would ever take him up on it anyway. When I showed an interest, he couldn't believe his luck and allowed me to bend the rules a bit so he could get my $100. ... This should be a lesson to anyone else who wants to set (or take) such challenges with high stakes.

    When you set a (seemingly) impossible challenge, you've got to be really careful. Someone might just find a way to complete it. Especially if you don't give them very well-defined rules. I mean, obviously the whole reason why the person will be likely to find some alternative approach that borders on cheating is presicely because the task your setting for them is next to impossible. Remember Kirk and no-win scenerio...

    One of the solution paths people often forget about is the one where you re-define the problem to something that's (moreeasily)solvable. :-)

    --
    Furry cows moo and decompress.
  136. clearly he completed the challenge by userunknown · · Score: 1

    Looks like he did it to me, and the proof of his victory is all over the net. I don't see any way the sponsor of the challenge could weasel out of this one. I guess he's lucky Patrick was such a gracious winner. -Mark

  137. I don't think he won the challenge. by jidar · · Score: 1

    I think it is clear to anyone with any intellect who reads this that he took the contest out of context. People, a great deal of misery is the result of miscommunication. I'm talking about people who hear what they want to hear as opposed to what someone else is trying to convey, and lets be frank, spoken language is bad enough of a medium to convey ideas without people intentionally looking for ways to misinterpret it to their own ends.

    Although I do still think it was a clever prank to pull, but not one that I would seriously make a webpage complaining about not getting paid.

    Of course the argument is "But he said...!"

    Well, my only response would be "Hear what I mean, not what I say."

    --
    Sigs are awesome huh?
  138. Re:That wasn't in the challenge by jidar · · Score: 1

    I find that odd. I observed a similar contest and the planes that won went much further than a crumpled up ball would have. Using a single sheet of paper the size of a typical drawing tablet I could make an airplane that goes the distance of the gymnasium where the contest took place. I don't think you could have thrown a crumped up sheet of the same paper that far, although I didn't try.

    --
    Sigs are awesome huh?
  139. Re:Reality? or Data Hiding? by Apotsy · · Score: 2
    The guy was clever and found a loophole in the rules of the challenge. He even said so.

    The point of the story is to be careful when issuing so-called "impossible" challenges. Don't leave loopholes in the rules!

  140. Re:Bar room bets? by Apotsy · · Score: 3
    Exactly. "Pompus ass" were the first words to come to my mind as well.

    When Goldman gleefully posted to the newsgroup that someone had accepted his challenge, he said "some people just don't understand information theory too well". Yeah, and some people just don't understand issuing challenges very well, either!

    When you make a challenge like that you absolutely must be sure to word the rules carefully and try to avoid leaving any loopholes. In this case, all Goldman really needed to do was tack a sentence at the end of his original challenge saying "the data must actually be compressed according to the definition of 'compression' contained in this FAQ. Challengers who merely point out loopholes or exploits in the rules of the challenge will not be rewarded".

    The moment Mr. Craig started iquiring about about the specifics of the rules and saying "wait and see" regarding his methods, that should have been a warning sign to Goldman that Craig had some trick up his sleeve. But no, Goldman was just so in love with the idea that he found some "impossible" task which suckers might try to undertake that he couldn't be bothered to think about such things.

    What's even more embarassing to Goldman is his initial reaction to Craig's submission:

    I have accessed your server and downloaded exactly two files, the "decompressor" (dfi) and the "compressed file" (comp). The rest of the files in the directory (comp.0 through comp.217 inclusive) I have not touched.
    Even though Craig had already asked if more than one file was okay and Goldman had said yes!

    And Goldman's following point:

    In further point of fact, the 218 parts together must occupy more space than the original file (original.dat) that you were given. Each file requires space for a directory entry, and each such directory entry requires in excess of one byte of space. Thus you have in actuality expanded the data which you were given to compress.
    is even more ridiculous given that Goldman had already said that "sure" in response to Craig's question about multiple files whose "total file size" (which does not include directory entries) are less than the size of the original data file. Goldman tried to change the rules rather than admit he was sloppy in setting them up.

    Goldman also says "I think that I did not make a mistake", even though he admits in the same message that he needs to restate the rules:

    Perhaps it will be necessary to restate the terms in a legalistic form which is not open to this sort of miscommunication...
    "Perhaps"? How about "definitely"! And any "miscommunication" was entirely one-way. Craig was very sly and clever, and Goldman was too full of himself to realize what the fuck was going on until after the fact. A "pompus ass" indeed.
  141. Re:Psuedo-random generators by kevinank · · Score: 2

    I think in his post to comp.compression Mike wrote that his random data had come from random.org. Random doesn't use a PRN generation algorithm; they get random data from atmospheric radio noise (tune your AM radio to a non-station and you'll hear what that means.) To remove any bias in the data they postprocess the noise by removing high order bits, and the low order bits are shifted to normal using a bit discarding algorithm.

    --
    LibBT: BitTorrent for C - small - fast - clean (Now Versio
  142. Let's raise some funds to pay for Mike's lawyers! by Choron · · Score: 1

    What about raising some funds to sue the moron and give Mike some good lawyers ?

    --
    "Naughty, naughty, naughty, you filthy old soomka !"
  143. No. by AndyL · · Score: 1

    No, because in almost all instances the formula would involve operating with multi-million-digit numbers totaling more then the original MP3.

    You'd waste a tremendous amount of processing trying to find applicable equations that would rival current compression.

    Of course if you're willing to take some loss then it starts looking up again. But that doesn't help for this contest. Who would want lossy white-noise? It'd be rendered completly worthless!

    -Andy

  144. Re:No money owed... by Nonesuch · · Score: 1

    Read the entire file. The challenger agreed that the entry could use multiple files:

    === Begin quoted text ===
    >
    > I meant can I send you a compressor and several compressed files whose
    > total file size is less than the original uncompressed file and from
    > which I can regenerate the original uncompressed file
    >
    > Patrick

    Sure -- but you send me a *decompressor*, I don't need the compressor.

    === End quoted text ===

  145. The challenge was *not* impossible. by Nonesuch · · Score: 2
    The challenger was mistaken, he thought that he was offering an impossible challenge, but was mistaken.

    The comp.compression fact discusses that for any given compression algorithm, it is possible to craft an input fill that cannot be compressed.

    This is different than the challenge proposed, where Mike, the challenger, provided a file of size X, and Patrick, the challengee, was required to provide a de-compression program of size Y and a data file of size Z, where X > Y+Z.

    The difference here is that Patrick was allowed to hand-craft the compression method to the data file, but Mike was not allowed to hand-craft the original data to the compression routine.

    Basically, Mike was wrong, and should pay the $5K.

  146. Re:Reality? or Data Hiding? by Nonesuch · · Score: 2
    The quest was for 'Lossless data compression', but the challenge was presented backwards-

    Given a sufficently large source file of truly random data, I can create a binary and a data file which can recreate that one specific source file, losslessly, with my binary+data being smaller than the source. That is, I can create a routine that will work for just that one specific source file, no others.

    The problem of 'Lossless Data Compression' refers to the opposite case- given any general purpose routine to compress files, I can create a file that it cannot compress. That is the oppposite of the challenge given, and actually is impossible.

    As stated, even with the restriction of a single compressed file, the challenge *IS* winnable.

  147. Re:The bogus ideas piggie bank by Nonesuch · · Score: 2
    There is a huge difference between 'have the ability to compress randomly occurring data' and 'have the ability to compress the specific file Mike supplies'.

    It is impossible to write a routine that can losslessly compress every possible 3-megabyte file that could ever be generated.

    It is not impossible to write a routine that can compress the one specific file that Mike supplied to you. That is, it is quite likely that you could write a program that can losslessly compress that one single input set, such that:
    length of decompressor) + (length of compressed data)

    Why, he'd just reverse engineer the decompressor, patent it, get rich, and buy an island inhabited with young bikini clad virgins before the your check had even cleared.
    That would only be true for the general casse compressor-decompressor. I could write a routine that would work for that one specific input file, but fail miserably on any other input. This would win me the $5K, but be valueless for any other purpose.

    econdly, i've been thinking about this problem for about 3 years now. :) I've tried so many methods and ways of interpreting and looking at random data, i'd have enough to write a book if i'd bothered to keep accurate logs. If i had $1 for every time i said "I'VE GOT IT!" i'd probably have $5000 already. Well, maybe that's a tad off the mark, but anyway, you get my point. Right, onto my conclusion...

    The mistake both you and Mike make is that while writing a program for the general case of compressing random data is impossible, it is very different than writing a program to compress one specific file of 'random' data.

  148. Re:Compression by gimbo · · Score: 2

    Slightly over-zealous use of "Submit" without previous use of "Preview", methinks. ;-)

    BTW, you can use &lt; to get "less than", like this:

    <

    This is very very standard stuff in HTML.


    --

  149. Re:Compression by gimbo · · Score: 2

    Heh - interesting: I've never used anything other than "Plain Old Text" (including on this message - but spot the italics!).

    Strange... Hey ho, live and learn I guess.
    --

  150. Re:Compression by gimbo · · Score: 2

    def test():
    print 'so what do the other modes do?'
    this_is_the_example_for('code')
    return("cool, works the way I'd hope")

    if __name__ == '__main__':
    test()

    --

  151. Re:Compression by gimbo · · Score: 2

    But tomorrow morning, I shall be sober.

    Oh, hang on, that's not right.
    --

  152. Re:Random data... by Paradise_Pete · · Score: 1
    Base Pi.

  153. Re:This is why we need lawyes by Paradise_Pete · · Score: 1
    Because we are unable to do this today does not mean it is impossible

    True, the simple fact that we can't do it doesn't mean it can't be done. However, other facts do mean it can't be done.

    If all files of say, 1MB, could be compressed to a smaller file, then some of those compressed files would necessarily be identical. If they are identical they can't be decompressed to different files. Therefore, not all files can be compressed. Not today, not tomorrow, not ever. It has nothing to do with not knowing how to do it. It simply can not be done.

  154. 'Did I compress anything?' by Borogove · · Score: 1
    There's almost universal agreement that I didn't compress anything, but I don't think this is a requirement of the challenge anyway. I still think I compressed the data in the original file. It's not my fault that a file system uses up more space storing the same amount of data in two files rather than a single file.
    And in fact, in some filesystems he might have used less space. From what I remember of Acorn's ADFS, small files could be stored in the same allocation block as the directory catalog. So you could potentially 'show space savings' (Steve Tate's quote at the bottom of the page) simply by splitting the file up.
    -- Andrem
    --
    There has been a major scientific break-in
    1. Re:'Did I compress anything?' by Borogove · · Score: 1

      Depending on the size of the allocation blocks and the file in question, you could potentially make savings using an ADFS-like file-system, and you wouldn't even have to use the trick of removing every copy of a certain byte from the file. The design of the file-system allows small files to be squeezed into spaces that large files wouldn't fit in: large files have to use space allocated in blocks (which have a minimum size - for example, 30 sectors).

      Several small files can share one block - so you could fit one directory catalog and several small files into a single block, but a large file would have to have a whole block to itself.

      So, if the minimum block size were 60k, and a directory catalog took 2k, then a directory+70k file would require 180k, whereas directory+20k file+50k file would require 120k.

      Of course, no-one sensible would claim that this consitutes compression of the original file: I've come up with a way of storing a 70k file in only 120k! However, it is an improvement over the 180k you'd normally get, and on occasion it could be useful to know about this technique (like when ADFS reported 0 bytes free on a floppy disk, you could normally guarantee that there would be room for a few more 512 byte files...)


      -- Andrem

      --
      There has been a major scientific break-in
    2. Re:'Did I compress anything?' by markmoss · · Score: 2

      Re ADFS, Goldman apparently believed that the FAQ on his website excluded storing parts of the file in the catalog, for instance by creating a filename out of the first N data characters. If it had been explicitly stated in the two paragraphs of the challenge, I think that would have prevented the use of the ADFS to store a small file in the same allocation block as the catalog, but Goldman was sloppy...

      You cannot get real space saving's by Patrick's method -- the extra directory entries created will always be larger than the data removed from the file. Think of the smallest possible directory structure -- an array of pointers. (Files aren't named, you get them by number. Files are stored sequentially, so you don't need a length -- EOF is the last byte before the next file starts.) To accomodate files up to 2^n, the pointers have to be n bits long. Patrick's trick is to split the file at a particular sequence of m bits, m being less than n so you can be sure that that sequence will be found within a 2^n length random file. So you take away m bits and add n bits for every split.

      But Goldman's challenge was not clearly worded in this regard, even though he was apparently aware of possible cheats involving hiding data in the directory. As Patrick said, if a 1,000 byte file could have been duplicated by a 499 byte decompression program and a 500 byte data file, there is no doubt that it would have met the challenge even though it would have taken more HD space.

  155. Re:There is no solution... by TheCarp · · Score: 1

    Yes exactly. Whats wrong with that?

    He was given a challenge where it was obvious that the giver knew it was impossible (he never said he would use random data, but it would be the obvious way to give someone an uncompressable file - and he did it)

    So he took this "impossible challange" and attacked the challenge itself. The challenge said "file size" so he put the data somewhere else.

    So he showed up the trickster. Of course, the trickster then got defensive and said "no wait you didn't win" and started making up new rules... typical really :)

    -Steve

    --
    "I opened my eyes, and everything went dark again"
  156. Re:Compression by mikera · · Score: 1

    Yeah, I know about &lt.

    Just that I was submitting in "Plain Old Text" so I would have thought they'd let me get away it. :-)

  157. Re:Compression by mikera · · Score: 2

    You know, I reckon it's actually *always* possible to win $5000, no matter what file is generated, providing it is of sufficient length. This is because knowing the file to be compressed *before* you create the decopressor gives you an advantage over the arbitrary random file case.

    Furthermore, I reckon I can write a program G that will, when given any input file A of length>=X, will create a decompressor D and compressed file C such that:

    D(C) = A
    size (D) + size (C) size (A)

    Don't know what X is yet, I supspect around 1 meg is fine though.

    Anyone willing to bet either way?

  158. Re:Compression by mikera · · Score: 2

    Slashdot apparently doesn't like open angle brackets.

    the condition should read:

    size(D) + size(C) "is less than" size (A)

    Slightly over-zealous HTML filter, methinks....

  159. Re:Compression by mikera · · Score: 2

    Uh, just spotted that isn't possible. Cos then you've created G(A)->(D,C) which is a universal compressor, and therefore not possible by your usual arguments. Course, I'm still interested to see if you could create G which works for a percentage X of possible values of A that gets arbitrarily close to 1. My guess is that this is possible, in which case you can still win money off the original bet by ensuring that X>0.02 Of course, if your opponent knew G, they could always choose a value of A that would confound it. But this isn't the case here. Also, it may be that X increases only very slowly with filesize, so that in order to get X>0.02 you would need a few solar systems worth of storage and CPUs to complete the challenge. Can anyone prove/disprove any of this?

  160. engodelization discussed by an amateur by peteshaw · · Score: 1
    I remember reading sometime back about 'engodelization' as a compression algorithm. This theoretically could be used on a stream of random numbers.

    Take a block of data. ("Four score and twenty...")

    Convert it into bits. (1010001001001...)

    Arrange the bits so that they form a decimal number. (564273274687632....)

    Factor the decimal number as compactly as possible, by using brute force to encode the number as a series of prime powers. So the number becomes

    (2^A)+(3^B)+(5^C)+(7^D)+... - Delta

    So the original message becomes (an example, I have no ideas what the actual values are, this is a thought experiment) (2^71)+(3^151)+(5^0)+(7^12)-18763

    Which could be compactly encoded as (71,151,0,12,-18763).

    Presto! Godel originally proposed this I believe, and I read about this in aa novel 'Starburst' by Robert Silverberg. I can't recall anymore. Any thoughts? Would this work?

    --
    www.avacal.com -- the home page of pete shaw
    1. Re:engodelization discussed by an amateur by n7ytd · · Score: 1
      This technique was also used in a Pohl story, Gold at the Starbow's End, I think. This technique would work for non-random data, as any (lossless) compression will work for non-random data. The example you give (English text) is not random, though. Compression=removing redundancy in the source, and therein lies the problem; the source of this contest is random data, skew-adjusted to remove any possible redundancy that could be exploited.

      Godelization is a cool idea, but the problem with this technique is twofold: huge amounts of processing power required, and a massive numbering system (can anyone say "4096-bit" integer?) with absolute precision, growing larger as the source becomes larger. This could be helped somewhat by encoding the source in blocks, rather than one large stream of bits.

      BTW, I don't believe there would be any "delta" involved, as any positive integer can be expressed as a product of primes.

    2. Re:engodelization discussed by an amateur by Mr.+Obvious · · Score: 1
      Quite right, but it's possible to state the proof in a much more readable form. This is from the compression FAQ, which is one of the first links on the page referenced in the original story:
      Theorem:
      No program can compress without loss *all* files of size >= N bits, for any given integer N >= 0.
      Proof:
      Assume that the program can compress without loss all files of size >= N bits. Compress with this program all the 2^N files which have exactly N bits. All compressed files have at most N-1 bits, so there are at most (2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless.

      The proof is called the "counting argument". It uses the so-called pigeon-hole principle: you can't put 16 pigeons into 15 holes without using one of the holes twice.

      The interesting thing about this whole discussion on /. is how much of it effectively just repeats stuff found in the FAQ --- in the same question, for that matter, where the original challenge appears. I found this stuff just be clicking on the link to the FAQ and then searching for "goldman" and then paging up to the start of the question. Had more /. readers done that, I suspect the raw size of this story could be compressed (with a very lossy algorithm -- by tossing out the nonsense) by, oh, maybe 90%?

      Ron Obvious

  161. Nope. by Tungz10 · · Score: 1

    There are so many random files, that the length of the number of the compression resistant file would be very close (although slightly less) than the length of the file itself.

    However, there should be no way for the decompressor to generate compression resistant files in order to enumerate them. If the files are compression resistant, they would have to be stored bit-for-bit inside the source of the decompressor, enlarging it.

    1. Re:Nope. by edp · · Score: 2

      "However, there should be no way for the decompressor to generate compression resistant files in order to enumerate them."

      Generating files is easy. You just iterate through them: empty file, all the one-byte files (0x00, 0x01, 0x02, ... 0xff), all the two-byte files (0x0000, 0x0001, 0x0002, ... 0x00ff, 0x0100, 0x0101, ... 0xffff), all the three-byte files (0x000000, 0x000001, ...), et cetera. It would take a while, but time isn't the issue.

      Of course, the decompressor wants to count compression-resistant files, but it does this by generating each file, as described above, and then running it through the compressor. If it does not come out smaller, it is compression-resistant.

  162. Mike needs to pay out by Tungz10 · · Score: 1

    If he just offered $5000 to whoever could meet the challenge, then I'd say that intent was very important, he could argue that the "compression" scheme was not practical...he shouldn't have to pay.

    But he had no objection to accepting the $100. Money has already changed hands. Now that he lost the bet, he wants to just call the whole thing off and return the $100? I don't think so.

  163. But then what about "compressible" files. by Tungz10 · · Score: 1

    So you're running them through an external compressor (perhaps it removes strings with runs or something of the sort).

    The problem is that you are reducing the size of compression resistant files very slightly, however you have no way to represent non-compression resistant files.

    So if the random number generator happened to spit out three zeroes in a row, and your compressor would try to eliminate them, you could NOT compress the file (even increasing the size) because your decompressor ONLY decompresses "uncompressable" files.

    1. Re:But then what about "compressible" files. by Tungz10 · · Score: 1

      The challenger offered a random file of *ANY* size. I would have paid 100 bucks just to force him to make (and upload) a 100 terabyte file for me to "compress" (yeah right)

    2. Re:But then what about "compressible" files. by edp · · Score: 2

      "So you're running them through an external compressor ..."

      No, it is not external. The compressor and decompressor are written with an internal copy of the compressor known to be used by the person who generated the file.

      "... your decompressor ONLY decompresses 'uncompressable' files"

      That is the point here. This thread is about how one might go about compressing files that had been generated by somebody who filtered out compression-resistant files.

  164. Re:Compression by jareds · · Score: 2

    Anyone willing to bet either way?

    Yes, assuming appropriate restrictions against storing information in the compressed file's name or attributes, I'll give you N:1 odds that you cannot write the program G, where N is any positive number. X can also be any number.

    Or, save yourself time, don't try to write the program at all, just send me money.

  165. Re:Compression by jareds · · Score: 2

    It's not so easy to generate a file to resist compression. If you had and used a method for determining whether a file could resist compression, then I could compress your generated file simply by counting how many compression-resistant files preceded it in some sort order and writing that number as a binary file. Decompression would proceed by counting that number of compression-resistant files in the same order and writing the file found at the count.

    However, your decompression program would have to be smaller than M - log_2 (N) = log_2 (2^M/N) bits, where M is the number of bits in the compression-resistant file, and N is the number of compresion-resistant files preceding it in your sort order. So, the number of compression-resistant files would have to be small relative to the number of possible files if you want more than a bit to write your program. Since this is not the case, you would fail.

  166. Re:Compression by jareds · · Score: 2

    Nope, you don't have to record it, simply attach the part of the original file following the block of zeros to your decompressor program. The "compressed file" is the part of the original file preceding the block of zeros. Your decompressor prints out the contents of the compressed file, then 1024 zeros, then it reads from itself the remainder of the file and prints that out. The bet can be won; not in every instance of course, but in the long run.

    That reflects a mistake on the part of Mike Goldman. If you want your decompressor and compressed file to be a total of length N, you can store log_2(N) bits of information in how you set the lengths of the two files. If you have more than two files, you can store even more information. Mike should have required that (number of bits in original data)+log_2(length of original data) > sum of number of bits in each compressed file and decompressor + log_2(product of lengths of each file and of decompressor), and he should have permitted himself to rename or change the file attributes of any file he wished in case information was being stored therein.

  167. Re:Compression by jareds · · Score: 2

    Yes, or simply require that the contestant submit a single program (with prescribed file name) which can regenerate original.dat and is shorter than original.dat.

    In that case, the outcome of the bet clearly depends on the machine model. If the machine model is "complete Debian Linux running on Pentium", I would conjecture that the bet can't be won, but it is far from clear (obviously, a succinct powerful language is needed, maybe A+). On many machine models, the bet can be trivially won.

    No, the machine architecture doesn't matter. The bet can be won only if you are very astronomically lucky.

    I'd gladly stake large amounts of money on this, though it would feel like I'm running a scam.

  168. Re:Compression by jareds · · Score: 2

    If you allow me to specify the machine model ahead of time, you'd lose. This was observed by Anonymous Coward in article 466.

    As I understood the challenge, there was no requirement that the data I give you be random, though I might ordinarily give random data for such a challenge. If that was the machine model, however, I'd just give you a 0 followed by random data, and you would lose.

  169. mirror by jon_c · · Score: 1

    geocity's seems to be blocking it.

    here's my mirror

    http://streamripper.sourceforge.net/compression.ht m

    -Jon

    --
    this is my sig.
  170. Re:New Contest by Tetsujin · · Score: 1

    And the winner is... CmdrTaco!

    --
    Bow-ties are cool.
  171. Shortcoming of the solution by Tetsujin · · Score: 1

    I think the biggest flaw in the given solution (and one that wasn't pointed out too well) is that, regardless of what filesystem is used, how efficiently the filenames are stored, etc., the compressed files cannot be stored in less space than the original data.

    Think about this. For every one byte he takes out, he adds another file to his collection. Even if you store all the files in a linear block, like a TAR file or something, and store their filenames by a mechanism that requires no storage, you still need to encode the *length* of each of those blocks - requiring at least one or two bytes, depending on the block size. The challenge didn't say anything about the cost of storing the files in a filesystem (inodes or the like) but it did say that the compressed files, with the decompressor, needed to take less space than the original file - and the solution can't be stored in a way that meets those requirements, even under very generous assumptions about the conditions of the storage.

    The challenge should have been stated better, but the challenger still lost IMO.

    --
    Bow-ties are cool.
  172. What about... by BradleyUffner · · Score: 1

    What about taking that huge set of random data, representing it as a number. If you break that number down into say, a series of smaller numbers that when one number is raised to the power of the next number gives you the orriginal number? I've been thinking about this problem all morning, and this solution hit my while driveing to get lunch.
    It may be a dumb idea, it may not even work at all. If it wouldn't work could someone explain to me why it wouldn't work?
    =\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\= \=\=\=\

  173. The Best Resolution ... by Bob(TM) · · Score: 1

    Mike: Give back the $100 with hearty congrats for finding a way to meet the challenge with true bar room challenge flare (con the con).

    Patrick: Accept the $100, publically concede the impossibility of meeting the spirit of the challenge and relinquish honorable claims to the $5K (as you've done in the link). Buy a round for Mike and all the patrons at your favorite pub. Save the story as a "good one" for the rest of your life.

    --

    The little guy just ain't getting it, is he?
  174. The use of context information in compression by cribeiro · · Score: 1
    I agree partially with Patrick, specially because the contest was so loosely stated. However, if you do a thoroughly analysis of the result, Patric did not compress the files. All he did was to change some specific character with a end-of-file marker. The EOF will *always* take some space, be it in the form of a marker (such as Ctrl-Z) or in the form of a file length attribute. In this sense, I don't think that Patrick got a real solution to the compression problem. So, even discounting the file-system overhead (the empty space to fill out disk blocks), you have to take in account that the EOF marker is a implicit part of the compressed data, stored outside the files. He simply gambled some bytes inside the file for some bytes out.

    Anyway, I think that he could go a bit further. Depending upon the random number generator, it is possible to compress the file a lot - mainly because there are no true random number generators, except if you use some natural random source such as radioactivity decay. If the random numbers are in fact "pseudo random numbers" generated by some mechanical/computational device, then it is possible to compress the data, using some particular algorithm.

    Going a litttle further, it is interesting to note that the combination of the compressed data and the decompressor is not completely self contained. The decompressor includes to a lot of information that is not stored inside the executable file. For example, libraries can be linked. Even if the decompressor is written in assembly, the problem is still there - a lot of information may be contained in a single assembly instruction.

    As an exercise, imagine the following scenario:

    • The original data has some particular sequence repeated in some places (even a random file can have such repetitions; in fact, a file containing only ones can be possibly generated by the best random number generator; it's improbable but it is possible)
    • This particular sequence can be regenerated by a simple CPU instruction
    • The decoder (written in assembly) uses this assembler instruction the reconstruct the sequence, using only one byte of code.
    Is it a *real* compressed file? In this case, the CPU does contain the information needed to decompress the file; even if the information is not inside the compressed file itself, the information is part of the context. This is just to complicate things a little bit further...

    In the end, we need to keep mind that using information that is implicit in the context is all that compression is about. We are removing redundant information that can be inferred later based on the context. In a contest like this, the context should be clearly stated to avoid this kind of discussion. Looking at things this way, maybe Patrick was right after all...

  175. Re:Did I read something wrong here... by Ashran · · Score: 1

    RTFS (read the fucking site ;)

    From then on, I was just trying to avoid the situation where he said that I had made up either the whole thing or the emails where he agreed to multiple files. That's why I said I didn't want the $5000 or for him to reveal my identity. I thought that if he thought I wasn't going to say anything publicly it would make him feel more secure and more likely to talk to me and other people, and to tell the truth to comp.compression, thereby making it more difficult to deny everything later on. I said he could keep the $100 because I think that taking it back would be admitting that I failed or cheated.

    --

    Before you email me, remember: "There is no god!"
  176. Slightly different "decompression" technique by CptnKirk · · Score: 1
    While I'm no student of information theory, it would seem possible to complete this challenge in the following way, and for the following reasons:

    1. A Compressor does not need to be provided.
    2. There is no time limit which limits how long the decompressor can take.

    Solution:
    Create a program that can generate and check. You already know what the md5sum of the result should be. In fact you know what the result should be. Systematically generate data in memory, or with a seed built into the "decompressor", and then check your results. If they match the md5 you can then write this file out to disk and you're done.

    Given enough time I think I can prove mathematically that this will work.

    In fact I can speed up the computation significantly if I break up the original file into chunks, and md5 those chunks. Store those md5 sums in the "decompressor". Now I know the chunk size and md5 sum, I can permutate the bits until I get it right, and then write out the bits, chunk by chunk.

    Now you have a method which you can use to "decompress" the orginal file out of nothing. Of course you can argue that this is not a decompressor, so fine. Given the same orignal program you can have it "decompress" a file which contains the first x bits of the original. You can use this file as your "compressed" file, and as a means of speeding up your generation (err, I mean decompression).

    With this system I believe you can beat this challenge given enough time. This system does not hide any data in the file system. This system will also take a lot of time to realize a solution, but it will happen eventually.

    You can use this for arbitrary data too, as long as the original file is larger than the decoder program and the original file's SHA or md5 hashsum represented as a file.

  177. Looks like a valid contract by criticalrealist · · Score: 1
    Mike Goldman made an offer to $5000 if certain conditions were met. By the literal reading of his words, it appears that the challenger met those conditions. It is irrelevant that the correspondence took place over e-mail. These two people have a valid contract. Now it's time for Goldman to pay up.

    People can't go around making promises and then refusing to stand behind them.

    If Goldman doesn't pay, I'd urge the challenger to retain a tech-savvy lawyer (one in the challenger's home state). The lawyer should simply send a letter to Goldman threatening to sue. That should take care of that. If Goldman still won't pay, the challenger has the option of suing him.

    Disclaimer: I am a law student. The above is not to be construed as legal advice, and no attorney-client relationship between the author and any recipient of this message should be inferred.

    --
    I am not a lawyer.
  178. Re:That wasn't in the challenge by Richy_T · · Score: 2
    We recently had a "Active Learning day" at work. The staff was divided into two teams and various activities occurred throughout the day, the aim being to accumulate clues. When all clues were accumulated, you had to use them to open a briefcase which contained the final activity of the day.

    The theory was that each teams set of data was incomplete so they would have to cooperate to get the suitcase open, thus teaching us the importance of acting as a whole team or something.

    However, I have always had an interest in locks (and "unlocking" them) and the briefcase had those cheapy combo locks on it so I wandered over and had it open in 20 seconds.

    Lesson: Teams are good but sometimes an individual makes the difference.

    Unfortunately, because things had gone wrong, they decided to usurp my victory and give the final activity (locate an item in the woods) to both teams. My team lost. I'm not sure what the message there is but I chose to learn my own message which is "Messing around for the day is much more fun than sitting in an office working"

    Rich

  179. Re:That wasn't in the challenge by tooth · · Score: 1
    Actually, the HDD platters where probably very effective flywheels. Tin lids might not store the energy as well, and could be prone to slipping on the surface as the energy would be transferred to quickly.

    It probably all balances out in the end though (weight vs inertia etc.)

  180. Re:There is no solution... by The+Flymaster · · Score: 1

    Sure, but then your decompressor has to include the entire contents of the original file, or a compressed version thereof. In order to get this compressed version, you would, of course, need another compressor, and YOUR compressor would be more or less worthless.

  181. Compressing Random Data - absolutely insoluble. by adamtegen · · Score: 1

    First of all, the page with the story was down, so I didn't get the jist of it. But, for the argument, it I might guess that the challenger's intent was to post an insoluble problem: submit a program (or algoritm, again I didn't read the story because it was down) that is guarenteed to compress ANY random data of an arbitrary size. The problem is this: If you can compress a single bit out of truly random data, your result is truly random, thus you could compress an arbitrary length sequence to 0 or 1 bits. Obviously this is impossible. You can't represent something with nothing. At test of this is simple. The challengee could not compress a file with a single bit of 0 and a single bit of 1, if each were in a vacuum. Thats my 2 cents.

  182. Re:There is no solution... by malfunct · · Score: 1
    Along these same lines of thought, we can assume that the data was computer generated in some way (doubtful the creater of the challenge flipped pennies to figure each bit of the output or something such as that) in which case there is a repeatability factor in the data. Chances are if you chose a file of sufficient size you could get a complete (or most) of a repetition of the file. (Classic problem with pseudo-random generators of course). This would allow you to get near 50% compression of the data with no "tricks".

    Not that I feel like trying a challenge like this but it seems if you ask for about 60gig of data and analyze the hell out of it you will be able to get sufficient patterns in the data to get the compression needed. A complete waste of time but would beat this challenge because its not very good. Of course this is what any math or science type person would be able to tell you so I'm not saying anything revolutionary.

    --

    "You can now flame me, I am full of love,"

  183. Re:Compression by malfunct · · Score: 1

    Well since splitting the files was allowed, let the filesystem record the position with a file split. The key then would be to get a sufficiently large run that you could "throw out" to allow you to fit in the decompressor and the filehead penalty. Possible in theory but I doubt the example file had any sufficiently long runs.

    --

    "You can now flame me, I am full of love,"

  184. Re:That wasn't in the challenge by malfunct · · Score: 1

    This is definitely off topic but I'd think the bigger advantage of DVD's or CD's as tires is that they are far nearer to perfectly round than just about anything common that I can think of and most certainly when compared to something cut out of wood or paper.

    --

    "You can now flame me, I am full of love,"

  185. Re:but we're talking about skew corrected random d by boldra · · Score: 1
    blonde said:
    most of the time you can't find such an algorithm with random data (and this can be proved.)
    Yes, it can be proved that not all data can be generated from a formula. This was the subject of Gregory Chaitin's presentation which was covered on slashdot in March.

    One of the interesting caveats in this argument is the reliance on the Turing Halting Problem, which means that this proof is restricted to finite time. However, I don't think "Patrick" would have got his $5000 if the "decompressor" had taken infinite time to run, despite the slack wording of his challenge :)
    --
    I've been posting on the net since 1994 and I still haven't come up with a good sig!
  186. Random data... by abiogenesis · · Score: 2

    If you try to compress random data using one generic algorithm, you will sometimes have size savings, and sometimes will end up with a larger file. If you are smart enough to put a "not compressed" flag in your data file when it becomes larger, you can reduce the size on average.

    But, if you are free to select your algorithm after seeing the data file, chances are you can always compress the data (although I may not mathematically prove this). Remember the article here on slashdot a few weeks ago where a prime number, when gunzipped, produces the source code for a DVD decoder. Well, you may say that they are not the same because the source code was compressed with gzip beforehand, but please think a different algorithm than gzip (as in mandelbrot fractals) and you *might* end up with an algorithm which compresses some data which is random to you but not-so-random statistically or mathematically.

    PI is infinitely random, but can't you compress it in a few lines of code?

    Although I should admit that finding the algorithm itself would be mostly by chance.

    --

    Donate free food to the hungry at The Hunger site.
    1. Re:Random data... by streetlawyer · · Score: 1
      Although I should admit that finding the algorithm itself would be mostly by chance.

      Indeed it would, and on this basis might I invite you to send me $100 in return for a payment of $5000 the next time you are struck by a meteorite? Compared to randomly searching the space of algorithms for one that will compress a large file of white noise, my odds are quite absurdly generous.

    2. Re:Random data... by streetlawyer · · Score: 1

      Yes. Specifically, it's called "ludicrously overpriced insurance that only a sucker would ever accept".

  187. Reality? or Data Hiding? by lordmage · · Score: 1

    Since when did Data Compression become Data hiding? It is simple that he "Moved" data but never compressed it. The quest was for a Lossless Data Compression and any person involved in basic data compression would testify that this did NOT meet the requirements of Compression but the requirements of data hiding.

    He lost, he complained, but simply put he paid 100 dollars to play word games with the intent.

    --
    I can program myself out of a Hello World Contest!!
  188. By that logic... by JonahC · · Score: 1
    Well if you're going to include the size of storing the file on the system as part of the "file size", then technically you can compress with:
    mv original.dat a
    And type this to decompress:
    mv a original.dat
    Effectively saving 11 bytes!
  189. Re:Compress the Inode! by jgarry · · Score: 1

    If he was a REAL PROGRAMMER he would have written his own file system! Tailored to the actual data!

    --
    Oracle and unix guy.
  190. file size by little_blaine · · Score: 1

    Patrick's assertion that the size of the "compressed" files plus size of the decompressor is less than the size of the original is false. The compressed files have replaced the ASCII character "5" with the ASCII character EOF (end of file), a one-to-one substitution that doesn't reduce size.

    His trick implicitly relies in the way "wc -b" operates: it doesn't count EOF in the number of bytes it returns. One could argue that the true size of the compressed files is + and construct a tool to measure file size according to this definition, which would show that Patrick failed the challenge.

    However, Mike's arrogant attitude and posturing to the newsgroup gets him no brownie points either. His challenge is impossible in spirit, but his poor wording allows trickery.

  191. Re:Compression by pallex · · Score: 1

    >This is very very standard stuff in HTML.

    Its not standard in `plain old text` mode though - at least, not around these parts! Why is this altered from the input at all?

  192. Re:Compression by pallex · · Score: 1

    good point. actually, i toyed with pre (i`m scared to put slashes and things in my posts now!) once, and it used a differenct font, so i stopped! I`m not really a html coder though!

  193. Re:Maybe he should have used the "lossy" lzip algo by DrXym · · Score: 2
    No he wouldn't. The goal is to compress the data, such that decompressing it returns it to its original form.

    Lossy implies decompression will *not* return it to its original form.

  194. Unless I'm mistaken...(which I probably am) by simon_cockle · · Score: 1

    Data can be expressed as a number.
    Random data can be expressed as a random number.

    Take A to be the random number.
    Take B to the the nearest prime number.
    Take r to the difference between the prime number and the random number.

    B+r=A

    Likewise take C to be the nearest prime number to r and so on...

    B+C+D+E+F+r=A

    The data in your compressed file could consist of;

    32334/2000/231/22//23222

    (obviously you would need to delimit them differently)

    Your decompresser would add the 32334th prime to the 2000th prime to the 231st prime to the 22nd prime and add 23222 to the result.

    So long as you know the data to be compressed why could you not *rig* something like this to work?

    Then again...

    I assume it wouldn't...

    --
    ________ semper ubi sub ubi
    1. Re:Unless I'm mistaken...(which I probably am) by Mr.+Obvious · · Score: 1
      This is covered in the compression FAQ, a few lines above the place where the challenge appears. I quote:
      Another idea also related to primes is to encode each number as an index into a table of primes and an offset relative to the indexed prime; this idea doesn't work either because the number of bits required to encode the index, the offset and the separation between index and offset is on average not smaller than the number of bits of the original bit stream.
      There are several other good arguments in the FAQ concering other numerical tricks and why they won't work --- for the better defined problem of compressing arbitrary files. The interesting question is whether one of these clever ideas could be made to work for Mr. Goldman's challenge, which is something quite different, since it involves compressing one and only one file. As someone has already observed, if Mr. Goldman would play along, you would only need to bet lucky one time in 50 to (probably) make money.

      However, in this case (using prime numbers and offsets), he would need to choose a file which (when seen as a nunber) is quite close to the nearest prime number, closer than the size of a decompressor which can do things like calculate the nth prime number and such. Seems unlikely to me, but I'm not inclined to do the math today...

      Ron Obvious

  195. here's a decompressor by aozilla · · Score: 2

    works on freebsd, anyway
    -------------
    #!/bin/sh

    fetch http://211.0.18.242/~patrick/other/original.dat

    --
    ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
  196. Re:There is no solution... by ssimpson · · Score: 2

    Kind of true: you can't universally compress random data using a single program, but this doesn't mean you can't compress a single instance of random data.

    This competition is kind of "balanced" because:

    • The contestant can attempt and mount any number of different types of compression - all he has to do is find one instance that meets the requirements of the challenge.
    • The challenge owner can choose a file of the specified length that is "strong" against compression: Test the data with tools such as Diehard and Ent - if the file doesn't seem "strong enough" then create a new one.

    Given a sufficiently large file (a couple of Gb would probably do) then I think virtually any file could be compressed with a specially crafted compressor and decompressor. Given a 2Gb file, you only need to achieve .0001% compression to have 2147 bytes to write the decompressor.

    --
    "Mary had a crypto key, she kept it in escrow, and everything that Mary said, the Feds were sure to know."
  197. The page has been removed by GeoCities by locutus074 · · Score: 5
    There is now a mirror available.

    --

    --

    --
    We have fought the AC's, and they have won.

    1. Re:The page has been removed by GeoCities by sulli · · Score: 1

      It's back, I followed the initial link. (Maybe grafx were deleted?)

      --

      sulli
      RTFJ.
    2. Re:The page has been removed by GeoCities by lmd · · Score: 1

      It is gone again. In fact, the entire directory is unavailable for viewing. Geocities sux if they can't handle getting slashdotted. The page (mirror here) is just text.

      --


      Just my $0.04 (adjusted for inflation)
  198. Re:What arrogancy! by Theodore+Logan · · Score: 1

    unfortunately no, and I can't remember where I first came across it either. If you have any further information I'd be very interested, as I would find it quite embarrasing if I credit him [Derek Bok] on false grounds.

    --

    "If you think education is expensive, try ignorance" - Derek Bok

  199. What arrogancy! by Theodore+Logan · · Score: 2
    As seen in his letter posted to the newsgroup after the challenge had been accepted, Mike was very arrogant, and quite stupid. As many slashdotters have noted already, being able to compress a single "random" file, is not the same as being able to compress any "random" file. $5000 is a lot of money for a not very clearly thought-through competition, and though it is obvious that this guy can't cough up although he should, he ought to apologize and give the $100 back. Bending the rules are what these kind of contests are about. Besides, threatening with legal action makes him lose his face completely.

    Hopefully, at least, he learned his lesson, which most likely was all Patrick ever hoped for anyway.

    --

    "If you think education is expensive, try ignorance" - Derek Bok

    1. Re:What arrogancy! by GoldenBear · · Score: 1

      He did apologize for the miscommunication and offered to give the challenger his $100 back. he never "threatened legal action" as many people on here have mentioned. he only said perhaps the rules of the contest should be written in "legalese".
      Mike would do this to avoid situations like what happened, where someone could argue that they DID win the challenge, even though the challenger did not really do what the spirit of the challenge intended.
      I'm not trying to say the the challenger lost, but i'm getting frustrated now that i've read several posts implying the mike was ready to unleash a team of lawyers or something.

      Eric

  200. Re:Compression by e_lehman · · Score: 2

    Pick a large enough length, and you are virtually guaranteed that there is a 1024-byte block of zeroes in it somewhere.

    Your reasoning contains an error. What were you thinking, arguing with an MIT guy, for heaven's sake! :-)

    As the length of the file increases, the number of bits needed to *specify the position* of this run of zeroes also increases. And you're going to have to record that position somewhere in your decompressor or compressed file.

    For example, to a first order, you need a file of length around 2^1000 bits to expect a run of 1000 consecutive zero-bits. Great, But specifying the position of this run requires-- you got it!-- a 1000-bit number. We're hosed!

  201. Question for the information theorists by ahaile · · Score: 1

    Many people have been citing the usual argument against a universal compression algorithm: it's impossible to uniquely map all 2^n combinations of n bits into the 2^(n-1) available combinations of n-1 bits. But that wasn't Mike's challenge. As Patrick pointed out, Mike only required a challenger to compress a single specific file, which Mike would provide. Was Mike smart in making this bet? We know that some of the 2^n files Mike could provide are sufficiently compressible: he might just randomly generate an ASCII file or a file with a long string of zeroes, for instance. So the question is what percentage of the possible files are sufficiently compressible? "Sufficiently compressible" would mean that they compress enough that whatever compression algorithm is used can fit in the freed space. Mike's bet is only a good one if less than 2% ($100/$5000) are sufficiently compressible.

  202. Re:50/50 compression by John_Booty · · Score: 2

    That's almost a good idea, and it made me laugh when I read it, but your decompressor would surely take more than one bit, wouldn't it? So the total size of the decompressor plus the compressed file would certainly be greater than the original file (this was a key part of the original challange).

    http://www.bootyproject.org

    --

    OtakuBooty.com: Smart, funny, sexy nerds.
  203. Nope, no can do... by drycht · · Score: 1

    Yeah, I thought the same thing at first, however...

    If you read carefully in the email corrispondance, the random data was gotten from random.org, which offers truely random data, not pseudo-random data as generated by some algorithm. Unless you have a nifty algorithm for generating the values of radio noise (random.org's external source of white noise), this won't work.

  204. Re:Compression by clare-ents · · Score: 2

    "
    I reckon it's actually *always* possible to win $5000, no matter what file is generated, providing it is of sufficient length.
    "

    Nope,

    If I have N bits of file. There are 2^N possible files that can be created. There is no way you can always have a compressed file of length 2^(N-1) in length simply because you only have 2^(N-1) distinct possibilities leaving 2^(N-1) files untouched.

    However, I intend to compress 2^(N-1) files into a length of 2^(N-1) and the other 2^(N-1) files into a length > 2^N. Since there's a 2^-1 chance of getting each file I expect an average return of $2400. (2^-1 * $prize - $entry).

    --
    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
  205. Re:Compression by clare-ents · · Score: 2

    Sorry, I meant to say length N-1, and length N.

    I've muddied my explanation however hopefully the point is clear [it's possible to compress 50% of the files to smaller than the starting length at the expense of the other 50% being larger].

    --
    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
  206. 50/50 compression by clare-ents · · Score: 2

    Ignore the last bit

    Supply it at random.

    50% of the time you're wrong and lose your $100

    50% of the time you're right and win $5000

    Of course this loses the spirit of the competition [design a universal compressor which is clearly impossible] but wins the other competition of 'how do I make the arrogant prick look like a fool'

    --
    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
  207. Compression by clare-ents · · Score: 4

    Ok, I've done some information theory, I know it's impossible to create a perfect compressor. Boxes arguments and all that.

    However, to make money, you don't have to.

    All that is required is a compresser that saves 1 bit on better than 1 in 50 attempts.

    For example, if we can save 1 bit on 50% of the attempts - even if the other 50% of the attempts results it catasrohpically large files. On average we've lost the compression war.

    However, on 100 attempts, we've paid $100 x 100 or $10000 but we've won on half of these - netting us $250000 or a cool $240000 profit.

    Anyone want to demonstrate this?

    --
    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
    1. Re:Compression by edp · · Score: 2

      "... this challenge where the rules are 'compress this specific file which has been carefully generated to resist compression'."

      It's not so easy to generate a file to resist compression. If you had and used a method for determining whether a file could resist compression, then I could compress your generated file simply by counting how many compression-resistant files preceded it in some sort order and writing that number as a binary file. Decompression would proceed by counting that number of compression-resistant files in the same order and writing the file found at the count.

    2. Re:Compression by edp · · Score: 2

      I'll retract my previous comments. Trying to find a 1024-byte block of zeroes requires bits to record where the block is found, which consumes the potential savings. Whether counting compression-resistant files saves space depends on the choice of "compression-resistant."

    3. Re:Compression by edp · · Score: 3

      "Since this is not the case, ..."

      Sorry, not so. The proportion of compression-resistant files necessarily decreases as file length increases. To see this, consider that if a file had a nice block of 1024 bytes of zeroes somewhere in it, we could easily write a simple program that wrote the bytes before that block, then 1024 bytes of zeroes, then the bytes after that block. This program would easily fit in less than 1024 bytes, plus the bytes before and after the block of zeroes. So such a file is compressible. I will show that the proportion of files that are not compressible like this decreases as file length increases.

      To make this easy, I'll count for compressible files just the ones where a block of zeroes begins at a multiple of 1024 bytes from the beginning of a file. The chance that a single 1024-byte block (selected randomly from a uniform distribution) is not all zeroes is (2^8192-1)/2^8192, a very tiny bit less than one. The chance that all n blocks of n 1024-byte blocks are not zeroes is (2^8192-1)/2^8192)^n. Obviously this approaches zero as n increases. Thus the proportion of files that are not compressible decreases as file length increases. Pick a large enough length, and you are virtually guaranteed there is a 1024-byte block of zeroes in it somewhere.

      When we open compression up to a wider range of techniques, the same effect occurs; the proportion of files that are not compressible under any given, reasonable technique decreases as file length increases. This makes Goldman's challenge a bad offer for him -- the fact that the average file size change must be positive under any compression technique in the limit does not mean the probability of a positive change in a randomly selected file is large enough to give him a positive return on the challenge.

    4. Re:Compression by bmongar · · Score: 2

      Since you get to look at this file before hand there is little chance involved you can tweek make an algorithm to specifically compress that file, or at least drop it by a byte or two as in the above case.

      --
      As x approaches total apathy I couldn't care less.
    5. Re:Compression by uigrad_2000 · · Score: 1
      To see this, consider that if a file had a nice block of 1024 bytes of zeroes somewhere in it, we could easily write a simple program that wrote the bytes before that block, then 1024 bytes of zeroes, then the bytes after that block.

      Pick a large enough length, and you are virtually guaranteed there is a 1024-byte block of zeroes in it somewhere.

      You'll never get this to succeed.

      You've admitted such a file would be HUGE. That, in fact, is your problem. How many characters should be read beore printing all the zeros? You'll have to have that number in the decompression program, and it will be more than 1024 bytes.

      What this guy did was cheating. Instead of putting the position of the "MAGIC_CHAR" in the decompression file, he just broke up the original file into 217 multiple files, and read them in, one after another. It looks like he saved 216 bytes, but he didn't. There was more than 216 bytes of overhead produced in the filesystem structure.

      If he would have tarred the 217 files together, the size of the tar would be larger than the sum of the 217 lengths. This is because tar records the length of the 217 files. (Otherwise, they could never be un-tarred).

      Basically, he was lucky that the challenger allowed multiple files. This allowed him to hide the "decompression instructions" in the file system. The challenger was dumb, and deserved to lose his money.

      --
      Free unix account: freeshell.org
    6. Re:Compression by zby · · Score: 1

      If that was true you could take the Universal Turing Machine and run your progran on it. For every data it would produce a program and data set - the program and data can be considered together the compressed data. This way the your program would be the universal compressor. This is impossible. I can be impossible to algorithmically find the uncompressable data anyway, and random data is not enough for that as well. The Chaitin Omega number theory seems applicable at this problem.

    7. Re:Compression by zby · · Score: 1

      Sorry - I'd replied to your previous post before I read this one. The right mathematical theory seems to be the Chaitin Omega Number theory. There are uncompressable strings, but it can be impossible to say if given string is incompressable, I mean you can say if it is compressable, but some time you can't prove if it is not. The random data send by the challenger does not seem to be a good move - there is much chance you can find patterns in random data (just like long substrings of 0 - what is the probabillity to find them - I don't know but it's a growing function of the size, ok asume the guy deleted all such substrings - so how about consecutive 01 - you cant prevent all patterns in a random string). Conclusion: the challange seems to be quite fair for both sides.

    8. Re:Compression by pat_1729 · · Score: 1

      Sorry, but no; your mind itself is just an algorithm.

      More verbosely: The original file is the input to your mind. The resulting decompressor+file is the output of your mind. This output is just a compressed version of the input produced by the algorithm in your brain (and your books, and any experts you consult, etc).

      So the whole process, from beginning to end, is just a compression algorithm. I cannot describe the algorithm, exactly, but simple counting shows that it is unlikely that it will successfully compress a randomly chosen string, no matter how long the string is.

    9. Re:Compression by haruharaharu · · Score: 1

      It is trivially possible; it need only work for one A and you get that A before creating G

      --
      Reboot macht Frei.
  208. /.ed by grue23 · · Score: 1
    anyone have a link to a mirror? the site is /.ed

    Shooting from the hip without having been able to read the content, it is only theoretically impossible to compress random data assuming uniform distribution of all possible characters. usually there will be enough variance in a finite sized randomly generated file to allow for some compression. the trick is making sure that the addition of the decompression map to the file does not make the file bigger than it was in the first place. (that's why using lossless compression on the same thing twice will frequently give you a bigger file)

  209. Re:Two Words For These Guys by PinkFloyd · · Score: 1

    Here ya go. GZipped. "1F8B0808EA9CE53A0003666F6F00732C2ED653F0C8CF492DD 6E3E5020019F95E3F0D000000" blah blah blah lameness shmameness..

    --

    The face of a child can say it all, especially the mouth part of the face.
  210. Old trick by sepulcrum · · Score: 1

    The second i read the mail about multiple files i understood what Patrick was going to do. Putting information in filenames is a very old trick that was used to get around file system quotas for a pretty long time. But it is pretty stupid of Mike not to include a clause that filenames themselve are also data.

  211. Re:There is no solution... (Or is there?) by monkeyfamily · · Score: 1

    OK, how about this: The "random" data had to be generated from some finite set of data. Whether the original "randomness" comes from a bunch of a pictures of lava lamps or static from the radio, it's still a finite amount of information at the beginning. Then, you apply a bunch of randomizing algorithms to it to generate a large file. If I request a file to be generated that is larger than the initial input and the source code to the randomizing algorithms. Then my problem is to find some initial input and combination of algorithms to generate the file - assuming a long time to brute-force it, you might be able to win in this way - the compressed file is then the initial randomized input, and the decompression algorithm is the exact same series of randomization algorithms that was used to create the random data.

    How does this sound?

    NB: I've not studied information theory.

  212. a troll by ReidMaynard · · Score: 1

    A different slant on a troll, nothing to see here, move along.

    --
    -- www.globaltics.net

    Political discussion for a new world

  213. Some things can be taken for granted. by LAI · · Score: 1
    In any dispute like this, it is reasonable to expect certain things to be taken for granted. At its most basic level, this includes stuff like the definition of a file and of a file system. Nobody would argue that Patrick had successfully completed the challenge if he printed out the original file in fontsize 30, and again in fontsize 9, and included the "compressed" file in an envelope along with a small magnifying glass.

    Other assumptions, obviously, are a little more obscure. Sure, Patrick would have done well to have read the FAQ, but even without having done so there are some things he certainly knows. Without having done any Information Theory, I know that the objective of any compression program is to take up less disk space, and I think it reasonable to assume that Patrick knew this as well. Therefore, as much as I applaud his creativity, what he wrote was not compression.

    The only issue remaining, then, is whether the wording of the challenge should override the information Patrick could reasonably be assumed to have. I think not. Every hacker instinct I have says that the challenge was intended to spur people to write better code, or at least to get them thinking about how code can be improved. Focussing on the wording of the challenge and saying the equivalent of "well, how was I supposed to know an envelope didn't count as a filesystem? It's not my fault a magnifying glass isn't machine-readable" goes counter to the spirit of the challenge and of constructive programming.

    Therefore, Patrick, I'm afraid you did not successfully meet the challenge. Reading what you wrote, I did not think about the space filenames took up until I read the email response to your final submission. I suspect you either made the same oversight as me, or chose to ignore it in favour of rubbing Mike's nose in his choice of words. Either way, you made a valiant effort, but did not earn the prize.

    LAI

    "How do I type "for i in *.dvi do xdvi $i done" in a GUI?"

    --
    :eof
    1. Re:Some things can be taken for granted. by Mr.+Obvious · · Score: 1
      Ah, but you're assuming we should understand this in a context of real hard disks, operating systems, etc. etc. Franky, it seems quite clear to me that the real context here is silly bets.

      An example. Years ago, I'm walking down the street in New Orleans, this dude walks up and wants me to bet him money on some outrageous card game or something of the sort (I forget). I tell him "forget it, you wouldn't be betting if you didn't know you'd win." He looks me up and down, and changes his plan entirely, and says "I'll bet you $5 I can tell you where you got those shoes".

      Do I clarify the rules? No, I figure it's worth $5 to learn how the punch line to this joke, so I say (to quote Mr. Goldman) "sure", and the guy says "You got those shoes on yo' feet, in Jackson Square, in the City of New Orleans, in the State of Lousiana."

      I gave him the $5. I played by the rules of silly bets. Mr. Goldman should pop up the $5K. He placed a silly bet, and he lost.

      Ron Obvious

  214. Bar room bets? by TobyWong · · Score: 1

    Goldman comes across like a pompous ass in this series of emails and he deserved to be showed up.

    After Patrick accepts the challenge Goldman quickly jumps on the newsgroup to proclaim how some sucker fell for his trick.

    At the first sign that his little charade is in trouble he hints at legal action against the challenger.

    How classy.

    Patrick, you should settle this bar room bet in the same manner most bar room bets are settled: a haymaker to the jaw of the jackass challenger.

    --
    - Toby
  215. depends on how you pay for your space by wytcld · · Score: 1

    What if you named the files randomly but recombined them in the order they'd been placed in the directory? Checking the date on the file would probably look like cheating to you, but it's quite possible to build a directory mechanism in which files are simply listed in order of creation with no explicit time stored. By your logic he only failed if he's not instantiated on such a system - but there's nothing intrinsic to his method that rules out doing just that.

    <p>Of course even such a system would have to have some marker of the beginning (or end) of each file, you'll say. But consider when I compress fruit into jam. It's the same degree of compression if I end up with five one-gallon jars or on five-gallon. Sure, it uses more jars and labels the first way, but no one's going to claim I've made more jam. And if there's a cubbord I need to put the jam in that the five-gallon jar won't fit into, the one-gallon jars may even be the more effective compression scheme - less rather than more costly storage because I don't need a new cubbord.

    --
    "with their freedom lost all virtue lose" - Milton
  216. Do something stupid, lose $100... by ShunScene · · Score: 1
    Clearly Goldman's challenge was designed to stop helpless newbies from posting "I've found a great new compression technique" in comp.compression, when what they really (re)discovered is an old information hiding technique.

    So he upped the stakes, put your money where your post is, or run back to comp.information.hiding ....

    Where's the problem?

    ShunScene

  217. No money owed... by Wavicle · · Score: 1

    After reading that page, I have to wonder if there is some language barrier that is being argued. Case in point:

    Last, the contestant will send me a decompressor and a compressed file, which will together total in size less than the original data file, and which will be able to restore the compressed file to the original state.

    According to the wording of the challenge, in order to "complete" the challenge you had to produce "a compressed file". Note the singular specific article. The person performing the challenge did not meet this requirement. He produced 218 uncompressed files.

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)
  218. hm by Ratcrow · · Score: 1
    Apart from the representation in inodes versus FAT tables versus everything else, don't many filesystems also have a minimum disk block size? If he split the file into 218 pieces, then each one would take up 14430 bytes (on avereage, assuming uniformly random data), which is not evenly divisible by any significant power of two...meaning that unused space in each disk block at the end of each file would occupy far more than the savings gained by splitting the file in the first place.

    As an aside, I find it amusing that one of the people he quoted regarding the outcome of the contest was one David Scott, who sounds a lot like someone who also posts to sci.crypt, and gives ./ trolls a bad name. The SCOTT16U.ZIP controversy, in which he personally blasted Bruce Schneier, was particularly amusing.

  219. Re:There is no solution...but there is. by haplo21112 · · Score: 1

    Ok, I have done number theory, I have read all the posts, I still think this can be done especially since you have prior knowledge of the file your compressing. Given all the time in the world to analyze it, this could certianly be done...sounds like a fun way to use distributed.net actually. :>

    --
    Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
  220. Petty by Bobman1235 · · Score: 1
    No offense to the challenger (Patrick) of course, but this whole situation just seems petty and a waste of time. I understand that putting forth such a challenge may be irresponsible on the part of the challenger, as his wording did leave himself open to an attack such as this. But in my opinion, the entire challenge was merely an attempt to illustrate a point--that this type of compression is impossible.

    Now, I don't know a thing about compression or anything else, and never would have known such a thing was impossible, but that's besides the point. Had I in the least bit cared, I would have seen this challenge, tried it out, realized the futuility, and said "oh, would you look at that. He was right."

    Instead our society has raised everyone to be a bunch of babies about wording and specifics, just because everyone wants something for nothing. Despite what you say about Michael, it was obvious what he wanted of this challenge--to prove a point. Patrick completely ignored this point, took advantage of the fact that Michael didn't prepare for every nitpicking possibility, and found a loophole to an honest challenge. And to what end? Just so everyone would try to find similar innocent challenges and take advantage of them too? Simple product of a litigious society--argue everything until you win.

    Oh, and to say he was trying to make money by thinking people would flood him with 100 dollar checks is just ridiculous--he stated that the challenge was impossible, and how many people are dying to beat an impossible compression challenge? Not many.

    Maybe technically Michael should pay 5000 dollars... but technicalities aren't the best way to win everything.

    1. Re:Petty by matrix29 · · Score: 1

      The biggest problem is not that the counting theory denies compression of ALL files, it is that the counting theory means that many files can be compressed. So it is only a matter of time before a pure random file is easily compressed (but a 50% to 60% random file will never be compressed).

      The challange has a huge obvious weakness to probability. The fellow just used a weakness in the counting theory when it comes to file systems and already allocated space (something I mentioned quite a while back). A fine jest I think.

      --
      "Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality.
  221. Ironic - Goldman's challenge is no proof. by ckedge · · Score: 2

    Section 9.2 of the FAQ, the Counting Theorem, says that "no program can compress without loss *all* files of size N >= 0".

    Mike Goldman's challenge explicitly involves "a specific known file" of size N, where both N and the data within are arbitrary and predetermined.

    Mike's intent is obvious, but his logic is horrible. It's quite clear that Mike's challenge has HUGE gaping holes in it's mathematics, statistics, and logic as compared to the Counting Theorem. There's nothing "for sure" about Mike's challenge. Just reading it by itself made me sure he's going to lose, if not this time, then another time. He set himself up to lose.

  222. gunzip $1 by Idolatre · · Score: 1

    This guy's a genius.. he should patent using gunzip $1 to compress a file!

  223. There is no solution... by grammar+nazi · · Score: 1

    because you can't compress random data. He can just generate a file of random data. This is just a scam so he can make $100 off of every entry.

    --

    Keeping /. free of grammatical errors for ~5 years.
    1. Re:There is no solution... by grammar+nazi · · Score: 2
      you are correct that eventually the sequence of characters will come up. However, in 1/2 of the cases, your initial Z will be larger than the size of the string of characters, whether it is in magnitude or precision (or both).

      A similar approach may be used with the logistic map (a=4 ... X(n+1)=4 X(n) * (X(n)-1), 0X(n)1 )

      --

      Keeping /. free of grammatical errors for ~5 years.
    2. Re:There is no solution... by Phillip2 · · Score: 1
      "because you can't compress random data. He can just generate a file of random data. This is just a scam so he can make $100 off of every entry."

      No you are completely wrong here I am afraid. The challenge was not to compress random data but to compress a specific piece of data.

      It is quite possible for a truely random number generator to produce a string of zero's or the complete works of Shakespeare. Both of these are clearly compressible whilst also being a valid part of a random number stream.

      You should be able to compress any individual piece of data that you choose. Of course the decompression script that you write will not necessarily work for all such data. The technique which was used in this case would not work for instane if there was a massive under representation of 5's in the data set.

      Now I admit that the story that we have here is one sided, but it seems clear to me that the guy who set the challenge is the one who didn't understand information theory well enough, and that given that he was foolish enough to set the challenge he should pay up.

      Phil

    3. Re:There is no solution... by Phillip2 · · Score: 1
      "You didn't read the whole web site. Even the contestant admits that he didn't compress anything. "

      You didn't read the whole site, or you would have realised that he suggested some perfectly good ways of compressing the data, although he didn't happen to use them in this case.

      Now if you read the compression newsgroup FAQ you will see that there are two challenges. The first does indeed offer a challenge to compress random data, but the second one, the one mentioned here does not offer that challenge. If you know what the data is before hand then it is NOT random data, and is it almost certain that you will be able to compress it.

      Look at is this way. If I get some random number generator and send you 10 numbers, then these numbers are indeed random. If I send you the same 10 numbers again, do you have twice as many random numbers. Well of course you don't. This situation here is analagous.

      Phil

    4. Re:There is no solution... by Microsift · · Score: 1

      I thought Kvasir had all of the answers!

      --
      My other sig is extremely clever...
    5. Re:There is no solution... by Raffaello · · Score: 1

      You didn't read the whole web site. Even the contestant admits that he didn't compress anything. In other words, his "solution" was not based on finding somthing "recognizable" in the single random data file, but rather by putting part of the random data file in the file names of each of the many smaller files he broke it into.

      In other words, he used a loophole in the way the challenge was stated. The challenge doesn't explicitly forbid "hiding" data in file names, even though that is specifically mentioned in the comp.compression FAQ. By his own admission, the contestant didn't even attempt to compress the data itself, simply hide some of it in the names of the smaller files.

    6. Re:There is no solution... by Kvasir · · Score: 1
      I'm not an expert on compression algorithms or anything related to computer science for that matter but you can easily 'compress' any arbitrary set of data.

      Any data file can be expressed in binary form as a number. This number can be defined as n=a*b+c. Where c is 1 in the case of prime numbers and 0 otherwise.

      The challenge could be won by choosing a relatively small data file (say 1MB), and finding a, b, and c. A lengthy process, but not a difficult one.

      All the decompresser has to do is multiply files a and b, and add c.

      --
      this signature is a virus, please make me your .sig so I can continue to spread :/
    7. Re:There is no solution... by Kvasir · · Score: 1
      not at all

      the contest was not to be able to compress random data, but to compress a specific file of random data: original.dat

      take the binary expression of original.dat, preferably choosing a small value, convert it into decimal, find the values a, b, and c. Save them as seperate files, or as different lines in a single file. Decompress. Collect $5000

      It is not elegant, does not work for large files (I'd hate to try to find a, b, and c for a 5MB+ file), but given the time to look at and analyse the data file, it would work.

      --
      this signature is a virus, please make me your .sig so I can continue to spread :/
    8. Re:There is no solution... by Kvasir · · Score: 1
      you're right.

      To a historian [me] it just looked like the common sense solution...

      --
      this signature is a virus, please make me your .sig so I can continue to spread :/
    9. Re:There is no solution... by fireboy1919 · · Score: 1
      Ever studied quantum physics? All we can characterize is probability. Its pretty simple to get absolutely random data. Just sample the spin direction of electrons at fixed intervals for a specific point where the probability is known to be 50%. It is likely that he just used the modulus approach to generate random numbers, in which case four numbers (at most) characterize the entire set.

      But, your assumption is still wrong.

      --
      Mod me down and I will become more powerful than you can possibly imagine!
    10. Re:There is no solution... by bnoji · · Score: 1

      Here's an idea from fractal geometry...using the Mandelbrot set, you should be able to compress a string of numbers of arbitrary size into just 3 numbers. The M set has infinite detail which means that within it, we should be able to find any string of numbers. Lets say random data equates to the string: 76213, you could just go through the M set Zn = (Zn-1)^2 + C until the Z's are the sequence of numbers you're compressing. By storing the value of Z where the sequence begins, the value of C (doesnt really matter) and the nuumber of iterations, you could express the string as Z,C,i. This could take a long time for longer sequences, but in theory they should occur sometime. This could take a really really...long time.

      Ok, I have no idea if the above idea would even work but it seemed to make sense.

  224. sorry! by grammar+nazi · · Score: 2
    I will avoid splitting the sentences in the future.

    What were the /. readers' opinions on this subject? It sounds to me like the person offering the $5000 was scamming everyone because it's theoretically impossible to win. This fellow from Japan knows this, but due to a wording technicality in the contract, he figures he can scam the $5k from the scammer. They both seemed to be good natured chaps about it, but I feel that they still have some deep resentment towards each other.

    Any other opinions?

    --

    Keeping /. free of grammatical errors for ~5 years.
    1. Re:sorry! by Jack+Wagner · · Score: 1

      If I recall correctly Captian James T. Kirk became the captian of the Enterprise because he was able to "cheat" on a test (the Kobayashi maru) that was supposed to be un-winnable. Thus because he was resourceful he was rewarded. I submit that real life is pretty much like that and when you are looking for solutions to real world problems it's quite alright to cheat if you must. I realize that the Kirk reference is pure fanatasy but the truth is we can learn quite a bit from that if we teach ourselves to think beyond what the guidlines of how we are taught to think.

      I think the $5000 should have been paid, pure and simple.

      --


      Wagner LLC Consulting Co. - Getting it right the first time
  225. Page No Longer Exists :-( by DanBari · · Score: 1

    Sadly, so many people have overloaded the page, that I suppose GeoCities has taken the page down :-(

    --
    Fruit flies like bananas... Time flies like the wind...
    1. Re:Page No Longer Exists :-( by lmd · · Score: 1

      The page is back up at Geocities. Apparently they took it down because they thought it was pr0n. That is what usually happens when thousands of people go to a file in a short period of time.

      --


      Just my $0.04 (adjusted for inflation)
  226. This is why we need lawyes by MeNeXT · · Score: 1
    I'll never againg question the need for a good lawyer. These two people READ there own contract. In other word said one thing but meant another.

    On a seperate subject compression of random numbers is not imposible just very unlikely. Maybe one day we will find a way to represent any randum number in a simple formula. Because we are unable to do this today does not mean it is impossible, just unlikely ( I will not mention how many inventions we take for granted today which were thought impossible in the past)

    --
    DRM? No thanks, I'll just get it somewhere else...
    1. Re:This is why we need lawyes by MeNeXT · · Score: 1
      an you represent any integer quantity between 0 and 10 using only an integer quantity between 0 and 9?

      Not my point, but if I use binary to represent the integer or ASCII I will have two diff. file sizes.

      But it is beside the point. Because we do not know how to it does not mean we are unable. If some one had the solution to this problem it would not be a problem. All I'm saying is that nothing is impossible just highly unlikely.

      --
      DRM? No thanks, I'll just get it somewhere else...
  227. Okay, another method of cheating by dfenstrate · · Score: 1
    IANACP;IAME (i am not a computer professional, I am a Mech-E)
    but....
    You've got a large amount of truly random data..... except few things on a computer are truly random- they have to come from somewhere, right?
    So, lets say that Mike used a certain program to generate the file to be compressed. (Didn't he use some third party company?) Now if one knew the algorythm used to generate the file, would it be possible to reverse the program and obtain the original random seed, and then submit the random seed and the algorythm?

    If Mike arbitrarily rearranged the bits, or used multiple transformations of the original data, this would be obviously be difficult or impossible to do- but what if he was lazy and only ran it through once, is recovering the random seed possible?

    --
    Alcohol, Tobacco and Firearms should be the name of a store, not a government agency.
  228. Its a pity the contestant didnt solve it simpler by PinkyAndThaBrain · · Score: 1

    Even with two files and without hiding data in the names you can split data between the decompressor and the file and the the ratio of sizes between the two is an extra bit of information not expressed in the sum of sizes, with a sufficiently large data file this will always allow you to solve the challenge exactly as originally stated.

  229. Its an excercise in information hiding... by PinkyAndThaBrain · · Score: 1

    The wording of challenge allows it to be easily solved... rewording it after the fact is indeed slimey.

  230. it's "file size", and that's all it is by mr.ska · · Score: 2
    However, the decompressor DOES depend on the filename numbering in order to work; it is part of the algorithm and those bytes should be charged against the decompressor size. Then Patrick loses.

    Well, if Mike had specified anything other than "file size", I'd agree. But he said "file size". And the file size of what Patrick sent is less than the file size of the original file. Balls to directories taking up space, or more files meaning more space is taken up. File size is file size.

    I don't believe that the intent was followed. However, as this challenge was "impossible" to begin with, I think that the $5K should go to Patrick for creativity. He met the letter of the challenge, which in this case should be good enough as the challenge as it was intended was "impossible".

    Mr. Ska

    I slit a sheet
    A sheet I slit

    --

    Mr. Ska

  231. Re:Nice try but wrong by jibster · · Score: 1

    Pity. I thought you were holding out something on us there ;-)

  232. Someone correct me please... by Drakantus · · Score: 1

    But couldn't you write a program that will calculate pi, and then compare pi to the random data, and in any place where they overlap, substitue the starting digit and size from pi. Maybe this doesn't work because the "starting digit" itself would usually be bigger than the data saved, but thats the only problem I can see. With the option of asking for a file of any size, I'm sure an 8gig file will have several places where the bits match up directly with some part of pi. This program would be slow- calculating pi takes time I understand- but there was no time limit or speed requirement.

    --
    I love going down to the elementary school, watching all the kids jump and shout, but they dont know I'm using blanks.
  233. Rules Lawyerism by Courageous · · Score: 1


    This is an obvious case of rules lawyerism,
    where the contestent looks for loopholes in
    the rules in order to exploit the game itself
    instead of the spirit under which it was set
    up. Why people should get upset when the other
    person responds "but that is not what I meant"
    when it should be perfectly obvious that they're
    not going to mean that is beyond me.

    C//

  234. Has the fat lady sung? by shibboleth · · Score: 1
    Patrick, IANAL, but I think you're entitled to that $5K. See if a lawyer will take it on contingency.

    If you weren't in Japan I'd recommend a http://www.prepaidlegal.com/, which is a legal hmo that's usually been v. good for me.

    --
    "Be thankful you are not my student. You would not get a high grade for such a design :-)" - Minix pro
  235. Smug b**tard by Zakk · · Score: 1

    Just how smug do you have to be to bother to announce a challenge like this just because you are so proud of yourself for knowing a fundamental result in information theory. The answer: Really, really smug.

  236. Re:I wrote a compression program by n7ytd · · Score: 1
    • I haven't seen any good compressors so far though that do well when you continue reapplying them.

    That's because good compressors remove as much redundancy as possible on the first pass.

  237. The Bigger Picture by muck1969 · · Score: 1
    A compression algorithm that can accurately compress/decompress a random data stream is worth more than $5,000. Think about it ... the application of such an algorithm in today's data-hungry world would prove the value to be beyond a mere dollar amount.

    By disguising it as a contest ... the owner of the contest might be able claim the algorithm as property of the contest and reap the rewards, or the algorithm could be made public for everyone to benefit (slim chance?).

    BTW; compression via division/square root of values will tend to have remainders that must be stored, often creating more data than the original. ref: post 483

    --
    m.mmm..myyy ... sssissxxxtthh bbboottle offf mmmmmoouunnnttain ddeeewww.. in thhe pppassst ffffif
  238. Re:Maybe he should have used the "lossy" lzip algo by jwilhelm · · Score: 1

    i think he was joking... the lossy compression was one of those wonderful April Fools posts we all loved so much.

  239. How about the phrase by iomud · · Score: 1

    Never con a con man?

  240. Re:That wasn't in the challenge by bdktty · · Score: 1

    Coffee can lids. Round, thin, cheaper than harddrive platters. I have seen many a geek go bad because they get so stuck on getting around the rules that they are trapped in to a certain train of thought. Thinking outside the box is just like realizing that the spoon doesn't exist.

  241. Come on folks! by Dallas+Truax · · Score: 1

    This fellow paid $100 for the opportunity to stick it to some some other fellow. He made a $100 bet, on the off change that he'd get $5000. If he had a brain in his head, he'd send his address and get his money back, and laugh about joke. Sticking to the "I met the stated rules" just look dumb, especially when you're out $100.

    Information THEORY aside, the challenge can be met.

    Imagine the bits of a file, lined up, bit-0 through bit-n, as the output of a boolean logic function. The number of variables required to produce those bits is Z, where 2^Z = Total bits.
    So, 1024 bits requires a maximum of 10 variables, but usually less, depending on the data.

    Write a program that will find the boolean equation that, will produce bit 0 of the origional file if all variables are set to 0, and bit 1 of the file, if all but the last variable is set to zero (a binary count, you get the idea?)

    The resulting equation will have log2(Total bits)+4 unique symbols. Now reduce the function to eliminate all unneeded terms. Now write the equation to a file, prepending the binary number equal to the number of bits in the origional file.

    This procedure, if run on a large data set, like megs, will result in the highest level of compression possible, because boolean reduction WILL find the minimum possible symbols for even random data.

    One reason this procedure is no good is that the processing power, and storage capacity needed to achieve it is incredibly large, and exponential with origional data size.

    I've written the code and yes, it works great, but I have not been able to do files of any size due to the fact that expanding the boolean terms takes 2^(bits in origional file)/2 terms, which is quite a chunk of change.

    Information THEORY is just that. If you can code up the above, and run it on data sizes in the megs, you won't need my $5000 bucks, you'll have plenty of money rolling in on your own.

    --
    Above comment is personal opinion. Poster is not a spokesperson.
  242. Re:Nice try but wrong by Kvasir · · Score: 1

    how about:
    100000=100^2
    my compressed data reads: 100
    the decompressor original=data^2

    with larger numbers you probably make larger savings, especially if you can use squares.

    Change my original post to read any number can be defined as n=a^2 + b. It is still true and for small (yet still huge) files ie: 100KB-1MB, you make massive savings.

    Again though it works as a one off compression for a given data set. And again, I'm no mathematician and not exceptionally talented computer-wise: I could be completely wrong.

    --
    this signature is a virus, please make me your .sig so I can continue to spread :/
  243. Re:Nice try but wrong by Kvasir · · Score: 1
    I concede....

    Should have thought about it more, but it was a nice distraction from my history thesis anyway...

    69 hours and counting

    --
    this signature is a virus, please make me your .sig so I can continue to spread :/
  244. Re:A million monkeys will get a million solutions by jmony · · Score: 1

    no I've been working on this. The problem is you will need a key at least larger than the bytes you don't represent. If you have 500 bytes, the needed informations to represent those bytes will be 500 bytes or more.

  245. Re:Dangerous game it is to issue vague challenges by jmony · · Score: 1

    Well, that is interesting. If you can find a mathematical equation which is smaller than the "string" to compress, that would work. However, there will always be strings you can't make equations for... Sometimes, you would need a equation of 128 bits so you would enlarge the file. You could use a mark to say "yes the following is compressed and the equation is x bytes long" or "no, the following x bytes are uncompressed". Maybe you could also use less than 256 mathematical symbols so you can make better equations... In fact, if this method could work for like 64 bytes on a 1kb file and that the method is applied to a lot of files, the size of the compressor (which could be huge, because it tests mathematical equations) could be won by the reduction of all those files. But, again, this method will not ALWAYS work... so there will be a time when a given file can't be compressed anymore.

  246. Still a subtlety by NanoProf · · Score: 1

    Doesn't quite work that way, but the difficulty is very subtle. You have to tell the decompressor where in the file the string of zeros starts. A string of 1000 zeroes happens once in 2^1000 bits. So you need a 1000-bit pointer to say where to find it. The key to the impossibility of compressing the truly random file is the true randomness. Any pattern and yes it can be compressed. No pattern at all, and no compression. Cutting the file in two is sneaky, but it's information hiding, not compression. There's information in where you choose to cut the 2^1000 bit file.

    --
    Curtains for windows?
  247. True randomness only at infinite length? by NanoProf · · Score: 1

    Interesting. Maybe any 'random' file of finite length is not entirely random- there is information in its length itself. Perhaps this is what allows for your loophole of the (extraordinarily unlikely, but not impossible) random file which is all zeros. This is only possible with a finite length file. An infinite length random file will demonstrably have no patterns.

    --
    Curtains for windows?
  248. Make it a poll !! by rixster · · Score: 1

    You know, various stuff like:
    Who won (legally?)
    Who won (morally?)
    etc etc.
    And something todo with cowboyneal, probably....

    --
    Two wrongs may not make a right, but three ....
  249. Mike's right by daniel_isaacs · · Score: 1


    At no time did Mike say that multiple 'compressed' files were acceptable. He only used the singular "file". However, he was asked, point blank, "Are multiple compressed files acceptable?" and gave a less than straightforward answer.

    Mike ought to send the C note back. Even though Patrick did not complete the challenge.

    --
    - Dan I.
  250. but we're talking about skew corrected random data by blonde+rser · · Score: 1

    But, if you are free to select your algorithm after seeing the data file, chances are you can always compress the data The point is that this file is skew corrected so you can't find such an algorithm. Besides most of the time you can't find such an algorithm with random data (and this can be proved.) Think about it like this, compression depends on some repition or pattern. For every set of data that has such a pattern there are multiple adjustments to this data that will destroy this pattern. So in a stricktly random sence more sets of data are uncompessable than compressable. Obviously this isn't how the proof would be outlined but you can see how it would fall from there.

  251. Re:How about $5000 worth of shares in a Ponzi sche by DivineOb · · Score: 1

    You don't seem to know what a ponzi scheme is... A ponzi scheme is something where the early investors are paid off with money paid in by the later investors. It grows and grows as new people join, and all the early people are happy... but soon, the rate of people joining slows and there isn't enough money to pay off people anymore and the whole thing collapses... this has nothing to do with that whatsoever...

    --

    I must burn in hell, suffer and pay for my sins
    But Gods the one who's losing, Satan always wins!

  252. Re:How about $5000 worth of shares in a Ponzi sche by DivineOb · · Score: 1

    Well, I don't see this other scheme as a ripoff at all... Even taking your example about fermat's last theorem... what do you care if someone is willing to pay to find out what the theorem is? I mean, there are entrance fees to enter into tournaments all the time... what if I had a quake tournament where the entrance fee was $5 and the winner, if he had been undefeated, would get prize money, otherwise nothing... whats really the difference between the two?

    --

    I must burn in hell, suffer and pay for my sins
    But Gods the one who's losing, Satan always wins!

  253. Re:How about $5000 worth of shares in a Ponzi sche by DivineOb · · Score: 1

    Well, he did mention that the 100 was to scare off people who weren't serious about it... I mean, I probably would have requested the data if it was free and I knew about the challenge... if he had charged even $1 though, I would have considered it too much trouble...

    --

    I must burn in hell, suffer and pay for my sins
    But Gods the one who's losing, Satan always wins!

  254. Re:How about $5000 worth of shares in a Ponzi sche by DivineOb · · Score: 1

    Going back over the exachange covered on the webpage, I'm more inclined to agree with you now.

    --

    I must burn in hell, suffer and pay for my sins
    But Gods the one who's losing, Satan always wins!

  255. How about $5000 worth of shares in a Ponzi scheme? by Shoten · · Score: 1

    Unbelieveable. This guy comes across a theoretically impossible task, and turns it into a scam basically, whereby anyone who wants to try to meet the challenge must pay up front. This is truly disgusting.
    BR>If this were something where a person sponsored it with a little web space, publicity, and the prize money, then ok, I can understand that, that's fantastic. But this was nothing more than a very thinly veiled scheme to make money off of other people's attempts at a valid challenge.

    Hey, I wish Fermat's Last Theorem hadn't been solved. Then I could put up a site, saying "I'll pay $5,000 to anyone who can solve it, but first send me $100. Once I have the money, I'll tell you what Fermat's Last Theorem is."

    --

    For your security, this post has been encrypted with ROT-13, twice.
  256. Re:How about $5000 worth of shares in a Ponzi sche by Shoten · · Score: 1

    Uh, yeah, I know that. It's called humor, dude...it's not made to fit like car parts in a Ferrari :)

    In other words, I was associating one scumbag scheme with another.

    --

    For your security, this post has been encrypted with ROT-13, twice.
  257. Re:How about $5000 worth of shares in a Ponzi sche by Shoten · · Score: 1

    The difference is mostly in amount, and what purpose it really serves. If you're running a quake server (hardware investment, bandwidth costs) and charge $5 a participant...well, I can see that, particularly since you're offering a service as well. But this guy didn't really offer much, did he? Do you really think that it incurred that much cost/trouble/work to get a bunch of random data (which someone else generated, btw) and FTP it to someone?

    It'd be like trying to charge $800 for a Red Hat distro, to cover shipping and handling or some strange thing like that. And what's more, it makes me question the motive behind the whole thing. My opinion is that this wasn't motivated by scientific curiosity, but just greed. He picked a challenge he didn't think anyone else could meet, and charged $100 a head just to try. Where do you think that $100 ended up?

    --

    For your security, this post has been encrypted with ROT-13, twice.
  258. Nothing special... by hyrdra · · Score: 1

    The challenge, I think, was well stated. It was clearly stated the data was to be compressed. In this case, the data was hidden by way of the filesystem. There wasn't anything clever or even novel about what that guy did. He didn't complete the challenge and the time spent on both parts was wasteful. Of course there is going to be loopholes, and literal translations, but this was simply a play on words. The point is both parties understood the terms, both were trying to trick each other AND make money at it, and both just ended up wasting time.

    --


    "I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
  259. A million monkeys will get a million solutions by Edgewize · · Score: 1

    An MD5 or any other checksum is only "unique" for a given file it it has as many bits as the original file. If you have 36 bits of data and do a 32-bit checksum, there are 16 (2^(36-32)) possible files that will all have that same checksum. Even if you use two non-overlapping algorithms, you cannot recreate the original file. You can create a file that has the same MD5, the same CRC32, and the same file length, but the data will be completely different.

  260. A real solution - Re:I have an idea... by Zeinfeld · · Score: 2
    Just find the spot in pi that contains a string of numbers that can represent the file. It's gotta be in there somewhere.

    First off it would be better to choose a pseudo-random sequence that is easier to generate, something like RC4 which is designed to run fast.

    Secondly the index into the random stream would increase with the number of bits, so there might be a section of Pi that met the criteria, chances are that it would be a very large number of bits out.

    There is however a possible approach that is arguably not cheating. Chances are that the person setting the challenge did not generate a perfect random stream. If they are generating megabytes of the stuff they are most likely using some pseudorandom number generator to do so.

    Cryptanalysing the input stream to identify the source could well reveal the random number generator. This would not be a cheat in my opinion.

    Of course the person setting the challenge could have done the job properly as I did for a one time pad system once. What I did there was to hook up a microphone, record a reasonably random source (a loudish fan) and sampled at 16 bits. Then after determining that the ergodicity of the source was at least 4 bits per byte I ran blocks of 1K of data through a modified version of MD5 to remove any bias in the input stream.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  261. Psuedo-random generators by markmoss · · Score: 2

    I suspect that a good enough mathematician with the right expertise probably could meet that challenge without "cheating" by shifting data into the directory. In the simplest case, if Goldman derived the file from a psuedo-random number generator, you guess the formula used, then your "decompressor" is the generator and your "compressed file" is the seed value. More generally (like if the data was truly random from an inherently random quantum process), you try matching pseudo-random (did I finally get the spelling right?) formulas against parts of the file until you hit something that fortuitously matches for long enough. Snip that segment out of the data file. Write the "decompressor" to copy all the bytes before, then generate the pseudo-random sequence, then copy the rest of the bytes.

    However, I think if you are genius enough to make that work, you probably have easier ways of making $5,000...

    1. Re:Psuedo-random generators by markmoss · · Score: 2

      I'm not much into this sort of mathematics -- but one thing I do know is that people can see "patterns" in purely random data. (Constellations, Schiaparelli's Martian "canals", all mystics, ...) And an infite random file should include every possible finite sequence. That is, if it's really random and really long, somewhere in there should be 1,2,3,4,..., and 2,4,6,8,..., and so on. What you need for this challenge is to find just one sequence in a finite random file that continues for long enough to exceed the length of the program that generates it, by enough to also cover the code to copy the rest of the data from file to file.

      That is, you need a program less than m bytes long that copies bytes 1 to n, generates 1,2,3,...,m, and copies bytes n+1 through the end of file. Since the program size doesn't scale with the length of the sequence, there is some m for which this will be true. And then you ask for a random file long enough to ensure that 1,2,3,...,m will be somewhere in it. (I don't know how to calculate the required length, and I've got a feeling the file might have to be sent as truckloads of 40G tapes...)

      Of course, the challenge doesn't promise that the file will be entirely random. It might be the output of a quantum random number generator, filtered to eliminate obvious sequences. But if it's filtered, it's not random, and some form of actual compression should give a slight reduction.

      This is not a usable compression algorithm -- it's just a one-time trick that squeezes a few bytes out of one particular extremely large file, with far more work than compressing any file is worth. But it would meet the terms of the challenge.

  262. Did I read something wrong here... by t'mbert · · Score: 1

    ...or did the author ask that his personal information NOT be posted to the newsgroup, but then he turned around and one-upped that, and posted it to /.!

    Did I miss something...

    --Those who can, do; those who can't, sue

  263. Please tell me this is a troll by drew_kime · · Score: 1

    There are some prime numbers which can be represented by (2^n)-1 (e.g. 3,7,31 but not 15 or 63)

    That sounds like you're saying 15 and 63 are primes. Is that bad grammar or bad math?

    --
    Nope, no sig
  264. Bad troll, no cookie by drew_kime · · Score: 1
    --
    Nope, no sig
  265. That wasn't in the challenge by drew_kime · · Score: 4

    If the guy had read the FAQ, he would know that such tricks are not compression

    Did the rules as stated say that the solution had to comply with the principle laid out in the FAQ? Not that I can see. If you're going to pup up a $5,000 challenge, you damn well better post all the rules.

    For instance: I had a friend who entered a paper airplane contest. He signed up for the "maximum distance" part. The only rules were that the contest organizers supplied the single piece of paper, you had five minutes to fold or tear it however you want to, and then everyone threw them at the same time. The one that stops moving farthest from the launch line wins.

    He waited until 4:45 into the time, then crumpled the paper up into a ball and threw it. While everyone else's planes were circling lazily around, his went straight, landed, and rolled several more feet.

    While you can debate all you want about whether his wad of paper counted as a "paper airplane" -- and we did debate (hey, it was college) -- he did saitsfy the rules of the contest. Had there been enough money on the line for someone to threaten legal action, I suspect he would have won.

    --
    Nope, no sig
  266. I have an idea... by jasonk3 · · Score: 1

    Just find the spot in pi that contains a string of numbers that can represent the file. It's gotta be in there somewhere.

  267. what a bunch of losers. by 3am · · Score: 1

    what a loser.

    you know, i wouldn't give patrick the 5k either

    in fact, i would have just loved it to death if this was a scam, and he lost $100 in his attempt to be clever.

    however, in the same spirit, i think that the the chalengee (Mike) should send him the 5K US in whatever the Italian equivalent of the penny is, and send it COD... how many tons of coins would that be?

    --

    A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
  268. New Contest by Public+Enemy · · Score: 1

    Send me your $100 entry fee to take part in my new contest, there is a $5000 prize! To win all you have to do is get FP with a goat post in every single Slashdot article that appears over the course of the next 24 hours.

    1. Re:New Contest by Public+Enemy · · Score: 1

      $100 entry fee received, thanks for your participation. Good luck!

  269. Someone should pay me $5000 for even reading that by CrazyJim0 · · Score: 1

    Wow alot of worthless text... damn

  270. Why doesn't this work? by Stridar · · Score: 1

    Sorry if someone already asked this question, but this thread has had a lot of activity.

    Wouldn't this be an acceptible solution to this challenge.

    Recall file length can be stored in a DWORD.
    Begin by searching through the data file for the most occurances of subsequences of length M &gt 4. For each occurance of this sequence, you can store the position in a table, costing 4 bytes, and remove the sequence from the data file, reducing it by M bytes. This gives you a savings of M-4 bytes of each occurance. Depending on your luck, you may be able to find enough occurances to allow the decompressor to work, which would need to house the following information :

    1. the original subsequence (size M)
    2. array of locations N (size 4*number of occurances)
    3. filename to decompress (2 bytes)
    4. output filename (1 byte if needn't be the same). Or alternativey 3 & 4 can be stored in the command line.
    5. code to implement the reintroduction of the subsequences into the original code, something along the lines of

    FILE * fin = open(INPUT_FILE, READ);
    FILE * fout = creat(OUTPUT_FILE);
    for (i = 0; i &lt sizeof(N)/4; i++)
    {
    while(current_position &lt *N) putc(getc(fin), fout);
    for(j = 0; j &lt sizeof(M);j++) putc(M[j], fout);
    }

    Anyone know of a general result giving the probability of an occurance of an M digit sequence appearing more than N times in a file of arbitrary length L? With appropriate values for M,N, and L, one minus this number would be Mike's chances of not having to pay...

    No sig

  271. For reference - he lost. by Haeleth · · Score: 1

    The challenge he accepted was to produce "a compressor and several COMPRESSED files whose
    total file size [was] less than the original uncompressed file and from which [he could] regenerate the original uncompressed file".
    (Emphasis mine.)

    The files he produced were not compressed. They contained blocks of the original data unchanged. Ergo, he failed the challenge, even under the modified terms.

  272. Re:Nice try but wrong by marcs · · Score: 1

    What if you tried to compress your 8 digit and 9 digit numbers (in your example) using more conventional compression methods? Could those (either both or at least one) be compressible so that your decompress routine decompresses the 8 & 9 digit numbers back to their original values before you apply n=a^2 + b

    Or will a & b always be very random also when this is applied to a highly randomized n, thus making them also equally hard to compress?

    Seems like if a & b are long enough, you might get lucky.

  273. The bogus ideas piggie bank by ren_cohen · · Score: 1


    First off, if you have the ability to compress randomly occurring data (that is, each bit has an approximately 50/50 chance of occurring), i seriously doubt you'd send $100 to some guy you don't know in the hope that he'll send you $5000 back. Why, he'd just reverse engineer the decompressor, patent it, get rich, and buy an island inhabited with young bikini clad virgins before the your check had even cleared.

    Secondly, i've been thinking about this problem for about 3 years now. :) I've tried so many methods and ways of interpreting and looking at random data, i'd have enough to write a book if i'd bothered to keep accurate logs. If i had $1 for every time i said "I'VE GOT IT!" i'd probably have $5000 already. Well, maybe that's a tad off the mark, but anyway, you get my point. Right, onto my conclusion...

    Imagine that you have three sets of numbers, X, Y and Z. X constitutes all currently compressible data (say, a bit of text). Y all seemingly random data (say, a text file compressed with zip, or Pi/2 rounded off to 4000 digits). And finally Z, all purely random-occurring evenly distributed data. Let A(n), B(n) and C(n) be functions that spit out the n'th number contained within the respective sets, in ascending order.

    Now, for all possible combinations of bits within a 128KB file (2^1048576) how many numbers fall into each of X, Y and Z?

    If an infinitely fast computer could calculate all numbers contained within the set Z, and X+Y was a few times greater than Z, then storing the value of 'n' might just compress...

    awww shit, no it won't... cos the decompressor would be huge. Ah well, that's another $1 into the bogus ideas piggie bank.

    1. Re:The bogus ideas piggie bank by ren_cohen · · Score: 1


      You're right, it would be impossible to compress ALL randomly occouring data, and it would probably be possible to compress a specific random block of data if you knew what to look for. I've never admitted that compressing random data with a general algorithm was possible, even though i still dabble with the idea of compressing certain types of random data with a general algorithm. The reason? It's just something to burn time. In the months i've spent researching the impossible i've learnt things i woulden't have otherwise. 4d math is a good example.

      The mistake both you and Mike make is that while writing a program for the general case of compressing random data is impossible, it is very different than writing a program to compress one specific file of 'random' data.

      'If' is a very usefull concept. :)

  274. Just curious but... by SlaveInRubber · · Score: 1

    Why not take the compressed file and move it to a base that is extraordinarily huge...

    A 3 megabyte file in base10 is how much in base10000? And -- just include a space after every XXXXX XXXXX to seperate them... I figure a small enough algorithm could be coded in ASM to accomplish this and -- since runtime isnt an issue (just that it accomplish it): why wouldnt that work?

    Disclaimer: I am sure that I have made some huge mistake here but I can't figure it out.. would love someone to point it out to me...

    --
    ----------- You look at life differently while suspended upside down and gagged.
  275. Slashdot 'mimics' BBSpot!! by javadood · · Score: 1

    I saw this article a few days ago on BBSpot.com. I'm an avid reader of both yet I have to side with Brian for posting his originally.