This false claim seems to keep resurfacing
every few years. Here is a simple way to see
that it cannot be true:
Suppose we make the less amazing claim that
we can compress any random file just 2:1.
Let's consider files of just two bits in length.
There are 4: "00", "01", "10", and "11".
Let's suppose our magic compressor function is
called C. Obviously, a 2-bit file must compress
to only one bit. Since there are only two choices
for one-bit files, C("00") must be "0" or "1".
C("01") must be the other choice, because for
compression to be lossless, no two different files
can compress to the same result (or else, how would
the decompressor know which one was originally
compressed?).
So we have constrained the function so far to be
(C("00") => "0" and C("01") => "1") or
(C("00") => "1" and C("01") => "0").
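For anyone who would rather check this than take it on faith, here is a small brute-force sketch in Python. It tries every possible way of assigning a 1-bit output to each 2-bit input and counts how many assignments are lossless, meaning no two inputs share an output. The helper names (all_files, lossless_compressors) are made up for this illustration.

```python
from itertools import product


def all_files(n_bits):
    """Return every possible file of exactly n_bits bits, as strings."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


def lossless_compressors(in_bits, out_bits):
    """Enumerate every function C from in_bits-bit files to out_bits-bit
    files and keep only the lossless (collision-free) ones."""
    inputs = all_files(in_bits)
    outputs = all_files(out_bits)
    found = []
    # Each 'assignment' picks one output for every input, in order.
    for assignment in product(outputs, repeat=len(inputs)):
        C = dict(zip(inputs, assignment))
        # Lossless means all outputs are distinct.
        if len(set(C.values())) == len(inputs):
            found.append(C)
    return found


if __name__ == "__main__":
    candidates = len(all_files(1)) ** len(all_files(2))
    survivors = lossless_compressors(2, 1)
    print("candidate compressors tried:", candidates)
    print("lossless ones found:", len(survivors))
```

With four inputs and only two possible outputs, every one of the 16 candidates has a collision, so the list of survivors comes back empty.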
Now, what will happen when we try to compress
"10"? This is where the contradiction occurs.
There are no unused 1-bit files left, and
so the compressor cannot possibly succeed in its
claim of achieving 2:1 lossless compression
even for the trivial case of 2-bit files.
This same counting argument can be used to formally
show that it is impossible to make a general
lossless compressor that can compress any more than
half of all random files of a given length by even
a single bit.
"Real world" compressors like Zip expand the vast
majority of random files -- they only happen to do
well on the "typical, useful" files that we use,
which contain less entropy than most random files.
To see this for yourself, write a small program in
your favorite language to make a pseudorandom file
of bytes, then run it through your compressor.
You will see that when you run it through PKZip,
gzip, or whatever, it almost always gets bigger.
(If you see compression, this probably indicates a
problem in your pseudorandom number generator.)
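One way to run that experiment without leaving Python is sketched below. It assumes the standard-library zlib and gzip modules are an acceptable stand-in for PKZip or command-line gzip (they use the same DEFLATE method), and it takes its random bytes from os.urandom rather than a hand-written generator. The function name compressed_sizes and the sample inputs are just choices made for this illustration.

```python
import gzip
import os
import zlib


def compressed_sizes(data):
    """Return the compressed length of data under zlib and gzip."""
    return {
        "zlib": len(zlib.compress(data, 9)),  # level 9 = maximum effort
        "gzip": len(gzip.compress(data)),
    }


def report(label, data):
    """Print original and compressed sizes side by side."""
    sizes = compressed_sizes(data)
    print(f"{label}: original {len(data)} bytes, "
          f"zlib {sizes['zlib']} bytes, gzip {sizes['gzip']} bytes")


if __name__ == "__main__":
    # Random bytes from the operating system: nothing for the compressor to exploit.
    report("random", os.urandom(1_000_000))
    # A highly repetitive, "typical" input for contrast.
    report("repetitive", b"the quick brown fox jumps over the lazy dog\n" * 20_000)
```

The random input should come out slightly larger than it went in, since the compressor still has to add its own headers and block bookkeeping, while the repetitive input should shrink to a small fraction of its original size.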
The frauds at ZeoSync are just trying to confuse the issue by invoking technical-sounding jargon from information theory. Their crazy claims do not even stand up to the simplest analysis by counting, much less real-world testing.
Rudi Cilibrasi