ZeoSync Makes Claim of Compression Breakthrough


Posted by ryuzaki0 on Tuesday January 8, 2002 @01:09AM from the claims-easy-proof-hard dept.

dsb42 writes: "Reuters is reporting that ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller (or our streaming video just became a lot clearer)..." This story has been submitted many times due to the astounding claims - ZeoSync explicitly claims that they've superseded Claude Shannon's work. The "technical description" from their website is less than impressive. I think the odds of this being true are slim to none, but here you go, math majors and EEs - something to liven up your drab dull existence today. Update: 01/08 13:18 GMT by M: I should include a link to their press release.

100:1 ? I don't think so... by Mr Thinly Sliced · 2002-01-08 01:14 · Score: 5, Insightful

They claim 100:1 compression for random data. The thing is, if that's true, then let's say we have data A (size 10,000).

compress(A) = B

Now, B is 1/100th the size of A, right (size 100), but it too is random, right?

On we go:
compress(B) = C (size is now 1)

So everything compresses into 1 byte.

Or am I missing something?
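
The argument above can be sketched in a few lines of Python. This assumes a hypothetical compressor that always achieves the claimed ratio, even on its own output; the function name and the 1 MB starting size are made up for illustration.

```python
def sizes_under_repeated_compression(start_size, ratio=100):
    """Sizes in bytes after feeding a hypothetical ratio:1 compressor its own output."""
    sizes = [start_size]
    while sizes[-1] > 1:
        # Ceiling division, so a non-empty file never drops below one byte.
        sizes.append(max(1, -(-sizes[-1] // ratio)))
    return sizes

sizes = sizes_under_repeated_compression(1_000_000)
print(sizes)

# One byte can take only 256 values, so at most 256 distinct
# files could ever be recovered from the final output.
distinct_one_byte_outputs = 2 ** 8
print(distinct_one_byte_outputs)
```

Whatever the starting size, the chain bottoms out at a single byte, which cannot distinguish between more than 256 originals.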

Mr Thinly Sliced
1. Re:100:1 ? I don't think so... by arkanes · 2002-01-08 01:21 · Score: 5, Insightful
  
  I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"
2. Re:100:1 ? I don't think so... by MikeTheYak · 2002-01-08 01:25 · Score: 5, Insightful
  
  It goes beyond bullshit into the realm of humor:
  
  ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.
  
  They just threw in a bunch of compression buzzwords without even bothering to check whether they have anything to do with lossless compression...
3. Re:100:1 ? I don't think so... by biobogonics · 2002-01-08 05:45 · Score: 3, Insightful
  
  I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"
  
  Actually, if you change the domain you can get what appears to be impressive compression. Consider a bitmapped picture of a child's line drawing of a house. Replace that by a description of the drawing commands. Of course you have not violated Shannon's theorem because the amount of information in the original drawing is actually low.
  
  At one time commercial codes were common. They were not used for secrecy, but to transmit large amounts of information when telegrams were charged by the word. The recipient looked up the code number in his codebook and reconstructed a lengthy message: "Don't buy widgets from this bozo. He does not know what he is doing."
  
  If you have a restricted set of outputs that appear to be random but are not, i.e. white noise sample #1, white noise sample #2, ..., all you need to do is send 1, 2, ... and voila!
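
A minimal Python sketch of the codebook idea, assuming sender and recipient agreed on the table in advance. The table contents and the function names are hypothetical; the second entry is invented purely so the table has more than one row.

```python
import math

# Hypothetical codebook shared in advance by sender and recipient.
CODEBOOK = {
    1: "Don't buy widgets from this bozo. He does not know what he is doing.",
    2: "Ship the order at once.",
}

def encode(message):
    """Return the code number for a message that is in the codebook."""
    for number, text in CODEBOOK.items():
        if text == message:
            return number
    raise ValueError("message is not in the codebook")

def decode(number):
    """Look the code number up again on the receiving side."""
    return CODEBOOK[number]

# Only this many bits of information are actually being transmitted.
bits_per_message = math.log2(len(CODEBOOK))
print(encode(CODEBOOK[1]), bits_per_message)
```

The "compression" looks enormous only because the set of messages that can be sent is tiny; anything outside the table cannot be encoded at all.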
Re:how can this be? by jrockway · 2002-01-08 01:21 · Score: 4, Insightful

I'm going to agree with you here. If there's no pattern in the data, how can you find one and compress it? The reason things like gzip work well on C files (for instance) is because C code is far from random. How many times do you use void or int in a C file? A lot :)

Try compressing a wav or mpeg file with gzip. Doesn't work too well, because the data is "random", at least in the sense of the raw numbers. When you look at patterns that the data forms (i.e. pictures and relative motion), then you can "compress" that.
Here's my test for random compression :)

$ dd if=/dev/urandom of=random bs=1M count=10
$ du random
11M random
11M total
$ gzip -9 random
$ du random.gz
11M random.gz
11M total
$

no pattern == no compression
prove me wrong, please :)
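
Roughly the same experiment can be sketched in Python with the standard zlib module, which implements the deflate algorithm that gzip uses. The seeded pseudorandom generator is only a repeatable stand-in for /dev/urandom, and the repeated C-like line is a made-up sample of patterned input.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
random_data = bytes(rng.getrandbits(8) for _ in range(100_000))
c_like_text = b"int main(void) { int i = 0; return i; }\n" * 2_500

for name, data in [("random", random_data), ("C-like text", c_like_text)]:
    packed = zlib.compress(data, 9)  # level 9, like gzip -9
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

Both inputs are 100,000 bytes long, so any difference in the compressed sizes comes from the patterns in the data, not from its length.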

--
My other car is first.
Re:Current ratio? by Sobrique · 2002-01-08 01:24 · Score: 2, Insightful

Of course, given that cpu speed increases faster than bandwidth, even if it is an issue now, it won't be in a year.
In this house we obey the 2nd law of thermodynamics by tshoppa · 2002-01-08 01:26 · Score: 3, Insightful

From the Press Release:
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
They left out Disobeying the 2nd law of Thermodynamics!
On the contrary! by Simon Tatham · 2002-01-08 01:35 · Score: 3, Insightful

Quite the contrary: if they had claimed to be achieving 100:1 compression on truly random data, they would be provably talking total rubbish. Consider the number of possible bit strings of length N. Now consider the number of possible bit strings of length N/100. There are fewer of the latter, right? Therefore, if you can compress every length-N string into a length-N/100 string, at least two inputs must map to the same output. Hence, you can't uniquely recover the input from the output - and the compression cannot be lossless.
The fact that they hedge and talk about "practically" random sequences is the only thing that makes it possible they're telling the truth!
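
The counting argument can be checked by brute force on a toy case. This Python sketch shrinks the problem to 2-bit inputs and 1-bit outputs (an assumption made only so that every possible "compressor" can be enumerated) and looks for one that never maps two inputs to the same output.

```python
from itertools import product

inputs = ["".join(bits) for bits in product("01", repeat=2)]  # 00, 01, 10, 11
outputs = ["0", "1"]                                           # every 1-bit string

# Every possible "compressor": one output chosen for each input.
compressors = list(product(outputs, repeat=len(inputs)))

# A compressor is lossless only if no two inputs share an output.
lossless = [c for c in compressors if len(set(c)) == len(inputs)]

print(len(compressors), "candidate compressors,", len(lossless), "lossless")
```

The same shortage of outputs applies at any length: there are 2^N inputs of length N but only 2^(N/100) outputs of length N/100.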
Re:how can this be? by Dr_Cheeks · 2002-01-08 01:46 · Score: 3, Insightful

If the data was represented a different way (say, using bits instead of bytesize data) then patterns might emerge...
With truly random data there's no pattern to find, assuming you're looking at a large enough sample, which is why everyone else on this thread is talking about the maximum compression for such data being 1:1. However, since "ZeoSync said its scientific team had succeeded on a small scale" it's likely that whatever algorithm they're using works only in limited cases.
Shannon's work on information theory is over 1/2 a century old and has been re-examined by thousands of extremely well-qualified people, so I'm finding it rather hard to accept that ZeoSync aren't talking BS.

ZeoTech Scientific Team fake? by dannyspanner · 2002-01-08 01:48 · Score: 4, Insightful

For example, at the top of the list Dr. Piotr Blass is listed as Chief Technical Adviser from Florida Atlantic University. But he seems to be missing from the faculty. Google doesn't turn up much on the guy either. Hmmm.

I've not even had time to check the rest yet.
Wow, now all data can be compressed in one bit!! by PEdelman · 2002-01-08 01:58 · Score: 2, Insightful

So, if practically random data can be compressed, I can compress the result again, and the result again, until I end up with one bit of data in the end? That's great! Imagine the implications: for example, every ordinary lamp is now a computer, because it holds exactly one bit of data, on or off. No wait, that can't be right.

--
Like science? Comics? Wicked...
Funny By Nature
Re:No Way... by CaseyB · 2002-01-08 02:07 · Score: 3, Insightful

It will (probably) get smaller, a reduction is more likely the bigger the file is.
It "probably" will not.
The reason is that in a random stream you may get repeating patterns (although you may not), and it's these repeating patterns which deflate uses.
Any encoding that saves space by compressing repeating data, also adds overhead for data that doesn't repeat -- at least as much overhead as you saved on the repetition, over the long run.
There ain't no such thing as a free lunch.
Re:Not random data by 3am · 2002-01-08 02:09 · Score: 2, Insightful

By your 'trivial' argument, compression of random data is impossible on any scale (you can't have a bijection between sets of different sizes).

--

A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
Re:No Way... by Eivind · 2002-01-08 02:24 · Score: 3, Insightful

Get yourself some random data (real random is of course somewhat hard to find! but the output from a crypto-strength RNG is OK) and zip it. It will (probably) get smaller, a reduction is more likely the bigger the file is.

Bullshit. There will be patterns, but the point is, all patterns are equally likely, so this does not help you. Don't believe me? Test it yourself. Pull, say, a megabyte from your /dev/random (this will take a while!) and then try to compress it with all the compressors on your machine. Zip, Compress, Bzip, you name it.

The odds are very high (as in 99.999% ++) that none of the compressors will manage to shrink the file by a single byte. In fact they will probably all cause it to grow very slightly.
It sounds like crap but ... by slashdot2.2sucks · 2002-01-08 02:36 · Score: 2, Insightful
If I read it correctly (if it can be read correctly), then they are:
1. Transforming the data to a complex vector space, C^n if you will.
2. Using some very complicated seed and algorithm to generate randomish data in this complex domain that approximates the transformed data.
3. Investigating the differences, and storing the differences with a "complex combinatorial series".
Yes it sounds like crap but it's not as empty as social texts.
So what really is the claim? by jopet · 2002-01-08 04:37 · Score: 2, Insightful

If it means they compress arbitrary random data, it is just bullshit. It is easy to prove that there exists some file that will not be compressible, and not much harder to prove that there are actually many more incompressible random files than compressible ones (read any text about Kolmogorov complexity). But of course most computer files are not at all random. Compressing a *randomly picked* computer file is therefore something different altogether, but it is still hard to guarantee a certain compression if the type of information stored in the file is not known. That's the reason why different compression algorithms for different file types exist. All in all, their claim is too fuzzy to say anything ... better compression is a certain thing of the future; guaranteeing compression for random files is just another cold fusion hoax.
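
The claim that incompressible files far outnumber compressible ones follows from a simple count, sketched here in Python. The function name is made up, and the bound assumes a lossless code that maps each n-bit file to a distinct bit string of any shorter length.

```python
from fractions import Fraction

def max_fraction_compressible(n, k):
    """Upper bound on the share of n-bit files a lossless code can shrink by >= k bits."""
    shorter_outputs = 2 ** (n - k + 1) - 1  # bit strings of length 0 .. n-k
    return Fraction(shorter_outputs, 2 ** n)

for k in (1, 8, 16):
    print(k, float(max_fraction_compressible(1000, k)))
```

The bound is always below 2^(1-k), so fewer than 1 file in 128 can be shortened by even a single byte, no matter how clever the algorithm is.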
Re:how can this be? Answer: BitPerfectTM by Alsee · 2002-01-08 05:55 · Score: 4, Insightful

Note the results are "BitPerfectTM", rather than simply saying "perfect". They try to hide it, but they are using lossy compression. That is why repeated compression makes it smaller, more loss.

"Singular-bit-variance" and "single-point-variance" mean errors.

The trick is that they aren't randomly throwing away data. They are introducing a carefully selected error to change the data to a version that happens to compress really well. If you have 3 bits, and introduce a 1 bit error in just the right spot, it will easily compress to 1 bit.

000 and 111 both happen to compress really well, so...

000: leave as is. Store it as a single zero bit
001: add error in bit 3 turns it into 000
010: add error in bit 2 turns it into 000
011: add error in bit 1 turns it into 111
100: add error in bit 1 turns it into 000
101: add error in bit 2 turns it into 111
110: add error in bit 3 turns it into 111
111: leave as is. Store it as a single one bit.

They are using some pretty hairy math for their list of strings that compress the best. The problem is that there is no easy way to find the string almost the same as your data that just happens to be really compressible. That is why they are having "temporal" problems for anything except short test cases.

Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.
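
The 3-bit table above amounts to a majority vote. A small Python sketch (the function names are made up, and this illustrates only the toy table, not whatever ZeoSync actually does):

```python
from itertools import product

def compress(bits):
    """Lossy: keep only the majority bit of a 3-bit string."""
    return "1" if bits.count("1") >= 2 else "0"

def decompress(bit):
    """Expand the stored bit back to 000 or 111."""
    return bit * 3

for bits in ("".join(p) for p in product("01", repeat=3)):
    restored = decompress(compress(bits))
    errors = sum(a != b for a, b in zip(bits, restored))
    print(bits, "->", compress(bits), "->", restored, f"({errors} bit error)")
```

Every input comes back within one bit of the original, but only 000 and 111 come back exactly, which is what makes the scheme lossy.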

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
Simple, it can't be by nusuth · 2002-01-08 06:19 · Score: 5, Insightful

I have been pretty late to this thread, and I'm sorry if this is redundant. I just can't read all 700 posts.
1:100 average compression on all data is just impossible. And I don't mean "improbable" or "I don't believe that"; it is impossible. The reason is the pigeonhole principle. For simplicity, assume that we are talking about 1000-bit files. Although you can compress some of these 1000-bit files to just 10 bits, you cannot possibly compress all of them to 10 bits, as 10 bits have just 2^10 = 1024 different configurations while 1000 bits call for representations of 2^1000 different configurations. If you can compress the first 1024, there is simply no room to represent the remaining 2^1000 - 1024 files.
...And that is assuming the compression header takes no space at all...
So every lossless compression algorithm that can represent some files with other files shorter than the original must expand some other files. Higher compression on some files means the number of files that do not compress at all is also greater. An average compression ratio other than 1 is only achievable if there is some redundancy in the original encoding. I guess you can call that redundancy "a pattern." Rar, zip, gzip etc. all achieve less than 1 compressed/original length on average because there is redundancy in the originals: programs that have some instructions and prefixes with common occurrence, pictures that are represented with a full dword although they use a few thousand colors, sound files almost devoid of very low and very high numbers because of recording conditions, etc. No compression algorithm can achieve a ratio of less than 1 averaged over all possible strings. It is a simple consequence of the pigeonhole principle and cannot be tricked.
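
The averaging claim can be checked numerically on a small universe of files. This Python sketch takes every bit string of 0 to 16 bits as the set of possible inputs (the 16-bit cap and the function names are assumptions chosen for illustration) and asks how short the outputs could be in total, given only that a lossless code needs a distinct output for each input.

```python
def total_input_bits(max_len):
    """Combined length of every bit string of length 0 .. max_len."""
    return sum(length * 2 ** length for length in range(max_len + 1))

def cheapest_total_output_bits(num_files):
    """Smallest combined length of num_files distinct bit strings."""
    total, length, remaining = 0, 0, num_files
    while remaining > 0:
        take = min(remaining, 2 ** length)  # only 2**length strings have this length
        total += take * length
        remaining -= take
        length += 1
    return total

max_len = 16
num_files = 2 ** (max_len + 1) - 1  # all files of 0 .. 16 bits
print(total_input_bits(max_len), cheapest_total_output_bits(num_files))
```

The two totals match: even the cheapest possible set of distinct outputs is as long as the inputs combined, so the average ratio over all files cannot drop below 1, and any real format that also has to mark where a file ends does worse.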

--
Gentlemen, you can't fight in here, this is the War Room!
A BRILLIANT business move by ZeoSync! by Rayonic · 2002-01-08 06:21 · Score: 2, Insightful

Bear with me for a moment. This kind of 'compression technology' is EXACTLY the kind of thing the MPAA has been dreading. Imagine millions of people on Morpheus trading 5MB copies of The Matrix, Star Wars and everything else. Of course it's a hoax, but if they can keep it up long enough, then maybe they'll get bought out by the MPAA, RIAA, or whoever!

ZeoSync is ushering in the business model of the new millennium - fooling the tech-illiterate elite of today's content cartels into buying them out, then laughing all the way to the bank! I applaud ZeoSync for their initiative, and hope to see other such business ventures in the future.

Now, if you'll excuse me, I'm off to develop a program that uses fractal-temporal equations to randomly generate sequels to popular movies! (hint, hint)

--
[PowerPoint] is a tool for capitalist presentation
Re:Infinity:1 by Anonymous Coward · 2002-01-08 14:40 · Score: 1, Insightful

The problem is you are still transmitting the same amount of data; it's just that apart from your start and stop bits the rest of the data is in analogue, not digital, form. Why use seconds anyway, why not nanoseconds? Also you could just post them a piece of metal whose length in nanometres was precisely the same value as the bytes in the file. The problem is if you wanted to store either of these things DIGITALLY - i.e. sample the phone conversation as a WAV or take a picture of the piece of metal with a digital camera - you'd use at least as much data as the original file. Unless you scaled the size of the image or the length of the WAV and stored the scaling factor too - but then you'd either lose resolution and thus some of your data, or else your scaling factor would need so much memory to store that you'd need as much data as the original file again. By the way, in a sense I think you've re-invented the modem!