Slashdot Mirror


ZeoSync Makes Claim of Compression Breakthrough

dsb42 writes: "Reuters is reporting that ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller (or our streaming video just became a lot clearer)..." This story has been submitted many times due to the astounding claims - Zeosync explicitly claims that they've superseded Claude Shannon's work. The "technical description" from their website is less than impressive. I think the odds of this being true are slim to none, but here you go, math majors and EE's - something to liven up your drab dull existence today. Update: 01/08 13:18 GMT by M : I should include a link to their press release.

266 of 989 comments

  1. Current ratio? by L-Wave · · Score: 2, Interesting

    Excuse my lack of compression knowledge, but what's the current ratio? I'm assuming 100:1 is pretty damn good. =) By the way, even though this *might* be a good compression algorithm and all that, how long would it take to decompress a file on your average Joe's computer?

    --
    I SURVIVED THE GREAT SLASHDOT BLACKOUT OF 2002!
    1. Re:Current ratio? by skroz · · Score: 2

      Umm... I care. If your compression/decompression time exceeds the amount of time it would take to transfer the file uncompressed, you're really not gaining anything.

      The mathematical implications alone of such a breakthrough would be impressive. 100:1 compression of truly random data? Wow.

      --
      -- Minds are like parachutes... they work best when open.
    2. Re:Current ratio? by Sobrique · · Score: 2, Insightful

      Of course, given that cpu speed increases faster than bandwidth, even if it is an issue now, it won't be in a year.

    3. Re:Current ratio? by CaseyB · · Score: 3, Informative
      but whats the current ratio?

      For truly random data? 1:1 at the absolute best.

    4. Re:Current ratio? by radish · · Score: 5, Informative


      For lossless (e.g. zip, not jpg, mpg, divx, mp3 etc etc) you are looking at about 2:1 for 8-bit random, much better (50:1?) for ascii text (e.g. 7-bit non-random).

      If you're willing to accept loss, then the sky's the limit, mp3 @ 128kbps is about 12:1 compared to a 44k 16bit wave.

      --

      ---- One button is the power button, the other is the Bender voice button: "Bite My Shiny Metal Ass"

    5. Re:Current ratio? by CaseyB · · Score: 2, Redundant

      That's not right. A 1:1 average for a large sample of random data is the best you can ever do. On a case by case basis, you can get lucky and do better, but no algorithm can compress arbitrary random data at better than 1:1 in the long run.

    6. Re:Current ratio? by mirko · · Score: 2

      There will still be a need for cheap quick disk and memory as the cpu will have to deal a lot with these. I think 5 years is more optimistic.

      --
      Trolling using another account since 2005.
    7. Re:Current ratio? by markmoss · · Score: 5, Informative

      whats the current ratio? I would take the *zip algorithms as a standard. (I've seen commercial backup software that takes twice as long to compress the data as Winzip but leaves it 1/3 larger.) Zip will compress text files (ASCII such as source code, not MS Word) at least 50% (2:1) if the files are long enough for the most efficient algorithms to work. Some highly repetitive text formats will compress by over 90% (10:1). Executable code compresses by 30 to 50%. AutoCAD .DWG (vector graphics, binary format) compresses around 30%. Back when it was practical to use PKzip to compress my whole hard drive for backup, I expected about 50% average compression. This was before I had much bit-mapped graphics on it.

      Bit-mapped graphic files (BMP) vary widely in compressibility depending on the complexity of the graphics, and whether you are willing to lose more-or-less invisible details. A BMP of black text on white paper is likely to zip (losslessly) by close to 100:1, and fax machines perform a very simple compression algorithm (sending white*number of pixels, black*number of pixels, etc.) that also approaches 100:1 ratios for typical memos. Photographs (where every pixel is colored a little differently) don't compress nearly as well; the JPEG format exceeds 10:1 compression, but I think it loses a little fine detail. And JPEGs compress by less than 10% when zipped.

      IMHO, 100:1 as an average (compressing your whole harddrive, for example), is far beyond "pretty damn good" and well into "unbelievable". I know of only two situations where I'd expect 100:1. One is the case of a bit-map of black and white text (e.g., faxes), the other is with lossy compression of video when you apply enough CPU power to use every trick known.
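      The fax-style run-length idea mentioned above can be sketched in a few lines. This is only an illustration: the scan line is made up, the function names are hypothetical, and real fax coding is more elaborate than plain (value, run length) pairs.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse a row of pixels into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical scan line: mostly white (0) with two short black (1) strokes.
row = [0] * 700 + [1] * 12 + [0] * 300 + [1] * 8 + [0] * 708
runs = rle_encode(row)
print(len(row), "pixels ->", len(runs), "runs")
```

      The scheme only pays off when runs are long. A row where neighbouring pixels keep changing produces about as many runs as pixels, which is why photographs do not benefit.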

    8. Re:Current ratio? by -douggy · · Score: 2

      What about a spectacular fractal image? Sure as .jpg it could be 1/2 a meg in size but the equation to draw it.... The same with any image

    9. Re:Current ratio? by dannyspanner · · Score: 2

      Yes, but the coder (i.e. the equation) and decoder (i.e. equation to image converter) of your spectacular fractal image have to know that the equation represents a fractal image. You cannot apply this technique to arbitrary data, so my original point about the general case still stands.

      JPEG is a lossy image compression technique and we're talking about general lossless compression here.

    10. Re:Current ratio? by Graspee_Leemoor · · Score: 2, Funny

      Heheh, I always wanted to write a "gainy compression" routine. It would probably have a special marker in there like the ascii string:

      "The next three bytes are compressed!"

      graspee

    11. Re:Current ratio? by anacron · · Score: 2

      The point being you can compress random data if the decoder knows what the random data is beforehand.

      Won't this always be true for typical-use applications like compressing files and such? The bits that are to be encoded are known, because the encoder can just parse them. Things like streaming video and audio might get a bit tricky because unless you put some sophisticated buffering mechanism in place the bits to be encoded are probably not known ahead of time.

      .anacron

    12. Re:Current ratio? by radish · · Score: 2


      hence "If you're willing to accept loss..."

      :-)

      --

      ---- One button is the power button, the other is the Bender voice button: "Bite My Shiny Metal Ass"

    13. Re:Current ratio? by radish · · Score: 2


      I should add some clarification (as various people have pointed out flaws in my wording). When I said "random" I didn't mean random in the mathematical sense, but rather "average binary data on your disk". My bad, I apologise to everyone who pointed out correctly that real random is 1:1.

      And what I meant by 7-bit non random is that ASCII is only 7-bit, the high bit is always 0, and certain bytes are much more common than others (think E, space etc). This makes it much easier for things like zip to compress it.
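      One rough way to put a number on "certain bytes are much more common" is the Shannon entropy of the byte histogram. The sketch below uses a made-up sample sentence and a hypothetical helper name. It ignores repeated words and phrases, so it understates how far a real compressor can go on text.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Made-up sample text; any plain English text behaves similarly.
sample = b"the quick brown fox jumps over the lazy dog and then the other dog " * 20
print(round(entropy_bits_per_byte(sample), 2), "bits/byte for the sample text")
print(entropy_bits_per_byte(bytes(range(256))), "bits/byte when all 256 values are equally common")
```

      Anything well below 8 bits per byte leaves room for a lossless coder. Data where every byte value is equally likely leaves none.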

      And that post really wasn't worth +5 moderators!

      --

      ---- One button is the power button, the other is the Bender voice button: "Bite My Shiny Metal Ass"

    14. Re:Current ratio? by arkanes · · Score: 2

      I used to get nearly 100:1 (about 94:1, as I recall) on database files we used to use at work. Would have gone down if they'd ever been redesigned so they were relational, but hey :P

  2. how can this be? by posmon · · Score: 3, Informative

    even lossless compression still relies on redundancy within the data, normally repeating patterns of data. surely 100-1 on TRUE random data is impossible?

    --

    update comments set karma=-1, reason='offtopic' where sid=26315

    1. Re:how can this be? by jrockway · · Score: 4, Insightful

      I'm going to agree with you here. If there's no pattern in the data, how can you find one and compress it. The reason things like gzip work well on c files (for instance) is because C code is far from random. How many times do you use void or int in a C file? a lot :)

      Try compressing a wav or mpeg file with gzip. Doesn't work too well, because the data is "random", at least in the sense of the raw numbers. When you look at patterns that the data forms (i.e. pictures, and relative motion), then you can "compress" that.
      Here's my test for random compression :)

      $ dd if=/dev/urandom of=random bs=1M count=10
      $ du random
      11M random
      11M total
      $ gzip -9 random
      $ du random.gz
      11M random.gz
      11M total
      $

      no pattern == no compression
      prove me wrong, please :)
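      The same experiment can be repeated without dd and gzip. This is a minimal sketch using Python's standard zlib module (the same DEFLATE family as gzip). The "text" input is a made-up, highly repetitive stand-in for C source, not a real file.

```python
import os
import zlib

random_data = os.urandom(1 << 20)            # 1 MiB from the OS random source
c_like_text = b"void int return; " * 60000   # made-up, highly repetitive "source code"

for label, data in (("random", random_data), ("text", c_like_text)):
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

      The random input should come out no smaller than it went in (usually a few bytes larger because of framing overhead), while the repetitive input shrinks dramatically.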

      --
      My other car is first.
    2. Re:how can this be? by skroz · · Score: 2

      They just threw out information theory entirely... too restrictive. They came up with their own theory... disinformation theory! Everyone seems to be jumping on the bandwagon, too... these guys even compiled a list of the pioneers!

      --
      -- Minds are like parachutes... they work best when open.
    3. Re:how can this be? by Rentar · · Score: 5, Funny
      I'm going to agree with you here. If there's no pattern in the data, how can you find one and compress it. The reason things like gzip work well on c files (for instance) is because C code is far from random. How many times do you use void or int in a C file? a lot :)

      So a perl program can't be compressed?

    4. Re:how can this be? by Shimbo · · Score: 3, Interesting

      They don't claim they can compress TRUE random data only 'practically random' data. Now the digits of Pi are a good source of 'practically random' data for some definition of the phrase 'practically random'.

    5. Re:how can this be? by mccalli · · Score: 2, Informative
      even lossless compression still relies on...normally repeating patterns of data. surely 100-1 on TRUE random data is impossible?

      However, in truly random data such patterns will exist from time to time. For example, I'm going to randomly type on my keyboard now (promise this isn't fixed...):

      oqierg qjn.amdn vpaoef oqleafv z

      Look at the data. No patterns. Again....

      oejgkjnfv,cm v;aslek [p'wk/v,c

      Now look - two occurrences of 'v,c'. Patterns have occurred in truly random data.

      Personally, I'd tend to agree with you and consider this not possible. But I can see how patterns might crop up in random data, given a sufficiently large amount of source data to work with.

      Cheers,
      Ian

    6. Re:how can this be? by sprag · · Score: 2

      Well, I can think of two ways that "random" data might be compressed without an obvious pattern:

      * If the data was represented a different way (say, using bits instead of byte-size data) then patterns might emerge, which would then be compressible. Of course, the $64k question is: will it be smaller than the original data?

      * If the set of data doesn't cover all possibilities of the encoding (i.e. only 50 characters out of 256 are actually present), then a recoding might be able to compress the data using a smaller "byte" size. In this case, 6 bits per character instead of 8. The problem with this one is that you have to scan through all of the data before you can determine the optimal byte size...and then it still may end up being 8.
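      The second idea is easy to put numbers on. The sketch below is only an illustration: the helper names are hypothetical, and the one-byte-per-symbol table is an assumed way of telling the decoder which symbols were used.

```python
def bits_per_symbol(data: bytes) -> int:
    """Smallest fixed width (in bits) that can index every distinct byte in data."""
    distinct = len(set(data))
    return max(1, (distinct - 1).bit_length())

def packed_size(data: bytes) -> int:
    """Bytes needed after recoding: a one-byte-per-symbol table plus the packed body."""
    table = len(set(data))
    body = (len(data) * bits_per_symbol(data) + 7) // 8
    return table + body

restricted = bytes(range(50)) * 20      # 1000 bytes using only 50 distinct values
full_range = bytes(range(256)) * 4      # 1024 bytes using all 256 values

print(bits_per_symbol(restricted), packed_size(restricted))
print(bits_per_symbol(full_range), packed_size(full_range))
```

      With all 256 values present the width stays at 8 bits and the table makes the result larger than the input. The trick only helps on data that was never uniformly random to begin with.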

    7. Re:how can this be? by harlows_monkeys · · Score: 3, Interesting
      I realize that what I'm about to propose does not work. The challenge is to figure out why.

      Here's a proposal for a compression scheme that has the following properties:

      1. It works on all bit strings of more than one bit.

      2. It is lossless and reversible.

      3. It never makes the string larger. There are some strings that don't get smaller, but see item #4.

      4. You can iterate it, to reduce any string down to 1 bit! You can use this to deal with pesky strings that don't get smaller. After enough iterations, they will be compressed.

      OK, here's my algorithm:

      Input: a string of N bits, numbered 0 to N-1.

      If all N bits are 0, the output is a string of N-1 1's. Otherwise, find the lowest numbered 1 bit. Let its position be i. The output string consists of N bits, as follows:

      Bits 0, 1, ... i-1 are 1's. Bit i is 0. Bits i+1, ..., N-1 are the same as the corresponding input bits.

      Again, let me emphasize that this is not a usable compression method! The fun is finding the flaw.

    8. Re:how can this be? by levendis · · Score: 2

      yes, but, /dev/urandom isn't really random... if gzip was 'smart' enough, it could figure out the seed & algorithm for /dev/urandom and just save the output data that way. We don't really have any good way of generating really random data, so theoretically all data is not random and therefore arbitrarily compressible. In practice, of course, this is bullshit, and I think this press release will prove to be as well.

      --
      ---- I made the Kessel Run in under 11 parsecs.
    9. Re:how can this be? by spoonboy42 · · Score: 2

      True random data, however, is extremely rare. Even random number generator algorithms used on PCs don't generate truly random numbers, but rather "semirandom numbers" resulting from a number of operations being applied to the current timestamp. If you pull bytes out of /dev/random at specified intervals for a long enough time, you will eventually be able to discern what pattern connects these semirandom numbers to the time.

      As far as we can tell, the digits of Pi are random. They are also, however, based on mathematical relationships which can be modeled to find patterns in the digits. There are formulae to calculate any independent digit of Pi in both hexadecimal and decimal number systems, as well as known relations like e^(i*Pi) = -1.

      Anyway, the press release says that the algorithm is effective for practically random data. I'm not sure exactly what this means, but I would guess that it applies to data that is in some way human-generated. Text files might contain, say, many instances of the text strings "and" and "the", no matter what their overall content. Even media files have loads of patterns, both in their structure (16 bit chunks of audio, or VGA-sized frames) and in their content (the same background from image to image in a video, for example). Even in something as complex as a high resolution video (which we'll take to be "practically random"), there are many patterns which can be exploited for compression.

      --
      Anonymous Luddite: "What do you think of the dehumanizing effects of the Internet?"
      Andy Grove: "Not Much."
    10. Re:how can this be? by Dr_Cheeks · · Score: 3, Insightful
      If the data was represented a different way (say, using bits instead of bytesize data) then patterns might emerge...
      With truly random data there's no pattern to find, assuming you're looking at a large enough sample, which is why everyone else on this thread is talking about the maximum compression for such data being 1:1. However, since "ZeoSync said its scientific team had succeeded on a small scale" it's likely that whatever algorithm they're using works only in limited cases.

      Shannon's work on information theory is over 1/2 a century old and has been re-examined by thousands of extremely well-qualified people, so I'm finding it rather hard to accept that ZeoSync aren't talking BS.

      --

    11. Re:how can this be? by s20451 · · Score: 3, Informative

      Of course patterns occur in random data. For example, if you toss a fair coin for a long time, you will get runs of three, four, or five heads which recur from time to time. The point is that in random, noncompressible data, the probability of occurrence for any given pattern is the same as the probability of any other pattern.

      --
      Toronto-area transit rider? Rate your ride.
    12. Re:how can this be? by Catiline · · Score: 2, Informative

      Simple. You're doing binary counting. To decompress using this algorithm you need to know the number of cycles performed, for which the smallest (uncompressed) form is the original input data.
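      This can be checked directly. The sketch below is one reading of the scheme proposed upthread, assuming bit 0 is the leftmost character of the string. The names step and squeeze are made up for the illustration.

```python
def step(bits: str) -> str:
    """One pass of the proposed scheme; bits[0] is bit number 0."""
    if "1" not in bits:
        return "1" * (len(bits) - 1)      # all zeros: emit N-1 ones
    i = bits.index("1")                   # lowest-numbered 1 bit
    return "1" * i + "0" + bits[i + 1:]   # ones below it, zero at i, rest unchanged

def squeeze(bits: str):
    """Iterate step() until one bit is left; return that bit and the pass count."""
    count = 0
    while len(bits) > 1:
        bits = step(bits)
        count += 1
    return bits, count

for s in ("00", "01", "10", "11", "000", "101"):
    print(s, squeeze(s))
```

      Every input ends at the same single bit, so the bit itself says nothing. All of the information sits in the pass count, which is different for every input and has to be stored somewhere to undo the process.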

    13. Re:how can this be? by EllisDees · · Score: 2
      Even random number generator algorithms used on PCs don't generate truly random numbers


      Actually, that depends on the hardware for that particular PC. For instance, the Pentium 2 (and possibly above) has a built-in source of real random numbers based on the thermal noise of the processor itself. Another possible source of randomness is a microphone input that isn't connected to anything.
      --
      -- Give me ambiguity or give me something else!
    14. Re:how can this be? by tjansen · · Score: 5, Informative
    15. Re:how can this be? by swillden · · Score: 2

      If you pull bytes out of /dev/random at specified intervals for a long enough time, you will eventually be able to discern what pattern connect these semirandom numbers to the time.

      Who will be able to? Not me, that's for sure.

      /dev/random uses a pool of very random bits that are distilled from truly random (though not uniformly distributed) data, and it applies a well-respected one-way hash to generate output bits from this pool of randomness. Further, it applies some fairly conservative estimation of the quality of the pool of randomness and stops providing output when the entropy drops too far (use /dev/urandom if you don't want to have to wait for output now and then).

      It is theoretically possible to determine the (large) seed and predict future outputs based only on the observed outputs (though completely impractical based on current public knowledge), but even if you determined the seed at one moment, unless you can observe/predict all of the events /dev/random uses to stir the pool, you'll quickly be wrong again.

      In practice, predicting the output of /dev/random requires complete control over the machine and its environment.

      /dev/random is not truly random, but it's as close as unclassified research knows how to make it, and it's damned good.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    16. Re:how can this be? by ergo98 · · Score: 5, Informative

      Well firstly I'd say the press release gives a pretty clear picture of the reality of their technology: It has such an overuse of supposedly TM'd (anyone want to double check the filings? I'm going to guess that there are none) "technoterms" like "TunerAccelerator" and "BinaryAccelerator" that it just is screaming hoax (or creative deception), not to mention a use of Flash that makes you want to punch something. Note that they give themselves huge openings such as always saying "practically random" data: What the hell does that mean?

      I think one way to understand it (because all of us at some point or another have thought up some half-assed, ridiculous way of compressing any data down to 1/10th -> "Maybe I'll find a denominator and store that with a floating point representation of..."), and I'm saying this as not a mathematician or compression expert: Let's say for instance that this compression ratio is 10 to 1 on random data, and I have every possible random document 100 bytes long -> That means I have 6.6680144328798542740798517907213e+240 different random documents (256^100). So I compress them all into 10-byte documents, but the maximum number of variations of a 10-byte document is 1208925819614629174706176: There isn't the entropy in a 10-byte document to store 6.6680144328798542740798517907213e+240 different possibilities (it is simply impossible, no matter how many QuantumStreamTM HyperTechTM TechnoBabbleTM TermsTM): You end up needing, tada, 100 bytes to have the entropy to possibly store all variants of a 100-byte document, but of course most compression routines put in various logic codes and actually increase the size of the document. In the case of the ZeoSync claim though they're apparently claiming that somehow you'll represent 6.6680144328798542740798517907213e+240 different variations in a single byte: So somehow 64 tells you "Oh yeah, that's variation 5.5958572359823958293589253e+236!". Maybe they're using SubSpatialQuantumBitsTM.
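      The figures above are easy to reproduce with arbitrary-precision integers. A minimal sketch (the variable names are made up here):

```python
docs_100_bytes = 256 ** 100   # every possible 100-byte document
docs_10_bytes = 256 ** 10     # every possible 10-byte document
docs_1_byte = 256             # every possible 1-byte document

print(f"{docs_100_bytes:.4e} possible 100-byte documents")
print(docs_10_bytes, "possible 10-byte documents")

# How many 100-byte documents would have to share each 10-byte output.
print(f"{docs_100_bytes // docs_10_bytes:.4e} inputs per 10-byte output")
```

      Whatever the scheme, a decoder given one 10-byte output can return only one of the inputs that share it.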

    17. Re:how can this be? by Erik+Hensema · · Score: 5, Funny

      Perl source is as close to truly random data as possible.

      --

      This is your sig. There are thousands more, but this one is yours.

    18. Re:how can this be? by CaseyB · · Score: 2
      We don't really have any good way of generating really random data

      Well, not since LavaRand went down...

      "harnessing the power of Lava Lite® lamps to generate truly random numbers since 1996."

    19. Re:how can this be? by daniel_howell · · Score: 2, Funny

      Maybe they just write all the 1s and 0s *really small*?

    20. Re:how can this be? by NightWhistler · · Score: 2

      Congratulations, you have managed to prove that you simply can't compress truly random data, as everybody has been saying all along...

      The article however, states that the data is practically random... which does make a difference, 'cause else you wouldn't be able to compress anything...

      That being said, I still think it's a load of hot air... I guess we'll be seeing it in next year's vaporware top 10... ;-)

      --
      PageTurner Reader: open-source e-reader for Android with cloudsync. http://pageturner-reader.org
    21. Re:how can this be? by why-is-it · · Score: 2

      But I can see how patterns might crop in random data, given a sufficiently large amount of source data to work with.

      I think it is a given that patterns will occur in truly random data. Strictly speaking, if the probability of such a pattern existing is greater than 0, it is a certainty that it will eventually occur, given enough trials.

      The question is, will a sufficient number of patterns occur often enough that the data can be significantly compressed to warrant the CPU cycles involved?

      I agree, it does not seem possible.

      --
      *** Where are we going? And what's with this handbasket?
    22. Re:how can this be? by tshak · · Score: 2

      The reason things like gzip work well on c files (for instance) is because C code is far from random. How many times do you use void or int in a C file? a lot :)

      So a perl program can't be compressed?


      Good question. Although my post made on my own time in my house while eating my breakfast and ignoring my clock should be compressed quite easily.

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    23. Re:how can this be? by Nelson · · Score: 2
      It can't be, that's how it can be.


      Now I'll give them the benefit of the doubt and assume that their PR firm screwed up and they're really talking about very specific types of data but this is most likely a scam. The counting argument is proven mathematically and you can't unprove that or circumvent it.


      Let me explain. With a string of bits you can describe different types of data. Different combinations of bits can be used to describe different things, and different strings of bits can be used in a multitude of ways, but there is a limit to how much you can describe with a string of bits because it is fixed in length.


      Say I take 1 bit. It's either 0 or 1. There are only two things I can describe with that one bit without extra information. (In compression, "extra information" implies bullshit, your idea doesn't work if it needs "extra information") It's impossible to make that one bit represent 3, 4 or more things without extra information.


      With 2 bits I can represent 4 things. It's impossible to represent more. They are 00, 01, 10, and 11; there isn't a 5th possibility. From a compression standpoint that means that if I compress something down to 2 bits, then with that particular compressor there are only 4 possible inputs that can compress to that size. One of those inputs might be the Encyclopaedia Britannica, but that's a pretty specialized compressor then. Do the induction: 2**bits is the equation that defines the maximum number of possible inputs that can produce an output of that length. This is encoding and what Shannon is most famous for, and his laws are still laws. There are a couple of popular ways to do encoding with binary data in compression; arithmetic and Huffman are the most popular.


      So where is this going? Well first, recursive compression. It's not compression because of that "extra information" problem. How many times do I run it to get my data back? Well that's simple: you compressed the Linux kernel and you ran it a number of times x. x just happens to be about the same size as the Linux kernel when represented in binary; gzip that number up and keep it in a safe place and you'll be able to restore your kernel that was compressed into 1 bit.


      Second. Compressing random data. Say I compress a random string to 1/100th of its size. To make the math easy, let's use a string that is 1024 bits long; it is reduced to 10 bits (or 11 now and then). 1024 bits can represent a lot of different things: 2**1024 of them. 10 bits can only represent 1024 different things, since 2**10 = 1024. 2**1024 has 309 digits; 1024 is a teeny tiny fraction of it. Well under 1%, way way way under 1%. With a compressor that does 100:1 compression you can only compress, at the most, 1024 things out of 2**1024, without "extra information." That might be randomesque data, but the fact is that if you pick a random string of 1024 bits, you will very, very rarely pick one of those 1024 strings that your compressor works on; it pretty much will never happen in practice because it's such a small percentage. This is a truth with all compressors. That's not to say that they can't compress a truly random string of bits 100:1, they just might want to take a few centuries to come up with the particular random string that they are going to use.


      LZ77, LZ78, Burrows-Wheeler transforms, PPM modelers, Markov modelers, all modern generic lossless compressors exploit the fact that most "interesting data" that we want to compress has redundancy to it. If you allow for some degree of redundancy or perhaps a lot of it then you reduce the number of possible inputs to the "interesting inputs" and that's usually far smaller than 2**1024 for a 1024 bit string. It's still probably orders and orders of magnitude larger than 1024 though, it's quite easy to find more than 1024 interesting things you can represent with 1024 bits, the short of it is that a 100:1 compressor will only be able to compress 1024 things of that length. No matter how you model the data, you're limited by the possible number of inputs represented by the encoding.
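      The counting limit can be written down as a small function. This is a sketch under one assumption not spelled out above: it counts every output of up to 10 bits (including the empty string) rather than exactly 10, which raises the figure from 1024 to 2047 and changes nothing about the conclusion. The function name is made up.

```python
from fractions import Fraction

def max_fraction_compressible(n_bits: int, max_out_bits: int) -> Fraction:
    """Upper bound on the share of n-bit inputs that can map to distinct
    outputs of at most max_out_bits bits."""
    outputs = 2 ** (max_out_bits + 1) - 1   # all strings of length 0..max_out_bits
    inputs = 2 ** n_bits
    return Fraction(min(outputs, inputs), inputs)

bound = max_fraction_compressible(1024, 10)
print(bound.numerator, "of 2**1024 inputs at most")
print(len(str(2 ** 1024)), "decimal digits in 2**1024")

# Even saving a single bit cannot work for every input.
print(max_fraction_compressible(8, 7))
```

      The last line is the general point: for any length, there are fewer strictly shorter strings than inputs, so no lossless scheme shrinks everything.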

    24. Re:how can this be? by FlatEarther · · Score: 4, Funny

      It is possible despite the many (uninformed) negative comments that have appeared concerning this truly amazing breakthrough in compression technology. I, myself, using my own patented compression technology - The Shannon-Transmogrificator (TM) have managed to compress the entire Reuters article to a mere 4 ASCII characters (!), with essentially no loss in meaning: 'C', 'R', 'A', 'P'. I wonder if anyone can improve on this ?

    25. Re:how can this be? by istartedi · · Score: 2

      Not surprisingly, small code is one of the virtues of Perl. No need to compress it--it's already compressed!

      That said, the claim made by this company is obvious bollox (sp?). Anything much better than 1:1 on truly random data is not possible. Does anybody have a link to a mathematical proof, or better yet a layman's argument from a respected mathematician?

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    26. Re:how can this be? by Rupert · · Score: 2

      Hot Bits uses a Geiger-Muller tube pointed at a radiation source. About as random as you can get.

      --

      --
      E_NOSIG
    27. Re:how can this be? by SIGFPE · · Score: 2

      Real programmers write perl scripts by editing compressed source files directly so of course they can't be compressed any more...

      --
      -- SIGFPE
    28. Re:how can this be? by sprag · · Score: 2, Troll

      >With truly random data [random.org] there's no pattern to find, assuming you're looking at a large enough sample

      How big is a 'large enough sample'? Seems the larger the sample, the more likelihood of getting longer matches.

      However, given 10 bytes from random.org: 39, 233, 196, 127, 220, 228, 10, 146, 60, 68.
      Strung together as binary they come out as:

      00100111 11101001 11000100 01111111 11011100 11100100 00001010 10010010 00111100 01000100

      Lots of little patterns in there, providing you cross byte boundaries. 4 1's in a row happens 4 times. 3 zeroes in a row come up 7 times.'10' comes up 17 times. '100' comes up 12 times. '0100' comes up 8 times.

      Can this be encoded in a way that takes less than 10 bytes? Don't know. Don't care really, but there are patterns in there.
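      For anyone who wants to recount, here is a small sketch that rebuilds the bit string from the ten bytes and counts each pattern. It counts overlapping occurrences, which is an assumption; the tallies above may have been made differently (for example by counting distinct runs), so the numbers need not match.

```python
def count_overlapping(haystack: str, needle: str) -> int:
    """Count occurrences of needle in haystack, allowing overlaps."""
    last_start = len(haystack) - len(needle)
    return sum(haystack.startswith(needle, i) for i in range(last_start + 1))

sample_bytes = [39, 233, 196, 127, 220, 228, 10, 146, 60, 68]
bits = "".join(f"{b:08b}" for b in sample_bytes)

for pattern in ("1111", "000", "10", "100", "0100"):
    print(pattern, count_overlapping(bits, pattern))
```

      Finding repeats is not the same as saving space: a code that shortens the common patterns has to lengthen the others, and on random input the two effects cancel on average.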

    29. Re:how can this be? by krogoth · · Score: 2

      I have developed a program based on MD5 that can compress any file down to one byte nearly instantaneously*. It adds a .sc extension. Copy this code into a file and make it executable, then run it with the filename as the first parameter:

      #!/bin/bash
      md5sum "$1" > .supercompressed
      dd if=.supercompressed of="$1.sc" bs=1 count=1 > /dev/null 2>&1
      rm -f .supercompressed


      * Warning: decompression is not supported. You can tell if another file is identical to the original by compressing it and comparing the results, but it has a very high uncertainty.

      --

      They that quote Benjamin Franklin on liberty and safety deserve neither.
    30. Re:how can this be? by the_quark · · Score: 2

      Not to be too much of a geek, but you want to pull that from /dev/random. /dev/urandom is actually pseudorandom if you pull that many bits out of it. /dev/random will block waiting for more actually random bits (which is why you should use it for things like keys), while /dev/urandom will use cryptographic hashes to "stretch" the entropy it has. Theoretically, /dev/urandom may operate as designed and yet produce data with patterns in it. /dev/random should always provide "cryptographically" random data.

    31. Re:how can this be? by GypC · · Score: 2

      the probability of occurrence for any given pattern is the same as the probability of any other pattern

      ... of the same size. the pattern '333' is far more likely to reappear than the pattern '1234321.'

    32. Re:how can this be? by Decimal · · Score: 2

      4. You can iterate it, to reduce any string down to 1 bit!

      Okay, so you've got this 4 byte file zipped down to 1 bit using this miracle compression algorithm. Let's try to decompress this file. Assume the result is "Kate". Now also compress "SuSe", "1234" and "Nick" into 1-bit files using that same algorithm. Go ahead and decompress these. See the problem?

      Try looking at it from the ground up. If the compressed data is 1 bit long, then you can decompress to 2 possible files. 2 bits, 4 possibilities. 3 bits, 8 possibilities. And so on. Keep in mind that there is information in what kind of compression system you are using. If I give you compressed data and ask you to decompress it, you're in a bit of a bind if you don't know what it was compressed with. GZip? RAR? LZW? Remember "One if by land, two if by sea?" Only 1 bit of information was sent, but the larger amount of information (land/sea) was already with the people receiving the data.

      Now I'm assuming that you *could* compress every possible file to 1 bit + [file extension] using a different compression algorithm. But the number of compression algorithms you'd need to create doubles for every bit you add to the compressed file. And those compression algorithms would very likely all be larger than the original uncompressed file itself!

      Oh, and one more thing to think about: All the data on your hard drive can be considered one really large number. All your mp3s, documents, pr0n and operating system files etc. = n. And n changes every time the disk is written to.

      (Nope. I haven't read any compression FAQs. Flame me if I'm wrong. You know you want to.)

      --

      Remember "Bring 'em on"? *sigh
    33. Re:how can this be? by Debillitatus · · Score: 2

      Unfortunately, this is a pretty lossy algorithm, because if you wanted to recreate the article from your compression, you couldn't. This is simply because just about anything in the New Scientist will be compressed, using your algorithm, to the same thing... heh.

      --

      Come on, give it up, that's

  3. 100:1 ? I don't think so... by Mr+Thinly+Sliced · · Score: 5, Insightful

    They claim 100:1 compression for random data. The thing is, if that's true, then let's say we have data A size (1000)

    compress(A) = B

    Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

    On we go:
    compress(B) = C (size is now 10)
    compress(C) = D (size 1).

    So everything compresses into 1 byte.

    Or am I missing something.

    Mr Thinly Sliced

    1. Re:100:1 ? I don't think so... by oyenstikker · · Score: 5, Funny

      Maybe they'll be able to compress their debt to $1 when they go under.

      --
      The masses are the crack whores of religion.
    2. Re:100:1 ? I don't think so... by Xentax · · Score: 3, Informative

      No...the compressed data is almost certainly NOT random, so it couldn't be compressed the same way. It's also highly unlikely any other compression scheme could reduce it either.

      I'm very, very skeptical of 100:1 claims on "random" data -- it must either be large enough that even being random, there are lots of repeated sequences, or the test data is rigged.

      Or, of course, it could all be a big pile of BS designed to encourage some funding/publicity.

      Xentax

      --
      You shouldn't verb words.
    3. Re:100:1 ? I don't think so... by arkanes · · Score: 5, Insightful

      I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"

    4. Re:100:1 ? I don't think so... by Rentar · · Score: 3, Interesting

      This is a proof (though I doubt it is a scientifically correct one) that you can't get lossless compression with a constant compression factor! What they claim would be theoretically possible if 100:1 were an average, but I still don't think this is possible.

    5. Re:100:1 ? I don't think so... by MikeTheYak · · Score: 5, Insightful

      It goes beyond bullshit into the realm of humor:

      ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.

      They just threw in a bunch of compression buzzwords without even bothering to check whether they have anything to do with lossless compression...

    6. Re:100:1 ? I don't think so... by Mr+Thinly+Sliced · · Score: 4, Funny

      Not only that, but I just hacked their site and downloaded the entire source tree. Here it is:

      01101011

      Pop that baby in an executable shell script. It's a self-extracting
      ./configure
      ./make
      ./make install

      Shh. Don't tell anyone.

      Mr Thinly Sliced

    7. Re:100:1 ? I don't think so... by larien · · Score: 2
      From their press release:
      Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range
      What I read this to mean is that for some data sets, they anticipate 100:1 (or more) compression. For 'random' data, they will get some compression. Also note the 'once fully developed' phrase and the word 'anticipated'; they haven't actually achieved these results as yet; until they do, this is vapourware.

      BTW, someone shoot them for using so many TMs...

    8. Re:100:1 ? I don't think so... by cfulmer · · Score: 2

      Well, yeah. It's basic discrete math, the pigeon-hole principle. You can't map a large set into a smaller set without having some overlap. And, since you have overlap, then you won't be able to tell how to decompress your compressed data.
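
      A minimal Python sketch of that overlap, with a made-up toy "compressor" standing in for any scheme whose output is always shorter than its input: a brute-force search over every input of a given length has to find two inputs that land on the same output.

```python
from itertools import product


def find_collision(compress, n_bits):
    """Return two different n_bits-long inputs with the same output, or None if none collide."""
    seen = {}
    for bits in product("01", repeat=n_bits):
        original = "".join(bits)
        packed = compress(original)
        if packed in seen:
            return seen[packed], original
        seen[packed] = original
    return None


def toy_compress(bits):
    # Hypothetical compressor: drops the last bit, so the output is always one bit shorter.
    return bits[:-1]


if __name__ == "__main__":
    print(find_collision(toy_compress, 3))  # prints ('000', '001')
```

      Both '000' and '001' become '00', and a decompressor handed '00' cannot know which one was meant.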

    9. Re:100:1 ? I don't think so... by swillden · · Score: 4, Funny

      So everything compresses into 1 byte.

      Duh, are you like an idiot or something?

      When you send me a one-byte copy of, say, The Matrix, you also have to tell me how many times it was compressed so I know how many times to run the decompressor!

      So everything compresses to *two* bytes. Maybe even three bytes if something is compressed more than 256 times. That's only required for files whose initial size is more than 100^256, though, so two bytes should do it for most applications.

      Jeez, the quality of math and CS education has really gone down the tubes.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    10. Re:100:1 ? I don't think so... by CaseyB · · Score: 2
      "Random" is not "worst-case" performance, at least it's certainly not guaranteed to be.

      It does indeed represent the worst case. "Random data" in the context of data compression means "any data whatsoever", and an algorithm that "compresses random data" is implied to compress any data at better than 1:1, over the long run.

      Defining "random data" as "this particular set of random data" is just deceptive and misleading.

    11. Re:100:1 ? I don't think so... by pmc · · Score: 4, Funny

      Duh, are you like an idiot or something?

      You're the moron, moron. When you get the one byte compressed file, you run the decompressor once to get the number of additional times to run the decompressor.

      What are they teaching the kids today? Shannon-shmannon nonsense, no doubt. They should be doing useful things, like Marketing and Management Science. There's no point in being able to count if you don't have any money.

    12. Re:100:1 ? I don't think so... by Bandman · · Score: 5, Funny

      I get the idea that this part of the algorithm is perfected by them... it's the decompressor that's giving them fits...

      Step 1: Steal Underpants
      Step 3: Profit!

      We're still working on step 2

    13. Re:100:1 ? I don't think so... by Anarchofascist · · Score: 2, Interesting
      "When you send me a one-byte copy of, say, The Matrix, you also have to tell me how many times it was compressed so I know how many times to run the decompressor!"

      Not true! You don't need an extra byte for the number of times the compression has been run, as long as you compress files that are no larger than a certain size.

      If each pass reduces the size by two orders of magnitude, then 256 compressions will compress down by a factor of (on average) 10^512 = one hundred million billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion times. That's enough to compress a 1024 x 768 movie (at 50 fps and 24 bit colour) into a single byte, as long as the movie runs for less than fifty five billion eight hundred million billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion times the current age of the UNIVERSE.

      Therefore, I should easily be able to compress The Matrix into a single byte with 256 passes.

      I don't need to encode the number of compressions, every decompression consists of decompressing 256 times.
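
      The arithmetic behind the joke is easy to check with Python's arbitrary-precision integers. This sketch only verifies the factor and the raw video data rate; it assumes uncompressed frames at 3 bytes per pixel and does not try to reproduce the "age of the universe" figure.

```python
RATIO = 100    # claimed shrink factor per pass
PASSES = 256   # number of passes in the joke

total_factor = RATIO ** PASSES
print(total_factor == 10 ** 512)

# Uncompressed video: 1024 x 768 pixels, 3 bytes per pixel (24 bit colour), 50 frames per second.
bytes_per_second = 1024 * 768 * 3 * 50
print(bytes_per_second)

# Longest movie, in seconds, whose raw size is at most total_factor bytes.
max_seconds = total_factor // bytes_per_second
print(len(str(max_seconds)), "decimal digits of seconds")
```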

      --
      Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
    14. Re:100:1 ? I don't think so... by pbryan · · Score: 2

      On we go:
      compress(B) = C (size is now 10)
      compress(C) = D (size 1).
      So everything compresses into 1 byte.


      The press release failed to indicate that their new compression algorithm "brings order from chaos", a feature that I first recognized in the motion picture "Big Trouble in Little China".

      Conservatively assuming a 10:1 compression result in both their algorithm and more common compression algorithms, in order to achieve your one-byte result, you need to achieve it in a slightly different manner...

      randomCompress(A) = B (size is now 100, but less random, therefore less compressible by randomCompress than A).

      normalCompress(B) = C (size is now 10, but more random, therefore more suitable for the randomCompress function in the next iteration).

      randomCompress(C) = D (size is now 1, but no longer exhibits randomness nor pattern, and therefore is no longer reducible), unless the extremelyLossyNonDeterministicCompress function is used, which allows this one last bit to be reduced to zero bytes, but which results in a 50:50 chance of being undecompressable.

      Another feature of their algorithm that was not mentioned was its ability to remove uncertainty from other volatile complex systems such as stock markets, and badly/over-managed economies.

      I predict this new algorithm will revolutionize the gambling industry when it is discovered that practically random events can be de-entropized, allowing more deterministic behavior, and thus unprecedented gambling profits to result.

      --

      My car gets 40 rods to the hogshead, and that's the way I likes it!

    15. Re:100:1 ? I don't think so... by micromoog · · Score: 2
      Sorry, not true. The average is better than 1:1.

      Imagine that we use gzip to attempt to compress all possible files. If it gets smaller, we keep it. If not, we keep the original.

      Overall, some set of files will get smaller. The rest will stay the same. Therefore, we end up with better than 1:1 over the set of all possible files.

    16. Re:100:1 ? I don't think so... by stilwebm · · Score: 2

      Two things I noticed: they use the term "practical random" which I presume means much less than perfect random, such as a photograph. Also, they mention they have only applied the compression to small strings.

      For all we know, this chip could have a few registers that just mark which file it is compressing so they can spit out a single byte representation of a 100 byte file.

      Does anyone else think this site is catering towards corporate and private venture capitalists more than anything?

    17. Re:100:1 ? I don't think so... by Happy+Monkey · · Score: 3, Informative

      You then need to add one bit of data to tell whether you've compressed it or not.
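
      A minimal Python sketch of that bookkeeping, assuming zlib as the compressor and spending a whole flag byte where the argument only needs one bit (pack and unpack are made-up names): whatever the "keep the smaller one" trick saves on compressible input, it pays back on input that does not shrink.

```python
import random
import zlib


def pack(data: bytes) -> bytes:
    """Keep whichever form is smaller, prefixed by a flag saying which one it is."""
    squeezed = zlib.compress(data, 9)
    if len(squeezed) < len(data):
        return b"\x01" + squeezed
    return b"\x00" + data


def unpack(blob: bytes) -> bytes:
    """Undo pack() by reading the flag first."""
    flag, body = blob[:1], blob[1:]
    return zlib.decompress(body) if flag == b"\x01" else body


if __name__ == "__main__":
    rng = random.Random(2002)  # fixed seed so the run is repeatable
    noise = rng.getrandbits(8 * 1000).to_bytes(1000, "big")
    text = b"all work and no play " * 50
    for name, data in (("noise", noise), ("text", text)):
        print(name, len(data), len(pack(data)))
```

      Without the flag, the decompressor has no way to tell a stored original from a compressed stream.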

      --
      __
      Do ya feel happy-go-lucky, punk?
    18. Re:100:1 ? I don't think so... by Archanagor · · Score: 2, Funny

      You know,

      If you just remove the flashy buzzwords. Their press release compresses ~100:1

      Here's the result:

      Bullshit.

    19. Re:100:1 ? I don't think so... by SuperguyA1 · · Score: 2

      Interesting. Most compression algorithms rely on certain patterns(generally repetition) within the data. I suppose if they are claiming 100:1 on any data then yes, but if this turns out NOT to be a hoax(not likely) then I'd bet the algorithm will only run on a data set once.

      --
      "as plurdled gabbleblotchits on a lurgid bee" - Prostetnic Vogon Jeltz. (One man's humorous is another mans flamebait)
    20. Re:100:1 ? I don't think so... by Rentar · · Score: 2
      Nope, wouldn't work either. The best you can get on average over all possible inputs is 1:1.

      Of course. But no one is actually likely to work with a significant percentage of all possible inputs. What I want to say is that any useful data that is not already compressed is less than random (otherwise it would be useless). The really interesting average is that over the average /home/foo

    21. Re:100:1 ? I don't think so... by biobogonics · · Score: 3, Insightful

      I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"


      Actually, if you change the domain you can get what appears to be impressive compression. Consider a bitmapped picture of a child's line drawing of a house. Replace that by a description of the drawing commands. Of course you have not violated Shannon's theorem because the amount of information in the original drawing is actually low.

      At one time commercial codes were common. They were not used for secrecy, but to transmit large amounts of information when telegrams were charged by the word. The recipient looked up the code number in his codebook and reconstructed a lengthy message: "Don't buy widgets from this bozo. He does not know what he is doing."

      If you have a restricted set of outputs that appear to be random but are not, ie white noise sample #1, white noise sample #2 ... all you need to do is send 1, 2... and voila!
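
      The codebook idea can be sketched in a few lines of Python. The book contents and function names here are invented for illustration; the point is that the "compression" only works because both ends already hold the full text of every message.

```python
# Hypothetical codebook that sender and recipient both hold before anything is sent.
CODEBOOK = [
    "Don't buy widgets from this bozo. He does not know what he is doing.",
    "Shipment delayed, expect arrival next week.",
    "Accept the offer at the quoted price.",
]


def encode(message: str) -> int:
    """Sender looks the message up and transmits only its index."""
    return CODEBOOK.index(message)  # raises ValueError for anything not in the book


def decode(code: int) -> str:
    """Recipient looks the index up in the same book."""
    return CODEBOOK[code]


if __name__ == "__main__":
    code = encode(CODEBOOK[0])
    print(code, "->", decode(code))
```

      A message that is not in the book cannot be sent this way at all, which is why this does not violate Shannon's theorem.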

    22. Re:100:1 ? I don't think so... by grytpype · · Score: 3, Funny

      I just ran another compression pass on that, and i got:

      BS

      --

      - Have a picture

    23. Re:100:1 ? I don't think so... by swillden · · Score: 3, Funny

      I don't need to encode the number of compressions, every decompression consists of decompressing 256 times.

      I think you mean at most 256 times. Supposing I had to perform 10 compressions to compress to a single byte. After you had decompressed 10 times, you'd have the data. The next decompression would make some other file 100 times larger than the Matrix. So if you could recognize the correct file when you saw it, I could avoid transmitting the decompression count.

      So, I just have to prepend a string saying "This is it!" before compressing!

      Also, it occurred to me after my previous posting (and to another poster, I saw) that if we can compress to a single byte, why not to a single bit? This is a great advance, which I believe I shall patent quickly before that other poster does, because now I can give you my copy of The Matrix over the phone! I can just tell you if it's a 1 or 0. For that matter, I don't even have to tell you -- you can just try both possibilities!

      So my question now is, does the decompressor only produce strings of bits that exist somewhere and were once compressed, or does it produce anything? Can I just think "I want a great term paper..." and then try decompressing both 1 and 0 until I get it (in no more than 8 or ten iterations of the decompressor, 'cause I want a paper, not a novel).

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    24. Re:100:1 ? I don't think so... by zhensel · · Score: 3, Funny

      Quantum theory has everything to do with compression. Inside sources have revealed that this compression scheme works on the uncertainty principles key to quantum physics. You see, any string of 100 bits has a distinct probability of being compressible to a single bit. Of course, this means that this compression scheme will produce bogus results 99.999999% of the time, but think of the wonder of compression realized the other .000001% of the time! Furthermore, the system requirements for their technology are as follows: x86 PC running WindowsXP (to take advantage of DirectX in wickedly rendering the fractals necessary for the compression), a particle accelerator, and a heavy dose of optimism combined with a complete lack of skepticism.

    25. Re:100:1 ? I don't think so... by curunir · · Score: 2, Funny

      Therefore, I should easily be able to compress The Matrix into a single byte with 256 passes.

      I'm not so sure about that...It takes a lot of bytes to represent our entire society (in 1999, at least). The AI for Hugo Weaving's character must have been a couple of gigs of code at least.

      However, if you want to compress the movie "The Matrix" into a single byte...here goes:
      <breathy_keanu_voice>Whoah...</breathy_keanu_voice> (soundByte® compression...far from lossless compression, but this is as close as anyone will ever come to one byte compression).

      --
      "Don't blame me, I voted for Kodos!"
    26. Re:100:1 ? I don't think so... by Syberghost · · Score: 2

      Nah, they're using a table based on the Library of Congress.

      So it compresses 100 to 1, but the decompressor program is a hundred terabytes...

    27. Re:100:1 ? I don't think so... by SuperguyA1 · · Score: 2

      LOL the encoding is based on the dewey decimal system.

      --
      "as plurdled gabbleblotchits on a lurgid bee" - Prostetnic Vogon Jeltz. (One man's humorous is another mans flamebait)
    28. Re:100:1 ? I don't think so... by gnovos · · Score: 2

      Here is what you really end up with by compressing a 1 meg file using "Super compression":

      o One bit of "compressed data".
      o One number telling you how many times to run the decompressing program.

      Ok, sounds good so far, right? Well you are missing a little something here, which is that this is not some abstract model, that "number" you sent is going to be made up of bits, and, in the end, it will be a number larger than the 1 meg file you started with more often than it is not!

      Now, what you COULD do is send a virtual-number in two bits. The way you do this is you send one bit of information, and then you wait. When the number of milliseconds equals the number that you wish to send, then you send the second bit. In fact, you don't even really need that "first" bit of compressed data, all you need is the virtual number. Now you have successfully sent one meg of information in two bits, right? Yeeeesish, but you have also spent a week just doing nothing while that number was being created, so you end up taking more time than it would have taken in the first place.

      --
      "Your superior intellect is no match for our puny weapons!"
    29. Re:100:1 ? I don't think so... by gnovos · · Score: 2

      only use femto or pico seconds to do it...milli is just too slow... how fast can you switch?
      also take advantage of current frequency shifting/multiplexing, running both data streams, and time-data concurrently?

      Maybe use a combo of the two, send 10% of the data and the decoder, timed to match the other 90% in femtoseconds?

      Well, that's the rub... if your clock could actually measure in such tiny increments accurately then the bit-rate of the line would go up, thus negating the need for compression.

      --
      "Your superior intellect is no match for our puny weapons!"
    30. Re:100:1 ? I don't think so... by justin.warren · · Score: 2
      Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

      Not so. In a compression scheme, the compressed data is more organised than the original since it contains the information required to re-assemble the original from the compressed version. Thus the 1/100th size file is less random than the original and would not compress as well.

      The same thing happens if you gzip a gzip-ed file.
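
      This is easy to try. A minimal Python sketch, assuming zlib (the same DEFLATE algorithm gzip uses) and some made-up filler text built from a small vocabulary: the first pass shrinks the text a lot, the second pass has nothing left to work with.

```python
import random
import zlib

# Made-up vocabulary; any ordinary text would do.
WORDS = ["data", "random", "compress", "byte", "ratio", "shannon", "entropy", "file"]


def sample_text(n_words: int, seed: int = 0) -> bytes:
    """Build repeatable filler text from a small vocabulary."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words)).encode("ascii")


if __name__ == "__main__":
    text = sample_text(20000)
    once = zlib.compress(text, 9)
    twice = zlib.compress(once, 9)
    print("original:", len(text))
    print("one pass:", len(once))
    print("two passes:", len(twice))
```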

      --
      Just because you're paranoid doesn't mean they're NOT after you.
    31. Re:100:1 ? I don't think so... by Fjord · · Score: 2

      I ran another pass. The result:

      $

      --
      -no broken link
  4. Conserve Bandwidth? by Atzanteol · · Score: 2, Funny

    Maybe they just needed more bandwidth for their terrible site?

    --
    "Ignorance more frequently begets confidence than does knowledge"

    - Charles Darwin
  5. Time for a new law of information theory? by Anonymous Coward · · Score: 5, Funny

    The odds on a compression claim turning out to be true are always identical to the compression ratio claimed?

    1. Re:Time for a new law of information theory? by jdavidb · · Score: 2

      I hereby announce my new rot13 compression method which achieves a 1:1 compression ratio! And as an added bonus, it is legally unbreakable encryption under the DMCA!

  6. Tech details from the crappy Flash-only website by bleeeeck · · Score: 5, Informative
    ZeoSynch's Technical Process: The Pigeonhole Principle and Data Encoding Dr. Claude Shannon's dissertation on Information Theory in 1948 and his following work on run-length encoding confidently established the understanding that compression technologies are "all" predisposed to limitation. With this foundation behind us we can conclude that the effort to accelerate the transmission of information past the permutation load capacity of the binary system, and past the naturally occurring singular-bit-variances of nature can not be accomplished through compression. Rather, this problem can only be successfully resolved through the solution of what is commonly understood within the mathematical community as the "Pigeonhole Principle."

    Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?

    After some time a person will reasonably conclude that:
    "One unique marker is required for each pigeon that flies through the hole, if there are one hundred pigeons in the group then the answer is one hundred markers". In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point, and believe that we are going to successfully recover either the square or cube from the single edge, we would be sorely mistaken.

    This three-dimensional world limitation can however be resolved in higher dimensional space. In higher, multi-dimensional projective theory, it is possible to create string nodes that describe significant components of simultaneously identically yet different mathematical entities. Within this space it is possible and is not a theoretical impossibility to create a point that is simultaneously a square and also a cube. In our example all three substantially exist as unique entities yet are linked together. This simultaneous yet differentiated occurrence is the foundation of ZeoSync's Relational Differentiation Encoding(TM) (RDE(TM)) technology. This proprietary methodology is capable of intentionally introducing a multi-dimensional patterning so that the nodes of a target binary string simultaneously and/or substantially occupy the space of a Low Kolmogorov Complexity construct. The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression. The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down.

    1. Re:Tech details from the crappy Flash-only website by MadCow42 · · Score: 2

      >> The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression.

      Based on this quote, they don't claim lossless... anyone believe their claim now? (OK, they claim that "the difference between these occurrences is so small that we will have for all intents and purposes...")

      MadCow

      --
      I used to have a sig, but I set it free and it never came back.
  7. Is this April 1st? by tshoppa · · Score: 3, Informative
    This has *long* been an April 1st joke published in such hallowed rags as BYTE and Datamation for at least as long as I've been reading them (20 years).

    The punchline to the joke was always along the lines of

    Of course, since this compression works on random data, you can repeatedly apply it to previously compressed data. So if you get 100:1 on the first compression, you get 10000:1 on the second and 1000000:1 on the third.
    1. Re:Is this April 1st? by friscolr · · Score: 2, Funny
      But this is no joke.

      Please note they claim to be able to compress data 100:1, but do not say they can decompress the resultant data back to the original.

      By the way, so can i.
      Give me your data, of any sort, of any size, and i will make it take up zero space.

      Just don't ask for it back.

  8. Press Release here by thing12 · · Score: 2, Informative
    If you don't want to wade through the flash animations...

    http://www.zeosync.com/flash/pressrelease.htm

  9. randomness by Derwen · · Score: 2
    a breakthrough in data compression that allows for 100:1 lossless compression of random data.
    That's fine if you only have random data - but a lot of mine is non-random ;o)
    - Derwen

    --
    http://fsfeurope.org/
  10. No Way... by tonywestonuk · · Score: 2, Redundant

    Pure random data is impossible to compress - If you compress 1MB of random data (proper random data, not pseudo random).. and you get, say 100K's worth of compressed output; what's stopping you feeding this 100K's worth back through the algorithm again and reducing it down even more.... again, and again, until the whole 1MB is squashed into a byte! (Which, obviously is a load of rubbish).....

    1. Re:No Way... by radish · · Score: 2


      Not true.

      Get yourself some random data (real random is of course somewhat hard to find! but the output from a crypto-strength RNG is OK) and zip it. It will (probably) get smaller, a reduction is more likely the bigger the file is. The reason is that in a random stream you may get repeating patterns (although you may not), and it's these repeating patterns which deflate uses. The larger your dataset the more likely there are to be significant repeated sections. Other less random data sets (e.g. plain text) will compress much better because there are statistically more repeated sections (this is at the bit level, not char level of course).

      Now the output from deflate is NOT random (I've said this in other comments on this thread), it will not have any repeating sections (the first zip run has removed them), therefore running deflate over it again will have no effect.

      This is why zipping a zip file will always have no effect (OK so sometimes you can exploit weaknesses in the file format of zip, but rarely). It is not at all important what the original data looked like, a second run will not improve the ratio.

      Note that I'm not saying you can compress ANY data, (I just said you can't compress compressed streams for instance!), but random data is not impossible to compress, just quite hard.

      I've seen several people use the "repeated compression = 1 byte final result" argument against this announcement here - it's inappropriate. I agree this press release is pure horse manure, but not for that reason.

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    2. Re:No Way... by CaseyB · · Score: 3, Insightful
      It will (probably) get smaller, a reduction is more likely the bigger the file is.

      It "probably" will not.

      The reason is that in a random stream you may get repeating patterns (although you may not), and it's these repeating patterns which deflate uses.

      Any encoding that saves space by compressing repeating data, also adds overhead for data that doesn't repeat -- at least as much overhead as you saved on the repetition, over the long run.

      There ain't no such thing as a free lunch.

    3. Re:No Way... by liquidsin · · Score: 2

      I've seen several people use the "repeated compression = 1 byte final result" argument against this announcement here - it's inappropriate.

      Ok, so what if I start out with 100 bytes of data, purely random (as pure as can be had...) that just happens to have no patterns that can be factored out (could happen...it's random). You mean to tell me that can be compressed at 100:1? Even if it did have some patterns to it, there's no way in hell it could crush down to 1 byte. The fact that they claim on their website that they take data and randomize with a patented technology is a good tip-off that it's a hoax.

      --
      do not read this line twice.
    4. Re:No Way... by Eivind · · Score: 3, Insightful
      Get yourself some random data (real random is of course somewhat hard to find! but the output from a crypto-strength RNG is OK) and zip it. It will (probably) get smaller, a reduction is more likely the bigger the file is.



      Bullshit. There will be patterns, but the point is, all patterns are equally likely, so this does not help you. Don't believe me? Test it yourself. Pull, say, a megabyte from your /dev/random (this will take a while!) and then try to compress it with all the compressors on your machine. Zip, Compress, Bzip, you name it.



      The odds are very high (as in 99.999% ++) that none of the compressors will manage to shrink the file a single byte. In fact they will probably all cause it to grow very slightly.
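
      For anyone without a pile of command-line compressors handy, here is a rough Python equivalent of that experiment using three standard-library compressors. It is a sketch under one loud assumption: the input is seeded pseudo-random bytes so the run is repeatable, not /dev/random output.

```python
import bz2
import lzma
import random
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Compress the same bytes with three standard-library compressors and report output sizes."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    n = 100_000
    # Assumption: seeded pseudo-random bytes; swap in os.urandom(n) for OS-provided randomness.
    data = random.Random(42).getrandbits(8 * n).to_bytes(n, "big")
    print("original:", n)
    for name, size in compressed_sizes(data).items():
        print(name, size)
```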

    5. Re:No Way... by radish · · Score: 3, Funny


      *Reads FAQ* *Blushes*

      OK, so I went the "negligible housekeeping route". Maybe I should get a job in the patent office. ;-)

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    6. Re:No Way... by radish · · Score: 2


      The counting argument is just as appropriate to Zip as to this new algorithm. You can't apply Zip recursively, and likewise you can't apply this thing recursively. That doesn't mean they can't have got 100:1 compression. Zip can get 100:1 on some files, just not all. If they claim to get exactly 100:1 on EVERYTHING then they're talking crap, but I didn't see that claim (maybe I missed it). As I said, I think they're talking crap anyway, I'd just like to make sure we use valid arguments to beat them into the ground ;-)
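
      The "100:1 on some files" part is easy to confirm. A minimal Python sketch, assuming zlib as a stand-in for Zip's deflate and a deliberately extreme input of one million zero bytes:

```python
import zlib

zeros = bytes(1_000_000)  # one million zero bytes, about as repetitive as data gets
packed = zlib.compress(zeros, 9)
ratio = len(zeros) / len(packed)

print(len(zeros), "->", len(packed), "bytes")
print("ratio about", round(ratio), ": 1")

# Still lossless: the original comes back exactly.
assert zlib.decompress(packed) == zeros
```

      That says nothing about random input, of course, which is the whole argument.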

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    7. Re:No Way... by Stephan+Schulz · · Score: 2

      The odds are very high (as in 99.999% ++) that none of the compressors will manage to shrink the file a single byte. In fact they will probably all cause it to grow very slightly.


      However, many compression programs will hide this very small growth in the file name. Gzip, for example, will never increase a file's size, because it simply refuses to do the compression if the file does not shrink. However, it adds a 3-byte marker (".gz") to the name of every file it compresses, nicely hidden away in a place you don't look at.

      --

      Stephan

    8. Re:No Way... by Eivind · · Score: 2
      No, you're wrong. Gzip *will* sometimes make a file grow. It has to. Think about it: there are certain sequences of bytes in a file that are "magical" to gzip, which are meant to expand to something more than what they are.

      If those sequences show up in the file, they must be "escaped" somehow so as to make gunzip understand that they are to be interpreted literally as opposed to expanded.

      This will cause a small growth, but the growth will generally stay very low. I haven't tested, but I would guess no more than 1% growth or so.

    9. Re:No Way... by Eivind · · Score: 2
      I tested this empirically just now. A 1000 byte random file, gzipped, will grow to about 1028 bytes.

      A 1000000 byte file grows to about 1000178 bytes.

      Make your own experiments and draw your own conclusions if you like.
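For anyone who wants to repeat the measurement, here is a rough Python sketch. It assumes in-memory compression with the standard-library `gzip` module rather than the gzip command-line tool, so the exact byte counts will probably differ from the figures above (the tool also stores a file name, for instance). The helper name `gzip_growth` is made up.

```python
import gzip
import os


def gzip_growth(n: int) -> int:
    """Bytes added when gzip-compressing n random bytes in memory."""
    data = os.urandom(n)
    return len(gzip.compress(data)) - n


if __name__ == "__main__":
    for n in (1000, 1_000_000):
        print(f"{n} random bytes grow by {gzip_growth(n)} bytes")
```

The growth is the fixed gzip header and trailer plus a few bytes of bookkeeping per stored block, which is why it stays tiny relative to the file size.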

  11. The proof's in the pudding. by neo · · Score: 5, Funny

    ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss. That would be at least an order of magnitude beyond current known algorithms for compacting data.

    ZeoSync announced today that the "random data" they were referencing is a string of all zeros. Technically this could be produced randomly, and our algorithm reduces this to just a couple of characters, a 100 times compression!!

  12. The pressrelease by grazzy · · Score: 4, Informative

    ZEOSYNC'S MATHEMATICAL BREAKTHROUGH OVERCOMES LIMITATIONS OF DATA COMPRESSION THEORY

    International Team of Scientists Have Discovered
    How to Reduce the Expression of Practically Random Information Sequences

    WEST PALM BEACH, Fla. - January 7, 2001 - ZeoSync Corp., a Florida-based scientific research company, today announced that it has succeeded in reducing the expression of practically random information sequences. Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

    Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.

    According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]

    "There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."

    Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range.

    Many types of digital communications channels and computing systems could benefit from this discovery. The technology could enable the telecommunications industry to massively reduce huge amounts of information for delivery over limited bandwidth channels while preserving perfect quality of information.

    ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.

    All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.

    Dr. Piotr Blass, chief technology advisor at ZeoSync, said "Our recent accomplishment is so significant that highly randomized information sequences, which were once considered non-reducible by the scientific community, are now massively reducible using advanced single-bit-variance encoding and supporting technologies."

    "The technologies that are being developed at ZeoSync are anticipated to ultimately provide a means to perform multi-pass data encoding and compression on practically random data sets with applicability to nearly every industry," said Jim Slemp, president of Radical Systems, Inc. "The evaluation of the complex algorithms is currently being performed with small practically random data sets due to the analysis times on standard computers. Based on our internally validated test results of these components, we have demonstrated a single-point-variance when encoding random data into a smaller data set. The ability to encode single-point-variance data is expected to yield multi-pass capable systems after temporal issues are addressed."

    "We would like to invite additional members of the scientific community to join us in our efforts to revolutionize digital technology," said St. George. "There is a lot of exciting work to be done."

    About ZeoSync

    Headquartered in West Palm Beach, Florida, ZeoSync is a scientific research company dedicated to advancements in communications theory and application. Additional information can be found on the company's Web site at www.ZeoSync.com or can be obtained from the company at +1 (561) 640-8464.

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.

  13. Buzzwordtastic by Steve+Cox · · Score: 2, Interesting
    I got bored reading the press release after finding the fourth trademarked buzzword in the second paragraph.


    I simply can't believe that this method of compression/encoding is so new that it requires a completely new dictionary (of words we presumably are not allowed to use).

  14. I can do better than that! by Sobrique · · Score: 2, Funny

    100 to 1? Bah, that's only 99%.
    The _real_ trick is getting 100% compression. It's actually really easy, there's a module built in to do it on your average unix.
    Simply run all your backups to the New Universal Logical Loader and perfect compression is achieved. The device driver is, of course, loaded as /dev/null.

    1. Re:I can do better than that! by kzinti · · Score: 2

      That's fine, it's the uncompressing that gets you!

      Oh, you want reversible compression? Why didn't you say so? We have to have complete specifications you know. I'm sorry that you compressed your 120GB disk full of pr0n and mp3s down to nothing, but it's not really our fault now, is it?

      --Jim

  15. In this house we obey the 2nd law of thermodynamic by tshoppa · · Score: 3, Insightful
    From the Press Release:
    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
    They left out Disobeying the 2nd law of Thermodynamics!
  16. Yes you are... by radish · · Score: 2


    B is not random. It is a description (in some format) of A.

    But what you say does have merit, and this is why compressing a ZIP doesn't do much - there is a limit on repeated compression because the particular algorithm will output data which it itself is very bad at compressing further (if it weren't, why not iterate once more and produce a smaller file internally?).

    --

    ---- One button is the power button, the other is the Bender voice button "Bite My Shiny Metal Ass"

    1. Re:Yes you are... by john@iastate.edu · · Score: 2
      B is not random. It is a description (in some format) of A

      If it is not random, then it has some pattern and should compress even better.

      Clearly their claim is a steaming pile of technology (if you get my drift).

      --
      Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
    2. Re:Yes you are... by radish · · Score: 2


      I agree totally, their "technology" is junk, but I was just pointing out the difference between saying you can compress random data and saying you can compress any data.

      Still, I have said enough on this topic (and been proven an idiot in other threads) so I'll shut up :-)

      --

      ---- One button is the power button, the other is the Bender voice button "Bite My Shiny Metal Ass"

  17. I could see it working in a specific context by SerpentMage · · Score: 2

    Many people may say this is bull, but think of it in another way.

    Instead of assuming that data is static, think of it as constantly moving. Even in random data, moving data can be compressed because it is constantly moving along. It is sort of like when a herd of people file into a hall. Sure everyone is unique, but you could organize and say, "Hey five red shirts now", "ten blue shirts now".

    And I think that is what they are trying to achieve. Move the dimensions into a different plane. However, this is what I wonder about: how fast will it actually be? I am not referring to the mathematical requirements, but the data will stream and hence you will attempt to organize. Does that organization mean that some bytes have to wait?

    --

    "You can't make a race horse of a pig"
    "No," said Samuel, "but you can make very fast pig"
    1. Re:I could see it working in a specific context by ergo98 · · Score: 2

      For lossless compression, simply saying "There were 5 red shirts and 7 blue shirts" isn't enough: you'd also have to store information on exactly where those 5 red shirts and 7 blue shirts were in the sample to be able to recreate the situation exactly as it was. Because of this it has been found to be impossible to "compress" truly random data without actually increasing the size of the file.
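A tiny Python sketch of why the counts alone are not enough. The two "herds" below are made-up sample data, and `summarize` is a hypothetical name for the count-only description.

```python
from collections import Counter

# Two different "herds" with the same shirt counts (hypothetical sample data).
herd_a = ["red", "blue", "red", "blue", "blue"]
herd_b = ["blue", "blue", "blue", "red", "red"]


def summarize(herd):
    """Count-only summary: how many shirts of each colour, positions discarded."""
    return Counter(herd)


# Both herds collapse to the same summary, so the summary alone cannot
# tell a decoder which original sequence to rebuild.
same_summary = summarize(herd_a) == summarize(herd_b)
different_data = herd_a != herd_b
print(same_summary, different_data)
```

Any decoder handed only the summary has to guess the order, which is exactly the information a lossless scheme would still need to store.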

      Of course if you're talking lossy then everything changes: who cares where the shirts are, just tell 'em how many there were. Unfortunately lossy is only relevant for images and sounds.

    2. Re:I could see it working in a specific context by SerpentMage · · Score: 2

      What I was trying to get at is the following. From their explanation they were saying that even though there is randomness there is order. And that order was explained simply using pigeons going through a hole. So create a higher plane of dimensions and things become ordered. Consider for example fractals. Totally unordered, but pattern based. I think they are using chaos mathematics, but then I may be wrong.

      --

      "You can't make a race horse of a pig"
      "No," said Samuel, "but you can make very fast pig"
  18. What's random? by Moderation+abuser · · Score: 2

    What're they talking about? 20Gb of rand() output?

    If so, they're a bunch of twits.

    --
    Government of the people, by corporate executives, for corporate profits.
  19. Been there, done that... by color+of+static · · Score: 4, Informative

    There seems to be a company claiming to exceed, go around, or obliterate Shannon every few years. In the early 90's there was a company called Web (before the WWW was really around by a year or so). They made claims of compressing any data, even data that had already been compressed. It is a sad story that you should be able to find in either the sci.compression FAQ or the renewed deja archives. It basically boils down to this: as they got closer to market, they found some problems... you can guess the rest.
    This isn't limited to the field of compression of course. There are people that come up with "unbreakable" encryption, infinite gain amplifiers (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well understood enough for these ideas to be killed before a company is started or staked on the claims.

    1. Re:Been there, done that... by Fly · · Score: 2
      Their press release states that they still need to work out some issues:
      Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

      From the general lack of information, I'm guessing that "very small bit strings" have at least about one hundred bits, unless they compress to sub-bits. ;-) And we can only infer that the "temporal" problems indicate that compressing larger strings takes an inordinate amount of time. I suppose that sounds logical enough to get some investors to hand them some money, but I disagree with them that all they have to do is simply optimize their algorithms. It sounds like they have a long, long way to go to be practical, which could very well use up any money given to them with no return.
      --
      end of line
    2. Re:Been there, done that... by Zeinfeld · · Score: 2
      This isn't limited to the field of compression of course. There are people that come up with "unbreakable" encryption, infinite gain amplifiers (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well understood enough for these ideas to be killed before a company is started or staked on the claims.

      I don't think that the problem is a lack of understanding of the fields in question. The problem is a surfeit of gullibility on the part of the con-artist's marks.

      I remember there being a 'compression' company not long ago that blew $15 million on its start up party in Las Vegas and closed not long after when it transpired that the CEO was an ex-convict on the run from a parole violation.

      The problem is that the press take the claim to have 'disproven' some fundamental proof as giving credibility to the claimants rather than making them suspect. The argument proceeds thus 'Black is white and therefore my perpetual motion machine works', 'Black is not white and your perpetual motion machine is almost certainly a fraud', 'You are only arguing that way because you are stuck in your ways and fail to understand that black is actually white, people like you have stopped every advance in science, they laughed at Galileo when he said the earth was a cube...'.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    3. Re:Been there, done that... by color+of+static · · Score: 2

      I hate to admit to it, but I think you are probably far more accurate on that than I was. Especially the way the press eats up the actions outlined in your last paragraph. I guess you really can't go broke underestimating the American public.

  20. Blah! by jsse · · Score: 2, Funny

    We already have lzip to compress the files down to 0% of their original size. ZeoSync doesn't catch up with latest technologies on /. it seems.

  21. It's about /practically/ random data by Telcontar · · Score: 2

    If you read the press release carefully, they claim to be able to compress practically random data, such as pictures of green grass, 100 : 1. They never claim to be able to do the same with true random data, since this is impossible.

    There may be something about that. However, there are also many points that make me sceptical, but maybe the press release has not been reviewed carefully enough.
    This new algorithm does not break Shannon's limit, which is impossible, so the phrase about the "historical limitations" is a hoax...

    1. Re:It's about /practically/ random data by gorilla · · Score: 2

      Pictures aren't 'practically random'. They're very non-random.

  22. Re:scientific method, fact... goes out the window, by Anonymous Coward · · Score: 2, Funny

    Screw ZeoSync, I've built a compression algorithm that is 1000:1 and is completely lossless. I've yet to demonstrate it in public though but please give me venture capital. Thank you.

  23. Re: Information theory says 1 by Dada · · Score: 2, Interesting

    The maximum compression ratio for random data is 1. That's no compression at all.

  24. Buzz-word ALERT! by Hougaard · · Score: 2

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.

    I think they have made a buzz-word compression routine; even our salespeople have difficulty putting this many buzz-words in a press release :)

  25. Some background reading: by Quixote · · Score: 5, Interesting

    Section 1.9 of the comp.compression FAQ is good background reading on this stuff. In particular, read the "WEB story".

  26. I wonder if... by mirko · · Score: 2

    Most random generation uses bytes as their unit.
    Now, what if they look for bit-sequences (not only 8-bit sequences but maybe odd numbers) in order to generate patterns?
    I guess this could be a way to significantly compress data, but this'd imply reading a huge amount of data in order to achieve the best result possible.
    Note they may also do this in more than one pass, but then their compression thing would be really lengthy.

    --
    Trolling using another account since 2005.
  27. Reminder to Self... by kramer · · Score: 2

    Never, *EVER* accept any advice from the Aberdeen Group. Apparently their analysts don't know shit.

    "Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize. I don't have an answer to which one it is yet," said David Hill, a data storage analyst with Boston-based Aberdeen Group.

    Wonder which category he expects them to win in...

    Physics, Chemistry, Economics, Physiology / Medicine, Peace or Literature

    There is no Nobel category for pure mathematics, or computing theory.

    1. Re:Reminder to Self... by CaseyB · · Score: 2

      Either literature, for their epic fantasy press releases, or economics, for their Theory of Venture Capital Greed:Ignorance ratios.

    2. Re:Reminder to Self... by kramer · · Score: 2

      Okay, it's not generally good form to follow oneself up, but I wrote the analyst "David Hill" and asked him (slightly more politely than what I said here) what he was thinking.

      He actually responded, and his answer to my question about which category he thought it qualified for was: "Economics! It would not be a traditional award, but its economic impact would be immense."

      I think this particular analyst is in la-la land. The economics award is awarded for work on advancing the science of economics, not making money.

  28. And they got funding ... by the+bluebrain · · Score: 2, Funny

    ... by compressing some VC's bank account, by a factor of greater than 100!

    "It was just data, you know," the sobbing wretch was reportedly told, "just ones and zeros. And hey - you can look at it as a proof of principle. We'll have the general application out ... real soon now, real soon".

    --
    yes, we have no bananas
  29. On the contrary! by Simon+Tatham · · Score: 3, Insightful

    Quite the contrary: if they had claimed to be achieving 100:1 compression on truly random data, they would be provably talking total rubbish. Consider the number of possible bit strings of length N. Now consider the number of possible bit strings of length N/100. There are fewer of the latter, right? Therefore, if you can compress every length-N string into a length-N/100 string, at least two inputs must map to the same output. Hence, you can't uniquely recover the input from the output - and the compression cannot be lossless.

    The fact that they hedge and talk about "practically" random sequences is the only thing that makes it possible they're telling the truth!
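The counting argument above can be put into numbers with a short Python sketch. The function names are hypothetical and the choice of N = 200 is only for illustration.

```python
def count_strings(bits: int) -> int:
    """Number of distinct bit strings of exactly `bits` bits."""
    return 2 ** bits


def crowded_output_size(n: int, factor: int = 100) -> int:
    """By pigeonhole, if every length-n input maps to an output of
    n // factor bits, some output is shared by at least this many inputs."""
    inputs = count_strings(n)
    outputs = count_strings(n // factor)
    # Ceiling division: inputs spread as evenly as possible over outputs.
    return -(-inputs // outputs)


if __name__ == "__main__":
    n = 200
    print("inputs :", count_strings(n))
    print("outputs:", count_strings(n // 100))
    print("inputs sharing one output, at least:", crowded_output_size(n))
```

Even at N = 200 there are only four possible 2-bit outputs, so a decompressor would have to pick one original out of an astronomically large crowd.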

    1. Re:On the contrary! by Simon+Tatham · · Score: 2

      "But hey, you're writing a compression algorithm - just use it."

      You can't hide behind that, I'm afraid. At this stage you're trying to prove that a compression algorithm of this type is feasible to write. Unfortunately, in the course of your proof you've made the assumption that a compression algorithm of this type is feasible to write! So you've proved that if it can be done, then it can be done. Undeniably true, but not 100% helpful.

      You will find that the number of possible expansions of your X/125 data string is much larger than will fit in the difference between X/125 and X/100. In fact, on average it will work out to be roughly what you can fit in the difference between X/125 and X. So you still haven't gained anything - you're still bound by the simple counting argument that says you can't uniformly reduce every length-X string into a sub-length-X string.

    2. Re:On the contrary! by Simon+Tatham · · Score: 2

      Sorry, you've missed a "two to the power" out.

      4 megabytes of data == 32 megabits == 2^25 bits. That doesn't mean there are 2^25 combinations of bits - it means there are 2^25 actual bits. The number of combinations is 2^(2^25), which is really quite a staggeringly large amount bigger.

      Similarly, there are 2^(2^18) combinations in a 32K data block, not 2^18 as you suggest. So an even distribution would in fact mean that each 32K block expands to 2^(2^25-2^18) different 4 meg blocks - which means the amount of space it would take to store that number is (2^25-2^18) bits. Coincidentally, this is exactly the amount of space by which you reduced the piece of music in the first place!
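The arithmetic in this reply can be checked in a few lines of Python. This is only a sketch: the combination counts themselves are far too large to print, so it works with their base-2 logarithms, and the variable names are made up.

```python
BITS_4MB = 4 * 1024 * 1024 * 8   # 4 megabytes expressed in bits
BITS_32K = 32 * 1024 * 8         # 32 kilobytes expressed in bits

assert BITS_4MB == 2 ** 25
assert BITS_32K == 2 ** 18

# There are 2**BITS_4MB possible 4 MB blocks and 2**BITS_32K possible
# 32K blocks. Dividing one count by the other subtracts the exponents.
log2_expansions_per_block = BITS_4MB - BITS_32K

# Bits apparently "saved" by squeezing 4 MB into 32K.
bits_saved_by_compression = BITS_4MB - BITS_32K

print(log2_expansions_per_block, bits_saved_by_compression)
```

The two numbers are the same by construction, which is the point: the bits needed to say which expansion is meant are exactly the bits the scheme claimed to save.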

  30. Not random data by edp · · Score: 4, Redundant

    ZeoSync is not claiming to reduce random data 100-to-1. They are claiming to reduce "practically random" data 100-to-1, and Reuters appears to have misreported it. What "practically random" data should mean is data randomly selected from that used in practice. What ZeoSync may mean by "practically random" is data randomly selected from that used in their intended applications. So their press release is not mathematically impossible; it just means they've found a good way to remove more information redundancy in some data.

    The proof that 100-to-1 compression of random data is impossible is so simple as to be trivial: There are 2^N files of length N bits. There are 2^(N/100) files of length N/100 bits. Clearly not all 2^N files can be compressed to length N/100.

    1. Re:Not random data by 3am · · Score: 2, Insightful

      By your 'trivial' argument, compression of random data is impossible on any scale (you can't have a bijection between sets of different sizes).

      --

      A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
  31. Egads... by RareHeintz · · Score: 5, Funny
    ZeoSync said its scientific team had succeeded on a small scale...

    The company's claims, which are yet to be demonstrated in any public forum...

    ...if ZeoSync's formulae succeed in scaling up...

    Call the editors at Wired... I think we have an early nominee for the 2k2 vaporware list.

    ZeoSync expects to overcome the existing temporal restraints of its technology

    Ah... So even if it's not outright bullshit, it's too slow to use?

    "Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize," said David Hill...

    Somehow I think this is going to turn out more Pons-and-Fleischmann than Watson-and-Crick. Almost anytime there's a press release with such startling claims but no peer review or public demonstration, someone has forgotten to stir the jar.

    When they become laughingstocks, and their careers are forever wrecked, I hope they realize they deserve it. And I hope their investors sue them.

    I should really post after I've had my coffee... I sound mean...

    OK,
    - B

    1. Re:Egads... by shic · · Score: 2, Funny

      > > ZeoSync expects to overcome the existing temporal restraints of its technology
      > Ah... So even if it's not outright bullshit, it's too slow to use?

      No, my friend - you are missing the whole point. ZeoSync HAVE succeeded (in a limited sense.) You see, in order to achieve implausible compression rates on random data - all you need to do is overcome a few temporal issues - follow this line of thinking...

      1) Each implementation of the compression algorithm will only be applied to (a relatively small finite number of) finite sequences of bits.
      2) Encode exactly these sequences in the compression tool.
      3) Astonishing compression is achieved - only a small ordinal need be stored to represent each compressed result.

      So your data will always be small, but your compression program will grow rather quickly!

      Puzzle solved.
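The three steps above can be sketched as a toy Python class. Everything here is hypothetical (the class name, the method names, the sample data); it only illustrates the joke that the "compressed" output stays tiny while the tool itself carries all the data.

```python
class TableCompressor:
    """Toy codec: the 'compressed' form is just an index into a table
    that ships inside the tool itself."""

    def __init__(self, known_inputs):
        self.table = list(known_inputs)
        self.index = {data: i for i, data in enumerate(self.table)}

    def compress(self, data: bytes) -> int:
        # Raises KeyError for anything not baked into the table.
        return self.index[data]

    def decompress(self, code: int) -> bytes:
        return self.table[code]

    def tool_size(self) -> int:
        """Bytes of payload the tool must carry around."""
        return sum(len(entry) for entry in self.table)


if __name__ == "__main__":
    files = [bytes([i]) * 1000 for i in range(4)]   # made-up sample "files"
    codec = TableCompressor(files)
    code = codec.compress(files[2])
    print("compressed form:", code)
    print("round trip ok  :", codec.decompress(code) == files[2])
    print("tool payload   :", codec.tool_size(), "bytes")
```

Every file added to the table makes the ordinal no bigger than a small integer, and makes the program bigger by the full size of the file.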

    2. Re:Egads... by RareHeintz · · Score: 5, Funny
      Of course! What was I thinking? Why not just use a table lookup of every possible sequence of bytes of any length?

      See you all later - I have some coding to do!

      OK,
      - B

    3. Re:Egads... by MadAhab · · Score: 2
      That's funny, but if you realize (as the majority of idiots here do not) that they are talking about transmission compression, and not storage compression, you're probably closer to what they are pretending to do than you think. So their method would actually result in larger files if you compress them, but you could have reduced bandwidth if you have the right set of lookup tables at each endpoint.

      I'd still be surprised if this were anything other than the sheerest vapor, because the objections about compressibility of random data still apply. Call it the Pigeonhole theory or whatever, but the point is that as you accumulate different varieties of non-repeating segments, the set of codes you use to refer to them grows to the same size as the data it represents.

      --
      Expanding a vast wasteland since 1996.
  32. What is compression by Vapula · · Score: 3, Interesting

    Compression, after all, is removing all redundancy from the original data.

    So, if there is no redundancy, there is nothing to remove (if you want to remain lossless).

    When you use some text, you may compres by remving some letter evn if tht lead to bad ortogrph. That is because English (like other languages) is redundant. When compressing a periodic signal, you may give only one period and state that the signal then repeats. When compressing bytes, there are specific methods (RLE, Huffman trees, ...).

    But, in all these situations, there was some redundancy to remove...

    A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed). Then, recompressing with another compression algorithm (or sometimes, the same will do the trick) may improve the compression. But the information quantity inside the data is the lower limit.

    Now, take a truly random data stream of n+1 bits. Even if you know the values of the first n bits, you can't predict the value of bit n+1. In other words, there is no way to express these n+1 bits with n (or fewer) bits. By definition, truly random data can't be compressed.

    And, to finish, a compression ratio of 100:1 can easily be achieved with some data... take a sequence of 200 bytes of 0x00... It may be compressed to 0xC8 0x00. Compression ratio is really only meaningful when comparing different algorithms compressing the same data stream.
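The 200-zero-bytes example can be reproduced with a minimal run-length encoder in Python. This is a sketch under an assumed format, (count, value) byte pairs with runs capped at 255, and the function names are made up; real RLE schemes vary.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) byte pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for count, value in zip(encoded[0::2], encoded[1::2]):
        out += bytes([value]) * count
    return bytes(out)


if __name__ == "__main__":
    zeros = bytes(200)
    packed = rle_encode(zeros)
    print(packed.hex(), len(zeros), "->", len(packed), "bytes")
    # Data without runs doubles in size under the same scheme.
    print(len(b"ABCD"), "->", len(rle_encode(b"ABCD")), "bytes")
```

The same encoder that turns 200 zero bytes into two bytes turns four distinct bytes into eight, which is why a ratio only means something for a stated input.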

    1. Re:What is compression by mblase · · Score: 2

      Compression, after all, is removing all redundancy from the original data.

      Of course, that's only the definition of lossless compression. Lossy compression also exists, with better compression rates and the obligatory sacrifice of detail, and that's what multimedia often relies on.

    2. Re:What is compression by _Mustang · · Score: 2

      A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed).

      Well - why? Granted, I'm no expert on this topic, but isn't that because current algorithms manipulate the data stream differently based on some preconception of the "best" manner for each *chunk of X length* (i.e. block of repetitive data)?
      Why couldn't it be possible to have a single algorithmic solution that works on the entire dataset simultaneously?

      Pardon my math but, let's use "AAABBBCD"(8 characters) as the example data.
      Traditional methods would turn that into "3A3BCD", reducing it by 2 - correct?

      What is it that prevents this from being mapped to a preexisting multidimensional table/grid? (I'll use the English alphabet here.) If we use the position of the output data as a predetermined element of the equation then..

      With "A-B-C-D-E.." etc. as our table, the data could then be overlaid as 3311, equaling 4 characters.

  33. Wow, it's not 100:1 by Daath · · Score: 2
    From the press release:
    [...] once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range
    Hundreds to one! Someone help me breathe!! :)
    --
    Any technology distinguishable from magic, is insufficiently advanced.
  34. Might be possible... but I doubt it... by Zocalo · · Score: 3, Interesting
    Reading through the press release it seems to imply that they take the "random" data, massage the data with the "Tuner" part, then compress it with the "Accelerator" part. This spits out "BitPerfect" which I assume is their data format. It's this "massaging" of the figures where it's going to sink or swim.

    Take very large prime numbers and the like, huge strings of almost random numbers that can often be written as a trivial (2^n)-1 type formula. Maybe the massaging of the figures is simply finding a very large number that can be expressed like the above with an offset other than "-1" to get the correct "BitPerfect" data. I was toying around with this idea when there was a fad for expressing DeCSS code in unusual ways, but ran out of math before I could get it to work.

    The above theory may be bull when it comes to the crunch, but if it could be made to work, then the compression figures are bang in the ballpark for this. They laughed at Goddard, remember? But I have to admit, I think replacing Einstein with the Monty Python foot better fits my take on this at present...

    --
    UNIX? They're not even circumcised! Savages!
  35. Silly web site by pen · · Score: 2

    Is it possible, at all, to trust a company whose home page has silly javascript that resizes your browser window?

  36. What happens when you run it backwards? by sprag · · Score: 4, Funny

    A thought just occurred to me: If you can do 100:1 compression and compress something down to, say, 2 bytes, what would 'ab' expand to? My thought is "ZeoSync Rulz, Suckas"

  37. They are using time travel! by harlows_monkeys · · Score: 5, Funny
    From one of the things on their site: Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted (emphasis added).

    Using time travel, high compression of arbitrary data is trivial. Simply record the location (in both space and time) of the computer with the data, and the name of the file, and then replace the file with a note saying when and where it existed. To decompress, you just pop back in time and space to before the time of the deletion and copy the file.

  38. Re:Practically Random by RFC959 · · Score: 2
    True, but in the field of compression, "practically random" means "random". One of the definitions of a random sequence is that you can't describe the sequence in fewer terms than the sequence itself contains - which is to say, it's incompressible. (That definition is from Pi in the Sky, by John D. Barrow.)

    I was thinking about submitting the ZeoSync release, and then I thought, nah, it's just fluff, no one will be interested... It's true that a press release is usually written by suits, not scientists, so you can't expect too much real meat - but "ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information" is a real winner; if you're "reducing information", it's not lossless compression! I smell a rat. The whole thing sounds like it could have been written by the Onion, for Crom's sake.

  39. Too soon to tell... by bpowell423 · · Score: 2

    First off, they don't say it can compress "random data", they say it can compress "practically random data", which I would take to be everyday sort of data like audio and video. And they don't say that data can be compressed infinitely. _If_ whatever they have does work, I suspect it'll be an enlightening moment for the rest of us if/when they release the details of their algorithm. Sort of like, if the only thing you're familiar with is the bubble sort, quick-sort is almost magical. Well, maybe the current schemes of run-length-encoding, and whatever other pattern matching we do, is akin to the bubble sort and these guys have put their heads together and created the quick-sort of data compression.

    I'm not calling it either way, but all the "It can't be done! The world is flat!" comments are so typically... well... slashdot.

  40. Directed evolution by HalfFlat · · Score: 5, Funny

    They're looking for investment money?

    Just think of it as an innumeracy tax on venture capitalists.

  41. New Compression Algorithm by pb · · Score: 2

    Proposed: a method for reducing any file down to 16 bytes and losslessly restoring it.

    1. Create an MD5 hash of the file.
    2. Share it on a Peer-to-Peer filesharing client.
    3. Delete the original file.
    4. Find it again!

    Note: in trials, this method seems to work best for Britney Spears songs and videos; further research is being done on how to restore Barry Manilow songs and videos, and what to do about hash collisions (bug those uppity MD5 people again).
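
    For anyone tempted to try step 1 at home, a quick check with Python's standard hashlib (the sample inputs are made up for illustration) shows where the 16 bytes come from and why step 4 is the hard part:

```python
import hashlib

small = b"x"
large = b"x" * 1_000_000

# An MD5 digest is always 128 bits (16 bytes), whatever the input size.
digest_small = hashlib.md5(small).digest()
digest_large = hashlib.md5(large).digest()

# There are only 2**128 possible digests, but far more possible files,
# so many different files must share a digest (the pigeonhole principle).
possible_digests = 2 ** (8 * len(digest_small))
```

    The digest identifies a file you already have somewhere; it does not contain the file.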

    --
    pb Reply or e-mail; don't vaguely moderate.
  42. ZeoTech Scientific Team fake? by dannyspanner · · Score: 4, Insightful

    For example, at the top of the list Dr. Piotr Blass is listed as Chief Technical Adviser from Florida Atlantic University. But he seems to be missing from the faculty. Google doesn't turn up much on the guy either. Hmmm.

    I've not even had time to check the rest yet.

    1. Re:ZeoTech Scientific Team fake? by dannyspanner · · Score: 2, Interesting

      Okay, the mysterious Dr. Wlodzimierz Holtzinski doesn't get a single hit on Google. Dr. Steve Smale hasn't released a paper in five years and is in his seventies. Retired, perhaps?

      I'm still not impressed.

    2. Re:ZeoTech Scientific Team fake? by Quaternion · · Score: 2, Informative

      Do you mean the Steve Smale from Berkeley who won a Fields Medal?

      smale bio

      I heard him speak at MIT, and read a paper of his that was published in the Bulletin of the American Mathematical Society... On the Mathematical Foundations of Machine Learning, with Felipe Cucker I think. That was published in Oct. 2001, which qualifies as within the last 5 years, right?

      --

      "The horse leech's daughter is a closed system. Her quantum of wantum does not vary."

    3. Re:ZeoTech Scientific Team fake? by dannyspanner · · Score: 2

      Bah! Humbug! If it's not on the web, how can it be real? :)

    4. Re:ZeoTech Scientific Team fake? by King+Babar · · Score: 5, Informative
      Okay, the mysterious Dr. Wlodzimierz Holtzinski doesn't get a single hit on Google.

      Well, that's because they mis-spelled his name. Seriously, I bet they are really trying to refer to Wlodzimierz Holsztynski, who posts to Polish newsgroups from the address "sennajawa@yahoo.com". His last contribution to the one Usenet thread that mentions "zeosync" and his name uses the word "nonsens" a lot, also the phrase "nie autoryzowalem", and the sentence "Bylem ich konsultantem, moze znowu bede, a moze nie, z nimi nie wiadom." Somebody who really knows Polish could probably have a field day with this and other posts...

      I'm getting the idea that some people on the scientific team might be better termed "random people we sent email to who actually responded once or twice".

      --

      Babar

    5. Re:ZeoTech Scientific Team fake? by cheshire_cqx · · Score: 2

      FAU Directory Search for 'blass'
      Blass, Piotr (No eMail Address Listed)
      Title : Instructor
      Department : Computer Science & Engineering
      Bldg / Room : S&E 300
      Phone Ext : 72822

    6. Re:ZeoTech Scientific Team fake? by Evacuator · · Score: 5, Informative

      With my limited understanding of Polish, I can add that he talks about the nonsense of him being on the scientific team. He also states that his name was used without any authorisation, and he points out that the whole affair is only for hustling money from investors.

      --
      Human beeing is just an advanced, self-learning machine.
    7. Re:ZeoTech Scientific Team fake? by grytpype · · Score: 2

      Wow, I wish I hadn't posted a reply, so I could mod this up. This is a bombshell. Take a look at the post, I don't think you need to know much Polish to get the flavor of what the guy is saying!

      --

      - Have a picture

  43. A useful book on data compression... by danielrendall · · Score: 2, Interesting
    Anybody interested in data compression and a whole lot else besides might want to download the book available from here

    Please don't all do so at once though :-)

    It's essentially a collection of lecture notes for a course on information theory and neural networks given by the author (David MacKay), but has been much expanded since I took the course in 1997. It will certainly show how any claim for a compression technique which works consistently on random data is bogus.

  44. The real "Pigeon hole principle" by richieb · · Score: 3, Informative
    If I recall my set theory properly the "Pigeon Hole Principle" simply states that if you have 100 holes and 101 pigeons, when you distribute all the pigeons into all holes, there will be at least one hole with at least two pigeons.

    I don't recall any of this crap about pigeons flying out of boxes. Or am I getting old?

    --
    ...richie - It is a good day to code.
    1. Re:The real "Pigeon hole principle" by Eagle7 · · Score: 2

      No, I think you are right and wrong. You're right in the sense that your viewpoint is the commonly used example... but it is also identical to what they are saying (twisted a bit).

      They are saying that if you have 100 pigeons and 1 hole, you need 100 unique markers (labels) to differentiate them. If I added a pigeon and didn't add a label, there would be two ambiguous pigeons. This is the same as 101 pigeons in 100 holes - two of them must share a hole (or share a marker). I like the multiple holes version better, but it's just a different way to describe the same thing.

      They are suggesting that by using multidimensional mathematics (meaning, I assume, greater than your usual 3 dimensions) they can alleviate this "marker" problem. They completely lose me here though, so I'll shut up. ;)

      --
      _sig_ is away
    2. Re:The real "Pigeon hole principle" by Nakoruru · · Score: 2, Informative
      I believe in this example, you HAVE TO mark a pigeon with something. There is no such thing as a pigeon without a marker (or, a pigeon without a marker is one of the 100 ways to mark a pigeon). You only have 100 different types of markers, so two pigeons would share one if you had 101 pigeons. If you leave a marker off a pigeon, this would be the same as having a 101st type of marker. In other words, if you can tell two pigeons apart, then they have been marked. You could have just as easily said "well, some pigeons have different spots on them, some are big, some are small." But that's kind of beside the point.

      It's just a silly way of saying that if you have fewer categories than things to put into categories, then some categories have to have more than one thing in them. For instance, you could say there are 6 different races of people on Earth, and there are 6 billion people. So, obviously at least one of the categories has more than one person in it. It is a very simple principle, but it can be used as part of a proof to show less obvious things (sorry, no examples spring to mind).

      Don't get blinded by all this pigeon crap ^_^

    3. Re:The real "Pigeon hole principle" by richieb · · Score: 2
      Am I the only Slashdot reader who realizes the entire article is either a joke, a scam, or written by a nut?

      No. You're not the only one. :-)

      Re: the "Pigeon hole principle" (PHP) is a set-theoretic idea; it has nothing to do with the dimensionality of space. So talking about PHP in multidimensional spaces doesn't make sense.

      I guess PHP is a variation on the Axiom of Choice, or maybe it's a consequence...

      --
      ...richie - It is a good day to code.
  45. Wow, now all data can be compressed in one bit!! by PEdelman · · Score: 2, Insightful

    So, if practically random data can be compressed, I can compress the result again, and the result again, until I end up with one bit of data in the end? That's great! Imagine the implications: for example, every ordinary lamp is now a computer, because it holds exactly one bit of data, on or off. No wait, that can't be right.

    --
    Like science? Comics? Wicked...
    Funny By Nature
  46. Bollocks by MartinG · · Score: 2

    If they can compress "random" data 100:1 then they can compress _anything_ 100:1

    Which begs the question: have they tried compressing the compressed data again to get 10000:1? If not, why not? In fact, why not make the compression function iterate to get 100^n:1 compression?

    Oh, I see. That's why. It's because this technology doesn't exist and never can. It's "ZeoSync vs Physics." I know where my money is.

    --
    -- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz .@adgimnoprstu
  47. Their claims are 100% accurate by Mr+Z · · Score: 3, Interesting

    Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.

    This is covered in great detail in the comp.compression FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 of the FAQ.
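
    The counting argument can be put into numbers. This is only a sketch: max_fraction_compressible is a made-up helper, and it assumes fixed-length inputs of n bits and outputs of any length up to n/ratio bits:

```python
from fractions import Fraction


def max_fraction_compressible(n_bits: int, ratio: int) -> Fraction:
    """Upper bound on the share of n_bits-long inputs that a lossless
    compressor can map to outputs of at most n_bits // ratio bits."""
    max_out_bits = n_bits // ratio
    # Number of distinct bit strings of length 0..max_out_bits is 2^(m+1) - 1.
    outputs = 2 ** (max_out_bits + 1) - 1
    inputs = 2 ** n_bits
    return Fraction(outputs, inputs)


# 100-byte (800-bit) inputs squeezed 100:1 into at most 8 bits.
bound = max_fraction_compressible(800, 100)
```

    For 100-byte inputs there are only 511 outputs of 8 bits or fewer to share among 2^800 inputs, so at most 511 of them can shrink that far. Whatever a 100:1 compressor calls "random" has to be a vanishingly small slice of all files.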

    --Joe
    1. Re:Their claims are 100% accurate by Baldrson · · Score: 2
      Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.

      This is covered in great detail in the comp.compression [faqs.org] FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 [faqs.org] of the FAQ.

      Here's the passage from your referenced FAQ:

      The WEB compressor (see details in section 9.3 below) was claimed to compress without loss *all* files of greater than 64KB in size to about 1/16th their original length. A very simple counting argument shows that this is impossible...

      Contrast with your statement above:

      random data comprises a very small percentage of all possible data sequences

      ... but then you go on to say:

      notice the similarity of claims!

      What similarity of the claims??

      Certainly if one is saying "all" files and the other is saying, as you point out above, a very small percentage of all files, the claims are so different as to render your recommended search for similarities hopelessly misleading.

  48. 100:1 on random data? Easy! by BluBrick · · Score: 2

    If it's truly random data, this compression/decompression is actually VERY easy.
    Compression: Strip 99 bytes out of every hundred.
    Decompression: Insert 99 random bytes in between every byte.

    What's that? You want the SAME data back? Why does it matter? It's pure random data anyway!

    Oh yeah. Have they announced a DE-compression routine yet? (I know "lossless" sort of implies that they have one, but I didn't see anything about decompression, only compression)

    Marketing rubbish as usual.

    --
    Ahh - My eye!
    The doctor said I'm not supposed to get Slashdot in it!
  49. team members by loudici · · Score: 3, Interesting

    navigating through the flash rubbish you can reach a list of team members that includes steve smale from berkeley and richard stanley from MIT who both are existing senior academics.

    so either someone has lent their names to weirdoes without paying attention or there is something of substance hidden behind the PR ugliness. after all the PR is aimed toward investors, not toward sentient human beings, and is most probably not under the control of the scientific team.

    --
    Dev elpizw tipota, dev phoboumai tipota eimai lephteros http://euclidian.org
  50. How to compress ANY data to one bit by jd · · Score: 3, Funny
    Simply have the bit big enough. Let's say you're using one of those old-fashioned binary computers, and want to compress everything to 1/Nth the size. No problem, you simply need a bit with 2^N states. Everything then fits on that single bit.


    (Of course, this DOES create all sorts of other problems, but I'm going to ignore those, because they'd go and spoil things.)

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  51. Infinite monkey compression. by Sobrique · · Score: 4, Funny

    Don't bother compressing it, just delete it, and then get an infinite number of monkeys on an infinite number of typewriters to re-produce the original.

  52. It's rare to see such a baldfaced scam by Thagg · · Score: 4, Interesting

    I was wondering as I read the headline and summary on slashdot "how can these sleazeballs possibly promote this scam, because it would be easy to show counterexamples?" This shows, once again, that I lack the imagination and chutzpah of a real con artist.

    The beauty of this scam is that ZeoSync claims that they can't even do it themselves, yet. They've only managed to compress very short strings. So, they can't be called on to compress large random files because, well gosh, they just haven't gotten the big file compressor to work yet. So, you can't prove that they are full of shit.

    Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.

    thad

    --
    I love Mondays. On a Monday, anything is possible.
    1. Re:It's rare to see such a baldfaced scam by kitts · · Score: 2, Funny

      Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.

      Actually, no. What you're seeing is their new compression methodology in action, applied to their website. By clicking on Skip Intro, you're actually hurtled through a registration process at lightning speed and signed up to several of their services, but for security purposes in order to validate those services you're redirected to the main page. However, in order to expedite the service, the exact location of the time of your click on the Skip Intro is kept in a data file in your cookies folder (you might not see it there because, you guessed it, it's compressed to a single byte), and when redirected the cookie is read to get the exact location of your click in the Flash Intro so that the intro fast-forwards to that point in time when you clicked, giving the impression of seamless, uninterrupted animation.

      Go on, give it a try. Try clicking the Skip Intro button multiple times, and you'll notice that once you click it'll look like nothing's changing, with no trace in a cookie file of where that spot is. Now THAT'S impressive. And they've got all of your personal information from that registration which you didn't even know you did compressed to a single byte on the server, just waiting to be uncompressed so they can start sending you more information (they just need to work the decompression kinks out).

      Cool, huh? I'm giving them all my money.

      --
      -------------------------------------------------- ----
      charlton heston is more of a man than yo
    2. Re:It's rare to see such a baldfaced scam by jdavidb · · Score: 2

      Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.



      Makes me glad I didn't read the article before posting. :)

  53. Not possible by Eivind · · Score: 5, Informative
    Someone already pointed out that repeated compression would give infinite compression with this method. But there's another easy way to show that no compressor can ever manage to shrink all messages.

    The proof goes like this:

    • Assume someone claims a compressor that will compress any X-byte message to Y bytes where Y<X
    • There are 2^(8*X) possible messages X bytes long.
    • There are 2^(8*Y) possible messages Y bytes long.
    • Since Y is smaller than X, this means that no 1 to 1 mapping between the two sets can exist, because they're not equally large.
    You see this simply if I claim a compressor that can compress any 2-byte message to 1 byte.

    There are then 65536 possible input messages, but only 256 possible outputs. So it is mathematically certain that at least 99.6% of the messages cannot be represented in 1 byte (regardless of how I choose to encode them).
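
    The 2-byte case is small enough to check by brute force. In this sketch, toy_compress and xor_compress are stand-in "compressors" invented for illustration; any function from 2 bytes to 1 byte gives the same count:

```python
from itertools import product


def count_collisions(compress) -> int:
    """Count the 2-byte inputs whose output was already taken by an earlier input."""
    seen = set()
    clashes = 0
    for pair in product(range(256), repeat=2):
        out = compress(bytes(pair))
        if out in seen:
            clashes += 1
        seen.add(out)
    return clashes


def toy_compress(msg: bytes) -> bytes:
    # Hypothetical compressor: keep the first byte, drop the second.
    return msg[:1]


def xor_compress(msg: bytes) -> bytes:
    # Another hypothetical compressor: fold the two bytes together.
    return bytes([msg[0] ^ msg[1]])


total_inputs = 256 ** 2
clashes_toy = count_collisions(toy_compress)
clashes_xor = count_collisions(xor_compress)
```

    Both toy schemes leave 65280 of the 65536 inputs colliding with an earlier one, so the decompressor cannot tell them apart. Changing the encoding moves the collisions around but cannot remove them.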

    These claims surface every so often. They're bullshit every time. It's even a FAQ entry on comp.compression.

    1. Re:Not possible by aiken_d · · Score: 2

      Well, I agree that the zero-whatever claim is probably bogus. But this proof here seems equally bogus to me.

      If the claim is that a compressor can reduce *any* byte sequence from X to Y bytes, sure, it's a solid proof.

      However, if you discard Zero-whatever's claim of compressing "random" data (which sounds like marketing speak), and look at reasonable probabilities, it's clear that lossless compression is possible -- otherwise RLE wouldn't work.

      So, if you subscribe to the proof above, you have to shoot down not only Zero-whatever, but also RLE, which is silly.

      Me, I think Zero-whatever is full of, well, you know. However I'd like to see them debunked on some more solid basis than a literal interpretation of marketing-speak, which is always pretty questionable.

      Cheers
      -b

      --
      If I wanted a sig I would have filled in that stupid box.
  54. It's just a scam by osgeek · · Score: 2

    This is along the lines of perpetual motion machines.

    Every once in a while, some bozo claims to achieve ridiculous compression rates on random data. It's always bullshit meant to sucker in the gullible investors, or just to get some attention for some psycho loser who usually doesn't understand more than enough math needed to copy and deform a few compression theory equations out of a text book.

    Skepticism is your friend.

  55. Oh, that's easy. by BubbaFett · · Score: 2, Funny
    ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss.

    Ok, say I want to compress "foo" 100 times over:

    bash$ for i in $(seq 1 100); do gzip foo; mv foo.gz foo; done
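
    The same experiment can be sketched in Python with the standard zlib module instead of the gzip command. The assumptions here are mine: 10,000 pseudo-random bytes from a fixed seed stand in for "foo", and five passes stand in for the hundred:

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
original = bytes(rng.getrandbits(8) for _ in range(10_000))

data = original
sizes = [len(data)]
for _ in range(5):
    # Feed the previous output straight back into the compressor.
    data = zlib.compress(data, 9)
    sizes.append(len(data))

print(sizes)  # byte count before the first pass and after each pass
```

    On random-looking input each pass should add a few bytes of framing rather than remove any, so the list creeps upward instead of collapsing toward one byte.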

  56. ZeoSync's website is all Flash? by Omnifarious · · Score: 2

    The company website is all Flash. Well, that blows my opinion of them completely. All glitz and no substance. That changed my opinion from 95% sure it was a pile of BS to 99.99%.

    1. Re:ZeoSync's website is all Flash? by Fly · · Score: 2

      Not only is it needlessly in Flash, it changes my browser window to be the size of my desktop, which is extremely annoying. Nothing says "Go away," like a 1280x1024 window with a dimwitted message about needing Flash.

      --
      end of line
  57. From the press release: Huh? by mblase · · Score: 3, Interesting
    Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
    Now, I'm not as geeky as some, but this looks suspiciously like technobabble designed to impress a bunch of investors and provide long-term promises which can easily be evaded by the end of the next fiscal year. I mean, if they really did have such a technology available today, why is it going to take them an entire twelve months to integrate it into a piece of commercial software?
  58. random means unpredictable and uncompressible by peter303 · · Score: 2

    If one finds a way to predict (i.e. compress) "random" numbers, then they are no longer random. That means the data has some deeper mathematical structure.
    What could happen is that so-called "random" information in human cultural datasets is far from random and highly compressible.
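
    A rough way to see how far from random a dataset is: measure the empirical entropy of its byte frequencies. This is a sketch only; byte_entropy is a made-up helper, and it looks at single-byte frequencies and nothing deeper:

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per byte (0.0 up to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


flat = byte_entropy(bytes(range(256)) * 4)  # every byte value equally common
constant = byte_entropy(b"a" * 1024)        # one value repeated
text = byte_entropy(b"the quick brown fox jumps over the lazy dog " * 20)
```

    Note that the "flat" input scores the full 8 bits per byte even though it is an obvious repeating pattern. A simple frequency count misses that kind of deeper structure, which is exactly the structure a better predictor could exploit.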

  59. West Palm Beach, FL? by rjamestaylor · · Score: 2
    ...based in West Palm Beach, FL...

    Mathematical breakthrough from the same county that gave us the Butterfly Ballot Ballyhoo? Hard to believe. ;-)

    Anyway, they're still working on tiny "bit strings" due to not yet overcoming the "temporal constraint" barrier. So, don't get all excited just yet.

    --
    -- @rjamestaylor on Ello
  60. Ok so when can I... by ImaLamer · · Score: 2

    [yoshi@ilp.ath.cx]# apt-get install zeosync
    [yoshi@ilp.ath.cx]# zeosync -compress /dev/hda* HD_backup.zeo
    [yoshi@ilp.ath.cx]# ls
    -rw------- 1 yoshi users 1 Jan 08 14:25 HD_backup.zeo


    Oh, that's right never.

    [windows users: the bold 1 would be the file size of all backed up partitions on the primary disk]

  61. Re:Compression to one bit by kzinti · · Score: 3, Informative

    Seriously though, the comp.compression FAQ [faqs.org] is really worth a read, especially question #9 [faqs.org]

    YES! Ditto. Seconded. Somebody mod this guy up.

    Here's a bit to whet your appetite:

    9.1 Introduction

    It is mathematically impossible to create a program compressing without loss
    *all* files by at least one bit (see below and also item 73 in part 2 of this
    FAQ). Yet from time to time some people claim to have invented a new algorithm
    for doing so. Such algorithms are claimed to compress random data and to be
    applicable recursively, that is, applying the compressor to the compressed
    output of the previous run, possibly multiple times. Fantastic compression
    ratios of over 100:1 on random data are claimed to be actually obtained.

    Such claims inevitably generate a lot of activity on comp.compression, which
    can last for several months. Large bursts of activity were generated by WEB
    Technologies and by Jules Gilbert. Premier Research Corporation (with a
    compressor called MINC) made only a brief appearance but came back later with a
    Web page at http://www.pacminc.com. The Hyper Space method invented by David
    C. James is another contender with a patent obtained in July 96. Another large
    burst occurred in Dec 97 and Jan 98: Matthew Burch applied
    for a patent in Dec 97, but publicly admitted a few days later that his method
    was flawed; he then posted several dozen messages in a few days about another
    magic method based on primes, and again ended up admitting that his new method
    was flawed. (Usually people disappear from comp.compression and appear again 6
    months or a year later, rather than admitting their error.)

    Other people have also claimed incredible compression ratios, but the programs
    (OWS, WIC) were quickly shown to be fake (not compressing at all). This topic
    is covered in item 10 of this FAQ.

  62. 17 year kid scams $900,000 in market by peter303 · · Score: 2

    At least his scam was believable enough to fool a thousand people. ZeoSync has got to choose a more believable scam to beat a 17-year-old.

  63. Random data, or typical data? by jcr · · Score: 2

    If they're talking about compressing what you find in a typical user's documents, or perhaps executable programs, it's *possible* that there's enough redundancy to come up with that kind of savings.

    If they're talking about 100:1 compression of a pile of bytes out of /dev/random, I flat-out don't believe it.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  64. It sounds like crap but ... by slashdot2.2sucks · · Score: 2, Insightful
    If I read it correctly (if it can be read correctly), then they are:
    1. Transforming the data to a complex vector space, C^n if you will.
    2. Using some very complicated seed and algorithm to generate randomish data in this complex domain that approximates the transformed data.
    3. Investigating the differences, and storing the differences with a "complex combinatorial series".
    Yes it sounds like crap but it's not as empty as social texts.
  65. OK by TheMMaster · · Score: 2

    I am in no way a compression specialist but: ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM).

    In this phase we are going to randomize the hard work you want to send over the internet, effectively destroying it (unless you have the seed, of course).

    Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents

    Now it's going to find patterns in the so-called "randomized" data, and probably write those down, now irreversibly destroying your data...

    The combined TunerAccelerator(TM) is expected to be commercially available during 2003.

    and they are putting it off for a year too.... hmm...

    "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."

    Jesus! These guys are geniuses!!! Better compression REDUCES the cost of communications... damn... I wonder what else they envision?? That the files will be smaller too???

    my conclusion

    We can randomize any string, store 1 byte, then generate another random string... which, because it is random, has a snowball's chance in hell of being the same ;-)

    Correct me if I'm wrong, but this really seems to be a load of crap to me. Plus they use WAY too many buzzwords.

    --
    Fighting for peace is like fucking for virginity
  66. "Random data" by magi · · Score: 2

    What does the article mean with "random data"?

    1) Data with maximal entropy?
    2) Random file picked from Internet?

    In case of 1), I'd say the article is crap. If the bits in the data have absolutely no dependency between them, i.e., no redundancy (including between non-adjacent bits), it is absolutely impossible to compress them. It's not even good as a fairy tale.

    In case of 2), ok, 100:1 may be possible for most non-compressed data. The new JPEG 2000 algorithms can do 100:1, but that's lossy. Text compression algorithms might do 10:1 on typical text, but they are also quite fast and therefore don't find all redundancies. For example, Huffman encoding is at its simplest done with just single characters, not with much longer sequences, searching for which takes a lot of time. The redundancies also do not have to be linear; for example "wDoRrOdW" ('word' written first in lower case, and then in upper case in the opposite direction) would be difficult to compress completely, although it clearly has high redundancy.

    Removing all redundancies would require finding the shortest description, i.e., a program that prints the string. To find it, we have to go through all possible programs that are shorter than printf("wDoRrOdW"). Many of them don't even terminate (for example "while(1);"). Complete search is therefore impossible; all algorithms make guesses about the topology of the search landskape, and don't search everything.

    I have absolutely no doubt that this method works well within the theoretical limits, albeit it's of course always possible that it verges the limits closer than any earlier methods.

  67. 100:1 Compression? Sure! But on the fly? by Masem · · Score: 2
    I remember a sci-fi short story about a group of scientists that were the first to make the trip to Alpha Centauri. In the latter stages of the flight, because of their distance from Earth, they developed a method to compress their reports by using a simple number cipher (A=1, B=2, etc), writing their text as a very large number, then finding some composite number N with a minimal number of unique prime terms within a few integers of their number. They then sent back that composite number and the integer. The receiver was then expected to calculate that number and then back out the message.

    While this theoretically could work to reduce messages by as much as 100:1, both the compression and the decompression would require huge amounts of CPU time for a message of any reasonable length. Even if you had pre-built a table of 'short unique-prime-factors integers' to make finding the optimal composite to send back easier, you'd still have to generate some huge N-digit number, and then the decoder would have to be able to recalculate that N-digit number from the prime representations.

    So while I'm sure this is possible, computing speeds are nowhere near fast enough. And it would appear this company is trying to pitch it for use in compressing internet traffic. Maybe on 512-byte messages they can get something, but I doubt if it's anything close to effective for internet use.

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
  68. Swapping data with decomp. instructions by jcasey · · Score: 2, Informative

    The flaw here is simple,

    When you reorganize the string of data, and sort by value, you must retain information on how to restore the string to its original order. There is no efficient way to save this "undo" information without negating the benefit gained from compression.

    For example:

    Given a series of random numbers: 34, 8, 244, 127

    If you reorganize them by value: 8,34,127,244

    You can create redundancy if the string is large enough - for 8 bit values, a string of 25,600 values should produce a lot of repetition - in this example, there would be an average of 100 repetitions per value (100*256=25,600).

    This is nice until you try to decompress the file. Without a record of how to reorganize the values, you are left with junk.

    Even if you keep a record with info for reorganizing the data, the overhead needed to store the undo info outweighs the compression benefit.

    If you did find an efficient way to store the undo information, it would be more effective to simply apply this algorithm directly to the random data!
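
    The overhead argument above can be checked with a rough count. This is only a sketch under stated assumptions: the helper `undo_bits`, the seed, and the choice to store the sorted run as 256 sixteen-bit counts are all made up for illustration. It counts how many distinct orderings share one sorted form and how many bits it takes to name the right one.

```python
import math
import random
from collections import Counter

def undo_bits(values):
    """Bits needed to say which arrangement of the sorted values was the original."""
    counts = Counter(values)
    # Number of distinct orderings = n! / (c_0! * c_1! * ...); work in log space.
    ln_orderings = math.lgamma(len(values) + 1) - sum(
        math.lgamma(c + 1) for c in counts.values()
    )
    return ln_orderings / math.log(2)

rng = random.Random(2002)               # arbitrary seed, only for repeatability
n = 25_600
data = [rng.randrange(256) for _ in range(n)]

original_bits = 8 * n
histogram_bits = 256 * 16               # assumed encoding of the sorted run: 256 counts
reorder_bits = undo_bits(data)

print(f"original:        {original_bits} bits")
print(f"sorted run:      {histogram_bits} bits")
print(f"undo (ordering): {reorder_bits:.0f} bits")
```

    With uniformly random bytes the ordering information alone should come out within about a percent of the original size, so sorted run plus undo record ends up no smaller than the input.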

    --
    X
  69. The best quote from their site: by micromoog · · Score: 2
    Right on the first page:

    ZeoSync's HTML site will be available January 13, 2002 with costumer service agents providing chat assistance.

    So they have a set of professionals in charge of "dressing up" their technology? Isn't that normally called the marketing department?

  70. Potentially patent-infringing... by pbryan · · Score: 2

    Our patent-pending technology ReductioAdAbsurdum (TM) will likely be infringed upon by this new technology, so rest assured that our lawyers will scrutinize their compression algorithm closely.

    --

    My car gets 40 rods to the hogshead, and that's the way I likes it!

  71. Entropy... by hyrdra · · Score: 2

    First of all, it's impossible to achieve any type of ratio on random data. Good quality random data such as that from random.org simply can't be compressed. Period.

    Data compression works by finding patterns in seemingly random data. A standard video stream really doesn't contain that much unique information. That's why we can compress it pretty well without losing too much data. However, random data is 100% unique and you must have, say, 8 bits representing 8 bits because there is no other way to represent it without losing information.
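
    A quick way to see the difference is to hand both kinds of input to a stock compressor. The sketch below uses Python's `zlib`; the seed, sizes, and sample text are arbitrary, and a seeded pseudo-random generator stands in for "good quality" random data, which is an assumption (true random bytes should behave the same way).

```python
import random
import zlib

rng = random.Random(8)                  # arbitrary seed for repeatability
size = 100_000

random_like = bytes(rng.getrandbits(8) for _ in range(size))
repetitive = b"the same frame again " * (size // 21)

for label, blob in (("random-looking", random_like), ("repetitive", repetitive)):
    packed = zlib.compress(blob, 9)
    print(f"{label:15s} {len(blob):7d} -> {len(packed):7d} bytes")
```

    The repetitive input should shrink to a tiny fraction of its size, while the random-looking input should come back slightly larger than it went in.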

    The claims by this company are impossible. I read their technical description and I'm still turning that around in my head. It doesn't make sense. It's called the rule of limited entropy and no data compression breakthrough can break it. You can't just make data appear out of thin air.

    Is it just me, or is this another company looking to swindle a few VC investors? The only type of program I see here is the lie, buy, and sell high kind -- I don't buy it.

    --


    "I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
  72. One method - Godel numbers by DarkMan · · Score: 2

    There is one method that might work - on some data.

    Godel encoding is an old technique for compression, with a fast decompress (P time). Unfortunately the compression stage is NP (maybe NP-hard, can't remember).

    The method relies on expressing the number as an algebraic product, that can be expressed in less space than the result.

    For example, in ASCII, the string (in RPN) "7 7 ^ 34 * 99 ^ - 7 p" has 18 characters. Its expansion has 740 characters. That's a compression ratio of, what, 35:1. [Ok, so you'd never actually do it in ASCII, but it shows the technique]
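
    For anyone who wants to check the order of magnitude, here is a small sketch. The RPN string as posted is ambiguous about its last operands, so this assumes it means (7^7 * 34)^99 - 7; the `expression` string below is a hypothetical well-formed spelling of that reading, not the one from the post.

```python
# Assumed reading of the RPN string: (7^7 * 34)^99 - 7.
value = (7 ** 7 * 34) ** 99 - 7

expression = "7 7 ^ 34 * 99 ^ 7 -"      # hypothetical well-formed RPN for the same value
digits = len(str(value))

print(f"expression: {len(expression)} characters")
print(f"expansion:  {digits} decimal digits")
print(f"ratio:      about {digits / len(expression):.0f}:1")
```

    Under that reading the expansion runs to several hundred decimal digits, in line with the figure quoted, though the exact count and ratio depend on how the expression is parsed and how characters are counted.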

    The advantages of the technique are that it gives better compression on larger numbers, in principle. In general, however, other factors come into play, and it bottoms out. My analysis suggested it bottoms out somewhere around 120-150:1.

    However, the disadvantages of the scheme are numerous. Firstly, there is no known algorithm to encode efficiently. The system can't stream, like gzip and LZW can. Thus far, it's just an interesting idea.

    I mention this because the multi-dimensional mathematics that they are referring to has a passing similarity to something I was playing with a couple of years ago, to look for faster algorithms (or any, really, other than brute force). It was cute, but always slower than brute force, save a few best cases :(.

    If I put my best guess at max compression together with the uncanny similarity of the maths, I get the following. Namely, you do a split into some expression, and then re-apply the algorithm to a sub-expression. Then, throw it through a symbolic computation routine, to optimise it a bit, and gzip the whole lot. It would only work well on some numbers, but you can pad it slightly to get a very different number, and try again until you get a good fit.

    So stepwise:

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner

    Pad to a value that gives good compression

    Once randomized, ZeoSync's BinaryAccelerator encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect equivalents

    Godel encode.

    The reference to many iterations suggests that they reapply the process to any large enough numbers left in the expression.

    And that's a scary match, in my mind.

    Of course, pinch of salt. There was a comment above about the odds of any compression technology being valid being equal to the claimed compression rate. I can't see how this might work. But I'm not writing this off just yet, it rings just true enough.

  73. Why you Can't, but Overhead is Low by Tom7 · · Score: 2

    > Why couldn't it be possible to have the a single algorythmic solution that works on the entire dataset simultaneously?

    Because if you have a "perfect" compression algorithm, then it is not always reversible. That's because it maps a larger set of files (all files of N bit length) to a smaller set of files (all files of N-1 bit length, or whatever). Therefore, the mapping can't be an injection (some two or more large files are mapped to the same small file), and therefore not reversible. So you can't uncompress the data.

    But fortunately, you can always get by with a single bit of constant overhead. Simply set that bit to 0 if the rest of the stream is compressed, to 1 if it is uncompressed. Now, if your algorithm produces a larger file than the input, just leave it uncompressed and you have only lost 1 bit.
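
    A minimal sketch of that flag trick, wrapped around `zlib`. It spends a whole flag byte rather than a single bit, purely to keep the code short, and the names `pack` and `unpack` are made up for illustration.

```python
import os
import zlib

COMPRESSED, RAW = b"\x00", b"\x01"      # one flag byte here; the scheme above needs only one bit

def pack(data: bytes) -> bytes:
    """Compress when it helps, otherwise store the input verbatim behind a flag."""
    squeezed = zlib.compress(data, 9)
    if len(squeezed) < len(data):
        return COMPRESSED + squeezed
    return RAW + data

def unpack(blob: bytes) -> bytes:
    flag, body = blob[:1], blob[1:]
    return zlib.decompress(body) if flag == COMPRESSED else body

for sample in (b"abc" * 1000, os.urandom(1000), b""):
    packed = pack(sample)
    assert unpack(packed) == sample
    print(len(sample), "->", len(packed))
```

    Whatever the input, the output is never more than one flag unit longer than the original, which is the constant overhead described above.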

    The argument about no "perfect" compression algorithm existing is overrated, IMO, though people like to point it out whenever a compression algorithm pops up. Of course a press release wouldn't mention that they sometimes increase file sizes by 1 bit!

    (I still do think their technology is bullshit, though.)

  74. Patterns in Data by Tom7 · · Score: 2


    It's true that you need to find patterns to compress data. What constitutes a pattern though, can be more complicated than what gzip offers.

    For instance, I can come up with a number of statistically random sequences that can be compressed very small if you "know the pattern", but will fail completely to be compressed with gzip. For example, I could take 11 MB of the binary digits of pi -- a very short program can produce these, but gzip will totally fail in compressing it. Or I could encrypt 11 MB of zero bits with RC4; if I know the key then it is also extremely easy to compress -- otherwise, it will be nearly impossible.
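
    To make the pi example concrete, here is a sketch that generates digits with a short program and then hands them to `zlib`. Two assumptions: the digits come from Gibbons' unbounded spigot algorithm, and the decimal digits are packed into one big binary integer as a stand-in for "the binary digits of pi" (1000 digits rather than 11 MB, to keep it quick).

```python
import zlib

def pi_digits(count):
    """First `count` decimal digits of pi (3, 1, 4, ...) via Gibbons' unbounded spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < count:
        if 4 * q + r - t < n * t:
            digits.append(n)
            # All right-hand sides use the old q, r, n (simultaneous assignment).
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)
    return digits

digits = pi_digits(1000)
as_integer = int("".join(map(str, digits)))
packed = as_integer.to_bytes((as_integer.bit_length() + 7) // 8, "big")   # digits as raw binary

print(len(packed), "bytes of pi ->", len(zlib.compress(packed, 9)), "bytes after zlib")
```

    The generator is a dozen lines, yet the general-purpose compressor should find nothing to exploit in its output.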

    So the art, really, is in finding the patterns. I'm pretty sure that ZeoSync's stuff is bullshit, but it doesn't *necessarily* mean that this kind of thing is not possible (just... unlikely).

  75. Re:I think their investment model requires pigeons by softsign · · Score: 5, Interesting
    I'm not sure if I understand your point, but from what I do understand, it seems to me you are missing it.

    If you look at this sequence as a one-dimensional series: 00101101, it's pretty hard (at least for a processor) to distinguish a pattern there... it's a pseudo-random sequence. But if I paint it this way, in 2d: (0,0) (1,0) (1,1) (0,1), I can step back and see a square with sides of length one.

    AFAIK, what these people are claiming is that they've developed a way to step WAY back, to n-dimensions, and have patterns emerge from seemingly random data.

    It's not the random-number generation that's significant here... it's the purported ability to compress a seemingly random sequence. RLE typically doesn't fare very well with pure random data because it only looks for certain types of redundancy.

    If I haven't missed the boat here, it's really a very interesting achievement.

  76. That's fantastic; let's test it! by Viking+Coder · · Score: 2

    That's just amazing! Let's test it. Here's an idea of a pretty good test :

    I'll prepare 257 files containing random data, which are each 100 bytes in length. Then, they'll be able to compress each of those files into a corresponding lossless compressed file which is one byte long! (Remember, this is supposedly 100:1 lossless compression of random data.)

    Oh, wait a sec... How can they possibly represent 257 different files, with only one byte each? That one byte can only represent 256 different possible values!

    What about if the files that I asked them to compress were only 2 bytes in length, instead of 100 bytes in length? Still, 257 of them. Since they claim to be able to do 100:1 lossless compression of random data, they should be able to do 2:1 lossless compression of random data. I mean, that's 50 times less impressive! But, wait... They still have to express 257 different files with only 256 different possible values!

    Huh... How many different files are 2-bytes long? I guess there's 65,536 of them. I only wanted them to compress 257 different files each into a byte. The task of compressing 65,536 different files into one byte is almost 256 times harder than what they already can't do!

    This is starting to sound like a theorem, or something!
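
    The counting can be brute-forced. In this sketch `toy_compress` is a made-up stand-in: the point is that any function from 2-byte files to 1-byte files, whatever it does inside, has to send many inputs to the same output.

```python
from collections import defaultdict
from itertools import product

def toy_compress(two_bytes: bytes) -> bytes:
    """Hypothetical 2:1 'compressor': any function from 2 bytes to 1 byte will do."""
    return bytes([(two_bytes[0] * 31 + two_bytes[1]) % 256])

buckets = defaultdict(list)
for pair in product(range(256), repeat=2):
    original = bytes(pair)
    buckets[toy_compress(original)].append(original)

worst = max(len(files) for files in buckets.values())
print(f"{sum(map(len, buckets.values()))} inputs share {len(buckets)} outputs")
print(f"at least one output stands for {worst} different files")
```

    Swapping in a cleverer `toy_compress` changes which files collide, not whether they do: 65,536 inputs cannot fit into 256 outputs without some output standing for at least 256 files.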

    --
    Education is the silver bullet.
  77. 10000 to 1 or better by return+42 · · Score: 2
    I have a family of compression algorithms that will take an apparently-random string of 5e7 bytes or more and reversibly compress it to 5000 bytes or less. Only works on certain strings, though. Here are a few:

    3.14159265...
    2.71828182...
    1.41421356...

  78. YHBT by Sloppy · · Score: 2

    I can understand people at Reuters being trolled by this crap, but Slashdot too? Wow.

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  79. Leveraging the Internet? by AtariDatacenter · · Score: 2

    Perhaps another kind of breakthrough could be made by leveraging the internet for the keyspace used in your compression. (Okay, I might not have the terminology quite right... that was one of my friend's realms of interest.)

    The idea is that you have a token that is given to a remote server, which sends back a stream of data. As long as the tokens were significantly shorter than the data provided, then the observed local compression would be highly significant.

    Or, put another way, you're NOT storing data on a remote server. But a remote server has a very well developed library of token/data combinations. So, when a client sends a stream of tokens to this server, they get the original stream of data back (even though the stream of data, itself, isn't recorded in whole at the server).

    Again, not for random data. And perhaps better if the tokens at the main server were geared to particular types of data with a different tokenspace for each.
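
    A toy version of the idea, just to show where the bytes go. Everything here is hypothetical: `TokenServer` is an in-memory stand-in for the remote server, the 4-byte hash-prefix tokens are assumed never to collide, and the sample chunks are invented.

```python
import hashlib

class TokenServer:
    """Hypothetical in-memory stand-in for the remote server and its library."""

    def __init__(self):
        self._library = {}

    def register(self, chunk: bytes) -> bytes:
        token = hashlib.sha256(chunk).digest()[:4]   # 4-byte token, assumed collision-free here
        self._library[token] = chunk
        return token

    def expand(self, tokens):
        return b"".join(self._library[t] for t in tokens)

server = TokenServer()
chunks = [b"GET /index.html HTTP/1.1\r\n", b"Host: example.org\r\n", b"\r\n"]
tokens = [server.register(c) for c in chunks]

sent = sum(len(t) for t in tokens)
received = len(server.expand(tokens))
print(f"client sent {sent} bytes of tokens, got {received} bytes back")
```

    The client-side saving is real, but the information has only moved into the server's library, so it helps for data built from chunks the library already knows and does nothing for data it has never seen.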

    Is this idea very silly, or very good?

  80. Reminds me of the "7 Minute Abs" by uradu · · Score: 2

    in There's Something About Mary. These guys will be in great shape until someone claims 200:1 compression. Then it's back to the claims drawing board.

    -

  81. Horseshit by The+Panther! · · Score: 2

    For every data set that is compressible 100:1 (which I will grant them.. even a fool can do that), there are 99 which grow larger or the compression fails entirely.

    So, they have figured out a way to compress difficult-to-compress data rather well, but cannot compress easy stuff that LZW works on? Rather dubious, but I'll eat my words with a smile if they can put all the Star Trek episodes on a floppy disk.

    --
    Any connection between your reality and mine is purely coincidental.
  82. So what really is the claim? by jopet · · Score: 2, Insightful

    If it means they compress arbitrary random data it is just bullshit. It is easy to prove that there exists some file that will not be compressible, and not much harder to prove that actually there are many more incompressible random files than compressible ones (read any text about Kolmogorov complexity). But of course most computer files are not at all random. Compressing a *randomly picked* computer file is therefore something different altogether, but it is still hard to guarantee a certain compression if the type of information stored in the file is not known. That's the reason why different compression algorithms for different file types exist. All in all their claim is too fuzzy to say anything ... better compression is a certain thing of the future, guaranteeing compression for random files is just another cold fusion hoax.
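
    The "many more" part can be put in numbers with a plain counting bound. In this sketch the function name and the 1000-bit file size are arbitrary choices: there are only 2^(n-k+1) - 1 bit strings of length n-k or less, so at most that many n-bit files can lose k or more bits under any lossless scheme.

```python
from fractions import Fraction

def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the share of n-bit files a lossless scheme can shrink by k or more bits."""
    shorter_outputs = 2 ** (n - k + 1) - 1      # bit strings of length 0 .. n-k
    return Fraction(min(shorter_outputs, 2 ** n), 2 ** n)

for k in (1, 8, 16, 64):
    share = max_fraction_compressible(1000, k)
    print(f"save >= {k:2d} bits: at most {float(share):.3e} of all 1000-bit files")
```

    The bound halves with every extra bit saved, so files that shrink by even a few bytes are already a vanishing minority of all possible files.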

  83. More on Holsztynski... by King+Babar · · Score: 2
    Oops; I should have mentioned that the "real" Wlodzimierz Holsztynski gets a very respectable 1510 hits on google.

    Now here's the interesting part: they used to spell his name right in a previous version of their official bios section. This could just be sloppiness, of course.

    --

    Babar

  84. If you're looking... by Anonymous+DWord · · Score: 2

    It's over here (Question 9, search for 'WEB, Gilbert').

    --
    "If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
  85. 100:1 not too unlikely... by Kjella · · Score: 2

    If you're talking about compressed video over uncompressed. A typical DVD movie would be 720 (horizontal) x 480 (vertical) x 16 (bit, YUV2) x 29.97 (fps) x 6300 (seconds in a 105 minute movie) / 8 (bits per byte) = ca. 130 gigabytes. In reality you'll get it as 5-6 gigabytes, while as a divx 2-pass (or similar mpeg4-codec) you will reach 100:1, at very little quality loss. Of course this is only possible because movies are *very* non-random both in each frame, and in one frame to the next.
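
    The multiplication is easy to redo. This sketch just plugs in the figures from the paragraph above; the 5.5 GB value for a typical DVD is an assumed midpoint of the 5-6 gigabyte range.

```python
width, height = 720, 480
bits_per_pixel = 16
frames_per_second = 29.97
seconds = 105 * 60

raw_bytes = width * height * bits_per_pixel * frames_per_second * seconds / 8
print(f"uncompressed:  {raw_bytes / 1e9:.1f} GB")
print(f"at 100:1:      {raw_bytes / 100 / 1e9:.2f} GB")
print(f"DVD at 5.5 GB: about {raw_bytes / 5.5e9:.0f}:1")
```

    With these exact inputs the raw stream comes to roughly 130 GB, so 100:1 corresponds to a file of a bit over a gigabyte.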

    Kjella

    --
    Live today, because you never know what tomorrow brings
  86. Too bad we don't know all the digits of pi... by Tony+Hammitt · · Score: 2

    The binary representation of pi contains all sequences, so it is claimed.

    If only we could predict what the Nth bit of pi was going to be, then we could just specify an offset into the bit sequence and a length and we could have any file compressed as two numbers.

    One of the numbers would be pretty large, though... It could easily be as big as the bit representation of the file, but hey, who cares??
    It's still a possible algorithm. These ZeoSync people don't seem to care about practical algorithms either...
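
    How large that offset gets can be estimated with a small experiment. This is a sketch under a loud assumption: a seeded pseudo-random bit stream stands in for the bits of pi, and `mean_offset_bits` is a made-up helper that measures how many bits it takes to write down where a k-bit "file" first shows up.

```python
import random

def mean_offset_bits(bits: str, pattern_len: int, trials: int, rng: random.Random) -> float:
    """Average number of bits needed to write down where a pattern first occurs."""
    total = 0
    for _ in range(trials):
        start = rng.randrange(len(bits) - pattern_len)
        pattern = bits[start:start + pattern_len]       # taken from the stream, so it must occur
        total += max(bits.find(pattern), 1).bit_length()
    return total / trials

rng = random.Random(314)                                # arbitrary seed
size = 1 << 18
stream = format(rng.getrandbits(size), f"0{size}b")     # pseudo-random stand-in for the bits of pi

for pattern_len in (4, 8, 12, 16):
    cost = mean_offset_bits(stream, pattern_len, 50, rng)
    print(f"{pattern_len:2d}-bit file: offset takes about {cost:.1f} bits")
```

    The offset should cost about as many bits as the file it points to, and the length still has to be stored on top of that.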

    Gimme some VC money!!!

  87. Anyone remember the OWS hoax? by wberry · · Score: 5, Interesting

    Back in 1991 or 1992, in the days of 2400 bps modems, MS-DOS 5.0, and BBS'es, a "radical new compression tool" called OWS made the rounds. It claimed to have been written by some guy in Japan and use breakthroughs in fractal compression, often achieving 99% compression! "Better than ARJ! Better than PKzip!" Of course all my friends and I downloaded it immediately. Now we can send gam^H^H^Hfiles to each other in 10 minutes instead of 10 hours!

    Now I was in the ninth grade, and compression technology was a complete mystery to me then, so I suspected nothing at first. I installed it and read the docs. The commands and such were pretty much like PKzip. I promptly took one of my favorite ga^H^Hdirectories, *copied it to a different place*, compressed it, deleted it, and uncompressed it without problems. The compressed file was exactly 1024 bytes. Hmm, what a coincidence!

    The output looked kind of funny though:
    Compressing file abc.wad by 99%.
    Compressing file cde.wad by 99%.
    Compressing file start.bat by 99%.
    etc. Wait, start.bat is only 10 characters, that's like one bit! And why is *every* file compressed by 99%? Oh well, must be a display bug.

    So I called my friend and arranged to send him this g^Hfile via Zmodem, and it took only a few seconds. But he couldn't uncompress it on the other side. "Sector Not Found", he said. Oh well, try it again. Same result. Another bug.

    So I decided that this wasn't working out and stopped using OWS. Their user interface needed some work anyway, plus I was a little suspicious of compression bugs. The evidence was right there for me to make the now-obvious conclusion, but it didn't hit me until a few *weeks* later when all the BBS sysops were posting bulletins warning that OWS was a hoax.

    As it turns out, OWS was storing the FAT information in the compressed files, so that when people do reality checks it will appear to re-create the deleted files, as it did for me. But when they try to uncompress a file that actually isn't there or has had its FAT entries moved around, you get the "Sector Not Found" error and you're screwed. If I hadn't tried to send a compressed file to a friend I might have been duped into "compressing" and deleting half my software or more.

    All in all, a pretty cruel but effective joke. If it happened today somebody would be in federal pound-me-in-the-ass prison. Maybe it happened then too...

    (Yes, this is slightly off-topic, but where else am I going to post this?)

    --
    LAMP hosting on Debian, SSH, no bandwidth cap, PayPal accepted - http://secondbrainhosting.com/
  88. The only way this could be any better... by dave-fu · · Score: 2

    ...was if they were powered by Blacklight Power. If you're not in the know, they're a "power company" run by a "scientist" who claimed that he had been able to reproduce something that sounded suspiciously like cold fusion in his Princeton, NJ-area laboratories. The Village Voice ran a story on them (where I read about these jokers) and a whole slew of investors were lined up (in the heady days a few months before the dot-com bubble popped) and last I checked, they still haven't actually, you know. Produced what they said they would two years ago (power).
    If you've got a slow afternoon, take a gander at what physicists have to say about Blacklight...

    --
    Easy does it!
  89. What about Kolmogorov? by Lictor · · Score: 2, Interesting

    I think the following statement in the press release pretty much says it all:

    >We perceive this advancement as a significant
    >breakthrough to the historical limitations of
    >digital communications as it was originally
    >detailed by
    >Dr. Claude Shannon in his treatise on Information
    >Theory."

    How about algorithmic information theory? Kolmogorov, Solomonoff, Chaitin? The statement above indicates that the most recent word on compression is an old Bell Labs tech report by Claude Shannon... not to put Shannon down, that work *is* a landmark, but there has certainly been more work done since.

    Try compressing the number Pi using Shannon's theory... you can't do it. On the other hand, using Kolmogorov complexity, you can compress it quite nicely.

    The fact that this statement appears in the press release seems to indicate a great deal of ignorance on the part of this corporation's researchers. Part of any good research program is to familiarize yourself with previous work done in the field... and AIT is *not* some obscure backwater idea... there are several conferences on this topic every year and just about every CS graduate student has seen at least Kolmogorov complexity.

    This is a pretty serious credibility robber. (Not to mention that from a mathematical standpoint, compressing totally random data is impossible under our current axioms... so if we *can* compress completely random data... it's time for a new theory of the foundations of mathematics. At the risk of sounding dogmatic: do you *really* think some dot-com startup is capable of this?

    Perhaps they are, but I'm going to need to see the proofs written up nice and formally before I run out and buy snake-oi... I mean *stock*.)

  90. At the ZeoSync investor demo by SuperKendall · · Score: 2

    ZeoSync: Ladies and gentleman - observe! The random data goes in THUS, and run through our process, comes out 100 times smaller!

    ZeoSync: Now, we carefully unpack and - voila! random data of the same size as before! This is due to our patented process and a little bit of magic we like to call "length of file stored in the header".

    Investor: Hey - those first few bytes from the original and uncompressed file look totally different!

    ZeoSync: Those bytes are in there somewhere - we only said LOSSLESS compression, not ordered!

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  91. Shame on you /. by SIGFPE · · Score: 2

    Next you'll be publishing stories about squaring the circle and trisecting an angle with straight edge and compasses. Claiming to be able to compress random data is the oldest joke in the CS book and you fell for it.

    --
    -- SIGFPE
  92. "practically random data" by hackerhue · · Score: 3, Funny

    The output from a pseudo-random number generator is usually considered "random enough for practical purposes." So if you define "practically random data" as "data that is random enough for practical purposes," you can compress it by storing the random seed and the string length. ;-)

    I think I can beat their 100:1 compression ratio with this scheme.
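
    In the same spirit, a sketch of the whole "codec". The function names are invented, Python's `random` module stands in for whatever generator produced the data, and the catch is built in: it only works for data that actually came out of that generator with a seed you know.

```python
import random

def compress(seed: int, length: int) -> tuple:
    """The whole 'archive': enough to regenerate the stream, nothing more."""
    return (seed, length)

def decompress(archive: tuple) -> bytes:
    seed, length = archive
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))

original = decompress((42, 100_000))     # 'practically random' data made from a known seed
archive = compress(42, 100_000)
assert decompress(archive) == original
print(f"{len(original)} bytes <- archive {archive}")
```

    Two small integers regenerate 100,000 bytes, which comfortably beats 100:1. Going the other way, from arbitrary bytes to a seed, is the part nobody knows how to do.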

    --

    To get something done, a committee should consist of no more than three persons, two of them absent.

  93. An obvious fake by arvindn · · Score: 2, Informative


    100:1 ratio? On random data?
    Considerations far more elementary than Shannon's limits rule out compression of statistically random data by even a single bit. Here's why:
    There are 2^n bit strings of length n. Any compression method purporting to compress random strings (by even a single bit) must produce output of length at most n-1 for these 2^n inputs. But in that case the mapping is not unique, since there are only (2^n)-1 bit strings of length n-1 or less. (So decoding is not possible.)
    Once every so often some "researchers" claim to have attained the holy grail of compression. Too bad we never hear of them again :(

    From the comp.compression faq
    this topic has generated and is still generating the greatest volume of news in the history of comp.compression
    ...
    The advertized revolutionary methods have all in common their supposed ability to compress random or already compressed data. I will keep this item in the FAQ to encourage people to take such claims with great precautions

  94. Re:impossible by recursiv · · Score: 2

    Yes it does.
    Compressing random data is impossible!

    --
    I used to bulls-eye womp-rats in my pants
  95. Re:Byte Magazine had a very similar article 14yrs by Eric+Smith · · Score: 2
    I was just thinking about that BYTE article. I'm not sure which issue it was in. I think it was in a news blurb sort of column. IIRC, they claimed their compression algorithm was "not affected by the laws of information theory".

    The reporter wrote glowing things about how when he decompressed his files, they had the right size and timestamp. There was a small matter of the contents being wrong, but the company had assured him that this was just a small glitch in the beta version that would be fixed in the final release.

    I can imagine that some junior reporter might fall for this, but where the heck was the editor?

    I imagine that the whole stunt was probably part of a scam to defraud some investors. Get it published in a magazine, and it must be legit, right? I wouldn't be the least bit surprised if this new "lossless compression algorithm" proved to be such a scheme.

    BYTE went seriously downhill around 1985 or so. A friend seems to think that it was a result of Steve Ciarcia moving on, but I don't think that fully explains it. Before that, there were plenty of technical articles by other authors, but BYTE turned into a rag full of mostly non-technical reviews.

  96. Re:how can this be? Answer: BitPerfectTM by Alsee · · Score: 4, Insightful

    Note the results are "BitPerfectTM", rather than simply saying "perfect". They try to hide it, but they are using lossy compression. That is why repeated compression makes it smaller, more loss.

    "Singular-bit-variance" and "single-point-variance" mean errors.

    The trick is that they aren't randomly throwing away data. They are introducing a carefully selected error to change the data to a version that happens to compress really well. If you have 3 bits, and introduce a 1 bit error in just the right spot, it will easily compress to 1 bit.

    000 and 111 both happen to compress really well, so...

    000: leave as is. Store it as a single zero bit
    001: add error in bit 3 turns it into 000
    010: add error in bit 2 turns it into 000
    011: add error in bit 1 turns it into 111
    100: add error in bit 1 turns it into 000
    101: add error in bit 2 turns it into 111
    110: add error in bit 3 turns it into 111
    111: leave as is. Store it as a single one bit.
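
    The table above amounts to a majority vote over each 3-bit group. A minimal sketch, with made-up function names, that regenerates it and counts the damage:

```python
from itertools import product

def encode(bits: str) -> str:
    """Keep only the majority bit of a 3-bit group."""
    return "1" if bits.count("1") >= 2 else "0"

def decode(bit: str) -> str:
    return bit * 3

for group in ("".join(p) for p in product("01", repeat=3)):
    restored = decode(encode(group))
    flipped = sum(a != b for a, b in zip(group, restored))
    print(f"{group} -> {encode(group)} -> {restored}  ({flipped} bit changed)")
```

    Only 000 and 111 survive the round trip intact; the other six groups come back with one bit changed, which is what makes the scheme lossy.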

    They are using some pretty hairy math for their list of strings that compress the best. The problem is that there is no easy way to find the string almost the same as your data that just happens to be really compressible. That is why they are having "temporal" problems for anything except short test cases.

    Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.

    -

    --
    - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  97. I have an algorithm for compressing random data by SIGFPE · · Score: 2
    For example I can compress the first million numbers generated by rand() into a few bytes.
    A similar technique works with the output of drand48() and in fact for a long enough sequence this approach works with every random number generator algorithm available today.


    In fact here's the compressed file for the rand() case:


    int i;for (i = 0; i<1000000; ++i) printf("%d\n",rand());


    Use gcc as decompressor.

    --
    -- SIGFPE
  98. This thing just screams "scam" by Animats · · Score: 2
    • Big claims, no demo, no papers, and it doesn't work yet.
    • It's headquartered in West Palm Beach, Florida. Unclear why, but Southern Florida has been a major scam center for decades.
    • They're trying to get people to invest, publicly advertising for "accredited investors". It's not usually done that way. If they went to a VC for funding, the technology would get looked at, hard. (If it worked, getting VC funding for this would be easy.) If they went for an IPO, they'd have to file disclosures with the SEC under penalty of perjury.
    • They claim: "All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others." Yeah, right. Let's see some names.
    • The Flash animation on the web site appears to be constructed entirely with stock photography. There's no useful information in the images. (Maybe that's their approach to compression.)

    Scroll down to Incredible Claims for descriptions of the last four scams like this. Remember Pixelon?

  99. Simple, it can't be by nusuth · · Score: 5, Insightful
    I have been pretty late to this thread, and I'm sorry if this is redundant. I just can't read all 700 posts.

    100:1 average compression on all data is just impossible. And I don't mean "improbable" or "I don't believe that", it is impossible. The reason is the pigeonhole principle. For simplicity assume that we are talking about 1000-bit files: although you can compress some of these 1000-bit files to just 10 bits, you cannot possibly compress all of them to 10 bits, as 10 bits allow just 1024 different configurations while 1000 bits call for representations of 2^1000 different configurations. If you can compress the first 1024, there is simply no room to represent the remaining 2^1000-1024 files.

    ...And that is assuming the compression header takes no space at all...

    So every lossless compression algorithm that can represent some files with other files shorter than the originals must expand some other files. Higher compression on some files means the number of files that do not compress at all is also greater. An average compression rate other than 1 is only achievable if there is some redundancy in the original encoding. I guess you can call that redundancy "a pattern." Rar, zip, gzip etc. all achieve a compressed/original length of less than 1 on average because there is redundancy in the originals: programs that have some instructions, prefixes with common occurrence, pictures that are represented with a full dword although they use a few thousand colors, sound files almost devoid of very low and very high numbers because of recording conditions etc. No compression algorithm can achieve a ratio of less than 1 averaged over all possible strings. It is a simple consequence of the pigeonhole principle and cannot be tricked.

    --

    Gentlemen, you can't fight in here, this is the War Room!

  100. A BRILLIANT business move by ZeoSoft! by Rayonic · · Score: 2, Insightful

    Bear with me for a moment. This kind of 'compression technology' is EXACTLY the kind of thing the MPAA has been dreading. Imagine millions of people on Morpheus trading 5MB copies of The Matrix, Star Wars and everything else. Of course it's a hoax, but if they can keep it up long enough, then maybe they'll get bought out by the MPAA, RIAA, or whoever!

    ZeoSoft is ushering in the business model of the new millennium - fooling the tech-illiterate elite of today's content cartels into buying them out, then laughing all the way to the bank! I applaud ZeoSoft for their initiative, and hope to see other such business ventures in the future.

    Now, if you'll excuse me, I'm off to develop a program that uses fractal-temporal equations to randomly generate sequels to popular movies! (hint, hint)

  101. Re:I think their investment model requires pigeons by SIGFPE · · Score: 2
    If you can step back to N-dimensions and see patterns you can exploit then it wasn't random data in the first place.


    There are 2^N bit sequences of length N. There are 2^M sequences of length M. If M<N then 2^M<2^N. So you can't represent all sequences of length N using sequences of length M. You can't even represent most sequences of length N using sequences of length M. It doesn't matter if you can visualise infinite dimensional spaces with pretty purple knobs on. You can't have an algorithm that packs most sequences of length N into M bits.
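
    To put a number on "most": a sketch of the upper bound on the share of N-bit sequences that can each receive a distinct M-bit code. The function name is hypothetical.

```python
from fractions import Fraction

def max_fraction_packable(n: int, m: int) -> Fraction:
    """Upper bound on the share of n-bit sequences that can get a distinct m-bit code."""
    return min(Fraction(1), Fraction(2 ** m, 2 ** n))

print(max_fraction_packable(16, 15))     # 1/2
print(max_fraction_packable(1000, 990))  # 1/1024
```

    Saving a single bit already caps the share at one half, and every further bit saved halves it again.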

    --
    -- SIGFPE
  102. Confirmed with my Polish speaking coworkers by Ewann · · Score: 3, Informative

    We have three native Polish speakers in my office. I asked one of them to translate the professor's reply. She said the gist of it is that he was upset they released his name, he didn't authorize any information release, etc. Apparently didn't deny or confirm the truth of the information but said something about having "more important things in my career" or something like that (not verbatim quote).

    1. Re:Confirmed with my Polish speaking coworkers by King+Babar · · Score: 2
      We have three native Polish speakers in my office. I asked one of them to translate the professor's reply. She said the gist of it is that he was upset they released his name, he didn't authorize any information release, etc.

      Wow; that's what it felt like to me. I feel my "random people who returned email queries" now has some support from native speakers.

      Now this has to be the beauty of Usenet; working from isolated keywords and the power of google (tm), you could follow what appears to be a scam from press announcement to debunking in a couple of hours, despite the fact that the smoking gun was in a Polish math specialty news group and had to be translated by a third party...

      Someday, this kind of thing will save people some real money. :-)

      --

      Babar

  103. Would this work? by Docrates · · Score: 2

    I know I'm posting late, but I hope someone reads this and comments.

    I've had this recurring thought in my head regarding compression that I haven't been able to prove/disprove.

    Disclaimer: I know absolutely nothing about compression other than what common sense tells me.

    Now for my theory: Is it possible to make an analysis of a whole lot of data from a whole lot of sources for a certain period of time? Let's say I log every single bit of data that comes and goes from, say, AOL's network. I then run an analysis of the data and come up with, say, the 5 million most used 8-byte strings. You probably want to play with the string sizes and number of strings to see what makes mathematical sense. You then keep a copy of the 40MB indexed string database on every internet node, or at each end of a slow link, or whatever, and then run all incoming and outgoing data through a program that translates between index references and actual data.

    Would that work? Since a 5 million entry index requires a 3-byte key to access an 8-byte string, would I get a 3/8 lossless compression on top of whatever's in place right now, whenever I hit an indexed string?

    --

    There are two kinds of people in the world: Those with good memory.
    1. Re:Would this work? by recursiv · · Score: 2

      The problem is that this scheme would have to differentiate between the indices and the rest of the raw data. So, let's say, for each "block", you either have a 3-byte index or 8 bytes of raw data. But you also need at least one bit of header information to determine which it is. I think you would lose most if not all of your gains on this one bit.
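
      Whether the flag bit eats the gain depends on how often a block hits the table. A rough sketch of the bookkeeping, assuming fixed, aligned 8-byte blocks and a 1-bit flag per block (both assumptions made for illustration, not part of the original proposal):

```python
import math

DICT_ENTRIES = 5_000_000
STRING_BYTES = 8
FLAG_BITS = 1  # hypothetical 1-bit marker: "index follows" vs "raw bytes follow"

index_bits = math.ceil(math.log2(DICT_ENTRIES))  # 23 bits, so a 3-byte key suffices
table_bytes = DICT_ENTRIES * STRING_BYTES        # 40,000,000 bytes of shared table

def expected_bits_per_block(hit_rate: float) -> float:
    """Average encoded size of one 8-byte block for a given table hit rate."""
    hit = FLAG_BITS + 24                 # flag + 3-byte index
    miss = FLAG_BITS + 8 * STRING_BYTES  # flag + 8 raw bytes
    return hit_rate * hit + (1 - hit_rate) * miss

# Hit rate at which the average equals the 64 bits of an unencoded block.
break_even = FLAG_BITS / (8 * STRING_BYTES - 24)
# Chance that a uniformly random 8-byte block is in the table at all.
random_hit_rate = DICT_ENTRIES / 2 ** 64

print(index_bits, table_bytes)
print(break_even)
print(expected_bits_per_block(random_hit_rate))
```

      Under these assumptions the scheme breaks even at a 2.5% hit rate. Uniformly random blocks almost never hit, so they pay the flag bit on essentially every block; real traffic could do better only to the extent that it keeps reusing the same strings.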

      --
      I used to bulls-eye womp-rats in my pants
  104. Re:Unbreakable encryption by chipuni · · Score: 2

    Truly unbreakable encryption has existed for many years: the one-time pad. The problems of unbreakable encryption aren't the theory, but the practice. (If you want truly secure communications among n people who each transmit x bytes of data through the group each day, how will you securely generate n*(n-1)*x bytes of random data each day, and securely distribute it to each of them?)
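
    A minimal sketch of both halves of that point: the pad itself is a plain XOR, and the key volume follows the n*(n-1)*x figure above. The function names are made up for illustration, and `secrets.token_bytes` stands in for whatever random source would really be used.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

def pad_bytes_per_day(n_people: int, x_bytes: int) -> int:
    """Pad material needed per day, using the n*(n-1)*x estimate."""
    return n_people * (n_people - 1) * x_bytes

message = b"attack at dawn"
pad = secrets.token_bytes(len(message))  # must never be reused
ciphertext = otp_xor(message, pad)

print(otp_xor(ciphertext, pad) == message)  # True
print(pad_bytes_per_day(10, 1_000_000))     # 90000000
```

    Ten people exchanging a megabyte each per day already need 90 MB of fresh pad material delivered securely every day, which is the practical problem.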

    --
    Never play leapfrog with a unicorn. Or a juggernaut.
  105. infinite amplifier by Proud+Geek · · Score: 2

    I have an infinite amplifier; I can sell it to you now. It has infinite gain, and infinite input impedance. Unfortunately, it has to rely on real power supplies, since I do not have an ideal power supply. Funny thing is, it always outputs the rail voltage.

    --

    Even Slashdot wants to hide some things

  106. Pigeonhole Principle by sulli · · Score: 2

    Maybe it's another implementation of RFC1149?

    --

    sulli
    RTFJ.
  107. Re:Random data is fake data by LarsG · · Score: 2

    All I need is the random seed and the random number generation formula....

    A string of 'random' data that can be generated by a seed and an algorithm is pseudo random. For certain applications pseudo random is good enough, and it is used all over the place - from picking the next block in a tetris game to generating cipher streams.
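
    A small sketch of what "generated by a seed and an algorithm" means in practice, using Python's `random.Random` (a Mersenne Twister) as the stand-in generator; the helper name and the seed value are arbitrary.

```python
import random

def pseudo_random_bytes(seed: int, n: int) -> bytes:
    """Produce n bytes that are fully determined by the seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

a = pseudo_random_bytes(36, 1000)
b = pseudo_random_bytes(36, 1000)

print(a == b)  # True: same seed, same 1000 bytes
```

    The 1000 bytes carry no more information than the seed plus the generator, which is exactly why they do not count as truly random.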

    Truly random data is an entirely different beast.

    --
    If J.K.R wrote Windows: Puteulanus fenestra mortalis!
  108. Re:Not possible - read article carefully! by MobiusKlein · · Score: 2, Interesting

    If you read the Reuters article carefully, it does not say a digital -> digital compression of 1:100, but implies a better way of encoding / compressing digital -> analog -> digital, with the analog bandwidth being much greater than today.

    That's all the stuff where they talk about Dr. Claude Shannon and information theory. (They could have been clearer about it, but that's PR flacks for you.)

    examine the quote
    '"What we've developed is a new plateau in communications theory," St. George said. "We are expecting to produce the enormous capacity of analog signaling, with the benefit of the noise-free integrity of digital communications."'

    Sounds like they are trying to shove more data into an analog stream, using wacky math, than would normally be allowed.

    rbb

  109. The claim is FALSE! by rew · · Score: 2

    Without reading their website, the claim MUST BE FALSE.

    The proof is simple.

    Suppose we have a 100 bit message. There are 2^100 different messages. Suppose you can compress them on average to 98 bits. Then there can only be 2^98 compressed messages. We lost a couple along the way!

    This proves that if you compress SOME messages you will also have to make SOME longer. Not by much, but at least a little. (Prepend a 1 to the raw data if it is "not compressible", prepend a 0 to the "compressed data stream" otherwise, and you have a "worst case expansion" of "one bit".)

    Now compressing normal data is easy. There are a lot of repeats, and other redundancy. So the normal case is that you can compress them. The bad news is that if you enumerate ALL 100-bit messages, ALL compression methods are going to need on average 100 bits or more. This is pure mathematics.

    The 2^100 number is a number that is quite large, but if you start talking about compressing a megabyte of data, then I'm already talking about enumerating all 2^8000000 possible messages. That is a thought experiment. But the argument still holds.
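
    The flag trick for bounding the worst case can be sketched as follows. Two assumptions for illustration: a whole flag byte is used instead of a single bit, and zlib stands in for "the compressor".

```python
import os
import zlib

def pack(data: bytes) -> bytes:
    """Compress if it helps, otherwise store the data behind a flag byte."""
    compressed = zlib.compress(data, 9)
    if len(compressed) < len(data):
        return b"\x00" + compressed  # flag 0: payload is zlib-compressed
    return b"\x01" + data            # flag 1: payload is stored as-is

def unpack(blob: bytes) -> bytes:
    flag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if flag == b"\x00" else payload

text = b"spam and eggs " * 1000  # lots of repeats
noise = os.urandom(14000)        # no structure for zlib to exploit

for sample in (text, noise):
    packed = pack(sample)
    assert unpack(packed) == sample
    print(len(sample), "->", len(packed))
```

    The redundant input shrinks, while the worst case is capped at one extra flag unit. It is never negative on average over all inputs, which is the point of the proof.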

    -----

    I read their press release. It's buzzword-compliant bovine excrement. They will attract money and pay the existing people large salaries as long as they can keep up the charade.

    Oh, and they have placed a tactical "practically" in front of the word "random". I can compress "practically random data" by enormous amounts.

    If you take the MD5 hash of the string "hi there", and feed that back into the MD5 function, you can generate an endless stream of "practically random" data. Take the first 1 Mbyte of this "practically random" data.

    I compressed 1Mbyte of data into the 212 bytes of the previous paragraph! However this is not possible if I let someone else generate the random data any way he pleases, and then have to compress it. They can claim to be "technically correct" up to a point due to this phenomenon....
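
    A sketch of that generator. The post does not say whether the raw 16-byte digest or its hex text is fed back in; the raw digest is assumed here, and `md5_chain` is a made-up name.

```python
import hashlib
import zlib

def md5_chain(seed: bytes, n_bytes: int) -> bytes:
    """Concatenate MD5(seed), MD5(MD5(seed)), ... until n_bytes are available."""
    out = bytearray()
    block = hashlib.md5(seed).digest()
    while len(out) < n_bytes:
        out += block
        block = hashlib.md5(block).digest()
    return bytes(out[:n_bytes])

stream = md5_chain(b"hi there", 1_000_000)

print(len(stream))                    # 1000000
print(len(zlib.compress(stream, 9)))  # size after zlib, for comparison
```

    A general-purpose compressor such as zlib is not expected to shrink this stream, yet the few lines above reproduce it exactly. The "compression" only works because the recipient already knows the recipe.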

    Roger.

  110. How fractal transform image coding works by yerricde · · Score: 2

    Sounds like fractal compression to me.

    The fractal transform that Barnsley's products use is merely vector quantization, mapping each 8x8 pixel block of an image onto a 4x4 pixel block of a reduced version of itself, plus an RGB offset for DC. It begins to converge to the desired image after a few iterations of the transform.

    --
    Will I retire or break 10K?
  111. Re:I think their investment model requires pigeons by softsign · · Score: 2
    If you can step back to N-dimensions and see patterns you can exploit then it wasn't random data in the first place.

    Exactly! But until you make that connection, it may as well be random!

    There are 2^N bit sequences of length N. There are 2^M sequences of length M. If M<N then 2^M<2^N. So you can't represent all sequences of length N using sequences of length M. You can't even represent most sequences of length N using sequences of length M. It doesn't matter if you can visualise infinite dimensional spaces with pretty purple knobs on. You can't have an algorithm that packs most sequences of length N into M bits.

    I'll assume by "sequences" you mean "random sequences" because otherwise you are saying that lossless compression is impossible. =) I agree with you otherwise, given one crippling constraint: that you can't observe your data except as one-dimensional binary numbers.

    After re-reading this press release a few times, I don't think these people have really accomplished much. Bear with me and I'll flesh out the point I'm trying to make - if someone could find a way to do this, I think it could work.

    With PURE random data, this won't work. Why anyone would want to transmit gigabytes worth of pure random data is beyond me. A signal worth compressing isn't going to be purely random. It may look like it, but there is some information there. This is why signal processing people use random processes to model signals. Not because their signals are completely random - but because - given enough samples - they look like specific random processes (Gaussian, Rayleigh, Rician...).

    Now, the technique I'm thinking of would do something along the lines of take a pseudo-random process and map it to an n-dimensional space. An algorithm then searches this space for (even just) simple patterns. Suppose it finds ten equally spaced points along a "line" in 12-dimensional space. That's 120 bytes that can be reduced to a significantly smaller vector (plus an offset to aid reconstruction in the right place), no?

    I don't know... would this work? I think so. Would it be feasible given existing computing power? I'm not so sure...

  112. perpetual motion of the information age by markj02 · · Score: 2

    These kinds of compression claims are the perpetual motion machine of the information age. Actually, they are less plausible than perpetual motion. For perpetual motion, there is at least the (very remote) possibility that there is some kind of undiscovered physics. Impossibility statements in compression only hinge on mathematics, with no physics or experiments needed.

  113. you CAN get a digit in pi w/o computing all priors by slew · · Score: 2

    It's called PSLQ lattice reduction...

    You can get the details here...

    http://www.mathsoft.com/asolve/plouffe/plouffe.html

    http://www.lacim.uqam.ca/plouffe/Simon/articlepi.html

    Note: this goes quite a long ways to showing that conventional wisdom about pi being random digits isn't actually true... Pseudo random is more like it...

    However, it isn't really applicable to this multidimensional compression nonsense since the counting argument still applies.

    Suspiciously, this looks to be similar to what the fractal folks were pushing in the '80s if you replace gems with iterators... Every once in a while you have to change the color of your snake oil label to confuse the masses...
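
    For the curious, here is a sketch of digit extraction. It assumes the result meant above is the Bailey-Borwein-Plouffe (BBP) formula, which was found with the help of PSLQ and yields hexadecimal (not decimal) digits of pi. Plain floats are used, so it is only trustworthy for modest positions.

```python
def _bbp_series(j: int, n: int) -> float:
    """Fractional part of the sum over k of 16**(n - k) / (8*k + j)."""
    s = 0.0
    for k in range(n + 1):
        d = 8 * k + j
        # Modular exponentiation keeps only the fractional contribution.
        s = (s + pow(16, n - k, d) / d) % 1.0
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0

def pi_hex_digit(n: int) -> str:
    """Hex digit of pi at position n after the point (n = 0 is the first)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]

print("".join(pi_hex_digit(i) for i in range(8)))  # 243f6a88
```

    Each call works out one position without keeping the earlier digits, though the work per digit still grows with n.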

    -slew

  114. Gonna take a stab at this by realdpk · · Score: 2

    ...and bet that they meant "arbitrary data" rather than "random data". After all, who would want to compress random data? What possible benefit could there be to such a thing?

  115. Re:Here's an algorithm by recursiv · · Score: 2

    This is a common idea, and it might seem like it would work. However, this idea still fails to take into account the counting argument. For example, if the seed is limited to 64 bits, this algorithm can generate at most 2^64 different files, and thus is unable to compress *all* files longer than 8 bytes.

    --
    I used to bulls-eye womp-rats in my pants
  116. Cold Fusion Analogy is Wrong by billstewart · · Score: 2
    Pons and Fleischmann, as near as I can tell, believed they had some interesting physics going on, though they were mistaken about quite what, and jumped to publication way prematurely. (As somebody said about their work "If it's not real, they've still invented the world's most interesting battery".)


    This is more like Usenet Crank Robert E. McElwaine who published lots of articles with his (capital-preserving) tagline "UN-altered REPRODUCTION and DISSEMINATION of this IMPORTANT Information is ENCOURAGED."


    And that may be giving them more credit than they deserve - it looks like a compression algorithm designed for use on digital wallets....

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  117. Re:Pseudo-scientific mumbo-jumbo detected by dillon_rinker · · Score: 2

    blah blah blah DISSERTATION blah blah blah

    This word in all caps is used in doctoral programs at all universities I know of. Give me my PhD, right?

  118. Re:I think their investment model requires pigeons by SIGFPE · · Score: 2
    No. I mean sequences. My argument makes no mention of the word random and that's no mistake (unless I've made a typo somewhere).


    A signal worth compressing isn't going to be purely random
    Absolutely, and that's why lossless compression works in practice.


    But the thing you're describing is bogus. Look, take random data and look at it in a complicated enough way then you're sure to find patterns that can be compressed out. But you'll find that you'll also have to describe the complexity of your way of looking at it and that'll take up the same amount of space as you've just compressed out. That press release is 100% bogus. It's not even slightly real. Have you seen how many universities they claim to collaborate with? It's merely a scam to make money out of venture capitalists.


    The way you speak, e.g. putting scare quotes around the word "line", suggests that you're not comfortable dealing with multi-dimensional spaces. The SF connotations suggest something cool and esoteric to get venture capital cash. Those of us who actually work with these things every day know there's no reason to see compressible patterns if you start embedding things in high dimensional spaces. People who do things like wavelet and DCT compression techniques quite happily represent data for compression in very high dimensional spaces. But there's no magic and certainly no way to do things that are provably false.


    Would it be feasible given existing computing power?

    It wouldn't be feasible with any computing power.

    --
    -- SIGFPE
  119. Other breakthroughs announced... by volpe · · Score: 2

    In other news, the company which managed to remove redundancy from pure entropy also managed to break the absolute-zero barrier. It was previously thought that you couldn't make something colder if it already had zero heat in it. But apparently this is not the case, according to ZeoSync.

  120. "at best"? by Antaeus+Feldspar · · Score: 2


    I believe that Deborah Tannen pointed this up as a key problem in our society, as the fallacy of "false duality", the notion that because there are two differing points of view that they are both worthy of attention.



    You say that "at best, this is revolutionary" but this is like saying "I have a great plan! Everyone takes off their shoes, switches them around, and somehow everyone winds up with a bigger pair! *At best*, everyone gets bigger shoes!" Well, no, just because someone's floated the fantasy doesn't mean it's even a vague possibility. These people are selling snake oil; it can be proved at home. To entertain their fraudulent notions simply because they bring them up is a mistake.

    --
    If people are to respect the law, perhaps the law should begin by respecting the people.
  121. And further still: by Myself · · Score: 2

    Then you could take the output files from this compression scheme, which would be pretty incompressible by traditional methods, and run THEM through the very same compression scheme, and make them smaller still. Repeat ad infinitum, and reduce all the data in the universe to one small file.

    Better yet: To use your 10 bits example, feed every one of the 1024 combinations into the decompression program, and one of them is guaranteed to represent all the data in the universe. That's only a handful of combinations, we should be able to check them all before dinner. When someone decompresses the right 10-bit code, call me, since my phone number must be in the data somewhere.
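
    With a real compressor the "repeat ad infinitum" step is easy to try. A sketch with zlib as the stand-in compressor and a seeded generator supplying the incompressible-looking input (both are choices made for illustration):

```python
import random
import zlib

def repeated_sizes(data: bytes, rounds: int) -> list:
    """Compress the output of each round again and record every size."""
    sizes = [len(data)]
    for _ in range(rounds):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes

rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(50_000))
text = b"all the data in the universe " * 2000

print(repeated_sizes(noise, 4))  # grows a little every round
print(repeated_sizes(text, 4))   # shrinks while redundancy remains, then stalls
```

    Once the redundancy is gone, further passes only add header overhead, so the universe stays its original size.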

  122. Other uses of the terms "mark" and "pigeon" by billstewart · · Score: 2
    Those pigeons aren't in the Eighth Dimension - they're somewhere over New Jersey


    There is a way to make compression like this work - for each string you want to compress, there's a compression program that losslessly compresses it to an arbitrarily short output string (one bit is fine...), but if the output string is N bits long, the program only works for 2**N input strings, and in general requires SIZE(INPUT) bits of program per input string (though for non-random strings, or for related strings, you can do better.) In other words, it's not useful for general-purpose compression, but you can use it for special-purpose compression - you can't design a small compression program to perfectly describe "Alice"'s or "Bob"'s appearance, but you can design a small program that outputs "Alice", "Bob", or "Somebody else".
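
    A toy version of such a special-purpose coder, assuming a two-entry table (the names `KNOWN`, `encode`, and `decode` are invented for this sketch):

```python
KNOWN = ["Alice", "Bob"]  # the only inputs this coder was built for

def encode(name: str) -> int:
    """Map a known string to its table index; everything else shares one code."""
    return KNOWN.index(name) if name in KNOWN else len(KNOWN)

def decode(code: int) -> str:
    return KNOWN[code] if code < len(KNOWN) else "Somebody else"

print(decode(encode("Alice")))  # Alice
print(decode(encode("Carol")))  # Somebody else
```

    The output is tiny only because the strings themselves live inside the program, and anything outside the table is not recoverable.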


    Similarly, with pigeons, you can play Hundred-Pigeon Monte, and attract investors to your company, or use this to attract customers for your other products, or have a big crowd on the street intently watching you play hundred-pigeon monte with your shill while a pickpocket walks around behind the crowd.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  123. Re:Do not under-estimate complexity by markmoss · · Score: 2

    1.500.000.000 basepairs [are all the DNA coding for protein in the human genome]. So a 1.5 Gb file is enough to encode an entire human being.

    The protein-coding chromosomal DNA is very, very far from encoding an entire human being. You've also got the DNA that controls which proteins are expressed (some unknown portion of that other 95% of the chromosomes), mitochondrial DNA, environmental effects during your whole life, and most of all some billions of neurons, each with up to a hundred semi-random connections to other neurons. No one yet has come anywhere near to giving a computer the equivalent of the life experience stored in your neurons. (Or at least my neurons -- some people never learn...)

  124. Random vs. Pseudorandom by i_am_nitrogen · · Score: 2
    ...they've developed a way to step WAY back...
    If this is the case, and this isn't a hoax (neither of which I believe to be true), then the data (as stated by other posters) is not random. No algorithm could fare well with truly random data because there really is no pattern. That said, there is no such thing as truly random data, and even /dev/urandom uses a mathematical algorithm to generate data with very little (or at least a very hard to find) pattern, but a pattern still exists. So, if the compression system figured out that algorithm, all it would need to determine is what value to seed the algorithm with, and the entire sequence can be regenerated flawlessly. It's like doing srand(36); and printing out a sequence of numbers. You'll get the same sequence every time. Basically, if their compression system knows what my house looks like, and how my video camera works, then they could take a video of my house and compress it down to a very small size since they just have to recreate my house mathemagically.

    Even more importantly, however, is that their "Technical Information" reeks so strongly of buzzwords and technobabble it's hard to read it without the urge to hold my nose. This alone discredits their entire proposition. I feel like I've just been subjected to corporate brainwashing ..er.. I mean marketing.

  125. Infinity:1 by gnovos · · Score: 2

    It is possible to create "Infinite" compression, but it works like the laws of quantum mechanics, i.e. you never really get what you want. Here, I'll perform an experiment:

    o I have a 1 byte file I want to send you.
    o We start by synching our wrist-watches.
    o I call you on the telephone and say "Start" and hang up.
    o You and I start counting off the seconds.
    o When the number of seconds have passed that are equal to the value of the byte, I call you back and say "Stop".

    Now you have the value of the byte given to you in two bits of information (the "start" and "stop" bits).

    Now we have an 8:2 ratio, which isn't bad. But I can do this again with a two-byte file and get 8:1. I can send you ANY length of file and only consume two bits of bandwidth... but at a terrible cost: time. Lots and lots of time.

    But if you had something like a super far away satellite where bandwidth is hard to come by and time is not in short supply, it would be the answer.
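
    A quick sketch of the cost, assuming one-second resolution as in the experiment above. It also shows where the information really travels: in the length of the delay, not in the two signals.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def worst_case_wait_seconds(n_bytes: int) -> int:
    """Longest delay needed to signal any value of an n-byte file."""
    return 256 ** n_bytes - 1

def bits_carried_by_delay(max_wait_seconds: int) -> float:
    """Information conveyed by choosing one delay out of all possible ones."""
    return math.log2(max_wait_seconds + 1)

for n in (1, 2, 8):
    wait = worst_case_wait_seconds(n)
    print(n, wait, wait / SECONDS_PER_YEAR, bits_carried_by_delay(wait))
```

    Two bytes can take over 18 hours and eight bytes several hundred billion years, and in each case the delay itself encodes exactly as many bits as the file holds.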

    --
    "Your superior intellect is no match for our puny weapons!"
  126. Re:I think their investment model requires pigeons by Kythorn · · Score: 2, Informative

    This may not appear immediately relevant, but bear with me.

    I'm not agreeing or disagreeing with ZeoSync's claims, but if you can impose a semblance of order on something that only appears chaotic, you can do some pretty cool stuff.

    Take for example this little demo at this website in Germany. (I realize what the domain looks like, there's nothing for sale or license, trust me). The actual download link is about halfway down the page.

    This isn't "compression" in the conventional sense, but they still manage to contain a demo that contains hundreds of megs of textures and samples, in addition to the engine itself in *64kb*. Now that's a hell of a ratio.

    They do this not by storing the raw data, but instead storing the instructions needed to reconstruct the data as it is needed.

    Granted, I realize that they only accomplished this with their own data, but I don't think taking this a step further to an arbitrary set of textures and sounds is impossible. Granted, this idea won't work for all types of data, and also cannot be considered "lossless" (hell, it's not even strictly compression), but I still think it's incredible that you can get results of this quality out of something this small.

    (Disclaimer: The above link is to a demo that requires DirectX 8.1 and I sincerely doubt it will run under Wine. It also doesn't work with every video card out there. I've scanned the binary, and it doesn't appear to have any viruses or trojans, but I won't guarantee it. If you can't accept the risk, don't download the binary.)

  127. Wondering by loraksus · · Score: 2

    I'm not a math person by any means (still doing college algebra, which pretty much means everybody has a better understanding of math than I do), and I would appreciate people picking this apart.

    So, my idea for a "Kick Ass Compression"

    Take a block of data and throw it against an algorithm that outputs a specific value (I'm thinking of CRC, an MD5 hash or whatnot). Do that several times against several different algorithms which generate a similar kind of value. Record the two (or more) values, then encapsulate the small blocks of data into larger blocks. I'm thinking only 3 or 4 levels of encapsulation would be needed, because if you calculated the CRC of the entire file, a program could decide which choice is correct when decoding a "block" yields multiple candidates (which I'm fairly sure it will).

    Now people use md5 hashes/crc checks to verify whether the file they downloaded hasn't been modified, so I'm assuming that it is fairly difficult to get the exact value (especially with a known size). Using this "property" (I'm not sure if that is a correct word) you could decode the data into one of several (hundred??thousand??) byte streams (possibilities of uncompressed data) and by comparing byte streams between algorithm A and B, the byte streams would match at one (would it be possible to have more? I suppose it depends on the algorithms used) point, which would be the proper "uncompressed" (rather derived or something) data.

    I'm pretty sure it would take a shitload of computing power in decompressing, but computers are fairly fast nowadays, and I think that this could be viable at some point. 100:1 probably not, and there would be a lower limit imposed on the file size based on the possible choices (I think the possible choices would reach a pretty large number pretty fast).

    Maybe I'm just plain wrong - but could something like this be useable? Any abuse would be appreciated :)
    Thanks!
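
    The idea can be tried on toy-sized blocks. This sketch assumes 16-bit blocks and keeps only the low 8 bits of CRC-32 as the stored value (a hypothetical truncation chosen so that every block can be enumerated):

```python
import zlib
from collections import Counter

BLOCK_BITS = 16
HASH_BITS = 8  # hypothetical: only the low 8 bits of CRC-32 are stored

# Group every possible 2-byte block by its stored 8-bit value.
buckets = Counter(
    zlib.crc32(block.to_bytes(2, "big")) & (2 ** HASH_BITS - 1)
    for block in range(2 ** BLOCK_BITS)
)

print(len(buckets))           # distinct stored values in use (at most 256)
print(max(buckets.values()))  # blocks sharing the most crowded value (at least 256)
```

    Each stored value leaves hundreds of candidate blocks. Adding a second 8-bit check to narrow them down brings the stored total to 16 bits, which is the size of the block itself, so nothing is saved. In general, h stored bits for an n-bit block leave about 2^(n-h) candidates.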

    --
    1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcfv gbhnjmk,l.;/
  128. All ready has a patent? by Rubbersoul · · Score: 2, Interesting

    This may have already been posted, and if it has sorry, but I thought this may be of interest to some of you.

    Jean-loup Gailly (one of the creators of gzip) has written an article on a patent that was granted for compression of truly random data, and how it is not mathematically possible. You can read it here for those that are interested.

    --
    man .sig
    No manual entry for .sig.
  129. Isn't it a requirement of every mathematical site by JoeGee · · Score: 2

    ... to have catchy theme music, and pretty flash intros? That's how *I* can tell they're doing something real in the academic community. :)

    If their technology is so earthshatteringly different and revolutionary but can use existing connections, why didn't their site download instantly? If it's only software and they already have a patent one would think the easiest route to gain investors would be a small download and a mindblowing demo away ...

    --

    Get off my virtual lawn, you damned virtual kids!
  130. HIJKLMNO by flufffy · · Score: 2

    can be compressed to water ;)

  131. TM's as indicator of crap. by augustz · · Score: 2

    The real breakthrough is the new discovery that the number of TMs and words capitalized in TheMiddle == the amount of money these folks will dupe from some silly investors.

  132. Re:Unbreakable encryption by color+of+static · · Score: 2

    I'm quite aware of that page. I work about 20' from the author :-).
    I'd argue that there is no effective commercial one-time pad, only products that approach it. There have been a number of companies releasing similar press releases about OTPs for some time, but each time the generation method has resulted in it not being an OTP. Most of the time it has also been substantially worse than most existing algorithms.

  133. The current ratio, by mindstrm · · Score: 2

    The current BEST ratio for compressing truly random data is 1:1
    In other words, you can't do it.

    If you TRY, some compression software will end up making it bigger.

    These guys are claiming 100:1 lossless on truly random data. This is difficult to believe on both fronts.
    First, 100:1 lossless on any real-life data is unlikely. Add in the 'truly random' part...

    So.. either they've violated the laws of the universe, or they are about to bring about one of the biggest mathematical discoveries in the world, or they are full of crap.

  134. See pigeonhole example. by mindstrm · · Score: 2

    You can't compress every set of 1000 bits of data into 10 bits of data.

    10 bits of data only allows for 1024 combinations.
    1000 bits allows for a lot more.. so it's simply not possible.

  135. "Practically random" (was Re:The current ratio,) by isdnip · · Score: 2

    They have funny wording in their release about data that is practically random. Well, that can be parsed to mean that in practice, the data is random and therefore it can be replaced by any other random string. After all, it's random! Not mathematically random in the entropy sense, but used by an application which wants any old string of random numbers. So sure, I can send a message saying, "generate me 1000 random digits". Great compression. Useless in practice, of course. In any case, these guys sound like a get-rich-quick scheme, trying to fool people, and not the only one of that type I can think of.

  136. Re:Random data is fake data by Stephen+Samuel · · Score: 2
    If they had an IPO tomorrow I'd buy, but I'd sell the next day.

    This is, of course, exactly what they WANT you to do. They only get money from the original sale of the stock. I'm presuming that this is a fly-by-night operation, so they're not going to care when (not if) their stock tanks. They've already got their money wired to a bank in the Bahamas. The person who will get hurt is the poor sod who doesn't understand that their claims are pure baloney.

    --
    Free Software: Like love, it grows best when given away.
  137. Re:Hash Maybe? by David+E.+Smith · · Score: 2

    No, I don't think hash is involved. Maybe LSD, but no hash.

  138. Re:Do you realize the implications of this? by SIGFPE · · Score: 2

    You could have every creation ever created on a 10 gig drive with ease.

    Er...you're not thinking hard enough. You could compress that 10 gig drive to 1 byte. In fact, here it is: X. That 'X' contains all the best warez ever written. Unfortunately I'm keeping the decompressor for myself.
    --
    -- SIGFPE
  139. Re:Do not under-estimate complexity by Negadecimal · · Score: 2

    So a 1.5 Gb file is enough to encode an entire human being.

    Nope. 3 billion basepairs, four possible bases per pair. It takes two bits to describe four possible states, so the unannotated sequence requires 6 billion bits of storage -> 750 million bytes -> 715.3MB.

    And genomic sequences generally aren't very random.... telomeric sequences, satellite DNA, common promoters, copied genes -- all of them can be easily abstracted and compressed out.

    I'd expect that even with mapping annotations, the whole shebang would easily fit on a CD-ROM.
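
    The storage arithmetic, spelled out. The round figure of 3 billion base pairs is an assumption (it is what the 6-billion-bit total implies), and "MB" is taken to mean 2^20 bytes.

```python
BASE_PAIRS = 3_000_000_000  # assumed round figure for the human genome
BITS_PER_BASE = 2           # four possible bases -> 2 bits each

total_bits = BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits // 8
total_mib = total_bytes / 2 ** 20

print(total_bits)           # 6000000000
print(total_bytes)          # 750000000
print(round(total_mib, 1))  # 715.3
```

    That is the raw, uncompressed size; the repeats mentioned above would only bring it down further.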

  140. JAR compression. by Shanep · · Score: 2

    JAR, from the maker of ARJ, is substantially better than ZIP and RAR as far as compression goes and substantially slower also.

    Interesting thing I remember with JAR in DOS, is that the more memory you have to assign to the compression, the better the compression.

    http://www.arjsoft.com/jar.htm

    --
    War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?