Slashdot Mirror


ZeoSync Makes Claim of Compression Breakthrough

dsb42 writes: "Reuters is reporting that ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller (or our streaming video just became a lot clearer)..." This story has been submitted many times due to the astounding claims - Zeosync explicitly claims that they've superseded Claude Shannon's work. The "technical description" from their website is less than impressive. I think the odds of this being true are slim to none, but here you go, math majors and EE's - something to liven up your drab dull existence today. Update: 01/08 13:18 GMT by M : I should include a link to their press release.

989 comments

  1. 100:1 I dont think so by ryanh50 · · Score: 0

    While it sounds nice, it's implausible. That amount of data cannot be compressed beyond each unique character + compression data, unless long repeating strings appear. Of course, then it would not be random. P.S. FIRST POST WOOHOOO!!!

    1. Re:100:1 I dont think so by posmon · · Score: 1
      even in random data, repeating strings do occur.

      and why the fuck does slashcode care that it only took me 6 seconds to type that and click next? was i too slow?

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    2. Re:100:1 I dont think so by telstar · · Score: 1

      You're right ... the press release states it could "approach the hundreds-to-one range." Many hundreds to one.

    3. Re:100:1 I dont think so by stilwebm · · Score: 1

      Repeating strings do occur, but he said "long repeating strings". In perfectly random 8 bit text, the chance of 3 of the same character appearing consecutively is 1/(256^3) or 1 in 16777216. That isn't going to help much towards 1:100 compression.

    4. Re:100:1 I dont think so by posmon · · Score: 1

      it doesn't have to be the same character consecutively. or look at it another way, if you convert ABC to ascii binary and look at the repeated bits. but still, i agree that it's not much help, seeing as 1:100 lossless compression is a pipedream, a fraud, a downright LIE and will never happen.

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    5. Re:100:1 I dont think so by posmon · · Score: 1

      click 'next'? ouch. can anybody else tell that i've been doing office upgrades all day?

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    6. Re:100:1 I dont think so by ergo98 · · Score: 1, Offtopic

      Totally unrelated, and I imply no correlation and similarity in facts, however I came across this old story which is always an interesting read: http://www.thestandard.com/article/0,1902,16368,00.html?body_page=11. Again it's totally unrelated to the ventures of these data compression people, but just is an interesting read for Slashdot folks.

    7. Re:100:1 I dont think so by matrix29 · · Score: 1

      while it sounds nice it's implausible. That amount of data cannot be compressed beyond each unique character + compression data. Unless long repeating strings appear. Of course then it would not be random. P.S. FIRST POST WOOHOOO!!!

      There is a bell curve of potential data sets to be compressed. The past few decades have concentrated on the ordered sets. There is still a wealth of highly disordered sets to tackle which are still highly compressible. The difficulty comes with averages when an ordered data set becomes higher in entropy it is only reaching an average level of disorder. A highly chaotic set also has the limitations of becoming stuck at the average level of entropy when compressed.

      If you look at the highly chaotic, but precisely ordered, sets of Pi, e, the Sine/Cosine/Tangent functions, the logarithms, the square/cubic/etc. roots of prime numbers and non-integers, there is a huge level of precise order which is able to be tapped to overcome Shannon limitations. Those set modifiers require more calculations and search times though. Understanding this is understanding the true maximal limitations of pattern-based compression (though there are always better methods, dependent on the set).

      There is even a N-Cell Life program that overcomes the base brute-force calculation time limitations. Those who think there is nothing new in math have simply not cared to learn much about math lately.

      --
      "Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality.
    8. Re:100:1 I dont think so by cakoose · · Score: 1
      In perfectly random 8 bit text, the chance of 3 of the same character appearing consecutively is ... 1 in 16777216

      That's the chance of a specific character appearing three times in 8-bit text that is 3 characters in length.

      In 8-bit text of length N characters (where N is at least 3), expected number of runs of C repeating characters is something like:

      (N - C + 1) / 256^(C-1)

      For C = 3, a run of 3 consecutive identical characters is expected approximately every 65k characters (which would be around 1 in 65k for every character). However, even current compression methods don't rely on repetition of specific characters. When certain computable patterns are taken into account, this ratio should go up.
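
      The estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes every byte is independent and uniform over 256 values, and the function name `expected_runs` is made up for illustration. A run of C identical bytes can start at any of N - C + 1 positions, and each of the C - 1 bytes after the first matches it with probability 1/256.

```python
def expected_runs(n: int, c: int, alphabet: int = 256) -> float:
    """Expected number of positions where a run of c identical symbols
    starts, in n independent, uniformly random symbols."""
    if n < c:
        return 0.0
    # n - c + 1 possible start positions; each of the c - 1 following
    # symbols must match the first, with probability 1/alphabet apiece.
    return (n - c + 1) / alphabet ** (c - 1)


print(expected_runs(65538, 3))  # 1.0
```

      Under those assumptions, roughly one run of three identical bytes turns up per 64 KiB of random data, which is far too rare to help a compressor.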

    9. Re:100:1 I dont think so by theunjake · · Score: 1

      Did some checking and the official website for this breakthrough company is hosted off some guy's server, www.hazydavy.com. Now if they can't even pay for their own server, how could they fund the research to make such a giant leap in the compression barrier?

    10. Re:100:1 I dont think so by medscaper · · Score: 1

      It's absolutely possible. Anything can be compressed and recompressed and recompressed.

      The problem is not in their claim of compression.

      Their problem will be when they claim they can uncompress. Then, I'll call bullshit.

      --
      Any sufficiently well-organized Government is indistinguishable from bullshit.
    11. Re:100:1 I dont think so by ebling555 · · Score: 1

      *yawn*

      Disorder and compression are irrevocably mutually exclusive.

      Disorder is unpredictability. Compression is prediction.

      It is impossible to express something more simply unless you have some idea of what to expect.

    12. Re:100:1 I dont think so by -brazil- · · Score: 1
      There is even a N-Cell Life program that overcomes the base brute-force calculation time limitations.


      Correct me if I'm wrong, but AFAIK programs like this don't actually "overcome" anything: in theory, the program replicates itself and thus "creates" steadily increasing computation power. In practice, the program and its copies all run on a limited computer, so it's completely worthless unless you create programs that physically replicate hardware, and even then you're limited to cubic growth by our physical space - rather useless when the problem is of an exponential nature.

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

  2. Current ratio? by L-Wave · · Score: 2, Interesting

    Excuse my lack of compression knowledge, but what's the current ratio? I'm assuming 100:1 is pretty damn good. =) btw...even though this *might* be a good compression algorithm and all that, how long would it take to decompress a file using your joe average computer??

    --
    I SURVIVED THE GREAT SLASHDOT BLACKOUT OF 2002!
    1. Re:Current ratio? by MrFredBloggs · · Score: 0, Offtopic

      "how long would it take"

      Who cares? If you can download a whole CD`s worth of music as a 7 meg file, who cares if it takes 1,2,5,10,20,100 mins to unpack it?

    2. Re:Current ratio? by skroz · · Score: 2

      Umm... I care. If your compression/decompression time exceeds the amount of time it would take to transfer the file uncompressed, you're really not gaining anything.

      The mathematical implications alone of such a breakthrough would be impressive. 100:1 compression of truly random data? Wow.

      --
      -- Minds are like parachutes... they work best when open.
    3. Re:Current ratio? by Sobrique · · Score: 2, Insightful

      Of course, given that cpu speed increases faster than bandwidth, even if it is an issue now, it won't be in a year.

    4. Re:Current ratio? by cide1 · · Score: 1

      About 1:2, sometimes more, often times less. It depends on the nature of the material being encoded, and the algorithm being used.

      --
      -- the computer doesn't want any beer, no matter how much you think it does. NEVER, EVER feed your computer beer.
    5. Re:Current ratio? by CaseyB · · Score: 3, Informative
      but whats the current ratio?

      For truly random data? 1:1 at the absolute best.

    6. Re:Current ratio? by radish · · Score: 5, Informative


      For lossless (e.g. zip, not jpg, mpg, divx, mp3 etc etc) you are looking at about 2:1 for 8-bit random, much better (50:1?) for ascii text (e.g. 7-bit non-random).

      If you're willing to accept loss, then the sky's the limit, mp3 @ 128kbps is about 12:1 compared to a 44k 16bit wave.

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
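
      The "about 12:1" figure for MP3 can be reproduced with simple arithmetic. A minimal sketch, assuming the "44k 16bit wave" means stereo CD audio at 44,100 samples per second (the comment does not say so explicitly); the function names are illustrative only.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def ratio_vs_cd(mp3_bits_per_second: int) -> float:
    """Size ratio of CD-quality PCM to an MP3 stream of the given bitrate."""
    cd = pcm_bitrate(44_100, 16, 2)  # assumed: 44.1 kHz, 16-bit, stereo
    return cd / mp3_bits_per_second


print(pcm_bitrate(44_100, 16, 2))         # 1411200
print(round(ratio_vs_cd(128_000), 1))     # 11.0
```

      With those assumptions the ratio comes out near 11:1, in the same ballpark as the figure quoted above. It is a lossy ratio, so it says nothing about what lossless coding can achieve.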

    7. Re:Current ratio? by cide1 · · Score: 1

      2:1 is what I meant, 1:2 would be increasing the size of the data.

      --
      -- the computer doesn't want any beer, no matter how much you think it does. NEVER, EVER feed your computer beer.
    8. Re:Current ratio? by dannyspanner · · Score: 1

      Data compression is based on finding redundant data, i.e. repetition, that can be factored out. With random data this tends toward 1:1 as you increase the size of the data. Sorry, but I haven't got time to point you in the right direction, just hit Google with "random data compression theory" or the like.

      It all boils down to the fact that you just plain can't compress random data in the general case.

    9. Re:Current ratio? by CaseyB · · Score: 2, Redundant

      That's not right. A 1:1 average for a large sample of random data is the best you can ever do. On a case by case basis, you can get lucky and do better, but no algorithm can compress arbitrary random data at better than 1:1 in the long run.

    10. Re:Current ratio? by Anonymous Coward · · Score: 0

      But using ZeoSyncs new random data transmogrifier it is indeed possible to alter the CS laws we've come to live by! 100 megs compressed into a single byte! All the works of mankind compressed multiple times down into 1 byte. amazing. These guys deserve a patent.

    11. Re:Current ratio? by MrFredBloggs · · Score: 1

      You wouldnt care though, if you are not interested in transferring the file, but just storing it.

    12. Re:Current ratio? by mirko · · Score: 2

      There will still be a need for cheap quick disk and memory as the cpu will have to deal a lot with these. I think 5 years is more optimistic.

      --
      Trolling using another account since 2005.
    13. Re:Current ratio? by markmoss · · Score: 5, Informative

      whats the current ratio? I would take the *zip algorithms as a standard. (I've seen commercial backup software that takes twice as long to compress the data as Winzip but leaves it 1/3 larger.) Zip will compress text files (ASCII such as source code, not MS Word) at least 50% (2:1) if the files are long enough for the most efficient algorithms to work. Some highly repetitive text formats will compress by over 90% (10:1). Executable code compresses by 30 to 50%. AutoCAD .DWG (vector graphics, binary format) compresses around 30%. Back when it was practical to use PKzip to compress my whole hard drive for backup, I expected about 50% average compression. This was before I had much bit-mapped graphics on it.

      Bit-mapped graphic files (BMP) vary widely in compressibility depending on the complexity of the graphics, and whether you are willing to lose more-or-less invisible details. A BMP of black text on white paper is likely to zip (losslessly) by close to 100:1 -- and fax machines perform a very simple compression algorithm (sending white*number of pixels, black*number of pixels, etc.) that also approaches 100:1 ratios for typical memos. Photographs (where every pixel is colored a little differently) don't compress nearly as well; the JPEG format exceeds 10:1 compression, but I think it loses a little fine detail. And JPEG's compress by less than 10% when zipped.

      IMHO, 100:1 as an average (compressing your whole harddrive, for example), is far beyond "pretty damn good" and well into "unbelievable". I know of only two situations where I'd expect 100:1. One is the case of a bit-map of black and white text (e.g., faxes), the other is with lossy compression of video when you apply enough CPU power to use every trick known.
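
      The fax-style scheme described above (white*number of pixels, black*number of pixels) is run-length encoding. Here is a minimal sketch of the idea, using "W" and "B" characters as stand-ins for pixels; real fax standards also pack the run lengths into short codes, which is left out here, and the function names are illustrative.

```python
from itertools import groupby


def rle_encode(row: str) -> list[tuple[str, int]]:
    """Collapse a row of pixels into (pixel, run_length) pairs."""
    return [(pixel, len(list(group))) for pixel, group in groupby(row)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (pixel, run_length) pairs back into the original row."""
    return "".join(pixel * count for pixel, count in runs)


# One scan line of a mostly white page with a short black stroke.
row = "W" * 1000 + "B" * 12 + "W" * 716
runs = rle_encode(row)
print(runs)  # [('W', 1000), ('B', 12), ('W', 716)]
assert rle_decode(runs) == row
```

      A 1728-pixel line collapses to three pairs because the page is almost all white. The same code applied to a photograph, where neighbouring pixels rarely match, would produce nearly one pair per pixel.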

    14. Re:Current ratio? by -douggy · · Score: 2

      What about a spectacular fractal image? Sure as .jpg it could be 1/2 a meg in size but the equation to draw it.... The same with any image

    15. Re:Current ratio? by dannyspanner · · Score: 2

      Yes, but the coder (i.e. the equation) and decoder (i.e. equation to image converter) of your spectacular fractal image have to know that the equation represents a fractal image. You cannot apply this technique to arbitrary data, so my original point about the general case still stands.

      JPEG is a lossy image compression technique and we're talking about general lossless compression here.

    16. Re:Current ratio? by Anonymous Coward · · Score: 0

      Head over to comp.compression FAQ to read about compression of random data. Not exactly a new subject of controversy.

    17. Re:Current ratio? by Graspee_Leemoor · · Score: 2, Funny

      Heheh, I always wanted to write a "gainy compression" routine. It would probably have a special marker in there like the ascii string:

      "The next three bytes are compressed!"

      graspee

    18. Re:Current ratio? by sjf · · Score: 1

      The best ratio is n:1 where n is the number of
      bits of information in the data, and 1 is 1 bit.
      The worst ratio should be n:n+1.

      The point being you can compress random data if the decoder knows what the random data is beforehand.

    19. Re:Current ratio? by damiangerous · · Score: 1
      If your compression/decompression time exceeds the amount of time it would take to transfer the file uncompressed, you're really not gaining anything.


      Of course you are! You're seriously reducing the amount of bandwidth you use. If you're paying per byte (how bandwidth is normally sold), or paying for time online (dialup in many countries), or somehow paying for data transferred (Usenet providers, for example) you'd much rather get as little data as possible as quickly as possible and not care so much about the time it takes locally.

    20. Re:Current ratio? by Anonymous Coward · · Score: 0

      You are gaining something if sending via that method is very expensive--think sat coms like Inmarsat or Iridium.

    21. Re:Current ratio? by pma · · Score: 1

      Absolutely right. If all you want to do is compress the data, 1000:1 or even 10,000:1 is easily achievable.

      Oh, you say you want to decompress it later. That's a different story.

    22. Re:Current ratio? by mprinkey · · Score: 1

      Why stop at one byte?! Compress it down to 1 bit. All the knowledge in the world...1 or 0, yes or no. Anyone wanna bet which? I am a nihilist, so I am going with zero.

    23. Re:Current ratio? by 42forty-two42 · · Score: 1

      Jeesh, are you that unimaginative? Try hexadecimal; 1:2 compression ratio every time!

      Better yet, binary; 1:8!

    24. Re:Current ratio? by 42forty-two42 · · Score: 1

      I'm going with the XOR of all bits in the original.

    25. Re:Current ratio? by ADRA · · Score: 1

      It uses lossy encoding. That is why it gets an entire 12:1 gain, or whatever. If you take a 384kbps mp3, you will only get a 4:1 file size, but the sound patterns will be a lot closer to that of the original.

      --
      Bye!
    26. Re:Current ratio? by Anonymous Coward · · Score: 0

      Sadly you still need 6 bits to represent the meaning of life, the meaning of the universe and everything else..

      010101

      I'll be damned, that's a repeating pattern! *gasp*

    27. Re:Current ratio? by pokeyburro · · Score: 1

      A number of posters here are claiming that the average compression ratio for data is 1:1. Just so you all know, that limit is correct, and is a hard limit set by logic. Here's the simple proof:

      Lossless compression is essentially a 1-to-1 mapping between one sequence of bits and another sequence of bits. It has to be 1-1 in order for it to be lossless. QED. :-)

      The trick is to take more "common" bit sequences and map them to smaller ones, of course. The downside is that the sequence you're mapping to also has to map to something. Typically you map it to something larger - basically, you're compressing an already-compressed sequence. For every sequence you can compress, there's another that must expand, generally speaking.

      Naturally, if you don't mind a little dirt in the original, you can get away with a better ratio. JPEG is an example of this. The dirt amounts to minute differences in a photograph, which is fine for many purposes.

      --
      Lately democracy seems to be based on the skybox, the Happy Meal box, the X-box, and the idiot box.
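
      The 1-to-1 mapping argument above comes down to counting. A minimal sketch (function names are illustrative): there are 2^n bit strings of exactly n bits, but only 2^n - 1 strings that are shorter, so a lossless scheme cannot shrink every n-bit input.

```python
def strings_of_length(n: int) -> int:
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n


def strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings of length 0 .. n-1."""
    return sum(2 ** k for k in range(n))  # equals 2**n - 1


def outputs_shorter_by_at_least(n: int, k: int) -> int:
    """Number of distinct outputs that are at least k bits shorter than n bits."""
    return strings_shorter_than(n - k + 1)


n = 8
print(strings_of_length(n), strings_shorter_than(n))  # 256 255
print(outputs_shorter_by_at_least(16, 8))             # 511
```

      For 16-bit inputs, only 511 of the 65536 possible inputs can be mapped to something 8 or more bits shorter; the rest must stay close to their original size or grow.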
    28. Re:Current ratio? by anacron · · Score: 2

      The point being you can compress random data if the decoder knows what the random data is beforehand.

      Won't this always be true for typical-use applications like compressing files and such? The bits that are to be encoded are known, because the encoder can just parse them. Things like streaming video and audio might get a bit tricky because unless you put some sophisticated buffering mechanism in place the bits to be encoded are probably not known ahead of time.

      .anacron

    29. Re:Current ratio? by Anonymous Coward · · Score: 0

      Of course, this is all bullshit.

    30. Re:Current ratio? by Anonymous Coward · · Score: 0

      No! We must use a polynomial if we are to account for swapped digits in this checksum!

      Oh wait, we were talking about a crackpot compression scheme.

    31. Re:Current ratio? by Anonymous Coward · · Score: 0

      Does that mean it can be COMPRESSED?

    32. Re:Current ratio? by radish · · Score: 2


      hence "If you're willing to accept loss..."

      :-)

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    33. Re:Current ratio? by RetsamYthgimla · · Score: 1

      but whats the current ratio?

      For truly random data? 1:1 at the absolute best.

      Well, they didn't say truly random, they just said random. Perhaps the algorithm takes bitstreams with lots of entropy (i.e. it looks "random"), and encodes them with fewer bits. The flip side would be that bitstreams with low entropy (e.g. text, graphics, sound) would not be compressed very well.

      Typical compressors take bit strings with low entropy (the minority of possible strings, but the ones we care about) and make them a lot shorter, with the flip side that the majority of possible bit strings, which have high entropy, take up more space compressed. But who cares, right?

      Perhaps this new compression algorithm takes bit strings which to us appear to have very high entropy, and it makes them a lot shorter, with the flip side that strings with low entropy (e.g. text, video, etc.) compress for crap! Woohoo! What a breakthrough!

    34. Re:Current ratio? by DivideX0 · · Score: 1

      I think you mean: 101010
      But it is still a repeating pattern.

      Attention moderators on crack, 42 is the meaning of life according to the HitchHikers Guide.

      --
      My next Slashdot post will be ready soon, but subscribers can beat the rush and see it early!
    35. Re:Current ratio? by Astrorunner · · Score: 1

      correction

      1:1 for 8 bit random

    36. Re:Current ratio? by zaffir · · Score: 1

      42 is the meaning of life according to the HitchHikers Guide. May a thousand fireants infest the underwear drawer of any moderator who does not know that.

      --
      "Upon attaching the waterblock to my penis, I began to notice that I know nothing about computers." -- JRockway
    37. Re:Current ratio? by Anonymous Coward · · Score: 0

      The ratio is unlimited in recursive compression
      and I have been working with 5 methods for the
      last 4 years. And before year 2003 I will release
      the methods even if unfinished.

      BECAUSE I WAS FIRST, NOT SOME DUMB COMPANY THAT
      CAN'T EVEN DESCRIBE IT CORRECTLY.

    38. Re:Current ratio? by Anonymous Coward · · Score: 0

      The previous posting was not a joke, you'll see!

    39. Re:Current ratio? by Classic+Ted · · Score: 1


      You still have to encode the length. Asymptotically, this will cost you O(log_2 log_2 N) when you write N in binary.

      Practically, if N is probably small, you may want to just use base one to represent the length (thus 1111110 101010 for an overhead of 7 bits = 1+ceil(log_2 N)). If larger values of N are likely, then recurse instead (1110 110 101010 encodes the fact that the length is 3 bits long, then the length, then the data) for an overhead of

      2 ceil(log_2 log_2 N) + 1

      Recursing further only helps if very large values of N are likely and doesn't really help the asymptotic result, but only reduces the constant.

      See Williams and Zobel's paper for more practical information, Greg Chaitin's, work for more theory.

      http://www1.oup.co.uk/computer_journal/hdb/Volume_42/Issue_03/pdf/420193.pdf

      http://www.cs.auckland.ac.nz/CDMTCS/chaitin/

      Justin Zobel's home page appears unavailable at the moment, unfortunately, but here it is.

      http://www.cs.rmit.edu.au/~jz
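
      The two length encodings described in this comment can be sketched as follows. This is an illustration only, working on strings of "0" and "1" characters for readability; the helper names are made up and are not taken from the cited papers.

```python
def unary(k: int) -> str:
    """k ones followed by a terminating zero."""
    return "1" * k + "0"


def unary_prefixed(data: str) -> str:
    """Length in base one, then the data."""
    return unary(len(data)) + data


def recursive_prefixed(data: str) -> str:
    """Unary size of the length field, the length in binary, then the data."""
    length_bits = format(len(data), "b")
    return unary(len(length_bits)) + length_bits + data


def decode_recursive(stream: str) -> str:
    """Recover the data from a recursive_prefixed stream."""
    k = stream.index("0")                       # size of the length field
    length = int(stream[k + 1:k + 1 + k], 2)    # the length itself
    start = 2 * k + 1
    return stream[start:start + length]


data = "101010"
print(unary_prefixed(data))      # 1111110101010
print(recursive_prefixed(data))  # 1110110101010
assert decode_recursive(recursive_prefixed(data)) == data
```

      For a 6-bit payload both schemes happen to cost 7 bits of overhead. The recursive form only pulls ahead on longer payloads, where the unary prefix grows linearly and the binary length field grows logarithmically.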

    40. Re:Current ratio? by Anonymous Coward · · Score: 0

      i've worked on compression algorithms in the past. i was successful at implementing a 3:1 compression that could be run on itself indefinitely.

      rather than work on differential equations, mine projected the data series into a 6 dimensional matrix and used a really basic recursive function. computing geometrically by modeling data element interaction, allowed detail to be encoded into a base set of structures. it was lossless up to a point, but i couldn't define what that point was.

      this technology seems to be a more "mathematical" approach.

    41. Re:Current ratio? by Fillup · · Score: 1

      Umm... I care. If your compression/decompression time exceeds the amount of time it would take to transfer the file uncompressed, you're really not gaining anything

      Actually, if you compress it once and a thousand people download it, you've saved a lot of time. For yourself, at least ;)

      --
      "I think there is a world market for, maybe, five computers." __ IBM Chairman, 1943 __
    42. Re:Current ratio? by Hast · · Score: 1

      I have read some information theory, but I can not make sense of that. I don't know if I'm wrong or not though.

      AFAIK the reason text compresses better than other stuff is because it contains a lot of redundancy. Somewhere around 3.2 bits of redundancy. (That is, 3.2:1, just by studying the text.)

      I don't recall the terms "8-bit random" or "7-bit random" however. So perhaps that's where the magic comes from. ;-)

    43. Re:Current ratio? by cakoose · · Score: 1
      ...no algorithm can compress arbitrary random data at better than 1:1 in the long run.

      In the worst case, GZIP would leave the file the same size as the original. In other cases, however, GZIP will make the file smaller. That means that the compression ratio is bounded with 1:1 as the worst case. So, on the average, the compression ratio will be better than 1:1.

    44. Re:Current ratio? by cakoose · · Score: 1

      If you're willing to accept loss, then the sky's the limit...



      Some people seem to have taken advantage of this idea in their revolutionary new program! Open source prevails over commercial software again!!

    45. Re:Current ratio? by Anonymous Coward · · Score: 0

      IIRC, that "new" program wasn't new when /. first linked to it last April Fool's. Jokes aren't funny if everyone knows them already. Try again, chumpy.

    46. Re:Current ratio? by Flaming+Death · · Score: 1

      mpg, divx, mp3 are all lossy compression algos.

      true random data (no pattern matching), 1:1 is about it. The reason is simple, you cannot pattern match truly random data, because, its random. But programs like zip and rar use pattern matching techniques esp suited to computer based data. A good example is zipping an already zipped file. If it were truly able to handle random data then it could zip and zip.. until the cows came home :-) Sadly it cant, it pattern matches what it can, hash tables it, then thats it.

      Itd be nice to pattern match random data, but then.. it wouldnt be random would it.

    47. Re:Current ratio? by leeward · · Score: 1

      Yep, someone mod that one up! That is of course the key. Who cares how long or how much horsepower it takes to compress the data. As long as it can be decompressed quickly. Almost always decompression takes much less time than compression.

      I recently compressed a 20MB file with a lot of repetition using bzip2, and it compressed down to about 80KB! (Okay, there was a LOT of repetition) But I also had time to go out and make a fresh pot of coffee and enjoy it at an easy pace before the compression completed, and this on a 1G Athlon. But decompressing it took mere seconds.

    48. Re:Current ratio? by FlowerPotAdmin · · Score: 1

      Two points:

      1) Programs like GZIP can use many different schemes for data compression and chose the one that works best for a particular set of data.
      2) More importantly, computer files *do not generally consist of random data.* Most have structure (or even an uneven balance of 0s and 1s) that can be exploited by a compression algorithm.

      --
      -Justin
      That's enough posting for now lads, there're trolls afoot.
    49. Re:Current ratio? by radish · · Score: 2


      I should add some clarification (as various people have pointed out flaws in my wording). When I said "random" I didn't mean random in the mathematical sense, but rather "average binary data on your disk". My bad, I apologise to everyone who pointed out correctly that real random is 1:1.

      And what I meant by 7-bit non random is that ASCII is only 7-bit, the high bit is always 0, and certain bytes are much more common than others (think E, space etc). This makes it much easier for things like zip to compress it.

      And that post really wasn't worth +5 moderators!

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
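
      The point about some bytes being much more common than others can be put in numbers with a byte-frequency entropy estimate. A minimal sketch, assuming a simple order-0 model (each byte treated independently, so repeated words and phrases are not counted); the function name is illustrative.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy of the byte frequencies, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(entropy_bits_per_byte(bytes(range(256))))  # 8.0
print(entropy_bits_per_byte(b"aaaaaaaa"))        # 0.0

text = b"the quick brown fox jumps over the lazy dog " * 20
print(entropy_bits_per_byte(text) < 8.0)         # True
```

      Data that uses all 256 byte values equally often scores the full 8 bits per byte and leaves this model nothing to squeeze, while English text scores well below that.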

    50. Re:Current ratio? by roundand · · Score: 1

      The worst case is that GZIP doesn't compress it at all but has to add some meta information to say how the file should be expanded (or not, in this case) - in other words, it makes it longer.

    51. Re:Current ratio? by arkanes · · Score: 2

      I used to get nearly 100:1 (about 94:1, as I recall) on database files we used to use at work. Would have gone down if they'd ever been redesigned so they were relational, but hey :P

    52. Re:Current ratio? by gweihir · · Score: 1

      We are talking random data here. For this the maximum possible compression rate is below 1. That means files have to get larger on compression. In fact one valid characterisation of random data is that it cannot be compressed.

      And Information Theory is not something that can be "superseded". It is proven hard mathematics. And that proof that random data cannot be compressed is actually pretty simple and clear (as far as such proofs go).

      Without magic there is no way this is not fraudulent.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted and ignored otherwise.
    53. Re:Current ratio? by Platypii · · Score: 1

      The current ratio is all based on the type of data; it is not possible for ANY compression algorithm to compress all data by even one bit. This is simply based on irrefutable information theory. If their claim of being able to compress random data were true (which it is not... I will get to that later), it would mean that if you tried to compress something not random (i.e. a bitmap or text file), it would make the file significantly larger. I say that they did not succeed in their claim because random data is by definition data that cannot be compressed. I won't get into the mathematics of it, but anyone with a vague knowledge of information theory knows that their claim is bullshit. As has been said, they are far from the first to make this claim and later admit defeat.

    54. Re:Current ratio? by cakoose · · Score: 1

      This meta information could be summed up in the first byte. If there was no compression, the byte could be set to zero. So you're right, the worst case is slightly worse than 1:1, but in a 1MB stream the ratio becomes about 1.000001:1, so it's really not that bad.

  3. how can this be? by posmon · · Score: 3, Informative

    even lossless compression still relies on redundancy within the data, normally repeating patterns of data. surely 100-1 on TRUE random data is impossible?

    --

    update comments set karma=-1, reason='offtopic' where sid=26315

    1. Re:how can this be? by jrockway · · Score: 4, Insightful

      I'm going to agree with you here. If there's no pattern in the data, how can you find one and compress it. The reason things like gzip work well on c files (for instance) is because C code is far from random. How many times do you use void or int in a C file? a lot :)

      Try compressing a wav or mpeg file with gzip. Doesn't work too well, because the data is "random", at least in the sense of the raw numbers. When you look at patterns that the data forms (i.e. pictures, and relative motion), then you can "compress" that.
      Here's my test for random compression :)

      $ dd if=/dev/urandom of=random bs=1M count=10
      $ du random
      11M random
      11M total
      $ gzip -9 random
      $ du random.gz
      11M random.gz
      11M total
      $

      no pattern == no compression
      prove me wrong, please :)

      --
      My other car is first.
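
      The same experiment can be repeated without a shell. A minimal sketch using Python's standard `zlib` module (DEFLATE, the same algorithm family as gzip), with `os.urandom` standing in for /dev/urandom; the repeated C line is an invented sample, not data from the thread.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


random_data = os.urandom(1_000_000)
patterned = b"int main(void) { return 0; }\n" * 40_000

for name, data in (("random", random_data), ("patterned", patterned)):
    print(f"{name}: {len(data)} -> {compressed_size(data)} bytes")
```

      The random buffer should come out slightly larger than it went in, because the compressor has to add framing overhead, while the repetitive buffer should shrink to a small fraction of its size.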
    2. Re:how can this be? by skroz · · Score: 2

      They just threw out information theory entirely... too restrictive. They came up with their own theory... disinformation theory! Everyone seems to be jumping on the bandwagon, too... these guys even compiled a list of the pioneers!

      --
      -- Minds are like parachutes... they work best when open.
    3. Re:how can this be? by Rentar · · Score: 5, Funny
      I'm going to agree with you here. If there's no pattern in the data, how can you find one and compress it. The reason things like gzip work well on c files (for instance) is because C code is far from random. How many times do you use void or int in a C file? a lot :)

      So a perl program can't be compressed?

    4. Re:how can this be? by Shimbo · · Score: 3, Interesting

      They don't claim they can compress TRUE random data only 'practically random' data. Now the digits of Pi are a good source of 'practically random' data for some definition of the phrase 'practically random'.

    5. Re:how can this be? by mccalli · · Score: 2, Informative
      even lossless compression still relies on...normally repeating patterns of data. surely 100-1 on TRUE random data is impossible?

      However, in truly random data such patterns will exist from time to time. For example, I'm going to randomly type on my keyboard now (promise this isn't fixed...):

      oqierg qjn.amdn vpaoef oqleafv z

      Look at the data. No patterns. Again....

      oejgkjnfv,cm v;aslek [p'wk/v,c

      Now look - two occurrences of 'v,c'. Patterns have occurred in truly random data.

      Personally, I'd tend to agree with you and consider this not possible. But I can see how patterns might crop up in random data, given a sufficiently large amount of source data to work with.

      Cheers,
      Ian

    6. Re:how can this be? by Sobrique · · Score: 1

      It's easy.
      If it's true random, it's meaningless, so you can just cat an equivalent number of bytes out of /dev/random to replace it.

    7. Re:how can this be? by sprag · · Score: 2

      Well, I can think of two ways that "random" data might be compressed without an obvious pattern:

      * If the data was represented a different way (say, using bits instead of byte-sized data) then patterns might emerge, which would then be compressible. Of course, the $64k question is: will it be smaller than the original data?

      * If the set of data doesn't cover all possibilities of the encoding (i.e. only 50 characters out of 256 are actually present), then a recoding might be able to compress the data using a smaller "byte" size. In this case, 6 bits per character instead of 8. The problem with this one is that you have to scan through all of the data before you can determine the optimal byte size...and then it still may end up being 8.
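
      The second idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name fixed_width_bits is hypothetical, and the cost of storing the symbol table is ignored.

```python
def fixed_width_bits(data: bytes) -> int:
    """Smallest fixed code width (in bits) that can label every distinct byte value seen."""
    distinct = len(set(data))
    # (distinct - 1).bit_length() is ceil(log2(distinct)) for distinct >= 1.
    return max(1, (distinct - 1).bit_length())


# 50 distinct byte values fit in 6 bits each.
limited = bytes(range(50)) * 100
# A full 256-value alphabet still needs 8 bits, so nothing is saved.
full = bytes(range(256)) * 100

print(fixed_width_bits(limited))
print(fixed_width_bits(full))
```

      Note that the whole input has to be scanned before the width is known, and input drawn uniformly from all 256 byte values will almost always land on 8 bits.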

    8. Re:how can this be? by harlows_monkeys · · Score: 3, Interesting
      I realize that what I'm about to propose does not work. The challenge is to figure out why.

      Here's a proposal for a compression scheme that has the following properties:

      1. It works on all bit strings of more than one bit.

      2. It is lossless and reversible.

      3. It never makes the string larger. There are some strings that don't get smaller, but see item #4.

      4. You can iterate it, to reduce any string down to 1 bit! You can use this to deal with pesky strings that don't get smaller. After enough iterations, they will be compressed.

      OK, here's my algorithm:

      Input: a string of N bits, numbered 0 to N-1.

      If all N bits are 0, the output is a string of N-1 1's. Otherwise, find the lowest numbered 1 bit. Let its position be i. The output string consists of N bits, as follows:

      Bits 0, 1, ... i-1 are 1's. Bit i is 0. Bits i+1, ..., N-1 are the same as the corresponding input bits.

      Again, let me emphasize that this is not a usable compression method! The fun is finding the flaw.
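
      For anyone who wants to poke at the scheme, here is a rough Python sketch of one pass as described above, with bits[0] standing for bit 0. The function names step and passes_to_one_bit are made up for illustration.

```python
from itertools import product


def step(bits):
    """One pass of the scheme; bits[0] is bit 0, as numbered in the post."""
    if 1 not in bits:
        # All N bits are 0: emit N-1 ones.
        return [1] * (len(bits) - 1)
    i = bits.index(1)  # lowest-numbered 1 bit
    # Bits below i become 1, bit i becomes 0, the rest are copied unchanged.
    return [1] * i + [0] + bits[i + 1:]


def passes_to_one_bit(bits):
    """Apply step() until a single bit remains; return (pass count, final bits)."""
    count = 0
    while len(bits) > 1:
        bits = step(bits)
        count += 1
    return count, bits


# Try every 3-bit input and print how many passes each one needs.
for start in product([0, 1], repeat=3):
    print(start, passes_to_one_bit(list(start)))
```

      Comparing the final bit and the pass count across the different starting strings is a good way to start hunting for the flaw.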

    9. Re:how can this be? by levendis · · Score: 2

      yes, but, /dev/urandom isn't really random... if gzip was 'smart' enough, it could figure out the seed & algorithm for /dev/urandom and just save the output data that way. We don't really have any good way of generating really random data, so theoretically all data is not random and therefore arbitrarily compressible. In practice, of course, this is bullshit, and I think this press release will prove to be as well.

      --
      ---- I made the Kessel Run in under 11 parsecs.
    10. Re:how can this be? by posmon · · Score: 1

      of course you couldn't prove that any of the data used actually IS random ;)

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    11. Re:how can this be? by Alan+Partridge · · Score: 1

      MAXIMUM RESPEK!

      --
      That was classic intercourse!
    12. Re:how can this be? by spoonboy42 · · Score: 2

      True random data, however, is extremely rare. Even random number generator algorithms used on PCs don't generate truly random numbers, but rather "semirandom numbers" resulting from a number of operations being applied to the current timestamp. If you pull bytes out of /dev/random at specified intervals for a long enough time, you will eventually be able to discern what pattern connects these semirandom numbers to the time.

      As far as we can tell, the digits of Pi are random. They are also, however, based on mathematical relationships which can be modeled to find patterns in the digits. There are formulae to calculate any independent digit of Pi in both hexadecimal and decimal number systems, as well as known relations like e^(i*Pi) = -1.

      Anyway, the press release says that the algorithm is effective for practically random data. I'm not sure exactly what this means, but I would guess that it applies to data that is in some way human-generated. Text files might contain, say, many instances of the text strings "and" and "the", no matter what their overall content. Even media files have loads of patterns, both in their structure (16 bit chunks of audio, or VGA-sized frames) and in their content (the same background from image to image in a video, for example). Even in something as complex as a high resolution video (which we'll take to be "practically random"), there are many patterns which can be exploited for compression.

      --
      Anonymous Luddite: "What do you think of the dehumanizing effects of the Internet?"
      Andy Grove: "Not Much."
    13. Re:how can this be? by Proaxiom · · Score: 1
      Okay, but in true random data no such patterns are guaranteed to exist. The number of patterns would be, in fact, random.

      Therefore it is difficult to make claims about compression ratio. A random sequence could easily be incompressible (and we know that every compression algorithm has some data that it cannot compress).

      I'm skeptical of the claims, but I am unfamiliar with the math they are describing, so I have to give it a fair shake. They seem to be trying to describe a random sequence using a combinatorial series... how they can be certain the series is less complicated than the data itself I have no idea.

      I am familiar with Shannon's work, however, and I have to say I don't think they are superseding it. They don't actually say that in the press release, if you read the language carefully:
      We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory.

      It sounds more like they are extending his work.

    14. Re:how can this be? by Sobrique · · Score: 1

      Actually, I think there are tests that allow you to determine how random a set of data are. My final year project at university required (amongst other things) porting a gaussian (normal distribution) function from FORTRAN to C - and it's possible to analyse how random the results are. In my case the C (using a mix of rand() and some scaling factors) was a lot worse than the FORTRAN random function. The C code _did_ run about 100 times faster though :)
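
      One common check of this kind is a chi-square statistic on byte frequencies. A rough sketch in Python follows; the helper name chi_square_bytes is hypothetical, and this is not the test used in the project described above, just an illustration of the idea.

```python
import os
from collections import Counter


def chi_square_bytes(data: bytes) -> float:
    """Chi-square statistic of byte counts against a flat distribution over 256 values."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


flat_but_predictable = bytes(range(256)) * 10  # every value exactly 10 times
one_symbol = b"a" * 2560                       # a single value repeated

print(chi_square_bytes(os.urandom(1_000_000)))  # typically somewhere near 255
print(chi_square_bytes(flat_but_predictable))
print(chi_square_bytes(one_symbol))
```

      The counting sequence scores a perfect 0 even though it is completely predictable, so a flat frequency count on its own says little about whether data is random.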

    15. Re:how can this be? by Dr_Cheeks · · Score: 3, Insightful
      If the data was represented a different way (say, using bits instead of bytesize data) then patterns might emerge...
      With truly random data there's no pattern to find, assuming you're looking at a large enough sample, which is why everyone else on this thread is talking about the maximum compression for such data being 1:1. However, since "ZeoSync said its scientific team had succeeded on a small scale" it's likely that whatever algorithm they're using works only in limited cases.

      Shannon's work on information theory is over 1/2 a century old and has been re-examined by thousands of extremely well-qualified people, so I'm finding it rather hard to accept that ZeoSync aren't talking BS.

      --

    16. Re:how can this be? by s20451 · · Score: 3, Informative

      Of course patterns occur in random data. For example, if you toss a fair coin for a long time, you will get runs of three, four, or five heads which recur from time to time. The point is that in random, noncompressible data, the probability of occurrence for any given pattern is the same as the probability of any other pattern.

      --
      Toronto-area transit rider? Rate your ride.
    17. Re:how can this be? by Catiline · · Score: 2, Informative

      Simple. You're doing binary counting. To decompress using this algorithm you need to know the number of cycles performed, for which the smallest (uncompressed) form is the original input data.

    18. Re:how can this be? by Steev · · Score: 1

      That's a bit too much of a generalization. Perl programs like C programs are still text, probably ASCII. There's the pattern. All the bytes in those files are all ASCII characters. Compress away.

    19. Re:how can this be? by oyenstikker · · Score: 1

      On trying to reverse it, you don't know if something is a 1 because it used to be a zero, or because it used to be a one that was not the lowest numbered one.

      --
      The masses are the crack whores of religion.
    20. Re:how can this be? by Proaxiom · · Score: 1
      This isn't very hard. Decompression requires you to know the number of iterations you used to compress the original string. Essentially, you have to know when to stop.

      And to store the counter, you require N bits. Therefore you achieve 0 compression.

    21. Re:how can this be? by posmon · · Score: 1
      that's only testing the distribution of the numbers. in an entirely random set you would *expect* all numbers to occur an equal number of times. the fact that they don't doesn't mean that the data isn't random.

      rolling 6,6,6,6,6,6 on a die is no more non-random than 1,6,3,4,2,5. unless you're kamal khan.

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    22. Re:how can this be? by EllisDees · · Score: 2
      Even random number generator algorithms used on PCs don't generate truly random numbers


      Actually, that depends on the hardware for that particular PC. For instance, the Pentium 2 (and possibly above) has a built-in source of real random numbers based on the thermal noise of the processor itself. Another possible source of randomness is a microphone input that isn't connected to anything.
      --
      -- Give me ambiguity or give me something else!
    23. Re:how can this be? by tjansen · · Score: 5, Informative
    24. Re:how can this be? by swillden · · Score: 2

      If you pull bytes out of /dev/random at specified intervals for a long enough time, you will eventually be able to discern what pattern connects these semirandom numbers to the time.

      Who will be able to? Not me, that's for sure.

      /dev/random uses a pool of very random bits that are distilled from truly random (though not uniformly distributed) data, and it applies a well-respected one-way hash to generate output bits from this pool of randomness. Further, it applies some fairly conservative estimation of the quality of the pool of randomness and stops providing output when the entropy drops too far (use /dev/urandom if you don't want to have to wait for output now and then).

      It is theoretically possible to determine the (large) seed and predict future outputs based only on the observed outputs (though completely impractical based on current public knowledge), but even if you determined the seed at one moment, unless you can observe/predict all of the events /dev/random uses to stir the pool, you'll quickly be wrong again.

      In practice, predicting the output of /dev/random requires complete control over the machine and its environment.

      /dev/random is not truly random, but it's as close as unclassified research knows how to make it, and it's damned good.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    25. Re:how can this be? by justinstreufert · · Score: 1

      Sure, a Perl program can be compressed. Anytime you repeat something (the name of a variable, a function, a pattern of characters) the compressor can replace it with a shorter placeholder to save space.

      Justin

      --
      "Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
    26. Re:how can this be? by ergo98 · · Score: 5, Informative

      Well firstly I'd say the press release gives a pretty clear picture of the reality of their technology: It has such an overuse of supposedly TM'd (anyone want to double check the filings? I'm going to guess that there are none) "technoterms" like "TunerAccelerator" and "BinaryAccelerator" that it just is screaming hoax (or creative deception), not to mention a use of Flash that makes you want to punch something. Note that they give themselves huge openings such as always saying "practically random" data: What the hell does that mean?

      I think one way to understand it (Because all of us at some point or another have thought up some half-assed, ridiculous way of compressing any data down to 1/10th -> "Maybe I'll find a denominator and store that with a floating point representation of..."), and I'm saying this as not a mathematician or compression expert : Let's say for instance that this compression ratio is 10 to 1 on random data, and I have every possible random document 100 bytes long -> That means I have 6.6680144328798542740798517907213e+240 different random documents (256^100). So I compress them all into 10 byte documents, but the maximum number of variations of a 10-byte document is 1208925819614629174706176 : There isn't the entropy in a 10-byte document to store 6.6680144328798542740798517907213e+240 different possibilities (it is simply impossible, no matter how many QuantumStreamTM HyperTechTM TechoBabbleTM TermsTM) : You end up needing, tada, 100 bytes to have the entropy to possibly store all variants of a 100 byte document, but of course most compression routines put in various logic codes and actually increase the size of the document. In the case of the ZeoSync claim though they're apparently claiming that somehow you'll represent 6.6680144328798542740798517907213e+240 different variations in a single byte : So somehow 64 tells you "Oh yeah, that's variation 5.5958572359823958293589253e+236!". Maybe they're using SubSpatialQuantumBitsTM.
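
      The two figures are easy to check with Python's arbitrary-precision integers. This is just a quick sanity check of the arithmetic; the variable names are made up.

```python
# Number of distinct 100-byte documents and distinct 10-byte documents.
inputs = 256 ** 100
outputs = 256 ** 10

print(outputs)           # exact count of 10-byte documents
print(len(str(inputs)))  # number of decimal digits in 256^100
print(f"{inputs:.4e}")   # the same count in scientific notation

# On average, this many 100-byte documents would have to share each 10-byte output.
print(len(str(inputs // outputs)), "digits in the sharing factor")
```

      With that many inputs per output, a decompressor handed one 10-byte file has no way to know which original was meant.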

    27. Re:how can this be? by Erik+Hensema · · Score: 5, Funny

      Perl source is as close to truly random data as possible.

      --

      This is your sig. There are thousands more, but this one is yours.

    28. Re:how can this be? by LMariachi · · Score: 1
      oqierg qjn.amdn vpaoef oqleafv z Look at the data. No patterns. Again....

      Two occurrences of 'oq'. And if you count regexps...

    29. Re:how can this be? by CaseyB · · Score: 2
      We don't really have any good way of generating really random data

      Well, not since LavaRand went down...

      "harnessing the power of Lava Lite® lamps to generate truly random numbers since 1996."

    30. Re:how can this be? by ergo98 · · Score: 1

      Speaking of that there's the brilliant Dilbert where Dilbert is led around I believe accounting and is shown their "random number generator" which is a guy constantly saying "7". Dilbert asks if they're sure it works, to which his host replies "That's the problem with random numbers: We'll never know". Anyone know where that is in graphical form? It was from October I believe and the Dilbert site pulled it, and everyone else seems to have linked to the Dilbert site.

    31. Re:how can this be? by posmon · · Score: 1

      mod this shit up! he's turned my gut reaction into real words!

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    32. Re:how can this be? by Kopretinka · · Score: 1
      The flaw: you never know how many iterations were made.

      Am I right? Am I right? I sure am right. Am I? 8-)

      --
      Yesterday was the time to do it right. Are we having a REVOLUTION yet?
    33. Re:how can this be? by dfay · · Score: 1

      Perl code is already compressed. It is very near to random data. The problem is that it is compressed with an algorithm called "perl hacking" that is not reversible. That's why Perl is sometimes called a write-only language.

      ;)

      My general rule of thumb of language readability: the more often you find yourself holding the shift key and pressing a number key from the top row, the less readable the language is.

      e.g.
      int result = obj.foo(bar) // readable

      p* += &(*--q); // less readable

      #i %*)@!%(#@)**# // not readable... check to see if the perl compiler will accept it as valid code... if not, you may actually have random data! :)

    34. Re:how can this be? by mccalli · · Score: 1
      Two occurrences of 'oq'. And if you count regexps...

      See? Told you it wasn't fixed. I can't even see the patterns that I've typed myself...

      Cheers,
      Ian

    35. Re:how can this be? by gkatsi · · Score: 1

      yes, but /dev/random is truly random

    36. Re:how can this be? by honcho · · Score: 1

      If you look at this closely, it's just subtracting 1 from an N-bit number until it's all 0, then counting down through all N-1 bit numbers, ... until there are no more bits. For example:

      100 -> 011 -> 010 -> 001 -> 000
      -> 11 -> 10 -> 01 -> 00
      -> 1 -> 0
      -> NULL

    37. Re:how can this be? by justinstreufert · · Score: 1

      As a few thoughtful Anonymous Cowards have pointed out, I am a big idiot. Sorry, I was feeling a bit humorless this morning.

      Justin

      --
      "Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
    38. Re:how can this be? by posmon · · Score: 1

      strangely enough, that's what i was thinking of when i posted that ;)

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    39. Re:how can this be? by daniel_howell · · Score: 2, Funny

      Maybe they just write all the 1s and 0s *really small*?

    40. Re:how can this be? by NightWhistler · · Score: 2

      Congratulations, you have managed to prove that you simply can't compress truly random data, as everybody has been saying all along...

      The article however, states that the data is practically random... which does make a difference, 'cause else you wouldn't be able to compress anything...

      That being said, I still think it's a load of hot air... I guess we'll be seeing it in next year's vaporware top 10... ;-)

      --
      PageTurner Reader: open-source e-reader for Android with cloudsync. http://pageturner-reader.org
    41. Re:how can this be? by Anonymous Coward · · Score: 0

      dude, get a sense of humor!

    42. Re:how can this be? by delta407 · · Score: 1, Funny

      No, see, it's 100:1 in binary.

    43. Re:how can this be? by GreyPoopon · · Score: 1
      so I'm finding it rather hard to accept that ZeoSync aren't talking BS

      So let's start watching them for inclusion in Wired's 2002 Vaporware top ten.

      --

      GreyPoopon
      --
      Why is it I can write insightful comments but can't come up with a clever signature?

    44. Re:how can this be? by why-is-it · · Score: 2

      But I can see how patterns might crop in random data, given a sufficiently large amount of source data to work with.

      I think it is a given that patterns will occur in truly random data. Strictly speaking, if the probability of such a pattern existing is greater than 0, it is a certainty that it will eventually occur, given enough trials.

      The question is, will a sufficient number of patterns occur often enough that the data can be significantly compressed to warrant the CPU cycles involved?

      I agree, it does not seem possible.

      --
      *** Where are we going? And what's with this handbasket?
    45. Re:how can this be? by Anonymous Coward · · Score: 0

      If you do have a set of random information, I highly doubt that there will never be a repeating pattern, say for large files:
      12382936497864796234791232923
      maybe that's "A'`*#)!" in binary form; you still will have random occurrences in the binary stream. Say the difference between "1 and 3" is 2, and then 1 + 3 * 2 down the road (8) you have another two numbers, "8 6", and the difference is -2
      that's a pattern.

    46. Re:how can this be? by ergo98 · · Score: 1

      True, and that disclaimer itself basically discounts this being anything better than "another variant of LZW". Perhaps what they're really selling is technology (or the dream at least) that can find patterns that aren't readily apparent.

      Of course if you want to really liberally interpret "practically random", I could say that the string AAAAAAAAAAAA(repeat 100 times. I originally did to get the Slashcode error "that's an awful long string of letters there" :-)) is `practically' random (meaning that it's equally as possible in a random sequence as any other 100 character string), therefore my amazing compression can represent that as 100A! Please send checks care of...

    47. Re:how can this be? by tshak · · Score: 2

      The reason things like gzip work well on C files (for instance) is because C code is far from random. How many times do you use void or int in a C file? A lot :)

      So a Perl program can't be compressed?


      Good question. Although my post made on my own time in my house while eating my breakfast and ignoring my clock should be compressed quite easily.

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    48. Re:how can this be? by Ashran · · Score: 1

      Wouldn't it only be equally distributed when the amount of data is infinite?

      btw, one way of checking that is always taking 2 bytes, first byte is x, second byte is y, and increment the color on that position.
      Try that on zip files, you'll notice an evenly distributed colored square, with a diagonal line in between

      --

      Before you email me, remember: "There is no god!"
    49. Re:how can this be? by rho · · Score: 1

      for he is the Kwisatz Haderach

      --
      Potato chips are a by-yourself food.
    50. Re:how can this be? by Anonymous Coward · · Score: 0

      claim #2: reversible

      + claim #4: can reduce any string to 1 bit

      + Pigeonhole Principle

      Inconsistent. => falsch.

    51. Re:how can this be? by Anonymous Coward · · Score: 0


      It obviously hits an infinite loop, like so.

      1) find lowest 1, everything below that becomes a 1, that becomes a zero.

      2) now the lowest 1 is at place 0, turn that into a zero.

      3) lowest 1 is at place 1, data[0] becomes 1, data[1] becomes zero...

      4) We have now returned to exactly the same case as 2 except now the data[1] = 0. data[0] becomes a 0.

      5) Next, data[2] becomes 0, data[1] = data[0] = 1.

      Notice that zeroes are slowly being propagated up the bitstring, taking roughly time x where x is the length of the bit string. Total time taken is roughly n^2 * 2^n where n is the number of bits being "compressed". That alone is reason enough why it won't work. :-)

    52. Re:how can this be? by Nelson · · Score: 2
      It can't be, that's how it can be.


      Now I'll give them the benefit of the doubt and assume that their PR firm screwed up and they're really talking about very specific types of data but this is most likely a scam. The counting argument is proven mathematically and you can't unprove that or circumvent it.


      Let me explain. With a string of bits you can describe different types of data. Different combinations of bits can be used to describe different things, different strings of bits can be used in a multitude of ways but there is a limit to how much you can describe with a string of bits because it is fixed in length.


      Say I take 1 bit. It's either 0 or 1. There are only two things I can describe with that one bit without extra information. (In compression, "extra information" implies bullshit, your idea doesn't work if it needs "extra information") It's impossible to make that one bit represent 3, 4 or more things without extra information.


      With 2 bits I can represent 4 things. It's impossible to represent more. They are 00, 01, 10, and 11, there isn't a 5th possibility. From a compression standpoint that means that if I compress something down to 2 bits then with that particular compressor there are only 4 possible inputs that can compress to that size. One of those inputs might be the Encyclopaedia Britannica but that's a pretty specialized compressor then. Do the induction, 2**bits is the equation that defines the maximum number of possible inputs to produce that output of that length. This is encoding and what Shannon is most famous for, and his laws are still laws. There are a couple popular ways to do encoding with binary data in compression, arithmetic and Huffman are the most popular.


      So where is this going? Well first, recursive compression. It's not compression because of that "extra information" problem. How many times do I run it to get my data back? Well that's simple, you compressed the Linux kernel and you ran it a number of times x. x just happens to be about the same size as the Linux kernel when represented in binary, gzip that number up and keep it in a safe place and you'll be able to restore your kernel that was compressed into 1 bit.


      Second. Compressing random data. Say I compress a random string to 1/100th of its size. To make the math easy, let's use a string that is 1024 bits long, it is reduced to 10 bits (or 11 now and then.) 1024 bits can represent a lot of different things. 2**1024 of them. 10 bits can only represent 1024 different things, 2**10 = 1024. 2**1024 has 309 digits, 1024 is a teeny tiny fraction of it. Well under 1%, way way way under 1%. With that compressor that does 100:1 compression you can only compress, at the most, 1024 things out of 2**1024, without "extra information." That might be randomesque data but the fact is that if you pick a random string of 1024 bits, you will very very rarely pick one of those 1024 strings that your compressor works on, it pretty much will never happen in practice because it's such a small percentage. This is a truth with all compressors. That's not to say that they can't compress a truly random string of bits 100:1, they just might want to take a few centuries to come up with the particular random string that they are going to use.


      LZ77, LZ78, Burrows-Wheeler transforms, PPM modelers, Markov modelers, all modern generic lossless compressors exploit the fact that most "interesting data" that we want to compress has redundancy to it. If you allow for some degree of redundancy or perhaps a lot of it then you reduce the number of possible inputs to the "interesting inputs" and that's usually far smaller than 2**1024 for a 1024 bit string. It's still probably orders and orders of magnitude larger than 1024 though, it's quite easy to find more than 1024 interesting things you can represent with 1024 bits, the short of it is that a 100:1 compressor will only be able to compress 1024 things of that length. No matter how you model the data, you're limited by the possible number of inputs represented by the encoding.
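
      The counting argument can be turned into a small calculation. The sketch below is an illustration under stated assumptions: the function name max_fraction_compressible is hypothetical, and it counts every output of length 0 up to n minus the saving (so 2047 outputs of at most 10 bits, slightly more generous than the 1024 outputs of exactly 10 bits used above).

```python
from fractions import Fraction


def max_fraction_compressible(n_bits: int, saved_bits: int) -> Fraction:
    """Upper bound on the share of n-bit inputs any lossless coder can shrink by saved_bits or more."""
    # Bit strings of length 0 .. n_bits - saved_bits: 2^(m+1) - 1 of them.
    shorter_outputs = 2 ** (n_bits - saved_bits + 1) - 1
    return Fraction(shorter_outputs, 2 ** n_bits)


print(len(str(2 ** 1024)))  # decimal digits in 2**1024

# Squeezing 1024 bits down to 10 bits or fewer.
tiny = max_fraction_compressible(1024, 1014)
print(tiny.numerator, "out of 2**1024 inputs at most")

# Even saving a single byte is only possible for fewer than 1 input in 128.
print(max_fraction_compressible(1024, 8) < Fraction(1, 128))
```

      The bound holds for any lossless scheme, since two different inputs can never share one output.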

    53. Re:how can this be? by FlatEarther · · Score: 4, Funny

      It is possible despite the many (uninformed) negative comments that have appeared concerning this truly amazing breakthrough in compression technology. I, myself, using my own patented compression technology - The Shannon-Transmogrificator (TM) have managed to compress the entire Reuters article to a mere 4 ASCII characters (!), with essentially no loss in meaning: 'C', 'R', 'A', 'P'. I wonder if anyone can improve on this ?

    54. Re:how can this be? by istartedi · · Score: 2

      Not surprisingly, small code is one of the virtues of Perl. No need to compress it--it's already compressed!

      That said, the claim made by this company is obvious bollox (sp?). Anything much better than 1:1 on truly random data is not possible. Does anybody have a link to a mathematical proof, or better yet a layman's argument from a respected mathematician?

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    55. Re:how can this be? by Mysticalfruit · · Score: 1

      That's random data but it is totally compressible. Even though you banged away at your keyboard, by virtue of it being a US/UK keyboard, your options are still very limited. Since this is standard ASCII text, which is 8 bits per character, it works out to 32*8=256 bits (remember to count the spaces!!!)

      This is how the occurrences break down
      1. " ": 4 occurrences 00001
      2. o: 3 occurrences 00010
      3. q: 3 occurrences 00011
      4. i: 1 occurrence 00100
      5. e: 3 occurrences 00101
      6. r: 1 occurrence 00110
      7. g: 1 occurrence 00111
      8. j: 1 occurrence 01000
      9. n: 2 occurrences 01001
      10. .: 1 occurrence 01010
      11. a: 3 occurrences 01011
      12. m: 1 occurrence 01100
      13. d: 1 occurrence 01101
      14. v: 2 occurrences 01110
      15. p: 1 occurrence 01111
      16. f: 2 occurrences 10000
      17. l: 1 occurrence 10001
      18. z: 1 occurrence 10010
      19. oq: 2 occurrences 10011
      So we've got 19 different "patterns" (18 individual characters plus the pair "oq") that we can encode with 5 bits instead of 8.

      Now we just string those together and you get 30*5=150 bits, plus a bit at the end to let the decoder know that you're at the end (since you're going to be stuffing 5-bit strings into 8-bit strings, you'll end up having some carry-over). You've also gotta store the table so you can go the other direction, which, when added to the now 5-bit stream, means you end up with something a bit larger. But for a real example with a couple thousand letters, it would end up working out.

      I'm sure I've made some stupid mistake, if so just flame away. I can take the heat.
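
      A quick way to double-check a tally like this is to let Python count the sample line. This sketch only handles single characters (no "oq" pair) and ignores the cost of storing the table; the variable names are made up.

```python
from collections import Counter

sample = "oqierg qjn.amdn vpaoef oqleafv z"

counts = Counter(sample)
distinct = len(counts)
# Fixed code width needed to give each distinct character its own label.
bits_per_symbol = max(1, (distinct - 1).bit_length())
packed_bits = len(sample) * bits_per_symbol

for char, n in counts.most_common():
    print(repr(char), n)
print(len(sample), "characters,", distinct, "distinct,", bits_per_symbol, "bits each")
print(packed_bits, "bits packed vs", len(sample) * 8, "bits as 8-bit text")
```

      The saving here comes entirely from the sample using a small alphabet, which is exactly what would not be true of uniformly random bytes.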

      --
      Yes Francis, the world has gone crazy.
    56. Re:how can this be? by Ares · · Score: 1

      Got it on my wall right now. It's altogether too convenient to blame things on random problems in this business. That one pretty much sums it up though.

    57. Re:how can this be? by Rupert · · Score: 2

      Hot Bits uses a Geiger-Muller tube pointed at a radiation source. About as random as you can get.

      --

      --
      E_NOSIG
    58. Re:how can this be? by Ctrl-Z · · Score: 1


      #i %*)@!%(#@)**# // not readable... check to see if the perl compiler will accept it as valid code... if not, you may actually have random data! :)

      That would be a comment. But anyway, wouldn't it more likely be considered random data if perl did accept it?

      --
      www.timcoleman.com is a total waste of your time. Never go there.
    59. Re:how can this be? by Flammon · · Score: 1

      PI represents an infinite string of random numbers and can be calculated using a formula which in a sense is a pattern. So if random numbers can be generated with a pattern then a pattern can be generated from random numbers.

      Basically what you need to do is come up with a new formula for each random string of numbers, the same as is done with PI.

      The only problem is, it's not as easy as PI

    60. Re:how can this be? by Mr.+Slippery · · Score: 1
      /dev/random is not truly random, but it's as close as unclassified research knows how to make it, and it's damned good.

      I don't think "use a Zener diode" or "get a chunk of radioactive stuff (try a smoke detector) and a Geiger counter" are classified...

      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
    61. Re:how can this be? by Kiffer · · Score: 1
      It is possible despite the many (uninformed) negative comments that have appeared concerning this truly amazing breakthrough in compression technology. I, myself, using my own patented compression technology - The Shannon-Transmogrificator (TM) have managed to compress the entire Reuters article to a mere 4 ASCII characters (!), with essentially no loss in meaning: 'C', 'R', 'A', 'P'. I wonder if anyone can improve on this ?


      which can be compressed to POO, which is 75% the length of your result.
    62. Re:how can this be? by MindStalker · · Score: 1

      Ok, can't resist saying this as you already know this, but perl code rarely has more than the typical typeable characters (94 I think) while each byte can hold 256 different values. So simple expansion of the character set is always possible in human readable text. But we all already knew that.
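
      A rough sketch of that point in Python, assuming the alphabet is the 94 printable non-space ASCII characters (codes 33 to 126). The names `pack` and `unpack` and the sample line are made up for illustration. Text drawn from 94 symbols carries about 6.55 bits per character, so it can be repacked into fewer 8-bit bytes:

```python
import math

ALPHABET_SIZE = 94   # assumed alphabet: printable ASCII 33..126, no space
FIRST_CODE = 33


def pack(text: str) -> bytes:
    """Treat the text as one base-94 number and store it in as few bytes as fit."""
    number = 0
    for ch in text:
        number = number * ALPHABET_SIZE + (ord(ch) - FIRST_CODE)
    # Exact byte count needed for any text of this length.
    size = max(1, ((ALPHABET_SIZE ** len(text) - 1).bit_length() + 7) // 8)
    return number.to_bytes(size, "big")


def unpack(data: bytes, count: int) -> str:
    """Inverse of pack(); the character count has to be sent alongside."""
    number = int.from_bytes(data, "big")
    chars = []
    for _ in range(count):
        number, digit = divmod(number, ALPHABET_SIZE)
        chars.append(chr(digit + FIRST_CODE))
    return "".join(reversed(chars))


sample = "print(join(',',map{$_*2}grep{$_%2}@a));#"
packed = pack(sample)
print(len(sample), "chars ->", len(packed), "bytes")
print("bits per char:", round(math.log2(ALPHABET_SIZE), 2))
```

      The saving comes only from the restricted alphabet, which is exactly why it does not carry over to data that already uses all 256 byte values.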

    63. Re:how can this be? by SIGFPE · · Score: 2

      Real programmers write perl scripts by editing compressed source files directly so of course they can't be compressed any more...

      --
      -- SIGFPE
    64. Re:how can this be? by Anonymous Coward · · Score: 0

      No, because fucking regexps look like line noise.

    65. Re:how can this be? by Anonymous Coward · · Score: 0

      If you pull four successive bags of plain chips out of a freshly opened, big bag of mixed flavour chips, it means that the bag REALLY WANTS you to eat some plain chips for a change.

    66. Re:how can this be? by Lacutis · · Score: 1

      I think I managed to do you one better!

      'L' 'I' 'E'

      Amazing!

    67. Re:how can this be? by feepness · · Score: 1

      This is (theoretically) possible because complicated algorithms might be used to regenerate the data. An example of this is that I could transmit the value of PI to you (in an infinite amount of time) or I could transmit the algorithm for calculating PI to you (finite). There, I've just created an infinite compression algorithm (which happens to work only for PI... :).

      Also, you state that "no pattern == no compression". A sequence of 11M of 1s is equally random and equally likely to any other sequence you might see. The ONLY difference is that YOU see an easy algorithm for compressing it.

      There is always a pattern. You've disproved your own suggestion with your own example. /dev/urandom is a random number generator. There is an algorithm that generates it (including a seed value). I transmit the seed, I transmit the algorithm and I've compressed the data. The trick is finding the right algorithm to describe data our puny human minds see as "random".
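
      The seed-plus-algorithm idea can be sketched in Python. This is only an illustration: it uses Python's seeded Mersenne Twister as a stand-in for the generator (it is not how /dev/urandom works), and the function name is hypothetical.

```python
import random


def pseudo_random_bytes(seed: int, count: int) -> bytes:
    """Regenerate the same byte stream from a seed (stand-in generator)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(count))


original = pseudo_random_bytes(seed=42, count=100_000)

# The "compressed" form is just the seed and the length.
description = (42, 100_000)
restored = pseudo_random_bytes(*description)

print(restored == original)
```

      This works only because the data really did come from a known generator with a known seed. For bytes from a physical source there is no such seed to find, which is where the argument runs out.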

      you can't see the pattern != no pattern

      Now I still have a healthy amount of skepticism for this... but hey, one can dream...

    68. Re:how can this be? by sprag · · Score: 2, Troll

      >With truly random data [random.org] there's no pattern to find, assuming you're looking at a large enough sample

      How big is a 'large enough sample'? Seems the larger the sample, the more likelihood of getting longer matches.

      However, given 10 bytes from random.org: 39, 233, 196, 127, 220, 228, 10, 146, 60, 68.
      Strung together as binary they come out as:

      00100111 11101001 11000100 01111111 11011100 11100100 00001010 10010010 00111100 01000100

      Lots of little patterns in there, providing you cross byte boundaries. 4 1's in a row happens 4 times. 3 zeroes in a row come up 7 times.'10' comes up 17 times. '100' comes up 12 times. '0100' comes up 8 times.

      Can this be encoded in a way that takes less than 10 bytes? Don't know. Don't care really, but there are patterns in there.
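
      For anyone who wants to check counts like these, here is a small Python sketch. It counts overlapping occurrences, which may differ from counting separate runs, and it prints the count you would expect by chance in 80 fair coin flips. The helper name is made up.

```python
BYTES = [39, 233, 196, 127, 220, 228, 10, 146, 60, 68]

# Concatenate the bytes as 8-bit binary, crossing byte boundaries.
bits = "".join(format(b, "08b") for b in BYTES)


def count_overlapping(text: str, pattern: str) -> int:
    """Count every position where pattern starts, overlaps included."""
    last_start = len(text) - len(pattern)
    return sum(1 for i in range(last_start + 1) if text.startswith(pattern, i))


for pattern in ("10", "100", "0100", "1111", "000"):
    positions = len(bits) - len(pattern) + 1
    expected = positions / 2 ** len(pattern)   # chance level for fair bits
    print(pattern, count_overlapping(bits, pattern), "expected about", round(expected, 1))
```

      If the observed counts sit near the chance level, the "patterns" are what random bits are supposed to look like, and they give an encoder nothing to exploit.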

    69. Re:how can this be? by krogoth · · Score: 2

      I have developed a program based on MD5 that can compress any file down to one byte nearly instantaneously*. It adds a .sc extension. Copy this code into a file and make it executable, then run it with the filename as the first parameter:

      #!/bin/bash
      # Keep only the first byte of the file's MD5 hex digest.
      md5sum "$1" > .supercompressed
      dd if=.supercompressed of="$1.sc" bs=1 count=1 > /dev/null 2>&1
      rm -f .supercompressed


      * Warning: decompression is not supported. You can tell if another file is identical to the original by compressing it and comparing the results, but it has a very high uncertainty.

      --

      They that quote Benjamin Franklin on liberty and safety deserve neither.
    70. Re:how can this be? by Phil+Karn · · Score: 1

      No, the digits of Pi are not a source of "practically random" data, because I can represent them with the expression "pi".

      To be random, a sequence has to be completely unpredictable and expressible in nothing less than the sequence itself. That's why compressing truly random data is impossible.

      Phil

    71. Re:how can this be? by cez · · Score: 1

      perhaps somehow instead of identifying patterns in random data, they are making them???

      --
      Walk with Music;
    72. Re:how can this be? by 42forty-two42 · · Score: 1

      Wow... drop a bit and you've already compressed it!

    73. Re:how can this be? by Sivar · · Score: 1

      In practice, you would need a way to tell the decompressor what data to create from the compressed file. I notice that the longest pattern in there is a string of nine "1's". How do you tell the decompressor to create nine ones in less space than the nine ones would take uncompressed?

      Furthermore, how would the decompressor know that the data telling it to make nine 1's wasn't compressed data itself? You would need to have some sort of agreement between the decompressor and compressor, which would limit your possibilities further or create a larger file. Remember, all data is ultimately binary regardless.

      --
      Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
    74. Re:how can this be? by the_quark · · Score: 2

      Not to be too much of a geek, but you want to pull that from /dev/random. /dev/urandom is actually pseudorandom if you pull that many bits out of it. /dev/random will block waiting for more actually random bits (which is why you should use it for things like keys), while /dev/urandom will use cryptographic hashes to "stretch" the entropy it has. Theoretically, /dev/urandom may operate as designed and yet produce data with patterns in it. /dev/random should always provide "cryptographically" random data.

    75. Re:how can this be? by gammoth · · Score: 1
      Lots of little patterns in there, providing you cross byte boundaries. 4 1's in a row happens 4 times. 3 zeroes in a row come up 7 times.'10' comes up 17 times. '100' comes up 12 times. '0100' comes up 8 times.

      Yes, there are patterns, but each pattern has to represent the same thing. The patterns you've given surely cross character boundaries. If there are 3 zeroes in a row, I have to know that those 3 zeroes represent the same thing.

    76. Re:how can this be? by gammoth · · Score: 1

      The digits of pi are random, not practically random.

      Knowing all the digits from 1 to n gives no clue as to what the n + 1 digit is.

      To be random, a sequence has to be completely unpredictable and expressible in nothing less than the sequence itself. That's why compressing truly random data is impossible.

      That's exactly right. With pi, the only way to know the nth digit is to read it from a representation or calculate it. That is, pi is "expressible in nothing less than the sequence of pi itself."

    77. Re:how can this be? by Xentax · · Score: 1

      That approach ... "description-based" compression, if you will, has merit.

      And for non-realtime compression, it might even work. The trick is finding a description (or descriptions) of the data (which might have been generated in a truly random manner) that's 1% or less of the size of the data, and then transmitting that.

      That could take an AWFULLY long time, though.

      It would never suffice for real-time compression, though -- you can't assign a description to random data as it's coming in, because there's literally no way to describe what's coming next; you can only describe it after the fact.

      Description-based compression probably gives great results for some data, IF you either have lots of horsepower or don't mind waiting. I'm not sure anyone out there has come up with a "Generic" algorithm for it, though; it's easy to give examples of where it yields great results but I've yet to see a general-use algorithm.

      Xentax

      --
      You shouldn't verb words.
    78. Re:how can this be? by GypC · · Score: 2

      the probability of occurrence for any given pattern is the same as the probability of any other pattern

      ... of the same size. the pattern '333' is far more likely to reappear than the pattern '1234321.'

    79. Re:how can this be? by jrockway · · Score: 1

      Yeah, sorry. I didn't want to wait a week to collect entropy for my slashdot comment :-D

      --
      My other car is first.
    80. Re:how can this be? by Anonymous Coward · · Score: 0

      A compression that does not rely on pattern in
      data would do the work. You are very narrow sighted.

    81. Re:how can this be? by shaunbaker · · Score: 1

      using hex, PI is not random: there is a formula which can generate the nth digit of PI without knowing the previous digits. check out Wolfram's MathWorld
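
      The formula meant here is presumably the Bailey-Borwein-Plouffe (BBP) formula; that identification is an assumption, since the comment does not name it. A minimal Python sketch of BBP digit extraction follows. It relies on ordinary floating point, so it is only trustworthy for modest positions, and the function names are made up.

```python
def _partial_sum(j: int, n: int) -> float:
    """Fractional part of the sum over k of 16**(n-k) / (8k + j)."""
    total = 0.0
    for k in range(n + 1):
        denominator = 8 * k + j
        # Modular exponentiation keeps the numerator small.
        total = (total + pow(16, n - k, denominator) / denominator) % 1.0
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n: int) -> str:
    """Hex digit of pi at 0-based position n after the point."""
    x = (4 * _partial_sum(1, n) - 2 * _partial_sum(4, n)
         - _partial_sum(5, n) - _partial_sum(6, n)) % 1.0
    return "0123456789ABCDEF"[int(x * 16)]


print("".join(pi_hex_digit(i) for i in range(8)))
```

      Each digit is computed on its own, without the earlier ones, which is the property the comment is pointing at.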

    82. Re:how can this be? by Guignol · · Score: 1

      prove me wrong, please :)
      My pleasure :)
      with the same seed:
      dd if=/dev/urandom of=random bs=1M count=10
      Here you have your original string amazingly compressed.
      Of course, this isn't a general compression technique as it is only aimed at compressing those pseudo-random strings.
      But it's very simple, very fast, and has a much higher ratio than 100:1. The thing is, it is not based on repetitive patterns to compress, but rather on "encoding".
      Now anyway, I don't see how you could deal with true random data and I don't really think the article is much more than buzz, but I don't think it is proved that this result is unachievable.
      At best, you could show it is impossible to achieve using pattern repetitions techniques.
      I can think of several different ones, although they'd admittedly be hybrid ones.
      For example, another idea I never gave a shot was this one:
      Use some kind of sorting algorithm to "shuffle" your original data. The algorithm is basically a random seed specified sequence of (invertible, have to see how to do that, would be two reciprocal algorithms that use the same seed to work forward and backward) sorting commands. (sorting or any other way to distort the data, I was thinking about a bubble sort because it's a nice vision to see all those 1s coming together at the surface letting the 0s under them (or the contrary)). Anyway, the trick would be to find a nice algorithm or a set of nice ones (you try several ones or the ones that statistically suit better your data type if you have some knowledge about it) that reversibly distort your data with a simple seed into a highly compressible new data.
      Depending on your original 1s and 0s distribution, some seeds will be more or less effective, I'd say a few tries should always come to something at least compressible beyond 1:1 which would be better than what we currently have. But 100:1 :) As Neo would say.. "Wow !" dunno :)...
      Oh well...I'll have to think about it better someday

    83. Re:how can this be? by Anonymous Coward · · Score: 0

      how about using random()%1 ? :-)

    84. Re:how can this be? by anti-drew · · Score: 1

      However, in truly random data such patterns will exist from time to time. For example, I'm going to randomly type on my keyboard now (promise this isn't fixed...):

      oqierg qjn.amdn vpaoef oqleafv z

      Look at the data. No patterns. Again....

      oejgkjnfv,cm v;aslek [p'wk/v,c

      Strictly speaking, this is not random data, it's not even close. Your natural tendency as a human two-handed typist is to roll your hand across the keyboard when making such supposedly "random" scratchings. But that in itself generates a pattern ... look more carefully, you can see that on an American QWERTY keyboard the keys you hit start at the outside and roll towards the inside, with only a few exceptions. It also jumps back and forth between the left hand and right hand with odds of about 50%, and probably moves up and down among the rows with similar odds.

      Good cryptanalysts know this stuff and will exploit it. Our monkey brains simply can't generate random data, we're no good at it. You need to cleverly exploit the entropy of a system (like /dev/random is supposed to do) to obtain anything close to truly random data.
    85. Re:how can this be? by hackersforjesus · · Score: 1

      wasn't that the idea behind fractal compression? finding some sort of algorithm to reconstruct the data?
      I'm not too sure on the subject, but i think it was something along those lines...

    86. Re:how can this be? by Anonymous Coward · · Score: 0
      porting a gaussian (normal distribution) function from FORTRAN to C - and it's possible to analyse how random the results are
      You analysed the randomness of porting from FORTRAN to C? :-) Did you determine whether your own interpretation of the FORTRAN, or actually writing the C, caused the chaos?
    87. Re:how can this be? by Anonymous Coward · · Score: 0
      It obviously hits an infinite loop, like so.
      Bite the troll, bite.
    88. Re:how can this be? by matrix29 · · Score: 1

      How big is a 'large enough sample'? Seems the larger the sample, the more likelihood of getting longer matches.

      Take a series which for our purposes is much simpler.

      DABCDDDCABCABCBDABC
      4 Characters = 0,1,2,3
      3012333201201213012
      Convert to HEX values again
      31BF8619C6

      The space required for indexing should not exceed the space gained in compression.
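
      The conversion above can be reproduced with a short Python sketch, assuming the mapping A=0, B=1, C=2, D=3 and two bits per symbol. The helper names are made up.

```python
SYMBOLS = "ABCD"   # four symbols, so two bits each


def pack(text: str):
    """Pack the symbols into one integer, two bits per symbol."""
    value = 0
    for ch in text:
        value = (value << 2) | SYMBOLS.index(ch)
    return value, len(text)


def unpack(value: int, count: int) -> str:
    """Recover the symbols; the count is needed to restore leading A's."""
    out = []
    for _ in range(count):
        out.append(SYMBOLS[value & 3])
        value >>= 2
    return "".join(reversed(out))


text = "DABCDDDCABCABCBDABC"
value, count = pack(text)
print(format(value, "X"))
print(count * 2, "bits instead of", count * 8)
```

      The hex string printed matches the one in the comment. The gain comes entirely from knowing in advance that only four symbols occur, and the symbol count still has to be stored somewhere.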

      --
      "Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality.
    89. Re:how can this be? by Scrameustache · · Score: 1

      Try compressing a wav or mpeg file with gzip. Doesn't work too well, because the data is "random"

      MPEGs don't compress well because they're already compressed!

      try .zipping a .zip file while you're at it...

      --

      You can't take the sky from me...

    90. Re:how can this be? by Decimal · · Score: 2

      4. You can iterate it, to reduce any string down to 1 bit!

      Okay, so you've got this 4 byte file zipped down to 1 bit using this miracle compression algorithm. Let's try to decompress this file. Assume the result is "Kate". Now also compress "SuSe", "1234" and "Nick" into 1 bit files using that same algorithm. Go ahead and decompress these. See the problem?

      Try looking at it from the ground-up. If the compressed data is 1 bit long, then you can decompress to 2 possible files. 2 bits, 4 possibilities. 3 bits, 8 possibilities. And so on. Keep in mind that there is information in what kind of compression system you are using. If I give you compressed data and ask you to decompress it, you're in a bit of a bind if you don't know what it was compressed with. GZip? RAR? LZW? Remember "One if by land, two if by sea?" Only 1 bit of information was sent, but the larger amount of information (land/sea) was already with the people receiving the data.

      Now I'm assuming that you *could* compress every possible file to 1 bit + [file extension] using a different compression algorithm. But the number of compression algorithms you'd need to create doubles for every bit you add to the compressed file. And those compression algorithms would very likely all be larger than the original uncompressed file itself!
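
      A toy Python illustration of the collision, using a deliberately silly stand-in "compressor" (the parity of the byte sum, which is hypothetical and chosen only for the demonstration). Any function that maps four inputs to a single bit behaves the same way:

```python
from collections import defaultdict


def toy_one_bit_compress(data: bytes) -> int:
    """Hypothetical 'compressor': keep only the parity of the byte sum."""
    return sum(data) % 2


buckets = defaultdict(list)
for name in (b"Kate", b"SuSe", b"1234", b"Nick"):
    buckets[toy_one_bit_compress(name)].append(name)

for bit, names in sorted(buckets.items()):
    print(bit, names)
```

      With four inputs and only two possible outputs, at least two inputs must share an output, so no decompressor can tell them apart.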

      Oh, and one more thing to think about: All the data on your hard drive can be considered one really large number. All your mp3s, documents, pr0n and operating system files etc. = n. And n changes every time the disk is written to.

      (Nope. I haven't read any compression FAQs. Flame me if I'm wrong. You know you want to.)

      --

      Remember "Bring 'em on"? *sigh
    91. Re:how can this be? by Anonymous Coward · · Score: 0

      INCorrect, everyone knows that Perl programs are truly random. Ever heard of "write-only" code? There is a secret oath that Perl programmers take to never write anything more than once. After all, if they have to write it more than once, why not have a computer program write it? They trick us with their regular expressions, which are anything but regular, and print nonsense just to mock the C programmers. DOWN WITH THE PERL PROGRAMMERS!!!! ehh... what were we talking about?

    92. Re:how can this be? by Anonymous Coward · · Score: 0

      Yes, I can't imagine how this would work. Did you read the announcement? They imply that the calculation required for even small bit strings is out of reach of current computers. To me, that raises the following question: In the course of developing this algorithm, does the algorithm in fact capture some of the data and reinsert that data in the decompression? In other words, is the compression/decompression engine "learning" the data set? If so, these guys could have fooled themselves big time. By the way, if you want to duplicate this feat, get yourself a super computer and run a good genetic algorithm program. It might rediscover these results given enough time.

    93. Re:how can this be? by Anonymous Coward · · Score: 0

      Maybe they're using some kind of math formula to represent data. like...

      pi(p,n) = file

      where pi = 3.14.......... is pseudo random
      p = start position
      n = a file size

      If you're patient enough to find the file in the number pi at position(p) and size(n) then you would only need two numbers to represent ANY file.

      I suppose you could use any irrational number to represent the file function.

      Like e(p,n) or sqrt[2](p,n)....on and on...
      interesting stuff...
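
      A quick way to test this idea on a small scale, sketched in Python. It computes 20,000 decimal digits of pi with Machin's formula and looks up where each three-digit string first appears. The digit count, the guard digits and the function names are choices made for this sketch.

```python
def arctan_inverse(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale using the alternating Taylor series."""
    power = scale // x          # scale / x**(2k+1), starting at k = 0
    total = power
    x_squared = x * x
    divisor = 3
    sign = -1
    while power:
        power //= x_squared
        total += sign * (power // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_digits(count: int) -> str:
    """First `count` decimal digits of pi after the leading 3."""
    guard = 10                  # extra digits to absorb rounding error
    scale = 10 ** (count + guard)
    pi_scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    return str(pi_scaled // 10 ** guard)[1:]


digits = pi_digits(20_000)
positions = [digits.find(f"{n:03d}") for n in range(1000)]
found = [p for p in positions if p >= 0]
average_index_length = sum(len(str(p)) for p in found) / len(found)

print(len(found), "of 1000 three-digit strings found")
print("average digits needed to write the position:", round(average_index_length, 2))
```

      If the position takes about as many digits to write down as the string it points to, nothing has been saved, and that is before storing n as well.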

      gzbo

    94. Re:how can this be? by zin · · Score: 1

      Actually you get a bit larger file :)

      # du random
      10281 random
      # du -m random
      11 random
      # gzip -9 random
      # du random.gz
      10283 random.gz

      --
      -ZiN-
    95. Re:how can this be? by wubo · · Score: 1
      However, thinking (n) dimensionally, patterns seem to emerge out of nothingness.

      for instance, take a fairly random string of text (i just smacked the keyboard)

      jdoenvifenmgghrurunsdjkjgfe

      breaking it down 1 dimensionally, (typical RLE encoding), some compression can be achieved.

      jdoenvifenm(gg)h(ruru)nsdjkjgfe

      but things start to happen at 2 dimensions

      jdoen
      vifen
      mgghr
      uruns
      djkjg
      fe

      note the similar letters that are now adjacent vertically and horizontally

      jdo(en)
      vif(en)
      m(gg)hr
      uruns
      djkjg
      fe

      now imagine a three dimensional matrix of characters, where patterns can emerge not only horizontally and vertically, but also through the z axis (coming out of the screen)

      i admit, a block of text this small is a horrible way to demonstrate compression, and most human generated information has a lot more redundancy, but in my experience (up to three dimensions), amazing redundancy can be found. most MPEG video algorithms are 2 dimensional schemes that work on orderly blocks (though i haven't researched them in depth). i would assume that as you add dimensions, the redundancy should continue to increase. speed, however is a major concern. the complexity of the math required to search for patterns increases exponentially when dimensions are added, but decompression tends to be very quick.
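
      The two-dimensional layout can be tried out with a small Python sketch that wraps the string into rows of a given width and counts equal neighbours in each direction. The function name and the widths tried are made up for illustration.

```python
def adjacent_matches(text: str, width: int):
    """Count equal neighbours side by side and one above the other."""
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    horizontal = sum(a == b for row in rows for a, b in zip(row, row[1:]))
    vertical = sum(
        a == b
        for upper, lower in zip(rows, rows[1:])
        for a, b in zip(upper, lower)
    )
    return horizontal, vertical


text = "jdoenvifenmgghrurunsdjkjgfe"
for width in (3, 5, 9):
    print(width, adjacent_matches(text, width))
```

      Note that the chosen width has to be stored with the data, and for genuinely random text a vertical neighbour is no more likely to match than a horizontal one, so extra dimensions only rearrange the same chance coincidences.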

      it's just a thought anyway, i could be completely off

      --
      WU'BO - A word with absolutely no meaning of any kind.
    96. Re:how can this be? by -brazil- · · Score: 1
      So if random numbers can be generated with a pattern then a pattern can be generated from
      random numbers.


      Wrong. Some seemingly random numbers can be generated, but that is not necessarily the case for all random numbers.

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

    97. Re:how can this be? by -brazil- · · Score: 1

      To be exact, MPEG files don't compress well because they contain little redundancy (because compression schemes work by finding and removing redundancy), thus they look like random data.

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

    98. Re:how can this be? by Graabein · · Score: 1
      'C', 'R', 'A', 'P'. I wonder if anyone can improve on this ?

      Indeed: 'B', 'S'.

      --
      And remember kids: Never trust a computer you can actually lift.
    99. Re:how can this be? by Debillitatus · · Score: 2

      Unfortunately, this is a pretty lossy algorithm, because if you wanted to recreate the article from your compression, you couldn't. This is simply because just about anything in the New Scientist will be compressed, using your algorithm, to the same thing... heh.

      --

      Come on, give it up, that's

    100. Re:how can this be? by Platypii · · Score: 1

      yes, that is how things like real video work, however, those are lossy algorithms. The problem is that lossless compression is still subject to the basic laws of information theory. It doesn't matter how much time it spends compressing it, if it shrinks some files, it MUST expand other files.

    101. Re:how can this be? by Anonymous Coward · · Score: 0

      >I, myself, using my own patented compression
      >technology - The Shannon-Transmogrificator (TM)
      >have managed to compress the entire Reuters
      >article to a mere 4 ASCII characters (!), with
      >essentially no loss in meaning:
      >'C', 'R', 'A', 'P'. I wonder if anyone can
      >improve on this ?

      How about 'S', 'C', 'M'? Could be either SCAM
      or SCUM, both apply equally well.

  4. 100:1 ? I don't think so... by Mr+Thinly+Sliced · · Score: 5, Insightful

    They claim 100:1 compression for random data. The thing is, if that's true, then let's say we have data A size (1000)

    compress(A) = B

    Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

    On we go:
    compress(B) = C (size is now 10)
    compress(C) = D (size 1).

    So everything compresses into 1 byte.

    Or am I missing something.
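
    For comparison, here is what repeated compression does with a real compressor, as a Python sketch. zlib stands in for any lossless scheme (it is not ZeoSync's method), and a seeded generator stands in for random input.

```python
import random
import zlib

rng = random.Random(0)   # seeded stand-in for a random source
data = bytes(rng.getrandbits(8) for _ in range(100_000))

sizes = [len(data)]
for _ in range(3):
    data = zlib.compress(data, 9)
    sizes.append(len(data))

print(sizes)
```

    On input like this the sizes are not expected to shrink from one pass to the next, which is the opposite of the chain A, B, C, D above.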

    Mr Thinly Sliced

    1. Re:100:1 ? I don't think so... by RareHeintz · · Score: 1, Redundant
      I think you've hit on one of a few arguments showing that their claim is bullshit.

      OK,
      - B

    2. Re:100:1 ? I don't think so... by oyenstikker · · Score: 5, Funny

      Maybe they'll be able to compress their debt to $1 when they go under.

      --
      The masses are the crack whores of religion.
    3. Re:100:1 ? I don't think so... by Xentax · · Score: 3, Informative

      No...the compressed data is almost certainly NOT random, so it couldn't be compressed the same way. It's also highly unlikely any other compression scheme could reduce it either.

      I'm very, very skeptical of 100:1 claims on "random" data -- it must either be large enough that even being random, there are lots of repeated sequences, or the test data is rigged.

      Or, of course, it could all be a big pile of BS designed to encourage some funding/publicity.

      Xentax

      --
      You shouldn't verb words.
    4. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      But the output would not be random.
      Its information content would be higher

      Although I still think 100:1 compression is very unlikely.

    5. Re:100:1 ? I don't think so... by arkanes · · Score: 5, Insightful

      I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"

    6. Re:100:1 ? I don't think so... by Rentar · · Score: 3, Interesting

      This is a proof (though I doubt it is a scientifically correct one) that you can't get lossless compression with a constant compression factor! What they claim would be theoretically possible if 100:1 were an average, but I still don't think this is possible.

    7. Re:100:1 ? I don't think so... by Klaruz · · Score: 1

      I'm pretty sure they'd mean random raw data.

      What you're saying is like saying gzip can compress random text x amount, compressing it, then expecting it to compress by x amount again. Once you've compressed the data, it's no longer random.

      That said, I'm pretty sure it's not real as well, or is something less than what they've said it is. It is possible that somebody came up with a completely new way of thinking about things though.

    8. Re:100:1 ? I don't think so... by MikeTheYak · · Score: 5, Insightful

      It goes beyond bullshit into the realm of humor:

      ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.

      They just threw in a bunch of compression buzzwords without even bothering to check whether they have anything to do with lossless compression...

    9. Re:100:1 ? I don't think so... by Klaruz · · Score: 1

      Eeek scratch out my comment, it's too early, I'm dumb.

      TRUE random data would be much harder to compress than your typical text file full of slashdot troll stories.

    10. Re:100:1 ? I don't think so... by Sobrique · · Score: 1

      Full house!
      I just won buzzword Bingo!

    11. Re:100:1 ? I don't think so... by grazzy · · Score: 1

      err. this really doesn't have anything to do with anything. same argument should work for pkzip or any other compression technique.

      A = 100 bytes.
      B = compress(A)

      B ~ 40% less data = 60 bytes.

      And you cant compress B anymore.

      It's natural to believe that they haven't developed a technique to compress 100 bytes to 1 byte. But if they have managed to compress a 6GB AVI movie to 600MB, that's not a 'whoawhoawhao' thingy, since divx already does this.

      But then again, divx doesn't compress random data..

      Conclusion, this compression won't help us broadcast movies or sounds, but it'll help with data that doesn't fit in those categories, so god save us from the pirates.

    12. Re:100:1 ? I don't think so... by pointym5 · · Score: 1

      No, that's not right: read their press release. They explicitly claim to be able to compress the output of the compressor.

    13. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      First of all 1000/100 = 10 (not 100)!
      and when you compress random data the result is no
      longer random. The possibilities for representing
      1000 bytes of data with a mere 10 bytes must be quite slim!

    14. Re:100:1 ? I don't think so... by White+Shade · · Score: 1

      A simple way of thinking about it is this:

      A compression algorithm makes a smaller number of bytes "equal" to a larger number of bytes.

      However, there are fewer smaller strings of bytes than there are larger strings of bytes.

      Consequently, not all larger strings of bytes can be represented by smaller strings of bytes.

      Therefore, while it is possible to reduce MANY larger strings of bytes into extremely small strings of bytes (ie most data can compress very well), there are, logically and obviously, strings of bytes which [in the realm of any given compression algorithm] simply cannot ever be represented with a unique smaller series of bytes (some data compresses like crap.. ie already compressed data)..
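
      The counting can be made concrete with exact integers in Python. The sketch below compares the number of 1000-byte strings with the number of strings short enough to count as 100:1 compressed (10 bytes or fewer); the sizes are just the ones used in this thread.

```python
import math


def strings_of_length(n: int) -> int:
    """Number of distinct byte strings of exactly n bytes."""
    return 256 ** n


def strings_shorter_than(n: int) -> int:
    """Number of distinct byte strings with fewer than n bytes."""
    return sum(256 ** k for k in range(n))


n = 1000
exact = strings_of_length(n)
shorter = strings_shorter_than(n)
small_enough = strings_shorter_than(n // 100 + 1)   # lengths 0..10

print("fewer shorter strings than 1000-byte strings:", shorter < exact)
print("log2 of the fraction that can reach 100:1:",
      round(math.log2(small_enough) - math.log2(exact), 2))
```

      Since a lossless scheme needs a distinct output for every input, at most that vanishingly small fraction of 1000-byte inputs can come out at 10 bytes or less, whatever the algorithm.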

      simple concept, i hope, it just takes a little bit of writing to explain

      the result of this is, then, exactly what you say: You can never get a constant compression factor if you want it lossless!

      I hope this helps clear up any compression mysteries...

      --
      ìì!
    15. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      No...the compressed data is almost certainly NOT random, so it couldn't be compressed the same way. It's also highly unlikely any other compression scheme could reduce it either.

      Any data stream can be considered "random" in that it has an equal probability of occurring as any other stream. I guess it depends on what the company means by random: Do they mean it can take any input and compress it 100-fold, or do they mean that the data must be sufficiently random, i.e. there is specifically not redundancy in the stream?

    16. Re:100:1 ? I don't think so... by Mr+Thinly+Sliced · · Score: 4, Funny

      Not only that, but I just hacked their site, and downloaded the entire source tree here it is:

      01101011

      Pop that baby in an executable shell script. It's a self extracting
      ./configure
      ./make
      ./make install

      Shh. Don't tell anyone.

      Mr Thinly Sliced

    17. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      > St. George continued. "By significantly reducing
      > the size of data strings, we can envision
      > products that will reduce the cost of
      > communications and, more importantly, improve
      > the quality of life for people around the world
      > regardless of where they live."

      Cool! Letting people with 56k modems livestream HDTV-quality movies of lesbian buttlickin' will definitely improve my life!

    18. Re:100:1 ? I don't think so... by BlowChunx · · Score: 1

      Excuse my ignorance, but why does B have to be random? Assuming that your compression scheme imparted order to chaos to achieve 100:1 compression, it seems that B would be highly organized.

      Ever gzipped a gzipped file (force it if necessary)? It doesn't get noticeably smaller on each pass...

    19. Re:100:1 ? I don't think so... by jgore26785 · · Score: 1

      Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

      On we go:
      compress(B) = C (size is now 10)
      compress(C) = D (size 1).

      So everything compresses into 1 byte.

      Or am I missing something.


      There is nothing that implies the compressed file would be random as well.. or at least random data that would be treated the same by the decompression algorithm, as that would make what you say correct. In fact, the data would take on whole new meaning, being a description of patterns or formulae or what have you.

      Think of the random data present in a fractal.. however, the data used to generate a fractal is much smaller in size.. could be simply a formula and some coefficients. If they are doing something like that here, eventually your raw data size becomes smaller than the size of the formulae and coefficients and you either cannot compress any further or you no longer have lossless compression.

      Truly, you would never be able to compress down to one byte with a 'lossless' compression, and I'm sure their compression is 100:1 under certain optimal conditions.. after all, how much data is truly random?
    20. Re:100:1 ? I don't think so... by Xentax · · Score: 1

      I've already pointed out the potential oversight in this argument, but no one seems to have noticed it.

      Everyone seems to be assuming that "random data" is the WORST case, and that thus the compression ratio will be _at least_ 100:1 for any data, if it's 100:1 for "random" data.

      This is not necessarily the case. Think about an algorithm like quicksort. Average case, about nLog(n) (Average case would include quicksort on a random set). Worst case, however, is n^2 (For a reverse-sorted set, IIRC).

      So the result of the compression pass against the 1000-element data is NOT necessarily subject to the same or even ANY compression via the same algorithm(s).

      Think about it -- in even "truly" random data, there WILL be multiple instances of any given sequence, if the data is large enough. It's the 1000 monkeys/1000 typewriters/Works of Shakespeare thing.

      Such a stream is therefore compressible by most algorithms. But, given ANY compression algorithm, you can design a set of data that can NOT be compressed at all using that algorithm -- in fact, that would be the ideal output of the algorithm to begin with.

      A sequence like "AKJABKEAF" might be very compressible (characters appear more than once). But "ABCDEFGHIJK" might have zero compression under the same method (since every character is unique).

      "Random" is not "worst-case" performance, at least it's certainly not guaranteed to be.

      Xentax

      --
      You shouldn't verb words.
    21. Re:100:1 ? I don't think so... by larien · · Score: 2
      From their press release:
      Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range
      What I read this to mean is that for some data sets, they anticipate 100:1 (or more) compression. For 'random' data, they will get some compression. Also note the 'once fully developed' phrase and the word 'anticipated'; they haven't actually achieved these results as yet; until they do, this is vapourware.

      BTW, someone shoot them for using so many TMs...

    22. Re:100:1 ? I don't think so... by cfulmer · · Score: 2

      Well, yeah. It's basic discrete math, the pigeon-hole principle. You can't map a large set into a smaller set without having some overlap. And, since you have overlap, then you won't be able to tell how to decompress your compressed data.
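
      A toy illustration of the pigeon-hole point, as a sketch only: drop_last_bit is a made-up stand-in "compressor", and find_collision simply searches for two inputs that land on the same output.

```python
from itertools import product


def all_bit_strings(n):
    """All bit strings of length n, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def find_collision(compress, n):
    """Return two distinct n-bit inputs that `compress` maps to the same output, or None."""
    seen = {}
    for s in all_bit_strings(n):
        out = compress(s)
        if out in seen:
            return seen[out], s
        seen[out] = s
    return None


def drop_last_bit(s):
    """Hypothetical toy 'compressor': always saves one bit by throwing it away."""
    return s[:-1]


if __name__ == "__main__":
    # 8 three-bit inputs but only 4 two-bit outputs, so some outputs must be shared.
    print(find_collision(drop_last_bit, 3))
```

      Any function that maps every n-bit input to a shorter output has the same problem, because there are 2^n inputs and only 2^n - 1 strings shorter than n bits.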

    23. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      B isn't random data?

    24. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      And divx is definitely not lossless...

    25. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      Uh..so a random stream has an equal probability of occurring as the stream " the "?

    26. Re:100:1 ? I don't think so... by EboMike · · Score: 1

      Err, this really doesn't have anything to do with anything. The same argument should work for PKZip or any other compression technique.

      Well, that's because PKZip does not claim to be able to compress already compressed data. Ever tried to zip a JPEG or MP3?

      I have failed to successfully conquer ZeoSync's massive fortress of buzzwords and marketing blurb, but it seems to me that this is what they claim.

      After all, "random" might also mean "any" kind of data, which includes data output by a compressor, and more importantly, it implies that the compression is lossless, and I have yet to see a lossless compressor with even 3% of the ratio they boast.

      Conclusion: this compression won't help us broadcast movies or sounds, but it'll help with data that doesn't fit in those categories, so God save us from the pirates.

      Funny, aren't movies and sounds what pirates usually are after?

    27. Re:100:1 ? I don't think so... by swillden · · Score: 4, Funny

      So everything compresses into 1 byte.

      Duh, are you like an idiot or something?

      When you send me a one-byte copy of, say, The Matrix, you also have to tell me how many times it was compressed so I know how many times to run the decompressor!

      So everything compresses to *two* bytes. Maybe even three bytes if something is compressed more than 256 times. That's only required for files whose initial size is more than 100^256, though, so two bytes should do it for most applications.

      Jeez, the quality of math and CS education has really gone down the tubes.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    28. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      Twat. It was meant to be funny. You really think someone could believe it?

    29. Re:100:1 ? I don't think so... by MacroRex · · Score: 1

      Its natural to belive that they dont have developed a technique to compress 100 bytes to 1 byte. But if they have managed to compress a movie in avi of 6gb to 600M thats not a 'whoawhoawhao' thingy since divx already does this

      Um, no. DivX is not lossless, nor is any other of the MPEG-line codecs. Their idea is based on losing only the data you won't notice when you're watching the compressed movie.

      If they have discovered a way to compress the 6GB movie (or any arbitrary 6GB file) to a 60MB file which can be restored by decompression to the original 6GB file, then they have discovered something great indeed. Even allowing for their term 'random data', this still sounds too good to be true, which means it probably is.

    30. Re:100:1 ? I don't think so... by __4096 · · Score: 1

      All the data in the universe compresses into the number 42.

    31. Re:100:1 ? I don't think so... by CaseyB · · Score: 2
      "Random" is not "worst-case" performance, at least it's certainly not guaranteed to be.

      It does indeed represent the worst case. "Random data" in the context of data compression means "any data whatsoever", and an algorithm that "compresses random data" is implied to compress any data at better than 1:1, over the long run.

      Defining "random data" as "this particular set of random data" is just deceptive and misleading.

    32. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      Mod this up! Heck, add it as a footnote to the article itself.

    33. Re:100:1 ? I don't think so... by Xentax · · Score: 1

      No...I'm not comparing probability of occurrence.

      But, if you're talking about random 3-character sequences, "the" is as likely to be the result as any other, such as "and", "but", "thx", and so on.

      Such sequences WILL occur in a truly random data stream, if the stream is long enough. Or, I should say, as the stream length increases, the probability that a given sequence WON'T occur approaches zero. The chance that a given sequence WON'T appear at least twice somewhere within it also approaches zero. And so on.
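
      How often a fixed sequence should turn up in a random stream can be estimated with a few lines. This is a sketch assuming a uniform random stream over the 26 lowercase letters; both function names are made up here:

```python
import random
import string


def expected_occurrences(stream_length, pattern_length, alphabet_size=26):
    """Expected number of (possibly overlapping) hits of one fixed pattern."""
    positions = stream_length - pattern_length + 1
    return positions / alphabet_size ** pattern_length


def count_occurrences(stream, pattern):
    """Count overlapping occurrences of `pattern` in `stream`."""
    last_start = len(stream) - len(pattern)
    return sum(1 for i in range(last_start + 1) if stream.startswith(pattern, i))


if __name__ == "__main__":
    rng = random.Random(0)
    stream = "".join(rng.choices(string.ascii_lowercase, k=1_000_000))
    print("expected:", expected_occurrences(len(stream), 3))
    print("observed:", count_occurrences(stream, "the"))
```

      The expected count grows linearly with the stream length, which is why a long enough stream almost surely contains any short sequence many times over. That says nothing about whether pointing at the repeats costs fewer bits than the repeats themselves.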

      Of course, achieving better than 1:1 compression with truly random data is hard, because while you get compression by finding repeated sequences, you lose some to overhead (you have to track what the repeated sequence is AND where each instance of it is located), and you still have to find a way to efficiently store the rest of the stream.

      Existing compression schemes have to balance compression/decompression speed vs. compression ratio -- most offer options to adjust, in fact.

      I suspect their technique takes an enormous amount of time to achieve a very high ratio -- but I'm still skeptical as to the nature of their "practically" random data.

      --
      You shouldn't verb words.
    34. Re:100:1 ? I don't think so... by Mr+Thinly+Sliced · · Score: 1

      Quick - you, me, IPO.
      Call me.

    35. Re:100:1 ? I don't think so... by Xentax · · Score: 1

      I disagree with "random data" being defined as "any data whatsoever".

      I think "random data" means data generated in such a way that there is no predictability in the sequence of symbols in the data generated, nothing more.

      It's not very interesting to say what a compression (or sorting, etc.) algorithm can do over the long run -- you generally want to characterize the performance against "ideal" data, "average" data, and "worst-case" data. The exact nature of EACH of those three IS and must be dependent on the nature of the algorithm itself. However, a set of data that is generated from some sort of random (or pseudorandom) source is often a valid "average" case.

      Xentax

      --
      You shouldn't verb words.
    36. Re:100:1 ? I don't think so... by QuMa · · Score: 1

      Nope, wouldn't work either. The best you can get on average over all possible inputs is 1:1.

      Quite simple really. Given data of n bits, there are 2^n possible data sets. So if you want to number all of them, each one will require an identifier that can have 2^n possible values, also known as n bits. You can jostle the distribution around a bit, making the data pieces you expect to find encode to shorter numbers, but over all possible inputs, the best you can get is 1:1.
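
      To put a number on "the best you can get", here is a sketch that hands the 2^n possible n-bit inputs the 2^n shortest distinct bit strings as outputs and averages the output lengths. It is deliberately generous: it ignores how a decoder would know where each output ends. The function name is an assumption of this example.

```python
def best_average_length(n):
    """Average output length when the 2**n inputs of n bits are mapped
    one-to-one onto the 2**n shortest distinct bit strings."""
    remaining = 2 ** n
    total_bits = 0
    length = 0
    while remaining > 0:
        take = min(remaining, 2 ** length)  # there are 2**length strings of this length
        total_bits += take * length
        remaining -= take
        length += 1
    return total_bits / 2 ** n


if __name__ == "__main__":
    for n in (8, 16, 24):
        print(n, best_average_length(n))
```

      Even with that free pass, the average saving stays at roughly two bits no matter how large n gets, so the average ratio heads toward 1:1 as the inputs grow.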

    37. Re:100:1 ? I don't think so... by pmc · · Score: 4, Funny

      Duh, are you like an idiot or something?

      You're the moron, moron. When you get the one byte compressed file, you run the decompressor once to get the number of additional times to run the decompressor.

      What are they teaching the kids today? Shannon-shmannon nonsense, no doubt. They should be doing useful things, like Marketing and Management Science. There's no point in being able to count if you don't have any money.

    38. Re:100:1 ? I don't think so... by Bandman · · Score: 5, Funny

      I get the idea that this part of the algorithm is perfected by them... it's the decompressor that's giving them fits...

      Step 1: Steal Underpants
      Step 3: Profit!

      We're still working on step 2

    39. Re:100:1 ? I don't think so... by drford · · Score: 1

      The proof it can't be done is even simpler.

      Say you have a bitstream 100 bits in length. That means there are 2^100 different bitstreams. Now, say you can compress these bitstreams by a factor of 100, down to 1 bit each. That means you have only 2 different possible compressed bitstreams. But the compression is lossless, so you must have a 1 to 1 mapping of unique compressed streams to unique uncompressed streams.

      Uhoh :)

    40. Re:100:1 ? I don't think so... by Mr+Thinly+Sliced · · Score: 1

      What's phase 2?
      Hehe totally with you man.

      Mr Thinly Sliced

    41. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      A sequence like "AKJABKEAF" might be very compressible (characters appear more than once). But "ABCDEFGHIJK" might have zero compression under the same method (since every character is unique).

      However, if you take the ASCII representation of "ABCDEFGHIJK" you get something like "01000001 01000010 01000011 01000100 01000101 01000110 01000111 01001000 01001001 01001010 01001011". There you go, lots of "000" or "11" or "101" sequences just begging for compression.

      Maybe they're employing algorithms to change the (random) bit stream to something that's compressible.
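
      The bit patterns are easy to print, and it is just as easy to see whether an off-the-shelf compressor can do anything with them. A sketch using Python's standard zlib module; to_bits is a helper invented here:

```python
import zlib


def to_bits(text):
    """Render ASCII text as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


if __name__ == "__main__":
    sample = "ABCDEFGHIJK"
    print(to_bits(sample))
    # For a string this short, zlib's own framing outweighs any saving.
    print(len(sample), "bytes in,", len(zlib.compress(sample.encode("ascii"))), "bytes out")
```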

    42. Re:100:1 ? I don't think so... by Anarchofascist · · Score: 2, Interesting
      "When you send me a one-byte copy of, say, The Matrix, you also have to tell me how many times it was compressed so I know how many times to run the decompressor!"

      Not true! You don't need an extra byte for the number of times the compression has been run, as long as you compress files that are no larger than a certain size.

      If each pass reduces the size by two orders of magnitude, then 256 compressions will compress down by a factor of (on average) 10^512 = one hundred million billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion times. That's enough to compress a 1024 x 768 movie (at 50 fps and 24 bit colour) into a single byte, as long as the movie runs for less than fifty five billion eight hundred million billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion billion times the current age of the UNIVERSE.

      Therefore, I should easily be able to compress The Matrix into a single byte with 256 passes.

      I don't need to encode the number of compressions, every decompression consists of decompressing 256 times.

      --
      Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
    43. Re:100:1 ? I don't think so... by Xentax · · Score: 1

      Ahh, but the collection of all bitstreams of length 100 isn't a very likely random set, is it?

      A data set of equal length is more likely to have one of the 100-bit sequences in it twice than it is to have each one exactly once.

      This is just another way of proving that the random case is NOT the worst case; at least, if we accept their claim that they can do 100:1 on random data, and we have your proof that the worst case MUST be less than 100:1, then it follows that the worst case is non-random data.

      Worst case for any lossless compression scheme is 1:1 (or worse, rest assured), as you've demonstrated.

      Xentax

      --
      You shouldn't verb words.
    44. Re:100:1 ? I don't think so... by SnapShot · · Score: 1

      I've thought about this before, but I don't have the math background to know whether it's possible: can't the various techniques mentioned be used to convert the finite length data (remember we're always dealing with finite length data; even if it's a stream it can be broken into finite chunks) into a form that happens to have patterns that Huffman can take advantage of?

      On the other hand, the 100:1 compression down to 1 bit argument seems to conclude that the press release is BS...

      --
      Waltz, nymph, for quick jigs vex Bud.
    45. Re:100:1 ? I don't think so... by fredrik70 · · Score: 1

      Not that I'm an expert on compression whatsoever, but what the hell does the 'quantum theory' bit have to do with compression????? hohum

      --
      if (!signature) { throw std::runtime_error("No sig!"); }
    46. Re:100:1 ? I don't think so... by guile*fr · · Score: 1

      so either their compressor isn't optimal...
      or they can repetitively compress a data stream until they reach 0 bits, proof of bullshit... but we knew that already...

    47. Re:100:1 ? I don't think so... by pbryan · · Score: 2

      On we go:
      compress(B) = C (size is now 10)
      compress(C) = D (size 1).
      So everything compresses into 1 byte.


      The press release failed to indicate that their new compression algorithm "brings order from chaos", a feature that I first recognized in the motion picture "Big Trouble in Little China".

      Conservatively assuming a 10:1 compression result in both their algorithm and more common compression algorithms, in order to achieve your one-byte result, you need to achieve it in a slightly different manner...

      randomCompress(A) = B (size is now 100, but less random, therefore less compressible by randomCompress than A).

      normalCompress(B) = C (size is now 10, but more random, therefore more suitable for the randomCompress function in the next iteration).

      randomCompress(C) = D (size is now 1, but no longer exhibits randomness nor pattern, and therefore is no longer reducible, unless the extremelyLossyNonDeterministicCompress function is used, which allows this one last bit to be reduced to zero bytes, but which results in a 50:50 chance of being undecompressable).

      Another feature of their algorithm that was not mentioned was its ability to remove uncertainty from other volatile complex systems such as stock markets, and badly/over-managed economies.

      I predict this new algorithm will revolutionize the gambling industry when it is discovered that practically random events can be de-entropized, allowing more deterministic behavior, and thus unprecedented gambling profits to result.

      --

      My car gets 40 rods to the hogshead, and that's the way I likes it!

    48. Re:100:1 ? I don't think so... by SnapShot · · Score: 1
      That's true. However, there is the ability to store data externally in the real world applications.

      Let's assume, for the sake of argument, that for ANY 1MB file there is an algorithm that can compress it to roughly 1/100th of its original size. You do some research and find/build 256 algorithms to compress 1MB files. Your compressed file contains an additional 1 byte header explaining which algorithm was used.

      In other words, there can be an overlap of 256 small files to the 1MB uncompressed equivalent, but you can recreate the correct uncompressed file using your header.

      This is all theoretical, of course.... And, once again, I am not a mathematician though I play one on TV. ;)

      --
      Waltz, nymph, for quick jigs vex Bud.
    49. Re:100:1 ? I don't think so... by killmenow · · Score: 1

      So everything compresses to *two* bytes.
      If they can compress random data 100:1, why would they need two bytes? Can't they compress an eight-bit byte into a single bit? I would think at most they would need two bits...which is about what their "breakthrough" is worth.
    50. Re:100:1 ? I don't think so... by drsquare · · Score: 0

      Exactly. B would be even more organised, and thus even EASIER to compress, so you should get an even HIGHER compression rate on B. Unless of course this magical 100:1 ratio only works on 'random' data containing many patterns etc.

    51. Re:100:1 ? I don't think so... by Milalwi · · Score: 1
      I am certainly no data compression expert... however:

      Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

      No, B is not random. That is why you cannot continue this infinitely.

      Milalwi
    52. Re:100:1 ? I don't think so... by Milalwi · · Score: 1
      I am certainly no data compression expert... however:


      Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

      No, B is not random. That is why you cannot continue this infinitely.

      Milalwi
    53. Re:100:1 ? I don't think so... by micromoog · · Score: 2
      Sorry, not true. The average is better than 1:1.

      Imagine that we use gzip to attempt to compress all possible files. If it gets smaller, we keep it. If not, we keep the original.

      Overall, some set of files will get smaller. The rest will stay the same. Therefore, we end up with better than 1:1 over the set of all possible files.

    54. Re:100:1 ? I don't think so... by ernst_mulder · · Score: 1

      A has lower entropy than B now! This is something people forget. In fact B is more random than A!

      100% random data has the highest entropy and is impossible to compress losslessly. The data they claim to compress they call "semi random" which must mean it's not random at all.
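
      One crude way to see the entropy difference is to measure byte frequencies. The sketch below computes an order-0 estimate only: it ignores any structure between bytes, so it can overstate how random something is. byte_entropy is a name chosen for this example.

```python
import math
import os
from collections import Counter


def byte_entropy(data):
    """Empirical Shannon entropy in bits per byte, from byte frequencies alone."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(byte_entropy(b"aaaaaaaaaaaaaaaa"))      # a single symbol carries no information
    print(byte_entropy(b"to be or not to be"))    # English-like text sits well below 8
    print(byte_entropy(os.urandom(1_000_000)))    # random bytes land very close to 8
```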

      >But if they have managed to compress a movie in avi of 6gb to 600M thats not a
      >'whoawhoawhao' thingy since divx already does this.

      However, they claim it's lossless.

    55. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      on the press release they carefully use the term "practically random." So you are right, they aren't talking about actually random.

    56. Re:100:1 ? I don't think so... by yeOldeSkeptic · · Score: 1

      If we follow your argument, then there cannot be a 2:1 compression algorithm either. By successively using the 2:1 algorithm on any data, you can reduce it to 1 byte too. The fact is, there is a point at which further processing on data will not reduce it further.

      Whenever a file F is compressed, a compressing algorithm A looks for patterns in the file and builds a table T from which it hopes to rebuild the file later. Let the compressed file be F'. Then A(F) -> F' + T. (Applying A on F yields the compressed file F' and the table T). If we apply A on F' again then A(F') -> F'' + T'. We were able to reduce F' to F'' at the expense of adding another table. F = F'' + T' + T. Here is the catch: As the size of F'' decreases, the size of table T' increases. That's because F'' loses information which must be recovered. Information on how to recover that information is stored in T'.

      Think about this, if I cut a picture in half and send you one piece, there is no way you can recover the other half unless I give you clues on what the other half is. The clue can be as simple as 'mirror image' or as complicated as a 10,000 word description. This clue is encoded in the table T. If by a compression ratio of 100:1 they mean the ratio F/F' then this is really possible. Even a 1000:1 ratio of F/F' is possible. But F/(F' + T) = 100/1? I doubt it, unless F is a special case.

      Here's a compressed fictional anecdote I heard that every budding information scientist should know. A new student was puzzled by the behavior of his classmates. One student said a number, "53" and the rest of the class suddenly erupted in laughter. Another student then said "94" and the same thing happened. Unable to stand it any longer he asked his classmate what is happening. "Well," his classmate answered. "Everybody here has known everybody for so long and the same jokes were being told over and over again that we decided to use information theory and assigned numbers to each joke. Thus, instead of saying the entire joke, we would just say the numbers and since everyone has the joke memorized, we would all remember it and laugh at it." The new student understood. He stood up and said to the entire class "69." No laughter happened, just dead silence. The other students looked at him, shrugged their shoulders and turned away. They don't seem amused. Surprised at the response, he asked his classmate, "What did I say wrong?" His classmate answered. "That wasn't a good joke."

    57. Re:100:1 ? I don't think so... by stilwebm · · Score: 2

      Two things I noticed: they use the term "practical random" which I presume means much less than perfect random, such as a photograph. Also, they mention they have only applied the compression to small strings.

      For all we know, this chip could have a few registers that just mark which file it is compressing so they can spit out a single byte representation of a 100 byte file.

      Does anyone else think this site is catering towards corporate and private venture capitalists more than anything?

    58. Re:100:1 ? I don't think so... by Happy+Monkey · · Score: 3, Informative

      You then need to add one bit of data to tell whether you've compressed it or not.
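
      The cost of that extra marker shows up as soon as the "keep whichever is smaller" scheme is written down. A sketch under stated assumptions: it spends a whole flag byte rather than a single bit for simplicity, it uses Python's gzip module, and pack/unpack are names made up here.

```python
import gzip
import os

RAW = b"\x00"      # flag: body is stored as-is
GZIPPED = b"\x01"  # flag: body is gzip output


def pack(data):
    """Keep the gzip output only if it is smaller; always prepend a flag byte."""
    squeezed = gzip.compress(data)
    if len(squeezed) < len(data):
        return GZIPPED + squeezed
    return RAW + data


def unpack(blob):
    """Invert pack() by reading the flag byte first."""
    flag, body = blob[:1], blob[1:]
    return gzip.decompress(body) if flag == GZIPPED else body


if __name__ == "__main__":
    for sample in (b"a" * 1000, os.urandom(1000)):
        packed = pack(sample)
        assert unpack(packed) == sample
        print(len(sample), "->", len(packed))
```

      Inputs that gzip cannot shrink come out one byte longer than they went in, so the files that get smaller are paid for by the ones that grow.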

      --
      __
      Do ya feel happy-go-lucky, punk?
    59. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      I think your brain has been randomised by paying too much attention to your algorithms class.

    60. Re:100:1 ? I don't think so... by Archanagor · · Score: 2, Funny

      You know,

      If you just remove the flashy buzzwords. Their press release compresses ~100:1

      Here's the result:

      Bullshit.

    61. Re:100:1 ? I don't think so... by leuk_he · · Score: 1

      I will be number 100 to say this but:

      -Compressing to 1 byte is not the problem, decompressing to the full size is.
      -If the target is 1 byte, why not set it for 1 bit?
      -The number of times needed to reach this byte must be told.
      -They never say 100:1 is reached in one of your cycles.
      -There was a time restriction to be overcome. It is surely no fun (No fun at all...) to decompress a byte a few million times to get to your 1MB Word document.
      -Read exactly what they claim. It is all in the language used.

    62. Re:100:1 ? I don't think so... by SuperguyA1 · · Score: 2

      Interesting. Most compression algorithms rely on certain patterns(generally repetition) within the data. I suppose if they are claiming 100:1 on any data then yes, but if this turns out NOT to be a hoax(not likely) then I'd bet the algorithm will only run on a data set once.

      --
      "as plurdled gabbleblotchits on a lurgid bee" - Prostetnic Vogon Jeltz. (One man's humorous is another mans flamebait)
    63. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      Maybe they'll be able to compress their debt to $1 when they go under.


      I've only heard of one algorithm for compressing debt this much, "Chapter 7". I heard it's pretty lossy. :)

    64. Re:100:1 ? I don't think so... by Rentar · · Score: 2
      Nope, wouldn't work either. The best you can get on average over all possible inputs is 1:1.

      Of course. But no one is actually likely to work with a significant percentage of all possible inputs. What I want to say is that any useful data that is not already compressed is less than random (otherwise it would be useless). The really interesting average is that over the average /home/foo

    65. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      >>> Whats phase 2?

      Steal bra, obviously...

      Just try to do it without compressing the DD's down to A's :-)

    66. Re:100:1 ? I don't think so... by KingKire64 · · Score: 1

      WOW this explains how foster beer got the Keg of beer to fit in a beer can!!

      --
      "All I can tell the "lesser of two evils" folks is that if they keep voting for evil, they'll keep getting evil."-Lp.org
    67. Re:100:1 ? I don't think so... by Sklivvz · · Score: 1

      It doesn't work like that. When you compress a bitstream you always take something out, even in lossless compression algorhythms.

      The information you are taking out is actually "stored" in the decoding algorhythm, i.e. you need to know how to decode the compressed bitstream to get back to the original data.

      Basically you need to devise an algorhythm which stores information which is common to most "likely" files, so you can take away a lot of it from the original bitstream.

      Of course your algorhythm will not work for all bitstreams, therefore you will always need to simply store some bitstreams.

      If (and it's a BIG if) those people found a very good algorhythm, then, yes they can achieve 100:1 compression (and even recompress their own files). Of course, you need to take their statements with a bit of salt. You will probably get 100:1 on some specific non bullshit case, and much less in the general case.

      I've looked up the technical info on their site and it doesn't look like bloatware to me.

    68. Re:100:1 ? I don't think so... by rgmoore · · Score: 1

      The problem is that to compress any 1 MB file (by which we mean any file, even one filled up with random bits) to 1/100 of its original size you'll need a lot more than 256 algorithms. In fact, you'll need zillions of them, enough that the size of the header describing which algorithm you're using will take up all of the compression savings. That's the point of the argument.

      Look at it this way: there are 2^8,000,000 possible files 1 million bytes in length and only 2^80,000 possible files 1/100 that long. That means that you'll need 2^7,920,000 different compression algorithms to make your system work, and you'll need a header almost 1MB long to tell them apart. Your savings suddenly disappear.
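
      The header arithmetic, spelled out as a sketch. It assumes 1 MB means 1,000,000 bytes, and header_bits is a made-up helper name: picking one algorithm out of 2^k takes k bits, where k is the difference between the original and compressed sizes in bits.

```python
def header_bits(original_bytes, ratio):
    """Bits needed to say which 'algorithm' maps a compressed file back to its original."""
    original_bits = 8 * original_bytes
    compressed_bits = original_bits // ratio
    # 2**original_bits originals share 2**compressed_bits outputs, so each output
    # needs 2**(original_bits - compressed_bits) algorithms to tell them apart.
    return original_bits - compressed_bits


if __name__ == "__main__":
    for ratio in (100, 10, 2):
        bits = header_bits(1_000_000, ratio)
        print(f"{ratio}:1 needs a header of {bits} bits ({bits // 8} bytes)")
```

      Header plus compressed body always adds back up to the original size, whatever ratio is chosen.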

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    69. Re:100:1 ? I don't think so... by Progman · · Score: 1

      How can we trust an explanation by someone who can't spell algorithm? If you can't spell it chances are you're not familiar enough with the word.

    70. Re:100:1 ? I don't think so... by ErikZ · · Score: 1


      Sounds like fractal compression to me.

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
    71. Re:100:1 ? I don't think so... by SnapShot · · Score: 1

      I take it that this applies to any arbitrary compression ratio; if you were only trying to get 10:1 compression, you'd need fewer algorithms, but it would still be enough that the header to choose which one would eat up any savings. Thank you for the nice explanation.

      --
      Waltz, nymph, for quick jigs vex Bud.
    72. Re:100:1 ? I don't think so... by biobogonics · · Score: 3, Insightful

      I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"


      Actually, if you change the domain you can get what appears to be impressive compression. Consider a bitmapped picture of a child's line drawing of a house. Replace that by a description of the drawing commands. Of course you have not violated Shannon's theorem because the amount of information in the original drawing is actually low.

      At one time commercial codes were common. They were not used for secrecy, but to transmit large amounts of information when telegrams were charged by the word. The recipient looked up the code number in his codebook and reconstructed a lengthy message: "Don't buy widgets from this bozo. He does not know what he is doing."

      If you have a restricted set of outputs that appear to be random but are not, i.e. white noise sample #1, white noise sample #2 ... all you need to do is send 1, 2... and voila!
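
      The codebook trick is simple enough to sketch. Everything below is hypothetical: the code numbers and the second message are invented for illustration, and sender and recipient are assumed to hold identical copies of the book.

```python
# Hypothetical shared codebook; code numbers and the second entry are made up.
CODEBOOK = {
    17: "Don't buy widgets from this bozo. He does not know what he is doing.",
    42: "Ship the order at once.",
}
MESSAGE_TO_CODE = {message: code for code, message in CODEBOOK.items()}


def encode(message):
    """Return the code number for a message; raises KeyError if it is not in the book."""
    return MESSAGE_TO_CODE[message]


def decode(code):
    """Return the full message for a code number."""
    return CODEBOOK[code]


if __name__ == "__main__":
    code = encode(CODEBOOK[17])
    print(code, "->", decode(code))
```

      The ratio looks spectacular only because the information lives in the book both sides already hold. A message that is not in the book cannot be sent this way at all.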

    73. Re:100:1 ? I don't think so... by grytpype · · Score: 3, Funny

      I just ran another compression pass on that, and i got:

      BS

      --

      - Have a picture

    74. Re:100:1 ? I don't think so... by tibbetts · · Score: 1

      Just about every dot-com these days can boast a 100:1 compression ratio. Unfortunately, the "random data" that they're compressing is usually their share price.

      --
      :wq
    75. Re:100:1 ? I don't think so... by Sklivvz · · Score: 1

      Sorry I'm not a native English speaker... but I know my Computer Science!

      :-P

    76. Re:100:1 ? I don't think so... by swillden · · Score: 3, Funny

      I don't need to encode the number of compressions, every decompression consists of decompressing 256 times.

      I think you mean at most 256 times. Suppose I had to perform 10 compressions to compress to a single byte. After you had decompressed 10 times, you'd have the data. The next decompression would make some other file 100 times larger than The Matrix. So if you could recognize the correct file when you saw it, I could avoid transmitting the decompression count.

      So, I just have to prepend a string saying "This is it!" before compressing!

      Also, it occurred to me after my previous posting (and to another poster, I saw) that if we can compress to a single byte, why not to a single bit? This is a great advance, which I believe I shall patent quickly before that other poster does, because now I can give you my copy of The Matrix over the phone! I can just tell you if it's a 1 or 0. For that matter, I don't even have to tell you -- you can just try both possibilities!

      So my question now is, does the decompressor only produce strings of bits that exist somewhere and were once compressed, or does it produce anything? Can I just think "I want a great term paper..." and then try decompressing both 1 and 0 until I get it (in no more than 8 or ten iterations of the decompressor, 'cause I want a paper, not a novel).

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    77. Re:100:1 ? I don't think so... by Trepalium · · Score: 1

      I think you just hit on the difference between what random data means in the computer research arena, and what it means in press releases. Whereas we would take random data to mean something that is unpredictable, and non-repeating, the press release language may just mean random file types.

      --
      I used up all my sick days, so I'm calling in dead.
    78. Re:100:1 ? I don't think so... by zhensel · · Score: 3, Funny

      Quantum theory has everything to do with compression. Inside sources have revealed that this compression scheme works on the uncertainty principles key to quantum physics. You see, any string of 100 bits has a distinct probability of being compressible to a single bit. Of course, this means that this compression scheme will produce bogus results 99.999999% of the time, but think of the wonder of compression realized the other .000001% of the time! Furthermore, the system requirements for their technology are as follows: x86 PC running WindowsXP (to take advantage of DirectX in wickedly rendering the fractals necessary for the compression), a particle accelerator, and a heavy dose of optimism combined with a complete lack of skepticism.

    79. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      Wow - it can't compress bullshit very well. That's odd, because compression theory says that 'pure' information can't be compressed. I would have thought that the natural asymptote for this press release would be

      '....'

    80. Re:100:1 ? I don't think so... by curunir · · Score: 2, Funny

      Therefore, I should easily be able to compress The Matrix into a single byte with 256 passes.

      I'm not so sure about that...It takes a lot of bytes to represent our entire society (in 1999, at least). The AI for Hugo Weaving's character must have been a couple of gigs of code at least.

      However, if you want to compress the movie "The Matrix" into a single byte...here goes:
      <breathy_keanu_voice>Whoah...</breathy_keanu_voice> (soundByte® compression...far from lossless compression, but this is as close as anyone will ever come to one byte compression).

      --
      "Don't blame me, I voted for Kodos!"
    81. Re:100:1 ? I don't think so... by Syberghost · · Score: 2

      Nah, they're using a table based on the Library of Congress.

      So it compresses 100 to 1, but the decompressor program is a hundred terabytes...

    82. Re:100:1 ? I don't think so... by SuperguyA1 · · Score: 2

      LOL the encoding is based on the dewey decimal system.

      --
      "as plurdled gabbleblotchits on a lurgid bee" - Prostetnic Vogon Jeltz. (One man's humorous is another mans flamebait)
    83. Re:100:1 ? I don't think so... by phkhd · · Score: 1
      > Or am I missing something.

      Possibly, I don't know the details of the compression technique, but if it requires random, or nearly random data, you won't be able to do significant recursion since the compression technique will more than likely impose order on the output data.

      Just a thought.

    84. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      I think by compressed, they mean "we printed the source on a small piece of film, which we then placed on a large sponge. We then compressed by a 100:1 ratio"

    85. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      Please, people, their compression isn't looking for patterns; read the TEXT carefully!

    86. Re:100:1 ? I don't think so... by gnovos · · Score: 2

      Here is what you really end up with by compressing a 1 meg file using "Super compression":

      o One bit of "compressed data".
      o One number telling you how many times to run the decompressing program.

      Ok, sounds good so far, right? Well you are missing a little something here, which is that this is not some abstract model, that "number" you sent is going to be made up of bits, and, in the end, it will be a number larger than the 1 meg file you started with more often than it is not!

      Now, what you COULD do is send a virtual-number in two bits. The way you do this is you send one bit of information, and then you wait. When the number of milliseconds equals the number that you wish to send, you send the second bit. In fact, you don't even really need that "first" bit of compressed data; all you need is the virtual number. Now you have successfully sent one meg of information in two bits, right? Yeeeesish, but you have also spent a week just doing nothing while that number was being created, so you end up taking more time than it would have taken in the first place.

      --
      "Your superior intellect is no match for our puny weapons!"
    87. Re:100:1 ? I don't think so... by darksaber · · Score: 1

      Well, I just submitted them to fuckedcompany :)

    88. Re:100:1 ? I don't think so... by Don+Sample · · Score: 1

      Maybe it only works for random data (however they choose to define that.) You compress 1000 random bytes, you get 10 non-random ones. (by their definition) Try to compress those and you end up back at 1000 random bytes.

      You've got a compressor that can only compress noise. Not actual information. (Could be useful...might make it possible to keep up with everything written on /.)

    89. Re:100:1 ? I don't think so... by No+One · · Score: 1

      Compression requires more than organization. In order for most compression techniques to be effective, they also require that large terms be repeated throughout the file. For example, classically the way compression works is roughly as follows:

      My file is: vbxnm tuteiruy tuteiruy abcdhjik vbxnm fsdads abcdhjik fgjsda abcdhjik vbxnm tuteiruy abcdhjik fsdads tuteiruy abcdhjik abcdhjik fgjsda vbxnm tuteiruy vbxnm tuteiruy tuteiruy abcdhjik vbxnm fsdads abcdhjik fgjsda abcdhjik vbxnm tuteiruy abcdhjik fsdads tuteiruy abcdhjik abcdhjik fgjsda vbxnm tuteiruy vbxnm tuteiruy tuteiruy abcdhjik vbxnm fsdads abcdhjik fgjsda abcdhjik vbxnm tuteiruy abcdhjik fsdads tuteiruy abcdhjik abcdhjik fgjsda vbxnm tuteiruy

      That is a file of 450 or so bytes. Compression would take the repeated terms, replace them with symbols, and create a file which could be used to translate the symbols back into the original file.

      The compressed file might look like:
      [KEY]0=vbxnm 1=tuteiruy 2=abcdhjik 3=fsdads 4=fgjsda [DATA]011203242012312240101120324201231224010112032420123122401

      Which would be 117 bytes. The file is very organized, but can't really be compressed any more.
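
      For anyone who wants to play with it, here is a minimal Python sketch of that kind of word-for-symbol substitution. The `encode`/`decode` names and the exact `[KEY]`/`[DATA]` layout are assumptions made for illustration; this is a toy, not how zip or gzip actually work.

```python
# Toy dictionary substitution: each distinct word becomes a one-digit symbol.
# The function names and the [KEY]/[DATA] layout are illustrative assumptions.

def encode(text):
    """Replace each distinct word with a one-digit symbol plus a key table."""
    words = text.split()
    table = {}  # word -> symbol, numbered in order of first appearance
    for w in words:
        if w not in table:
            table[w] = str(len(table))
    if len(table) > 10:
        raise ValueError("this toy format only has ten one-digit symbols")
    key = " ".join(f"{sym}={w}" for w, sym in table.items())
    data = "".join(table[w] for w in words)
    return f"[KEY]{key} [DATA]{data}"


def decode(blob):
    """Invert encode(): rebuild the word list from the key table and symbols."""
    key_part, data = blob[len("[KEY]"):].split(" [DATA]")
    table = dict(entry.split("=") for entry in key_part.split())  # symbol -> word
    return " ".join(table[sym] for sym in data)


# The 19-word block from the example file, repeated three times.
block = ("vbxnm tuteiruy tuteiruy abcdhjik vbxnm fsdads abcdhjik fgjsda "
         "abcdhjik vbxnm tuteiruy abcdhjik fsdads tuteiruy abcdhjik abcdhjik "
         "fgjsda vbxnm tuteiruy")
original = " ".join([block] * 3)
packed = encode(original)

assert decode(packed) == original  # lossless round trip
print(len(original), "->", len(packed))
```

      The savings come entirely from the five words repeating; feed it text where every word is different and the key table makes the output bigger than the input.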

      --

      There is no sin except stupidity -- Oscar Wilde
    90. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      what about that idea?

      only use femto or pico seconds to do it...milli is just too slow... how fast can you switch?
      also take advantage of current frequency shifting/multiplexing, running both data streams, and time-data concurrently?

      Maybe use a combo of the two, send 10% of the data and the decoder, timed to match the other 90% in femtoseconds?

    91. Re:100:1 ? I don't think so... by gnovos · · Score: 2

      only use femto or pico seconds to do it...milli is just too slow... how fast can you switch?
      also take advantage of current frequency shifting/multiplexing, running both data streams, and time-data concurrently?

      Maybe use a combo of the two, send 10% of the data and the decoder, timed to match the other 90% in femtoseconds?

      Well, that's the rub... if your clock could actually measure in such tiny increments accurately then the bit-rate of the line would go up, thus negating the need for compression.

      --
      "Your superior intellect is no match for our puny weapons!"
    92. Re:100:1 ? I don't think so... by matrix29 · · Score: 1

      I get the idea that this part of the algorithm is perfected by them...it's the decompressor that's giving them fits...

      Step 1: Steal Underpants
      Step 3: Profit!

      We're still working on step 2


      Step 2: Take Pictures in erotic poses and sell them

      --
      "Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality.
    93. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      You guys are hilarious. The only way to answer that press announcement is with humor. It's so re dic u louse

    94. Re:100:1 ? I don't think so... by justin.warren · · Score: 2
      Now, B is 1/100th the size of A, right, but it too, is random, right (size 100).

      Not so. In a compression scheme, the compressed data is more organised than the original since it contains the information required to re-assemble the original from the compressed version. Thus the 1/100th size file is less random than the original and would not compress as well.

      The same thing happens if you gzip a gzip-ed file.
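
      A quick way to see this for yourself, sketched in Python with the standard `zlib` module (the same DEFLATE method gzip uses). The sample text is made up for the test, and the exact sizes will vary; the thing to watch is that the second pass gains little or nothing compared with the first.

```python
import random
import zlib

# Build some repetitive sample text (an assumption for the demo: 5000 words
# drawn from a five-word vocabulary with a fixed seed).
rng = random.Random(0)
words = ["vbxnm", "tuteiruy", "abcdhjik", "fsdads", "fgjsda"]
text = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

once = zlib.compress(text, 9)    # first pass removes most of the redundancy
twice = zlib.compress(once, 9)   # second pass has very little left to work with

print("original:", len(text))
print("one pass:", len(once))
print("two passes:", len(twice))

# Both passes are lossless, so undoing them in order restores the input.
assert zlib.decompress(zlib.decompress(twice)) == text
```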

      --
      Just because you're paranoid doesn't mean they're NOT after you.
    95. Re:100:1 ? I don't think so... by Ralp · · Score: 1

      Wait, what if the file that I want to send you is the data that you get when you decompress the string "This is it!"?

    96. Re:100:1 ? I don't think so... by Fjord · · Score: 2

      I ran another pass. The result:

      $

      --
      -no broken link
    97. Re:100:1 ? I don't think so... by Anonymous Coward · · Score: 0

      i heard that joke (new guy in an unfamiliar pub in ireland) and the punch line was "some people just don't know how to tell a joke...."

    98. Re:100:1 ? I don't think so... by amattie · · Score: 1

      Seeing as how the DMCA is cracking down on reverse engineering, would it be illegal to give out binary digits over the phone like that?

    99. Re:100:1 ? I don't think so... by AME · · Score: 1
      In case there was any question of whether I have too much time on my hands...

      gzip -9 compressed your 117 bytes down to 111. And the original file (452 bytes) down to only 93 bytes.

      --
      "I have a good idea why it's hard to verify programs. They're usually wrong." --Manuel Blum, FOCS 94
    100. Re:100:1 ? I don't think so... by Compact+Dick · · Score: 1


      > Step 2: Take Pictures in erotic poses and sell them

      Let's hope Steve Ballmer isn't involved in this.

    101. Re:100:1 ? I don't think so... by No+One · · Score: 1

      That would be because my compression algorithm sucks. :)

      --

      There is no sin except stupidity -- Oscar Wilde
    102. Re:100:1 ? I don't think so... by bcaulf · · Score: 1

      To complete your argument, the set of strings (files) of any particular length that is compressible using gzip is much much smaller than the set of strings of that length that would just get larger going through gzip, by the same pigeonhole argument that everyone else is repeating here. So that one extra bit would kill you.
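
      To put a number on that pigeonhole argument, here is a small Python sketch. It assumes nothing about gzip in particular, only that the coder is lossless, so no two inputs may share an output. The function name is made up for illustration.

```python
from fractions import Fraction


def max_fraction_shrunk(n_bits, saved_bits):
    """Upper bound on the share of n-bit inputs that any lossless coder
    can shorten by at least saved_bits bits."""
    if not 0 < saved_bits <= n_bits:
        raise ValueError("saved_bits must be between 1 and n_bits")
    # Every bit string of length 0 .. n_bits - saved_bits is a possible output,
    # and each output can stand for at most one input.
    short_outputs = 2 ** (n_bits - saved_bits + 1) - 1
    return Fraction(short_outputs, 2 ** n_bits)


# At most 511 of the 65536 two-byte inputs can be squeezed into one byte or less.
print(max_fraction_shrunk(16, 8))

# 100:1 on a 100-byte input means fitting 800 bits into 8 bits or fewer.
print(float(max_fraction_shrunk(800, 792)))
```

      The bound works out to less than 2^(1 - saved_bits), so saving even a few bytes is already out of reach for nearly every input.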

    103. Re:100:1 ? I don't think so... by DragonBlade · · Score: 1

      Doesn't compression work on a bell curve meaning that the further you compress something the less effective the compression works? To illustrate my case instead of: Assuming 10:1 compression 1000 (A) --> 100 (B) --> 10(C) --> 1(D)

    104. Re:100:1 ? I don't think so... by DragonBlade · · Score: 1

      Continually compressing until you reach 1 byte doesn't work. Correct me if I'm wrong, but the effectiveness of file compression reduces after each compression (the exact percentage I don't know). To illustrate my point, this is what might happen assuming a file with a size of 1000 whatever using the 10:1 compression ratio:
      Uncompressed: 1000
      1st Compression: 100
      2nd Compression: 25
      3rd Compression: 20
      4th Compression: 19
      and so on until you get no benefit from the compression. So even with such an incredible compression ratio of 10:1, it would still be less effective each time you compress the file; hence a filesize of 1 byte should technically be impossible. Of course you could just compress a really small file, but I'm sure the compressed file would have some kind of needed overhead, thus actually increasing the size of the file (try zipping a 1 byte file and you'll notice the size increase).

    105. Re:100:1 ? I don't think so... by DragonBlade · · Score: 1

      Sorry for replying to my own post and posting something OT, but I just noticed that this has already been discussed. Just goes to show how reading the other posts first, regardless of how many there are, helps in preventing redundancy :)

  5. Conserve Bandwidth? by Atzanteol · · Score: 2, Funny

    Maybe they just needed more bandwidth for their terrible site?

    --
    "Ignorance more frequently begets confidence than does knowledge"

    - Charles Darwin
    1. Re:Conserve Bandwidth? by Anonymous Coward · · Score: 0

      Why do sites insist on using Flash anyway? I tried to get by as long as possible without it but browsing many sites brings up a constant prompt to install Flash. In IE there doesn't seem to be any "never ask me this question again" setting for not downloading the flash plugin. 99% of the sites that use Flash could have just as easily presented their information as text and a few simple gifs or png images but instead they choose to make this huge animated monstrosity that takes 2 minutes to download over broadband. Now with all the flash animated ads I'm REALLY trying to avoid using it altogether. Have you guys seen these god damn ads that swoop down and fly across the page you're trying to read and block the content? Is that supposed to be cute? The Internet is really going to hell.

    2. Re:Conserve Bandwidth? by Anonymous Coward · · Score: 0

      A vector diagram in flash is much smaller than a gif, it prints properly and it will scale to the window size. Those are all very good reasons to use flash in place of gif or png or jpg for many types of images.

      however no one on the web does....

  6. Time for a new law of information theory? by Anonymous Coward · · Score: 5, Funny

    The odds on a compression claim turning out to be true are always identical to the compression ratio claimed?

    1. Re:Time for a new law of information theory? by Anonymous Coward · · Score: 0
      The odds on a compression claim turning out to be true are always identical to the compression ratio claimed?
      If that's true, then I can make a ton of money in Vegas. I'll just go to my favorite casino there and have them offer me 1 to 1 odds (even money) on the proposition that I find a 1 to 1 compression algorithm. I put down $10,000 on the table, win the bet by presenting the rot13 compression algorithm, and walk out with $20,000. Boy, I hope that theory is true. Unfortunately I put the $10,000 I was going to take to Vegas into Enron stock a year ago, so my master plan's only going to get me enough to buy me lunch at McDonalds. :)
    2. Re:Time for a new law of information theory? by MindStalker · · Score: 1

      1 to 2 odds is 50/50; 1 to 1 odds means you just get your money back.

    3. Re:Time for a new law of information theory? by DrSpin · · Score: 1
      I think you will find that it's

      odds = 2^(compression ratio claimed):1

      YMMV

      [I recommend 100% lossy compression for Perl programs]

    4. Re:Time for a new law of information theory? by jdavidb · · Score: 2

      I hereby announce my new rot13 compression method which achieves a 1:1 compression ratio! And as an added bonus, it is legally unbreakable encryption under the DMCA!

  7. Tech details from the crappy Flash-only website by bleeeeck · · Score: 5, Informative
    ZeoSynch's Technical Process: The Pigeonhole Principle and Data Encoding

    Dr. Claude Shannon's dissertation on Information Theory in 1948 and his following work on run-length encoding confidently established the understanding that compression technologies are "all" predisposed to limitation. With this foundation behind us we can conclude that the effort to accelerate the transmission of information past the permutation load capacity of the binary system, and past the naturally occurring singular-bit-variances of nature can not be accomplished through compression. Rather, this problem can only be successfully resolved through the solution of what is commonly understood within the mathematical community as the "Pigeonhole Principle."

    Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?

    After some time a person will reasonably conclude that:
    "One unique marker is required for each pigeon that flies through the hole, if there are one hundred pigeons in the group then the answer is one hundred markers". In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point, and believe that we are going to successfully recover either the square or cube from the single edge, we would be sorely mistaken.

    This three-dimensional world limitation can however be resolved in higher dimensional space. In higher, multi-dimensional projective theory, it is possible to create string nodes that describe significant components of simultaneously identically yet different mathematical entities. Within this space it is possible and is not a theoretical impossibility to create a point that is simultaneously a square and also a cube. In our example all three substantially exist as unique entities yet are linked together. This simultaneous yet differentiated occurrence is the foundation of ZeoSync's Relational Differentiation Encoding(TM) (RDE(TM)) technology. This proprietary methodology is capable of intentionally introducing a multi-dimensional patterning so that the nodes of a target binary string simultaneously and/or substantially occupy the space of a Low Kolmogorov Complexity construct. The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression. The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down.

    1. Re:Tech details from the crappy Flash-only website by Anonymous Coward · · Score: 0
      The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down.

      So you can compress 100Mb of data to 1Mb, but only if it's not all there. Ground-breaking stuff.

    2. Re:Tech details from the crappy Flash-only website by Anonymous Coward · · Score: 0

      I'm sick of being sorely mistaken. It hurts!

    3. Re:Tech details from the crappy Flash-only website by MadCow42 · · Score: 2

      >> The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression.

      Based on this quote, they don't claim lossless... anyone believe their claim now? (OK, they claim that "[the differences are] so small that we will have for all intents...")

      MadCow

      --
      I used to have a sig, but I set it free and it never came back.
    4. Re:Tech details from the crappy Flash-only website by igomaniac · · Score: 1
      In fact, this (more) technical explanation of what they are doing sounds a lot more sensible. Using multidimensional analysis tools in an entropy coder is in fact a novel idea, and if you look up Kolmogorov Complexity on MathWorld you will see that this is an entropy measure for multidimensional data just as Shannon defined entropy for single dimensional data.

      The further you get from this (i.e. their press release and then the Reuters article) the more the facts are distorted and the claims inflated.

      --

      The interactive way to Go -- http://www.playgo.to/iwtg/en/
    5. Re:Tech details from the crappy Flash-only website by Anonymous Coward · · Score: 0

      bullshit

    6. Re:Tech details from the crappy Flash-only website by Anonymous Coward · · Score: 0

      So they don't claim "lossless" but WTF is "lossley"?

    7. Re:Tech details from the crappy Flash-only website by agent+oranje · · Score: 1

      if i'm not mistaken, the pigeonhole principle states that "if you have n holes, and n+1 pigeons, one hole must have more than one pigeon." as it applies to data compression, if a sequence of bits appears multiple times, represent it as something smaller, and replace it with the original sequence when uncompressed.

      your statement seems to state "given n unique objects, how many objects are there?" this is n, of course.

      --
      -agent oranje.
  8. Is this April 1st? by tshoppa · · Score: 3, Informative
    This has *long* been an April 1st joke published in such hallowed rags as BYTE and Datamation for at least as long as I've been reading them (20 years).

    The punchline to the joke was always along the lines of

    Of course, since this compression works on random data, you can repeatedly apply it to previously compressed data. So if you get 100:1 on the first compression, you get 10000:1 on the second and 1000000:1 on the third.
    1. Re:Is this April 1st? by Johannes+K. · · Score: 1
      Of course, since this compression works on random data, you can repeatedly apply it to previously compressed data. So if you get 100:1 on the first compression, you get 10000:1 on the second and 1000000:1 on the third.

      I quote from their press release:

      Existing compression technologies are [...] limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability.

      Not so far from that April first joke, is it?

    2. Re:Is this April 1st? by friscolr · · Score: 2, Funny
      But this is no joke.

      Please note they claim to be able to compress data 100:1, but do not say they can decompress the resultant data back to the original.

      By the way, so can i.
      Give me your data, of any sort, of any size, and i will make it take up zero space.

      Just don't ask for it back.

    3. Re:Is this April 1st? by Sobrique · · Score: 1

      Wouldn't it be entertaining if this press release _was_ the result of that BYTE article.
      Scenario: Overworked programmer tells manager to back off, because he's developing a _really_ new and fantastic compression algorithm.
      Shows said manager copy of article and points out that no-one has patented it yet.
      Manager knows not a lot, but does know that with compression, more is better, and rushes press release.

      Scary thing is, I can see it happening...

    4. Re:Is this April 1st? by ThatComputerGuy · · Score: 1

      I can too! LZip to the rescue!

      --
      XML is like violence. If it doesn't solve the problem, use more.
    5. Re:Is this April 1st? by malfunct · · Score: 1

      By claiming lossless compression, it seems to me they are guaranteeing that the process is reversible. If you can't get your original info back then it is lost.

      --

      "You can now flame me, I am full of love,"

  9. Press Release here by thing12 · · Score: 2, Informative
    If you don't want to wade through the flash animations...

    http://www.zeosync.com/flash/pressrelease.htm

  10. randomness by Derwen · · Score: 2
    a breakthrough in data compression that allows for 100:1 lossless compression of random data.
    That's fine if you only have random data - but a lot of mine is non-random ;o)
    - Derwen

    --
    http://fsfeurope.org/
  11. Yet Another Fantastic Compression Scheme by pointym5 · · Score: 1

    "Breakthrough" compression schemes are the perpetual motion machines of the 21st century. Any technological claim that's introduced with the statement that they've broken through the boundaries of information theory falls way on the wrong side of Occam's razor for me.

    Think about it: 100-to-1 compression of random data? Just think in terms of first principles: How many bit strings are there of a given length? How would you reduce the size of a binary description that identifies a particular one? And note that the random data thing is straight from their press release!

    1. Re:Yet Another Fantastic Compression Scheme by Anonymous Coward · · Score: 0

      Where does random data come from? Generally a pseudo-random number generator. If you take one of these and generate a string that is 100 times the size of the algorithm itself, BANG! you have 100:1 compression for random data.

      Agreed, it would be quite hard (or impossible) to find said mathematical equation to represent random data, but one could hope....
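
      A rough Python sketch of the idea, using the standard `random` module's generator as a stand-in (an assumption; any deterministic PRNG would do). The hypothetical `expand` function plays the part of the "decompressor": a tiny seed goes in, an arbitrarily long pseudo-random string comes out.

```python
import random


def expand(seed, n_bytes):
    """'Decompress' a seed into n_bytes of pseudo-random output."""
    rng = random.Random(seed)  # deterministic: same seed, same stream
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


blob = expand(42, 4096)           # 4096 bytes regenerated from one small integer
assert blob == expand(42, 4096)   # reproducible, so the seed "is" the file
assert len(blob) == 4096
print(blob[:8].hex())
```

      The catch is the one already admitted above: this only goes from seed to data. A 32-bit seed can reach at most 2^32 of the 256^4096 possible 4096-byte strings, so for data that did not come out of this generator there is almost never a seed to find.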

  12. Drink. Feck. Girls. Arse. by Anonymous Coward · · Score: 0

    Drink! Drink! Drink!
    cup of tea, father?
    ah, GO ON!

  13. No Way... by tonywestonuk · · Score: 2, Redundant

    Pure random data is impossible to compress - if you compress 1MB of random data (proper random data, not pseudo-random) and you get, say, 100K's worth of compressed output, what's stopping you feeding this 100K's worth back through the algorithm again and reducing it down even more... again, and again, until the whole 1MB is squashed into a byte! (Which, obviously, is a load of rubbish.)

    1. Re:No Way... by Sobrique · · Score: 1

      Erm, cos the output wouldn't be random?
      Always assuming you're interested in reconstructing the initial values of course.

    2. Re:No Way... by radish · · Score: 2


      Not true.

      Get yourself some random data (real random is of course somewhat hard to find! but the output from a crypto-strength RNG is OK) and zip it. It will (probably) get smaller; a reduction is more likely the bigger the file is. The reason is that in a random stream you may get repeating patterns (although you may not), and it's these repeating patterns which deflate uses. The larger your dataset, the more likely there are to be significant repeated sections. Other less random data sets (e.g. plain text) will compress much better because there are statistically more repeated sections (this is at the bit level, not char level of course).

      Now the output from deflate is NOT random (I've said this in other comments on this thread), it will not have any repeating sections (the first zip run has removed them), therefore running deflate over it again will have no effect.

      This is why zipping a zip file will always have no effect (OK so sometimes you can exploit weaknesses in the file format of zip, but rarely). It is not at all important what the original data looked like, a second run will not
      improve the ratio.

      Note that I'm not saying you can compress ANY data, (I just said you can't compress compressed streams for instance!), but random data is not impossible to compress, just quite hard.

      I've seen several people use the "repeated compression = 1 byte final result" argument against this announcement here - it's inappropriate. I agree this press release is pure horse manure, but not for that reason.

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    3. Re:No Way... by serial+frame · · Score: 1
      ...until the whole 1MB is squashed into a byte!

      You hit the nail right on the head. If ANY 'compression scheme' were to 'compress' 1MB of data to a single byte...How would it go about decompressing it? One byte is DEFINITELY not enough space to store proper information about the compressed data on how to decompress it. (Though I'm sure by the time I post, somebody will have already said this.)

      So, any way one could put it, Zeo's claim is as much bullshit as their buzzword-compliant site.

      --

      -
      And the Angel said unto me, "These are the cries of the carrots! The cries of the carrots!"
    4. Re:No Way... by CaseyB · · Score: 3, Insightful
      It will (probably) get smaller, a reduction is more likely the bigger the file is.

      It "probably" will not.

      The reason is that in a random stream you may get repeating patterns (although you may not), and it's these repeating patterns which deflate uses.

      Any encoding that saves space by compressing repeating data, also adds overhead for data that doesn't repeat -- at least as much overhead as you saved on the repetition, over the long run.

      There ain't no such thing as a free lunch.

    5. Re:No Way... by flegged · · Score: 1

      If the output weren't random, then there would be patterns in it. You could then, for example, run that through gzip and have it compress more.
      The output must be the smallest possible representation of the input, which, as we all know, is truly random, or near enough that the compression algorithm can't find any patterns.

      --

      "I think he was truly surprised at how little I cared about how big a market the Mac had" - Linus on Jobs
    6. Re:No Way... by liquidsin · · Score: 2

      I've seen several people use the "repeated compression = 1 byte final result" argument against this announcement here - it's inappropriate.

      Ok, so what if I start out with 100 bytes of data, purely random (as pure as can be had...) that just happens to have no patterns that can be factored out (could happen...it's random). You mean to tell me that can be compressed at 100:1? Even if it did have some patterns to it, there's no way in hell it could crush down to 1 byte. The fact that they claim on their website that they take data and randomize with a patented technology is a good tip-off that it's a hoax.

      --
      do not read this line twice.
    7. Re:No Way... by QuMa · · Score: 1

      Quoth the raven: bollocks. Given a sufficiently large truly random input, achieving compression is impossible.

      (the 'sufficiently large' is necessary because it is possible that a random generator would give say all 0's for a bit. But if you go on long enough, the average compression will be 1:1 or worse.)

    8. Re:No Way... by Eivind · · Score: 3, Insightful
      Get yourself some random data (real random is of course somewhat hard to find! but the output from a crypto-strength RNG is OK) and zip it. It will (probably) get smaller, a reduction is more likely the bigger the file is.



      Bullshit. There will be patterns, but the point is, all patterns are equally likely, so this does not help you. Don't believe me ? Test it yourself. Pull say a megabyte of your /dev/random (this will take a while!) And then try to compress it with all the compressors on your machine. Zip, Compress, Bzip, you name it.



      The odds are very high (as in 99.999% ++) that none of the compressors will manage to shrink the file a single byte. In fact they will probably all cause it to grow very slightly.
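
      If you would rather not wait on /dev/random, the experiment can be sketched in Python. This version swaps in `os.urandom` (the OS's non-blocking random source, a substitution on my part) and the standard-library `zlib`, `bz2` and `lzma` modules in place of the command-line tools.

```python
import bz2
import lzma
import os
import zlib

# One mebibyte from the operating system's random source.
data = os.urandom(1 << 20)

# Each compressor at its strongest standard setting.
compressors = {
    "zlib": lambda d: zlib.compress(d, 9),
    "bz2": lambda d: bz2.compress(d, 9),
    "lzma": lambda d: lzma.compress(d),
}

sizes = {name: len(fn(data)) for name, fn in compressors.items()}

print("input:", len(data))
for name, size in sizes.items():
    # A positive difference means the "compressed" file grew.
    print(f"{name}: {size} bytes ({size - len(data):+d})")
```

      Every run uses fresh random bytes, so the exact figures change, but each output should come out a little larger than the input because of the container and block overhead.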

    9. Re:No Way... by KingAdrock · · Score: 1

      The announcement says that it approaches 100:1, and I'm assuming that is the average compression. The larger the files get, I would guess the better compression they get. I doubt they would make any claims that they could get a 100 byte file down to one byte. And why would you want to? However, if you have 100 gigs of data and could reduce it to 1 gig, that would be something everyone could use.

      I don't take this announcement at face value, but I do think that a lot of people are dismissing it because it goes against the rules that are known. Every once in a while a technology comes along that doesn't follow the existing rules. It starts with a whole new set of rules. These are the things that are revolutionary. These are the things that change the world. If we all accepted that compression could never get any better than it currently is, then it would never get better!

    10. Re:No Way... by radish · · Score: 3, Funny


      *Reads FAQ* *Blushes*

      OK, so I went the "negligible housekeeping" route. Maybe I should get a job in the patent office. ;-)

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    11. Re:No Way... by radish · · Score: 2


      The counting argument is just as appropriate to Zip as to this new algorithm. You can't apply Zip recursively, and likewise you can't apply this thing recursively. That doesn't mean they can't have got 100:1 compression. Zip can get 100:1 on some files, just not all. If they claim to get exactly 100:1 on EVERYTHING then they're talking crap, but I didn't see that claim (maybe I missed it). As I said, I think they're talking crap anyway, I'd just like to make sure we use valid arguments to beat them into the ground ;-)

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    12. Re:No Way... by Nyh · · Score: 1

      It will (probably) get smaller, a reduction is more likely the bigger the file is.
      If you really think this is true why don't you cash The $5000 Compression Challenge and of course the Slashdot article.

      Well, all this compression BS was discussed over a thousand times but most people keep on dreaming.

      Hans

    13. Re:No Way... by Captain+Nitpick · · Score: 1
      Get yourself some random data (real random is of course somewhat hard to find! but the output from a crypto-strength RNG is OK) and zip it.
      Now the output from deflate is NOT random
      Note that I'm not saying you can compress ANY data, (I just said you can't compress compressed streams for instance!), but random data is not impossible to compress, just quite hard.

      You don't seem to see the big flaw in your argument. What if your RNG output is a valid zip file? Or a JPEG, or some other already compressed format?

      With a true random number generator, all sequences of bits of a given length are equally likely. Which means you can get valid compressed files out of it, or even all 1s for that matter.

      --
      But then again, I could be wrong.
    14. Re:No Way... by Stephan+Schulz · · Score: 2

      The odds are very high (as in 99.999% ++) that none of the compressors will manage to shrink the file a single byte. In fact they will probably all cause it to grow very slightly.


      However, many compression programs will hide this very small growth in the file name. Gzip, for example, will never increase a file in size by simply refusing to do the compression if the file does not shrink. However, it adds a 3 byte marker (".gz") to all files it compresses, nicely hidden away in a place you don't look at.

      --

      Stephan

    15. Re:No Way... by kz45 · · Score: 0

      If you really think this is true why don't you cash The $5000 Compression Challenge [geocities.com] and of course the Slashdot article [slashdot.org].

      if they can't afford a "real" website, how can they afford to pay me $5000?

    16. Re:No Way... by Eivind · · Score: 2
      No. you're wrong. Gzip *will* sometimes make a file grow. It has to. Think about it: there are certain sequences of bytes in a file that are "magical" to gzip, which are meant to expand to something more than what they are.

      If those sequences show up in the file, they must be "escaped" somehow so as to make gunzip understand that they are to be interpreted literally as opposed to expanded.

      This will cause a small growth. But the growth will generally stay very low, I haven't tested, but I would guess no more than 1% growth or so.

    17. Re:No Way... by Eivind · · Score: 2
      I tested this empirically just now. A 1000 byte random file gzipped will grow to about 1028 bytes.

      A 1000000 byte file grows to about 1000178 bytes.

      You make your own experiments and draw your own conclusions if you like.

    18. Re:No Way... by koali · · Score: 1
      Ermm... yes. The compression algorithm is as follows. The output is 0 for that 100 bytes of data, and a 1 followed by the original string otherwise.

      Of course, this has little to do with the matter at hand
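
      A minimal sketch of that scheme in Python, assuming a whole tag byte instead of a single tag bit to keep the code simple; the function names and the `special` parameter are hypothetical.

```python
def compress(data: bytes, special: bytes) -> bytes:
    """Map the one chosen input to a single byte; tag everything else."""
    if data == special:
        return b"\x00"
    # Every other input grows by the one-byte tag.
    return b"\x01" + data


def decompress(blob: bytes, special: bytes) -> bytes:
    """Invert compress(): a lone 0 tag means the special string."""
    if blob == b"\x00":
        return special
    return blob[1:]
```

      It gets 100:1 on exactly one 100-byte input and makes every other input one byte longer, which is the trade-off the counting argument predicts.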

    19. Re:No Way... by malfunct · · Score: 1

      Random != Perfectly Distributed

      Random != Non-Repeating

      Random is just exactly what it says, random, meaning any sequence of bits is equally likely. It is possible, choosing perfectly randomly, to get the sequence 1111111... which is very compressible.

      The probability of being able to compress any 1 string chosen at random is very high. The probability of being able to compress EVERY string in a set is zero.

      It's very important to make this distinction about randomness when speaking about compression or you won't at all understand the arguments presented.

      --

      "You can now flame me, I am full of love,"

    20. Re:No Way... by saforrest · · Score: 1
      This is not true. Gzip always compresses, and there will be an increase for some files, because of the need to "escape" special characters (as suggested by another poster). From the gzip man page:


      Compression is always performed, even if the compressed file is slightly larger than the original. The worst case expansion is a few bytes for the gzip file header, plus 5 bytes every 32K block, or an expansion ratio of 0.015% for large files. Note that the actual number of used disk blocks almost never increases. gzip preserves the mode, ownership and timestamps of files when compressing or decompressing.



      So the file is always compressed, and is at most 0.015% larger than the original.
    21. Re:No Way... by saforrest · · Score: 1

      Well, I won't bother with further experiments, but let's assume your data is representative, and that the increase in size obeys the linear equation y=mx+b, where b is some fixed-cost and m is some cost per unit of compressed data.

      Solving this linear system I get m = 1.000150150 and b = 27.84984985.

      The value for m, an expansion ratio of 0.015015%, matches almost perfectly the claim in the gzip man page:

      The worst case expansion is a few bytes for the gzip file header, plus 5 bytes every 32K block, or an expansion ratio of 0.015% for large files.

      Apparently, the "few bytes for the gzip file header" averages out to about 28 (for this data set, anyway). In any case, it's way too small a sample to draw conclusions, but it's neat to see gzip's claims verified in practise.
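
      The arithmetic can be checked with a short Python sketch. It assumes only the two data points quoted in this thread (1000 -> 1028 and 1000000 -> 1000178) and uses exact fractions to avoid rounding; the function name is made up.

```python
from fractions import Fraction


def fit_line(p1, p2):
    """Return (m, b) of the line y = m*x + b through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)
    b = y1 - m * x1
    return m, b


if __name__ == "__main__":
    # The two measurements reported above: (input size, gzipped size).
    m, b = fit_line((1000, 1028), (1_000_000, 1_000_178))
    print(f"m = {float(m):.9f}, b = {float(b):.8f}")
    print(f"expansion = {float(m - 1) * 100:.6f}%")
```

      Two points always fit a line exactly, so this only confirms the numbers above; it says nothing about how well the linear model holds for other file sizes.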

    22. Re:No Way... by kallisti · · Score: 1

      Every once in a while a technology comes along that doesn't follow the existing rules
      There are many sorts of rules. The mathematical ones have NEVER been defeated. It is possible that the math was misapplied (using Euclidean space for physics, for example) but since compression is purely mathematical, I doubt that is going to happen here. How many perpetual motion machines have you seen lately?

  14. The proofs in the pudding. by neo · · Score: 5, Funny

    ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss. That would be at least an order of magnitude beyond current known algorithms for compacting data.

    ZeoSync announced today that the "random data" they were referencing is a string of all zeros. Technically this could be produced randomly and our algorithm reduces this to just a couple of characters, a 100 times compression!!

    1. Re:The proofs in the pudding. by Anonymous Coward · · Score: 0

      This algorithm probably has a great compression ratio in pop mass produced music where a lot of the songs are redundant and sound alike.

    2. Re:The proofs in the pudding. by Hee+Hee+Hee · · Score: 1

      It could be extended to /. submissions, as well.

      --
      - Bill
    3. Re:The proofs in the pudding. by Anonymous Coward · · Score: 0

      And here I thought they were using ? (pi) for the random data. Since ? (pi) is already capable of 1,000,000:1 ratio. BTW the question marks are inserted because the slashcode either font lacks Pi's symbol or has it in a different place.

    4. Re:The proofs in the pudding. by jelle · · Score: 1

      "allow the same data to be compressed more than 100 times over"

      Hmm. I tend to think that I understand English. When translated to perl, the quoted phrase means nothing more than this:

      #!/usr/bin/perl
      # Run gzip over the same file more than 100 times.
      $filename="TheSameData.inp";
      $morethan100=101;

      for ($i = 0; $i < $morethan100 ; $i++)
      {
      system ("gzip $filename");
      }

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
    5. Re:The proofs in the pudding. by jelle · · Score: 1

      Hey, I lost a "&lt" there.

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
  15. The pressrelease by grazzy · · Score: 4, Informative

    ZEOSYNC'S MATHEMATICAL BREAKTHROUGH OVERCOMES LIMITATIONS OF DATA COMPRESSION THEORY

    International Team of Scientists Have Discovered
    How to Reduce the Expression of Practically Random Information Sequences

    WEST PALM BEACH, Fla. - January 7, 2001 - ZeoSync Corp., a Florida-based scientific research company, today announced that it has succeeded in reducing the expression of practically random information sequences. Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

    Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.

    According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]

    "There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."

    Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range.

    Many types of digital communications channels and computing systems could benefit from this discovery. The technology could enable the telecommunications industry to massively reduce huge amounts of information for delivery over limited bandwidth channels while preserving perfect quality of information.

    ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.

    All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.

    Dr. Piotr Blass, chief technology advisor at ZeoSync, said "Our recent accomplishment is so significant that highly randomized information sequences, which were once considered non-reducible by the scientific community, are now massively reducible using advanced single-bit- variance encoding and supporting technologies."

    "The technologies that are being developed at ZeoSync are anticipated to ultimately provide a means to perform multi-pass data encoding and compression on practically random data sets with applicability to nearly every industry," said Jim Slemp, president of Radical Systems, Inc. "The evaluation of the complex algorithms is currently being performed with small practically random data sets due to the analysis times on standard computers. Based on our internally validated test results of these components, we have demonstrated a single-point-variance when encoding random data into a smaller data set. The ability to encode single-point-variance data is expected to yield multi-pass capable systems after temporal issues are addressed."

    "We would like to invite additional members of the scientific community to join us in our efforts to revolutionize digital technology," said St. George. "There is a lot of exciting work to be done."

    About ZeoSync

    Headquartered in West Palm Beach, Florida, ZeoSync is a scientific research company dedicated to advancements in communications theory and application. Additional information can be found on the company's Web site at www.ZeoSync.com or can be obtained from the company at +1 (561) 640-8464.

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.

    1. Re:The pressrelease by Anonymous Coward · · Score: 0

      This is the catch:
      > Although currently demonstrating its technology
      > on very small bit strings, ZeoSync expects to
      > overcome the existing temporal restraints

      Coding theory tells us that random data has no
      pattern. Right. But on small strings, far from
      all combinations exist. How much do you win?

    2. Re:The pressrelease by Anonymous Coward · · Score: 0

      ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.

      look at mr. marketeer go!

      now.. before this is modded flamebait i'll just add something scientific.

      fractals are iterated, of course you can find a match for a given random data input but you will need at least the same size to store the fractal domain stuff.

      dct, wavelet, fft is frequency stuff. how are they supposed to compress a frequency that looks like a dog chewing on the antenna of an fm radio?

      subband coding is a way to store said frequencies.

      for acoustic compression take a look at the dog example and think a little.

      i'm still trying to find a link between them and quantum mechanics.. oh well..

    3. Re:The pressrelease by gpinzone · · Score: 1

      How does this theory grab you: Let's say I have a number of algorithms that generate strings of pseudo random numbers that are predictable based on the seed. By examining an existing set of numbers, I back myself into determining an algorithm/seed that would generate said string of numbers. All I'd need to capture is the seed and which algorithm from my library of pseudo-random generating algorithms will generate a match.
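
      A toy version of that idea in Python, as a sketch only. It assumes a single generator (Python's built-in Mersenne Twister via random.Random) rather than a library of them, and a brute-force search over small integer seeds; both function names are hypothetical.

```python
import random
from typing import Optional


def generate(seed: int, length: int) -> bytes:
    """Produce `length` pseudo-random bytes from `seed`."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def find_seed(target: bytes, max_seed: int) -> Optional[int]:
    """Brute-force search for a seed below max_seed that regenerates target."""
    for seed in range(max_seed):
        if generate(seed, len(target)) == target:
            return seed
    return None
```

      The search succeeds when the data really did come out of the generator. With a k-bit seed there are at most 2^k reachable outputs, so for arbitrary data longer than the seed a match almost never exists, and find_seed returns None.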

  16. Vaporware 2002 by Anonymous Coward · · Score: 1, Funny

    Looks like Wired has a start to their top 10 list for 2002.

  17. Buzzwordtastic by Steve+Cox · · Score: 2, Interesting
    I got bored reading the press release after finding the fourth trademarked buzzword in the second paragraph.


    I simply can't believe that this method of compression/encoding is so new that it requires a completely new dictionary (of words we presumably are not allowed to use).

  18. I can do better than that! by Sobrique · · Score: 2, Funny

    100 to 1? Bah, that's only 99%.
    The _real_ trick is getting 100% compression. It's actually really easy, there's a module built in to do it on your average unix.
    Simply run all your backups to the New Universal Logical Loader and perfect compression is achieved. The device driver, is of course, loaded as /dev/null.

    1. Re:I can do better than that! by fireshipjohn · · Score: 1

      That's fine, it's the uncompressing that gets you!

      :)

    2. Re:I can do better than that! by WWWWolf · · Score: 1

      Actually /dev/null or /dev/zero aren't really that cool for compression, since they only output zero bits. (Likewise /dev/full gives only zeros when read.)

      Is there a /dev/one or something like that that would output 11111111 (255 decimal)? That might be a bit cooler...

    3. Re:I can do better than that! by n8willis · · Score: 1
      Finally; a Slashdot story in an area in which I have some experience!

      100% compression is easy.... I always found the real trick to be doing the 100% uncompression.

      Nate

      --
      -- Watch the REAL Jon Katz.
    4. Re:I can do better than that! by kzinti · · Score: 2

      That's fine, it's the uncompressing that gets you!

      Oh, you want reversible compression? Why didn't you say so? We have to have complete specifications you know. I'm sorry that you compressed your 120GB disk full of pr0n and mp3s down to nothing, but it's not really our fault now, is it?

      --Jim

    5. Re:I can do better than that! by Anonymous Coward · · Score: 0

      I'm getting close! I can do 100% compression and decompression as well. The decompression is slow, troublesome and sometimes lossy, but mostly it works. I compress with rm.

      But really. Two words: counting argument. Let's say that my data is N bits long. My magic gizmo always compresses it down to N-1 bits. Not that fancy? Now, I compress each of the 2^N possible datas and end up with a maximum of 2^(N-1) unique compressed datas. Now how on earth can a compressed string be losslessly and deterministically decompressed into at least two different datas? That's right, it can't. More in the nearest book on information theory.
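
      The counting argument can be checked by brute force for small N. The Python sketch below is illustrative only: strings of "0" and "1" characters stand in for bits, and drop_last_bit is a made-up stand-in for the magic gizmo.

```python
from itertools import product


def count_strings(n_bits: int) -> int:
    """Number of distinct bit strings of exactly n_bits bits."""
    return 2 ** n_bits


def has_collision(compress, n_bits: int) -> bool:
    """True if compress maps two different n-bit inputs to the same output."""
    seen = set()
    for bits in product("01", repeat=n_bits):
        out = compress("".join(bits))
        if out in seen:
            return True
        seen.add(out)
    return False


def drop_last_bit(s: str) -> str:
    """A toy 'compressor' that always saves one bit by discarding it."""
    return s[:-1]
```

      Any function that maps all 2^N inputs into the 2^(N-1) strings of length N-1 must collide somewhere, and a collision means the decompressor cannot tell the two originals apart.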

  19. Practically Random by sfraggle · · Score: 1
    Quote from the site:

    WEST PALM BEACH, Fla. - January 7, 2001 - ZeoSync Corp., a Florida-based scientific research company, today announced that it has succeeded in reducing the expression of practically random information sequences. Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

    Note the wording: "Practically Random", not "Random". This of course does throw some doubt on this claim, as "Practically Random" could mean anything...
    --
    were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!
    1. Re:Practically Random by Isao · · Score: 1

      Yes, I was thinking about that. I can think of two interpretations:

      Practically Random - Functionally random
      Practically Random - Nearly random

      Big difference, but there are so many other nits to pick, why start here?

      They'd better pony up with some more details and/or expert testimony, or they'll be labelled as cranks, even if they DO eventually come up with something.

    2. Re:Practically Random by Saffamer · · Score: 1

      Someone explain "practically random" to me. I always thought it was either random or not random and there wasn't anything "practically" about it.

      That's like "hey, I'm practically pregnant!" You are or you aren't, there's no in between. Right?

    3. Re:Practically Random by RFC959 · · Score: 2
      True, but in the field of compression, "practically random" means "random". One of the definitions of a random sequence is that you can't describe the sequence in fewer terms than the sequence itself contains - which is to say, it's incompressible. (That definition is from Pi in the Sky, by John D. Barrow.)

      I was thinking about submitting the ZeoSync release, and then I thought, nah, it's just fluff, no one will be interested... It's true that a press release is usually written by suits, not scientists, so you can't expect too much real meat - but "ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information" is a real winner; if you're "reducing information", it's not lossless compression! I smell a rat. The whole thing sounds like it could have been written by the Onion, for Crom's sake.

    4. Re:Practically Random by Anonymous Coward · · Score: 0

      To a mathematician or a scientist in a related field, the distinction is stark. Not to the layman; presented with a page of pseudo-random numbers (i.e. generated by an algorithm) he would say "that shit's *random*". My guess is that they mean data is practically random when viewed by a member of the public who doesn't know and doesn't care about the distinction (e.g. a journalist), and can see no patterns in it. This is a press release after all.

  20. Hmmmmm. Dial Up! by Anonymous Coward · · Score: 0

    I doubt this compression thing is true but if it is...I'm gonna tell @home to eat its cable modem,
    and enjoy a blazing dial up connection

  21. Uhh... by praxim · · Score: 1

    Here're a few nebulous bits from the site to keep the skeptics going:

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
    According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]
    "There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."

    [note - It appears to cure cancer and solve the issue of world hunger]

    ...These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years...

  22. scientific method, fact... goes out the window, r by Anonymous Coward · · Score: 1

    Science is based on fact, news is based on fact.

    "The company's claims, which are yet to be demonstrated in any public forum, could vastly boost the ability of computer disks to store, text, music and video -- if ZeoSync's formulae succeed in scaling up to handle massive amounts of data."

    make prediction, test, make new prediction on results if applicable.

    What happened to the idea that test results need to be duplicated by others before it's accepted as fact (see: news)?

    I'm just a lowly mechanical engineer, so I'll take this to mean 1 thing.

    Being labeled "renowned" must mean you are not bound by the scientific method, and that "journalists" are not bound by "fact" in reporting "news".

  23. Impossible by Anonymous Coward · · Score: 0

    If the data is truly random, there is absolutely no way possible to compress it. This is bollocks.

    1. Re:Impossible by Anonymous Coward · · Score: 0

      New spin, but thanks for mentioning this for the millionth time in one article.

    2. Re:impossible by recursiv · · Score: 2

      Yes it does.
      Compressing random data is impossible!

      --
      I used to bulls-eye womp-rats in my pants
  24. That was... by Anonymous Coward · · Score: 0

    ...the funniest thing i've read in a while.

    1. Re:That was... by Anonymous Coward · · Score: 0

      I agree, must have been designed to get a response out of slashdot. Some people have a lot of time on their hands.

  25. bull! by bob@dB.org · · Score: 1
    compress all possible sequences of 1000 bytes down to 10 bytes. if none of the "compressed" sequences are the same, the method works, if not these guys are just blowing smoke.

    any first year cs student knows it can't be done.

    --
    Acts@core.mailboks.com Acrux@core.mailboks.com Adam@core.mailboks.com Adar@core.mailboks.com Ada@core.mailboks.com
    1. Re:bull! by Anonymous Coward · · Score: 0

      There are 10^2568 possible arrangements of the integers 1,2,...,1000. An exhaustive search through all of these is not likely to produce an interesting result before the end of the universe.

    2. Re:bull! by Anonymous Coward · · Score: 0

      I'm pretty much with you on this.

      From reading the press release minus all the non-words(tm) I thought they are basically claiming to have a hashing function that allows you to recover the original data. This is clear nonsense because the number of unique hashes is always smaller than the number of inputs.

      Then I read it again and I thought hang on. If e.g you have 1000 input bytes and you know only 2 bits in this are set then obviously you can compress this 100:1 by sending the numbers of the bits so if you put stupid constraints on the problem then it is possible.

      Then I read it again and thought about certain weird numbers like Omega.

      Are they claiming to be able to compress e.g. Omega numbers, which are by definition algorithmically incompressible but computationally enumerable? Omega sounds very wacky on first encounter but there is much detail here that explains it:

      http://www.cs.auckland.ac.nz/CDMTCS/chaitin/nature.html

      Randomness and incompressibility are either very closely related or some important parts of mathematics are wrong. It isn't just Shannon they are challenging.

      I know where my money is going. I'm betting that either this is flawed or it has a very restricted domain of application where you know that the input fits a particular scheme.
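
      The "only 2 bits are set" case mentioned above can be sketched in Python. This is an illustration under that exact constraint, storing each bit position as a 16-bit number; the function names are made up.

```python
def encode_sparse(data: bytes) -> bytes:
    """Encode a buffer with exactly two set bits as two 16-bit bit positions."""
    positions = [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(data)
        for bit in range(8)
        if (byte >> bit) & 1
    ]
    # Reject anything outside the assumed constraint.
    if len(positions) != 2 or len(data) * 8 > 65536:
        raise ValueError("input does not fit the two-set-bits constraint")
    return b"".join(p.to_bytes(2, "big") for p in positions)


def decode_sparse(blob: bytes, length: int) -> bytes:
    """Rebuild the `length`-byte buffer from the two stored bit positions."""
    out = bytearray(length)
    for i in (0, 2):
        pos = int.from_bytes(blob[i:i + 2], "big")
        out[pos // 8] |= 1 << (pos % 8)
    return bytes(out)
```

      A 1000-byte buffer becomes 4 bytes, but only because the constraint already rules out nearly every possible input; the decoder also has to be told the original length.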

    3. Re:bull! by evan1l38 · · Score: 1

      On the flip side, when I first read about DSL I was authoritatively told that it couldn't be done, that 56K was the limit on what could be sent over the wires.

      You don't know what this company is really trying to do, you just read some PR fluff and are all ready to say it's totally impossible. You don't have any more of a clue about that than I do. And I prefer to reserve judgement until they release something real or get disproved. I won't make an authoritative statement with no information to back me up.

      --

      Evan Reynolds evanthx@hotmail.com
      Two peanuts crossed the street. One was assaulted.

    4. Re:bull! by Anonymous Coward · · Score: 0

      dsl bypasses the analog phone system used for voice, so technically it isn't going through the same system, and thus, 56k IS the limit.

  26. random by blank_coil · · Score: 1

    I didn't read the entire press release, but I did notice the subtitle:

    "International Team of Scientists Have Discovered How to Reduce the Expression of Practically [emphasis mine] Random Information Sequences "

    So I guess the data does have at least some redundancy in it. I'm not an expert, so I don't know if this makes their claim more likely to be true, but I thought it should be pointed out.

    --
    No sig for you.
    1. Re:random by gpinzone · · Score: 1

      So I guess the data does have at least some redundancy in it. I'm not an expert, so I don't know if this makes their claim more likely to be true, but I thought it should be pointed out.

      Maybe not. Let's say I wanted to transmit the first 100,000 digits of pi. If I tried to compress a file containing all of the digits, it wouldn't compress very well since there aren't any patterns in pi. However, if I sent you an algorithm to reproduce pi, it would take up hardly any space whatsoever and could even be compressed by current means. I'm not saying that this is how their system works, but it is an example that disproves the above statement.

    2. Re:random by An+Onerous+Coward · · Score: 1

      Technically, in such a case, you wouldn't be compressing the data. You'd simply be explaining how to reproduce the data. Otherwise, I could tell you

      sum (n=0 to n=infinity) ((-1)^n)*4/(2n+1)

      and claim to have achieved "infinite compression."

      This formula gives you 4 - 4/3 + 4/5 - 4/7 + 4/9 . . . which is one way of calculating PI.
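
      Assuming the series meant here is the Leibniz series 4 - 4/3 + 4/5 - 4/7 + ..., a few lines of Python are enough to sum it. This is a sketch, with a made-up function name, and the series converges very slowly.

```python
import math


def leibniz_pi(terms: int) -> float:
    """Sum the first `terms` terms of 4 - 4/3 + 4/5 - 4/7 + ..."""
    return sum((-1) ** n * 4 / (2 * n + 1) for n in range(terms))


if __name__ == "__main__":
    for terms in (10, 1000, 100_000):
        approx = leibniz_pi(terms)
        print(f"{terms:>7} terms: {approx:.10f} (error {abs(approx - math.pi):.2e})")
```

      The error shrinks roughly like 1/terms, so this is a tiny description of pi but a very slow way of getting its digits back out.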

      Of course, such things have great uses. Like being able to create a random terrain generator that takes a single seed number and creates a literally infinite amount of information from it.

      But * if you wanted to be able to compress any set of, say, terrain points in a similar manner, you would need to create a new set of algorithms for that set of points. It doesn't seem like finding those algorithms would be a computationally trivial thing to do, and some sets of data would be impossible to analyze. For truly random data (not just data like PI whose appearance of randomness is actually governed by a simple, strict algorithm) the rule set you would have to transmit along with the data should be as large as the space it was meant to save.

      So in closing, there are some sets of data that, despite an appearance of randomness, aren't.

      I take back my earlier claim. The formula for PI really is a compression algorithm. It's just that there are only a limited set of data strings that are compressible by similarly simple algorithms. If you wanted to compress 3.14109265398979..., you would have to throw in an exception for the unPI'ish digit. The less "Pi-like" the data set, the less you would gain by using such an algorithm.

      * IANAInformation Theory Expert. Come to think of it, I suck at math, so I hope I haven't misled anyone.

      --

      You want the truthiness? You can't handle the truthiness!

    3. Re:random by gpinzone · · Score: 1

      I take back my earlier claim. The formula for PI really is a compression algorithm. It's just that there are only a limited set of data strings that are compressible by similarly simple algorithms. If you wanted to compress 3.14109265398979..., you would have to throw in an exception for the unPI'ish digit. The less "Pi-like" the data set, the less you would gain by using such an algorithm.

      Let's expand on this. Let's say there are a bunch of numbers interspersed throughout the string that make it different from pi. What if you took the XOR of the binary sequence you were trying to compress with that of the pi sequence? You would get a hell of a lot of zeroes with some ones wherever there was difference between the two sequences. The XOR data would be highly compressible due to the redundancy in the XOR sequence. Tack that onto the formula for pi.
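The XOR idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a seeded pseudorandom stream stands in for "the pi sequence" (any stream both sides can regenerate would do), zlib stands in for the compressor, and all function names are hypothetical.

```python
import random
import zlib


def reference_stream(n_bytes, seed=42):
    """Stand-in for the pi sequence: a stream both sides can regenerate."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, reference):
    """XOR against the reference, then compress the mostly-zero residue."""
    return zlib.compress(xor_bytes(data, reference), 9)


def decode(blob, reference):
    """Undo encode(): decompress the residue and XOR the reference back in."""
    return xor_bytes(zlib.decompress(blob), reference)


if __name__ == "__main__":
    reference = reference_stream(100_000)
    damaged = bytearray(reference)
    for pos in (10, 5_000, 77_777):   # three "un-pi-ish" bytes
        damaged[pos] ^= 0xFF
    damaged = bytes(damaged)

    blob = encode(damaged, reference)
    assert decode(blob, reference) == damaged
    print("direct zlib:", len(zlib.compress(damaged, 9)), "bytes")
    print("XOR + zlib: ", len(blob), "bytes")
```

The gain depends entirely on the data being close to a stream that is cheap to describe. For data that matches no known reference, the XOR residue looks just as random as the input.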

  27. Unlikely by Travelr9 · · Score: 1

    From the splash screen:
    "with costumer service agents providing chat assistance."

    Let's see, I prepare a press release guaranteed to garner my website tens, if not hundreds of thousands of hits, and I leave an egregious typo as my first impression?

    Not!

    1. Re:Unlikely by Anonymous Coward · · Score: 0

      From the splash screen:
      "with costumer service agents providing chat assistance."


      Maybe they like to play dress-up.

      Let's see, I prepare a press release guaranteed to garner my website tens, if not hundreds of thousands of hits, and I leave an egregious typo as my first impression?

      Maybe they're taking a page from slashdot postings.

  28. In this house we obey the 2nd law of thermodynamic by tshoppa · · Score: 3, Insightful
    From the Press Release:
    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
    They left out Disobeying the 2nd law of Thermodynamics!
  29. Yes you are... by radish · · Score: 2


    B is not random. It is a description (in some format) of A.

    But, what you say does have merit, and this is why compressing a ZIP doesn't do much - there is a limit on repeated compression because the particular algorithm will output data which it itself is very bad at compressing further (if it didn't, why not iterate once more and produce a smaller file internally?).
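The "compressing a ZIP doesn't do much" observation is easy to try. The sketch below is an assumption-laden stand-in: it uses zlib (the same DEFLATE family as ZIP) and a generated word soup instead of a real file, and the helper names are invented for the example.

```python
import random
import zlib


def sample_text(words=20_000, seed=1):
    """Build deterministic word soup as a stand-in for an ordinary text file."""
    vocab = ["data", "random", "entropy", "shannon", "bits", "claim",
             "press", "release", "lossless", "ratio", "zip", "file"]
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(words)).encode("ascii")


def repeated_sizes(data, passes=4):
    """Return sizes [original, after pass 1, after pass 2, ...]."""
    sizes = [len(data)]
    for _ in range(passes):
        data = zlib.compress(data, 9)   # feed the output back in
        sizes.append(len(data))
    return sizes


if __name__ == "__main__":
    for i, size in enumerate(repeated_sizes(sample_text())):
        print("pass", i, "->", size, "bytes")
```

On input like this, the first pass should remove most of the redundancy and later passes should hover around the same size or grow slightly from header overhead. Extremely repetitive input can behave differently on the second pass, so treat this as a demonstration rather than a proof.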

    --

    ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    1. Re:Yes you are... by john@iastate.edu · · Score: 2
      B is not random. It is a description (in some format) of A

      If it is not random, then it has some pattern and should compress even better.

      Clearly their claim is a steaming pile of technology (if you get my drift).

      --
      Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
    2. Re:Yes you are... by radish · · Score: 2


      I agree totally, their "technology" is junk, but I was just pointing out the difference between saying you can compress random data and saying you can compress any data.

      Still, I have said enough on this topic (and been proven an idiot in other threads) so I'll shut up :-)

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

  30. theres by Zephy · · Score: 1

    excrement, bovine at that, in the air.

  31. Fairly safe to assume that this is a hoax by CharlesDarwin · · Score: 1

    From their website:

    "ZeoSync's new "Binary Accellerator (TM)" is not a compression technology, rather it encodes digital information into fast and dependable muti-dimensional mathematical entities that the company calls "Gems (TM)". We have chosen the name "Gems" as an acronym for Multi-Dimenstional Mathematical Reduction (MDMR) does not clearly define the condensation process, as successfully as does the mental image of the crystallization of nature that transforms rough materials into precious stones."

    "Once crystallized, Gems are able to move rapidy on a fixed set of binary carriers through existing digital transmission devices, breaking all known transmission barriers. This MindSpeed velocity affects the complete global communications infrastructure by sending more data accross less bandwidth while saving time..."

    MindSpeed velocity? You've got to be kidding me!

    1. Re:Fairly safe to assume that this is a hoax by jallen02 · · Score: 1

      Me thinks someone let their marketing department get a little overzealous with the buzzwords. That whole damn press release is quite painful to read.

    2. Re:Fairly safe to assume that this is a hoax by Anonymous Coward · · Score: 0

      Sorry MindSpeed(TM) is a trade mark already taken...

    3. Re:Fairly safe to assume that this is a hoax by Lifewolf · · Score: 1
      We have chosen the name "Gems" as an acronym for Multi-Dimenstional Mathematical Reduction (MDMR) does not clearly define the condensation process, as successfully as does the mental image of the crystallization of nature that transforms rough materials into precious stones.

      MDMR, eh? I suspect MDMA might have more to do with this press release.

      --
      "Be Happy or Die." -- AoN
  32. I could see it working in a specific context by SerpentMage · · Score: 2

    Many people may say this is bull, but think of it in another way.

    Instead of assuming that data is static, think of it as constantly moving. Even in random data, moving data can be compressed because it is constantly moving along. It is sort of like when a herd of people file into a hall. Sure everyone is unique, but you could organize and say, "Hey five red shirts now", "ten blue shirts now".

    And I think that is what they are trying to achieve. Move the dimensions into a different plane. However, and this is what I wonder about. How fast will it actually be? I am not referring to the mathematical requirements, but the data will stream and hence you will attempt to organize. Does that organization mean that some bytes have to wait?

    --

    "You can't make a race horse of a pig"
    "No," said Samuel, "but you can make very fast pig"
    1. Re:I could see it working in a specific context by ergo98 · · Score: 2

      For lossless compression simply saying "There were 5 red shirts and 7 blue shirts" isn't enough: You'd have to also store information on exactly where those 5 red shirts and 7 blue shirts were in the sample to be able to recreate the situation exactly as it was. Because of this it has been found to be impossible to "compress" truly random data without actually increasing the size of the file.

      Of course if you're talking lossy then everything changes: Who cares where the shirts are, just tell 'em how many there were. Unfortunately lossy is only relevant for images and sounds.

    2. Re:I could see it working in a specific context by signifying+nothing · · Score: 1

      >Unfortunately lossy is only relevant for images and sounds.

      And press releases?

    3. Re:I could see it working in a specific context by wackybrit · · Score: 1

      Microsoft use the opposite approach. 'Gainly' compression. It balloons files which should be 10-20k up to 5 megabytes or more. I'm not sure why they do this, but there must be some benefit to it.

    4. Re:I could see it working in a specific context by SerpentMage · · Score: 2

      What I was trying to get at is the following. From their explanation they were saying that even though there is randomness there is order. And that order was explained simply using pigeons going through a hole. So create a higher plane of dimensions and things become ordered. Consider for example fractals. Totally unordered, but pattern based. I think they are using chaos mathematics, but then I may be wrong.

      --

      "You can't make a race horse of a pig"
      "No," said Samuel, "but you can make very fast pig"
  33. Compression to one bit by BlueWonder · · Score: 1

    Hey, if their algorithm works on random data, re-apply it to the output, and it will be compressed again. You can do this again and again, until only one bit is left!

    Now, let's uncompress a 0 bit and a 1 bit. All software ever written and ever to be written in the future must come out, since there cannot be anything which compresses to anything else than a 0 or 1 bit, if compressed to a single bit.

    Seriously though, the comp.compression FAQ is really worth a read, especially question #9.

    1. Re:Compression to one bit by sh00z · · Score: 1
      Now, let's uncompress a 0 bit and a 1 bit. All software ever written and ever to be written in the future must come out, since there cannot be anything which compresses to anything else than a 0 or 1 bit, if compressed to a single bit.

      In which case, they'll probably get hit with a massive patent infringement lawsuit.
    2. Re:Compression to one bit by kzinti · · Score: 3, Informative

      Seriously though, the comp.compression FAQ [faqs.org] is really worth a read, especially question #9 [faqs.org]

      YES! Ditto. Seconded. Somebody mod this guy up.

      Here's a bit to whet your appetite:

      9.1 Introduction

      It is mathematically impossible to create a program compressing without loss
      *all* files by at least one bit (see below and also item 73 in part 2 of this
      FAQ). Yet from time to time some people claim to have invented a new algorithm
      for doing so. Such algorithms are claimed to compress random data and to be
      applicable recursively, that is, applying the compressor to the compressed
      output of the previous run, possibly multiple times. Fantastic compression
      ratios of over 100:1 on random data are claimed to be actually obtained.

      Such claims inevitably generate a lot of activity on comp.compression, which
      can last for several months. Large bursts of activity were generated by WEB
      Technologies and by Jules Gilbert. Premier Research Corporation (with a
      compressor called MINC) made only a brief appearance but came back later with a
      Web page at http://www.pacminc.com. The Hyper Space method invented by David
      C. James is another contender with a patent obtained in July 96. Another large
      burst occurred in Dec 97 and Jan 98: Matthew Burch applied
      for a patent in Dec 97, but publicly admitted a few days later that his method
      was flawed; he then posted several dozen messages in a few days about another
      magic method based on primes, and again ended up admitting that his new method
      was flawed. (Usually people disappear from comp.compression and appear again 6
      months or a year later, rather than admitting their error.)

      Other people have also claimed incredible compression ratios, but the programs
      (OWS, WIC) were quickly shown to be fake (not compressing at all). This topic
      is covered in item 10 of this FAQ.

    3. Re:Compression to one bit by BlueWonder · · Score: 1

      There are various patents on this or similar compression techniques, for example US #5,533,051, US #5,488,364, US #5,486,826, or US #5,594,435.

  34. 1/7/2001? by NFNNMIDATA · · Score: 1

    It's either a year old or they are just that lame...

  35. What's random? by Moderation+abuser · · Score: 2

    What're they talking about? 20Gb of rand() output?

    If so, they're a bunch of twits.

    --
    Government of the people, by corporate executives, for corporate profits.
    1. Re:What's random? by Sobrique · · Score: 1

      At a guess, technically there's a chance that if you just dump 20Gb of truly random numbers, you end up with all of them being 0's.
      Course, if they managed to pull random numbers that many times and got 0 every time, they don't need to invent compression standards - they're on to a winner on the lottery :)

    2. Re:What's random? by Anonymous Coward · · Score: 0

      Hee hee...if pseudorandom is good enough, then all the information is in the seed value: Voila--an arbitrarily small compression ratio!
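A short Python sketch of the "all the information is in the seed" joke, assuming CPython's `random.Random` (a Mersenne Twister) as the generator; `expand_seed` is a hypothetical helper name, not anything from the thread.

```python
import random


def expand_seed(seed, n_bytes):
    """Regenerate n_bytes of pseudorandom data from an integer seed."""
    rng = random.Random(seed)
    # One big draw of 8 * n_bytes bits, laid out as a byte string.
    return rng.getrandbits(8 * n_bytes).to_bytes(n_bytes, "big")


if __name__ == "__main__":
    seed = 2002
    stream = expand_seed(seed, 1_000_000)
    # "Decompression" is just running the generator again with the same seed.
    assert expand_seed(seed, 1_000_000) == stream
    print(len(stream), "bytes regenerated from the single integer", seed)
```

The catch is the one raised elsewhere in the thread: a k-bit seed can reach at most 2^k different outputs, so this only "compresses" data that happened to come out of this exact generator in the first place.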

  36. I agree about random by a+random+streaker · · Score: 1

    To marketing, "random data" means a .jpeg, a .mpeg, an audio file, a .exe file, a text file, a .doc, etc. I.e. the algorithms apply to general data, as opposed to schemes to compress specific data (e.g. .jpeg for pictures, .mpeg for a series of similar pictures, etc.)

    --
    "All representatives are busy. The estimated hold time is one..hundred..sixty..four..minutes." Detroit Edison, 02/01/02
  37. Been there, done that... by color+of+static · · Score: 4, Informative

    There seems to be a company claiming to exceed, go around, obliterate Shannon every few years. In the early 90's there was a company called Web (before the WWW was really around by a year or so). They made claims of compressing any data, even data that had already been compressed. It is a sad story that you should be able to find in either the comp.compression FAQ or the renewed deja archives. It basically boils down to this: as they got closer to market, they found some problems... you can guess the rest.
    This isn't limited to the field of compression of course. There are people that come up with "unbreakable" encryption, infinite gain amplifier (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well understood enough for these ideas to be killed before a company is started or stacked on the claims.

    1. Re:Been there, done that... by Fly · · Score: 2
      Their press release states that they still need to work out some issues:
      Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

      From the general lack of information, I'm guessing that "very small bit strings" have at least about one hundred bits, unless they compress to sub-bits. ;-) And we can only infer that the "temporal" problems indicate that compressing larger strings takes an inordinate amount of time. I suppose that sounds logical enough to get some investors to hand them some money, but I disagree with them that all they have to do is simply optimize their algorithms. It sounds like they have a long, long way to go to be practical, which could very well use up any money given to them with no return.
      --
      end of line
    2. Re:Been there, done that... by Zeinfeld · · Score: 2
      This isn't limited to the field of compression of course. There are people that come up with "unbreakable" encryption, infinite gain amplifier (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well understood enough for these ideas to be killed before a company is started or stacked on the claims

      I don't think that the problem is a lack of understanding of the fields in question. The problem is a surfeit of gullibility on the part of the con-artist's marks.

      I remember there being a 'compression' company not long ago that blew $15 million on its start up party in Las Vegas and closed not long after when it transpired that the CEO was an ex-convict on the run from a parole violation.

      The problem is that the press take the claim to have 'disproven' some fundamental proof as giving credibility to the claimants rather than making them suspect. The argument proceeds thus 'Black is white and therefore my perpetual motion machine works', 'Black is not white and your perpetual motion machine is almost certainly a fraud', 'You are only arguing that way because you are stuck in your ways and fail to understand that black is actually white, people like you have stopped every advance in science, they laughed at Galileo when he said the earth was a cube...'.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    3. Re:Been there, done that... by color+of+static · · Score: 2

      I hate to admit to it, but I think you are probably far more accurate on that than I was. Especially the way the press eats up the actions outlined in your last paragraph. I guess you really can't go broke underestimating the American public.

    4. Re:Been there, done that... by Charles+Dodgeson · · Score: 1

      This does look like part of a common pattern. As I read the press release, I was trying to figure out if it was a scam or a joke. Or some genuine crackpot.

      Anyway, I've got to get back to working on my patent application for my algorithm which analyses programs and data to see if they'll halt. I won't reveal more until the patent is awarded, but I can leak the information that it uses Fractal LR BackTracking to identify ChomskySingularities in the Generalized MetaStack.

      Potential investors should note that the road to fully commercializing this may be a long one (but it does halt).

      --
      Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
  38. license? by Karmageddon · · Score: 1
    at the risk of starting a licensing flame war, what license do you think they'll use when they release this? GPL? BSD? Or will it be a regular old patent-with-royalty sort of thing?

    Think really hard...

    1. Re:license? by nomadic · · Score: 1

      Why, the GPL of course. They'll make the money selling their services maintaining the mathematical algorithm.

  39. Why Flash? by Isao · · Score: 1

    You'd think they'd create a java application to present their site compressed with their methodology.

    Heck, even Sun ponied up with a streaming video on demand java applet for their CEO speeches, just to illustrate there weren't performance issues involved.

  40. Blah! by jsse · · Score: 2, Funny

    We already have lzip to compress the files down to 0% of their original size. ZeoSync doesn't catch up with latest technologies on /. it seems.

  41. compressing the story 100:1 by Anonymous Coward · · Score: 0

    claim = crap

  42. It's about /practically/ random data by Telcontar · · Score: 2

    If you read the press release carefully, they claim to be able to compress practically random data, such as pictures of green grass, 100 : 1. They never claim to be able to do the same with true random data, since this is impossible.

    There may be something about that. However, there are also many points that make me sceptical, but maybe the press release has not been reviewed carefully enough.
    This new algorithm does not break Shannon's limit, which is impossible, so the phrase about the "historical limitations" is a hoax...

    1. Re:It's about /practically/ random data by gorilla · · Score: 2

      Pictures aren't 'practically random'. They're very non-random.

    2. Re:It's about /practically/ random data by wilgamesh · · Score: 1

      yes, pictures aren't random. in fact, natural images exhibit a power law decay in the intensity autocorrelation function, over frequency. a random image would have a flat autocorrelation function.

      i don't have a reference handy for this. i came across the topic in a neuro/vision class once. so maybe check psychology or neurology journals and textbooks.

      it's hoped that the study of statistics of natural images can lend insight into how the optic nerve compresses visual information. and this is thought to occur, err, not for a good reason that i remember. but i think it's because the resolution of images is much higher than can be passed through the neurons of the optic nerve without compression.

  43. Re:scientific method, fact... goes out the window, by Anonymous Coward · · Score: 2, Funny

    Screw ZeoSync, I've built a compression algorithm that is 1000:1 and is completely lossless. I've yet to demonstrate it in public though but please give me venture capital. Thank you.

  44. Re: Information theory says 1 by Dada · · Score: 2, Interesting

    The maximum compression ratio for random data is 1. That's no compression at all.
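One rough way to see why random data gets a ratio of 1 is to estimate its entropy. The Python sketch below computes an order-0 (single-byte frequency) estimate; that is a simplifying assumption, since it ignores correlations between bytes, and the function name is invented for the example.

```python
import math
import os
from collections import Counter


def entropy_bits_per_byte(data):
    """Order-0 Shannon entropy estimate: -sum(p * log2(p)) over byte values."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    random_blob = os.urandom(1 << 16)
    english = b"the quick brown fox jumps over the lazy dog " * 1000
    # Close to 8 bits per byte means there is nothing left to squeeze out.
    print("os.urandom:", round(entropy_bits_per_byte(random_blob), 3))
    print("text:      ", round(entropy_bits_per_byte(english), 3))
```

A value near 8 bits per byte means every byte is already carrying about as much information as it can hold, so the best achievable ratio is about 1. Lower values only give an upper bound on what a byte-frequency coder could save.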

  45. Buzz-word ALERT! by Hougaard · · Score: 2

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.

    I think they have made a buzz-word compression routine, even our sales people have difficulty putting this many buzz-words in a press release :)

    1. Re:Buzz-word ALERT! by SLam_to · · Score: 1

      You mean this wasn't a Star Trek script page? :P

      Yeah, really heavy on the technobabble and marketing speak. I'd like to see some sort of scientific paper or more technical details free of buzz-words before I decide.

      Somewhere on their pages they mentioned they only have performed it on very small bit strings.... so I'm not holding my breath.

  46. Re:Fairly safe to assume that this is... by Anonymous Coward · · Score: 0

    It is fairly safe to assume that this is (supposed to be) an advertisement of a company that provides "Strategic Business Communications" services...
    http://www.wilsonmchenry.com/

  47. Some background reading: by Quixote · · Score: 5, Interesting

    Section 1.9 of the comp.compression FAQ is good background reading on this stuff. In particular, read the "WEB story".

    1. Re:Some background reading: by Anonymous Coward · · Score: 0

      Thanks for mentioning this for the millionth time in one article?

  48. I wonder if... by mirko · · Score: 2

    Most random generation uses bytes as their unit.
    Now, what if they look for bit-sequences (not only 8-bit sequences but maybe odd numbers) in order to generate patterns ?
    I guess this could be a way to significantly compress data but this'd imply reading a huge amount of data in order to achieve the best result possible.
    Note they may also do this in more than one pass, but then their compression would take a really long time.

    --
    Trolling using another account since 2005.
  49. Reminder to Self... by kramer · · Score: 2

    Never, *EVER* accept any advice from the Aberdeen Group. Apparently their analysts don't know shit.

    "Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize. I don't have an answer to which one it is yet," said David Hill, a data storage analyst with Boston-based Aberdeen Group.

    Wonder which category he expects them to win in...

    Physics, Chemistry, Economics, Physiology / Medicine, Peace or Literature

    There is no Nobel category for pure mathematics, or computing theory.

    1. Re:Reminder to Self... by CaseyB · · Score: 2

      Either literature, for their epic fantasy press releases, or economics, for their Theory of Venture Capital Greed:Ignorance ratios.

    2. Re:Reminder to Self... by Anonymous Coward · · Score: 0

      Clearly the Nobel Prize in literature. Brilliant fiction!

    3. Re:Reminder to Self... by Anonymous Coward · · Score: 0

      Physics, because they're not using traditional digital computers.

      Alternately chemistry or medicine, they've created substances that actually convince people that what they claim is possible. ;-P

    4. Re:Reminder to Self... by Anonymous Coward · · Score: 0

      It would be physics - anyone who can beat Shannon's Theorem has also busted Heisenberg's Uncertainty Principle.

    5. Re:Reminder to Self... by kramer · · Score: 2

      Okay, it's not generally good form to follow oneself up, but I wrote the analyst "David Hill" and asked him (slightly more politely than what I said here) what he was thinking.

      He actually responded, and his answer to my question about which category he thought it qualified for was: "Economics! It would not be a traditional award, but its economic impact would be immense."

      I think this particular analyst is in la-la land. The economics award is awarded for work advancing the science of economics, not for making money.

    6. Re:Reminder to Self... by Anonymous Coward · · Score: 0

      Legend has it that there is no Nobel prize for Mathematics because Mr. Nobel caught his wife having an affair with a mathematician. Go figure...

  50. And they got funding ... by the+bluebrain · · Score: 2, Funny

    ... by compressing some VC's bank account, by a factor of greater than 100!

    "It was just data, you know," the sobbing wretch was reportedly told, "just ones and zeros. And hey - you can look at it as a proof of principle. We'll have the general application out ... real soon now, real soon".

    --
    yes, we have no bananas
  51. On the contrary! by Simon+Tatham · · Score: 3, Insightful

    Quite the contrary: if they had claimed to be achieving 100:1 compression on truly random data, they would be provably talking total rubbish. Consider the number of possible bit strings of length N. Now consider the number of possible bit strings of length N/100. There are fewer of the latter, right? Therefore, if you can compress every length-N string into a length-N/100 string, at least two inputs must map to the same output. Hence, you can't uniquely recover the input from the output - and the compression cannot be lossless.

    The fact that they hedge and talk about "practically" random sequences is the only thing that makes it possible they're telling the truth!
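The counting argument above can be checked with exact integer arithmetic. This Python sketch is only an illustration: the function names are made up, and it is slightly generous to the compressor by allowing every output of length N/100 bits or shorter.

```python
from fractions import Fraction


def strings_of_length(n):
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n


def strings_up_to_length(n):
    """Number of distinct bit strings of length 0..n inclusive."""
    return 2 ** (n + 1) - 1


def max_fraction_compressible(n, ratio):
    """Upper bound on the share of n-bit inputs that can map, without
    collisions, to outputs of n // ratio bits or fewer."""
    return Fraction(strings_up_to_length(n // ratio), strings_of_length(n))


if __name__ == "__main__":
    n, ratio = 1000, 100
    bound = max_fraction_compressible(n, ratio)
    print("inputs: ", strings_of_length(n))
    print("outputs:", strings_up_to_length(n // ratio))
    print("share of inputs that can shrink 100:1 is at most about", float(bound))
```

For 1000-bit inputs there are only 2047 possible outputs of 10 bits or fewer, so all but a vanishing share of inputs must either collide (which is lossy) or fail to reach 100:1.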

    1. Re:On the contrary! by Cederic · · Score: 1


      hmm. What about compressing data length X down to data length X/125. You can then use (X/100) - (X/125) bytes to describe which of the possible expansions of X/125 you just stored.

      Of course, the number of expansions may be bigger than your storage space for that number. But hey, you're writing a compression algorithm - just use it..

      ~Stuart

    2. Re:On the contrary! by Simon+Tatham · · Score: 2

      "But hey, you're writing a compression algorithm - just use it."

      You can't hide behind that, I'm afraid. At this stage you're trying to prove that a compression algorithm of this type is feasible to write. Unfortunately, in the course of your proof you've made the assumption that a compression algorithm of this type is feasible to write! So you've proved that if it can be done, then it can be done. Undeniably true, but not 100% helpful.

      You will find that the number of possible expansions of your X/125 data string is much larger than will fit in the difference between X/125 and X/100. In fact, on average it will work out to be roughly what you can fit in the difference between X/125 and X. So you still haven't gained anything - you're still bound by the simple counting argument that says you can't uniformly reduce every length-X string into a sub-length-X string.

    3. Re:On the contrary! by Cederic · · Score: 1


      darn, you're making me actually think about this now.

      The number of possible combinations of bits of length X is approx 2^X. However, only a small subset of these will compress to any given data string of length X/125.

      Assume a piece of music, 4192KB in size. We'd be looking to compress this to approx. 32K, and then spending about 8K on the "which expansion" bit.

      There are roughly 2^25 combinations of bits in 4 megs of data. However, there are also 2^18 combinations of a 32K data block. So an even distribution would mean that each 32K block expands to just 128 different 4 meg blocks. And we've got 8K to store that number from 0 to 127 in.

      So yeah, it is pretty feasible in that respect. The real killer is processor power in calculating those 128 possible combinations.

      It just occurred to me, I've just described a reversible hashing algorithm rather than a compression mechanism. Maybe that's what's needed..
      ~Cederic

    4. Re:On the contrary! by Simon+Tatham · · Score: 2

      Sorry, you've missed a "two to the power" out.

      4 megabytes of data == 32 megabits == 2^25 bits. That doesn't mean there are 2^25 combinations of bits - it means there are 2^25 actual bits. The number of combinations is 2^(2^25), which is really quite a staggeringly large amount bigger.

      Similarly, there are 2^(2^18) combinations in a 32K data block, not 2^18 as you suggest. So an even distribution would in fact mean that each 32K block expands to 2^(2^25-2^18) different 4 meg blocks - which means the amount of space it would take to store that number is (2^25-2^18) bits. Coincidentally, this is exactly the amount of space by which you reduced the piece of music in the first place!
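The arithmetic in this exchange can be verified by working with exponents only, so nothing huge is ever computed. A small Python sketch, using the sizes from the posts above (4 megabytes in, 32K out, 8K for the "which expansion" field); the constant and function names are invented for the example.

```python
ORIGINAL_BITS = 4 * 1024 * 1024 * 8   # 4 megabytes = 2**25 bits
COMPRESSED_BITS = 32 * 1024 * 8       # 32K = 2**18 bits
BUDGET_BITS = 8 * 1024 * 8            # the 8K set aside for the index


def index_bits_needed(original_bits, compressed_bits):
    """With an even spread, each compressed block has
    2**original_bits / 2**compressed_bits = 2**(original - compressed)
    possible expansions, so naming one takes (original - compressed) bits."""
    return original_bits - compressed_bits


if __name__ == "__main__":
    needed = index_bits_needed(ORIGINAL_BITS, COMPRESSED_BITS)
    print("index needs", needed, "bits; budget is", BUDGET_BITS, "bits")
    # Compressed block plus index is exactly as large as the original.
    print("total:", COMPRESSED_BITS + needed, "vs original", ORIGINAL_BITS)
```

The index alone comes to 33,292,288 bits, roughly 500 times the 8K budget, and adding the 32K block back gives exactly the original 2^25 bits.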

  52. Re:scientific method, fact... goes out the window, by Anonymous Coward · · Score: 0

    sure i'll give you 10,000,000....which i've compressed into a nickel

  53. Maybe they just found out... by Advocadus+Diaboli · · Score: 1

    ... that it's not necessary to put every memo sent by email from higher management levels into an attached Winword file; plain ASCII text works as well and - believe me - there is no loss of information at all.

  54. Not random data by edp · · Score: 4, Redundant

    ZeoSync is not claiming to reduce random data 100-to-1. They are claiming to reduce "practically random" data 100-to-1, and Reuters appears to have misreported it. What "practically random" data should mean is data randomly selected from that used in practice. What ZeoSync may mean by "practically random" is data randomly selected from that used in their intended applications. So their press release is not mathematically impossible; it just means they've found a good way to remove more information redundancy in some data.

    The proof that 100-to-1 compression of random data is impossible is so simple as to be trivial: There are 2^N files of length N bits. There are 2^(N/100) files of length N/100 bits. Clearly not all 2^N files can be compressed to length N/100.

    1. Re:Not random data by 3am · · Score: 2, Insightful

      By your 'trivial' argument, compression of random data is impossible on any scale (you can't have a bijection between sets of different sizes).

      --

      A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
    2. Re:Not random data by gCGBD · · Score: 1

      Sort of.

      You can map predictable data into very small sets.

      For example:
      Suppose I have a set of evenly spaced points on a sine curve.
      I could have 10,000 or more elements in that set.
      I could map that set to a much smaller set:
      y(n)=a*sin(b * x(n)) {c -lt x -lt d: x(n+1) = x(n) + e}

      All I'd need to store is:
      {a, b, c, d, e}

      I could restore my data every time.
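      A rough sketch of that idea, assuming the five stored numbers mean exactly what the formula above says. The function name and the sample parameter values are made up for illustration.

```python
import math

def restore_points(a, b, c, d, e):
    """Rebuild every (x, y) sample from the five stored parameters."""
    points = []
    n = 1
    while c + n * e < d:              # keep x strictly between c and d
        x = c + n * e
        points.append((x, a * math.sin(b * x)))
        n += 1
    return points

# Hypothetical parameters: amplitude 2, frequency 0.001, x from 1 to 10000 in steps of 1.
params = (2.0, 0.001, 0, 10001, 1)
points = restore_points(*params)
print(len(points), "points restored from", len(params), "numbers")
```

      This only works because the data was generated by the formula in the first place, which is the opposite of random.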

      Maybe the trick is to find a good function which maps most binary finite sets...

      --

      O=='=++
    3. Re:Not random data by Anonymous Coward · · Score: 0

      That is correct, lossless compression of truly random data IS impossible.

    4. Re:Not random data by Anonymous Coward · · Score: 0

      You, sir, are an idiot. The key, of course, is that in some cases you will end up with an output file of length N/100, sometimes N/99 and sometimes N/101, etc. 100:1 is assuredly impossible, but even truly random data contains neighborhoods of compressible redundancy.

    5. Re:Not random data by ofir · · Score: 1

      An even easier proof to understand (although it applies only to this particular claim): take a 1,000,000-bit file (~125KB) and compress it three times, et voila, you have a 1-bit file. Obviously they don't seriously claim that.
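      The arithmetic can be checked with a few lines, assuming a compressor that really did deliver 100:1 on any input, including its own output. The function name is hypothetical.

```python
def passes_until_one_bit(n_bits: int, ratio: int = 100) -> int:
    """Count how many ratio:1 passes shrink n_bits down to a single bit."""
    passes = 0
    while n_bits > 1:
        n_bits = max(1, n_bits // ratio)
        passes += 1
    return passes

print(passes_until_one_bit(1_000_000))  # 1,000,000 -> 10,000 -> 100 -> 1
```

      A single bit can only name two different files, while there are 2^1,000,000 possible inputs, so the decompressor could not tell them apart.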

      --
      Two witches watch two watches, which witch watch which watch,and which watch does which witch watch?
    6. Re:Not random data by Stultsinator · · Score: 1

      The proof that 100-to-1 compression of random data is impossible is so simple as to be trivial: There are 2^N files of length N bits. There are 2^(N/100) files of length N/100 bits. Clearly not all 2^N files can be compressed to length N/100.

      Clearly. But by their own admission (and by the caveats of all compression authors) they are not trying to compress all files (including purely random files), just some. That doesn't make an algorithm impossible.
    7. Re:Not random data by Archanagor · · Score: 1

      Actually, truly random data would not have any redundancy whatsoever.

    8. Re:Not random data by Anonymous Coward · · Score: 0

      Well.. All of this talk has got me thinking.... And, I'm not too good at that.. But I have an idea.. I'm sure someone can tell me why this sucks.

      So "random" bit strings.... Lots of them are fairly compressible.. All 0's or 1's are very compressible. Other strings are not compressible.

      Say they stored the "most random/non compressible" bit strings locally and indexed them. Then as they were compressing if they found one of those non-compressible/most random strings, they could replace it..

      Could they be relying on tons of local storage in order to save transmit/communication times?

    9. Re:Not random data by Anonymous Coward · · Score: 0

      well, the only problem is that if you have tons of local storage, then you have to have some unique bit string to represent each one of your longer bit strings. and as the number of bit strings you are keeping track of locally increases, so does the number of 'shorthand' bits that you have to use to represent them.

    10. Re:Not random data by CTho9305 · · Score: 1

      Your logic is slightly flawed. If you store all the "hard to compress" strings and index them, you will have a lot of those strings - most likely, the size of the index will equal or exceed the size of the original string, which is why this won't work.

      Say, for example, you have a 10-bit pattern stored, but you have 1025 of them stored. The index will require 11 bits. For that matter, if you have over 512 stored, you need 10 bits, and have gained nothing.
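      A quick way to check index sizes like these, as a sketch. The helper name is an assumption, and it simply counts the bits needed to number the entries from 0 upward.

```python
def index_bits(entries: int) -> int:
    """Bits needed to give each of `entries` stored strings a distinct index."""
    return max(1, (entries - 1).bit_length())

for entries in (512, 513, 1024, 1025):
    print(entries, "entries ->", index_bits(entries), "index bits")
```

      Once more than 512 patterns are stored, the index is already as long as a 10-bit pattern.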

    11. Re:Not random data by EricLivingston · · Score: 1

      He's not an idiot - he's correct. There are 16 possible 4-bit files that can exist. The sum of all possible 3-, 2-, and 1-bit files is 14 (8 + 4 + 2), or 15 if you also count the empty file. Therefore, at least one 4-bit file cannot possibly be compressed by an algorithm that can compress the other 15 - there are just no more shorter files left to accommodate that 16th 4-bit file: they're all taken by compressed versions of the other 15 4-bit files. Thus, there is absolutely no algorithm that could compress all (i.e. any random) 4-bit files. Obviously, this will be the case for files of any n-bit size: there's always at least one file that can't be compressed by the algorithm.
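      The 4-bit case is small enough to enumerate by brute force. This is only a sketch: the greedy assignment is an arbitrary choice, and it counts only non-empty outputs of 1 to 3 bits, so two files are left over (allowing the empty file as an output would still leave one).

```python
from itertools import product

def bit_strings(length: int) -> list[str]:
    """All bit strings of exactly `length` bits."""
    return ["".join(bits) for bits in product("01", repeat=length)]

four_bit = bit_strings(4)                                   # 16 inputs
shorter = [s for n in (1, 2, 3) for s in bit_strings(n)]    # 2 + 4 + 8 = 14 outputs

# Greedily hand each 4-bit file its own shorter code until the codes run out.
codes = dict(zip(four_bit, shorter))
left_over = [f for f in four_bit if f not in codes]
print(len(four_bit), len(shorter), left_over)
```

      Any other assignment just changes which files are left without a shorter code, not how many.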

      --
      Please Rate my comment (and help support Fre
    12. Re:Not random data by temporary_pass · · Score: 1
      By your 'trivial' argument, compression of random data is impossible on any scale (you can't have a bijection between sets of different sizes).


      They're not different sizes, they are simply made up of different logical constructs.

      Given any finite N-bit string, a construct C can be defined such that 1 <= |{C}| <= N!

      Therefore given an initial string of size N and a second (compressed) string of size M, if M! = N then a bijection is possible.
    13. Re:Not random data by temporary_pass · · Score: 1



      They're not different sizes, they are simply made up of different logical constructs.

      Given any finite N-bit string, a construct C can be defined such that 1 <= |{C}| <= N!

      Therefore given an initial string of size N and a second (compressed) string of size M, if M! = N then a bijection is possible

    14. Re:Not random data by rot26 · · Score: 1

      Actually, truly random data would not have any redundancy whatsoever.

      As has been pointed out elsewhere, this is true ON AVERAGE, but it is possible, however unlikely, that your truly random number generator could generate nothing but zeroes, which would be quite compressible.
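      Both cases are easy to try with an off-the-shelf compressor. A sketch using Python's zlib, with one loud assumption: the "random" input comes from a seeded pseudo-random generator so the run is repeatable, which makes it a stand-in for truly random data rather than the real thing.

```python
import random
import zlib

SIZE = 100_000
rng = random.Random(0)                              # fixed seed so the run is repeatable
random_bytes = bytes(rng.getrandbits(8) for _ in range(SIZE))
zero_bytes = bytes(SIZE)                            # the "nothing but zeroes" case

for label, data in (("random", random_bytes), ("zeroes", zero_bytes)):
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data          # lossless round trip
    print(label, len(data), "->", len(packed), "bytes")
```

      The zeroes shrink by far more than 100:1, while the random-looking bytes come out slightly larger than they went in.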

      --



      To ensure perfect aim, shoot first and call whatever you hit the target
    15. Re:Not random data by temporary_pass · · Score: 1

      They're not different sizes, they are simply made up of different logical constructs. Given any finite N-bit string, a construct C can be defined such that 1 <= |{C}| <= N! Therefore given an initial string of size N and a second (compressed) string of size M, if M! = N then a bijection is possible

    16. Re:Not random data by temporary_pass · · Score: 1

      They're not different sizes, they are simply made up of different logical constructs.

      Given any finite N-bit string, a construct C can be defined such that 1 <= |{C}| <= N!

      Therefore given an initial string of size N and a second (compressed) string of size M, if M! = N then a bijection is possible

    17. Re:Not random data by Anonymous Coward · · Score: 0

      I'm the AC that proposed indexing hard to compress strings.

      So, I guess it really depends upon how many of the hard to compress strings you decide to index?

      I wanted to avoid trying to use numbers, but let's say half of the strings compress decently. The other half doesn't do so well, and you index half of those... So you have 1/4 of the string "space" indexed. That only saves 2 bits. Plus the overhead of saying "THIS IS AN INDEX", that's at least one bit, so you may have saved only a single bit.. I see your point! :) Thanks.

    18. Re:Not random data by Anonymous Coward · · Score: 0

      Indeed. But in the case of small data sets, the odds of being given a data set on which your algorithm happens to give 2:1 compression increase.

    19. Re:Not random data by An+Onerous+Coward · · Score: 1

      I finally figured out what they mean by "practically random data."

      Theoretically, any data set could be stumbled upon by random generation. So they've carefully culled, from the output of a random number generator, certain data sets to compress.

      For example:

      3.14159265358979323846

      or

      0123456789012345678901

      or

      0101010101010101010101

      or

      0000000000000000000000

      As the power of computers increases according to Moore's "Law", larger sets of highly compressible random data will be obtained.

      --

      You want the truthiness? You can't handle the truthiness!

    20. Re:Not random data by Anonymous Coward · · Score: 0

      I posted the reply - great minds think alike - I had thought about indexing until someone pointed its flaw out to me ;-)

  55. Looks like IPO drilling by CDWert · · Score: 1

    Looks like IPO drilling to me,

    "We have almost got it, we have the basic principles and the trademarked names" (INSIDE) "Now watch this, the money will come pouring in, Haa haaaaahaaa"

    Honestly, did you guys read their string of random data? All fricking 0's. Yeah, that's random in theory.
    But why not go all the way? Hell, why not 1000:1 or 100000:1? No problem with all zeros.

    OK, if it works?
    Hey, it's a boon to civilization: VoIP and 1000 times more MP3s and PORN at superspeeds.

    Hmm, if you had something this truly unbelievable, wouldn't you wait on a public announcement until AFTER you had a functional demonstration?

    --
    Sig went tro...aahemmm.....fishing........
  56. Egads... by RareHeintz · · Score: 5, Funny
    ZeoSync said its scientific team had succeeded on a small scale...

    The company's claims, which are yet to be demonstrated in any public forum...

    ...if ZeoSync's formulae succeed in scaling up...

    Call the editors at Wired... I think we have an early nominee for the 2k2 vaporware list.

    ZeoSync expects to overcome the existing temporal restraints of its technology

    Ah... So even if it's not outright bullshit, it's too slow to use?

    "Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize," said David Hill...

    Somehow I think this is going to turn out more Pons-and-Fleischmann than Watson-and-Crick. Almost anytime there's a press release with such startling claims but no peer review or public demonstration, someone has forgotten to stir the jar.

    When they become laughingstocks, and their careers are forever wrecked, I hope they realize they deserve it. And I hope their investors sue them.

    I should really post after I've had my coffee... I sound mean...

    OK,
    - B

    1. Re:Egads... by gila_monster · · Score: 1

      "Ah... So even if it's not outright bullshit, it's too slow to use?"

      Someone above said something about video streaming being better now. This whole thing is giving me the distinct impression that, even though they may have reduced the file size, streaming will be the same (or worse), since we exchange travel time for unravel time.

      I also seem to recall a patent grant in the past year that said pretty much the same thing as this release, but I don't think it was ZeoSync's.

      gm

      --
      Ad luna, Alicia! Ad luna!
    2. Re:Egads... by shic · · Score: 2, Funny

      > > ZeoSync expects to overcome the existing temporal restraints of its technology
      > Ah... So even if it's not outright bullshit, it's too slow to use?

      No, my friend - you are missing the whole point. ZeoSync HAVE succeeded (in a limited sense.) You see, in order to achieve implausible compression rates on random data - all you need to do is overcome a few temporal issues - follow this line of thinking...

      1) Each implementation of the compression algorithm will only be applied to (a relatively small finite number of) finite sequences of bits.
      2) Encode exactly these sequences in the compression tool.
      3) Astonishing compression is achieved - only a small ordinal need be stored to represent each compressed result.

      So your data will always be small, but your compression program will grow rather quickly!

      Puzzle solved.

    3. Re:Egads... by RareHeintz · · Score: 5, Funny
      Of course! What was I thinking? Why not just use a table lookup of every possible sequence of bytes of any length?

      See you all later - I have some coding to do!

      OK,
      - B

    4. Re:Egads... by Microlith · · Score: 1

      What's your PI offset #?!

      0x3FFC13D2 : That document you lost last week!

      0x4D30AD2F : The latest DivX release of a new movie, BEFORE it's even created!

      0x56C23AF4 : Linux Kernel 3.0, now kernel.org won't die...

      Sorta like finding DeCSS compiled in some prime number, only more realistic!

    5. Re:Egads... by Anonymous Coward · · Score: 0

      What I want to know is whether this is the Peter St George-Hyslop who is working on Alzheimer's vaccines at the University of Toronto, or the Peter St George who's a TA in landscape architecture at the University of Washington, or the Peter St George who left Salomon Smith Barney in Australia last May. Also I'd like to know why they can't "provide investment opportunities to residents of Kansas". Is fraud especially illegal in Kansas or something?

    6. Re:Egads... by MadAhab · · Score: 2
      That's funny, but if you realize (as the majority of idiots here do not) that they are talking about transmission compression, and not storage compression, you're probably closer to what they are pretending to do than you think. So their method would actually result in larger files if you compress them, but you could have reduced bandwidth if you have the right set of lookup tables at each endpoint.

      I'd still be surprised if this were anything other than the sheerest vapor, because the objections about compressibility of random data still apply. Call it the Pigeonhole theory or whatever, but the point is that as you accumulate different varieties of non-repeating segments, the set of codes you use to refer to them grows to the same size as the data it represents.

      --
      Expanding a vast wasteland since 1996.
    7. Re:Egads... by DrSpin · · Score: 1

      It's so obvious, I bet MS and IBM have both patented it!

    8. Re:Egads... by Analog+Squirrel · · Score: 1
      -> Ah... So even if it's not outright bullshit, it's too slow to use?

      I think this would have to be true. My limited understanding of compression is this: the amount of information in a data file (of any type) must be conserved. A good measure for the amount of data might be the number of bits. In the uncompressed file, each bit is unique and contributes to the whole of the information contained there. In previous posts, it has been stated that the simplest compression routines simply count the number of repetitions of certain bits (e.g., 62 1-bits followed by 17 0-bits followed by... etc). But how does the computer represent that count? Normally as a binary number - that is, if you give each bit its own level of significance (increasing exponentially from the least significant bit up to the most significant bit), then a large count can be reduced to a smaller number of bits (so the count of 62 bits would be reduced to 00111110). So, now that I've reduced 62 bits of data to 8 bits, where is the leftover data? It is found in the significance placed on the ordering of the bits - that is, I've constructed a framework where each bit represents some meaning in addition to itself. Without that framework, my string of 8 bits has no more meaning than two 0s followed by five 1s followed by one 0.

      If my understanding of data compression is correct, then the entire art of compression centers around finding constructs and patterns that allow a given dataset to be represented in the most compact form. Unfortunately, what that means is that the extra information has to be bound up in the scheme and computing power will be required to extract the meaningful data... especially with 100:1 compression.

      One last word to all the naysayers out there:
      There's nothing wrong with being skeptical, but let's keep in mind that many of the most significant discoveries in history have been made against the prevailing theory of how the world works. Let's not reject it out of hand simply because it violates our sensibilities...

      --
      I'd rather be flying
  57. use pi for compression! by Anonymous Coward · · Score: 1, Interesting

    this sounds a lot like using pi (3.141592...) for compression. any random string is guaranteed to occur in that sequence, so just find the position of the string in pi and pronto.... compression!

    doesn't work though, since on average you'll need as many digits to describe the position of the string as you'd need to simply represent the string in the first place.
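    A small experiment along those lines, as a sketch only. It generates the first 1000 digits of pi with Gibbons' spigot algorithm (a swapped-in technique, nothing ZeoSync describes), then looks up every two-digit string and counts how many have an offset that is shorter to write down than the string itself. The names `pi_digits`, `DIGITS` and `position_in_pi` are illustrative.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) using Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

DIGITS = "".join(str(d) for d in islice(pi_digits(), 1000))

def position_in_pi(message: str):
    """Offset of `message` in the first 1000 digits, or None if it is not there."""
    offset = DIGITS.find(message)
    return offset if offset >= 0 else None

found = {m: position_in_pi(m) for m in (f"{i:02d}" for i in range(100))}
hits = {m: p for m, p in found.items() if p is not None}
saved = sum(1 for m, p in hits.items() if len(str(p)) < len(m))
print(len(hits), "two-digit strings found;", saved, "have an offset shorter than themselves")
```

    Each offset is the start of exactly one two-digit string, so at most ten of the hundred strings can ever get a one-digit offset. The rest need an offset at least as long as the string.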

    instead of using pi, they create a '4th dimension', i.e. some sort of combination of all possible combinations in 3 dimensions. different from the pi example though, the problem now is not representing the position in this dimension (4 coordinates) but the recreation of this space (which needs to be enumerated) by the guy who wants to decrypt the message.

    for 'short' strings a few pointers in a 4 dimensional space will do, for longer strings more dimensions, leading to longer pointers, longer tables of enumeration etc.

    of course, this can be tackled the other way around as well. 'random', by definition, means that the next instance doesn't have any relation to the previous. if you can find such a relation it's no longer random to start with.

    1. Re:use pi for compression! by coolcast · · Score: 1

      For this to work, you'll need a *local* 'repository' which contains all possible data that their n-th dimensional data descriptors can point to.

      This can be terabytes in size.. oh well, harddrives are getting cheaper and cheaper..

      --

      Don't click here. BT will enforce intellectual rights and sue for eac
    2. Re:use pi for compression! by Anonymous Coward · · Score: 0

      sure you need a local version to decompress, but that local version can be transmitted as a formula (e.g. send the algorithm of how to calculate pi rather than sending pi).

      even when you have this local version -after long calculations- the pointer to it is going to be as long as the initial message.

    3. Re:use pi for compression! by Anonymous Coward · · Score: 0

      Sorry to burst your bubble (as if you're real attached to this idea) but you are not guaranteed to find an arbitrary sequence in any irrational (transcendental?) number. Take the decimal 3.110100100010000100000 for example. You'll never find the sequence "2". Likewise, I doubt you can claim that a DivX of the motion picture Pi is encoded somewhere in the digits of Pi.

      But I could be wrong! ( ;

    4. Re:use pi for compression! by TMacPhail · · Score: 1
      How about after using pi the first time and obtaining a position where the sequence occurs in pi, use pi again to search for that new sequence. Eventually you may reach a sequence within pi that is shorter than the original data. Once completed, store the new sequence along with the number of iterations of finding a number then taking the corresponding sequence within pi.

      I think this could work, although it's not guaranteed to compress everything, and it can take a very long time to search through the digits of pi.

    5. Re:use pi for compression! by Anonymous Coward · · Score: 0
      It's not true that an irrational number is guaranteed to contain all possible substrings of bits or digits. Here are a couple of simple counterexamples:

      Construct the fractional part with increasing numbers of 0 bits between successive 1 bits. So the number would be 0.101001000100001.... Since this number never repeats, it's not a rational number even though it's easy to construct. But it will never have a substring consisting of 1000100101, because in that substring the number of 0 bits between successive 1 bits is decreasing.

      Or imagine an irrational number whose decimal representation never uses the digits 0 or 9. The remaining digits may never have a repeating pattern but it's obviously impossible to find all possible substrings in it since it will never have the substring 990099 or lots of others.

      As far as I know, it's never been proven that pi contains all possible substrings; and very many irrational numbers don't.

      Now it has been conjectured that pi does contain all possible substrings ... whether this will ever be proven is an interesting question. But it's not immediately obvious that this is the case, and without that then any numerical method based on it can't possibly work.

      This of course begs the question of whether there's any way to describe the position and size of the message in fewer bits than there are in the message ... which is doubtful unless the message string and the digit stream are both nonrandom!

  58. And if it's true, then... by a+random+streaker · · Score: 1

    Of course, even though it can't be true, if it turned out to be true...

    ...and they patented their genius and hard work...

    ...clods around here would claim how idiotic their obvious work is and how it shouldn't be patented.

    --
    "All representatives are busy. The estimated hold time is one..hundred..sixty..four..minutes." Detroit Edison, 02/01/02
    1. Re:And if it's true, then... by SnapShot · · Score: 1

      I think the "clods around here" would claim that a patent system that allows "hyperlinks" and "one-click checkout" to be patented in the same way as "100:1 compression algorithms" needs to be fixed...

      --
      Waltz, nymph, for quick jigs vex Bud.
  59. What is compression by Vapula · · Score: 3, Interesting

    Compression, after all, is removing all redundancy from the original data.

    So, if there is no redundancy, there is nothing to remove (if you want to remain lossless).

    When you use some text, you may compres by remving some letter evn if tht lead to bad ortogrph. That is because English (like other languages) is redundant. When compressing some periodical signal, you may give only one period and tell that the signal is then repeated. When compressing bytes, there are specific methods (RLE, Huffman's trees,...)

    But, in all these situations, there was some redundancy to remove...

    A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed). Then, recompressing with another compression algorithm (or sometimes, the same will do the trick) may improve the compression. But the information quantity inside the data is the lower limit.

    Now, take a true random data stream of n+1 bits. Even if you know the value of the first n bits, you can't predict the value of bit n+1. In other words, there is no way to express these n+1 bits with n (or fewer) bits. By definition, true random data can't be compressed.

    And, to finish, a compression ratio of 1:100 can be easily achieved with some data... take a sequence of 200 bytes at 0x00... It may be compressed to 0xC8 0x00. Compression ratio is really only meaningful when comparing different algorithms compressing the same data stream.
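    A bare-bones run-length encoder reproduces that 0xC8 0x00 result. This is a sketch assuming the simplest possible layout, a count byte followed by a value byte, with the function names chosen for illustration.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs, with each count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(packed: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for count, value in zip(packed[0::2], packed[1::2]):
        out += bytes([value]) * count
    return bytes(out)

zeros = bytes(200)
print(rle_encode(zeros).hex())                # c800
print(len(rle_encode(bytes(range(200)))))     # 400: no runs, so the data doubles
```

    The same encoder that gives 100:1 on the zeroes doubles the size of data with no repeats, which is why a ratio quoted without the input means little.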

    1. Re:What is compression by mblase · · Score: 2

      Compression, after all, is removing all redundancy from the original data.

      Of course, that's only the definition of lossless compression. Lossy compression also exists, with better compression rates and the obligatory sacrifice of detail, and that's what multimedia often relies on.

    2. Re:What is compression by _Mustang · · Score: 2

      A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed).

      Well - why? Granted I'm no expert on this topic, but isn't that because current algorithms manipulate the data stream differently based on some preconception of the "best" manner for each *chunk of X length* (i.e. block of repetitive data)?
      Why couldn't it be possible to have a single algorithmic solution that works on the entire dataset simultaneously?

      Pardon my math but, let's use "AAABBBCD"(8 characters) as the example data.
      Traditional methods would turn that into "3A3BCD", reducing it by 2 - correct?

      What is it that prevents this from being mapped to a preexisting multidimensional table/grid (I'll use the English alphabet here)? If we use position of the output data as a predetermined element of the equation then..

      "A-B-C-D-E.." etc. as our table, could have the data then overlaid as 3311, equaling 4 characters.

    3. Re:What is compression by Vapula · · Score: 1

      In your example, you went from the [A-Z] (26 values) character set to the [0-9A-Z] (36 values) character set, adding 10 symbols in your alphabet.

      Should you be using binary data, the starting alphabet is composed of all binary values from 0 to 255 (0000 0000 to 1111 1111). If you want to add symbols to that alphabet, you have to add at least one bit to all symbols.

      Your 8-character sample (8*5 = 40 bits) goes to a 6 extended-character compressed form (6*6 = 36 bits), reducing by 4 bits (which is less than 1 character reduction). (You need 5 bits to encode up to 32 values and 6 to encode 64 values.)

      But, should you try to compress ABCDEFGH with the same method, you'd go from 8*5 = 40 bits to 8*6 = 48 bits, expanding by 8 bits.

      So, if you have to compress AAABBBCDABCDEFGH with your algorithm, you'd lose 4 bits (instead of winning some) and both expand your data and slow down its reading (decompression needed).
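      The bit counts above can be reproduced with a short sketch. It assumes the "3A3BCD" scheme from the parent post (a digit prefix only on runs longer than one letter), 5 bits per plain letter and 6 bits per symbol of the extended alphabet. The function names are illustrative.

```python
def encode_runs(text: str) -> str:
    """Prefix a run with its length only when the run is longer than one letter."""
    out = []
    i = 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < 9:
            run += 1
        out.append((str(run) if run > 1 else "") + text[i])
        i += run
    return "".join(out)

def cost_in_bits(text: str) -> tuple[int, int]:
    """Return (original bits at 5 per letter, encoded bits at 6 per symbol)."""
    return 5 * len(text), 6 * len(encode_runs(text))

for sample in ("AAABBBCD", "ABCDEFGH", "AAABBBCDABCDEFGH"):
    print(sample, encode_runs(sample), cost_in_bits(sample))
```

      The combined string goes from 80 bits to 84, matching the 4-bit loss.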

  60. Wow, it's not 100:1 by Daath · · Score: 2
    From the press release:
    [...] once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range
    Hundreds to one! Someone help me breathe!! :)
    --
    Any technology distinguishable from magic, is insufficiently advanced.
  61. Impossible by miklernout · · Score: 1

    Simple:

    ZeoSync(data) = miniData

    miniData is randomdata

    ZeoSync(miniData) = miniMiniData = ZeoSync(ZeoSync(data))

    ad infinitum

    If someone claims to do real lossless compression of random data, this is impossible...

    As all data would be no data, hence my newest haiku ;-)

    --
    ----
    --
    [insert witty one-liner here for your own pleasure]
  62. Uh-oh by secondsun · · Score: 1

    I was wondering how much processor power decompression takes? If it isn't too much, then I would assume that we will see PDAs, MP3 players, and ISPs doing this.
    An 8 megabyte flash card now holds 800 MB of data, the Harry Potter DivX is now only 7 MB. A Metallica MP3 is only .03 MB.

    At best this is revolutionary, at worst a candidate for next year's vaporware list.

    Secondsun

    --
    There is nothing wrong with being gay. It's getting caught where the trouble lies.
  63. Might be possible... but I doubt it... by Zocalo · · Score: 3, Interesting
    Reading through the press release it seems to imply that they take the "random" data, massage the data with the "Tuner" part, then compress it with the "Accelerator" part. This spits out "BitPerfect" which I assume is their data format. It's this "massaging" of the figures where it's going to sink or swim.

    Take very large prime numbers and the like, huge strings of almost random numbers that can often be written as a trivial (2^n)-1 type formula. Maybe the massaging of the figures is simply finding a very large number that can be expressed like the above with an offset other than "-1" to get the correct "BitPerfect" data. I was toying around with this idea when there was a fad for expressing DeCSS code in unusual ways, but ran out of math before I could get it to work.

    The above theory may be bull when it comes to the crunch, but if it could be made to work, then the compression figures are bang in the ball park for this. They laughed at Goddard remember? But I have to admit, I think replacing Einstein with the Monty Python foot better fits my take on this at present...

    --
    UNIX? They're not even circumcised! Savages!
  64. Easy! by the+bluebrain · · Score: 1

    You just need a big ol' monolithic database containing all possible random strings of digits of a certain length (which will be big, OK, but so is MS Word, and people use that, don't they?), then you mix 'n match the data you're compressing, and all you have to save is the tags of the fixed random strings you're matching to, the length of which will be ... oh, wait ...

    --
    yes, we have no bananas
    1. Re:Easy! by Anonymous Coward · · Score: 0

      And then we could save off space tagging all idiotic press releases with a "1" and new linux kernel of the day release announcements with a "01"... Surely that'd save a lot of inet bandwidth :)

    2. Re:Easy! by the+bluebrain · · Score: 1

      heh ... like the one about the joke-club:

      There's this guy who's just moved to town and is still getting a feel for the place. One evening he stumbles into a bar, to be confronted with a group of people yelling numbers at each other, always followed by uproarious laughter.

      - "68!"
      - [laughter to bring the house down]
      - "104!"
      - [... and so on]

      He catches the barman's eye, and enquires as to the goings-on. The barman explains that it's a joke-club, and that they've codified the jokes, because they all know all of them anyway. This seems very strange to the guy, so after a while he decides to do a test. During a lull in the proceedings, he yells a number of his own:

      - "824!", he yells.

      The group all turn to him in surprise, before surpassing all previous efforts in expression of mirth. Gratified, and stared at by everyone in the bar, he tries another one:

      - "210!"

      ... but this only evokes some chuckles from the gathering, and they turn back to their own proceedings.

      He asks the barkeeper what that was all about, and receives the explanation that, well, the second joke - in all honesty, it wasn't really very well told, but the first one, that was fantastic - it was a new joke.

      --
      yes, we have no bananas
  65. Silly web site by pen · · Score: 2

    Is it possible, at all, to trust a company whose home page has silly javascript that resizes your browser window?

  66. je by ivanandre · · Score: 1

    In other news, the same company announced that a perpetuum-mobile machine will see the market in 2003

  67. It gets better! by tweakt · · Score: 1

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents.

    So they claim, if it isn't random enough, they make it MORE random first so their compression can work better. OOoohKay...

    Vaaaaaaaaaaaaaaaaaporware...

  68. What happens when you run it backwards? by sprag · · Score: 4, Funny

    A thought just occurred to me: If you can do 100:1 compression and compress something down to, say, 2 bytes, what would 'ab' expand to? My thought is "ZeoSync Rulz, Suckas"

  69. lzip by EricCheng · · Score: 1

    For those who are not satisfied with the 100:1 compression ratio, lzip might well be worth considering. ;)

  70. They are using time travel! by harlows_monkeys · · Score: 5, Funny
    From one of the things on their site: Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted (emphasis added).

    Using time travel, high compression of arbitrary data is trivial. Simply record the location (in both space and time) of the computer with the data, and the name of the file, and then replace the file with a note saying when and where it existed. To decompress, you just pop back in time and space to before the time of the deletion and copy the file.

    1. Re:They are using time travel! by AlmightySpoon · · Score: 1

      Well all they are really saying is that their algorithm takes up more cpu time than is practical.

      "ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner (fuck you mark)"

      Someone took those goofy pi jokes to heart. The corny ones with the main theme: every program ever written is contained in pi, you just have to find the offset. Which in turn means that the offset is a form of encoding that happens to be smaller than the original file.

      Now if these guys were truly onto something, the implications for data storage would be incredible. However, they've admitted in their press release that the CPU usage is impractical. They just haven't said how impractical, yet.

      --
      --------------------------- Politics, Religion, and Sex... Which one do you practice most?
    2. Re:They are using time travel! by Sun · · Score: 1

      Actually, you can do it sometimes today, at work, to go around quota problems.

      We have a NetApp here, and it has "snapshots" (the image of your directory every hour, and two nights back). If you are running out of room, you can delete files, and refer to them from the snapshots. This is PRECISELY the compression you describe.

  71. Hmm, I'd say they're bluffing by 4D53 · · Score: 1

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties. I guess it's a standard clause in a press release, but I still think they rely on it to get some free cash, since this just can't work.

  72. Do you really understand? by QuickFox · · Score: 1

    Why are you all so skeptical? Do you really understand their discovery? This may represent a major breakthrough in techniques for reducing the size of investor funds by 100:1.

    Give a man a fish and he eats for one day. Teach him how to fish, and though he'll eat for a lifetime, he'll call you a miser for not giving him your fish.

    --
    Terrorists can't threaten a country's freedom and democracy. Only lawmakers and voters can do that.
  73. Semantics... by shani · · Score: 1

    The press release says "practically random", which means basically nothing.

    It also mentions "temporal restraints", which means it runs too slow to use anywhere.

  74. even if it does work...patents! by Anonymous Coward · · Score: 0

    If their claims turn out true and they build a profitable business based on it, I'm sure all ownership will be given to Zog, a prehistoric neanderthal first known for his claim on "squishing things".

  75. Too soon to tell... by bpowell423 · · Score: 2

    First off, they don't say it can compress "random data", they say it can compress "practically random data", which I would take to be everyday sort of data like audio and video. And they don't say that data can be compressed infinitely. _If_ whatever they have does work, I suspect it'll be an enlightening moment for the rest of us if/when they release the details of their algorithm. Sort of like, if the only thing you're familiar with is the bubble sort, quick-sort is almost magical. Well, maybe the current schemes of run-length-encoding, and whatever other pattern matching we do, is akin to the bubble sort and these guys have put their heads together and created the quick-sort of data compression.

    I'm not calling it either way, but all the "It can't be done! The world is flat!" comments are so typically... well... slashdot.

  76. quick guess by asterisk_man · · Score: 1

    seems to me that if they are claiming use of the pigeonhole principle they are going to be trying something like the following:
    1) create some table of bit strings where the size of each string is 100 times larger than the size of the index into the table
    2) encode data as indexes into the table
    3) transmit indexes
    4) decode data with indexes into the table
    now how they get the table from the source to the destination is anyone's guess, but it seems to me that since the top of their flash animation keeps blinking "revolutionary microchip technology" these tables will be built into hardware. i don't think this can really work for a few reasons, but i just wanted to give my first impression of how it all sounded from my point of view.
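
    A rough sketch of why the table idea in steps 1 to 4 runs out of room. The 8-bit index and the function name `table_coverage` are made-up assumptions for illustration, not anything from ZeoSync's material: a k-bit index can name at most 2^k table entries, while there are 2^(100k) possible blocks of 100k bits.

```python
# Hypothetical parameters: a k-bit index addressing a table of 100*k-bit blocks.
def table_coverage(index_bits, ratio=100):
    """Return (table_entries, possible_blocks) for a fixed lookup table."""
    block_bits = index_bits * ratio
    table_entries = 2 ** index_bits      # distinct indexes that can be sent
    possible_blocks = 2 ** block_bits    # distinct blocks a sender might hold
    return table_entries, possible_blocks

entries, blocks = table_coverage(index_bits=8)
print("table entries:", entries)
print("possible 800-bit blocks: 2**800")
print("blocks with no index:", blocks - entries)
```

    Any block that is not in the table simply cannot be encoded, so the scheme only "works" for inputs somebody chose in advance.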

    1. Re:quick guess by Anonymous Coward · · Score: 0

      Im sorry, what you're describing is vector quantization. Its been done before, the guy on the site is describing some absurd new principle

  77. Directed evolution by HalfFlat · · Score: 5, Funny

    They're looking for investment money?

    Just think of it as an innumeracy tax on
    venture capitalists.

  78. New Compression Algorithm by pb · · Score: 2

    Proposed: a method for reducing any file down to 16 bytes and losslessly restoring it.

    1. Create an MD5 hash of the file.
    2. Share it on a Peer-to-Peer filesharing client.
    3. Delete the original file.
    4. Find it again!

    Note: in trials, this method seems to work best for Britney Spears songs and videos; further research is being done on how to restore Barry Manilow songs and videos, and what to do about hash collisions (bug those uppity MD5 people again).
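
    For anyone tempted to take the joke seriously, a small sketch using Python's standard `hashlib` (the helper name `md5_key` is just an illustration). It shows that every file maps to a 16-byte key, and that keys must be shared once files get longer than the key.

```python
import hashlib

def md5_key(data: bytes) -> bytes:
    """Return the 16-byte MD5 digest used as the lookup key in the joke."""
    return hashlib.md5(data).digest()

small = md5_key(b"x")
large = md5_key(b"x" * 1_000_000)
print(len(small), len(large))    # both digests are 16 bytes long

digests = 256 ** 16              # number of distinct 16-byte keys
files_17 = 256 ** 17             # number of distinct 17-byte files
print(files_17 // digests)       # average number of 17-byte files per key
```

    The key only identifies a file that somebody else is still storing in full, which is where the "compression" really lives.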

    --
    pb Reply or e-mail; don't vaguely moderate.
  79. Just goes to show by Xcott+R13,+3(0,R4) · · Score: 1

    A million dollars in venture capital is easier to obtain than 5 points on a combinatorics test.

  80. top experts by Anonymous Coward · · Score: 0

    Whenever a blurb contains the phrase
    "top experts from so-and-so" you can
    assume it to be a bunch of snake oil.
    As in the phrase "top experts from
    Harvard". What, they don't have any
    experts? They have to borrow someone
    else's?

  81. Pi is Interesting by Kr3m3Puff · · Score: 1
    Here is one of the websites of one of the guys listed on the company's website: Steve Smale. Interesting site. Light on details, but there is contact information.

    The press release is light on details. A quick search of the US Patent database for BinaryAccelerator and Zero Space Tuner turned up nothing. I thought even pending patents were there. The CEO seems to be a mortgage broker. Interesting line of research.

    Anyways, I thought the breakthrough in data compression would be using a mathematical algorithm to express Pi, plus an index to the digit where your random string begins and a count of the data. That truly would be random, if those guys can prove there is a mathematical formula for Pi.

    --
    D.O.U.O.S.V.A.V.V.M.
    1. Re:Pi is Interesting by k98sven · · Score: 1

      Actually there are quite a few mathematical formulae to calculate pi,
      series expansions being the most common method.

      The problem in this case is not finding pi, the problem is that the index into pi will in most cases be a number larger than your data!
      (except for lucky cases like 314159265)

    2. Re:Pi is Interesting by Kr3m3Puff · · Score: 1
      Actually I was referring to the group of folks trying to verify if their calculation can calculate the nth digit of Pi without calculating the previous n digits.

      --
      D.O.U.O.S.V.A.V.V.M.
  82. ZeoTech Scientific Team fake? by dannyspanner · · Score: 4, Insightful

    For example, at the top of the list Dr. Piotr Blass is listed as Chief Technical Adviser from Florida Atlantic University. But he seems to be missing from the faculty. Google doesn't turn up much on the guy either. Hmmm.

    I've not even had time to check the rest yet.

    1. Re:ZeoTech Scientific Team fake? by dannyspanner · · Score: 2, Interesting

      Okay, the mysterious Dr. Wlodzimierz Holtzinski doesn't get a single hit on Google. Dr. Steve Smale hasn't released a paper in five years and is in his seventies. Retired, perhaps?

      I'm still not impressed.

    2. Re:ZeoTech Scientific Team fake? by cobdar · · Score: 1

      Actually, he is listed in their phonebook (I searched at http://www.fau.edu/searchpage/searchpage.html ). He just doesn't have a web page, apparently.

    3. Re:ZeoTech Scientific Team fake? by Quaternion · · Score: 2, Informative

      Do you mean the Steve Smale from Berkeley who won a Fields Medal?

      smale bio

      I heard him speak at MIT, and read a paper of his that was published in the Bulletin of the American Mathematical Society... On the Mathematical Foundations of Machine Learning, with Felipe Cucker I think. That was published in Oct. 2001, which qualifies as within the last 5 years, right?

      --

      "The horse leech's daughter is a closed system. Her quantum of wantum does not vary."

    4. Re:ZeoTech Scientific Team fake? by dannyspanner · · Score: 2

      Bah! Humbug! If it's not on the web, how can it be real? :)

    5. Re:ZeoTech Scientific Team fake? by Quaternion · · Score: 1

      True :-).

      But it is on the web. Try here. I think that it should be generally accessible (i.e., I hope I'm not getting to it based solely on an institutional subscription).

      --

      "The horse leech's daughter is a closed system. Her quantum of wantum does not vary."

    6. Re:ZeoTech Scientific Team fake? by King+Babar · · Score: 5, Informative
      Okay, the mysterious Dr. Wlodzimierz Holtzinski doesn't get a single hit on Google.

      Well, that's because they mis-spelled his name. Seriously, I bet they are really trying to refer to Wlodzimierz Holsztynski, who posts to Polish newsgroups from the address "sennajawa@yahoo.com". His last contribution to the one Usenet thread that mentions "zeosync" and his name uses the word "nonsens" a lot, also the phrase "nie autoryzowalem", and the sentence "Bylem ich konsultantem, moze znowu bede, a moze nie, z nimi nie wiadom." Somebody who really knows Polish could probably have a field day with this and other posts...

      I'm getting the idea that some people on the scientific team might be better termed "random people we sent email to who actually responded once or twice".

      --

      Babar

    7. Re:ZeoTech Scientific Team fake? by cheshire_cqx · · Score: 2

      FAU Directory Search for 'blass'
      Blass, Piotr (No eMail Address Listed)
      Title : Instructor
      Department : Computer Science & Engineering
      Bldg / Room : S&E 300
      Phone Ext : 72822

    8. Re:ZeoTech Scientific Team fake? by Evacuator · · Score: 5, Informative

      With my limited understanding of Polish I can add that he talks about the nonsense of him being on the scientific team. He also states that his name was used without any authorisation, and he points out that the whole affair exists only to hustle money from investors.

      --
      Human beeing is just an advanced, self-learning machine.
    9. Re:ZeoTech Scientific Team fake? by grytpype · · Score: 2

      Wow, I wish I hadn't posted a reply, so I could mod this up. This is a bombshell. Take a look at the post, I don't think you need to know much Polish to get the flavor of what the guy is saying!

      --

      - Have a picture

    10. Re:ZeoTech Scientific Team fake? by Anonymous Coward · · Score: 0

      Translated by http://www.tranexp.com

      --------------

      >> > To ogol uzywamy makes " you " ( very like , with Panu zalezy : - )
      >> Not zalezy me to formie " Mr. ". More responds me shape " you poprostu
      >> reads when notki of thee on internecie as well I see , with you've at least tytul
      >> doctor , meeting ex szacunku zwrocilem sie shape " Mr. ". Yes by way of student zostalem wonted. Ex that tides bede returns sie to " you ".
      >
      > what , znalaz? em is not wr? cz a kick in the pants? ce...
      > pardon me On? odku,? e not wiedzia? em : - ((
      >
      >"Dr. Wlodzimierz Holsztynski Dr. Holsztynski became
      > and full professor of mathematics at Warsaw University at the age of uniquely combining pure applied... "
      >
      > I am sorry , bank is not niedost? pna, and quotation this balance ex googli.
      >Id? seeks onward : - ) and mo? e, On? odku, title? by? what wi? cej...?
      >
      > salutes ,? K

      To wit clotted nonsense. not wiedzialem via who downtime , with this outlet pozwolila yourselves to quotation jakichkolwiek informacji of me. Zadnych not upowaznilem them , not pozwolilem, and yet to upshot zabronilem - when a few days ago by accident zauwazylem what title , this spot napisalem until them a man of law zeby those " informations " usuneli. I`ve istotniejsze successes niz those nieprawdziwe, listed to that stronie, and yet if not mial, not wants tommy rot. ( they such belongings does by, wherebyprzez co soots przyciagnac investor ).

      Bylem them consultant moze anew bede, and moze not , ex nimi it is anyone's guess.

      Pardon me too those niepotrrzebna misinfoprmacje, whereas naprawde there are not therein neither troche my guilts.

      Salutes
      Wlodek

      I'm sorry , with yes sie steel choc in the main this smieszne.

      --------------

      I hope you got as much out of it as I did. :-)

    11. Re:ZeoTech Scientific Team fake? by snarkh · · Score: 1

      That's what sounds so weird about it. When I saw his name I was completely taken aback. I doubt someone like Smale (who actually has done work in somewhat related areas) would be involved with an outright hoax, which this thing clearly is.


      The only reasonable explanation is that they used his name without his permission (or perhaps they asked him to consult them, he agreed, etc without actually being involved or knowing about the nature of their company). Very strange business.

  83. A useful book on data compression... by danielrendall · · Score: 2, Interesting
    Anybody interested in data compression and a whole lot else besides might want to download the book available from here

    Please don't all do so at once though :-)

    It's essentially a collection of lecture notes for a course on information theory and neural networks given by the author (David MacKay), but has been much expanded since I took the course in 1997. It will certainly show how any claim for a compression technique which works consistently on random data is bogus.

  84. One Word by Anonymous Coward · · Score: 0

    Notice the one word in the fifth paragraph of the story, "anticipated", and it validates the entire story.

  85. The real "Pigeon hole principle" by richieb · · Score: 3, Informative
    If I recall my set theory properly the "Pigeon Hole Principle" simply states that if you have 100 holes and 101 pigeons, when you distribute all the pigeons into all holes, there will be at least one hole with at least two pigeons.

    I don't recall any of this crap about pigeons flying out of boxes. Or am I getting old?

    --
    ...richie - It is a good day to code.
    1. Re:The real "Pigeon hole principle" by Eagle7 · · Score: 2

      No, I think you are right and wrong. You're right in the sense that your viewpoint is the commonly used example... but it is also identical to what they are saying (twisted a bit).

      They are saying that if you have 100 pigeons and 1 hole, you need 100 unique markers (labels) to differentiate them. If I added a pigeon and didn't add a label, there would be two ambiguous pigeons. This is the same as 101 pigeons in 100 holes - two of them must share a hole (or share a marker). I like the multiple holes version better, but it's just a different way to describe the same thing.

      They are suggesting that by using multidimensional mathematics (meaning, I assume, greater than your usual 3 dimensions) they can alleviate this "marker" problem. They completely lose me here though, so I'll shut up. ;)

      --
      _sig_ is away
    2. Re:The real "Pigeon hole principle" by Anonymous Coward · · Score: 0

      you'd only need 100 markers to differentiate 101 pigeons, as all apart from one has a marker and the one that doesn't is simply the "pigeon without a marker" and easy to spot amongst the rest of the pigeons sponsored by coke etc...

    3. Re:The real "Pigeon hole principle" by posmon · · Score: 1

      that's wrong. the 101st unique pigeon is the only one without a marker.

      --

      update comments set karma=-1, reason='offtopic' where sid=26315

    4. Re:The real "Pigeon hole principle" by Nakoruru · · Score: 2, Informative
      I believe in this example, you HAVE TO mark a pigeon with something. There is no such thing as a pigeon without a marker (or, a pigeon without a marker is one of the 100 ways to mark a pigeon). You only have 100 different types of markers, so two pigeons would share one if you had 101 pigeons. If you leave a marker off a pigeon, this would be the same as having a 101st type of marker. In other words, if you can tell two pigeons apart, then they have been marked. You could have just as easily said "well, some pigeons have different spots on them, some are big, some are small." But that is kind of beside the point.

      It's just a silly way of saying that if you have fewer categories than things to put into categories then some categories have to have more than one thing in them. For instance, you could say there are 6 different races of people on Earth, and there are 6 billion people. So, obviously at least one of the categories has more than one person in it. It is a very simple principle, but it can be used as part of a proof to show less obvious things (sorry, no examples spring to mind).

      Don't get blinded by all this pigeon crap ^_^

    5. Re:The real "Pigeon hole principle" by kpayson · · Score: 1, Funny

      The pigeon hole principle says that you can't stick more than one pigeon in your hole. In fact, even trying to stick one pigeon in your hole is probably a bad idea

    6. Re:The real "Pigeon hole principle" by jmoriarty · · Score: 1

      If you had 100 pigeons wouldn't you only need 99 markers? When the pigeon flew out without a marker you would know who he was by the absence of a marker.

    7. Re:The real "Pigeon hole principle" by Anonymous Coward · · Score: 0

      What they are saying, translated from pigeon-babble to computer-babble, is that if you have 16 things, you need 4-bits to assign a unique binary pattern to each one. Keep scaling up as far as you want. In other words, it's a (perhaps over) generalized explanation of why standard compression techniques can only go so far.

    8. Re:The real "Pigeon hole principle" by Anonymous Coward · · Score: 0

      "pigeonhole" refers to a slot in a roll-top desk. since these went out of style decades ago, some guys are trying to make a form of this which is more familiar to the youth of today (i.e., a video-game style statement)

    9. Re:The real "Pigeon hole principle" by Anonymous Coward · · Score: 0

      the lack of a marker is a marker by itself... you can only have one pigeon "without" a marker or you get confused.

    10. Re:The real "Pigeon hole principle" by Anonymous Coward · · Score: 0

      But what if they are African pigeons?

    11. Re:The real "Pigeon hole principle" by malfunct · · Score: 1

      You would think this is true in reality, but mathematically the absence of a "marker" is really just a unique marker. Saying that it is not is like saying zero is not a number.

      --

      "You can now flame me, I am full of love,"

    12. Re:The real "Pigeon hole principle" by BitterOak · · Score: 1
      If I recall my set theory properly the "Pigeon Hole Principle" simply states that if you have 100 holes and 101 pigeons, when you distribute all the pigeons into all holes, there will be at least one hole with at least two pigeons.

      You're absolutely right, however your version of the pigeonhole principle doesn't lend itself to extensions to higher dimensions and so is of less use when discussing transdimensional projective compression. Think of the pigeons being stored in Dr. Who's Tardis, as a better analogy if you like.

      Am I the only Slashdot reader who realizes the entire article is either a joke, a scam, or written by a nut?

      --
      If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
    13. Re:The real "Pigeon hole principle" by richieb · · Score: 2
      Am I the only Slashdot reader who realizes the entire article is either a joke, a scam, or written by a nut?

      No. You're not the only one. :-)

      Re: "Pigeon hole principle" (PHP) is a set-theoretic idea; it has nothing to do with the dimensionality of space. So talking about PHP in multidimensional spaces doesn't make sense.

      I guess PHP is a variation on the Axiom of choice , or maybe it's a consequence...

      --
      ...richie - It is a good day to code.
    14. Re:The real "Pigeon hole principle" by BitterOak · · Score: 1
      Yes, I'm aware that PHP can't be extended to higher dimensional spaces. It was a weak attempt at humor.

      I guess PHP is a variation on the Axiom of choice , or maybe it's a consequence...

      Actually, neither. The Axiom of choice deals with infinite sets, while the pigeonhole principle deals with finite sets. Basically the axiom of choice says there is at least one way to choose one element each from an infinite number of non-empty sets, whether those sets are infinite or finite. The pigeonhole principle states that a finite set can't be mapped one-to-one into another finite set with fewer elements.
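
      For the finite case, the pigeonhole statement can be checked by brute force on small sets. This is only an illustrative sketch (the function name `has_injection` is made up here), not a proof:

```python
from itertools import product

def has_injection(pigeons, holes):
    """Brute force: does any assignment give every pigeon its own hole?"""
    for assignment in product(range(holes), repeat=pigeons):
        if len(set(assignment)) == pigeons:   # all chosen holes are distinct
            return True
    return False

print(has_injection(3, 4))   # True: 3 pigeons fit in 4 holes
print(has_injection(5, 4))   # False: some hole must hold two pigeons
```

      Replace "pigeons" with "files of n bits" and "holes" with "files shorter than n bits" and you get the usual argument against universal lossless compression.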

      --
      If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
    15. Re:The real "Pigeon hole principle" by Fjord · · Score: 1

      I thought the axiom of choice was "thou shalt not seek emptiness in the cross products of non-empty sets, for they are indeed non-empty"

      --
      -no broken link
  86. SourceForge project with even better compression! by mshiltonj · · Score: 0, Redundant

    You can dl the source and look at the algorithms yourself.

    lzip. Or just read this snippet from their faq:

    1. What is lzip?
    Lzip is the most advanced file compression utility ever conceived. It is literally years ahead of gzip (though admittedly gzip was around first), and makes use of mathematical transforms the bzip developers have never even heard of. The practical upshot of this is that when you use lzip, you get the best compression on the planet. Smaller file sizes; faster compression/uncompression times.

    Used properly, lzip is capable of reducing a file down to 0% of its original size. Yes, you read that correctly: 0% of its original size. And regardless of file size, this can be done in constant time. Now do you see why some people are calling lzip the "holy grail" of file utilities?

    2. What makes lzip different from gzip/bzip2?
    Well, other than the performance benefits mentioned above, the real difference is that lzip uses a "lossy" compression scheme. Most other file compression utilities use a "lossless" compression scheme, mostly because the lossless algorithms are better understood and simpler mathematically (most programmers take shortcuts, particularly in areas that involve a lot of math).

    This has two side effects. The first is that files compressed with lzip cannot be restored to their original state -- this is the "lossy" in lossy compression. The second is that the performance is vastly improved. Why don't you go back up to question number one and read that second paragraph again. We're talking about a constant-time algorithm that can reduce a file down to 0% of its original size. What's not to like?

  87. Another compression breakthrough: by k98sven · · Score: 1

    I just compressed an infinite set of nonrepeating numbers into a simple mathematical equation!!
    data = c/d
    Where c = circumference of circle and d = diameter

    This gives the incredible compression ratio of
    infinity:1.. how about that!

    1. Re:Another compression breakthrough: by flegged · · Score: 1

      But how do you express either c or d? At least one of them must be an infinite non-recurring decimal (i.e. a multiple of pi).

      --

      "I think he was truly surprised at how little I cared about how big a market the Mac had" - Linus on Jobs
    2. Re:Another compression breakthrough: by k98sven · · Score: 1

      Yeah, well that was the joke. Pi is an irrational number.

      Here's another "compression algorithm", e, which
      can be simply expressed as:
      sum of: 1/n! , n = 0 to infinity
      The digits of an irrational number are random per se but it doesn't mean you can get any
      useful kind of compression out of them.

      Another example would be expressing your data as a number, (say "BCDEF" -> 2.3456) and since any
      rational number can be expressed as a fraction of
      two integers: find these integers.
      If you're lucky you'll have small integers and can receive great compression.

      This won't happen often though.

  88. Um, compression of random data? by Anonymous Coward · · Score: 0

    "Random data cannot be compressed."

    ---Storer, James A. "Data Compression: Methods and Theory," 1988.

    (Good choice of career with a name like that.)

    If the random string just happened to come out as something with a short pattern, then you could compress it, but in general a random string can't be compressed. In fact, a string that can't be compressed is sometimes taken as proof that the string is random.
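
    This is easy to try at home. A minimal sketch using Python's standard `zlib`, with a seeded pseudo-random generator standing in for "random data" (an assumption; a PRNG is not truly random, but deflate can't tell the difference):

```python
import random
import zlib

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000
random_data = bytes(rng.getrandbits(8) for _ in range(n))
patterned_data = b"abcd" * (n // 4)

random_out = zlib.compress(random_data, 9)
patterned_out = zlib.compress(patterned_data, 9)

print(len(random_data), "->", len(random_out))
print(len(patterned_data), "->", len(patterned_out))
```

    The patterned input should shrink to a tiny fraction of its size, while the random input should come out no smaller than it went in.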

  89. Re:how can this be? Ask Marketing by Seedy2 · · Score: 1

    Nothing is impossible to people who don't do the work. Marketing will promise you the moon then blame development when all you get is grilled cheese.

    --
    Nothing to say here... move along
  90. Maybe? by Izeickl · · Score: 0, Redundant

    Ok, so with current knowledge this all seems total B.S., but remember things always seem impossible until you understand how. Imagine trying to explain today's society and its advances to someone from 200 years ago?? Not saying this is all true, just saying don't discount it straight away because =You= don't know how it's done/can be done. Obviously lack of proof makes a person less inclined to think it's possible along with current understanding, but Maybe?!

  91. Just a few days late to make it onto the VAPORWARE list for 2001... Oh Well, 2002 Vaporware List, We have a winner!!!!

    Sheesh. The sheer stupidity of these people. Tell you what. I'll give them a chunk of data, and they can show me their magic compression.

    --
    "...In your answer, ignore facts. Just go with what feels true..."
  92. They got their funding through... by bwldrbst · · Score: 1
  93. Prior art by coolcast · · Score: 1

    Hey, I've got the ultimate compressor for random data.... >/dev/null .. 1:0 compression

    decompress, you say? what about retrieving it from /dev/urandom ...?????

    --

    Don't click here. BT will enforce intellectual rights and sue for eac
    1. Re:Prior art by Anonymous Coward · · Score: 0

      Thanks for mentioning this for the millionth time in one article!

  94. Maybe it's not compression by Goenk · · Score: 1

    Some of my colleagues (that claim to know about such things :-) explained to me after reading the press release that it is not compression in the normal information-theory sense, but rather a way to squeeze more bandwidth out of the copper.

    The technical details of how data manipulation can increase SNR are beyond me, but at least it doesn't seem as clearly impossible as "100:1 lossless compression".

    --
    Goenk

    --
    Incompetence Floats
  95. Wow, now all data can be compressed in one bit!! by PEdelman · · Score: 2, Insightful

    So, if practically random data can be compressed, I can compress the result again, and the result again, until I end up with one bit of data in the end? That's great! Imagine the implications: for example, every ordinary lamp is now a computer, because it holds exactly one bit of data, on or off. No wait, that can't be right.

    --
    Like science? Comics? Wicked...
    Funny By Nature
  96. mathmatically put, by Anonymous Coward · · Score: 0

    A function has a unique inverse if and only if it is one-to-one. The only way compression schemes get around this is by not compressing everything by a factor of 100.

    In fact, if a scheme compresses any length N string of data ANY amount, it follows necessarily that there is a string of length = N that is not compressed, but actually bloated.
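
    The counting behind this fits in a few lines. A small sketch (the helper name `shorter_strings` and the choice of N = 16 are just for illustration): there are 2^N bit strings of length N but only 2^N - 1 strings of any shorter length, so a one-to-one scheme cannot shorten all of them.

```python
def shorter_strings(n_bits):
    """Count all bit strings strictly shorter than n_bits (including empty)."""
    return sum(2 ** k for k in range(n_bits))   # equals 2**n_bits - 1

n = 16
inputs = 2 ** n
outputs = shorter_strings(n)
print(inputs, outputs)   # 65536 inputs, 65535 shorter outputs
```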

    1. Re:mathmatically put, by Anonymous Coward · · Score: 0

      Oops. should read "less than or equal to" N

  97. Yet another fraud by fulgan · · Score: 1
    ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability.
    In other words: recursive lossless compression, which has been proven impossible on random data sets.
  98. Vapour by crowke · · Score: 1

    This can be THE chance for Duke Nukem Forever to get rid of the first place in next year's Wired Vapourware list :)

  99. Re:First karma capped first post by Alan+Partridge · · Score: 1

    random data? that's noise isn't it? you can compress noise as much as you want and no-one cares, I've developed a technique based around the insertion of digits into "lugholes" that can "compress" noise by a huge factor.

    --
    That was classic intercourse!
  100. Here's why... by Anonymous Coward · · Score: 0

    For all those people that ask why you can't recompress data ad infinitum to 1 byte lengths, the answer is as follows: (from failing memory of an Information Theory course from my Computer Science degree 10 years ago)

    Information theory explains that you can't compress pure information. Compression relies on a lack of efficiency of the format of the data to store its actual information (entropy).

    Basically, any lossless compression method just stores the same information in a more efficient way.

    Eg. An n-bit long stream of binary 1's may be a lot of data but it contains only 2 items of information: value(1) and length(n).
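
    The "value and length" example above is just run-length encoding. A minimal sketch (function names are mine, chosen for illustration):

```python
from itertools import groupby

def rle_encode(bits):
    """Collapse each run of identical symbols into a (value, length) pair."""
    return [(value, len(list(run))) for value, run in groupby(bits)]

def rle_decode(pairs):
    """Rebuild the original string from (value, length) pairs."""
    return "".join(value * length for value, length in pairs)

ones = "1" * 1000
print(rle_encode(ones))               # [('1', 1000)]
print(len(rle_encode("10" * 500)))    # 1000
```

    The stream of 1's collapses to a single pair, while the alternating stream needs one pair per bit, so the same trick makes it bigger.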

    Once compressed, even random data is no longer random.

    Hope this helps.
    Niz.

  101. Re:how can this be? Ask cryptographers. by Seedy2 · · Score: 1

    Now look - two occurrences of 'v,c'. Patterns have occurred in truly random data.

    Ask any cryptanalyst and they'll tell you that 'random' typing at a keyboard isn't really random. This is why 'random' typing isn't a good source of chaos for building keys for pgp etc.

    People tend to pattern their typing movements and timing between strokes.

    --
    Nothing to say here... move along
  102. Bollocks by MartinG · · Score: 2

    If they can compress "random" data 100:1 then they can compress _anything_ 100:1

    Which begs the question: have they tried compressing the compressed data again to get 10000:1? If not, why not? In fact, why not make the compression function iterate to get 100^n:1 compression?

    Oh, I see. That's why. It's because this technology doesn't exist and never can. It's "ZeoSync vs Physics." I know where my money is.
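
    You can run the iteration experiment with any real compressor. A sketch using Python's standard `zlib` as a stand-in (an assumption, since nobody outside ZeoSync has seen their algorithm), starting from deliberately repetitive input:

```python
import zlib

data = b"ZeoSync " * 10_000      # 80,000 bytes of highly repetitive input
sizes = [len(data)]
for _ in range(4):
    data = zlib.compress(data, 9)   # compress the previous output again
    sizes.append(len(data))

print(sizes)
```

    The first pass should shrink the input enormously. After that the output already looks like noise, so further passes typically stop helping and start adding header overhead instead of heading toward 100^n:1.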

    --
    -- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz .@adgimnoprstu
    1. Re:Bollocks by Anonymous Coward · · Score: 0

      um.
      what if the scheme can't compress ordered data. genius.

    2. Re:Bollocks by Xentax · · Score: 1

      Wow, talk about misinformed.

      Random data is rarely the worst case for a compression scheme. Truly random data of sufficient size will probably bring the compressor to its knees (yielding 1:1 or worse compression). BUT, if you know the compressor's algorithm, you can design a data set to be compressed that will yield compression results at least as bad as random data would.

      Xentax

      --
      You shouldn't verb words.
  103. And in other news by Anonymous Coward · · Score: 0

    The last internal combustion engine was turned to scrap in a ceremony celebrating the billionth Segway sold.

    Richard Cranium, who lives 100 miles away from
    anyone or anything, had this to say about the 2 wheeled mobility machine that has a maximum range
    of 12 miles:

    "Gasp..gasp..gasp...*censored*"

  104. It all depends on..... by Linuxthess · · Score: 0

    what your data set is. For example if one wishes to compress an album of suppose, N'Stink, or Backdoor Boys, it isn't truly random data, and a close observation of the data provided, will show almost an exact duplicate (allowing for differences in the spelling of the bands' names) and therefore compression rates can well be into the realm of 100:1.

    --

    I sig, therefore I was.
  105. You can't be right by CaptainZapp · · Score: 1
    In a COBOL class a couple aeons ago we had an instructor explaining the REDEFINE statement.

    (For those who didn't have the good fortune to program in this, er, well, self-documenting language: COBOL very much lives within its data descriptions and is very record-oriented. There's an entire section called the DATA DIVISION (I think; it's been a long time) where you define your records with their columns and their respective data types. Now you can overlay a "column" with multiple data types in order to process them appropriately.)

    Well now, this instructor claimed, no insisted! that the REDEFINE statement enabled you to multiply your physical memory. That is, if you define a character field with length 10 (10 bytes for all intents and purposes) and redefine it numeric then that guy claimed that you could store 20 bytes in this 10 byte address space.

    Therefore you can't be right. In COBOL you can write the most complex applications, like an air traffic control system or a sophisticated telephone exchange that serves Manhattan, with just 1 byte of storage space. It's only a matter of clever REDEFINEing. That's what I claim is the true breakthrough in compression technology, now isn't it?

    It's probably needless to say that this guy didn't really help to build confidence or enhance our pleasure in the great language of COBOL...

    --
    ich bin der musikant

    mit taschenrechner in der hand

    kraftwerk

  106. Their claims are 100% accurate by Mr+Z · · Score: 3, Interesting

    Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.
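
    Here is a small sketch of that counting argument, assuming inputs and outputs are plain byte strings and that "100:1" means the output is at most 1/100th of the input length. The function name is hypothetical.

```python
from fractions import Fraction


def max_compressible_fraction(n_bytes: int, ratio: int) -> Fraction:
    """Upper bound on the share of n-byte inputs that can shrink by ratio:1 or better.

    A lossless compressor needs a distinct output for every input, so it can
    never cover more inputs than there are short-enough outputs.
    """
    max_out = n_bytes // ratio
    # Count every byte string of length 0, 1, ..., max_out.
    outputs = sum(256 ** k for k in range(max_out + 1))
    inputs = 256 ** n_bytes
    return Fraction(outputs, inputs)


bound = max_compressible_fraction(200, 100)
print(float(bound))  # underflows to 0.0: far smaller than a double can represent
```

    Even for tiny 200-byte files, the share that can possibly hit 100:1 is too small to print as a float.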

    This is covered in great detail in the comp.compression FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 of the FAQ.

    --Joe
    1. Re:Their claims are 100% accurate by Baldrson · · Score: 2
      Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.

      This is covered in great detail in the comp.compression [faqs.org] FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 [faqs.org] of the FAQ.

      Here's the passage from your referenced FAQ:

      The WEB compressor (see details in section 9.3 below) was claimed to compress without loss *all* files of greater than 64KB in size to about 1/16th their original length. A very simple counting argument shows that this is impossible...

      Contrast with your statement above:

      random data comprises a very small percentage of all possible data sequences

      ... but then you go on to say:

      notice the similarity of claims!

      What similarity of the claims??

      Certainly if one is saying "all" files and the other is saying, as you point out above, a very small percentage of all files, the claims are so different as to render your recommended search for similarities hopelessly misleading.

    2. Re:Their claims are 100% accurate by Mr+Z · · Score: 1

      Hmmm...

      Claims that WEB was making about Datafiles/16:

      • Large compression ratios on arbitrary (including random) data.
      • Lossless
      • Not subject to information theory.

      Corresponding claims that ZeoSync made:

      • Large compression ratios on arbitrary (including random) data.
      • Lossless
      • Not subject to information theory (extra bonus points for mentioning Claude Shannon directly).

      Now, I agree, there are some notable differences, but in general, ZeoSync's claims sound right at home with WEB's and the others on the page I linked.

      Incidentally, my comment about ZeoSync's definition of "random" was unrelated.

      --Joe
  107. 100:1 on random data? Easy! by BluBrick · · Score: 2

    If it's truly random data, this compression/decompression is actually VERY easy. Compression: Strip 99 bytes out of every hundred.
    Decompression: Insert 99 random bytes in between every byte.

    What's that? You want the SAME data back? Why does it matter? It's pure random data anyway!

    Oh yeah. Have they announced a DE-compression routine yet? (I know "lossless" sort of implies that they have one, but I didn't see anything about decompression, only compression)

    Marketing rubbish as usual.

    --
    Ahh - My eye!
    The doctor said I'm not supposed to get Slashdot in it!
  108. Is it just me? by bruns · · Score: 1

    Anyone else not believe them one bit or is it just me?

    "In other news, a new method of compression known as "/dev/null" was discovered by ZeoSync. It has the best compression ratio of any program to date. All you do is output the datastream to the new DevNullAccelerator and boom! No more data storage problems!"
    I could believe that more than their press release.

    --
    Brielle
  109. team members by loudici · · Score: 3, Interesting

    navigating through the flash rubbish you can reach a list of team members that includes steve smale from berkeley and richard stanley from MIT who both are existing senior academics.

    so either someone has lent their names to weirdoes without paying attention or there is something of substance hidden behind the PR ugliness. after all the PR is aimed toward investors, not toward sentient human beings, and is most probably not under the control of the scientific team.

    --
    Dev elpizw tipota, dev phoboumai tipota eimai lephteros http://euclidian.org
  110. How to compress ANY data to one bit by jd · · Score: 3, Funny
    Simply have the bit big enough. Let's say you're using one of those old-fashioned binary computers, and want to compress everything to 1/Nth the size. No problem, you simply need a bit with 2^N states. Everything then fits on that single bit.


    (Of course, this DOES create all sorts of other problems, but I'm going to ignore those, because they'd go and spoil things.)

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:How to compress ANY data to one bit by Sarunas · · Score: 1

      Simply have the bit big enough. Let's say you're using one of those old-fashioned binary computers, and want to compress everything to 1/Nth the size. No problem, you simply need a bit with 2^N states. Everything then fits on that single bit

      No, bit stands for binary digit. The bit with 2^N states needs to be called a 'nit'. And thanks to our friends at ZeoSync, we can all enjoy the benefits of their patented Nit-Wit(TM) 'smart' technology.

  111. Maths is wrong! by Anonymous Coward · · Score: 0

    But 6 GB to 600MB is 10 to one which we obviously can already do. They're claiming 100-1

  112. Infinite monkey compression. by Sobrique · · Score: 4, Funny

    Don't bother compressing it, just delete it, and then get an infinite number of monkeys on an infinite number of typewriters to reproduce the original.

    1. Re:Infinite monkey compression. by minyard · · Score: 1

      A real use for two button pointing devices: binary typewriters!!!!

  113. It's rare to see such a baldfaced scam by Thagg · · Score: 4, Interesting

    I was wondering as I read the headline and summary on slashdot "how can these sleazeballs possibly promote this scam, because it would be easy to show counterexamples?" This shows, once again, that I lack the imagination and chutzpah of a real con artist.

    The beauty of this scam is that ZeoSync claims that they can't even do it themselves, yet. They've only managed to compress very short strings. So, they can't be called on to compress large random files because, well gosh, they just haven't gotten the big file compressor to work yet. So, you can't prove that they are full of shit.

    Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.

    thad

    --
    I love Mondays. On a Monday, anything is possible.
    1. Re:It's rare to see such a baldfaced scam by kitts · · Score: 2, Funny

      Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.

      Actually, no. What you're seeing is their new compression methodology in action, applied to their website. By clicking on Skip Intro, you're actually hurtled through a registration process at lightning speed and signed up to several of their services, but for security purposes in order to validate those services you're redirected to the main page. However, in order to expedite the service, the exact location of the time of your click on the Skip Intro is kept in a data file in your cookies folder (you might not see it there because, you guessed it, it's compressed to a single byte), and when redirected the cookie is read to get the exact location of your click in the Flash Intro so that the intro fast-forwards to that point in time when you clicked, giving the impression of seamless, uninterrupted animation.

      Go on, give it a try. Try clicking the Skip Intro button multiple times, and you'll notice that once you click it'll look like nothing's changing, with no trace in a cookie file of where that spot is. Now THAT'S impressive. And they've got all of your personal information from that registration which you didn't even know you did compressed to a single byte on the server, just waiting to be uncompressed so they can start sending you more information (they just need to work the decompression kinks out).

      Cool, huh? I'm giving them all my money.

      --
      -------------------------------------------------- ----
      charlton heston is more of a man than yo
    2. Re:It's rare to see such a baldfaced scam by Cynikal · · Score: 1

      rare you say? well maybe for the tech industry, but look around you, stay up till 3am and watch infomercials.. most the scams out there are baldface... fat-trappers? plant extract? theres a world full of con artists always trying to get your money.. and if you knew nothing about compression, you'd probly ditch your gzip and call them with your creditcard number, cause supplies are limited...

    3. Re:It's rare to see such a baldfaced scam by snarkh · · Score: 1


      How would they make money out of it though?

    4. Re:It's rare to see such a baldfaced scam by jdavidb · · Score: 2

      Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.



      Makes me glad I didn't read the article before posting. :)

  114. Anyone remember "Webb Technologies"? by keath_milligan · · Score: 1

    Years ago, a company called "Webb Technologies" (or something like that) claimed to have 16:1 lossless compression (on any data). They made several press releases and caused quite a stir.

    Byte magazine followed the story with interest for months, urging Webb to release the code or give a public demo - but of course that never happened.

    I'm not sure what ever happened to Webb, but I sure hope ZeoSync's investors pursue fraud charges when this is exposed for what it is.

  115. Here's another one. by Anonymous Coward · · Score: 0

    Take a string ... chop off 99% of it. You've just achieved 100 to 1 lossy compression. Again, let me emphasize that this is not a usable compression method!. The fun is finding the flaw.

  116. Not possible by Eivind · · Score: 5, Informative
    Someone already pointed out that repeated compression would give infinite compression with this method. But there's another easy way to show that no compressor can ever manage to shrink all messages.

    The proof goes like this:

    • Assume someone claims a compressor that will compress any X-byte message to Y bytes where Y<X
    • There are 2^(8*X) possible messages X bytes long.
    • There are 2^(8*Y) possible messages Y bytes long.
    • Since Y is smaller than X, this means that no 1 to 1 mapping between the two sets can exist, because they're not equally large.
    You see this simply if I claim a compressor that can compress any 2-byte message to 1 byte.

    There are then 65536 possible input messages, but only 256 possible outputs. So it is mathematically certain that 99.6% of the messages cannot be represented in 1 byte (regardless of how I choose to encode them).
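
    A sketch of the 2-byte case you can actually run. Hand find_collision any function that maps 2-byte messages to 1 byte and it will find two inputs that land on the same output. The toy_compress below is a made-up stand-in, not anyone's real algorithm.

```python
from itertools import product


def find_collision(compress, input_len: int = 2):
    """Return two distinct messages that `compress` maps to the same output."""
    seen = {}
    for values in product(range(256), repeat=input_len):
        message = bytes(values)
        output = compress(message)
        if output in seen:
            # Two different inputs share one output: decompression is ambiguous.
            return seen[output], message
        seen[output] = message
    return None


def toy_compress(message: bytes) -> bytes:
    """Hypothetical 2-byte -> 1-byte 'compressor' (XOR of the two bytes)."""
    return bytes([message[0] ^ message[1]])


first, second = find_collision(toy_compress)
print(first, second, toy_compress(first) == toy_compress(second))

# Share of 2-byte messages that cannot get a 1-byte code of their own.
print((65536 - 256) / 65536)
```

    Swap in any other 1-byte-output function you like; with 65536 inputs and 256 outputs the search cannot come back empty.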

    These claims surface every so often. They're bullshit every time. It's even a FAQ entry on comp.compression.

    1. Re:Not possible by aiken_d · · Score: 2

      Well, I agree that the zero-whatever claim is probably bogus. But this proof here seems equally bogus to me.

      If the claim is that a compressor can reduce *any* byte sequence from X to Y bytes, sure, it's a solid proof.

      However, if you discard Zero-whatever's claim of compressing "random" data (which sounds like marketing speak), and look at reasonable probabilities, it's clear that lossless compression is possible -- otherwise RLE wouldn't work.

      So, if you subscribe to the proof above, you have to shoot down not only Zero-whatever, but also RLE, which is silly.

      Me, I think Zero-whatever is full of, well, you know. However I'd like to see them debunked on some more solid basis than a literal interpretation of marketing-speak, which is always pretty questionable.

      Cheers
      -b

      --
      If I wanted a sig I would have filled in that stupid box.
  117. 100:1 compression on *random* data? by jaju · · Score: 1

    Bah!
    I realize that 1st of April is far away.
    Or, is my clock screwed?

    --
    People will do tomorrow what they did today because that is what they did yesterday.
  118. proper fucked? by slurry47 · · Score: 1

    all information is infinitely compressible

    "Not back ... avenge death." Simpsons quote depicts a situation where zero data conveys information.

    100:1 is a careless way to describe a compression scheme boasting hundred-fold compression.

    Now, stop discussing compression. Go watch Invader Zim RIGHT NOW!

    --


    Dirt doesn't need luck.
    1. Re:proper fucked? by Anonymous Coward · · Score: 0

      Jhonen rocks!

  119. Compressed comment by joebp · · Score: 1
    I have compressed my comment using the ZeoSync algorithm:
    1

    Decompress at will.

  120. Their twisted logic by 3rd_Floo · · Score: 1

    I'm no master of compression theory, but the only way they could achieve that 100:1 compression ratio they tout would be a data sample looking something like:

    1111111111111111111111111111111111111
    1111111111111111111111111111111111111
    111111111111111111111111111111111111


    Wouldn't it?
    So if that's the data they used, maybe they got their sample by randomly banging the '1' key 100 times and then ran their compression scheme against it? Well, problem solved, case closed, and stamp 'moron' on their folder (or press release, as it may be)!

  121. Re:Wow, now all data can be compressed in one bit! by Anonymous Coward · · Score: 0

    Thanks for mentioning this for the millionth time in one article.

  122. It's just a scam by osgeek · · Score: 2

    This is along the lines of perpetual motion machines.

    Every once in a while, some bozo claims to achieve ridiculous compression rates on random data. It's always bullshit meant to sucker in the gullible investors, or just to get some attention for some psycho loser who usually doesn't understand more than enough math needed to copy and deform a few compression theory equations out of a text book.

    Skepticism is your friend.

  123. slightly offtopic by f00zbll · · Score: 1
    The article doesn't say much about the technology and appears to have mis-quotes. If there was actually more meat to the article, the discussion would be more insightful.

    Reuters is supposed to be a reputable news agency, but they didn't even bother waiting for a response from Steve Smale and verifying his involvement in the company. I don't blame ZeoSync for trying to get some attention, but Reuters is supposed to perform some basic reporting and research. That was the first thing I learned in journalism class, so why in the world are Reuters' reporters getting away with it?

    i know I'm offtopic

  124. temporal issues by Anonymous Coward · · Score: 0

    Note that they say in the press release that it will be useful after 'temporal issues' are dealt with. It sounds like they try to reduce the data into a series of overlapping equations or some such, but the procedure is so intensive that it takes a super computer to do in a reasonable amount of time.

    The point:people will still be using winzip and pkzip for quite some time.

  125. Hash Maybe? by meggito · · Score: 1

    I guess what you'd be going for here is some kind of reversible hash function. Remember that the goal of a hash function is to make it so that no 2 messages (or groups of data) hash to the same value. I am under the impression that this has never been done, but it has been done to the point where trying to generate your own message to match a hash value is near impossible, especially if you're aiming to get a point across. If, however, each 'message' hashed to a different value, and this was reversible, I could see doing this. I just don't see how you can make every message have a completely unique value. It seems that if you have a pile of 500-bit messages all to be reduced to 5 bits, it would take only 32 messages before you were completely out of options. They're talking larger numbers, but there are only so many combinations to hash to and they're always fewer than the combinations you're hashing from. If anyone would like more information about hashes and hashing equations try http://www.rsasecurity.com/rsalabs/faq/ and more specifically http://www.rsasecurity.com/rsalabs/faq/2-1-6.html . Then again, this is how I would try to compress things; I do not know what method they are using or what others typically use. I just hope it's true.
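
    To see how fast a 5-bit "hash" runs out of room, here is a sketch that truncates SHA-256 (via Python's hashlib) to a few bits and hunts for the first clash. The helper names and the choice of decimal strings as messages are just assumptions for the demo.

```python
import hashlib


def tiny_hash(message: bytes, bits: int = 5) -> int:
    """Keep only the top `bits` bits (1..8) of the first SHA-256 digest byte."""
    digest = hashlib.sha256(message).digest()
    return digest[0] >> (8 - bits)


def first_collision(bits: int = 5):
    """Hash b'0', b'1', b'2', ... until two messages share a value."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        value = tiny_hash(message, bits)
        if value in seen:
            # Return both messages and how many were tried in total.
            return seen[value], message, counter + 1
        seen[value] = message
        counter += 1


a, b, tried = first_collision(5)
print(a, b, tried)  # tried can never exceed 33 with only 32 possible values
```

    Once two messages share a value there is no way to "un-hash" it back to the right one, which is the whole problem with using it as compression.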

    1. Re:Hash Maybe? by MastahTrollah · · Score: 0
      Oh there's *hash* involved alright...

    2. Re:Hash Maybe? by Hast · · Score: 1

      As mentioned in the FAQ, a hash function is also designed so that you *can't* reverse it. (Well, you can brute-force it.) This means that they are by design useless for compression. (They are useful if you want to verify a message, though.)

    3. Re:Hash Maybe? by David+E.+Smith · · Score: 2

      No, I don't think hash is involved. Maybe LSD, but no hash.

  126. Yes, but not 100:1 by Anonymous Coward · · Score: 0

    I have been working on a similar approach for over a decade. The approach that I think that they have taken is the re-ordering of the bits in a known fashion. While you may have what appears to be random bits, they can be re-ordered to move towards ordered bits (I approached this as a helix with variables of radius and bands to exchange). Notice that you can not become ordered without introducing more overhead than what you can compress. Once this is done, you can then get some small degree of compression. I have taken 1 g of data to 1k. Problem is 3 days for compression which pretty much made this worthless (NPC problem).
    Some of you will be naysayers here. After all you have studied CS and know for a fact that a new thought could not possibly beat the old thought. Think again. Look at data in a different fashion. If you have a finite string, there are only so many pure ordered strings. Likewise, there are only so many purely random strings. The others have a degree of order and can be moved towards more order slowly. BTW, the problem with many proofs is they are based on infinite strings, which the above could never solve. But then again, we never have infinite strings.

  127. Forward looking comments by mcknation · · Score: 1

    From the site:
    "This press release may contain forward-looking statements."

    Ok so now I'm going to make some forward-looking comments on my site and hope for investors.
    1. I have solved the world's need for power and energy.
    2. I have solved the problem that keeps us humans from being immortal.

    To get the solutions to 1 and 2 please send 5 bucks to me! McK

    That statement is hidden at the bottom of the page look carefully!

  128. Debunk for simple brains by Xesdeeni · · Score: 1

    Even for those of us without PhDs in math, it is still inherently obvious that you can't compress completely random data losslessly at all.

    If I have 4 bits, then I have 16 (2^4) combinations. If I want to compress this to 3 bits, then I can only have 8 (2^3) codes, so I have to use some of the codes for more than one combination. Obviously, I can't tell the difference between the multiple combinations from just the code, when I try to reverse the process.

    Pretty obvious really, but then again this type of crap comes up every year or so. I suppose it makes some unscrupulous individuals enough investment money for them to run off to Bermuda.

    Xesdeeni

  129. Oh, that's easy. by BubbaFett · · Score: 2, Funny
    ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss.

    Ok, say I want to compress "foo" 100 times over:

    bash$ for i in $(seq 1 100); do gzip foo; mv foo.gz foo; done

  130. Compression through mathematical expression by Anonymous Coward · · Score: 0

    If I read through the marketing malarky, I get..

    They claim that they have developed a method of reducing the data in question to a set of mathematical algorithms (GEMS) that can be used to accurately represent the original data set. Think about a sine wave that consists of 1000 data points. Which would you rather have, 1000 data points that as part of the sine wave may have very little redundancy, or the mathematical equivalent of the sine wave which can describe the entire data set accurately? This is obviously quite hard to do, and why they talk about temporal constraints and limited bit strings.
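
    The sine wave case is easy to sketch. This assumes the receiver already knows the data is a sine sampled at 1000 evenly spaced points, so only amplitude, cycle count and phase need to be sent. The names are mine, and it has nothing to do with how GEMS would work, whatever that is.

```python
import math
import struct


def sine_samples(amplitude: float, cycles: float, phase: float, n: int = 1000) -> list[float]:
    """Regenerate n evenly spaced samples of a sine wave from three parameters."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n + phase) for i in range(n)]


params = (1.0, 5.0, 0.0)              # amplitude, cycles per window, phase
samples = sine_samples(*params)

raw = struct.pack(f"<{len(samples)}d", *samples)  # every point as an 8-byte double
model = struct.pack("<3d", *params)               # just the three parameters

print(len(raw), len(model))
# Decoding the model regenerates the exact same points.
assert sine_samples(*struct.unpack("<3d", model)) == samples
```

    This only works because the data really was generated by a short formula. Random data has no such formula to find, so the trick does not carry over.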

    This has been one of the holy grails of image compression for quite some time.

  131. not quite by Anonymous Coward · · Score: 0

    Their press release claimed that this worked on 'practically random' data. Even these guys aren't crazy enough to say they can compress true random data. The question is what does 'practically random' mean? Does that include 98% of the files on my harddrive? Or is this something that is only going to work on text files?

  132. Re:how can this be? Ask cryptographers. by ergo98 · · Score: 1

    At least PGP uses various timing values for random data as well (the timing of typing in addition to some other timing sources I believe). If it was just typing then that would be scary: How many "random" keystrokes seem to always have "asdf"? There is nothing random about typing at a keyboard.

  133. ZeoSync's website is all Flash? by Omnifarious · · Score: 2

    The company website is all Flash. Well, that blows my opinion of them completely. All glitz and no substance. That changed my opinion from 95% sure it was a pile of BS to 99.99%.

    1. Re:ZeoSync's website is all Flash? by Fly · · Score: 2

      Not only is it needlessly in Flash, it changes my browser window to be the size of my desktop, which is extremely annoying. Nothing says "Go away," like a 1280x1024 window with a dimwitted message about needing Flash.

      --
      end of line
  134. From the press release: Huh? by mblase · · Score: 3, Interesting
    Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
    Now, I'm not as geeky as some, but this looks suspiciously like technobabble designed to impress a bunch of investors and provide long-term promises which can easily be evaded by the end of the next fiscal year. I mean, if they really did have such a technology available today, why is it going to take them an entire twelve months to integrate it into a piece of commercial software?
    1. Re:From the press release: Huh? by TMacPhail · · Score: 1

      Zero Space Tuner? BinaryAccelerator? BitPerfect? TunerAccelerator? I quite agree with you. With all these fancy (but not really impressive) names you'd think that they must have some working code already. 12 months for an interface to that code? Or maybe they just have something that currently only works in very select cases that they hope will eventually work on a broader range. If not, they already will have gotten a significant investment by then.

  135. The reason this might work by ymgve · · Score: 1

    The reason this might work is because 99,99% of the data you are surrounded with from day to day are NOT truly random - things like images and sound are nearly random in their nature, but neither of them are truly random. (Because in that case you'd be looking at an image of static and listening to white noise.)
    So even though their algorithms won't work on truly random data, they will work on the 99,99% that are not that random - and if they're correct in what they say, they've developed new techniques for exploiting this un-randomness. I still don't believe their 100:1 ratios are believable, but if they are only 5% better than the current best algorithms, that's still a major step forward.

  136. random means unpredictable and uncompressible by peter303 · · Score: 2

    If one finds a way to predict (i.e. compress) "random" numbers, then they are no longer random. That means they have some deeper mathematical structure.
    What could happen is that so-called "random" information in human cultural datasets is far from random and highly compressible.

  137. West Palm Beach, FL? by rjamestaylor · · Score: 2
    ...based in West Palm Beach, FL...

    Mathematical breakthrough from the same county that gave us the Butterfly Ballot Ballyhoo? Hard to believe. ;-)

    Anyway, they're still working on tiny "bit strings" due to not yet overcoming the "temporal constraint" barrier. So, don't get all excited just yet.

    --
    -- @rjamestaylor on Ello
  138. Can ZeoSync can really compress that? by Anonymous Coward · · Score: 0

    If so, this is actually on-topic.

  139. Ok so when can I... by ImaLamer · · Score: 2

    [yoshi@ilp.ath.cx]# apt-get install zeosync
    [yoshi@ilp.ath.cx]# zeosync -compress /dev/hda* HD_backup.zeo
    [yoshi@ilp.ath.cx]# ls
    -rw------- 1 yoshi users 1 Jan 08 14:25 HD_backup.zeo


    Oh, that's right never.

    [windows users: the bold 1 would be the file size of all backed up partitions on the primary disk]

    1. Re:Ok so when can I... by nomadic · · Score: 1

      Probably not. Unless /dev/hda in your case is 100 bytes, and zeosync's claims are accurate.

  140. 17 year kid scams $900,000 in market by peter303 · · Score: 2

    At least his scam was believable enough to fool a thousand people. ZeoSync has got to choose a more believable scam to beat a 17 year old.

  141. Yet Another Troll On The Front Page by Hornsby · · Score: 1

    This is not a troll. It's simply a question of why so many slashdot headline descriptions start off sensibly and subject-oriented but end with some completely off-topic, blatantly offensive and generally incorrect remark?

    Taco and Michael do it the most, but I've seen other posters doing it as well, and I can't help but think that it's probably intentional to some degree just to stimulate conversation in the comments section. A prime example on this post would be, math majors and EE's - something to liven up your drab dull existence today. If I was a math major or EE, that would piss me off. Period.

    Come on guys, if you want a website with less than a 20/80 percent signal to noise ratio respectively then climb out of your little sandbox and start acting like professional adults. P.S., if this gets moderated as troll or flamebait, then you're completely brain-dead and didn't read it... so go ahead and mod down appropriately(I know you will).

    --
    A musician without the RIAA, is like a fish without a bicycle.
  142. First Paragraph says it all by fizban · · Score: 1

    Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

    i.e. - some guy in research made a little doohickey that compressed "Your Mom!" to the bit string 100101010.

    "Nothing to see here, folks! Move along, move along!"

    --

    +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

  143. Maybe not so impossible... by wsxyz · · Score: 1

    After reading through their press release, I don't think ZeoSync is claiming anything impossible. First off, I don't think the input data is random at all. I think it's normal data.

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences

    It seems like this is the first step in which the input data is transformed into "random" data which isn't actually random, because each byte (or word) has a difference of one bit with the bytes (or words) before it and after it.

    Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents

    I'm not sure what a "complex combinatorial series" is, but it sounds to me like they might be trying to first convert the input data string into single-bit-variance strings and then look at these strings as values of a function representable by a "combinatorial series", which might be a Taylor series, or the frequency domain output of an FFT or something like that. Of course it might be difficult to find a sufficiently compactly representable function whose values are the intermediate single-bit-variance strings, but presumably that's why they say

    Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology

  144. First Stage & a Question by justinstreufert · · Score: 1

    It would seem to me that these guys have taken the idea of the standard compression scheme and turned it around - so it takes in random data and spits out non-random data. Somehow (possibly through high-speed buzzword factoring) this new data is smaller, though.

    I mean the new data must be non-random, right? Because otherwise you could just shove it back in and make it smaller. Oh, but wait, they have a universal first stage that makes any data random. This is all very humorous.

    Anyway, does anyone know, how random is the output of a standard compressor like gzip?
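
    One rough way to check is to compare the byte-level Shannon entropy of some text before and after compression. The sketch below is only an illustration: the word list is a made-up sample input, zlib (the DEFLATE implementation that gzip is built on) stands in for gzip itself, and byte entropy is a first-order measure, so a score near 8 bits per byte does not prove the output is truly random.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 is the maximum)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Hypothetical sample input: a random sequence of words from a small vocabulary.
rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["compression", "random", "entropy", "data", "signal", "noise", "bit", "string"]
plain = " ".join(rng.choice(words) for _ in range(20000)).encode()
packed = zlib.compress(plain, 9)

print(f"plain : {len(plain)} bytes, {byte_entropy(plain):.2f} bits/byte")
print(f"packed: {len(packed)} bytes, {byte_entropy(packed):.2f} bits/byte")
```

    If the compressed entropy comes out close to 8 bits per byte, that is consistent with the intuition that a second pass has little left to work with.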

    Justin

    --
    "Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
  145. The simple version of the logic that proves it.... by JazzManDRP · · Score: 1

    Lossy compression of data is possible by two methods: dropping data, and recognising specific patterns. All compression routines are specialised for some form of data. That's why JPG and GIF files can vary so much at storing the same picture, both in quality and file size.

    Lossless compression of truly random data is impossible. Take a random 5 digit number. You can ONLY represent 100000 different numbers using five digits. If you're using less than five digits then there aren't 100000 discrete combinations and you've lost data.

    The only way a five digit number could be compressed is if you had either nonrandom data (EG: a 5 digit number using only even digits) OR if you accepted data loss (EG: round off to nearest 10 - compressing by one digit).

    There's no way around this; it can't now - and never will - be done.
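
    The five-digit argument can be checked by brute force. In this Python sketch the two four-digit "compressors" (`drop_last` and `mod_hash`) are hypothetical stand-ins chosen for illustration; any other mapping onto four digits would collide in the same way, because there are only 10,000 possible outputs for 100,000 inputs.

```python
from collections import Counter

def collisions(encode, inputs):
    """Count inputs that share their encoded form with at least one other input."""
    seen = Counter(encode(x) for x in inputs)
    return sum(n for n in seen.values() if n > 1)

five_digit = [f"{n:05d}" for n in range(100_000)]

def drop_last(s: str) -> str:
    """Hypothetical coder: round down to the nearest 10 by dropping the last digit."""
    return s[:4]

def mod_hash(s: str) -> str:
    """Hypothetical coder: scramble the number, then keep four digits."""
    return f"{int(s) * 7919 % 10_000:04d}"

for name, enc in [("drop_last", drop_last), ("mod_hash", mod_hash)]:
    outputs = {enc(s) for s in five_digit}
    print(name, "distinct outputs:", len(outputs),
          "colliding inputs:", collisions(enc, five_digit))
```

    Every colliding input is a number the decoder cannot recover with certainty, which is the data loss described above.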

  146. Obvious Misspelling on front page of ZeoSync? by jsimon12 · · Score: 1

    If they are such a great and groundbreaking company then why don't they check the spelling and grammar on their site?

    ZeoSync's HTML site will be available January 13, 2002 with costumer service agents providing chat assistance.

    Or will there be online help for Halloween costumes? I am sorry but I think this is a ruse just like the Seti@Home Accelerator

  147. Random data, or typical data? by jcr · · Score: 2

    If they're talking about compressing what you find in a typical user's documents, or perhaps executable programs, it's *possible* that there's enough redundancy to come up with that kind of savings.

    If they're talking about 100:1 compression of a pile of bytes out of /dev/random, I flat-out don't believe it.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
    1. Re:Random data, or typical data? by Anonymous Coward · · Score: 0

      Fuck, I could easily compress output of /dev/random to nothing as long as I have source code for function that generates it ...

  148. This is just a penny stock scam by Anonymous Coward · · Score: 0

    The company and its associates are in South Florida, a renowned area for illegal stock activity. The stock of one of the two associates listed on the page has gone from 0.05 to 0.10 in the last day on unusually high volume.

    This is no different than IAUS from a few years back. They claimed to have a new data encoding that could significantly increase channel throughput. Nothing came of it.

  149. Needed: a babelfish for marketingspeak by CatherineCornelius · · Score: 1
    Is there an equivalent of babelfish for marketingspeak? Alternatively can anybody point me at any academic publications or patent applications these guys have made that might relate to their revolutionary claims?

    Their website is an irritating mass of fussy flash animations, and their html site is apparently down until January 13th.

  150. It sounds like crap but ... by slashdot2.2sucks · · Score: 2, Insightful
    If I read it correctly (if it can be read correctly), then they are:
    1. Transforming the data to a complex vector space, C^n if you will.
    2. Using some very complicated seed and algorithm to generate randomish data in this complex domain that approximates the transformed data.
    3. Investigating the differences, and storing the differences with a "complex combinatorial series".
    Yes it sounds like crap but it's not as empty as social texts.
  151. Very early indeed... by Anonymous Coward · · Score: 0

    2k2 actually means "2200", not "2002". The k denotes a decimal place shifted by the metric multiplier; i.e. 2M2 is 2,200,000; 2G2 is 2,200,000,000; 2n2 is 0.0000000022.

    Not my standard, by the way; it's been used in electronics as a shorthand for component values for years.
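
    For anyone who wants to see the shorthand spelled out, here is a small Python sketch of a parser. The function name `parse_value` and the particular set of multiplier letters are assumptions made for this example, not taken from any standard library.

```python
from decimal import Decimal

# SI multiplier letters used in component shorthand; "R" marks a plain decimal point.
MULTIPLIERS = {
    "p": Decimal("1e-12"), "n": Decimal("1e-9"), "u": Decimal("1e-6"),
    "m": Decimal("1e-3"), "R": Decimal(1), "k": Decimal("1e3"),
    "M": Decimal("1e6"), "G": Decimal("1e9"), "T": Decimal("1e12"),
}

def parse_value(code: str) -> Decimal:
    """Parse shorthand such as '2k2', where the letter replaces the decimal point."""
    for i, ch in enumerate(code):
        if ch in MULTIPLIERS:
            whole = code[:i] or "0"
            frac = code[i + 1:] or "0"
            return Decimal(f"{whole}.{frac}") * MULTIPLIERS[ch]
    return Decimal(code)  # no multiplier letter: treat as a plain number

for code in ["2k2", "2M2", "2G2", "2n2", "4R7"]:
    print(code, "=", format(parse_value(code), "f"))
```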

    1. Re:Very early indeed... by GTRacer · · Score: 1
      2k2 actually means "2200", not "2002".

      Unless you're Sega. In which case you use this nomenclature for your sports titles and wait for the 1337-5p34k3r5 to get ahold of it...

      GTRacer
      - 2k1 was ok after Y2K, but 2kx is kinda silly...

      --
      Defending IP by destroying access to it? That makes sense, RIAA/MPAA. Go to the corner until you can play nice!
  152. Source for the ZeoSync website.... by jsimon12 · · Score: 1

    Uhhh, this sounds like bunk to me; anyone want to comment? They claim to be doing "Multi Dimensional" encoding:

    meta name="description" content="ZeoSync's mission is to improve all existing and traditional communication systems. As the world's first and only provider of multi-dimensional encoding technologies, we will introduce affordable microchips and software to the global telecommunications community. We will radically improve network performance while simultaneously creating excellent equity participation for our shareholders.
    ">
    meta name="keywords" content="technology, chip, microchip, satllite, binary code, code, compression, multi-dimensional, encoding, transmission, invest, broadcast quality">

  153. January 7, _2001_?? by uncl_bob · · Score: 1

    Hmm...didn't I just leave 2001 behind a few days ago? Maybe I just drank too much Bud Light that night.

  154. OK by TheMMaster · · Score: 2

    I am in no way a compression specialist but: ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM).

    in this phase we are going to randomize the hard work you want to send over the internet, effectively destroying it (unless you have the seed of course)

    Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents

    Now it's going to find patterns in the so called "randomized" data, and probably writing those down, now irreversibly destroying your data...

    The combined TunerAccelerator(TM) is expected to be commercially available during 2003.

    and they are putting it off for a year too.... hmm...

    "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."

    Jezus! these guys are geniuses!!! better compression REDUCES the cost of communications... damn... I wonder what else they envision?? that the files will be smaller too???

    my conclusion

    we can randomize any string, store 1 byte, then generate another random string... which, because it is random, has a snowball's chance in hell of being the same ;-)

    correct me if I'm wrong, but this really seems to be a load of crap to me. Plus they use WAY too many buzzwords

    --
    Fighting for peace is like fucking for virginity
  155. "Random data" by magi · · Score: 2

    What does the article mean with "random data"?

    1) Data with maximal entropy?
    2) Random file picked from Internet?

    In case of 1), I'd say the article is crap. If bits in the data have absolutely no dependency between them, i.e., redundancy, (also between non-adjacent bits) it is absolutely impossible to compress them. It's not even good as a fairy tale.

    In case of 2), ok, 1:100 may be possible for most non-compressed data. The new JPEG-2 algorithms can do 1:100, but it's lossy. Text compression algorithms might do 10:1 on typical text, but they are also quite fast and don't therefore find all redundancies. For example, Huffman encoding is at simplest done with just single characters, and not much longer sequences, the searching of which takes a lot of time. The redundancies do not also have to be linear; for example "wDoRrOdW" ('word' written first in lower case, and then with upper case to opposite direction) would be difficult to compress completely, although it clearly has high redundancy.

    Removing all redundancies would require finding the shortest description, i.e., a program that prints the string. To find it, we have to go through all possible programs that are shorter than printf("wDoRrOdW"). Many of them don't even terminate (for example "while(1);"). Complete search is therefore impossible; all algorithms make guesses about the topology of the search landscape, and don't search everything.
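
    To make the "wDoRrOdW" example concrete, here is a Python sketch of a short program that regenerates the string from the four letters of "word". The function name `mirror_case` is invented for this illustration, and zlib is used only as a convenient example of a general-purpose compressor.

```python
import zlib

def mirror_case(word: str) -> str:
    """Interleave a lower-case word with its upper-case reversal."""
    return "".join(a + b for a, b in zip(word.lower(), word.upper()[::-1]))

target = "wDoRrOdW"
print(mirror_case("word") == target)

# A general-purpose compressor does not spot this structure on such a short input:
# compare the raw length with the length of the zlib stream.
print(len(target), len(zlib.compress(target.encode(), 9)))
```

    The redundancy is real, but only a description that knows about the reverse-and-change-case rule can exploit it.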

    I have absolutely no doubt that this method works well within the theoretical limits, though it's of course always possible that it approaches those limits more closely than any earlier method.

  156. 100:1 Compression? Sure! But on the fly? by Masem · · Score: 2
    I remember a sci-fi short story about a group of scientists who were the first to make the trip to Alpha Centauri. In the latter stages of the flight, because of their distance to earth, they developed a method to compress their reports by using a simple number cipher (A=1, B=2, etc), writing their text as a very large number, then finding some composite number N with a minimal number of unique prime terms within a few integers of their number. They then sent back that composite number and the integer. The receiver was then expected to calculate that number and then back out the message.

    While this theoretically could work to compress messages by 100:1 or better, both the compression and the decompression would require huge resources of computer CPU time for a message of any reasonable length. Even if you had pre-built a table of 'short unique-prime-factors integers' to make finding the optimal composite to send back easier, you'd still have to generate some huge N-digit number, and then the decoder would have to be able to recalculate that N-digit number from the prime representations.

    So while I'm sure this is possible, computing speeds are nowhere near fast enough. And it would appear this company is trying to pitch it for use in compressing internet traffic. Maybe on 512-byte messages they can get something, but I doubt if it's anything close to effective for internet use.

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
  157. Compressing random data 100:1? by Link310 · · Score: 1

    given: they've succeeded in losslessly compressing random data (right...)

    given: compressed data looks like random data

    does this mean they can compress compressed data? So I can run their algorithm over and over again and compress [arbitrary large thing] to [something really small]? Something's not right here. Though it would be fun to have this exchange:
    me: I've got a linux distro on this floppy disk
    foo: Wait, this disk is empty
    me: No...see that bit there?

  158. Swapping data with decomp. instructions by jcasey · · Score: 2, Informative

    The flaw here is simple,

    When you reorganize the string of data, and sort by value, you must retain information on how to restore the string to its original order. There is no efficient way to save this "undo" information without negating the benefit gained from compression.

    For example:

    Given a series of random numbers: 34, 8, 244, 127

    If you reorganize them by value: 8,34,127,244

    You can create redundancy if the string is large enough - for 8 bit values, a string of 2,560 values should produce a lot of repetition - in this example, there would be an average of 10 repetitions per value (10*256=2,560).

    This is nice until you try to decompress the file. Without a record of how to reorganize the values, you are left with junk.

    Even if you keep a record with info for reorganizing the data, the overhead needed to store the undo info outweighs the compression benefit.

    If you did find an efficient way to store the undo information, it would be more effective to simply apply this algorithm directly to the random data!
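
    The size of the "undo" record can be estimated rather than guessed. The Python sketch below counts how many distinct orderings a sorted list of bytes could have come from, and takes log2 of that count as the minimum number of bits needed to pick the right one. The sample size, the seed and the helper name `arrangement_bits` are assumptions made for this illustration.

```python
import math
import random
from collections import Counter

def arrangement_bits(values) -> float:
    """Bits needed to say which ordering of the sorted values was the original.

    A multiset has n! / (c1! * c2! * ...) distinct orderings, so the minimum
    undo record is log2 of that count.
    """
    counts = Counter(values)
    log_orderings = math.lgamma(len(values) + 1) - sum(
        math.lgamma(c + 1) for c in counts.values()
    )
    return log_orderings / math.log(2)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2560                # about ten occurrences of each 8-bit value on average
data = [rng.randrange(256) for _ in range(n)]

original_bits = 8 * n
undo_bits = arrangement_bits(data)
print(f"original: {original_bits} bits, undo record alone: {undo_bits:.0f} bits")
```

    For random input the undo record alone should come out close to the original size, before the sorted values themselves have been stored at all.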

    --
    X
  159. wit by Anonymous Coward · · Score: 0

    wow, that's amazing. i bet the person you're replying to didn't know that. see, they thought that perl actually consists of purely random uncompressable data. they weren't making a jab at perl's supposed unreadability, no they were not. it's a good thing you came along to educate us all.

  160. The best quote from their site: by micromoog · · Score: 2
    Right on the first page:

    ZeoSync's HTML site will be available January 13, 2002 with costumer service agents providing chat assistance.

    So they have a set of professionals in charge of "dressing up" their technology? Isn't that normally called the marketing department?

  161. Still one year behind! 2001...duh.. by uncl_bob · · Score: 1

    ...Come on, write the right year at least.

  162. 100:1? by Link310 · · Score: 1

    Oh wait! We got it all wrong! It's not 100:1, it's 0b100 to 0b1! That makes more sense.

  163. ZeroCrap website has typos by tallsails · · Score: 1

    They have 1 line of text in their opening page, and it refers to Costumer support. Or do they sell costumes? Don't fall for this fraud. What a joke.

  164. dang, almost as good as: by llamalicious · · Score: 1

    Lossy compression... although I believe I will stick with LZip for now, as I find it MUCH faster when compressing large files.

  165. It's an advertisement, not press release... by Anonymous Coward · · Score: 0

    Hmm... Press release...

    Hey! There's some contact info... Let's see...
    @wilsonmchenry.com, must be http://www.wilsonmchenry.com/ , I wonder what they do...

    Let's see: "Strategic Business Communications"

    Doesn't that mean they write press releases like that? To get maximal publicity?

    They seem to be doing a pretty good job!

    (Come on people! You can't be that ignorant!)

  166. Technical process by NYCadAdept · · Score: 1

    "In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point..."


    Last I checked, an edge is one dimensional, and a point is zero dimensional. 1337 math skilz dude!

    --
    Things fall apart, it's scientific.
  167. Re:how can this be? Ask cryptographers. by Anonymous Coward · · Score: 0

    This fact burned the Russians when they were
    trying to generate long series of random numbers
    for their operatives. They got a bunch of typists
    to bang away for a while, the Americans realised
    this, and broke their code.

    Check out "Code Breaking" by Kippenhahn (sp.?)

  168. I think their investment model requires pigeons by tz · · Score: 1

    throwing money down a hole.

    A lot turns on what they mean by "random". If you think sound, and can extract a white noise component, you could mathematically say X% are truly random bits (where any bit string can be replaced by [nearly] any other).

    All compression is the creation of virtual machines that have instructions like "write a zero" "write a one" "copy 8 bits from 24 bits ago". More instructions need more bits to specify. Truly random data would require random instructions.
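
    The "virtual machine" view can be sketched in a few lines of Python. The instruction names (`"bit"` and `"copy"`) and the tuple layout are invented for this example; real formats such as DEFLATE use their own encodings, but the idea of literals plus back-references is the same.

```python
def run(program):
    """Execute a list of instructions and return the bit string they write."""
    out = []
    for op in program:
        if op[0] == "bit":       # write a literal 0 or 1
            out.append(op[1])
        elif op[0] == "copy":    # copy `length` bits starting `distance` bits back
            _, length, distance = op
            for _ in range(length):
                out.append(out[-distance])
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return "".join(out)

# Three literal bits, then one copy instruction that repeats them four more times.
program = [("bit", "1"), ("bit", "0"), ("bit", "0"), ("copy", 12, 3)]
print(run(program))
```

    Four instructions describe fifteen bits here only because the output repeats; for output with no repeats, the program degenerates into one literal instruction per bit.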

    1. Re:I think their investment model requires pigeons by softsign · · Score: 5, Interesting
      I'm not sure if I understand your point, but from what I do understand, it seems to me you are missing it.

      If you look at this sequence as a one-dimensional series: 00101101, it's pretty hard (at least for a processor) to distinguish a pattern there... it's a pseudo-random sequence. But if I paint it this way, in 2d: (0,0) (1,0) (1,1) (0,1), I can step back and see a square with sides of length one.

      AFAIK, what these people are claiming is that they've developed a way to step WAY back, to n-dimensions, and have patterns emerge from seemingly random data.

      It's not the random-number generation that's significant here... it's the purported ability to compress a seemingly random sequence. RLE typically doesn't fare very well with pure random data because it only looks for certain types of redundancy.

      If I haven't missed the boat here, it's really a very interesting achievement.

    2. Re:I think their investment model requires pigeons by 3.1415926535 · · Score: 1

      Bah. There's always some file that won't make any pattern and to represent it will require more bits than the original file.

    3. Re:I think their investment model requires pigeons by SIGFPE · · Score: 2
      If you can step back to N-dimensions and see patterns you can exploit then it wasn't random data in the first place.


      There are 2^N bit sequences of length N. There are 2^M sequences of length M. If M<N then 2^M<2^N. So you can't represent all sequences of length N using sequences of length M. You can't even represent most sequences of length N using sequences of length M. It doesn't matter if you can visualise infinite dimensional spaces with pretty purple knobs on. You can't have an algorithm that packs most sequences of length N into M bits.
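
      The "not even most" part can be put in numbers. This Python sketch counts every bit string short enough to be a valid output and divides by the number of N-bit inputs; the function names are invented for the illustration, and the bound holds for any lossless scheme, whatever it does internally.

```python
from fractions import Fraction

def shorter_outputs(n: int, saved: int) -> int:
    """Number of distinct bit strings of length at most n - saved (empty string included)."""
    return 2 ** (n - saved + 1) - 1

def max_fraction_compressible(n: int, saved: int) -> Fraction:
    """Upper bound on the share of n-bit inputs that can shrink by `saved` bits or more."""
    return Fraction(shorter_outputs(n, saved), 2 ** n)

for saved in (1, 8, 16):
    print(saved, float(max_fraction_compressible(64, saved)))
```

      Saving even one byte is possible for fewer than 1 in 128 of all inputs, and the share halves with every further bit saved.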

      --
      -- SIGFPE
    4. Re:I think their investment model requires pigeons by softsign · · Score: 2
      If you can step back to N-dimensions and see patterns you can exploit then it wasn't random data in the first place.

      Exactly! But until you make that connection, it may as well be random!

      There are 2^N bit sequences of length N. There are 2^M sequences of length M. If M<N then 2^M<2^N. So you can't represent all sequences of length N using sequences of length M. You can't even represent most sequences of length N using sequences of length M. It doesn't matter if you can visualise infinite dimensional spaces with pretty purple knobs on. You can't have an algorithm that packs most sequences of length N into M bits.

      I'll assume by "sequences" you mean "random sequences" because otherwise you are saying that lossless compression is impossible. =) I agree with you otherwise, given one crippling constraint: that you can't observe your data except as one-dimensional binary numbers.

      After re-reading this press release a few times, I don't think these people have really accomplished much. Bear with me and I'll flesh out the point I'm trying to make - if someone could find a way to do this, I think it could work.

      With PURE random data, this won't work. Why anyone would want to transmit gigabytes worth of pure random data is beyond me. A signal worth compressing isn't going to be purely random. It may look like it, but there is some information there. This is why signal processing people use random processes to model signals. Not because their signals are completely random - but because - given enough samples - they look like specific random processes (Gaussian, Rayleigh, Rician...).

      Now, the technique I'm thinking of would do something along the lines of take a pseudo-random process and map it to an n-dimensional space. An algorithm then searches this space for (even just) simple patterns. Suppose it finds ten equally spaced points along a "line" in 12-dimensional space. That's 120 bytes that can be reduced to a significantly smaller vector (plus an offset to aid reconstruction in the right place), no?

      I don't know... would this work? I think so. Would it be feasible given existing computing power? I'm not so sure...

    5. Re:I think their investment model requires pigeons by SIGFPE · · Score: 2
      No. I mean sequences. My argument makes no mention of the word random and that's no mistake (unless I've made a typo somewhere).


      A signal worth compressing isn't going to be purely random
      Absolutely, and that's why lossless compression works in practice.


      But the thing you're describing is bogus. Look, take random data and look at it in a complicated enough way then you're sure to find patterns that can be compressed out. But you'll find that you'll also have to describe the complexity of your way of looking at it and that'll take up the same amount of space as you've just compressed out. That press release is 100% bogus. It's not even slightly real. Have you seen how many universities they claim to collaborate with? It's merely a scam to make money out of venture capitalists.


      The way you speak, e.g. putting scare quotes about the word "line", suggests that you're not comfortable dealing with multi-dimensional spaces. The SF connotations suggest something cool and esoteric to get venture capital cash. Those of us who actually work with these things every day know there's no reason to see compressible patterns if you start embedding things in high-dimensional spaces. People who do things like wavelet and DCT compression techniques quite happily represent data for compression in very high-dimensional spaces. But there's no magic and certainly no way to do things that are provably false.


      Would it be feasible given existing computing power?

      It wouldn't be feasible with any computing power.

      --
      -- SIGFPE
    6. Re:I think their investment model requires pigeons by Kythorn · · Score: 2, Informative

      This may not appear immediately relevant, but bear with me.

      I'm not agreeing or disagreeing with ZeoSync's claims, but if you can impose a semblance of order on something that only appears chaotic, you can do some pretty cool stuff.

      Take for example this little demo at this website in Germany. (I realize what the domain looks like, there's nothing for sale or license, trust me). The actual download link is about halfway down the page.

      This isn't "compression" in the conventional sense, but they still manage to contain a demo that contains hundreds of megs of textures and samples, in addition to the engine itself in *64kb*. Now that's a hell of a ratio.

      They do this not by storing the raw data, but instead storing the instructions needed to reconstruct the data as it is needed.

      Granted, I realize that they only accomplished this with their own data, but I don't think taking this a step further to an arbitrary set of textures and sounds is impossible. Granted, this idea won't work for all types of data, and also can not be considered "lossless", (hell, it's not even strictly compression) but I still think it's incredible that you can get this high quality results out of something this small.

      (Disclaimer: The above link is to a demo that requires directx 8.1 and I sincerely doubt will run under wine. It also doesn't work with every video card out there. I've scanned the binary, and it doesn't appear to have any viruses or trojans, but I won't guarantee it. If you can't accept the risk, don't download the binary.)

    7. Re:I think their investment model requires pigeons by raytracer · · Score: 1

      If I haven't missed the boat here, it's really a very interesting achievement.

      You haven't missed the boat, it is still chained to the dock with a cable labeled "The Pigeonhole Principle". If the data is random, then all potential paths through all potential n-dimensional spaces are equally likely, and there are a lot of them. Too many to be described by any fewer bits at all.


      It's snake oil folks. Move along.

    8. Re:I think their investment model requires pigeons by softsign · · Score: 1
      Thank you, sir, for putting me in my place.

      Next time I'll think twice before attempting to present an idea in this hallowed scientific forum.

      Your inference of mathematical ability merely from my use of quotation marks is nothing short of remarkable.

    9. Re:I think their investment model requires pigeons by derobert · · Score: 1
      With PURE random data, this won't work. Why anyone would want to transmit gigabytes worth of pure random data is beyond me. A signal worth compressing isn't going to be purely random.

      There is use to transmitting large amounts of random data; it's called encryption. Any good cipher should make encrypted data (minus any plaintext headers, etc.) impossible to tell from random data. Especially if it is a stream cipher! Anything else is a weakness. If you can come up with a lossless compressor that works against a cipher, that cipher should not be used.

      Data compression, btw, also has this effect, though there it's not a weakness but a missed opportunity to compress; any lossless compressor that produces significantly compressible output is pretty pitiful.

    10. Re:I think their investment model requires pigeons by acid_andy · · Score: 1

      That's a very sweet demo : ) makes my game I'm writing look very lame.

      Loads of people have been raising this point about re-generating complex content from very simple initial conditions as a form of extremely high compression but none of the sceptics seem to want to respond to it - they respond to the ideas that are obviously wrong as if no-one's ideas are worth thinking about. Sorry bit of a rant, I feel better now.

      --
      Your ad here.
    11. Re:I think their investment model requires pigeons by rtechie · · Score: 1

      This is because the demo DOESN'T CONTAIN hundreds of megs of textures and samples. This is a 'demoscene' demo probably submitted for a contest somewhere.

      The demo contains only a tiny handful of very small textures (the bricks, the grass/trees, a few others) that are carefully used in what amounts to a long list of Direct3D commands. The music is MIDI (again, no actual samples, just MIDI commands).

      Back in the day, these kinds of demos used to be hand-coded in assembler, before Direct3D and OpenGL hardware made it easy. Commands were made directly to the video hardware in some cases.

      If you think this is impressive, you should check out some of the 8K demos.

      The point is that this isn't "compression" at all, but programming tricks using graphics APIs.

  169. Reductionism by Anarchofascist · · Score: 1
    ... the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability.

    Aha! Now I get it, it's a shell script to run gzip multiple times! Too bad I got prior art:

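    #!/bin/bash
    # supercompress filename number_of_iterations
    # Each pass gzips the file twice (via a temp file) and overwrites the original.
    x=$2
    while (( x > 0 )); do
        gzip -c "$1" > "/tmp/$$"
        gzip -c "/tmp/$$" > "$1"
        x=$((x - 1))
    done
    rm -f "/tmp/$$"   # clean up the temp file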

    Yay! All your codecs are belong to us!
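
    For the record, this is what repeated compression tends to do to data that is already random. A minimal Python sketch, assuming zlib as a stand-in for gzip and a seeded pseudo-random buffer as the input; neither comes from the script above.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed for a repeatable run
data = bytes(rng.getrandbits(8) for _ in range(4096))

sizes = [len(data)]
for _ in range(5):
    data = zlib.compress(data, 9)  # feed each pass the previous pass's output
    sizes.append(len(data))

print(sizes)
```

    Each pass adds header and checksum overhead without finding anything to remove, so the sizes should creep upward rather than shrink.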

    --
    Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
  170. Trademarks? Nope! by Anonymous Coward · · Score: 0

    None of their words marked (TM) are trademarked, search at http://tess.uspto.gov/bin/gate.exe?f=tess&state=h0 9ukh.1.1 and see.

  171. Potentially patent-infringing... by pbryan · · Score: 2

    Our patent-pending technology ReductioAdAbsurdum (TM) will likely be infringed upon by this new technology, so rest assured that our lawyers will scrutinize their compression algorithm closely.

    --

    My car gets 40 rods to the hogshead, and that's the way I likes it!

  172. pi @ 500,000:1 (Was Re:how can this be?) by kesuki · · Score: 1

    pi
    In two bytes I've exceeded a 500000:1 ratio on transmitting truly random compressed data to you -- were I to use the ASCII code character for pi I could double that throughput.

    This proves that in theory at least that one can compress random data sequences.

    1. Re:pi @ 500,000:1 (Was Re:how can this be?) by ergo98 · · Score: 1

      Pi does not qualify as random as a whole, though, because it's a known quantity. To have enough possible variations to use such a random representation method for 1000 different "random" streams you would need 1000 different representations. Therein lies the quandary of trying to compress something other than pi.

    2. Re:pi @ 500,000:1 (Was Re:how can this be?) by kesuki · · Score: 1

      Pi is not restricted to numerical representation.
      On a practical level, analysis for pi alone can provide you with any sequence of 10 characters that can be expressed through pi.
      For example, the first nine digits of pi, 314159265,
      run through tr 0-9 a-j give dbebfjcgf and can be represented with four bytes. tr is not limited to this, however; you can also use tr 0-9 abc123dfg4 and receive 1b2b34cd3. So pi more than exceeds the ability to find a thousand random sequences of data, as long as they match between 30-150 characters of pi through an internal translation table.
      Furthermore, a more sophisticated codec could analyse the data for internal tr tables and segments of pi. This increases the amount of data sent but also increases the probability of finding something that matches the 100-200th digits of pi.
      While I doubt this program relies on finding 'pi-like data' within data streams, it is a feasible method of compressing random data. It also expands much more quickly than it compresses and remains mathematically perfect, making this type of method almost ideal for lossless video codecs.

    3. Re:pi @ 500,000:1 (Was Re:how can this be?) by Anonymous Coward · · Score: 0

      specifying the location of sequence on average takes at least as much space as specifying the sequence itself. try it by hand. to specify one of [0-9] you must go on average at least 5 digits out. to specify one of [00-99] you must go on average at least 50 digits out. this is true for any sequence of digits.
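
      Anyone who would rather not try it by hand can use a short Python sketch. It computes digits of pi with Machin's formula and reports how far in, on average, each 1-digit and 2-digit string first appears. The helper names and the 3000-digit cutoff are arbitrary choices for this illustration.

```python
def pi_digits(n: int) -> str:
    """Return pi as a digit string '31415...' with n digits after the leading 3.

    Uses Machin's formula pi = 16*arctan(1/5) - 4*arctan(1/239) in integer arithmetic.
    """
    unity = 10 ** (n + 10)  # ten guard digits absorb truncation error

    def arctan_inv(x: int) -> int:
        """arctan(1/x) scaled by `unity`, summed from its Taylor series."""
        total = term = unity // x
        x2, k, sign = x * x, 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    return str(4 * (4 * arctan_inv(5) - arctan_inv(239)) // 10 ** 10)

def mean_first_position(digits: str, k: int) -> float:
    """Average index at which each k-digit string first appears in `digits`."""
    positions = [digits.find(f"{i:0{k}d}") for i in range(10 ** k)]
    if min(positions) < 0:
        raise ValueError("not enough digits to find every string")
    return sum(positions) / len(positions)

digits = pi_digits(3000)
for k in (1, 2):
    print(k, "digit strings: mean first position", mean_first_position(digits, k))
```

      Distinct strings must start at distinct positions, so the 100 two-digit strings cannot average less than 49.5 digits in, no matter which digit stream is searched. Writing down the position therefore costs about as many digits as the string it points to.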

    4. Re:pi @ 500,000:1 (Was Re:how can this be?) by kesuki · · Score: 1

      One byte allows you to specify a start point up to 256 places in. Two bytes allow the start point to be up to 65,536 places in. The length of the sequence follows the same rule, as do the translation tables. So for the sake of argument the first byte represents the tr table, so we define 256 variations on the representation of pi. The next two bytes represent the size of the sequence. The final two bytes represent the place at which one starts. In hex, the previous examples can be represented as x0000090000 x0100090000 x0200090000. Each is only 5 bytes to represent a 9-byte string. I used a small string for example only, because as you see those five bytes can represent 256^3 strings of 64 kilobytes in length. In other words, 16,777,216 ways to achieve 13,107.2:1 compression with zero mathematical loss. Mind you, this method is not suitable for packet compression, as the CPU overhead is insanely beyond the capacity of anything out there today, even if the compression is performed on each individual PC instead of at the server level.
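
      It may help to count what a 5-byte reference can and cannot address. The Python sketch below assumes the layout described above (one table byte, two length bytes, two offset bytes) and simply compares the number of possible references with the number of possible 64 KiB blocks.

```python
codes = 256 ** 5        # every value a 5-byte reference can take
block_bits = 8 * 65536  # size of one 64 KiB block in bits

print("distinct 5-byte references:", codes)
print("bits per reference:", codes.bit_length() - 1)
print("bits needed to tell all 64 KiB blocks apart:", block_bits)
# The share of blocks that can have a reference at all is at most 2**40 / 2**524288.
print("log2 of that share:", (codes.bit_length() - 1) - block_bits)
```

      Under these assumptions, the scheme reaches its headline ratio only for the tiny set of blocks that happen to be addressable; every other block still has to be sent some other way.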

    5. Re:pi @ 500,000:1 (Was Re:how can this be?) by kesuki · · Score: 1

      Slight error in my math: I forgot to include a way to tell this data apart from 'normally compressed sequences'. However, by using one of the tr table entries you can use xff as an indication that the following xxxx bytes are compressed using an alternative method. You can also use only three bytes for alternatively compressed data, by doing a simple check on the first byte first.
      This reduces the number of possible ways to 16,711,680 while minimizing overhead. Also, there is no guarantee that there aren't exact matches within the possible methods; after all, we are using predefined character translation tables. Additionally, my method provides 16,711,680 ways of compressing data chunks between, say, 100 bytes and 64k for 'maximum' effectiveness. However, actually processing even a single uncompressed bitmap of 640x480x24bpp resolution while only checking for matches between 100 and 103 characters long would take 218,306 calculations. 100-character strings take about 75,000 calculations, while a 64k calculation takes only 112 calculations through the algorithm. Obviously, if the algorithm has to be run trillions of times just to compress a TV-resolution bitmap, it isn't going to be fast enough for the one this thread is talking about; however, there is a lot of 'clean up' work to be done on my simple algorithm, for instance determining which codes are duplicates and eliminating the analysis of duplicates. Also, uncompressing the data only requires a sparse two calculations, since pi is a known variable and the translation table used is passed along and known by the decoder. I would have to say, though, that this open source thing the Greeks came up with is way better than a closed source 100:1 'maximum' mathematically compressed algorithm from some company looking for hype.
      I mean really I only have to find pi (to 65,536 places) 112 times in a bitmap to get better than 100:1 compression.

    6. Re:pi @ 500,000:1 (Was Re:how can this be?) by Anonymous Coward · · Score: 0

      There is no PI symbol in US-ASCII (the character encoding of champions!)

      You're smoking crack while dreaming of 16-bit Unicode.

    7. Re:pi @ 500,000:1 (Was Re:how can this be?) by kesuki · · Score: 1

      This has nothing to do with ASCII; it has to do with compression. All real compression works on the actual binary file, or in this case on the file in hexadecimal. Also, considering that there are actually 1,099,511,627,776 possible sequences of 0123456789ABCDEF at a length of 10 digits, it could take quite a while to figure out which 254 are worth applying to pi to maximize the compression ability.

      At least we finally have a use for that Supercomputer in every garage.

    8. Re:pi @ 500,000:1 (Was Re:how can this be?) by Anonymous Coward · · Score: 0

      no, no, the point is, if you want to find a particular 64k sequence of binary digits in pi, you will find it, but it will on average occur 2^{64k} digits in. That means you need 64k to store the position of the starting digit.

      Alternatively: consider all the 65536-digit binary sequences. If you could losslessly compress them all, even by a tiny amount, then each one is paired to a particular "compressed" less-than-65536-digit binary sequence. But if you count them out, at least two uncompressed sequences must have been paired to the same compressed sequence. But this is not allowable for lossless compression.
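
      The counting argument above can be checked directly for small sizes. What follows is a minimal Python sketch (the helper name is made up for illustration): it counts every binary string strictly shorter than n bits and compares that with the number of n-bit strings.

```python
def count_shorter_strings(n: int) -> int:
    """Number of binary strings with length 0..n-1 (illustrative helper)."""
    return sum(2 ** k for k in range(n))

for n in (1, 8, 16, 64):
    inputs = 2 ** n                        # distinct n-bit strings
    outputs = count_shorter_strings(n)     # candidate "compressed" forms
    # Always one output short, so at least two inputs must share an output.
    print(n, inputs - outputs)
```

      The difference is 1 for every n, so there is always at least one n-bit input that cannot be given its own shorter output.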

    9. Re:pi @ 500,000:1 (Was Re:how can this be?) by Hast · · Score: 1

      Ehh?

      You have managed to write "pi", nothing else. You haven't actually "compressed" anything, you have only used a symbol.

      This does not work on real data. (Because then you would need to send the sequence once anyway, otherwise you can't decompress it.)

      "Oh, but we could make a big table of possibilities." Yes you could, and no, you would still not be able to compress random data reliably.

      Sure, if someone happens to send a file with the first 500,000 digits in PI then you can compress it. If they send the works of Shakespeare it won't do anything.

      For compression to be useful you need to talk about a general case.

  173. I can compress anything down to 1 bit by eclectus · · Score: 1

    I have a simple algorithm for lossless compression down to one bit. Now the key for de-compressing it happens to be the same size as the original data.....

    --
    This signature is a waste of 42 characters
  174. Entropy... by hyrdra · · Score: 2

    First of all, it's impossible to achieve any compression ratio on random data. Good quality random data such as that from random.org simply can't be compressed. Period.

    Data compression works by finding patterns in seemingly random data. A standard video stream really doesn't contain that much unique information. That's why we can compress it pretty well without losing too much data. However, random data is 100% unique and you must have, say, 8 bits representing 8 bits because there is no other way to represent it without losing information.

    The claims by this company are impossible. I read their technical description and I'm still turning that around in my head. It doesn't make sense. It's called the rule of limited entropy and no data compression breakthrough can break it. You can't just make data appear out of thin air.

    Is it just me, or is this another company looking to swindle over a few VC investors? The only type of program I see here is the lie, buy, and sell high kind -- I don't buy it.

    --


    "I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
  175. One method - Godel numbers by DarkMan · · Score: 2

    There is one method that might work - on some data.

    Godel encoding is an old technique for compression, with a fast decompress (P time). Unfortunately the compression stage is NP (maybe NP-Hard, can't remember).

    The method relies on expressing the number as an algebraic product, that can be expressed in less space than the result.

    For example, in ASCII, the string (in RPN) "7 7 ^ 34 * 99 ^ - 7 p" has 18 characters. Its expansion has 740 characters. That's a compression ratio of, what, 35:1. [OK, so you'd never actually do it in ASCII, but it shows the technique.]
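
    As a rough check of the example above, here is a small Python sketch. It evaluates only the unambiguous prefix of the RPN string, (7^7 * 34)^99, because the meaning of the trailing "- 7 p" is not given in the post; treating that prefix as the whole expression is an assumption made purely for illustration.

```python
# Evaluate the prefix "7 7 ^ 34 * 99 ^" of the RPN example (assumption: the
# trailing "- 7 p" is ignored because its meaning is not specified).
expression = "7 7 ^ 34 * 99 ^"
value = (7 ** 7 * 34) ** 99

digits = len(str(value))          # length of the decimal expansion
ratio = digits / len(expression)  # characters out per character in

print(len(expression), digits, round(ratio, 1))
```

    The point is only that a short expression can stand in for a long number; finding such an expression for an arbitrary number is the hard part.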

    The advantages of the technique are that it gives better compression on larger numbers, in principle. In general, however, other factors come into play, and it bottoms out. My analysis suggested it bottoms out somewhere around 120-150:1.

    However, the disadvantages of the scheme are numerous. Firstly, there is no known algorithm to encode efficiently. The system can't stream, like gzip and LZW can. Thus far, it's just an interesting idea.

    I mention this because the multi-dimensional mathematics that they are referring to has a passing similarity to something I was playing with a couple of years ago, to look for faster algorithms (or any, really, other than brute force). It was cute, but always slower than brute force, save a few best cases :(.

    Put my best guess at max compression together with the uncanny similarity of the maths. Namely, you do a split into some expression, and then re-apply the algorithm to a sub-expression. Then, throw it through a symbolic computation routine, to optimise it a bit, and gzip the whole lot. It would only work well on some numbers, but you can pad it slightly to get a very different number, and try again until you get a good fit.

    So stepwise:

    ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner

    Pad to a value that gives good compression

    Once randomized, ZeoSync's BinaryAccelerator encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect equivalents

    Godel encode.

    The reference to many iterations suggests that they reapply the process to any large enough numbers left in the expression.

    And that's a scary match, in my mind.

    Of course, pinch of salt. There was a comment above about the odds of any compression technology being valid being equal to the claimed compression rate. I can't see how this might work. But I'm not writing this off just yet, it rings just true enough.

  176. Re:Wow, now all data can be compressed in one bit! by flegged · · Score: 1

    I have an algorithm which will compress any random data down to one bit. Here goes:

    1: Represent the data as one big integer (this is easy, treat all the bytes in the file as one number).
    2: Subtract 1 from the data.

    This algorithm can be reapplied any number of times, until the data that is left is a single bit representing zero. Hell, since you know it's zero, why not get rid of it? So my algorithm can losslessly compress any data to 0k.

    What do you mean you need to know how many times the algorithm was performed? How many bytes do you reckon that will take to express?

    --

    "I think he was truly surprised at how little I cared about how big a market the Mac had" - Linus on Jobs
  177. Recurring dream by Anonymous Coward · · Score: 0

    See the comp.compression FAQ:

    http://www.faqs.org/faqs/compression-faq/part1/section-8.html

  178. Read the fine print by shaldannon · · Score: 1

    From the press release:

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without
    limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.


    Anyone care to wager whether a group of guys sitting around at lunch had the following conversation?:

    dude 1 - H3y! | g0+ +h|z 31337 |d34! 13+'z pu+ 0u+ 4 pr3$z r3134$3 +h4+ \/\/3 c4n c0mpr3$z d4+4 4+ 100:1!

    dude 2 - You're trippin man.

    dude 3 - No...that's a seriously cool idea. We can get that VC we always wanted and then skip the country.

    dude 2 - d00d! *starts writing press release*

    --


    What is your Slash Rating?
  179. Why you Can't, but Overhead is Low by Tom7 · · Score: 2

    > Why couldn't it be possible to have a single algorithmic solution that works on the entire dataset simultaneously?

    Because if you have a "perfect" compression algorithm, then it is not always reversible. That's because it maps a larger set of files (all files of N bit length) to a smaller set of files (all files of N-1 bit length, or whatever). Therefore, the mapping can't be an injection (some two or more large files are mapped to the same small file), and therefore not reversible. So you can't uncompress the data.

    But fortunately, you can always get by with a single bit of constant overhead. Simply set that bit to 0 if the rest of the stream is compressed, to 1 if it is uncompressed. Now, if your algorithm produces a larger file than the input, just leave it uncompressed and you have only lost 1 bit.
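
    The flag trick above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses zlib as the stand-in compressor and spends a whole byte on the flag instead of a single bit, and the names pack and unpack are made up here.

```python
import os
import zlib

COMPRESSED, RAW = b"\x00", b"\x01"   # as in the post: 0 = compressed, 1 = not

def pack(data: bytes) -> bytes:
    """Compress with zlib, falling back to a raw copy if that would be larger."""
    candidate = zlib.compress(data, 9)
    if len(candidate) < len(data):
        return COMPRESSED + candidate
    return RAW + data

def unpack(blob: bytes) -> bytes:
    """Invert pack() by inspecting the leading flag byte."""
    flag, body = blob[:1], blob[1:]
    return zlib.decompress(body) if flag == COMPRESSED else body

text = b"to be or not to be " * 200
noise = os.urandom(4096)
for sample in (text, noise):
    packed = pack(sample)
    assert unpack(packed) == sample      # always lossless
    print(len(sample), "->", len(packed))
```

    The repetitive text shrinks a lot, while the random bytes never grow by more than the one flag byte.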

    The argument about no "perfect" compression algorithm existing is overrated, IMO, though people like to point it out whenever a compression algorithm pops up. Of course a press release wouldn't mention that they sometimes increase file sizes by 1 bit!

    (I still do think their technology is bullshit, though.)

  180. Anyone remember OWS? 1500:1 fractal compression! by mselby · · Score: 1

    OWS was a Lossless Fractal Compression Program that had similar claims of unheard of compression.

    It turned out that all the program did was create an encrypted directory listing of the files you were compressing. The program would then use the listing to search your hard drive for the same file you thought you had compressed when uncompressing the archive. It then copied the file over to the directory in which you were uncompressing your fake archive.

    When it could not find a file matching the listing it simply told you it had a disk read/write error.

    Here is a link to the google archive about it.

    http://groups.google.com/groups?q=+OWS+compression&selm=3131e2b5.39555131%40news.az.com&rnum=1

  181. Funny that the journalists didn't check this out by ondelette · · Score: 1

    I think people just like this type of press release.

    People... people... quite simply... in science, breakthroughs are very rarely "sudden". Usually, research has been going on for a while and intermediate results are found along the way and published.

    That's how research works. You very rarely start from scratch and find the holy grail on day one.

    Of course, there are exceptions, but when a surprising discovery comes along, it is usually a long way from being cooked and needs further research.

    Hard work is needed almost all the time. Period.

  182. One possible way... by Anonymous Coward · · Score: 0

    this could be achieved is by first running the 'random' data through some kind of preprocessor and doing some hocus pocus. That said, this claim still smells funny.

  183. I've used their compression software and it works! by Anonymous Coward · · Score: 0

    I got an early copy of their compression technology and fed their press release into it. I was surprised how well it compressed it. I attach the output below so that anybody else who has the decompressor can read the press release in its entirety without having to download it all:

    "Bullshit".

  184. This IS a usable compression method! by blooflame · · Score: 1

    There's just not a usable decompression method.

  185. Who cares about the compression ratio? by Anonymous Coward · · Score: 0

    For any algorithm to be of practical use, it must be relatively speedy on current hardware. If it takes a gazillion horsepower to compress and decompress the data, it's of no practical application until processor technology can support it inexpensively.

  186. Good ideas often shot down... by SirAnodos · · Score: 1

    Here are a couple of "good" compression ideas friends have presented to me:
    1. Since PI theoretically contains every possible string of digits that can exist, why not use some index into PI to compress data? For example, surely the string, "Hello, World", when converted to ASCII numerical values or some other numerical sequence occurs somewhere within the value of PI (maybe, say, starting at the 10 trillionth digit). I pointed out that most likely, the index # into PI (and you can encode it however you please) would average out to be as big, if not bigger, than the data to be compressed. He didn't believe me, worked on it for a year before discovering that, indeed, the index # into PI ends up being as big, if not bigger, than the data being compressed.
    2. Same thing with random number generators. If you have an algorithm that is good enough to generate all possible strings of numbers, then why not just store a random number generator "seed" that represents the data to be compressed? Feed the seed into the random number generator, and generate "random" numbers until the file is restored. I pointed out that the size of the seed predetermines the possible number of sequences that can be represented. For example, a 16-bit seed can only possibly represent 65k different outputs, with most of those outputs not being useful to represent actual data. Thus, the seed size required to represent any and all data would be so great that the seed would end up being as big, if not bigger, than the original data. He attempted to prove me wrong, and came up with an algorithm that broke a file into chunks that could be represented by random number seeds, but the algorithm ended up, at best, producing output about 25% smaller than the original, and averaged files bigger than the original uncompressed data.
    The list goes on... when we dress up such things in really fancy and complicated mathematical clothing, it just takes us a little longer to realize that, indeed, it really isn't going to compress things better. Sometimes, it takes enough longer that people will build up a company to sell the soon-expected product, only to die when the product cannot be delivered.
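
    The seed-size limit in item 2 is easy to see by brute force. The Python sketch below is only an illustration: the generator is a hypothetical toy linear congruential generator (not any particular library's), and the 4-byte output length is chosen just to keep the run short.

```python
def toy_prng_bytes(seed: int, count: int) -> bytes:
    """Hypothetical linear congruential generator, used only for illustration."""
    state = seed
    out = bytearray()
    for _ in range(count):
        state = (state * 1103515245 + 12345) % 2 ** 31
        out.append((state >> 16) & 0xFF)
    return bytes(out)

SEED_BITS = 16
OUTPUT_BYTES = 4

# Try every possible seed and collect the distinct files that can come out.
reachable = {toy_prng_bytes(seed, OUTPUT_BYTES) for seed in range(2 ** SEED_BITS)}
possible = 256 ** OUTPUT_BYTES

# At most 2**16 of the 2**32 four-byte files can ever be produced.
print(len(reachable), "of", possible)
```

    Whatever generator is used, a 16-bit seed can never reach more than 65,536 distinct outputs, so almost all 4-byte files have no seed at all.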

    1. Re:Good ideas often shot down... by jonsuen · · Score: 1

      Your idea is sort of like Vector Quantization, where both sides have a table of most likely sequences, which is used as your constant. Basically you compress not only with the file, but with every other file you might compress.

      Unfortunately, Yamaha tried this with TwinVQ, and it didn't work. Slow and low quality. There's speculation that a modified form is behind WMA, which works surprisingly well.

  187. How you too can achieve 100:1 compression by 3ryon · · Score: 1
    I was once able to achieve 100:1 compression; here's how I did it. Remember back in the days when 14.4 modems were popular? The boxes would make the claim (in huge writing) "SPEEDS UP TO 150,000 bytes per second!!!". Somewhere in small print it would mention that those speeds were under optimal conditions using compression (V.32bis? I can't remember now).

    Well...having never seen my 14.4 modem move data that quickly I decided to do a test. A friend and I dialed into each other's modem. We created a 300k file consisting of nothing but upper-case "P". Guess what? We got that fabled 150,000 bytes/sec!

    It goes to show, you can do amazing things with random data as long as your random data is carefully selected. Don't tell me that an infinite number of monkeys couldn't produce a 300k file of upper-case "P".

  188. ''randomly'' typing on your keyboard... by Tom7 · · Score: 1

    ''randomly'' typing on your keyboard is a pretty crummy source of randomness... ;)

  189. Single Bit Variance by R.Caley · · Score: 1

    Otherwise known as XOR.

    --
    _O_
    .|<
    The named which can be named is not the true named
  190. They are talking about analogue communication by ofir · · Score: 1

    If we look at what they are actually saying (not much, actually), it seems there might be a misunderstanding. I don't think they mean to actually compress random files. The frequent use of the word analogue, and some other wording, make me want to wait for more details before saying 'no way'.
    It seems that the articles passed so many hyping filters that nothing meaningful can be discerned.

    --
    Two witches watch two watches, which witch watch which watch,and which watch does which witch watch?
  191. AMAZING COMPRESSION by satsuke · · Score: 1

    New Compression technique gets 100:1 compression on random data.

    Caveat: only if Random is defined as an arbitrarily long series of identical values. Deviation from this may cause less than optimal results, specifically a nominal compression ratio of 1:1.

    MORAL: just redefine the question and all problems in science and technology go away.

    1. Re:AMAZING COMPRESSION by calags · · Score: 1

      "AMAZING COMPRESSION" - I can see it now, this is the title of the next flurry of spam to come out of the dregs of the Internet. I guess this is the opposite of all the "Enlarge your penis" spams :)

      --
      Never attribute to stupidity what can be construed as a monopoly preservation tactic.
  192. My 2c by boky · · Score: 1

    100:1 lossless compression on any data is impossible. Why? Here's why:

    What are the requirements every (lossless) compression software must follow? For every array of bits of length N, it must be able to produce another array of length M, which can then be translated back into the original N bits.

    Look at it this way:

    You have an array of 1000 bits. They can be combined in 2^1000 ways. Because we require that N -> M -> N is possible, we must also have (at least) 2^1000 different M arrays, and that is only possible if you use at least 1000 bits, so no compression is possible.

    So how does software compress stuff? Well, it uses repetitive data and shortens it. In the extreme scenario, when none of the data is repetitive, the file is even longer (because of compression software overhead).

    Therefore it is impossible to compress (any) data 100:1 losslessly. Yes, you can compress it 100:1 if you just have a 1000-bit-long array of ones, but not in the general case...

    Boky

    --
    boky
  193. Seeding ... by Anonymous Coward · · Score: 0

    Maybe it is random data. See, they just send over the seed number, the number of repetitions, and a key defining the machine/random number algorithm used. Cuts anything down to 12 or 16, tops!

    Sounds like a winner idea to me. Sign me up.

  194. Important mathematical proof by Anonymous Coward · · Score: 1, Informative

    The following is a proof that a perfect compression algorithm has an average compression rate of ONE. Yes, ONE. That translates into NO COMPRESSION WHATSOEVER. A short aside on why compression is used if it "doesn't do anything" follows. I'm not doing this rigorously because I don't remember the rigorous proof, before anyone asks. This is sometimes referred to as "the enumeration proof."

    Assume any given data N to be compressed can be viewed as a binary number. Assuming the algorithm works on any given data (sure I can say that your file is "1" compressed and refuse to compress anything else, but do you need my help for that?), it must be able to compress all numbers from 0 to N. It must also give UNIQUE compressed answers (I can say that ALL files are "1"...decompression's tricky, though). Therefore, if the algorithm is used to compress all numbers from 0 to N, it will return the numbers 0 to N in a different order, in the BEST CASE. If the algorithm isn't perfect, it will return numbers GREATER than N as well.

    So why do we use compression? Because we don't compress numbers from 0 to N. We compress things that have patterns. Lots of them. Because of that, algorithms can make additional assumptions (some quantity of repeated data will be present in the data set being the usual one). Because of this, the average compression of an algorithm, when used on arbitrary real-world files and not enumerated numbers, is usually less than one (i.e. it usually makes the file smaller). If you custom-write a file in binary that contains little to no pattern, you'll find that most compression algorithms will either make it larger or leave it the same. The last thing I'll mention is an example of where compression works really well: text documents. Since most letters in a document are within a certain ASCII range, the document can be reduced in size. For example, if you use no character over 127 (and your document has no header. Shh...it's an example), the first bit of every byte in the file will be 0. The compression algorithm can see this, and remove all these zeros. It will have to add a couple of bytes at the end mentioning how the file was compressed, but you'll be getting rid of 1/8th of the file for the cost of a couple of bytes. That's pretty good.
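
    The text-document example can be sketched in Python. This is a minimal illustration, assuming pure 7-bit ASCII input (so the top bit of every byte is 0 and can be dropped); the function names and the 4-byte length header placed at the front are choices made here, not part of the description above.

```python
def pack7(text: bytes) -> bytes:
    """Store 7 bits per character; a 4-byte header records the character count."""
    if any(b > 127 for b in text):
        raise ValueError("input must be 7-bit ASCII")
    bits, nbits, out = 0, 0, bytearray()
    for b in text:
        bits = (bits << 7) | b
        nbits += 7
        while nbits >= 8:                 # flush every complete byte
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1          # keep only the unflushed remainder
    if nbits:                             # pad the final partial byte with zeros
        out.append((bits << (8 - nbits)) & 0xFF)
    return len(text).to_bytes(4, "big") + bytes(out)

def unpack7(blob: bytes) -> bytes:
    """Invert pack7(): read the count, then pull out 7 bits at a time."""
    n = int.from_bytes(blob[:4], "big")
    bits, nbits, out = 0, 0, bytearray()
    for b in blob[4:]:
        bits = (bits << 8) | b
        nbits += 8
        while nbits >= 7 and len(out) < n:
            nbits -= 7
            out.append((bits >> nbits) & 0x7F)
        bits &= (1 << nbits) - 1
    return bytes(out)

sample = b"Hello" * 100
packed = pack7(sample)
assert unpack7(packed) == sample
print(len(sample), "->", len(packed))
```

    The saving approaches one eighth on long inputs, and it only works because the input is known in advance to have a pattern.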

    I'm sure no one will mod this up, because no one likes anonymous cowards, but it might as well be here for posterity's sake.

  195. I know how this works ! by Anonymous Coward · · Score: 0

    In the face of everybody, quite logically, saying this will not work, I would like to propose a means by which it could. It goes like this:
    Let's try to compress a long string of bits, say 1 billion of them.
    Well, as everyone knows, there is an incredibly large number of ways of setting zeros and ones in 1 billion bits. I would like to propose that a large number of those combinations will NEVER be used in any meaningful way by mankind in its whole history. Nor by any intelligent beings in the whole history of the universe. There just isn't going to be enough space or time for all those combinations to be used. Let's say that only 1 percent of the combinations will be used by some being, somewhere at some time. All we need to do is assign numbers to those useful combinations and forget the rest. BINGO, we have just compressed the data by a factor of one hundred.
    Now if 1 billion bits is not long enough to introduce the required redundancy, I'm sure a longer string would be. I will leave the calculation of the shortest string where all combinations are used by someone eventually in the universe as an exercise for the reader.
    So how do we find out which combinations are now, or are going to be, useful? Well, for this we need some serious physics. You know, quantum mechanics has a way of exploring all possibilities pretty quickly. However, that's a bit out of my league.
    Cheers.

  196. Patterns in Data by Tom7 · · Score: 2


    It's true that you need to find patterns to compress data. What constitutes a pattern though, can be more complicated than what gzip offers.

    For instance, I can come up with a number of statistically random sequences that can be compressed very small if you "know the pattern", but will fail completely to be compressed with gzip. For example, I could take 11 MB of the binary digits of pi -- a very short program can produce these, but gzip will totally fail in compressing it. Or I could encrypt 11 MB of zero bits with RC4; if I know the key then it is also extremely easy to compress -- otherwise, it will be nearly impossible.

    So the art, really, is in finding the patterns. I'm pretty sure that ZeoSync's stuff is bullshit, but it doesn't *necessarily* mean that this kind of thing is not possible (just... unlikely).
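
    Here is a small Python sketch of the "encrypt zeros with a key" example. RC4 is not in the Python standard library, so this uses SHA-256 in a simple counter mode as a stand-in keystream; that substitution, the key, and the function name are assumptions for illustration only. Encrypting zero bits with a stream cipher just yields the keystream itself.

```python
import hashlib
import zlib

def keystream(key: bytes, n: int) -> bytes:
    """Deterministic pseudo-random bytes from a key (stand-in for RC4 output)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

key = b"secret"            # hypothetical key
size = 100_000
data = keystream(key, size)

# A general-purpose compressor finds no pattern it can use...
print(size, "->", len(zlib.compress(data, 9)))

# ...but anyone who knows the pattern needs only the key and the length.
description = (key, size)
assert keystream(*description) == data
```

    The data looks statistically random to zlib, yet it has a very short description for anyone who knows how it was made.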

  197. That's fantastic; let's test it! by Viking+Coder · · Score: 2

    That's just amazing! Let's test it. Here's an idea of a pretty good test :

    I'll prepare 257 files containing random data, which are each 100 bytes in length. Then, they'll be able to compress each of those files into a corresponding lossless compressed file which is one byte long! (Remember, this is supposedly 100:1 lossless compression of random data.)

    Oh, wait a sec... How can they possibly represent 257 different files, with only one byte each? That one byte can only represent 256 different possible values!

    What about if the files that I asked them to compress were only 2 bytes in length, instead of 100 bytes in length? Still, 257 of them. Since they claim to be able to do 100:1 lossless compression of random data, they should be able to do 2:1 lossless compression of random data. I mean, that's 50 times less impressive! But, wait... They still have to express 257 different files with only 256 different possible values!

    Huh... How many different files are 2-bytes long? I guess there's 65,536 of them. I only wanted them to compress 257 different files each into a byte. The task of compressing 65,536 different files into one byte is almost 256 times harder than what they already can't do!

    This is starting to sound like a theorem, or something!
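
    The 257-file test can be run against any candidate compressor. In the Python sketch below, toy_compress is a hypothetical stand-in (the first byte of a SHA-256 hash); the argument does not depend on what it does, only on its output being one byte.

```python
import hashlib

def toy_compress(data: bytes) -> bytes:
    """Hypothetical '100:1 compressor': any function from files to one byte."""
    return hashlib.sha256(data).digest()[:1]

def find_collision(compress, files):
    """Return two different files with the same compressed form, if any."""
    seen = {}
    for f in files:
        key = compress(f)
        if key in seen:
            return seen[key], f
        seen[key] = f
    return None

# 257 distinct 100-byte files: a 2-byte counter followed by zero padding.
files = [i.to_bytes(2, "big") + bytes(98) for i in range(257)]
first, second = find_collision(toy_compress, files)
print(first != second, toy_compress(first) == toy_compress(second))
```

    This prints True True: two different files share a compressed form, so a decompressor cannot recover both.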

    --
    Education is the silver bullet.
  198. oi, crackheads by posmon · · Score: 1

    stop modding me up and save some points for the other people

    --

    update comments set karma=-1, reason='offtopic' where sid=26315

  199. Fishy, hokey, full of it by Sipper · · Score: 1

    _Maybe_ they've got an algorithm which compresses _random_ data by 100:1. The methodology, as I read it, sounds suspicious. They want to take seemingly 2-dimensional (serial, basically) data and encode it so it looks 3-dimensional, and somehow that helps compression.
    Doesn't this encoding take some extra space? Seems to me as if there would have to be position data for this 2D stuff to become 3D stuff.

    But honestly, the terminology aside, the mere fact that their website looks so fancy and "flashy" and that they've got as little technical detail as possible leads me to believe that this is not worth my time looking into. Anybody that puts this much effort into appearance has no substance to back it up, in my experience.

    We'll know they went wrong once the SEC starts investigating for their wild claims. And if they've got the next compression method that beats all else, hey - cool. But I doubt it.

    - Chris

  200. it took 10 years but ... it's baaack! by Anonymous Coward · · Score: 0

    The last company i heard of that claimed this (about 10 years ago) ALSO claimed to be able to compress their compressed files at 100 to 1 ... which could also then be compressed at 100 to 1 ... in fact, they claimed to be able to fit the library of congress (potentially) on a floppy!

    While they WERE, in fact, able to compress data in the ratios described, the reporter pointed out that they were still working on the problem of data "de-compression".

    god i wish i could remember that company...

  201. mother is the necessity of invention by mbogosian · · Score: 1

    It's no wonder...they were forced to develop something on this order of magnitude by their marketing department just to get their forty-umpteenth bazillion byte flash-enhanced website to be browsable over a lowly T1....

  202. 10000 to 1 or better by return+42 · · Score: 2
    I have a family of compression algorithms that will take an apparently-random string of 5e7 bytes or more and reversibly compress it to 5000 bytes or less. Only works on certain strings, though. Here are a few:

    3.14159265...
    2.71828182...
    1.41421356...

    1. Re:10000 to 1 or better by ZvlvLord · · Score: 1

      Exactly... *pauses to laugh*
      Finding a high compression ratio for "certain bits" is a piece of cake. Any developer knows that if the format of the data being compressed is well understood (and not likely to change), some custom compression (that's been especially created for it) will give you the best results. Just claiming that a new technique's been developed with a 100:1 ratio BUT then 'gently' adding that it only works on 'certain bits' is a JOKE.

      Kind regards to y'all

  203. Forward-looking statements by forged · · Score: 1

    Their press release ends with the following fine print. Enjoy!

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.

    This sounds of course like complete hot air to me. I wonder what the guys at random.org think.

  204. YHBT by Sloppy · · Score: 2

    I can understand people at Reuters being trolled by this crap, but Slashdot too? Wow.

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  205. Fractions of bits by ofir · · Score: 1
    My guess is that they are trying to improve Huffman compression and the like by optimizing them.

    These algorithms always try to be close to the theoretical Shannon limit but that includes things like 1.25 bits allocation for a specific bit sequence - hence room for optimization.

    The 100:1 ratio can be achieved by using extremely lossy compression (they mentioned DCT and FFT, which are useful in lossy compression schemes).

    Given the marketing hype, this may actually be what lies beneath

    --
    Two witches watch two watches, which witch watch which watch,and which watch does which witch watch?
  206. What is information? by ernst_mulder · · Score: 1

    Most people here seem to have skipped their Information Science lessons, methinks. There's a lot of throwing around of the words "random" and "information" and "compression", but most people don't seem to know what they mean.

    In fact, TRUE RANDOM DATA (e.g. white noise) has the most information (the highest entropy) and is the most difficult to compress. Losslessly, that is. Funny thing here is that if you count lossy compression (such as mp3 coding), random data is in fact easy to "compress", because you simply code it badly and upon decoding obtain new (but different) random data (e.g. white noise).

    So, you can not compress TRUE RANDOM DATA. It has the highest entropy of all data (highest information density).

    If you compress file A into a file B of 50% the size of A, then file B is MORE RANDOM than file A. The entropy per byte of A is lower than that of file B. You could also view this as: the information density of file B is higher than that of file A.
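
    This can be measured roughly in Python. The sketch below is only an illustration: it estimates entropy from the byte histogram alone (a crude, order-0 estimate), builds a made-up compressible text for file A, and uses zlib to produce file B.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

rng = random.Random(0)     # fixed seed so the run is repeatable
words = ["data", "random", "entropy", "bit", "file", "noise", "signal", "code"]
file_a = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")
file_b = zlib.compress(file_a, 9)

print(len(file_a), round(byte_entropy(file_a), 2))
print(len(file_b), round(byte_entropy(file_b), 2))
```

    File B is shorter but its bytes are spread far more evenly, which is what "more random" means here.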

    The problem is that if A is a legible text file, we seem to think that A has information and B hasn't.

    My conclusion is that this company can not compress TRUE RANDOM DATA 100 to 1. If you read their press release they talk about "Practically Random Information Sequences". With this they probably mean things like audio, video, and the lot. Not random at all.

    Somehow this reminds me of a joke from my student days. When a transmitter sends a string of zeros and ones to a receiver somewhere, there's a fat chance some of the ones turn into zeros and the other way around. This is a problem. But there's a simple solution: simply make the transmitter and receiver terribly bad so that ALL data is received badly. All ones turn into zeros and the other way around. Place an inverter after the receiver, and there you have it, perfect transmission!

    The flaw here of course is that under the worst possible conditions only 50% of the data is received badly....

  207. West Palm Beach by remy · · Score: 1

    Do you really trust lossless compression from a place that can't even count their own election ballots?

  208. Quote ZeoSync by BigNumber · · Score: 1

    "Our data was completely random. It's just an odd coincidence that it came out as all 0s, really!"

  209. Does this sound like a cry for funding? by Control-Z · · Score: 1
    Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.

    ZeoSync "expects" to overcome the restraints? I can see those guys now. "Well, let's see. What if we came up with some mumbo-jumbo about 100:1 compression or cold fusion or what-not, and make an impressive-sounding press release that says we're right on the brink of a breakthrough. Then people will get excited and give us money, and we can sit around for a couple years pretending to work on the problem only to discover a previously unknown fundamental flaw."

    Wow, I think I've just found a way to not work and still get people to give me money!

    I hope I'm wrong though, because my poor 56k connection is in serious need of a boost.

  210. Impossibility proof of lossless compression by Anonymous Coward · · Score: 0

    It is impossible to losslessly compress all sets of N bits down to M bits where M < N.

    Proof: Suppose there was a machine COMPRESSOR that took N bits of input and returned M bits of output. Now suppose there was another machine DECOMPRESSOR that took M bits of input and returned N bits of output. There are 2^N sets of N bits, ranging from 000...0 to 111...1. There are 2^M sets of M bits. Thus, DECOMPRESSOR has to produce 2^N different outputs from only 2^M < 2^N different inputs. At least one input must map to multiple outputs, and DECOMPRESSOR cannot decide which is the correct output without extra bits of information.
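
    The pigeonhole argument above can be checked by brute force for small N. This is a minimal sketch: `drop_last_bit` is a hypothetical toy compressor invented for the example, and any other function from N bits to fewer bits would collide just the same.

```python
from itertools import product

def find_collision(compress, n_bits):
    """Feed every n-bit string to `compress` and return the first pair of
    distinct inputs that share an output, or None if there is no collision."""
    seen = {}
    for bits in product("01", repeat=n_bits):
        original = "".join(bits)
        packed = compress(original)
        if packed in seen:
            return seen[packed], original
        seen[packed] = original
    return None

def drop_last_bit(s):
    """Hypothetical 'compressor' that maps N bits to N-1 bits."""
    return s[:-1]

print(find_collision(drop_last_bit, 4))

# Even allowing outputs of every shorter length (0 to N-1 bits), there are
# only 2^N - 1 of them, one too few for the 2^N possible inputs.
n = 8
print(sum(2 ** m for m in range(n)), "shorter outputs vs", 2 ** n, "inputs")
```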

  211. Here's a compression scheme for ya by 8bit · · Score: 1

    As I understand it, normal data is compressed by finding repeating strings and shoving them in a dictionary. This obviously won't work for truly random data. Since we all know that truly random data is a myth, why don't we use mathabatics? Compare the string to something like pi, or e, or other equations that produce commonly occurring 'random' data. Of course the time it takes to compress would be very long indeed. But, thankfully, processor power is fairly cheap, and big businesses surely won't mind running their 64 teraflop cluster for a week if it'll get Nike commercials over a 56k in a second.

    Since my views are obviously misguided, could someone share some links about compression theory?

    --

    --Roy
  212. Bovine Effluent Detection.... by gessel · · Score: 1
    Bovine Effluent Detectors


    A full delphion search of zeosync and Piotr Blass turns up no patents at all, issued or applied.
    There is a reference to Kolmogorov which is so hyperbolic (as is the rest of their "technical" text) as to be nearly incomprehensible. It may be obfuscatory or it may have been written by a mathematician unfamiliar with human communication.


    Kolmogorov is a relevant reference, but possibly a trivial one: for example to compress a pseudo random stream of arbitrary length one need know only the length of the stream, the algorithm, and seed. This is obviously not generally applicable; though, indeed, it is not addressed by Shannon's theorem.


    Piotr Blass appears to be an actual mathematician at Palm Beach Atlantic College and apparently edits the Ulam Quarterly, an on-line mathematics journal.

  213. Let them substantiate their claims first by leob · · Score: 1
    Many years ago I established the Data Compression Challenge (that pays my own real money), specifically to deal with the vaporware con-artists like these. It was only claimed twice, both times by individual developers known in the data compression community.

    Unless and until they show a self-contained archive of a small size that can be brought to a standalone computer and expanded into a standardized benchmark data compression corpus, they can be ignored.

    I pity the poor VPs who gave them money.

  214. Leveraging the Internet? by AtariDatacenter · · Score: 2

    Perhaps another kind of breakthrough could be made by leveraging the internet for the keyspace used in your compression. (Okay, I might not have the terminology quite right... that was one of my friend's realms of interest.)

    The idea is that you have a token that is given to a remote server, which sends back a stream of data. As long as the tokens were significantly shorter than the data provided, then the observed local compression would be highly significant.

    Or, put another way, you're NOT storing data on a remote server. But a remote server has a very well developed library of token/data combinations. So, when a client sends a stream of tokens to this server, they get the original stream of data back (even though the stream of data, itself, isn't recorded in whole at the server).

    Again, not for random data. And perhaps better if the tokens at the main server were geared to particular types of data with a different tokenspace for each.
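
    One existing form of this idea is a preset dictionary that both ends already hold, so only short references to it travel over the wire. The sketch below uses the `zdict` parameter of Python's zlib for that; the dictionary contents and the sample message are invented for illustration. Note that it does not get around the counting argument: the information has simply been moved into the shared dictionary ahead of time.

```python
import zlib

# Hypothetical dictionary that client and server both have in advance.
SHARED_DICT = (b"HTTP/1.1\r\nHost: \r\nUser-Agent: "
               b"\r\nAccept: text/html\r\nGET /index.html ")

def compress_with_dict(data, zdict):
    """Deflate `data`, allowing back-references into the shared dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob, zdict):
    """Inflate `blob`; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()

message = (b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"
           b"Accept: text/html\r\n\r\n")

plain = zlib.compress(message, 9)
with_dict = compress_with_dict(message, SHARED_DICT)

print(len(message), "bytes raw")
print(len(plain), "bytes with zlib alone")
print(len(with_dict), "bytes with the shared dictionary")
assert decompress_with_dict(with_dict, SHARED_DICT) == message
```

    The gain only exists for data that resembles the dictionary, which matches the "not for random data" caveat above.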

    Is this idea very silly, or very good?

  215. Reminds me of the "7 Minute Abs" by uradu · · Score: 2

    in There's Something About Mary. These guys will be in great shape until someone claims 200:1 compression. Then it's back to the claims drawing board.

    -

  216. hahahaHAHaHAHahahahAAHhAha by Anonymous Coward · · Score: 0

    their press release reads like a bad onion article

  217. Pseudo-scientific mumbo-jumbo detected by Shillo · · Score: 1

    Wherever they picked their terminology from, it's *not* from coding theory articles, or topology articles, or probability theory, or any other branch of Math I've run into. (IAAM - I Am A Mathematician).

    Whoever wrote that text has never read any of these articles. And is not a mathematician. Nor a programmer.

    Draw your own conclusions about the company.

    --

    --
    I refuse to use .sig
    1. Re:Pseudo-scientific mumbo-jumbo detected by snarkh · · Score: 1
      Well, some of the words they use do occur in actual math/coding theory usage (e.g. Kolmogorov complexity). But overall it does not seem to make much sense.


      I think any claim as striking as theirs has to be considered a hoax until proven to the contrary.

    2. Re:Pseudo-scientific mumbo-jumbo detected by dillon_rinker · · Score: 2

      blah blah blah DISSERTATION blah blah blah

      This word in all caps is used in doctoral programs at all universities I know of. Give me my PhD, right?

    3. Re:Pseudo-scientific mumbo-jumbo detected by snarkh · · Score: 1

      Sure, just send your credit card number.

  218. Bingo by bteeter · · Score: 1

    You cannot compress a truly random data set. Every compression algorithm relies on some level of data redundancy in order to achieve the compression. Here is an excellent site about a $5000 data compression challenge that has yet to be won:

    http://www.geocities.com/patchnpuki/other/compression.htm

    Take care,

    Brian
    --
    100% Linux Web Hosting Services - We Don't Do Windows!
    --

  219. Russian Theoreticist Paper On CMPPRSN and ENCRYPTN by Anonymous Coward · · Score: 0

    A few years ago I read a translated article by a Russian theoretical physicist that dealt with compression of data. He was working with some computer programmers and mathematicians to compress data efficiently for use on their older computers and for transfer between sites doing research... the logical extension and application was also military as well as encryption.

    While the math was over my head a bit, he went on to explain that it was theoretically possible to compress a standard binary data string of randomized computer data by a factor of 200 or more; central to this was the use of a "key" similar to encryption. This key would be generated by the software in advance of compression and used in the decompression of the data. The key was small... 10000 times smaller than the data stream itself on average when the data sent exceeded 1 MB.

    The military had expectations of incorporating this compression into target coordination software used in datalinks between the latest generation of PVO type fighters, as well as in targeting data for nuclear missile equipped vessels and launch facilities.

    It's compressed and encrypted at the same time...

    The article was in Jane's and a few obscure math sites. I remember the guy's last name as Stynavovich.

  220. It works, but you'll never see it. by cwsulliv · · Score: 1

    The scheme does work - I was given restricted access to the code and tried it. And since it works on random data, by iterative use of the scheme you can achieve almost unbelievable compression. However I've been informed by a reliable source that the algorithm is being quickly suppressed by order of OSHA and the Consumer Product Safety Commission - with a sufficiently large file and sufficiently many iterations of the algorithm, the data can become so compressed that it explodes, not only wrecking the computer but creating a serious risk of injury to life and limb.

  221. What makes Kansas so smart? by dant · · Score: 1
    Assuming this is a scam, I went and looked at their section for investors. It's as terse as the rest of the site, but features the interesting disclaimer:


    We regret that we cannot provide investment opportunities for any resident of Kansas.


    What, Kansas has a law about selling perpetual motion machines or something?

  222. Random data is fake data by dnoyeb · · Score: 1

    Random data is fake data, so what's your point? Practically random data is compressible. Truly random data would involve no patterns and is only theory and useless to the person who simply wants to compress some file. Truly random data would have no number occurring twice, which limits it to the bit length defining its basic unit... 100:1? Let's get real. And if you come out with a bold-faced lie such as that, you will be quickly exposed. But sometimes marketing rules over intelligence. They're taking a gamble. If they had an IPO tomorrow I'd buy, but I'd sell the next day.

    1. Re:Random data is fake data by Demonspawn · · Score: 1

      You guys aren't thinking right.. I can compress truly random data well over 10000:1

      All I need is the random seed and the random number generation formula....

      Think outside of the box.

      --Demonspawn

    2. Re:Random data is fake data by Anonymous Coward · · Score: 0

      "Truly random" and "seed and the random number generation formula" are contradictions. Truly random numbers are generated by things like radioactive decay counters or tapping on keyboards.

    3. Re:Random data is fake data by LarsG · · Score: 2

      All I need is the random seed and the random number generation formula....

      A string of 'random' data that can be generated by a seed and an algorithm is pseudo random. For certain applications pseudo random is good enough, and it is used all over the place - from picking the next block in a tetris game to generating cipher streams.

      Truly random data is an entirely different beast.
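
      To make the distinction concrete, here is a minimal sketch using Python's `random` module (a Mersenne Twister, chosen only as a convenient example generator). The stream looks incompressible to a general-purpose compressor such as zlib, yet it is fully described by its seed and length, because the generator itself is the "decompressor".

```python
import random
import zlib

def generate_stream(seed, length):
    """Deterministically expand (seed, length) into `length` pseudo-random bytes."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))

SEED, LENGTH = 42, 100_000          # arbitrary values picked for the example
stream = generate_stream(SEED, LENGTH)

# A general-purpose compressor finds nothing to exploit in the bytes...
zlib_size = len(zlib.compress(stream, 9))
print(f"zlib: {LENGTH} -> {zlib_size} bytes")

# ...but the pair (seed, length) regenerates the stream exactly.
assert generate_stream(SEED, LENGTH) == stream
print(f"seed + length: {LENGTH} bytes described by two small integers")
```

      None of this helps with data from a physical source, where no such seed exists.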

      --
      If J.K.R wrote Windows: Puteulanus fenestra mortalis!
    4. Re:Random data is fake data by innocent_white_lamb · · Score: 1

      I remember reading some time ago about a book of truly random numbers that was published some years ago (before computers, I think). They used invoice numbers off of the "spike" in some large warehouse to compile the list of random numbers.

      --
      If you're a zombie and you know it, bite your friend!
    5. Re:Random data is fake data by Stephen+Samuel · · Score: 2
      If they had an IPO tomorrow I'd buy, but I'd sell the next day.

      This is, of course, exactly what they WANT you to do. They only get money from the original sale of the stock. I'm presuming that this is a fly-by-night operation, so they're not going to care when (not if) their stock tanks. They've already got their money wired to a bank in the Bahamas. The person who will get hurt is the poor sod who doesn't understand that their claims are pure baloney.

      --
      Free Software: Like love, it grows best when given away.
    6. Re:Random data is fake data by Anonymous Coward · · Score: 0

      You can get truly random data from timing radioactive isotope emissions.

      If someone can tell me a way to get truly random data out of a computer program, I'll buy em lunch.

      BTW truly random does NOT mean no number repeats. The decimal digits of PI are considered random I believe, but I'll leave that to a math geek.

  223. Horseshit by The+Panther! · · Score: 2

    For every data set that is compressible 100:1 (which I will grant them.. even a fool can do that), there are 99 which grow larger or the compression fails entirely.

    So, they have figured out a way to compress difficult-to-compress data rather well, but cannot compress easy stuff that LZW works on? Rather dubious, but I'll eat my words with a smile if they can put all the Star Trek episodes on a floppy disk.

    --
    Any connection between your reality and mine is purely coincidental.
  224. the way I understand it... by Dave_bsr · · Score: 1

    Your using the word 'description' lit the proverbial light bulb -

    Let's say I want to describe my car. I say, "my car," which 'compresses' the idea of my car, really really well, but there is no way to get back to my car. It's like some compression ideas that were suggested, really really good compression, just no way to 'get back.'

    Now, let's say I use more description - "1990 geo storm hatchback." It still doesn't work, because there are cars like it out there - we still don't have the complete idea of my car. In fact - to completely describe my car, I would need to describe every atom on it - so compression would seem to be impossible, my car is too random, as all real objects are, to be compressed.

    What compression does is describe an object A in another object, B. The description must be complete, otherwise there is loss. That is how JPEGs and MP3s work - they knock out some of the 'unimportant' data. The data 'we can't see.' But we don't want that for data, we want lossless compression.

    Back to my car - real world objects could be 'compressed' by a complete description, that is exact in its detail but includes patterns to model the object - probably using shape descriptions to model panels and engines, describing one basic piston, and then including each piston's differences from my basic piston description. In this way, the entire car could be described, with no loss to the original, complete idea.

    Hope this helps - thinking about it this way kinda sorted it out, for me.

    --


    Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
  225. Information Theory by shoemakc · · Score: 0

    If it's not true random noise, so what? Random noise contains no information; why would you want to compress it?

    So long as the input data isn't hand picked for their claims, but rather is representative of an actual application...what's the problem?

    --
    --an unbreakable toy is useful for breaking other toys--
  226. About truly random by inerte · · Score: 1

    Some said that you can't compress truly random data. Well, in a set of sample data you might or might not be able to compress it. You can't just generalize by saying 'no truly random data can be compressed'. If it's random and you have a decent number of samples, something in between will have to work.

    About the 1 bit discussion, I don't think they meant they can compress something more than one time. Hey, if my 100 mega file gets to 10 mega, I (and the rest of the world) don't need to get to 1 mega, 100k etc... for this technology to be a breakthrough.

    I have noticed that lately Slashdot's comments are drifting toward the 'vaporware' discussion. I am under the (correct) impression that, from time to time, a certain topic drives all the others here. A couple of weeks back it was M$ bugs. We could talk about biogenetics and someone would comment about 'what if M$ has a bug and our DNA gets shared', etc...

    So, the new 'wave' is vaporware. While I don't believe someone has found a way to achieve close to 100:1 compression, I think we should be reasonable: if they do, we don't need to achieve a final file of 1 byte in size.

    Which in my humble theories is somewhat possible... But that's another topic. Just for the sake of the exercise imagine that a compressor has a large dictionary with levels. So in level one 'a' equals to '1234'. In level two, 'a' equals to '0987'.

    I have never worked with compression but I believe it works something like this:

    1) You find patterns
    2) You replace the patterns with smaller combinations from your dictionary, and repeat until it's not possible to find patterns.

    Well, if you go the other way and take a bit, and the dictionary says that one bit at level one means 'abc', and 'abc' at level two means '1235jdjlh', which in turn means 'mary had a little lamb', well, you could achieve a 1 bit compression. BUT ONLY IF YOU KNOW what the output is. It's the other way around: you 'decompress' something whose result you already know.

    Because in your dictionary, one bit in level one will always have to mean something fixed.

    UNLESS there are two pattern dictionaries. One in the program itself and another in the file.

    If you have another one in the file, it can interoperate with the decompressor/compressor to discover what 1 bit means.

    But then again, the compressed part would get to one bit, at least MAYBE (like I said, I never saw how a compression system REALLY works). But it would require that a dictionary is attached to it, making it a larger file.

    Er... can anyone with experience in this field just reply to me if this is how compression systems work? Just curious. And if there's this concept of two dictionary files interoperating. Thanks :-)
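
    For what it's worth, real dictionary coders usually avoid attaching the dictionary at all: the decompressor rebuilds it from the data it has already decoded. Below is a minimal sketch of LZ78, one classic scheme of that kind, written from the textbook description rather than from any particular product. Output is left as a list of (index, character) pairs instead of packed bits to keep it short.

```python
def lz78_compress(text):
    """Encode text as (dictionary index, next character) pairs.
    Index 0 stands for the empty phrase."""
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending a known phrase
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                # flush a trailing known phrase
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs

def lz78_decompress(pairs):
    """Rebuild the same dictionary on the fly while decoding."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)

sample = "mary had a little lamb, little lamb, little lamb"
encoded = lz78_compress(sample)
print(len(sample), "characters ->", len(encoded), "pairs")
assert lz78_decompress(encoded) == sample
```

    The pair count only drops below the character count when phrases repeat, so this too gains nothing on random input.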

    Okay, now, what impact would this kind of technology (if it exists) have on our lives? Even if it's not 100:1, but 50:1 or 75:1? Slashdot readers and their comments are extremely good at pointing out mistakes and flaws in almost everything

    :-)

    But, some day or another someone somewhere will come up with this (or a closer) tech. Maybe it's too much marketing talk, but it will change a lot of things... And I didn't see many comments about the impact.

    Where are the scifi fans when we need them ;-)

    1. Re:About truly random by mclinc · · Score: 1

      Dictionary-based compression is just one of the lossless methods. There are many others, like Huffman coding (which requires a non-uniform distribution).

      It's a *huge* field.
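
      The "non-uniform distribution" requirement is easy to demonstrate. The sketch below builds Huffman code lengths with a heap (a standard textbook construction; the helper names are made up for this example) and compares a skewed input with a perfectly uniform one.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a symbol frequency table."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, unique tiebreaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encoded_bits(data):
    """Total payload bits if `data` is Huffman coded (code table not counted)."""
    freqs = Counter(data)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)

skewed = b"a" * 90 + b"b" * 5 + b"c" * 3 + b"d" * 2
uniform = bytes(range(256))

print("skewed :", len(skewed) * 8, "bits ->", encoded_bits(skewed))
print("uniform:", len(uniform) * 8, "bits ->", encoded_bits(uniform))
```

      With every byte value equally likely, each symbol ends up with an 8-bit code and nothing is saved, before even counting the cost of shipping the code table.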

      --
      "Oh no, not again"
    2. Re:About truly random by mclinc · · Score: 1

      In fact Huffman coding is a bad example of "others" as it *is* a dictionary-based method, or at least part of one. (Oops)

      --
      "Oh no, not again"
  227. But just think... by Anonymous Coward · · Score: 0

    If you already have the entire set of codes locally, that means you essentially have every pr0n video ever made, already on your hard drive. Not only that, but you'll have every non-existent one ever not made, including the one with Natalie Portman getting it on with the midgets from the Wizard of Oz. I have got to get this technology!!

  228. Saving Space? by MjDascombe · · Score: 1

    There's no way http://www.zeosync.com/index.htm was written by anyone who cares about /saving/ space :p But on a serious note - random data doesn't necessarily mean unique data - all data is pseudo-random, and could occur, and if you watch any random data stream long enough, and with a big enough past-data buffer, patterns will emerge. Mj

  229. Google search on ZeoSync by AnonymousC · · Score: 1

    If a google search is done on ZeoSync, there is no mention of anything other than ZeoSync website. No links, no references from other web sites.

    This is a HOAX .

  230. I think not. by simon_cockle · · Score: 1

    Definition of random data is data with an AIC (information content) of 0.

    So I doubt it.

    --
    ________ semper ubi sub ubi
  231. It's deja vu all over again! by Monte · · Score: 1

    I remember back in DOS and BBS days someone came out with a "radical new compression" program that people tried and went nuts over. You'd test it, of course: hand it a 400k file (which back then was pretty big :), have it compress it, and you'd get something like a 100 byte archive (wowie!). You'd then delete the original, decompress the archive and SHAZAAM! there's your 400k original file back!

    What it was really doing was copying your original file to a hidden directory and embedding the path to the copy in the "archive". Decompressing of course copied the hidden file back. If you were smart enough to find and delete the hidden copy the decompressor would report mysterious Drive C: data errors.

    Wish I could remember the name of that hoax...

    Not that I'm saying this one is a hoax <cough>

  232. Really? by Anonymous Coward · · Score: 0

    Can we assume then that a file compressed at 100:1 would be random data? If so, could you not just recompress that same file and get 10000:1 compression? ...or would this compressed file no longer be random data, but a sequence of mathematical equations that we could compress even further with pkzip? Hmmm... but that would make the data appear random again, and we would be able to get 100:1 compression on our already twice-compressed file.
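
    The recompression question is easy to try with an ordinary compressor. This sketch runs zlib over its own output a few times; the input is synthetic text built from an invented vocabulary, so the exact numbers are only illustrative, and it says nothing about whatever ZeoSync actually does.

```python
import random
import zlib

# Synthetic, compressible "document" (seeded so the run is repeatable).
rng = random.Random(1)
vocabulary = [f"token{i}" for i in range(500)]
data = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

sizes = [len(data)]
for _ in range(4):
    data = zlib.compress(data, 9)   # compress the previous pass's output
    sizes.append(len(data))

for pass_number, size in enumerate(sizes):
    print(f"after {pass_number} passes: {size} bytes")
```

    The first pass should shrink the file substantially; after that the output already looks random to the compressor, so later passes are expected to stall or grow by a few bytes of overhead rather than multiply the ratio.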

  233. So what really is the claim? by jopet · · Score: 2, Insightful

    If it means they compress arbitrary random data, it is just bullshit. It is easy to prove that there exists some file that will not be compressible, and not much harder to prove that there are actually many more incompressible random files than compressible ones (read any text about Kolmogorov complexity). But of course most computer files are not at all random. Compressing a *randomly picked* computer file is therefore something different altogether, but it is still hard to guarantee a certain compression if the type of information stored in the file is not known. That's the reason why different compression algorithms for different file types exist. All in all, their claim is too fuzzy to say anything ... better compression is a certain thing of the future, but guaranteeing compression for random files is just another cold fusion hoax.
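
    The counting behind "many more incompressible files than compressible ones" fits in a few lines. There are only 2^(n-k+1) - 1 bit strings of length n-k or shorter, so at most that many of the 2^n possible n-bit files can be shortened by k or more bits, whatever the algorithm. The helper below is a sketch with a made-up name.

```python
from fractions import Fraction

def max_fraction_compressible(n_bits, saved_bits):
    """Upper bound on the fraction of n-bit files that ANY lossless scheme
    can shorten by at least `saved_bits` bits (pigeonhole counting)."""
    shorter_outputs = 2 ** (n_bits - saved_bits + 1) - 1   # lengths 0..n-saved
    return Fraction(shorter_outputs, 2 ** n_bits)

# Saving just one byte on a 1000-byte file: fewer than 1 file in 128.
print(float(max_fraction_compressible(8000, 8)))

# 100:1 on a 1000-byte file means saving 7920 of its 8000 bits.
bound = max_fraction_compressible(8000, 7920)
print(f"at most about 1 file in 2^{bound.denominator.bit_length() - bound.numerator.bit_length()}")
```

    The bound is always below 2^(1-k), so for uniformly random input even modest savings are rare and 100:1 is astronomically unlikely.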

  234. Decompression? by Leto2 · · Score: 1

    I have invented a 1000x lossless compression scheme, too. I'm still working on the decompression, tho.

    --
    <grub> Reading /. at -1 is like driving through Cracktown in a convertible that is stuck in 1st
  235. More on Holsztynski... by King+Babar · · Score: 2
    Oops; I should have mentioned that the "real" Wlodzimierz Holsztynski gets a very respectable 1510 hits on google.

    Now here's the interesting part: they used to spell his name right in a previous version of their official bios section. This could just be sloppiness, of course.

    --

    Babar

  236. Vaporware by t_allardyce · · Score: 1

    Ooops, they just missed the Wired Vaporware awards, maybe they can catch them next year. This is almost like that other compression system (i think it was Australian) a few months back that claimed full screen, high quality, lossy compressed video down a modem - they just didn't say how lossy and there was only one demonstration (that no-one saw). Where is the demonstration for this? i know that if i'd invented it, i would want to prove to the world it was real as fast as i could. On the other hand it could be real - i always thought you could do it with dictionary-based compression, if the dictionary was stored in the decoder software and was very very large to include most likely bit-patterns. Of course fractal compression for any data could be possible - it hasn't been disproved (i don't think). The article looks sus though, and who would call themselves ZeoSync, it sounds like a fake movie name...

    --
    This comment does not represent the views or opinions of the user.
  237. If you're looking... by Anonymous+DWord · · Score: 2

    It's over here (Question 9, search for 'WEB, Gilbert').

    --
    "If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
  238. predicting a crap random number generator by mclinc · · Score: 1

    If they claim it only works with "small sequences" then maybe they are just compressing the predictability in the flawed random number sequence they are using as test data.

    Maybe they need that patch that adds network generated entropy into /dev/random 9-)

    --
    "Oh no, not again"
  239. their website sucks by jopet · · Score: 1

    if their website is an indication of the quality of their compression algorithm i wouldn't invest a single penny. Resizing my browser window and only working with flash. i am so tired of that bullshit.

  240. LOL! - Making the world a better place to live by MayorQ · · Score: 1
    "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live." - from their press release


    I think I'm missing something. How can the size of my pr0n files give people in a third world country a better life?


    - MayorQ

  241. This article was put in the wrong place by Anonymous Coward · · Score: 0

    Someone please move this article to the humor section. No really, before someone here tries to compress Natalie Portman and a bowl of grits, then shove it all down their pants.

  242. 100:1 not too unlikely... by Kjella · · Score: 2

    If you're talking about compressed video over uncompressed. A typical DVD movie would be 720 (horizontal) x 480 (vertical) x 16 (bit, YUY2) x 29.97 (fps) x 6300 (seconds in a 105 minute movie) / 8 (bits per byte) = ca. 130 gigabytes. In reality you'll get it as 5-6 gigabytes, while as a divx 2-pass (or similar mpeg4-codec) you will reach 100:1, at very little quality loss. Of course this is only possible because movies are *very* non-random both in each frame, and in one frame to the next.
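
    Spelled out as a quick calculation, the raw size works out to roughly 130 GB (decimal gigabytes). The compressed sizes below are assumptions for the sake of the ratio: 5.5 GB as the midpoint of the DVD range, and 1.3 GB as a guess at a typical MPEG-4 rip spread over two CDs.

```python
# Frame geometry and duration as given for a typical NTSC DVD movie.
width, height = 720, 480
bits_per_pixel = 16            # packed 4:2:2 YUV
frames_per_second = 29.97
seconds = 105 * 60             # a 105 minute movie

raw_bytes = width * height * bits_per_pixel * frames_per_second * seconds / 8
raw_gb = raw_bytes / 1e9       # decimal gigabytes

def ratio(compressed_gb):
    """Compression ratio of the raw stream against a compressed size in GB."""
    return raw_gb / compressed_gb

print(f"uncompressed: {raw_gb:.1f} GB")
print(f"DVD MPEG-2 at 5.5 GB (assumed): about {ratio(5.5):.0f}:1")
print(f"MPEG-4 rip at 1.3 GB (assumed): about {ratio(1.3):.0f}:1")
```

    All of these ratios are lossy, of course, which is a different game from the lossless claim in the press release.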

    Kjella

    --
    Live today, because you never know what tomorrow brings
    1. Re:100:1 not too unlikely... by RetsamYthgimla · · Score: 1

      ) you will reach 100:1, at very little quality loss. Of course this is only possible because movies are *very* non-random both in each frame, and in one frame to the next.

      Very little quality loss? Do you actually even watch DVDs? Is your TV or monitor a 9" screen? Come on, who are you kidding!?!? Watch a DVD on a 21" monitor or a 35" or bigger TV, and the compression quality, or shall I say the lack thereof, stands out like an eyesore. I mean, yeah, you get the gist of the scene. But the details are gone. More to the point, the objects move, but the textures typically don't move in sync with the objects. Even with motion encoding in the fancier algorithms, you can still see the gimmicks and tricks used to squeeze out those higher compression ratios!

    2. Re:100:1 not too unlikely... by fyonn · · Score: 1

      I've looked into MPEG compression and yes, they do use lots of tricks, they have to or else they wouldn't be able to squeeze so much data onto a dvd, however you have to bear in mind that the resolution of a dvd is, what? 720x480 or something like that. which you're trying to stretch out over a huge surface area. while modern tv's with digital processing do their best to smooth it out there is still only so much data to go around. this is why we need stuff like HDTV (which we in the UK haven't a hope of seeing in the next 5 years at least).

      however I would think that if dvd's look so bad for you then perhaps you should take a closer look at your equipment. I have a 100hz 32" panasonic widescreen tv and a sony dvd player and a well encoded, high bitrate dvd looks pretty stunning IMHO. if you pause it, sit 5" away from the screen and start paging through frame by frame then you are going to see errors but MPEG wasn't designed for that. it's designed for movies, not static images. it takes full advantage of all the failings of our eyes to dump massive amounts of data. sure, laser discs were great, but they weren't exactly easy to store.

      until we have those fluro-holographic discs (with 100+ clear layers of data on) show up then we'll have to make do as best we can.

      dave

  243. Too bad we don't know all the digits of pi... by Tony+Hammitt · · Score: 2

    The binary representation of pi contains all sequences, so it is claimed.

    If only we could predict what the Nth bit of pi was going to be, then we could just specify an offset into the bit sequence and a length and we could have any file compressed as two numbers.

    One of the numbers would be pretty large, though... It could easily be as big as the bit representation of the file, but hey, who cares??
    It's still a possible algorithm. These ZeoSync people don't seem to care about practical algorithms either...

    Gimme some VC money!!!
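
    For anyone who wants to play with the idea, here is a toy version working on decimal digits instead of bits. The digit generator follows Gibbons' unbounded spigot algorithm as I remember it; the 1000-digit search window and the helper names are arbitrary choices for the sketch.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time,
    following Gibbons' unbounded spigot algorithm."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

# First 1000 digits as a string: "31415926535..."
PI_PREFIX = "".join(str(d) for d in islice(pi_digits(), 1000))

def encode_as_offset(target, digits=PI_PREFIX):
    """Return (offset, length) if `target` occurs in the prefix, else None."""
    position = digits.find(target)
    return None if position < 0 else (position, len(target))

def decode_from_offset(offset, length, digits=PI_PREFIX):
    return digits[offset:offset + length]

for target in ("14", "926", "999999"):
    print(target, "->", encode_as_offset(target))
```

    The run of six 9s starting at decimal place 762 is a lucky case where the offset is shorter than the data. For a typical k-digit string you would expect to search on the order of 10^k digits, so the offset needs about k digits too, which is the "pretty large" number mentioned above.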

  244. Slashdot should be ashamed! by lcrocker · · Score: 1

    Why is slashdot giving free publicity to these frauds? It's not like there's any chance in hell they've done something useful. How much brain power does it take to realize that you can't beat elementary math? We've seen this same scam a dozen times before, and it's always a fraud just like all the reasoning people point out.

    This is the software equivalent of perpetual motion machines, or snake-oil elixirs. Giving them coverage will only encourage more people to try the same scam. These folks need to be reported to the justice department, not slashdot.

    --
    --Lee Daniel Crocker : http://www.etceterology.com My life is in the public domain.
  245. Ahh, this brings back the old days ... by Jahf · · Score: 1

    The first thing I thought of when I saw this was an old BBS hoax ...

    The time was somewhere around 1985 ...

    The hoax was a program that claimed to have (drum roll please) 100:1 file compression. So sure, I downloaded the thing on my lovely 1200 baud modem, installed it and tested it on a 512K file.

    Sure enough, the resulting file was less than 5K ... slightly -better- than 100:1 compression. I was impressed.

    Then I took a close look at the program and, after investigation, found that even though this file was 5K, my disk space available had not decreased ... in fact ... it had increased by ... 5K.

    Of course, the hoax was that this program simply renamed and hid the old file and installed a TSR (Terminate and Stay Resident ... old DOS terminology for essentially a background daemon, though a real ugly method) that, whenever you went to touch the file, it intercepted the call and fed you the renamed old version.

    ...

    A variation on this didn't do the TSR ... it just pretended to compress the file and you had to "uncompress" (ie, unhide the old version and rename it over the faked file) with the same program.

    ...

    I would love for this new technology to work, but chances are in real life applications it's going to be about as productive as the BBS hoax was.

    --
    It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
  246. Anyone remember the OWS hoax? by wberry · · Score: 5, Interesting

    Back in 1991 or 1992, in the days of 2400 bps modems, MS-DOS 5.0, and BBS'es, a "radical new compression tool" called OWS made the rounds. It claimed to have been written by some guy in Japan and use breakthroughs in fractal compression, often achieving 99% compression! "Better than ARJ! Better than PKzip!" Of course all my friends and I downloaded it immediately. Now we can send gam^H^H^Hfiles to each other in 10 minutes instead of 10 hours!

    Now I was in the ninth grade, and compression technology was a complete mystery to me then, so I suspected nothing at first. I installed it and read the docs. The commands and such were pretty much like PKzip. I promptly took one of my favorite ga^H^Hdirectories, *copied it to a different place*, compressed it, deleted it, and uncompressed it without problems. The compressed file was exactly 1024 bytes. Hmm, what a coincidence!

    The output looked kind of funny though:
    Compressing file abc.wad by 99%.
    Compressing file cde.wad by 99%.
    Compressing file start.bat by 99%.
    etc. Wait, start.bat is only 10 characters, that's like one bit! And why is *every* file compressed by 99%? Oh well, must be a display bug.

    So I called my friend and arranged to send him this g^Hfile via Zmodem, and it took only a few seconds. But he couldn't uncompress it on the other side. "Sector Not Found", he said. Oh well, try it again. Same result. Another bug.

    So I decided that this wasn't working out and stopped using OWS. Their user interface needed some work anyway, plus I was a little suspicious of compression bugs. The evidence was right there for me to make the now-obvious conclusion, but it didn't hit me until a few *weeks* later when all the BBS sysops were posting bulletins warning that OWS was a hoax.

    As it turns out, OWS was storing the FAT information in the compressed files, so that when people do reality checks it will appear to re-create the deleted files, as it did for me. But when they try to uncompress a file that actually isn't there or has had its FAT entries moved around, you get the "Sector Not Found" error and you're screwed. If I hadn't tried to send a compressed file to a friend I might have been duped into "compressing" and deleting half my software or more.

    All in all, a pretty cruel but effective joke. If it happened today somebody would be in federal pound-me-in-the-ass prison. Maybe it happened then too...

    (Yes, this is slightly off-topic, but where else am I going to post this?)

    --
    LAMP hosting on Debian, SSH, no bandwidth cap, PayPal accepted - http://secondbrainhosting.com/
  247. The only way this could be any better... by dave-fu · · Score: 2

    ...was if they were powered by Blacklight Power. If you're not in the know, they're a "power company" run by a "scientist" who claimed that he had been able to reproduce something that sounded suspiciously like cold fusion in his Princeton, NJ-area laboratories. The Village Voice ran a story on them (where I read about these jokers) and a whole slew of investors were lined up (in the heady days a few months before the dot-com bubble popped) and last I checked, they still haven't actually, you know. Produced what they said they would two years ago (power).
    If you've got a slow afternoon, take a gander at what physicists have to say about Blacklight...

    --
    Easy does it!
    1. Re:The only way this could be any better... by Anonymous Coward · · Score: 0

      A quick look at the blacklight site suggests that, among other things, they have mistaken the radioactive decay of strontium for "magical energy emitted by their novel processes".

      What scares me is these idiots are playing with strontium in a populated area.

  248. coincidence? by mydigitalself · · Score: 1

    interesting the way this article appears just above the one about wired magazine's vaporware list!

  249. How to get those ratios by Anonymous Coward · · Score: 0

    Just use RLE on a bunch of zeroes!
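
    A minimal sketch of the joke, assuming a toy run-length encoding that stores (byte, run length) pairs; the function names are made up for illustration. A long run of zeroes collapses to a single pair, while data with no runs gains nothing.

```python
def rle_encode(data: bytes) -> list:
    """Collapse runs of equal bytes into (byte value, run length) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: list) -> bytes:
    """Expand (byte value, run length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


zeroes = bytes(100_000)              # one hundred thousand zero bytes
print(len(zeroes), "bytes ->", len(rle_encode(zeroes)), "run(s)")

no_runs = bytes(range(256))          # no two adjacent bytes are equal
print(len(no_runs), "bytes ->", len(rle_encode(no_runs)), "run(s)")
```

    The second case is the catch: with one pair per byte, the "compressed" form is larger than the input.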

  250. Byte Magazine had a very similar article 14yrs ago by hottoh · · Score: 1

    The technology never materialized. They were then touting a lossy compression scheme. Unfortunately, I am only guessing at a 1000:1 ratio for images. I doubt very highly these guys can do what they claim either.

  251. Here's an algorithm by CoHortSoftware · · Score: 1

    Here's an approach: A random number generator can generate a series of random numbers. So, given a series of random numbers, couldn't you work backwards to find the seed (and perhaps a few other parameters) for the random number generator that would generate that series of random numbers? Clearly, this is lossless. Clearly, the job gets harder as the series of random numbers gets longer. Perhaps 100 is the practical limit for finding the right seed. One nice feature: this is very asymmetric compression. Compression is very slow, but decompression would be very fast. Only problem: I don't know if it is feasible to find the seed.

    1. Re:Here's an algorithm by recursiv · · Score: 2

      This is a common idea, and it might seem like it would work. However this idea still fails to take into account the counting argument. For example, if the seed is limited to 64 bits, this algorithm can generate at most 2^64 different files, and thus is unable to compress *all* files longer than 8 bytes.
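
      A small sketch of that counting argument, assuming Python's random module as a stand-in generator and a toy 8-bit seed in place of the 64-bit one (SEED_BITS, OUTPUT_BITS and decompress are names made up here). It lists every file the seeds can reach and compares that with the number of possible files.

```python
import random

SEED_BITS = 8      # toy seed size; the example in the comment uses 64
OUTPUT_BITS = 16   # length of each "decompressed file" in bits


def decompress(seed: int) -> int:
    """Expand a seed into OUTPUT_BITS pseudo-random bits."""
    return random.Random(seed).getrandbits(OUTPUT_BITS)


# Every file this scheme can ever produce, one per seed at most.
reachable = {decompress(seed) for seed in range(2 ** SEED_BITS)}
total = 2 ** OUTPUT_BITS
print(f"{len(reachable)} of {total} possible files have a seed")
```

      No choice of generator changes the bound: 256 seeds can name at most 256 of the 65536 files.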

      --
      I used to bulls-eye womp-rats in my pants
  252. Press release definition of "Practically Random" by Anonymous Coward · · Score: 0

    The use of the term "practically random" in the press release is of course an exception that's large enough for you to drive a truck through it.

    The press release has enough marketing hype in it that it's hard to see if there's any substance behind it, but it's quite possible that what is meant here is that the data is "randomly selected" from a "typical computer user's system." In other words, it's not random in the mathematical sense, but a random sample of a highly nonrandom space. (Or at least you would hope that the typical computer disk contains highly nonrandom data, otherwise you can't get much real work done with it!).

    This would be a very different claim and one that might very well be true; Microsoft Word and Excel files, for example, are highly redundant files, as are executable images, email messages etc. It's even reasonable to think of possible ways to do better than zip/gzip by determining the file type and applying different compression algorithms to different file types; the power of programs like zip/gzip is that they do pretty well with no a priori knowledge about the structure of any particular type of file.

    However the amount of hype in the press release does sound suspiciously like they're trying to figure out a way to separate naive investors from their money ...

  253. What kind of data? by mochan_s · · Score: 1

    In the future people will laugh back at the tremendous waste of time and money for trying to break Shannon's law as much as we laugh back at the people who tried to break the law of conservation of energy (eg. engines that did more work than the energy inputted).

    I think maybe there is some sort of entropy in some multi-dimensional space for certain kinds of data that gives us enough redundancy to compress 1:100. Of course, the data cannot be random (otherwise no more frequent patterns or redundancy) and would only theoretically be compressible to our new measure of entropy. But, what kind of data? Even Shannon's law allows for 1:100 given the right kind of data.

  254. What about Kolmogorov? by Lictor · · Score: 2, Interesting

    I think the following statement in the press release pretty much says it all:

    >We perceive this advancement as a significant
    >breakthrough to the historical limitations of
    >digital communications as it was originally
    >detailed by
    >Dr. Claude Shannon in his treatise on Information
    >Theory."

    How about algorithmic information theory? Kolmogorov, Solomonoff, Chaitin? The statement above indicates that the most recent word on compression is an old Bell Labs tech report by Claude Shannon... not to put Shannon down, that work *is* a landmark, but there has certainly been more work done since.

    Try compressing the number Pi using Shannon's theory... you can't do it. On the other hand, using Kolmogorov complexity, you can compress it quite nicely.

    The fact that this statement appears in the press release seems to indicate a great deal of ignorance on the part of this corporation's researchers. Part of any good research program is to familiarize yourself with previous work done in the field... and AIT is *not* some obscure backwater idea... there are several conferences on this topic every year and just about every CS graduate student has seen at least Kolmogorov complexity.

    This is a pretty serious credibility robber. (Not to mention that from a mathematical standpoint, compressing totally random data is impossible under our current axioms... so if we *can* compress completely random data... it's time for a new theory of the foundations of mathematics. At the risk of sounding dogmatic: do you *really* think some dot-com startup is capable of this?

    Perhaps they are, but I'm going to need to see the proofs written up nice and formally before I run out and buy snake-oi... I mean *stock*.)

  255. They Are All So Dumb by Lonath · · Score: 1

    These people who come up with these recursive lossless compression algorithms that can compress any file are stupid. They just don't see the real possibilities of such an algorithm.

    If I had such an algorithm I would decompress 0 and 1 repeatedly until I generated every possible piece of content in the universe and then sue the shit out of anyone who dared to create or copy anything without my express permission!!!!! BWAHAAHAHAHAHAAA!!!!

  256. At the ZeoSync investor demo by SuperKendall · · Score: 2

    ZeoSync: Ladies and gentlemen - observe! The random data goes in THUS, and run through our process, comes out 100 times smaller!

    ZeoSync: Now, we carefully unpack and - voilà! random data of the same size as before! This is due to our patented process and a little bit of magic we like to call "length of file stored in the header".

    Investor: Hey - those first few bytes from the original and uncompressed file look totally different!

    ZeoSync: Those bytes are in there somewhere - we only said LOSSLESS compression, not ordered!

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  257. Fraud, but not aimed at us by Anonymous Coward · · Score: 0

    Yeah, from their techno-nonsense filled press release, I'd say this is fraud, but not aimed against anyone with technical knowledge.

    I could just imagine these people going down to retirement communities and poor neighborhoods and asking for "investments". Then, they dazzle the poor victims with this press release and their flashy web site, get their money, and run!

  258. It is scientific fact by TenPin22 · · Score: 1

    " Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. "
    It's scientific fact. There's no real evidence for it, but it is scientific fact...

    Typical press

  259. Shame on you /. by SIGFPE · · Score: 2

    Next you'll be publishing stories about squaring the circle and trisecting an angle with straight edge and compasses. Claiming to be able to compress random data is the oldest joke in the CS book and you fell for it.

    --
    -- SIGFPE
  260. Small Print by Tuscahoma · · Score: 1
    I love the small print on their press release:
    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
    Gotta cover your ass so that the investors can't sue you when they discover it was all a scam.
  261. what about the _de_compression algorithm? by sciuro · · Score: 1

    they say the compression software should be available sometime in 2003. however, there doesn't seem to be any mention of the decompression software...

    or am i just too cynical...?

    -duncan

  262. Not exactly by apankrat · · Score: 1

    > no pattern == no compression

    No pattern may be quite well compressed, but only in special cases. That's what some people call 'fractal compression', which AFAIK means replacing data with formula (and optional initial data). Decompression is simply iterative application of formula to the initial data. There are three problems though:
    (a) generally it's lossy
    (b) it's .. err .. very hard to find the formula
    (c) not every data set can be compressed
    Otherwise it's fine :)

    --
    3.243F6A8885A308D313
  263. Some thoughts on compression by Tepic++ · · Score: 1

    Reading through the comments has caused me to fire off some "practically" random thoughts:

    The better the compression ratio, the smaller the set of inputs that can be compressed to that size.

    One piece of random data is not the worst case, as it may represent the best case for a compression ratio.

    The average compression ratio for a fixed length of random data in general is not the worst case for an algorithm, but merely the average compression ratio for the algorithm over all data sets.

    Algorithms and formulae contain a lot of information, much/most of it implied. e.g. the addition operator has an implied meaning, which when fully stated takes up much more space than '+'.

    The statement "Cyan, magenta and yellow are primary colours in negative colour mixing." implies lots of information about the structure of our eyes and of light. The amount of information that is implied increases as the reader's understanding of the words grows.

    If the universe is built on a few small pieces of information, then maybe everything has patterns coming from these that allow it to be compressed back to something very small in size.

    Small exercise I thought through:

    100:1 compression ratio on 1000 bits:
    1000 bits have 2^1000 combinations.
    10 bits have 2^10 combinations.
    So each combination of 10 bits would need to represent 2^990 combinations of 1000 bits (2^1000 / 2^10).

    BUT, say we have 2^100 compression algorithms, each optimal at compressing 2^100 separate patterns of 1000 bits of data to 100 bits. The patterns do not overlap.

    We add 100 bits to the compressed data to tell us which algorithm to choose. Therefore final compressed data becomes: 200 bits long.

    New compression ratio is:
    1000/200 = 5

    This cannot be achieved on any 1000 bits we have, though: 2^100 algorithms times 2^100 patterns each covers only 2^200 of the 2^1000 possible inputs. Covering all of them takes 2^900 algorithms, and then the selector alone is 900 bits and the total is back to 1000.

    BIG PROBLEM: compressor/decompressor would need knowledge of each algorithm. If each algorithm only took up 1 byte, 1,125,899,906,842,624 Petabytes of storage would be needed.

    Suicidal optimism: There may be one algorithm that can generate all these other algorithms from a very small data set though.
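
    A quick count for this exercise, as a sketch using Python's big integers (bits_needed is a name made up here). It assumes each algorithm maps its patterns to a fixed-size payload and that a selector field names the algorithm.

```python
def bits_needed(input_bits: int, payload_bits: int) -> int:
    """Total output size when a selector picks one of many per-pattern algorithms.

    Each algorithm can tell apart at most 2**payload_bits inputs, so covering
    all 2**input_bits inputs takes 2**(input_bits - payload_bits) algorithms,
    and naming one of them costs input_bits - payload_bits selector bits.
    """
    selector_bits = input_bits - payload_bits
    return payload_bits + selector_bits


covered = 2**100 * 2**100          # 2^100 algorithms x 2^100 patterns each
print(covered == 2**200)           # True: far short of 2**1000
print(bits_needed(1000, 100))      # 1000: no saving once every input is covered
```

    However the bits are split between selector and payload, the two fields together must still distinguish all 2^1000 inputs.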

  264. Cor-r-r-rection by apankrat · · Score: 1

    > * If the data was represented a different way (say, using bits instead of bytesize data) then patterns might emerge ..

    Then it was not 'random' data. As far as I remember from early University ages, random data is data that have 0 self correlation, thus it does not matter if it's a bit or octet-encoded.

    --
    3.243F6A8885A308D313
  265. a possible 100:1 compression algorithm :) by mitch0 · · Score: 1

    Some years ago I "invented" a pretty nifty compression algorithm. tried to implement it, too. turned out to be unusable of course, but only because of the computational complexity of it (it would take more time than the life of the universe on reasonably large data :)

    who knows, maybe quantum computing will make it possible though, so here's the deal:

    1.) take a GOOD pseudo random generator (one that is as random as possible, but can reproduce the same random string if started from the same "seed").

    2.) run this algorithm, try matching strings from the generated random string with the data you want to compress.

    3.) upon reaching a significant match, record the position in the random stream, and the length of the match

    some bits need some polishing, but if this method wouldn't take ages, it could actually be usable :)
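
    A rough sketch of steps 1 to 3, assuming Python's random module as the generator and a 16-bit target (bit_stream and the target value are made up for illustration). It also counts how many bits the recorded position needs, which is the part that tends to eat the saving: a given k-bit string usually first turns up roughly 2^k bits into the stream.

```python
import random


def bit_stream(seed: int, length: int) -> str:
    """Return `length` pseudo-random bits as a string of '0'/'1' characters."""
    rng = random.Random(seed)
    return format(rng.getrandbits(length), f"0{length}b")


target = "1011001110001111"            # 16 bits we would like to "compress"
stream = bit_stream(seed=0, length=1_000_000)
offset = stream.find(target)           # step 2: look for a match
if offset >= 0:
    # step 3: the record is (offset, length); count the bits the offset needs
    offset_bits = max(1, offset.bit_length())
    print(f"offset {offset} needs {offset_bits} bits; "
          f"the match itself is {len(target)} bits")
else:
    print("no match in the first million bits")
```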

    cheers,
    mitch
    --
    // "If human beings don't keep exercising their lips,
    // their brains start working." -- Ford Prefect

    1. Re:a possible 100:1 compression algorithm :) by crosbie · · Score: 1

      You can really improve this algorithm by making the seed the same seed that would generate our universe (assuming it was generated from a seed (hence the big bang)), and then developing a URI system for any subset of this universe. I wonder if the size of the URI would typically be less than the size of the information it referenced? A bit like saying "can one create a URI for an infinite string of digits like PI, and can one guarantee that it will be shorter than the sub-strings one will be interested in?" Anyway, there is a pretty good compression scheme that exceeds 100:1 in many cases for any item of information on the web, i.e. a URL. In other words the best compression scheme involves creating a database (the web) that contains every possible stream of data that mankind is currently interested in, and then having a URI scheme where the URL's length is inversely proportional to the frequency that the file is referred to. I wonder what kind of compression ratio Google's database manages? If history is destined to repeat itself then perhaps there is a finite limit to the web's growth?

  266. "practically random data" by hackerhue · · Score: 3, Funny

    The output from a pseudo-random number generator is usually considered "random enough for practical purposes." So if you define "practically random data" as "data that is random enough for practical purposes," you can compress it by storing the random seed and the string length. ;-)

    I think I can beat their 100:1 compression ratio with this scheme.
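
    A sketch of that scheme, assuming the data really did come out of a known generator (here Python's random module; generate is a made-up name). The "compressed file" is just the seed and the length.

```python
import random


def generate(seed: int, length: int) -> bytes:
    """The 'decompressor': regenerate `length` pseudo-random bytes from `seed`."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


original = generate(seed=42, length=100_000)   # "practically random" data
compressed = (42, 100_000)                      # all we need to keep
assert generate(*compressed) == original        # lossless round trip
```

    It only works because the seed was known up front; for data from any other source there is no seed to store.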

    --

    To get something done, a committee should consist of no more than three persons, two of them absent.

  267. I had the same idea by apankrat · · Score: 1

    You don't need to come up with a formula though. All you need to do is to find *where in PI* your random string starts. Yet it may start very far from the beginning and thus its position number may happen to be larger than the source string itself :)

    --
    3.243F6A8885A308D313
    1. Re:I had the same idea by Flammon · · Score: 1

      Yes, I thought of that but what I suggest is not to use PI, but instead develop a formula in a similar way that one was developed for PI.

      You're right that using PI is silly unless the number you are compressing is PI. Then you've got great compression, infinite to 1 :)

    2. Re:I had the same idea by -brazil- · · Score: 1

      Wrong again. It is not known whether PI is a normal number, i.e. whether it really contains all possible finite sequences.

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

  268. An obvious fake by arvindn · · Score: 2, Informative


    100:1 ratio? On random data?
    Considerations far more elementary than Shannon's limits rule out compression of statistically random data by even a single bit. Here's why:
    There are 2^n bit strings of length n. Any compression method purporting to compress random strings (by even a single bit) must produce output of length at most n-1 for these 2^n inputs. But in that case the mapping is not unique, since there are only (2^n)-1 bit strings of length n-1 or less. (So decoding is not possible.)
    Once every so often some "researchers" claim to have attained the holy grail of compression. Too bad we never hear of them again :(
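
    The count can be checked by brute force for small n. A minimal sketch (strings_shorter_than is a name made up here) that lists every bit string shorter than n and compares the total with the 2^n inputs:

```python
from itertools import product


def strings_shorter_than(n: int) -> int:
    """Count every bit string of length 0 .. n-1 by listing them."""
    return sum(1 for k in range(n) for _ in product("01", repeat=k))


for n in range(1, 11):
    inputs = 2 ** n                      # strings of length exactly n
    outputs = strings_shorter_than(n)    # candidate compressed forms
    assert outputs == inputs - 1         # always one short
    print(n, inputs, outputs)
```

    With one output too few, at least two inputs must share a compressed form, so at least one of them cannot be recovered.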

    From the comp.compression faq
    this topic has generated and is still generating the greatest volume of news in the history of comp.compression
    ...
    The advertized revolutionary methods have all in common their supposed ability to compress random or already compressed data. I will keep this item in the FAQ to encourage people to take such claims with great precautions

  269. impossible by perky · · Score: 1
    I would have thought that lossless compression of random data is impossible since compression schemes make use of the non randomness of data. Doesn't information theory prohibit this?

    --
    "The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
  270. definition of the pigeonhole principle by marick · · Score: 1

    Rather, this problem can only be successfully resolved through the solution of what is commonly understood within the mathematical community as the "Pigeonhole Principle."

    Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?


    I'm pretty sure that's not the pigeonhole principle. As I understand it, the pigeonhole principle is the following:

    Given N pigeons, and M holes to stuff them in, at least the ceiling of N/M pigeons must be in one of the holes. If M is N-1, then this number is 2 (e.g. rolling 7 six-sided dice leads to at least one pair).
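
    A short check of that statement, as a sketch (guaranteed_load is a made-up name): the ceiling formula, plus an exhaustive run over every roll of seven dice.

```python
from itertools import product


def guaranteed_load(pigeons: int, holes: int) -> int:
    """Smallest number of pigeons that some hole must contain: ceil(N / M)."""
    return -(-pigeons // holes)          # integer ceiling division


print(guaranteed_load(7, 6))   # 2

# Exhaustive check: every one of the 6**7 rolls of seven dice repeats a face.
assert all(len(set(roll)) < 7 for roll in product(range(1, 7), repeat=7))
```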

    This looks like a hoax to me.

  271. Anyone can compress random data 100 to 1 by NotoriousQ · · Score: 1

    here is my solution


    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int i;
        /* rand() % 256 stands in for the non-standard random(256) */
        for (i = 0; i < 1000000; i++) {
            printf("%c", rand() % 256);
        }
        return 0;
    }


    since real random data carries no information, i have achieved lossless compression of 100 to 1.

    PS. Just add a couple more zeroes to achieve an even better compression

    Now do i get my own story on slashdot.

    Didn't think so

    --
    badness 10000
  272. GodzillaCrunch(tm) by Anonymous Coward · · Score: 0

    i know of a smart way to compress files
    <fraggle> no you dont
    <fraggle> I can guarantee its something retarded hes gonna say
    <fraggle> Ljung: I find it hard to believe that you, an idiot, can think of a better encryption algorithm than zip and other widely used compression methods
    <Ljung> heh there's a compression program called godzillaCrunch
    <Ljung> it makes files like 0.2 %
    <Ljung> but they can't get decompressed

  273. Using irregular numbers? by Com2Kid · · Score: 1

    Maybe they have a whole host of irregular numbers stored on some sort of massive file array all figured out to a crud load of bits in binary.

    A cluster of super computers then offsets the data against the irregular number that allows for the least amount of offset. All that would have to be translated is the offset, which would seem like a prime candidate for scientific notation. :)

    "Uh, yah, that video file there is 121^(^3434^173) offset into e . . . :)"

    Of course having the destination computer DECOMPRESS this data would be another matter entirely, hehe. But with home computers getting faster and faster, and preferably only easy to calculate irregular numbers being used, it likely wouldn't be /too/ bad. Heh. Maybe a few hours only for decompression time? I'm sure that the modem users would love it at least. :)

    It is just the ultimate trade-off between the end size of the compressed file and the time/power it takes to compress/decompress it.

  274. GodzillaCrunch(tm) by Anonymous Coward · · Score: 0

    i know of a smart way to compress files
    <fraggle> no you dont
    <fraggle> I can guarantee its something retarded hes gonna say
    <fraggle> Ljung: I find it hard to believe that you, an idiot, can think of a better encryption algorithm than zip and other widely used compression methods
    <Ljung> heh there's a compression program called godzillaCrunch
    <Ljung> it makes files like 0.2 %
    <Ljung> but they can't get decompressed

  275. Re:Byte Magazine had a very similar article 14yrs by Eric+Smith · · Score: 2
    I was just thinking about that BYTE article. I'm not sure which issue it was in. I think it was in a news blurb sort of column. IIRC, they claimed their compression algorithm was "not affected by the laws of information theory".

    The reporter wrote glowing things about how when he decompressed his files, they had the right size and timestamp. There was a small matter of the contents being wrong, but the company had assured him that this was just a small glitch in the beta version that would be fixed in the final release.

    I can imagine that some junior reporter might fall for this, but where the heck was the editor?

    I imagine that the whole stunt was probably part of a scam to defraud some investors. Get it published in a magazine, and it must be legit, right? I wouldn't be the least bit surprised if this new "lossless compression algorithm" proved to be such a scheme.

    BYTE went seriously downhill around 1985 or so. A friend seems to think that it was a result of Steve Ciarcia moving on, but I don't think that fully explains it. Before that, there were plenty of technical articles by other authors, but BYTE turned into a rag full of mostly non-technical reviews.

  276. Re:how can this be? Answer: BitPerfectTM by Alsee · · Score: 4, Insightful

    Note the results are "BitPerfectTM", rather than simply saying "perfect". They try to hide it, but they are using lossy compression. That is why repeated compression makes it smaller, more loss.

    "Singular-bit-variance" and "single-point-variance" mean errors.

    The trick is that they aren't randomly throwing away data. They are introducing a carefully selected error to change the data to a version that happens to compress really well. If you have 3 bits, and introduce a 1 bit error in just the right spot, it will easily compress to 1 bit.

    000 and 111 both happen to compress really well, so...

    000: leave as is. Store it as a single zero bit
    001: add error in bit 3 turns it into 000
    010: add error in bit 2 turns it into 000
    011: add error in bit 1 turns it into 111
    100: add error in bit 1 turns it into 000
    101: add error in bit 2 turns it into 111
    110: add error in bit 3 turns it into 111
    111: leave as is. Store it as a single one bit.
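
    The table above amounts to a majority vote over each 3-bit block. A minimal sketch of that reading (compress and decompress are names made up here, and this is only the toy 3-bit case, not whatever ZeoSync actually does):

```python
from itertools import product


def compress(bits: tuple) -> int:
    """Keep only the majority bit of a 3-bit block (this is the lossy step)."""
    return 1 if sum(bits) >= 2 else 0


def decompress(bit: int) -> tuple:
    """Expand the stored bit back to 000 or 111."""
    return (bit, bit, bit)


for block in product((0, 1), repeat=3):
    restored = decompress(compress(block))
    errors = sum(a != b for a, b in zip(block, restored))
    print(block, "->", compress(block), "->", restored, f"({errors} bit wrong)")
```

    Only 000 and 111 come back intact; the other six blocks each return with exactly one bit changed, which is why this is 3:1 but not lossless.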

    They are using some pretty hairy math for their list of strings that compress the best. The problem is that there is no easy way to find the string almost the same as your data that just happens to be really compressible. That is why they are having "temporal" problems for anything except short test cases.

    Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.

    -

    --
    - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  277. also from the press release by MrFredBloggs · · Score: 1

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.

    I wonder why they are warning that they are uncertain they`ll complete it. Anyone here emailed them?

    1. Re:also from the press release by larien · · Score: 1
      Could just be the usual lawyer stuff; you know, "if we don't put in this disclaimer and some investor loses more than $5 we might get sued for inflating share prices"...

      I don't think it necessarily signals anything more than the usual lawyer butt-covering, but I'd still like to see the technology in action before I'll view it as a revolution in compression.

  278. Kolmogorov Complexity by nilsey · · Score: 0

    Here's what they are claiming to use. Seems like this is a way to describe data multidimensionally in a way that isn't readily assimilated digitally.

    note that last line of the excerpt i give below.

    what i think is, if you're going to use binary data, you've got to follow the rules of the road -- shannon's law.
    check it out on this website

    Examples of Kolmogorov Complexity

    1. Pi is an infinite sequence of seemingly random digits, but it contains only a few bits of information: the size of the short program that can produce the consecutive bits of pi forever. Informally we say the descriptional complexity of pi is a constant. Formally we say K(pi) = O(1), which means "K(pi) does not grow".

    2. A truly random string is not significantly compressible; its description length is within a constant offset of its length. Formally we say K(x) = Theta(|x|), which means "K(x) grows as fast as the length of x".
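
    K(x) itself is not computable, but the contrast in the two examples can be hinted at with an ordinary compressor. A rough sketch, assuming zlib's output length as a stand-in upper bound for description length (the variable names are made up here):

```python
import random
import zlib

n = 10_000
patterned = b"01" * (n // 2)                            # "print 01 five thousand times"
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))     # no pattern zlib can find

for name, data in (("patterned", patterned), ("noisy", noisy)):
    print(name, len(data), "->", len(zlib.compress(data, 9)))
```

    The stand-in is loose in one direction only: the noisy bytes here come from a seeded generator, so like pi they have a short description that zlib has no way to find.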

    --
    -- too cruel for schuel
  279. Flame City in Here! by Anonymous Coward · · Score: 0

    I'm not saying that I believe these guys but I can think of one means of compression that is a possibility. This has been a background thought of mine for years but I never did anything with it.
    Pseudorandom number generators generate a lot more output than they occupy in storage space. If one were to find a way to either derive generators and coefficients from the target data or to match generators with the target data then a patchwork of generators could provide huge compression ratios on seemingly random data.

    Two more points regarding such a scheme :

    1) It would be highly asymmetrical, compression could take a gazillion years, decompression would be extremely fast. This would be acceptable in the content delivery business.

    2) What is "random" anyway? A 1600 byte segment of output from a generator represented by 16 bytes can appear to be "random" when subjected to statistical tests. If you get lucky and find a generator that can reproduce part of the sequence you've compressed it.

    So, I'm probably more skeptical than the next guy but it's a big world out there so I don't pretend to know everything.

    Coniine ( forgot passwd )

  280. Let me see... by Chris+Canfield · · Score: 1
    Let me see if I have this straight.

    First, you have one or more random-looking number generators of some sort that net you something to compare to the data, probably VERY carefully chosen. You pretend that the seed data doesn't count against your total data. Their indecipherably obtuse hypercube example makes you think that they coax this pattern many times from various "angles" so that they get something shaped like the original data out of the other end.

    I'm not buying this claim of "lossless." If they are comparing it to existing compression at 10:1, then they mean JPG or MP3 or DIVX or things like that... none of which are truly lossless. Or, as this is a "temporally-challenged" unproven multi-pass system, perhaps they have found a way to get the above situation to work for certain data losslessly, and are praying to the mathematical gods that zipping a zip file won't just add another 20K.

    If they are attempting to compress visual data: aren't most broadcast images lighter on the top than the bottom? Don't they involve stick-figurey thingies? Why not just send texture and position data to a computer and let us all watch poser-o-vision? After all, we're already dancing like puppets to these posers.

    This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.

    --
    This Sig is a mnemonic device designed to allow you to recognize this author in the future.
  281. I can do 1G to less than 1k already! by jaybob20 · · Score: 1

    dd if=/dev/zero of=bigfile bs=1k count=1000k
    bzip2 -9 bigfile
    ls -l bigfile.bz2
    753 bigfile.bz2

    Nice huh?? Now put a few of these big files in the kernel tree and see what people say.

    But really, this sounds like the next wave of .com's and VCs has hit. And it is just what America needs to pull out of this slump that was caused by the last bunch of .com's and VCs.

    And hey everyone don't forget to vote this one for Wired's vaporware top 10.

    --
    It was dark and I didn't have my contacts...
  282. Give back Shannon his decade by Lulu+of+the+Lotus-Ea · · Score: 1

    Advancing my general theory that Reuters reporters are idiots, the article took 10 years off the life of the estimable Claude Shannon. Sadly enough--and well known to /. readers, Dr. Shannon died last year (2001), not in 1991. This obscure bit of knowledge was buried away in technical journals like the _NYT_ and _Entertainment Weekly_, so one can see how Reuters missed it.

  283. Sounds Like B.S. by SkewlD00d · · Score: 1

    All compression does is maximize bit entropy; that is, compression CANNOT occur on random data!!
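
    The claim is easy to check empirically. The following is a minimal sketch, assuming Python's standard `zlib` module as the compressor and a seeded pseudo-random generator as a stand-in for "random data"; the helper name `compressed_ratio` is made up for illustration.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (DEFLATE, level 9)."""
    return len(zlib.compress(data, 9)) / len(data)


# Seeded generator so the run is repeatable. Pseudo-random bytes are only
# a stand-in for truly random data (an assumption of this sketch).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

# 100,000 zero bytes: the most redundant input possible.
zeros = bytes(100_000)

noise_ratio = compressed_ratio(noise)
zeros_ratio = compressed_ratio(zeros)
print(f"pseudo-random: {noise_ratio:.4f}   zeros: {zeros_ratio:.6f}")
```

    A ratio at or above 1 for the noise means the compressor found nothing to remove, while the all-zero input collapses to a tiny fraction of its size.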

    Sounds like bullshit marketing of someone looking for VC funding.

    --
    The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
  284. I have an algorithm for compressing random data by SIGFPE · · Score: 2
    For example I can compress the first million numbers generated by rand() into a few bytes.
    A similar technique works with the output of drand48() and in fact for a long enough sequence this approach works with every random number generator algorithm available today.


    In fact here's the compressed file for the rand() case:


    int i;for (i = 0; i<1000000; ++i) printf("%d\n",rand());


    Use gcc as decompressor.

    --
    -- SIGFPE
  285. This thing just screams "scam" by Animats · · Score: 2
    • Big claims, no demo, no papers, and it doesn't work yet.
    • It's headquartered in West Palm Beach, Florida. Unclear why, but Southern Florida has been a major scam center for decades.
    • They're trying to get people to invest, publicly advertising for "accredited investors". It's not usually done that way. If they went to a VC for funding, the technology would get looked at, hard. (If it worked, getting VC funding for this would be easy.) If they went for an IPO, they'd have to file disclosures with the SEC under penalty of perjury.
    • They claim: "All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others." Yeah, right. Let's see some names.
    • The Flash animation on the web site appears to be constructed entirely with stock photography. There's no useful information in the images. (Maybe that's their approach to compression.)

    Scroll down to Incredible Claims for descriptions of the last four scams like this. Remember Pixelon?

  286. Simple, it can't be by nusuth · · Score: 5, Insightful
    I have been pretty late to this thread, and I'm sorry if this is redundant. I just can't read all 700 posts.

    1:100 average compression on all data is just impossible. And I don't mean "improbable" or "I don't believe that"; it is impossible. The reason is the pigeonhole principle. For simplicity, assume that we are talking about 1000-bit files. Although you can compress some of these 1000-bit files to just 10 bits, you cannot possibly compress all of them to 10 bits, as 10 bits allow just 1024 different configurations while 1000 bits call for representations of 2^1000 different configurations. If you can compress the first 1024, there is simply no room to represent the remaining 2^1000-1024 files.

    ...And that is assuming the compression header takes no space at all...

    So every lossless compression algorithm that can represent some files with other files shorter than the original must expand some other files. Higher compression on some files means the number of files that do not compress at all is also greater. An average compression ratio other than 1 is only achievable if there is some redundancy in the original encoding. I guess you can call that redundancy "a pattern." Rar, zip, gzip etc. all achieve a compressed/original length below 1 on average because there is redundancy in the originals: programs that have some instructions and prefixes with common occurrence, pictures that are represented with a full dword per pixel although they use a few thousand colors, sound files almost devoid of very low and very high numbers because of recording conditions, etc. No compression algorithm can achieve a ratio below 1 averaged over all possible strings. It is a simple consequence of the pigeonhole principle and cannot be tricked.
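
    The counting behind this can be spelled out in a few lines. This is only a sketch of the arithmetic, using the 1000-bit and 10-bit sizes from the example above; the function name `shorter_outputs` is hypothetical.

```python
def shorter_outputs(n_bits: int) -> int:
    """Count all bit strings strictly shorter than n_bits (empty string included)."""
    # 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1
    return sum(2 ** k for k in range(n_bits))


n = 1000
inputs = 2 ** n                # number of distinct n-bit files
outputs = shorter_outputs(n)   # number of distinct shorter files

# Even if every shorter string is used as a compressed form, at least one
# n-bit file is left with nothing shorter to map to.
left_over = inputs - outputs

# The 10-bit case from the text: only 1024 of the 2^1000 files fit.
uncovered_at_10_bits = inputs - 2 ** 10

print(left_over)
print(uncovered_at_10_bits.bit_length())  # number of bits needed to write that count
```

    Since a lossless decompressor must map each compressed form back to exactly one original, no choice of algorithm changes these counts.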

    --

    Gentlemen, you can't fight in here, this is the War Room!

    1. Re:Simple, it can't be by FalconStrike · · Score: 1

      Actually I think something can be done without breaking what you just proved.

      If I remember correctly, these guys claim to use randomness to their advantage. Here's my guess into how randomness can be used. It's a complete shot into the dark.

      Given a block of random sequence of bits and a library of pseudo-random number generators that generates 1's and 0's with equal probability:

      1. Find a pseudo-random number generator and the seed that generates a sequence that best approximates the block given (i.e. a sequence that has low number of differences with the input)

      2. Take the diff of the two sequences and compress the diff with standard lossless compression (e.g. Huffman encoding)

      3. We now can represent that block by some ID representing the pseudo-random number generator, the seed, and the compressed diff.

      The catch is that both sides must have the same library of pseudo-random number generators. Thus, in essence, that library serves as a code book used to encode the block. The diff is there to make the approximation process less cumbersome.

      I just thought of this off the top of my head and haven't done any rigorous thinking about it. So I have no clue whether this would work or not. Any one think this would work?

    2. Re:Simple, it can't be by Anonymous Coward · · Score: 0

      Nope, sorry. Your approach would only work if you agree to only encode things that happen to approximate the output of one of the PRNGs in your library. If you want to be able to compress every possible data set you'd need an infinite number of PRNGs in the library. Of course, if you have that, the ID to select which PRNG to use is going to be rather larger than the original data set. nusuth's given the standard proof above; it is simply a fact of life. ZeoSync is either lying to the press, or more likely just stupid and lying to themselves.

    3. Re:Simple, it can't be by Tiny+Elvis · · Score: 1

      1. Find a pseudo-random number generator and the seed that generates a sequence that best approximates the block given (i.e. a sequence that has low number of differences with the input)

      I can't imagine how to do this step without examining all output from all possible pseudo random generators.

    4. Re:Simple, it can't be by Anonymous Coward · · Score: 0

      What if you did this:

      You have a truly random file, but what if you could 'seed' it with certain bits of data (created mathematically, so you could save what you 'seeded' with the compressed file), and the new pieces of data inserted into the truly random stream cause it to be not-so-random anymore, and therefore capable of being compressed easily? Maybe they found a system that could figure out how to make non-randomness out of randomness by adding additional data? Then the decompression is just uncompress and overwrite the changed data with the original data. Would this be plausible? This would get around the '100% random data cannot be compressed' problem.

      -Chris

    5. Re:Simple, it can't be by FalconStrike · · Score: 1

      Hmm...pseudo random number generators exhibit chaotic behavior, thus while they appear to be random, they can be modeled by some non-linear model. Perhaps this is where the high dimensional analysis they mention in the press release comes in.

    6. Re:Simple, it can't be by FalconStrike · · Score: 1

      Well, if you have one pseudo random number generator in your library that describes, let's say, 80% of the bits of the sequence in the block without error, the compressed diff will take care of the error correction. Of course, even at 80% accuracy we perhaps achieve only a mere 5 to 1 compression ratio. To get 100 to 1, we'd have to get into the 99% realm.

      I have no clue as to how many pseudo random number generators would be needed in the library to describe all the possible data out there with 99% accuracy, but it's definitely not infinite, though it may be astronomically large.
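
      One way to put a number on "astronomically large" is a sphere-covering count. The sketch below is not anyone's actual method; it assumes 1000-bit blocks and asks how many (generator, seed) pairs are needed so that every block lies within a given number of bit errors of some generated sequence. The names `ball_volume` and `min_id_bits` are made up for illustration.

```python
from math import comb, log2


def ball_volume(n_bits: int, max_errors: int) -> int:
    """Number of n-bit blocks within Hamming distance max_errors of one fixed block."""
    return sum(comb(n_bits, d) for d in range(max_errors + 1))


def min_id_bits(n_bits: int, max_errors: int) -> float:
    """Lower bound on the bits needed to name a (generator, seed) pair.

    Each generated sequence can stand in for at most ball_volume() blocks,
    so at least 2^n / ball_volume() sequences are needed to reach them all.
    """
    return n_bits - log2(ball_volume(n_bits, max_errors))


n = 1000
id_bits_99 = min_id_bits(n, 10)    # 99% of bits right: at most 10 errors
id_bits_80 = min_id_bits(n, 200)   # 80% of bits right: at most 200 errors

# Naming which error pattern occurred costs about log2(ball_volume) bits
# on average, so ID plus diff cannot drop below the block size.
total_99 = id_bits_99 + log2(ball_volume(n, 10))

print(f"ID bits at 99%: {id_bits_99:.1f}")
print(f"ID bits at 80%: {id_bits_80:.1f}")
print(f"ID + diff at 99%: {total_99:.1f}")
```

      Under these assumptions a tighter match only moves bits from the diff into the ID, which is the same conclusion the pigeonhole argument reaches.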

    7. Re:Simple, it can't be by nusuth · · Score: 1
      The idea of impossibility does not make any reference to how the data is compressed, that is irrelevant. Think of it this way, call a compressed file a label. You have n files, and m labels. When the decompressor finds a label it creates the file corresponding to it. Clearly, if there is less than n labels the decompressor cannot decide which label corresponds to which file (at least for some labels), so it must be the case that m>=n. Now the problem is for a population of exactly k bit long files, there must be at least 2^k labels, which can be represented by at least k bits, which happens to be the original length. If labels are not shorter than files they represent, there is no compression so there is no such thing as overall compression. The trick is, you can label common files with labels that require less than k bits, and uncommon ones with labels that require more than k bits, which would give you an average compression ratio in daily usage other than 1. How you make this is where the nature of compression algorithm comes into play.

      Now, if the data is truly random and uniformly distributed, there are no frequency-of-occurrence statistics to exploit. You cannot use the trick of giving common files shorter labels, since every file is equally common. This, also, is independent of how you plan to do the labeling.

      --

      Gentlemen, you can't fight in here, this is the War Room!

  287. A BRILLIANT business move by ZeoSoft! by Rayonic · · Score: 2, Insightful

    Bear with me for a moment. This kind of 'compression technology' is EXACTLY the kind of thing the MPAA has been dreading. Imagine millions of people on Morpheus trading 5MB copies of The Matrix, Star Wars and everything else. Of course it's a hoax, but if they can keep it up long enough, then maybe they'll get bought out by the MPAA, RIAA, or whoever!

    ZeoSoft is ushering in the business model of the new millennium - fooling the tech-illiterate elite of today's content cartels into buying them out, then laughing all the way to the bank! I applaud ZeoSoft for their initiative, and hope to see other such business ventures in the future.

    Now, if you'll excuse me, I'm off to develop a program that uses fractal-temporal equations to randomly generate sequels to popular movies! (hint, hint)

  288. Language Barriers by CustomDesigned · · Score: 1
    Some years ago, I was introduced to a man who had invented a "Zero Bandwidth Transmitter". He claimed to be able to transmit FM quality voice while using no bandwidth at all, and had a working prototype of a transmitter and receiver.

    You know, and I know, that a "zero bandwidth" transmitter makes as much sense as 1 + 1 == 3 (or, for that matter, compression of "random" data). For this reason, despite a working prototype, the poor man had been unable to obtain a patent for his invention, despite 10 years of trying. (The Patent Office seems to be a lot looser when it comes to software.) He was very bitter and convinced that everyone else in the world was an idiot.

    However, when the invention was described to me, it turns out that by "zero bandwidth", he meant "undetectable by FCC compliance measuring equipment", and that what he had really invented was a "Spread Spectrum" transmitter! What a sad story. Someone else got the patent because they could communicate it better.

    So, even though compression of "random" data is mathematical nonsense, it is likely that "random" is not being used in the standard mathematical sense, but in the Marketroid sense - and the new compression algorithm might actually be useful.

  289. More to the point by Anonymous Coward · · Score: 0
    The 'practically' bit of practically random doesn't matter if you know anything about information theory. For a given class of documents (english text, computer code, etc), there is a precise amount of entropy. If the class is only an approximation, then so is the measurement of the entropy.

    Anyway, cryptographers, known for being pessimists in measurements of entropy, say that there are 2-3 bits of entropy per word in English text. So I'd say that there are at least 2 bits per word as an absolute minimum.

    Now if we are generous and call each word 7 characters of 7 bits each, we find that every 49 bits (call it 50) of English text contains at least 2 bits of randomness.

    Which means that it is mathematically impossible to compress 'random' strings of English text by more than about 25 to one. Note that I've been extremely conservative in this estimate.
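
    The arithmetic above, written out as a small sketch. The constants are the assumptions stated in this post (7 characters per word, 7 bits per character, 2 bits of entropy per word), not measured values.

```python
CHARS_PER_WORD = 7          # generous average word length assumed above
BITS_PER_CHAR = 7           # 7-bit ASCII
ENTROPY_BITS_PER_WORD = 2   # low end of the 2-3 bit estimate

raw_bits_per_word = CHARS_PER_WORD * BITS_PER_CHAR

# No lossless coder can average fewer bits per word than the entropy,
# so this ratio is a ceiling on the achievable compression.
best_ratio = raw_bits_per_word / ENTROPY_BITS_PER_WORD

print(f"{raw_bits_per_word} raw bits per word, ceiling of {best_ratio}:1")
```

    With the higher estimate of 3 bits per word the ceiling drops further, so 100:1 is out of reach either way.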

    Mathematics once proven is always true. If they really found a way around it, they would be able to explain which assumption by Shannon they avoid - not just which result.

  290. Tipoff to the BS by Lulu+of+the+Lotus-Ea · · Score: 1

    There are a number of ways that the ZeoSync press release tips its hand as the nonsense it is. One just needs to read carefully.

    Rhetorically, the reference to unnamed "experts" from Harvard, MIT, Berkeley, etc. is quite telling. If someone at those places had genuinely done this research, they would be named with credentials. The absence of that speaks loudly. I suspect the actual collaboration amounts to some former undergraduate of those schools calling a professor and asking some inane question ("Hi Dr. Jones, what do you think of Claude Shannon?").

    But even more telling than the rhetorical lacunae are the ten-dollar words they include to try to wow the reader. In an allegedly lossless compression algorithm, the release brags about advancements of fractal, wavelet, FFT etc. techniques... in other words, a bunch of LOSSY compression techniques. Put simply, if you are happy with lossy compression (which you often are, but there is a clear difference), you can get whatever compression ratio you want (at the cost of correspondingly reduced fidelity).

    So the ZeoSync claim is either directly false, or it is about lossy compression, and is worth a big yawn.

    1. Re:Tipoff to the BS by pointym5 · · Score: 1

      They are named! Look around the website for the "org chart".

  291. The NSA has been doing this for years! by thehunger · · Score: 1

    In fact, they've been zipping all of the Internet content for years with their Echelon technology. Every e-mail, webpage, Slashdot post etc. is currently stored on a half-full CDROM that every NSA employee carries a copy of.

  292. Get Rich Quick by grubert · · Score: 1

    That's what I'm wondering; isn't it a little late for someone to get rich quick with a vapourware IPO?

    What the heck else is the technobabble B.S. good for ?

  293. BELGIAN INVENTS ALGORYTHM 5DVD-ONE CD by Anonymous Coward · · Score: 0

    Hi,

    I've run across an article discussing a Belgian inventor who has invented a hardware/software solution that enables 5 FULL DVDs to be put on ONE SINGLE CD. No ripping required. Unbelievable? YES. So far I have not found anything related to this, neither by searching on the inventor's name nor on the name of the technology (DCS).

    - - - BLURB Translated from a Belgian/Flemish Newspaper on-line archives - - -

    Antwerp citizen invents new digital compression technique. 02/11/2001 spi - belga

    BRUSSEL - Guillaume Defossé has developed a lossless compression system with which he can record five DVDs onto one single 650MB CD-ROM. The inventor has called this system DGS. It is possible to record a 30 minute television fragment onto one single floppy disk. Defossé is a composer living in Antwerp, Belgium; he also studied electronics.

    URL : // registration needed - dutch // https://www.standaard.be/Archief/zoeken/DetailNew. asp?articleID=DMF02112001_010&trefwoord=dvd

  294. from the Compression FAQ by SparkMan · · Score: 1

    from the Compression FAQ at:
    http://www.faqs.org/faqs/compression-faq/part1/
    http://www.faqs.org/faqs/compression-faq/
    .

    9.1 Introduction

    It is mathematically impossible to create a program compressing without loss
    *all* files by at least one bit (see below and also item 73 in part 2 of this
    FAQ). Yet from time to time some people claim to have invented a new algorithm
    for doing so. Such algorithms are claimed to compress random data and to be
    applicable recursively, that is, applying the compressor to the compressed
    output of the previous run, possibly multiple times. Fantastic compression
    ratios of over 100:1 on random data are claimed to be actually obtained.

    Such claims inevitably generate a lot of activity on comp.compression, which
    can last for several months. Large bursts of activity were generated by WEB
    Technologies and by Jules Gilbert. Premier Research Corporation (with a
    compressor called MINC) made only a brief appearance but came back later with a
    Web page at http://www.pacminc.com. The Hyper Space method invented by David
    C. James is another contender with a patent obtained in July 96. Another large
    burst occured in Dec 97 and Jan 98: Matthew Burch applied
    for a patent in Dec 97, but publicly admitted a few days later that his method
    was flawed; he then posted several dozen messages in a few days about another
    magic method based on primes, and again ended up admitting that his new method
    was flawed. (Usually people disappear from comp.compression and appear again 6
    months or a year later, rather than admitting their error.)

    Other people have also claimed incredible compression ratios, but the programs
    (OWS, WIC) were quickly shown to be fake (not compressing at all). This topic
    is covered in item 10 of this FAQ.

    ...

    A common flaw in the algorithms claimed to compress all files is to assume that
    arbitrary bit strings can be sent to the decompressor without actually
    transmitting their bit length. If the decompressor needs such bit lengths
    to decode the data (when the bit strings do not form a prefix code), the
    number of bits needed to encode those lengths must be taken into account
    in the total size of the compressed data.

    Another common (but still incorrect) argument is to assume that for any file,
    some still to be discovered algorithm might find a seed for a pseudo-random
    number generator which would actually generate the whole sequence of bytes
    contained in the file. However this idea still fails to take into account the
    counting argument. For example, if the seed is limited to 64 bits, this
    algorithm can generate at most 2^64 different files, and thus is unable to
    compress *all* files longer than 8 bytes. For more details about this
    "magic function theory", see http://www.dogma.net/markn/FAQ.html#Q19

    ...

    So far no one has accepted this challenge (for good reasons).

    Mike Goldman makes another offer:

    I will attach a prize of $5,000 to anyone who successfully meets this
    challenge. First, the contestant will tell me HOW LONG of a data file to
    generate. Second, I will generate the data file, and send it to the
    contestant. Last, the contestant will send me a decompressor and a
    compressed file, which will together total in size less than the original
    data file, and which will be able to restore the compressed file to the
    original state.

    With this offer, you can tune your algorithm to my data. You tell me the
    parameters of size in advance. All I get to do is arrange the bits within
    my file according to the dictates of my whim. As a processing fee, I will
    require an advance deposit of $100 from any contestant. This deposit is
    100% refundable if you meet the challenge.

    ...

    --

    -- laws are the opinions of politicians --

  295. Re:What's this... Numbers? by Anonymous Coward · · Score: 0

    Best comment so far on this article. The rest of you need to stop masturbating over compression theory.

  296. U6H! Use writing magic, no picture. Smaller data. by U6H! · · Score: 1

    Maybe they just discovered written language, or started writing their tribal history on hides instead of stone tablets.

  297. it's sad that companies get away with this... by deviator · · Score: 1

    Another company tried to pull this one about ten years ago. They claimed (I believe) a 12 to 1 compression on "random" data, and you could "recompress" that data stream as many times as you wanted until it was less than 4k.

    uh-huh.

    Given that this really IS mathematically impossible, and people have tried for years to figure out ways around it, it's just another company trying to sell snake oil to investors. It's too bad this stuff makes it to slashdot and to the media in general, because the company doesn't deserve the attention.

  298. Forum Compression by mckirkus · · Score: 1

    Is there any way to compress the forums so that everybody with a witty idea that's been posted by 100 other people will have their posts all condensed down into one post. Would save us all a lot of time. Example below :) Here's a new one: What if you compressed something down to 10:1 then compressed that down to 10:1 etc. etc. until it was compressed down to one byte! Ha! God I'm witty!

  299. Re:In this house we obey the 2nd law of thermodyna by Anonymous Coward · · Score: 0

    As the author of secondlaw.com and his related sites takes great pains to point out, "information entropy" is an unrelated concept to thermodynamical entropy. This has nothing to do with the 2nd law.

  300. Ahem, Off-site storage! by spookyfluke · · Score: 1

    "Back in the days of non-quantum computing everyone thought we were bullshiting them!" -- CEO, ZeroSpace

    --
    you.bases.each{|base|base.are_belong_to=us}
  301. Well... by koali · · Score: 1
    To sum things up, simple maths tells us that no compression algorithm can compress all files of length n by at least a bit. Having an algorithm like that would yield infinite compression... there's no demonstration needed there, that just doesn't make sense.

    Second, Information Theory says that you cannot compress data of n-bits of entropy to less than n bits. Data is said to be 'random' if it is n bits long and it has n bits of entropy (that is not accurate, I know).

    You can cheat and invent an algorithm that compresses *1* random string of data to a byte and adds one byte to the rest, so you can undo that transformation easily. There you are, you have an algorithm that compresses random data!

    The compression faq (and I guess that Kolmogorov says so, but I don't know) evades those tricks by requiring that the compressed size plus the decompressor be less than the uncompressed size.

    The only point I can find there is, what you call random data. Suppose a text encoded in ASCII in bytes. That is supposed to have a low entropy. Now take that data in 9-bit chunks and measure entropy. It will be higher. Now, entropy and randomness depend of how you look at data.
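
    This is easy to try. Below is a rough sketch that measures the empirical Shannon entropy of the same bytes read as 8-bit and as 9-bit symbols; the function name `symbol_entropy` and the sample text are made up for illustration, and the numbers depend entirely on the sample. Dividing by the symbol width gives entropy per bit, which is the figure that has to be compared across chunk sizes.

```python
from collections import Counter
from math import log2


def symbol_entropy(data: bytes, width: int = 8) -> float:
    """Empirical Shannon entropy in bits per width-bit symbol."""
    bits = "".join(f"{byte:08b}" for byte in data)
    usable = len(bits) - len(bits) % width   # drop a trailing partial symbol
    symbols = [bits[i:i + width] for i in range(0, usable, width)]
    total = len(symbols)
    counts = Counter(symbols)
    return -sum(c / total * log2(c / total) for c in counts.values())


# Hypothetical sample: short repeated ASCII text.
text = b"the quick brown fox jumps over the lazy dog " * 50

per_bit_8 = symbol_entropy(text, 8) / 8
per_bit_9 = symbol_entropy(text, 9) / 9
print(f"8-bit symbols: {per_bit_8:.3f} bits of entropy per bit")
print(f"9-bit symbols: {per_bit_9:.3f} bits of entropy per bit")
```

    Keep in mind that this only counts symbol frequencies. It ignores correlations between symbols, so it is an estimate tied to one way of slicing the data rather than a property of the data itself.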

    This takes us to what's a random file. I'm sure that the guy with the compression faq challenge would give you a *very* random file, with a really even distribution of characters, little repeated sequences, no long streams of 77's, etc. Is that a random file...? It has been doctored. In fact, if you count how many possible files he could give you, it would be less than all the possible files of that length, therefore, you wouldn't need as many bits to represent them all.

    The problem (and I'm sure someone who knows more combinatorics than me could confirm this) is that it can't work very well... I'd say that the saving must be less than one bit (intuitive reasoning)...

    1. Re:Well... by koali · · Score: 1

      Oops, I forgot. Another problem with the last point is that the compression/decompression would be unbearably slow or very heavy on memory, at least with the methods I can think of.

  302. it depends on the application! by dollargonzo · · Score: 1

    well...so much for all that technobabble on their site, and it obviously being some sort of hoax, scam, something to impress mommy, etc. or whatever yuo want to call it.

    the point is, that it ALL depends on what the application is. for example, sometimes yuo don't care about the order of the data yuo get back, yuo just want to the same discrete chunks yuo started with. and yuo sometimes also don't care how long it takes to COMPRESS, as long as uncompression is fast. for example, in the case of hardware testing, yuo usually want to test a bunch of input to make sure yuo have the correct output. if yuo are testing this capability, then yuo DONT care how long it takes to compress, yuo only need to do it once, what yuo care about is the QUICK transmission of the compressed data and uncompressing it as fast as possible.

    so what yuo do in this case, is divide the data into chunks of the test codes, and try to solve the TSP of the test codes as coordinates of an N-dimensional space where N=# of bits in code. although solving TSP for a large number of codes is impractical, yuo can use, say, a kohonen self organizing neural net to approximate. and at every step of the way, yuo would have data that can be compressed via a running length compression if yuo express the differences between the test codes (that is what the TSP is for). the longer yuo wait, the better the compression is. at least it is only O(x) and not anymore (not including actual the updating time for each node, which is neglible anyway compared to the actual solving of the problem).

    getting it back is trivial, just add all the differences one by one to get the test codes back in some order and feed them to the hardware to be tested. works, well, doesnt it??? its more than 100:1 if yuo wait long enough. and it is very practical for the application at hand. however, it is clearly impractical for every day use. so...can this BS company claim what they have done is real?

    SURE! if they dont mention the details of what they are doing...

    QED

    --
    BSD is for people who love UNIX. Linux is for those who hate Microsoft.
    1. Re:it depends on the application! by dollargonzo · · Score: 1

      also...if yuo DO need the order back, since the number of chunks is finite, yuo just add some extra data on the end to give the order of the chunks...it wont change the size of the compressed data THAT much. the larger the size (number of chunks) in fact, the less the extra order chunk will actually make a difference. so, for a 1GB file, 1k chunks, that is 10^6 chunks, that is 20 bits * 10^6 chunks, that is 20 extra megabits, or about 2.5MB.
      although for 100:1 compression that adds a quarter to the 10MB of compressed data, the total compression is STILL 80:1 which is QUITE a bit better than 10:1...again the bigger the data (optimize, actually for chunk size, # of chunks) the more the compression.

      this could be used for MOVIES!! (yay)...take a while to compress the stuff yuo want, but then the tranfer is fast, and uncompression is fast. the only downside of this method is that yuo can't stream (SORRY!) but otherwise, for providing full movies to ppl, this would work really well.

      QED

      --
      BSD is for people who love UNIX. Linux is for those who hate Microsoft.
    2. Re:it depends on the application! by dollargonzo · · Score: 1

      plus, yuo CAN use it on random data...the bigger the data size, the more yuo can compress, because there would vectors closer to others.

      so, all yuor assertions about NOT being able to compress ALL data is not true.

      QED

      --
      BSD is for people who love UNIX. Linux is for those who hate Microsoft.
  303. Zeosync has next to no net presence. by Anonymous Coward · · Score: 0

    I did a search on Google and I got exactly three hits: their own site twice and the Reuters article.

    It's like they just popped in out of nowhere with an unbelievable technology, looking for investors/suckers.

  304. Confirmed with my Polish speaking coworkers by Ewann · · Score: 3, Informative

    We have three native Polish speakers in my office. I asked one of them to translate the professor's reply. She said the gist of it is that he was upset they released his name, he didn't authorize any information release, etc. Apparently didn't deny or confirm the truth of the information but said something about having "more important things in my career" or something like that (not verbatim quote).

    1. Re:Confirmed with my Polish speaking coworkers by King+Babar · · Score: 2
      We have three native Polish speakers in my office. I asked one of them to translate the professor's reply. She said the gist of it is that he was upset they released his name, he didn't authorize any information release, etc.

      Wow; that's what it felt like to me. I feel my "random people who returned email queries" now has some support from native speakers.

      Now this has to be the beauty of Usenet; working from isolated keywords and the power of google (tm), you could follow what appears to be a scam from press announcement to debunking in a couple of hours, despite the fact that the smoking gun was in a Polish math specialty news group and had to be translated by a third party...

      Someday, this kind of thing will save people some real money. :-)

      --

      Babar

  305. Would this work? by Docrates · · Score: 2

    I know I'm posting late, but I hope someone reads this and comments.

    I've had this recurring thought in my head regarding compression that I haven't been able to prove/disprove.

    Disclaimer: I know absolutely nothing about compression other than what common sense tells me.

    Now for my theory: Is it possible to make an analysis of a whole lot of data from a whole lot of sources for a certain period of time? Let's say I log every single bit of data that comes and goes from, say, AOL's network. I then run an analysis of the data and come up with, say, the 5 million most used 8-byte strings. You probably want to play with the string sizes and number of strings to see what makes mathematical sense. You then keep a copy of the 40MB indexed string database on every internet node, or at each end of a slow link, or whatever, and then run all incoming and outgoing data through a program that translates index references into actual data.

    Would that work? Since a 5 million entry index requires a 3 byte key to access an 8 byte string, would I get a 3/8 lossless compression on top of whatever's in place right now, whenever I hit an indexed string?

    --

    There are two kinds of people in the world: Those with good memory.
    1. Re:Would this work? by recursiv · · Score: 2

      The problem is that this scheme would have to differentiate between the indices and the rest of the raw data. So, let's say, for each "block", you either have a 3 byte index or 8 bytes of raw data. But you also need at least one bit of header information to determine what it is. I think you would lose most if not all of your gains on this one bit.
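
      Whether the flag bit eats the gains depends on how often a block is actually in the dictionary. Here is a back-of-the-envelope sketch, assuming data is cut into aligned 8-byte blocks and each block costs one flag bit plus either a 3-byte index or the 8 raw bytes; the function names are made up for illustration.

```python
def expected_bits_per_block(hit_rate: float,
                            block_bits: int = 64,
                            index_bits: int = 24,
                            flag_bits: int = 1) -> float:
    """Average encoded size of one block, given the fraction found in the dictionary."""
    return flag_bits + hit_rate * index_bits + (1.0 - hit_rate) * block_bits


def break_even_hit_rate(block_bits: int = 64,
                        index_bits: int = 24,
                        flag_bits: int = 1) -> float:
    """Hit rate at which the scheme neither shrinks nor expands the data."""
    return flag_bits / (block_bits - index_bits)


# For uniformly random input, the chance that an 8-byte block is one of
# the 5 million dictionary entries is tiny.
random_hit_rate = 5_000_000 / 2 ** 64

print(f"break-even hit rate: {break_even_hit_rate():.3f}")
print(f"bits per block on random data: {expected_bits_per_block(random_hit_rate):.6f}")
```

      So under these assumptions the scheme pays off only on traffic where more than about one block in forty is a dictionary entry, and it slightly expands anything that looks random.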

      --
      I used to bulls-eye womp-rats in my pants
  306. Re:Unbreakable encryption by chipuni · · Score: 2

    Truly unbreakable encryption has existed for many years: the one-time pad. The problems of unbreakable encryption aren't the theory, but the practice. (If you want truly secure communications among n people who each transmit x bytes of data through the group each day, how will you securely generate n*(n-1)*x bytes of random data each day, and securely distribute it to each of them?)
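
    For anyone who hasn't seen one, a one-time pad is just XOR with a key as long as the message. The following is a toy sketch, assuming Python's `secrets` module as the source of pad bytes and using made-up helper names; the hard part described above, getting the pad to the other side securely, is not something code can solve.

```python
import secrets


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


def pad_bytes_per_day(n_people: int, x_bytes: int) -> int:
    """Daily key material for the scenario above: n*(n-1)*x bytes."""
    return n_people * (n_people - 1) * x_bytes


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))   # must never be reused
ciphertext = xor_bytes(message, pad)
recovered = xor_bytes(ciphertext, pad)

print(recovered)
print(pad_bytes_per_day(10, 1000))
```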

    --
    Never play leapfrog with a unicorn. Or a juggernaut.
  307. Doesn't anyone read the link? by 42forty-two42 · · Score: 1
    ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over

    It means you can compress it 100 times without data loss.

    gzip file;mv file.gz file

    lather, rinse, repeat
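
    For anyone who wants to try the lather-rinse-repeat loop without shuffling files around, here is a minimal sketch that uses Python's zlib as a stand-in for gzip (same DEFLATE method, different container). The sample input and the number of passes are arbitrary choices.

```python
import zlib


def repeated_sizes(data, passes):
    """Compress `data` over and over, recording the size after each pass."""
    sizes = [len(data)]
    for _ in range(passes):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes, data


def undo(data, passes):
    """Reverse `passes` rounds of compression."""
    for _ in range(passes):
        data = zlib.decompress(data)
    return data


original = b"lather, rinse, repeat\n" * 5000
sizes, packed = repeated_sizes(original, 5)
print(sizes)
```

    Every pass is lossless, so undoing them in order gives the original back. The first pass typically does nearly all the work; after that the sizes level off, because the output of a decent compressor has little redundancy left to remove.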

  308. The best compression. by Sivar · · Score: 1

    It is theoretically possible to get 100:1 or better compression. Assign a number to every file that has ever existed. It would surely take less than 64 bits to represent the number of files out there, and CERTAINLY less than 128.
    Now, put every one of those files into the compressor (compressed if you like) and index them with numbers.

    The compressed file would simply have a number or numbers of the files within. Even a full debian installation wouldn't exceed a few MB if even that.
    The decompressor would take even more space than WindowsXP, and this would not work for newly created files, but it gets theoretical possibility out of the way. Now for practical possibility...

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  309. They know what they're talking about... by Proaxiom · · Score: 1
    One thing to note is the composition of the team that is making these claims.

    They have academics from all over the US and Europe. One notable is Dr. Steve Smale, Professor Emeritus at UC Berkeley and 1966 Fields Medal winner.

    We are talking about a team of brilliant mathematicians here. If they think this is possible, it deserves at the very least serious consideration.

    Whether their ideas will come to fruition and withstand peer scrutiny is another thing. But to claim they don't know what they are talking about is a long stretch.

  310. pure BS by parasite · · Score: 0


    The most obvious problem with this ?

    Well if you can compress completely random data 1:100, then obviously there is nothing preventing you from RECOMPRESSING the compressed data over... And thus obviously run into some VERY serious PROBLEMS!!!!!!!!!!!!!! Because that would mean an infinite compression ratio, the universe down to a single BIT-- though there might be a minimum size so it would be the universe in 50k--but nonetheless completely absurd.

  311. infinite amplifier by Proud+Geek · · Score: 2

    I have an infinite amplifier; I can sell it to you now. It has infinite gain, and infinite input impedance. Unfortunately, it has to rely on real power supplies, since I do not have an ideal power supply. Funny thing is, it always outputs the rail voltage.

    --

    Even Slashdot wants to hide some things

  312. Pigeonhole Principle by sulli · · Score: 2

    Maybe it's another implementation of RFC1149?

    --

    sulli
    RTFJ.
  313. Fifty years? by jim · · Score: 1

    So Huffman compression has been an industry standard for 50 years has it?

    @Article{huffman-1952,
    author = {David A. Huffman},
    title = {A method for the construction of minimum-redundancy codes},
    journal = "Proceedings of the IRE",
    year = {1952},
    volume = {40},
    pages = {1098--1101},
    }

    That's pretty bloody quick uptake on the part of the industry, then.

    --
    -- Arm yourself when the Frog God smiles.
  314. Trying to defraud retirees by CITAnonymous · · Score: 1

    Notice where this company is based, West Palm Beach, Fl.

    That's prime retirement community. Lots and lots of seniors without enough technical knowledge to know that they are full of crap.

    They're hoping that their website and technomumbo will convince some old people to give them their money.

  315. What about decompression? by Anonymous Coward · · Score: 0

    I noticed they never talked about a decompressor in their press release... I suppose they are still working on that. I seem to remember a similar story, where someone had achieved 100% compression in a compressor, and even released source code. This company is lagging far behind what the open source community can provide (again though, the open source community is still working on a decompressor)

  316. Re:Fractal Compression? I don't think so... by moogla · · Score: 1

    The wonders of fractal compression are a "dirty lie" of compression techniques. It happens to work well for classes of images with natural (self-repeating) subjects. Furthermore, it takes forever to find the right algorithm, and sometimes, the parameters/algorithm description is very large. Of course, sometimes, you can never find the right algorithm to produce what you want.

    --
    Black holes are where the Matrix raised SIGFPE
  317. Perpetual motion machines .... by Anonymous Coward · · Score: 0

    I have three words for them: Pigeon hole principle.

    It looks like somebody at ZeoSync changed their major from computer science to marketing.

  318. "Magic functions" don't work by yerricde · · Score: 1

    So if you define "practically random data" as "data that is random enough for practical purposes," you can compress it by storing the random seed and the string length. ;-)

    However, the seed may be almost as long as the string itself, if not longer. In the worst case, you're expanding the string by the 48-bit integer necessary to hold the string length.

    --
    Will I retire or break 10K?
  319. Re:Not possible - read article carefully! by MobiusKlein · · Score: 2, Interesting

    If you read the Reuters article carefully, it does not say a digital -> digital compression of 1:100, but implies a better way of encoding / compressing digital -> analog -> digital, with the analog bandwidth being much greater than today.

    That's all the stuff where they talk about Dr. Claude Shannon and information theory. (They could have been clearer about it, but that's PR flacks for you.)

    examine the quote
    '"What we've developed is a new plateau in communications theory," St. George said. "We are expecting to produce the enormous capacity of analog signaling, with the benefit of the noise-free integrity of digital communications."'

    Sounds like they are trying to shove more data into an analog stream, using wacky math, than would normally be allowed.

    rbb

  320. The claim is FALSE! by rew · · Score: 2

    Without reading their website, the claim MUST BE FALSE.

    The proof is simple.

    Suppose we have a 100 bit message. There are 2^100 different messages. Suppose you can compress every one of them to 98 bits. Then there can only be 2^98 different compressed messages, so three quarters of the originals have no compressed form of their own. We lost a couple along the way!

    This proves that if you compress SOME messages you will also have to make SOME longer. Not by much, but at least a little. (Prepend a 1 bit to data that is "not compressible" and a 0 bit to the "compressed data stream", and you have a "worst case expansion" of "one bit".)

    Now compressing normal data is easy. There are a lot of repeats, and other redundancy. So the normal case is that you can compress them. The bad news is that if you enumerate ALL 100-bit messages, ALL compression methods are going to need on average 100 bits or more. This is pure mathematics.

    The 2^100 number is a number that is quite large, but if you start talking about compressing a megabyte of data, then I'm already talking about enumerating all 2^8000000 possible messages. That is a thought experiment. But the argument still holds.
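
    The counting behind that argument is small enough to check directly. This is only a sketch of the bookkeeping: it counts how many distinct bit strings are strictly shorter than n bits and compares that with the 2^n inputs of exactly n bits.

```python
def shorter_strings(n):
    """Number of distinct bit strings of length 0, 1, ..., n-1."""
    return sum(2 ** k for k in range(n))


for n in (2, 8, 100):
    inputs = 2 ** n                 # messages of exactly n bits
    outputs = shorter_strings(n)    # every possible strictly shorter message
    print(n, inputs, outputs, inputs - outputs)
```

    There is always one fewer shorter string than there are inputs, so even a compressor that may use every shorter length cannot give each n-bit message a shorter encoding of its own.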

    -----

    I read their press release. It's buzzword compliant bovine excrement. They will attract money and pay the existing people large salaries as long as they can keep up the charade.

    Oh, and they have placed a tactical "practically" in front of the word "random". I can compress "practically random data" by enormous amounts.

    If you take the MD5 hash of the string "hi there", and feed that back into the MD5 function, you can generate an endless stream of "practically random" data. Take the first 1Mb of this "practically random" data.

    I compressed 1Mbyte of data into the 212 bytes of the previous paragraph! However this is not possible if I let someone else generate the random data any way he pleases, and then have to compress it. They can claim to be "technically correct" up to a point due to this phenomenon....
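
    The MD5 trick is easy to reproduce. The sketch below is one reading of the recipe, with the assumption that the raw 16-byte digest (not its hex text) is what gets fed back into MD5; the function name is made up for illustration.

```python
import hashlib
import zlib


def md5_stream(seed, nbytes):
    """Chain MD5 over its own output and concatenate the digests."""
    out = bytearray()
    block = hashlib.md5(seed).digest()
    while len(out) < nbytes:
        out += block
        block = hashlib.md5(block).digest()   # feed the digest back in
    return bytes(out[:nbytes])


stream = md5_stream(b"hi there", 1 << 20)     # 1 MB of "practically random" data
print(len(stream), len(zlib.compress(stream, 9)))
```

    A general-purpose compressor gets nowhere with this stream, yet the whole thing is fully described by the seed string, the length and the recipe. That only works because the "compressor" already knows how the data was made.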

    Roger.

  321. You all miss the point by Anonymous Coward · · Score: 0

    You are all thinking like bitheads. You need to step outside the box and think less logically, and more abstractly.

    Perhaps this TunerAcceleratorTM technology is a scam. Assuming it's not, let's examine the idea of mathematical non-repeating representations.

    Mathematics can represent any number (and approach the representation of irrational and imaginaries) by any infinite combination of other numbers and operators.

    5 + 20 = 25
    5 * 5 = 25
    100 / 4 = 25
    5^2 = 25
    et cetera

    These numbers are simply symbols and we all understand that complex datastreams cannot be represented symbolically, right?

    Wrong. Fractal geometry can -- rather a variant of fractals can -- probably something in the ballpark of true complexity, where "chaos" is reduced to an "equation" fed a set of "initial conditions". [Quotes because I'm just borrowing from the language of mathematics to relate to complexity theory].

    Example: DNA. You are a giant decompressed fractal. In fact, just about everything on this planet is a giant, decompressed fractal.

    Neither of my analogies is perfect, but I hope they get the point across. The universe, strange as it may seem, is not represented by 1s and 0s. Data, funny as it may sound, is not all digital.

    -k
    fear@fearstudios.com

    1. Re:You all miss the point by pointym5 · · Score: 1
      Data, funny as it may sound, is not all digital.


      Fine. But guess what? All the input to their compressor is digital. I invite you to describe a general, practical way of picking a fractal system or polynomial or whatever that will regenerate a given 128Kb string.
  322. How fractal transform image coding works by yerricde · · Score: 2

    Sounds like fractal compression to me.

    The fractal transform that Barnsley's products use is merely vector quantization, mapping each 8x8 pixel block of an image onto a 4x4 pixel block of a reduced version of itself, plus an RGB offset for DC. It begins to converge to the desired image after a few iterations of the transform.

    --
    Will I retire or break 10K?
  323. That's how RLE works by yerricde · · Score: 1

    Actually, if you change the domain you can get what appears to be impressive compression. Consider a bitmapped picture of a child's line drawing of a house. Replace that by a description of the drawing commands. Of course you have not violated Shannon's theorem because the amount of information in the original drawing is actually low.

    And if you manage to express all drawing commands in terms of "draw a horizontal line," you've re-invented run-length encoding that MacPaint and PCX files have been using for ages.

    --
    Will I retire or break 10K?
    1. Re:That's how RLE works by darksaber · · Score: 1

      Actually, RLE is different. RLE is when the file says X is repeated Y times as the basic way to compress. You are right about the vector format though.
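
      For reference, a bare-bones version of "X is repeated Y times" looks like the sketch below. It is a generic (count, value) byte scheme written for illustration, not the exact format used by MacPaint or PCX.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (run length, byte value) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand (run length, byte value) pairs back into the original bytes."""
    out = bytearray()
    for j in range(0, len(packed), 2):
        out += bytes([packed[j + 1]]) * packed[j]
    return bytes(out)


print(rle_encode(b"aaaabbb"))        # two pairs: 4 x 'a', 3 x 'b'
print(len(rle_encode(b"abcd")))      # no runs, so the output is twice as long
```

      Long runs shrink nicely, while data without runs doubles in size, which is the usual trade: a scheme that helps some inputs has to hurt others.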

  324. the claims by fw3 · · Score: 1
    much as this sure sounds like snake oil and I won't be buying any stock in this puppy:

    their claim is for 10:1 on something 'random' and short. And for ca. 10 x better sometime in the future.

    10:1 on the 'average' traffic that passes 'net channels or stores on disk would be a surprising thing. I don't think it's the level of impossible that /. consensus is hanging on it.

    My guess is .. If it's real it's something that needs to be implemented in hardware to be fast.

    certainly all the existing algorithms suck plenty of cpu cycles. If there's a solution here that's that much more space-efficient I very much doubt it's gonna be time-efficient.

    or maybe it's just smoke & mirrors

    fw

    --
    Linux is Linux, if One need clarify their dist: <Dist>/GNU Linux
    bsds are of course just BSD
  325. perpetual motion of the information age by markj02 · · Score: 2

    These kinds of compression claims are the perpetual motion machine of the information age. Actually, they are less plausible than perpetual motion. For perpetual motion, there is at least the (very remote) possibility that there is some kind of undiscovered physics. Impossibility statements in compression only hinge on mathematics, with no physics or experiments needed.

  326. ZeoSync invents cool new RMG Compression by Anonymous Coward · · Score: 0

    (Random Marketing Compression): The meaning of which is completely lost because we've dropped lots of big names and wowed you with random buzzwords and incorrect explanations.
    My absolute favorite was the explanation of the Pigeon hole Principle. These people are claiming up to 100:1 on a *RANDOM* number. The fallacy here is that compressed data is not random. It is incredibly deliberate. It might seem random statistically, but it is indeed a very compact representation of other data. So trying to create random sequences does no good. Once data is statistically compressed or compressed by pattern, the compressed data should have no redundancy. Any redundancy left over in the compressed data is really just evidence that the compression algorithm used in the first case didn't recognize a higher level pattern in the data and encode it efficiently.
    The pigeon hole principle is used in the classic proof that there is no way to reduce ALL messages of size X to size Y, where Y < X. That doesn't mean that there isn't a good method for reducing a small subset of X, though.
    I propose that there is some subset of messages of size X that can be compressed to size = Y, but that the compression ratio depends on the sizes of # of messages of size Y : # of messages of size X, and indeed the obvious compression method would be to make a look up table of size able to contain Y containing messages of size X.
    So for example if a compressor compresses 1:10 in bytes, and we apply the method to strings of length 300, then only 256^30 of the 256^300 strings are represented. It seems like we can only represent 1/256^270th of these messages that way.

    Heh, and I used to hate CS Theory in college.

  327. Checkout the press release, and the WHOIS records by moogla · · Score: 1

    I'm betting dollars-to-donuts Peter St. George is the only one who works at ZeoSync, and he wants your gullible-ass money.

    --
    Black holes are where the Matrix raised SIGFPE
  328. Do not under-estimate complexity by littleRedFriend · · Score: 0

    Say the human genome consists of 3.000.000.000 basepairs. There are 4 different basepairs (A,T,G and C). 5% of this is coding for protein. 1.500.000.000 basepairs. So a 1.5 Gb file is enough to encode an entire human being. I don't think 100:1 compression ratio is a lot.

    --
    IANAL, but imagine a beowulf cluster of in Soviet Russia all your belong are base to us welcoming the new SCO overlords.
    1. Re:Do not under-estimate complexity by markmoss · · Score: 2

      1.500.000.000 basepairs [are all the DNA coding for protein in the human genome]. So a 1.5 Gb file is enough to encode an entire human being.

      The protein-coding chromosomal DNA is very, very far from encoding an entire human being. You've also got the DNA that controls which proteins are expressed (some unknown portion of that other 95% of the chromosomes), mitochondrial DNA, environmental effects during your whole life, and most of all some billions of neurons, each with up to a hundred semi-random connections to other neurons. No one yet has come anywhere near to giving a computer the equivalent of the life experience stored in your neurons. (Or at least my neurons -- some people never learn...)

    2. Re:Do not under-estimate complexity by littleRedFriend · · Score: 0

      Well, I'm afraid that I do not agree with you. Not even from a technical point of view: the regulatory parts (enhancer, promoters) do not take up the rest of the genome (see pufferfish genome, which is a compact version of the mammalian one). It can even be in that 5%, which was a very high estimate for the protein coding part.

      My point was that it doesn't take a lot to make extremely complex things. Even a bacterium that has a genome 1/100 the size of the human one is pretty complex.

      And by the way don't believe the people that say that environment is an extremely important factor in human complexity. Like it or not, genetics is contributing far more. Chimpanzees are not restrained from being able to read because they grow up in a forest. Chimpanzees share 99% of our genetic material. That is what I call efficient compression of information.

      --
      IANAL, but imagine a beowulf cluster of in Soviet Russia all your belong are base to us welcoming the new SCO overlords.
    3. Re:Do not under-estimate complexity by -brazil- · · Score: 1

      Well, the real key point is that we humans tend to think that our thoughts are more relevant and thus more complex than our physical forms. And our thoughts are definitely not encoded in our DNA and are almost entirely a result of our environment.

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

    4. Re:Do not under-estimate complexity by Negadecimal · · Score: 2

      So a 1.5 Gb file is enough to encode an entire human being.

      Nope. 3 billion basepairs, four possible bases per pair. Takes two bits to describe four possible states, and so the unannotated sequence requires 6 billion bits of storage -> 750 million bytes -> about 715MB.

      And genomic sequences generally aren't very random.... telomeric sequences, satellite DNA, common promoters, copied genes -- all of them can be easily abstracted and compressed out.

      I'd expect that even with mapping annotations, the whole shebang would easily fit on a CD-ROM.
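
      The storage arithmetic is simple enough to spell out. The sketch below assumes the round figure of 3 billion base pairs used in this thread and a fixed number of bits per base; the 3-bit case is included only to show how the total scales if extra per-base information is stored.

```python
BASE_PAIRS = 3_000_000_000   # round figure used in the thread


def genome_bytes(base_pairs, bits_per_base):
    """Bytes needed to store the sequence at a fixed number of bits per base."""
    return base_pairs * bits_per_base // 8


for bits in (2, 3):
    size = genome_bytes(BASE_PAIRS, bits)
    print(bits, size, round(size / 2 ** 20, 1))   # bits per base, bytes, MiB
```

      At two bits per base the raw sequence comes to 750 million bytes, a little over 715 MiB, before any compression of repeats.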

    5. Re:Do not under-estimate complexity by littleRedFriend · · Score: 0

      Yes, of course I made a little calculation error. However I could argue that you would need to encode information about the methylation of nucleotides as well. In this case you would need at least three bits per base pair, making it (pfew) about a gigabyte. bzip2-it, and yes it would fit on a CD-ROM (beats the complexity of an AOL-trial CD-ROM any time).

      --
      IANAL, but imagine a beowulf cluster of in Soviet Russia all your belong are base to us welcoming the new SCO overlords.
  329. Fred Pohl's Compression Algorithm by Anonymous Coward · · Score: 0

    I can't remember the book, but it was by Pohl. A bunch of people on a generational starship develop superhuman mental abilities -- eventually discovering FTL travel and returning to Earth quickly.

    Anyway, during the trip they send back encoded messages in a highly compressed form, using powers of primes. All you have to do is factor them to decode them, but we don't have the mathematical skills to do so. A sample message might be:

    987823^234970213.3^8237.234

    Given that this is computationally impossible to solve unless we discover some really cool quantum computing tricks, it is useless, but conceptually you could encode any unique string of bits into a very small package.

  330. you CAN get a digit in pi w/o computing all priors by slew · · Score: 2

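    It's called the Bailey-Borwein-Plouffe (BBP) formula, which was found with the help of PSLQ lattice reduction. It gives individual hexadecimal digits of pi...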

    You can get the details here...

    http://www.mathsoft.com/asolve/plouffe/plouffe.html

    http://www.lacim.uqam.ca/plouffe/Simon/articlepi.html

    Note: this goes quite a long ways to showing that conventional wisdom about pi being random digits isn't actually true... Pseudo random is more like it...

    However, it isn't really applicable to this multidimensional compression nonsense since the counting argument still applies.
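
    For the curious, digit extraction can be sketched in a few lines using the Bailey-Borwein-Plouffe series for pi. This version works in base 16 and relies on ordinary floating point, so it is only trustworthy for modest positions; the function names are made up for illustration.

```python
def _series(j, n):
    """Fractional part of the sum over k >= 0 of 16^(n-k) / (8k + j)."""
    s = 0.0
    for k in range(n + 1):
        d = 8 * k + j
        # Modular exponentiation keeps only the part that affects the fraction.
        s = (s + pow(16, n - k, d) / d) % 1.0
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at position n after the point (n = 0 is the first)."""
    x = (4 * _series(1, n) - 2 * _series(4, n)
         - _series(5, n) - _series(6, n)) % 1.0
    return "%x" % int(x * 16)


print("".join(pi_hex_digit(n) for n in range(8)))   # pi = 3.243f6a88... in hex
```

    The point for this thread is that the digits are cheap to regenerate from a short program, which is exactly why they are not random in the information-theoretic sense.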

    Suspiciously, this looks to be similar to what the fractal folks were pushing in the '80s if you replace gems with iterators... Every once in a while you have to change the color of your snake oil label to confuse the masses...

    -slew

  331. bunk. by Astrorunner · · Score: 1

    [quote]Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.[endquote]

    The key here is that it's on "very small bit strings." It is significantly harder to reduce larger bit strings in this fashion.

    What they appear to be doing, from the limited documentation they have provided, is transforming random data and then using a formula to express that data. This has been proven ineffective. If you could find some sort of polynomial that would represent a file, you may indeed find polynomials shorter than the file, but at least half of them are going to be longer than the file, and more difficult to find.

    I've dabbled enough with data compression to at least be able to spot this.

    What the end result always turns out to be is that you *can* compress some random data, but you can't always compress all random data. It may actually be possible to compress a given piece of random data, but not at a 100:1 ratio, consistently. I've written some code that (IMHO) is a novel approach to compressing random data with moderate success, but never at 100:1.

    "You can compress all data half the time, or you can compress half the data all the time."

  332. For random input, maximal compression rate is by uriyan · · Score: 1

    Greater than 1.

    If it wasn't (i.e. it guaranteed a loss), you could take an input, run it through the compressor several times, and end up with a single byte, or even 0 bytes.

    Obviously the problem would appear at the decompression stage, since there aren't quite a lot of things you can get from decompressing a single byte.

  333. Strictly speaking... by Anonymous Coward · · Score: 0

    economics is not a Nobel Prize, it's a Bank of Sweden prize.

  334. Zeosync discovers ln -s! by Penguinoflight · · Score: 1

    The only downside of this incredible algorithm is that you have to keep the original file, nuts.

    --
    "And we have seen and do testify that the Father sent the Son to be the Savior of the World"
    1 John 4:14
  335. I can do better than that by cafeteria · · Score: 1

    I once compressed a string of randomly chosen zeros with a 106:1 ratio with practically no loss of information. If I only could remember how I did it. :-)

  336. Of Course it works. maybe to 10000:1 by beerandbj · · Score: 1

    they mentioned not using the traditional 'redundant' data searching approach. From the description they are simply looking for patterns in the bits that they can generate mathematically.
    If the signal is wave related then i'm sure they will find lots.

    Here is sample 'C' code to illustrate a 1000000000:1 compression of random data.

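    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        unsigned long long i;   /* a 32-bit int cannot count to 4 billion */

        if (argc < 2)
            return 1;           /* the seed must be given on the command line */

        srand((unsigned) atoi(argv[1]));

        /* emit 4,000,000,000 pseudorandom bytes derived from the seed */
        for (i = 0; i < 4000000000ULL; i++)
            putchar(rand() % 256);

        return 0;
    }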

    As you can see you simply have to supply a 4byte number, and you can generate a 4GB file.

    If you first generate a 4GB file in this manner, then call it 'practically random'. Then run an algorithm that compares it with the sequences starting from all possible 'seeds' - and outputs the 4byte number that matches, you have 1000000000:1 compression!!
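
    The catch in the seed-matching scheme is how few files a seed can ever produce. Below is a scaled-down sketch of the same idea: an 8-bit seed and 3-byte "files", so the whole space can be enumerated. Python's random module stands in for C's rand() here, which is an assumption made purely for convenience.

```python
import random


def generate(seed, nbytes):
    """Produce `nbytes` pseudorandom bytes from an integer seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))


SEED_BITS = 8      # scaled down from the 32-bit seed in the post
FILE_BYTES = 3     # scaled down from the 4GB output

reachable = {generate(seed, FILE_BYTES) for seed in range(2 ** SEED_BITS)}
possible = 256 ** FILE_BYTES

print(len(reachable), "of", possible, "possible files have a matching seed")
```

    With a 4-byte seed at most 2^32 different outputs can ever be generated, so almost every 4GB file that did not come out of this generator has no matching seed at all.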

  337. RNG testing by kallisti · · Score: 1

    The best current RNG testing suite is DIEHARD. It uses a large number of tests to make sure that the numbers are random enough for most purposes. More information about RNGs and testing them can be found at the pLab, which is one of the most comprehensive sites on RNGs on the net.

  338. I've got you beat by Cybrex · · Score: 1

    'BS'

    A 33% improvement over your already impressive compression!

    -Cybrex

    --
    Boundless Expansion, Self-Transformation, Dynamic Optimism, Intelligent Technology, Spontaneous Order- BEST DO IT SO!
    1. Re:I've got you beat by Anonymous Coward · · Score: 0

      uh... '0'

      I win

      Of course, 0 being null, it can be expressed with nothing at all, which means that I've compressed it infinitely.

      All I need now is some flash and buzz words and I can be rich too!

  339. Gonna take a stab at this by realdpk · · Score: 2

    ...and bet that they meant "arbitrary data" rather than "random data". After all, who would want to compress random data? What possible benefit could there be to such a thing?

    1. Re:Gonna take a stab at this by eyenot · · Score: 1

      i don't know. they did say in plain english, "random data". all they've really promised is that for any given sequence of randomly generated bytes, they can get the seed. or so it appears to me.

      --
      "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
  340. Cold Fusion Analogy is Wrong by billstewart · · Score: 2
    Pons and Fleischmann, as near as I can tell, believed they had some interesting physics going on, though they were mistaken about quite what, and jumped to publication way prematurely. (As somebody said about their work "If it's not real, they've still invented the world's most interesting battery".)


    This is more like Usenet Crank Robert E. McElwaine who published lots of articles with his (capital-preserving) tagline "UN-altered REPRODUCTION and DISSEMINATION of this IMPORTANT Information is ENCOURAGED."


    And that may be giving them more credit than they deserve - it looks like a compression algorithm designed for use on digital wallets....

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  341. Devil's Advocate by mprinkey · · Score: 1

    OK, I admit it. It is very easy to crucify these guys. Even if they have something interesting, they deserve bashing because of the foolish PR job. But, is it *impossible* that there is something to this? I am not implying that Information Theory is in peril. Certainly not. But, as a scientist, I have to be objective enough to accept that they might have something.

    I took a quick look at Prof. Smale's publications. He has been working recently in complexity theory and other related fields. If he is actively involved, there may be something worthwhile here.

    One of the most interesting things about complexity theory, fractals, and cellular automata is the incredible amount of detail that can be evidenced by "simple" systems. See some of Steve Wolfram's work and his upcoming book for more on this. One of the still-born technologies of image compression was based on fractals. The self-symmetry principle was able to accurately capture image details with only a handful of data passed to generator functions. In fact, these compression techniques could even produce "more" detail (zooming in) than the original image possessed by extrapolating these generators further.

    Of course, at first blush, this seems foolishness too. How can there be *more* detail after compression!? But, the essential fact is that natural "detail" is not as random as we might think. In still images, textures are often non-repeating but still highly correlated in some sense. In moving images, frame-by-frame correlation is typically very high. In executable code, only a small fraction of all possible arrangements of bytes can actually be executed. Perhaps Shannon's only weakness is believing too strongly in absolute randomness. Information Theoretic calculations leverage the pure statistical nature of the data stream to make calculations. Practical problems don't address purely random data and in making that purely random assumption, we may be shackling ourselves to Shannon's limits unnecessarily.

    While Shannon does and will likely continue to hold, I am willing to admit that "most" data of interest contains less entropy per bit than it could. (Notable exceptions would be strongly encrypted data...encryption is *designed* to exhibit statistical randomness!) Huffman and arithmetic coding work by exploiting skewed symbol statistics and allow lossless compression of many data types. JPEG uses the quantization of Discrete Cosine Transforms of 8x8 pixel blocks to compress images. MPEG uses DCT quantization with motion compensation and lots of other techniques to try to capture frame to frame correlations. All of these are practical data streams. None are obviously correlated, but none are truly random either. With fairly "simple-minded" encoding tricks, we are able to significantly reduce the size of data files (gzip), sound (mp3), images (jpeg) and video (divx). Is it not possible to build a more general mechanism with which to ferret out more hidden symmetries and thus increase compression?
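
    "Entropy per bit" can be made concrete with a small measurement. The sketch below computes the order-0 Shannon entropy of a byte string, meaning it looks only at how often each byte value occurs and ignores any correlation between neighbouring bytes; it is a simple yardstick, not a full model of compressibility.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy: -sum(p * log2(p)) over byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


print(entropy_bits_per_byte(b"aaaaaaaa"))          # one symbol only
print(entropy_bits_per_byte(b"abababab"))          # two equally likely symbols
print(entropy_bits_per_byte(bytes(range(256))))    # every byte value equally likely
```

    A result well below 8 bits per byte is the kind of slack that Huffman or arithmetic coding can remove. A result near 8 means byte frequencies alone offer nothing, although higher-order structure may still exist.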

    I guess I am willing to accept that there is something more to this than simple charlatanism. Perhaps these folks have come up with an effective way to leverage complexity theory to establish a general framework for the construction of generator functions, etc. It would be a landmark discovery if they are able to uncover self-similarity or some other self-generation principles from various data streams. Note that I didn't say random data streams.

  342. Cold Fusion? by Shostakovich · · Score: 1

    "...We are talking about a team of brilliant mathematicians here. If they think this is possible, it deserves at the very least serious consideration...to claim they don't know what they are talking about is a long stretch."

    Lest we forget the cold fusion fiasco that the brilliant people on both sides of the Atlantic gave us. There's a reason that you don't know the author(s) of a scientific paper when it's being refereed by other scientists. Ideas are meant to be evaluated on their own merit, or in this case lack of merit.

  343. Re:Some background reading: Shannon's Limt by natersoz · · Score: 1

    http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

  344. How do you get Random data? by Anonymous Coward · · Score: 0

    I have been following this discussion, and had the following thought - It is not possible to have truly random data.

    No matter how you get a sequence of data, lava lamps, radioactive decay, etc etc, there are always conditions that cause the data to be as it is.

    Isn't this more of a chaos theory issue? No matter what data we have, there must be something that caused it to be as it is. It is like predicting the weather. If we could model everything, then maybe we could do it, but modelling everything would be impossible, as we would have to model our modelling etc etc.

    Basically, what do others think? I believe that it is not possible to have random data, just data for which we do not know the context it came from, or cannot model that context completely.

    1. Re:How do you get Random data? by acid_andy · · Score: 1

      I thought of this too and posted this comment. Anyone got any thoughts on the matter?

      --
      Your ad here.
  345. if any of you actually read the press release... by Anonymous Coward · · Score: 0

    It says

    "Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range."

    As you can see, it says "once fully developed", i.e. they have not done it yet, and it says "anticipated", which means that it may just be vapor speak.

  346. Other breakthroughs announced... by volpe · · Score: 2

    In other news, the company which managed to remove redundancy from pure entropy has also managed to break the absolute-zero barrier. It was previously thought that you couldn't make something colder if it already had zero heat in it. But apparently this is not the case, according to ZeoSync.

  347. maybe douglas adams was right by Anonymous Coward · · Score: 0

    everything in the universe can be compressed to the following byte...

    in big endian:
    00101010

  348. This is a hoax, here's a simple proof by roodman · · Score: 1

    This false claim seems to keep resurfacing every few years. Here is a simple way to see that it cannot be true:
    Suppose we make the less amazing claim that we can compress any random file just 2:1.
    Let's consider files of just two bits in length. There are 4: "00", "01", "10", and "11".
    Let's suppose our magic compressor function is called C. Obviously, a 2 bit file must compress to only one bit. Since there are only two choices for one-bit files, C("00") must be "0" or "1". C("01") must be the other choice; this is because for compression to be lossless, no two different files can compress to the same result. (or else, how would the decompressor know which one was originally compressed???) So we have constrained the function so far to be
    (C("00") => "0" and C("01") => "1") or
    (C("00") => "1" and C("01") => "0") ...
    Now, what will happen when we try to compute C("10")? This is where the contradiction occurs. There are no unused 1-bit files left, and so the compressor cannot possibly succeed in its claim of achieving 2:1 lossless compression even for the trivial case of 2-bit files. This same counting argument can be used to formally show that it is impossible to make a general lossless compressor that can compress any more than half of all random files of a given length by even a single bit. "Real world" compressors like Zip expand the vast majority of random files -- they only happen to do well on the "typical, useful" files that we use, which contain less entropy than most random files. To see this for yourself, write a small program in your favorite language to make a pseudorandom file of bytes, then run it through your compressor. You will see that when you run it through PKZip, gzip, or whatever, it almost always gets bigger. (If you see compression, this probably indicates a problem in your pseudorandom number generator.) The frauds at ZeoSync are just trying to confuse the issue by invoking technical-sounding jargon from information theory. Their crazy claims do not even stand up to the simplest analysis by counting, much less real-world testing.
    Rudi Cilibrasi
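
    The experiment suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seeded `random.Random` generator, the 64 KB size, and the helper name `pseudorandom_bytes` are choices made here, not anything from the post, and zlib stands in for PKZip/gzip (both use the same DEFLATE method).

```python
import random
import zlib

def pseudorandom_bytes(n, seed=12345):
    """Return n pseudorandom bytes from a seeded generator (reproducible)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

data = pseudorandom_bytes(65536)
packed = zlib.compress(data, 9)   # level 9 = try as hard as zlib can

print("original  :", len(data), "bytes")
print("compressed:", len(packed), "bytes")

# The round trip is still lossless; only the hoped-for size reduction is missing.
assert zlib.decompress(packed) == data
```

    On input like this the compressed size should come out at or slightly above the original size, since the format adds a little header overhead and finds nothing to exploit.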

    1. Re:This is a hoax, here's a simple proof by rocca · · Score: 1

      That is oversimplistic, as in your example you cannot achieve any compression: there is no way to convert 2 bits to 1, and therefore 0% compression is the best we can achieve -- which we of course know is false. Compression works considerably better on larger data sets.

  349. "at best"? by Antaeus+Feldspar · · Score: 2


    I believe that Deborah Tannen pointed this up as a key problem in our society, as the fallacy of "false duality", the notion that because there are two differing points of view that they are both worthy of attention.



    You say that "at best, this is revolutionary" but this is like saying "I have a great plan! Everyone takes off their shoes, switches them around, and somehow everyone winds up with a bigger pair! *At best*, everyone gets bigger shoes!" Well, no, just because someone's floated the fantasy doesn't mean it's even a vague possibility. These people are selling snake oil; it can be proved at home. To entertain their fraudulent notions simply because they bring them up is a mistake.

    --
    If people are to respect the law, perhaps the law should begin by respecting the people.
  350. And further still: by Myself · · Score: 2

    Then you could take the output files from this compression scheme, which would be pretty incompressible by traditional methods, and run THEM through the very same compression scheme, and make them smaller still. Repeat ad infinitum, and reduce all the data in the universe to one small file.

    Better yet: To use your 10 bits example, feed every one of the 1024 combinations into the decompression program, and one of them is guaranteed to represent all the data in the universe. That's only a handful of combinations, we should be able to check them all before dinner. When someone decompresses the right 10-bit code, call me, since my phone number must be in the data somewhere.
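
    What happens when a real compressor is fed its own output can be tried directly. The sketch below is an illustration only: zlib is used as a stand-in for any ordinary lossless compressor, and the tiny vocabulary, the seed, and the helper names `sample_text` and `recompress` are made up for the example.

```python
import random
import zlib

def sample_text(words=20000, seed=1):
    """Build repetitive, text-like input from a tiny made-up vocabulary."""
    rng = random.Random(seed)
    vocab = [b"data", b"random", b"entropy", b"pigeon", b"file", b"bit"]
    return b" ".join(rng.choice(vocab) for _ in range(words))

def recompress(data, rounds):
    """Compress repeatedly; return the final blob and the size after each round."""
    sizes = [len(data)]
    for _ in range(rounds):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return data, sizes

original = sample_text()
blob, sizes = recompress(original, rounds=6)
print(sizes)   # first entry is the input size, then one size per round

# Undo every round to confirm nothing was lost along the way.
restored = blob
for _ in range(6):
    restored = zlib.decompress(restored)
assert restored == original
```

    The first round shrinks the text-like input a lot; the later rounds have essentially nothing left to remove, so the sizes stop falling instead of heading toward one small file.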

  351. Other uses of the terms "mark" and "pigeon" by billstewart · · Score: 2
    Those pigeons aren't in the Eighth Dimension - they're somewhere over New Jersey


    There is a way to make compression like this work - for each string you want to compress, there's a compression program that losslessly compresses it to an arbitrarily short output string (one bit is fine...), but if the output string is N bits long, the program only works for 2**N input strings, and in general requires SIZE(INPUT) bits of program per input string (though for non-random strings, or for related strings, you can do better.) In other words, it's not useful for general-purpose compression, but you can use it for special-purpose compression - you can't design a small compression program to perfectly describe "Alice"'s or "Bob"'s appearance, but you can design a small program that outputs "Alice", "Bob", or "Somebody else".


    Similarly, with pigeons, you can play Hundred-Pigeon Monte, and attract investors to your company, or use this to attract customers for your other products, or have a big crowd on the street intently watching you play hundred-pigeon monte with your shill while a pickpocket walks around behind the crowd.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  352. -1 misinformative by TimMann · · Score: 1

    For uniformly distributed random data (white noise), the average compression ratio has to be 1:1. The "2:1 for 8-bit random" in the parent article is silly, so it puzzles me why it's been modded up. 2:1 is a common rule of thumb for data that's typically stored in computer file systems, but that's far from random; it has lots of ASCII text, executable programs, etc.

  353. Random vs. Pseudorandom by i_am_nitrogen · · Score: 2
    ...they've developed a way to step WAY back...
    If this is the case, and this isn't a hoax (neither of which I believe to be true), then the data (as stated by other posters) is not random. No algorithm could fare well with truly random data because there really is no pattern. That said, there is no such thing as truly random data, and even /dev/urandom uses a mathematical algorithm to generate data with very little (or at least a very hard to find) pattern, but a pattern still exists. So, if the compression system figured out that algorithm, all it would need to determine is what value to seed the algorithm with, and the entire sequence can be regenerated flawlessly. It's like doing srand(36); and printing out a sequence of numbers. You'll get the same sequence every time. Basically, if their compression system knows what my house looks like, and how my video camera works, then they could take a video of my house and compress it down to a very small size since they just have to recreate my house mathemagically.
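
    The srand(36) point can be made concrete with a small sketch. It assumes the "decompressor" already contains the exact same generator as the "compressor"; the function name `generate` and the 10,000-byte length are illustrative choices, and Python's `random.Random` stands in for C's srand()/rand().

```python
import random

def generate(seed, n):
    """Regenerate n pseudorandom bytes from nothing but the seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

stream = generate(36, 10_000)   # the "file" is 10,000 bytes long
compressed = (36, 10_000)       # yet seed + length fully describe it

# Same seed, same generator: the whole sequence comes back bit for bit.
assert generate(*compressed) == stream
```

    This only works in the easy direction. Going from an arbitrary byte stream back to a generator and seed that reproduce it is the part nobody knows how to do in general, and for most streams no such short description exists.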

    Even more importantly, however, is that their "Technical Information" reeks so strongly of buzzwords and technobabble it's hard to read it without the urge to hold my nose. This alone discredits their entire proposition. I feel like I've just been subjected to corporate brainwashing ..er.. I mean marketing.

  354. Well by weird+mehgny · · Score: 1

    The ultimate solution by far would be a decompression algorithm that, instead of some screwed up checksum and data mapping crap, randomly generates different sets of outputs and lets the user select which is correct.

  355. Infinity:1 by gnovos · · Score: 2

    It is possible to create "Infinite" compression, but it works like the laws of quantum mechanics, i.e. you never really get what you want. Here, I'll perform an experiment:

    o I have a 1 byte file I want to send you.
    o We start by synching our wrist-watches.
    o I call you on the telephone and say "Start" and hang up.
    o You and I start counting off the seconds.
    o When the number of seconds have passed that are equal to the value of the byte, I call you back and say "Stop".

    Now you have the value of the byte given to you in two bits of information (the "start" and "stop" bits).

    Now we have an 8:2 ratio, which isn't bad. But I can do this again with a two-byte file and get 8:1. I can send you ANY length of file and only consume two bits of bandwidth... but at a terrible cost: time. Lots and lots of time.

    But if you had something like a super far away satellite where bandwidth is hard to come by and time is not in short supply, it would be the answer.
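
    How much time "lots and lots" is can be worked out with a short sketch. It assumes the file is read as one big-endian unsigned integer and that one count equals one second, as in the wrist-watch protocol above; the function names are made up for the illustration.

```python
def wait_seconds(payload: bytes) -> int:
    """Seconds between "Start" and "Stop" needed to convey this payload."""
    return int.from_bytes(payload, "big")

def worst_case_years(num_bytes: int) -> float:
    """Longest possible wait for a file of num_bytes bytes, in years."""
    seconds = 2 ** (8 * num_bytes) - 1
    return seconds / (365.25 * 24 * 3600)

print(wait_seconds(b"\x2a"))       # a single byte with value 42
print(wait_seconds(b"\xff\xff"))   # the largest two-byte file
for size in (1, 2, 4, 8):
    print(size, "bytes: up to", worst_case_years(size), "years")
```

    Each extra byte multiplies the worst-case wait by 256, so the scheme trades bits on the wire for an exponentially growing delay rather than removing any information.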

    --
    "Your superior intellect is no match for our puny weapons!"
    1. Re:Infinity:1 by Anonymous Coward · · Score: 1, Insightful

      The problem is you are still transmitting the same amount of data, it's just that apart from your start and stop bits the rest of the data is in analogue, not digital, form. Why use seconds anyway, why not nanoseconds? Also you could just post them a piece of metal whose length in nanometres was precisely the same value as the bytes in the file. The problem is if you wanted to store either of these things DIGITALLY - i.e. sample the phone conversation as a WAV or take a picture of the piece of metal with a digital camera, you'd use at least as much data as the original file. Unless you scaled the size of image or length of the WAV and stored the scaling factor too - but then you'd either lose resolution and thus some of your data, or else your scaling factor would need so much memory to store that you'd need as much data as the original file again. By the way, in a sense I think you've re-invented the modem!

  356. Horizontal lines == RLE by yerricde · · Score: 1

    Actually, RLE is different [from horizontal line vector encoding of a bitmap]. RLE is when the file says X is repeated Y times as the basic way to compress.

    Doesn't "a pixel colored X is repeated Y times" sound like "draw a horizontal 1-pixel-wide line in color X from the current position, Y pixels to the right"?
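
    The equivalence being argued here is easy to see in code. Below is a minimal run-length coder for one row of pixels; it is a sketch for illustration, not any particular image format, and the names `rle_encode` and `rle_decode` are made up.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse a row into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full row."""
    out = []
    for value, count in runs:
        out.extend([value] * count)   # i.e. draw a line `count` pixels long
    return out

row = ["red"] * 5 + ["blue"] * 3 + ["red"] * 2
runs = rle_encode(row)
print(runs)
assert rle_decode(runs) == row
```

    Each (value, count) pair reads equally well as "X repeated Y times" or as "draw a horizontal line of colour X, Y pixels long", which is the point of the question above.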

    --
    Will I retire or break 10K?
    1. Re:Horizontal lines == RLE by Anonymous Coward · · Score: 0

      Hey man, what if that little pink thing licked you?

  357. what about the patent by eyenot · · Score: 1

    i would be much more worried about their patent than their technique.

    why worry? i don't know; i guess because you can get away with patenting one-click-purchasing and a-book-on-a-disk. those patent clerks wouldn't know obvious if it was spraypainted in blood on the side of harsh goatsex.

    my gut tells me they are trying to patent the idea that you can derive a seed (for an algorithm) from a string of bytes -- and then turn around, seed the algorithm and get your bytes back.

    pretty damn obvious, right? but if you also say you're doing it "to cause compression", and you also shroud it in a fit of higher function theoretical math theories, then voila. if you're hyper-slick you can even get away with charging royalties on every inclusion of rnd(). at least until some nerds get jobs as patent clerks, lawyers, and justices, and show up in high orders on jury duty. that could take decades, meanwhile they'll be living the high life and stowing it all away in corporate sponsorships or wtfe.

    --
    "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
  358. Wondering by loraksus · · Score: 2

    I'm not a math person by any means (still doing college algebra, which pretty much means everybody has a better understanding of math than I do), and I would appreciate people picking this apart.

    So, my idea for a "Kick Ass Compression"

    Take a block of data - throw it against an algorithm that outputs a specific value (I'm thinking of CRC, MD5 hash or what not), do that several times against several different algorithms which generate a similar kind of value. Record the two (or more) values, then encapsulate the small block of data into larger blocks - I'm thinking only 3 or 4 levels of encapsulation would be needed (because if you calculated the crc of the entire file, a program could decide which choice (in decoding a "block" if there are multiple ones, which I'm fairly sure there will be) is correct).

    Now people use md5 hashes/crc checks to verify whether the file they downloaded hasn't been modified, so I'm assuming that it is fairly difficult to get the exact value (especially with a known size). Using this "property" (I'm not sure if that is a correct word) you could decode the data into one of several (hundred??thousand??) byte streams (possibilities of uncompressed data) and by comparing byte streams between algorithm A and B, the byte streams would match at one (would it be possible to have more? I suppose it depends on the algorithms used) point, which would be the proper "uncompressed" (rather derived or something) data.

    I'm pretty sure it would take a shitload of computing power in decompressing, but computers are fairly fast nowadays, and I think that this could be viable at some point. 100:1 probably not, and there would be a lower limit imposed on the file size based on the possible choices (I think the possible choices would reach a pretty large number pretty fast)

    Maybe I'm just plain wrong - but could something like this be useable? Any abuse would be appreciated :)
    Thanks!
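
    Since the post asks for it to be picked apart: the number of "possible choices" can be counted exactly on a toy version. The sketch below is an assumption-laden miniature, not the scheme as proposed: blocks are 2 bytes, and the two "checksums" are 4 bits taken from CRC32 and 4 bits taken from MD5, so each 16-bit block is stored as an 8-bit fingerprint (a 2:1 "compression"). The name `fingerprint` is made up.

```python
import hashlib
import zlib
from collections import Counter

def fingerprint(block: bytes):
    """Two tiny 'checksums': 4 bits of CRC32 and 4 bits of MD5 (8 bits total)."""
    crc_bits = zlib.crc32(block) & 0x0F
    md5_bits = hashlib.md5(block).digest()[0] & 0x0F
    return (crc_bits, md5_bits)

# Every possible 2-byte block: 65,536 inputs, but at most 256 fingerprints.
buckets = Counter(fingerprint(i.to_bytes(2, "big")) for i in range(65536))

target = b"hi"
candidates = buckets[fingerprint(target)]
print("distinct fingerprints:", len(buckets))
print("blocks sharing the fingerprint of", target, ":", candidates)
```

    On average 256 different blocks share each fingerprint, so the decoder would need 8 more bits to say which candidate was meant, which is exactly the 8 bits that were "saved". Using real-size checksums and bigger blocks changes the numbers but not the bookkeeping, and an outer CRC over the whole file only adds a few more bits to divide the candidates by.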

    --
    1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcfv gbhnjmk,l.;/
  359. Um, No by glyph42 · · Score: 1

    Why do people ignore mathematically sound proofs? This has been proven to be impossible in many ways. Hello? Earth to morons?

    --
    Music speeds up when you yawn, but does not change pitch.
  360. Warning, Nitpick ahead. by gaudior · · Score: 1
    42 is the answer to the Ultimate Question, of Life, the Universe, and everything.

    This is quite different than ...the meaning of life...

    I like to keep things clear. ;-)

  361. Why is this on Slashdot? by glyph42 · · Score: 1

    These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.

    Haha! Arithmetic encoding *is* the improvement to Huffman encoding. Arithmetic encoding is mathematically perfect. Its ratio cannot be improved. Only its speed / memory use during encoding could be improved. This is obviously tripe, since they don't even realize that AC is just the back-end to the compressor, and the front-end (the model) is what can be improved. Why do these 100:1 lossless compression stories even make Slashdot?
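
    The gap between Huffman coding and the entropy bound that arithmetic coding approaches shows up clearly on a skewed source. The sketch below is illustrative only: the two-symbol distribution (90% "a", 10% "b") and the function names are assumptions chosen for the example, and it computes code lengths rather than implementing a full coder.

```python
import heapq
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol: the floor for any lossless coder."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def huffman_lengths(probs):
    """Code length per symbol for a Huffman code built from these probabilities."""
    # Heap entries: (probability, tie_breaker, symbols_in_subtree)
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1          # merged symbols sit one level deeper
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

probs = {"a": 0.9, "b": 0.1}
lengths = huffman_lengths(probs)
avg_huffman = sum(probs[s] * lengths[s] for s in probs)
print("entropy :", entropy_bits(probs), "bits/symbol")
print("Huffman :", avg_huffman, "bits/symbol")
```

    Huffman must spend a whole bit per symbol here, while the entropy is under half a bit. An arithmetic coder can get close to the entropy figure, but neither it nor anything else can go below it for this model, which is why the remaining room for improvement lies in the model.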

    --
    Music speeds up when you yawn, but does not change pitch.
  362. Time by Yeroc · · Score: 1

    I won't go into whether the compression ratios claimed are possible or not as this has been thrashed to death already. The one thing they do admit is that their technology is extremely compute-intensive. It appears they've only been able to run their algorithms on very small data sets (running on very large computers) due to this problem. Whether this technology proves to be a complete farce or not it doesn't appear that it will ever be practical for live streaming at any rate. I know some oil companies that wouldn't mind finding a way to compress all their seismic data though....

  363. My new compression scheme by Joe+U · · Score: 1

    Works by redefining the word byte to equal 2 trillion bits.

    Thus, my entire hard drive fits in under 1K.

  364. Sure it compresses random data at 100:1! by calags · · Score: 1

    And they claim it to be lossless... But if the data is truly random would it matter if you lose some when uncompressing?

    --
    Never attribute to stupidity what can be construed as a monopoly preservation tactic.
  365. Compressed data as a source of entropy? by Xenophon+Fenderson, · · Score: 1

    This whole discussion reminded me of an idea I had the last time compression made Slashdot's front page. If you compressed a file and threw away the dictionary/hash, so all you had was the compressed data stream, couldn't you use that as a source of entropy for PRNGs and OpenSSL and such? I mean, theoretically, it's supposed to be identical to random noise. It should be really high quality entropy.

    Is this insightful, or is there some obvious flaw that I'm missing because I don't know how PRNGs work?

    --
    I'm proud of my Northern Tibetian Heritage
  366. Compressing random data is easy... by 1000101b · · Score: 1

    All you have to do is break the data into chunks... 4 bytes for example. Next calculate the sum of the bytes (using their ASCII codes) in the chunk. Then (this is the hard part) you determine the permutation (from a list of possible permutations) of 4-byte chunks of data for that sum. All one has to do is transmit the sum and the permutation... two numbers... and you can use all sorts of fear inducing math to compress it even more. For better compression, change your chunk size to a bigger number. Also, you can re-compress your data (or even other compressed data) for smaller size. You can even stream it. The one drawback though... it takes lots and lots of cycles to compress and even more to decompress.... (sigh). At least it got me an A in one of my early CS classes!
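
    The cost of "the sum and the permutation" for 4-byte chunks can be counted directly. The sketch below assumes both numbers are stored as fixed-width fields, which is one reading of the scheme; the function names are made up for the illustration.

```python
def sum_counts(chunk_len=4, alphabet=256):
    """counts[s] = number of chunks of chunk_len bytes whose values add up to s."""
    counts = [1]   # exactly one empty chunk, with sum 0
    for _ in range(chunk_len):
        new = [0] * (len(counts) + alphabet - 1)
        for s, c in enumerate(counts):
            for b in range(alphabet):
                new[s + b] += c
        counts = new
    return counts

def bits_to_index(n):
    """Bits needed for a fixed-width index into n alternatives."""
    return (n - 1).bit_length()

counts = sum_counts()
sum_bits = bits_to_index(len(counts))      # which sum, 0..1020
index_bits = bits_to_index(max(counts))    # which chunk among those with that sum

print("possible sums        :", len(counts))
print("bits for sum + index :", sum_bits + index_bits)
print("bits in the chunk    :", 32)
```

    The sum and the index together have to distinguish all 256^4 possible chunks, so they cannot fit in fewer than 32 bits; a bigger chunk size moves the numbers but the same count applies.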

    --
    Live wrong, impostor.
  367. Provably impossible by mathemagicianX · · Score: 1

    A binary string is said to be random in
    algorithmic information theory (my area of mathematics)
    if it is not significantly compressible.
    (I won't get into exactly what this means here
    but you get the idea...)
    Anyway,
    anything that is compressible by a factor of
    100 must have a huge amount of structure for
    the compressor to take advantage of, and so
    is highly non-random by definition. Clearly then their "virtually
    random" data is not random in the slightest.
    In fact in order to be compressible by this
    factor it must be EXTREMELY non-random!
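
    The counting behind this can be written out as a short sketch. It assumes binary strings and a 1000-bit input purely for illustration; the function names are made up.

```python
def strings_of_length(n):
    """Number of distinct binary strings of exactly n bits."""
    return 2 ** n

def strings_up_to(m):
    """Number of distinct binary strings of at most m bits (empty string included)."""
    return 2 ** (m + 1) - 1

n = 1000                 # size of the input in bits
m = n // 100             # target size for 100:1 compression
compressible = strings_up_to(m)

print("inputs of", n, "bits     : 2 **", n)
print("outputs of <=", m, "bits :", compressible)
# At most `compressible` inputs can be mapped losslessly to such short outputs,
# so the fraction of inputs that can reach 100:1 is below 2 ** -(n - m - 1).
print("fraction below 2 ** -%d" % (n - m - 1))
```

    Only a couple of thousand of the 2^1000 possible inputs can get a 100:1 encoding, whatever the algorithm; those few are the "EXTREMELY non-random" ones.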

  368. It's a hoax - another smoking gun by JohnPM · · Score: 1

    Their press release states:

    All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.

    The claims about their compression performance are clearly false - but even legitimate companies are known to exaggerate and oversell. However this list of academic collaborators is the most damning evidence of a hoax, IMHO. Having worked on quite a few projects at the interface of commercialisation and academia, I can promise you that there is no way any project can run with such a long list of partners. Maybe 2 or 3 would be believable.

    --
    Karma police, I've given all I can, it's not enough, I've given all I can, but we're still on the payroll.
  369. Sloppy.... by amacbride · · Score: 1

    Hmmph. Since they didn't bother to run the press release through a spell checker ("Berkeley"), I suspect they have bugs that could randomly change your compressed data into listings from the 1937 Manhattan phone book. (Or perhaps old episodes of "Who's The Boss?").

  370. Mathematical Law Disproven by Someone Bad at Math by Discoflamingo13 · · Score: 1
    ST PAUL,MN - Like the belief in cold fusion, perpetual motion machines, or the idea that quantum computers will make all cryptography useless, a student of the classics today at Macalester College has finally proven that compression of random data can be accomplished to an arbitrary degree. Her secret to success:

    "She's really terrible at math," said a member of the Mathematics/Computer Science Department at Macalester, who wishes to remain nameless. "Her belief that she shattered Godel's Incompleteness Theorem by stating that `Um, humans don't think like machines, so like, it can't be right,' once again completely brings the mathematics community down to its foundations. Or rather, it doesn't , because she doesn't have clue one about what she's talking about." Other voices dissent:

    "It's important to think outside of the box," said the chair of the Classics department. "Her complete lack of mathematical knowledge only makes her a better candidate for seeing the inherent flaws in centuries of mathematical reasoning."

    "So like, some scary guys were arguing about this thing on some website they saw, saying it was impossible, that you could compress some arbitrarily random something-or-other, and I said, `Look, like, in Honey, I Shrunk the Kids, they could make anything smaller because there was space in-between particles, or Barbi dolls, or something, so like, we just have to find the space in-between your data.' That shut them up really quick."

    Hastily scrawling her new idea on a cafeteria napkin, the "idiot prodigy," as dubbed by the "scary guys" (the local chapter of the ACM), she has come to the conclusion that the only way to reach this kind of compression is to use rational numbers. "Like, 1/2 is smaller than one- why don't they just use that? I mean, I've found the space in-between their `data,' like, why don't they believe me? A whole `nother department at this college does!"

    In posthumous response, Huffman, of Huffman-encoding fame, is now spinning in his grave- along with Turing, Godel, and Church.

    Other than changing a few titles and names, this event actually happened in a class I was taking- Godel's Incompleteness Theorem was summarily "disproved", along with the Church-Turing thesis, and the entire idea of P(!)=NP, by a classics major in my Advanced Symbolic Logic class.

    I suppose that, in whatever context, the lesson for today is that it's easy for any one person to disprove an untenable law of mathematics or computer science, simply by being really bad at math.

  371. amazing by jopet · · Score: 1

    quite amazing: that ridiculous claim of doing something that has been proven impossible many years ago not only got them into slashdot, but also into the computer column of my local austrian newspaper (sigh). even more amazing: the number of people posting their own wonderful algorithms for compressing random data here. most amazing: i waste my time bothering.

  372. A stupid theory of mine by TACD · · Score: 1
    Here's my stupid theory to add to the mush:

    1. Take a file, any file. The aforementioned Matrix movie, for example. Now, line up ALL the bits in the wonderfully huge thing.
    2. So you have a massively long string of 1s and 0s. Resolve into a decimal number. (I know this achieves nothing, but bear with me).
    3. Create a mathematical algorithm to which the answer will be this number. (Clearly can be quite small; x(to the power of)21 + 4 or something.)
    4. Convert this algorithm into binary (imagine it in decimal for simplicity's sake).
    5. Go to 3, and repeat at will.
    No doubt there is a very fine reason why this is idiotic and won't work, but I'm not much of a geek so please tell it to me. Nicely. :-)
    --
    Security through promiscuity is no better than security through obscurity.
    1. Re:A stupid theory of mine by 1000101b · · Score: 1

      There are an infinite number of math expressions that, when evaluated, equal your given BIG number. Your goal would be to minimize the number of symbols/characters in your expression. This approach could work but it would be difficult to generically create an expression (although it could be in machine language) that was physically smaller than the source file. I don't think the theory is so stupid. Now... the implementation...???

      --
      Live wrong, impostor.
  373. That's a great idea! by Anonymous Coward · · Score: 0

    To get 100:1 compression, you'd only need to use

    890028147162858511750213993609110836894187624591 65 212124436344046800\
    231927137294396344627357632177625898073120906089 55 216520806400000000\
    0000000000000000

    megs of storage. You could cut that down some if you realize that some of those strings have repeating elements, but that would ruin the elegant simplicity of such an approach. Of course, it wouldn't work. It would be 100:log_2(that number up there). It's close enough, though.

  374. All ready has a patent? by Rubbersoul · · Score: 2, Interesting

    This may have already been posted, and if it has sorry, but I thought this may be of interest to some of you.

    Jean-loup Gailly (one of the creators of gzip) has written an article on a patent that was granted for compression of truly random data, and how it is not mathematically possible. You can read it here for those that are interested.

    --
    man .sig
    No manual entry for .sig.
  375. Unlimited compression by jkstill · · Score: 0

    This is not the first time so called unlimited lossless compression has surfaced.

    Some of you may recall an article that appeared in Byte Magazine a few years ago:

    April 20, 1992 Byte Week Vol 4. No. 25:

    "In an announcement that has generated high interest - and more than a
    bit of skepticism - WEB Technologies (Smyrna, GA) says it has
    developed a utility that will compress files of greater than 64KB in
    size to about 1/16th their original length. Furthermore, WEB says its
    DataFiles/16 program can shrink files it has already compressed."
    [...]
    "A week after our preliminary test, WEB showed us the program successfully
    compressing a file without losing any data. But we have not been able
    to test this latest beta release ourselves."
    [...]
    "WEB, in fact, says that virtually any amount of data can be squeezed
    to under 1024 bytes by using DataFiles/16 to compress its own output
    multiple times."

    The product did not work as advertised (surprise) and does not seem to have made many inroads into the data transmission industry.

    More of this can be seen at:
    http://www.faqs.org/faqs/compression-faq/part1/section-8.html

  376. "existing temporal constraints ..." by JoeGee · · Score: 1

    So if the packet is sent via tachyons (or sent in an alternate universe) and arrives at the exact moment it is sent, transmission time = 0, therefore the packet has been "compressed losslessly." Cool. I understand. :)

    --

    Get off my virtual lawn, you damned virtual kids!
  377. compressible postings by jopet · · Score: 1

    postings to this topic are so redundant that a compression rate of about 1:800 should be easily achievable.

  378. Isn't it a requirement of every mathematical site by JoeGee · · Score: 2

    ... to have catchy theme music, and pretty flash intros? That's how *I* can tell they're doing something real in the academic community. :)

    If their technology is so earthshatteringly different and revolutionary but can use existing connections, why didn't their site download instantly? If it's only software and they already have a patent one would think the easiest route to gain investors would be a small download and a mindblowing demo away ...

    --

    Get off my virtual lawn, you damned virtual kids!
  379. 100:1 Compression? They should see a doctor... by marko123 · · Score: 1

    I got about 6:1 compression on the pizza I ate last night.

    If they got anything more than about 20:1 compression, I'd suggest eating food with more fibre.

    But 100:1 lossless compression? Guys, call yourselves an ambulance. Healthy digestion should include some loss of information.

    --
    http://pcblues.com - Digits and Wood
  380. Compression of random data? I think not! by Anonymous Coward · · Score: 0


    Surely they mean arbitrary data, not random data. A fundamental property of random data is that it is its own shortest description - i.e. incompressible.

    The whole thing stinks, really. Even if they mean arbitrary non-random data the 100:1 compression factor is just not achievable all the time: can you achieve such a compression factor on a 50 bit string?

  381. Wlodzimierz Holsztynski speaks !!! by Anonymous Coward · · Score: 0
    The 'InterTran(tm) translation engine comes to the rescue:

    >> > To ogol uzywamy makes " you " ( very like, with Panu zalezy :-) >> Not zalezy me to formie " Mr. ". More responds me shape " you" poprostu >> reads when notki of thee on internecie as well I see, with you've at least tytul >> doctor, meeting ex szacunku zwrocilem sie shape " Mr. ". Yes by way of student >> zostalem wonted . Ex that tides bede returns sie to " you ". > > what, znalaz?em is not wr?cz a kick in the pants?ce... >pardon me On?odku, ?e not wiedzia?em :-(( > >"Dr. Wlodzimierz Holsztynski Dr. Holsztynski became > and full professor of mathematics at Warsaw University > at the age of 22, uniquely combining pure applied ... " > >I am sorry, bank is not niedost?pna, and quotation this balance ex googli. >Id? seeks onward:-)and mo?e, On?odku, title?by? what wi?cej...? > >salutes, ?K To wit clotted nonsense . not wiedzialem via who downtime, with this outlet pozwolila yourselves to quotation jakichkolwiek informacji of me . Zadnych not autoryzowalem,,nie upowaznilem them, not pozwolilem, and yet to upshot zabronilem - when a few days ago by accident zauwazylem what title, this spot napisalem until them a man of law, zeby those " informations " usuneli. I`ve istotniejsze successes, niz those nieprawdziwe, listed to that stronie, and yet if not mial, not wants tommy rot . (they such belongings does by, wherebyprzez co soots, przyciagnac investor ). Bylem them consultant, moze anew bede, and moze not, ex nimi it is anyone's guess . Pardon me too those niepotrrzebna misinfoprmacje, whereas naprawde there are not therein neither troche my guilts . Salutes, Wlodek I'm sorry, with yes sie steel, choc in the main this smieszne. -- ============= P of l N E ON S ============== cartulary as well rummaging newsów http:/www.polnews.pl ---- ex 28.08 nowa, lepsza version ----


    I'm glad I took the time for that ;)
  382. It's legit. by Anonymous Coward · · Score: 0

    Being a mathematics grad student myself familiar with Smale's work, believe me this stuff is legit.

    Clearly they are not all the way there yet. A fellow named Michael Barnsley at Georgia Tech (author of Fractals Everywhere) has been working on this kind of stuff too (under the moniker Iterated Function Systems). I think the catch is that compression times have been astronomical (although that could be coming down due to recent advances in computational horsepower and theoretical breakthroughs) and the decompression time isn't exactly snappy either.

    Anyway, sorry if this is redundant ... just my two cents worth. I'm personally not at all surprised that this would be possible.

  383. HIJKLMNO by flufffy · · Score: 2

    can be compressed to water ;)

  384. not possible by Anonymous Coward · · Score: 0

    By definition, random data can not be compressed because it contains no repeating patterns.

    Once again, slashdot falls for another BS story.

    You guys suck.

  385. Definition of Random by Bombcar · · Score: 1

    We need a slashdot poll:

    A random set of numbers means:

    (1) Any and all possible sets
    (2) Only those sets which have no visible patterns
    (3) RAANNNDY!, Baby!
    (4) Cowboy Neal

  386. TM's as indicator of crap. by augustz · · Score: 2

    The real breakthrough is the new discovery that the number of TMs and words capitalized in TheMiddle == the amount of money these folks will dupe from some silly investors.

  387. Current ratio for random data is EXACTLY... by Anonymous Coward · · Score: 0

    1:1, and so will remain.

    Nough said.

  388. Steve Smale by phr2 · · Score: 1

    Steve Smale is a real mathematician, one of the great ones of the 20th century (he'd be in his mid 60's now). I had some classes from him at UC Berkeley in the 90's and know him slightly. He's not a computer guy, but there's no bullshit about him and I'd be amazed if he's actually been pulled into a scam like this. He retired from UC a few years ago and last I heard he was teaching in Hong Kong. I'll see if I can find an email address for him and ask him what the story is.

  389. Mega compression by Anonymous Coward · · Score: 0

    I can get 1,000,000:1 compression... Just store a very large file at, say:
    www.placeoflargefile.com/bigfile

    There! 32 bytes. Unfortunately, the "decompression algorithm" can take quite a while, depending on your connection...

  390. Re:Unbreakable encryption by color+of+static · · Score: 2

    I'm quite aware of that page. I work about 20' from the author :-).
    I'd argue that there is no effective commercial one-time pad, only products that approach it. There have been a number of companies releasing similar press releases about OTPs for some time, but each time the generation method has resulted in it not being an OTP. Most of the time it has also been substantially worse than most existing algorithms.

  391. Sounds a lot like... by chhamilton · · Score: 1

    ...a lot of other hoaxes out there (that numerous people have already mentioned), however, it also sounds like another "new" technology.

    There's a company out there called Datagistics that is also claiming magic compression of pretty much all data, using a technique they call "Random Access Para-Integrated Data", or RAPID for short. They're not claiming 100:1, but rather 20:1, so I guess their technology has a 5 times better chance of being real ;)

    The site is unfortunately a little light on the details, not even offering a techno-babble pseudo-explanation like these ZeoSync guys...

  392. "random" data? by howlingfrog · · Score: 1

    Yes, I really believe that they can compress random data. Even though the various mathematical definitions of the word "random" all essentially mean "noncompressible." As pointed out many many times already, for any compression scheme to work at all, there HAVE to be noncompressible strings. The word "random" simply refers to such strings.

    For a ratio of 100:1, a fantastically high proportion of strings must be noncompressible. Information theory is not my field, but I would assume it'd be something like this: Given an arbitrary string, there's a 1 in 2^100 chance it's compressible. Yay, the world's bandwidth problems are solved.
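    A rough counting sketch of the paragraph above, in Python. The function name `max_fraction_compressible` and the sample sizes are illustrative assumptions, not anything from ZeoSync. It only bounds how many n-bit strings any lossless scheme could possibly map to n/100 bits or fewer.

```python
from fractions import Fraction


def max_fraction_compressible(n_bits: int, ratio: int) -> Fraction:
    """Upper bound on the share of n-bit inputs that can shrink by `ratio`.

    A lossless coder must give every input its own output. There are only
    2**(m + 1) - 1 bit strings of length <= m, so at most that many of the
    2**n_bits inputs can be mapped to m = n_bits // ratio bits or fewer.
    """
    m = n_bits // ratio
    short_outputs = 2 ** (m + 1) - 1
    return Fraction(short_outputs, 2 ** n_bits)


# 1000-bit inputs squeezed 100:1 into at most 10 bits.
bound = max_fraction_compressible(1000, 100)
print(bound < Fraction(1, 2 ** 989))  # True
```

    So for 1000-bit inputs the compressible share is below 1 in 2^989, which is even more lopsided than the 1 in 2^100 guessed above.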

    I can't believe Slashdot's editors even bothered to post this load of bull.

    --
    The original Howling Frog is a fictional character and has no UID.
  393. Hang on? Use equations. by Anonymous Coward · · Score: 0

    Had a quick brainwave.. it's probably wrong, but I thought I'd throw it out in case there's something in it.

    Why *can't* you use equations to represent long streams of data?

    If you ever wrote a compiler or studied random number theory, you'll know that the only 'random' numbers a computer generates are pseudo-random. Most 'random' number generators use quirky equations to produce their output.

    What if you could match the output to a number of equations in some way?

    Now, it's a well known fact that for the amount any compression routine can compress 'random' data.. it must also expand an equal amount of data. That's fine. What if there was a routine that would *only* compress 20% of files to a twentieth of their size? You could run the algorithm over it, keep the output in the cases where the compression was efficient, and you're still up on the deal.

    Before you say anything.. I know all about the 'Question 9' and pigeon-holing blah blah blah.. just throwing this out.

  394. Reuters report by markovII · · Score: 1

    I think the Reuters people were wrong: Shannon died last year, not a decade ago ... if you read the report on the Reuters site.

  395. Re:how can this be? Answer: BitPerfectTM by dzeuthen · · Score: 1

    Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.

    This will be lossy, so the above and this might be offtopic.

    You know, in fact most audio/video is compressed using lossy algorithms, most of the time exploiting redundancies imperceptible to the audience, such as using YUV colorspace instead of RGB and fancy transformations from the spatial domain into frequency space.

    In the end the binary soup is often encoded using Huffman and runlength encoding. Add to this trivial (not really) motion detection and compensating algorithms and you arrive at MPEG1/MPEG2.

    Dedicated circuits handle the encoding and decoding (where encoding is much more expensive).

    My point is that you cannot discard the application domain when talking about compressing A/V if you want to achieve a cost-effective encoding+decoding solution. Mere non-informed compression is here a waste of time. Use what you know!

  396. If the universe is Deterministic... by acid_andy · · Score: 1

    If (that's a very big if) the universe is completely deterministic then, in theory, everything in it could be calculated by knowing the initial conditions of the Big Bang and all the physical laws that acted to change those conditions. In that case, nothing in the universe would be truly random and provided you knew a thing's position in spacetime you could calculate its data (except that something tells me your simulator would have to be larger than the universe itself to run the code). On a more realistic scale, if you knew exactly the conditions under which your data to be compressed was obtained, you could run a simulation of the process again to regenerate the data. E.g. a script for a raytracing program will be many times smaller than any algorithm such as jpeg could ever make the resultant image. Basically what I'm saying is, IF there's no such thing as truly random numbers, and you know how the random data was generated in the first place, you can run the generation process again to get the data again. Hopefully the data you would need about the initial conditions would be less than the data the process generated.

    --
    Your ad here.
  397. You made the error. by nikoftime · · Score: 1

    The word Berkeley is actually quite correct. Check this link to University of California Berkeley's Website: http://www.berkeley.edu/

    1. Re:You made the error. by amacbride · · Score: 1
      ...actually, I just wasn't being clear. To quote the original article:

      All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, ...


      (I'm an alum, so I noticed.)
  398. The current ratio, by mindstrm · · Score: 2

    The current BEST ratio for compressing truly random data is 1:1
    In other words, you can't do it.

    If you TRY, some compression software will end up making it bigger.
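    This is easy to try at home. A minimal sketch in Python, assuming zlib as a stand-in for "some compression software" and a seeded pseudo-random generator as a stand-in for truly random data (both are my choices for illustration, not anything ZeoSync describes):

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable (pseudo-random, not truly random)
random_data = bytes(rng.getrandbits(8) for _ in range(100_000))
text_data = b"the quick brown fox jumps over the lazy dog\n" * 2_000

random_packed = zlib.compress(random_data, 9)
text_packed = zlib.compress(text_data, 9)

# Lossless: both inputs must round-trip exactly.
assert zlib.decompress(random_packed) == random_data
assert zlib.decompress(text_packed) == text_data

print(len(random_data), "->", len(random_packed))  # random-looking bytes: no gain
print(len(text_data), "->", len(text_packed))      # repetitive text: large gain
```

    The random-looking input comes out slightly larger because the compressor finds nothing to exploit and still has to add its own framing bytes.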

    These guys are claiming 100:1 lossless on truly random data. This is difficult to believe on both fronts.
    First, 100:1 lossless on any real-life data is unlikely. Add in the 'truly random' part...

    So.. either they've violated the laws of the universe, or they are about to bring about one of the biggest mathematical discoveries in the world, or they are full of crap.

    1. Re:The current ratio, by kender · · Score: 1

      Their 100:1 on nearly random data is BS; otherwise compressing their compressed data would yield the same 100:1 ratio...

  399. See pigeonhole example. by mindstrm · · Score: 2

    You can't compress every set of 1000 bits of data into 10 bits of data.

    10 bits of data only allows for 1024 combinations.
    1000 bits allows for a lot more.. so it's simply not possible.
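    The same count, spelled out as a small Python sketch (the helper name `distinct_values` is just for illustration):

```python
def distinct_values(n_bits: int) -> int:
    """Number of different bit strings of exactly n_bits bits."""
    return 2 ** n_bits


inputs = distinct_values(1000)   # every possible 1000-bit message
outputs = distinct_values(10)    # every possible 10-bit "compressed" message

# Pigeonhole: if all inputs were squeezed into 10 bits, each output would
# have to stand for this many different inputs on average.
inputs_per_output = inputs // outputs

print(outputs)                   # 1024
print(len(str(inputs)))          # number of decimal digits in 2**1000
```

    With that many inputs sharing each output, a decompressor has no way to tell which one was meant.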

  400. Remember the 100% secure encryption scheme by twfry · · Score: 1
    This "technology" will take the same path as that professor from Princeton University who claimed he developed a provably secure method of public encryption. Remember the one that showed up in the Wall Street Journal and NY Times? Where you use a known source (in his case a satellite) which constantly sends out private encryption keys. To transfer data you just agree on a time to select the key, both the encoder and decoder grab it, and then the data is encrypted with that key. The idea was that the private encryption key is never transmitted and a hacker wouldn't know which of the trillions of keys out there to use.

    What? You don't remember this? Oh, that's because it is worse than traditional public key encryption schemes. 1) It's security through obscurity: you need a method to transfer the time to grab the private key out of all of the possible ones. 2) What's worse is that for regular private key encryption to transfer the data, instead of having the entire key length as possible combinations, now there are only the ones transmitted by the common source. (No matter how many they are, it will always be much less than 2^128.)

    We all know this compression method will follow a similar path.

  401. 100:1? pfft! 100% compression! by FIGJAM · · Score: 1

    rm -Rf /

    --
    Do your best, hope for the best, suspect the worst.
  402. It can happen, I actually believe them. by twoblink · · Score: 1

    I'm not kidding here. Let's take a look at a few possibilities.

    We take "random" data, and convert it into an image. We somehow "convert" this image into vectored images. Then an entire image is saved as descriptors, not as data itself. So it can happen. A combination of different sin or cos waves can yield any shape wave. Suppose we draw a line, number marker lines above it, 1-9. Then below it, number it A-F. Then we progress and plot the HEX code. After that is all plotted, it creates a wave graph. Suppose they found a way to efficiently describe the wave graph. Then compression ratios of 100:1 are quite possible. We at that point aren't actually saving data, but a DESCRIPTION of data, which might be shorter. I can have a string of numbers 1 to 1,000,000; and that would not compress that much; but I can say;

    for (i=1;i<=1000000;i++)
    printf("%d\n", i);

    I can just save that, and I will have saved 1,000,000 lines of numbers. I am not saving any data, just a smaller description of it. Hell, in my example, I have achieved 41666:1 compression ratio! So.. before you doubt, shut your trap and think outside the box...

    The second theory that makes me think this might actually be true, is that the pigeon hole theory they described. If you rearrange data and then "move it up a dimension" you can group things in such a way that a smaller subset of information is stored. You add noise to it. When you decompress, you WILL lose something in the dimensional transition, BUT, as long as what you lose is not along the data path, or is not part of the dimension quadrant that actual data sits on, you are fine. Sooo... it _IS_ possible.

    Now did these guys in Florida do it? I don't know, I doubt it. Is it possible to? Yes.

    Do your "bits of entropy" math calculations and you will find it's not possible... blah blah blah.. BUT, if you think OUTSIDE THE BOX, then it should be possible.

    Two notes:
    1) There was a similar claim by a program called OWS, back in 1992. It was a hoax.

    2) Nothing will compress to 1 byte because there is always algorithm overhead, so at some point in time, if this does compress random data, it might GROW slightly in size, past a certain point.

    My guess is, such high compression ratios, if they are to be achieved, will require a lot of horsepower, both to compress, and decompress; with the teeter-totter tilting heavier on the compression.

    BlinkBlink

    1. Re:It can happen, I actually believe them. by epine · · Score: 1


      Please apply for your Ark B boarding pass. With the newly installed compression algorithm, the transporters now operate with 100 times greater efficiency. Signal your consent by flapping your gums and we will promptly beam you aboard.

    2. Re:It can happen, I actually believe them. by Anonymous Coward · · Score: 0

      Hi

      Concerning the 41666:1 ratio in your for (i=1;i<=1000000;i++) printf("%d\n", i) example: You have to include the software which decodes/interprets/compiles your statement; actually this software is part of the decompression algorithm.

      Regards

      Daniel

  403. They aren't outright lying.... by CoolGopher · · Score: 1

    If we think a bit closer about the quote:
    "The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down."

    Bringing the multi-dimensional aspect down to a plain byte level, what they are saying is that as long as the byte only contains values between 0 and 3, they can achieve astonishing compression levels. Hell, even I could do a 64:1 with such an assumption =)
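    For what it's worth, here is a minimal Python sketch of what that assumption buys you. If every byte is known to hold only 0..3, each one carries 2 bits of information, so plain bit-packing gives 4:1. The function names `pack_2bit` and `unpack_2bit` are made up for this example, and it does not try to reproduce the 64:1 figure.

```python
def pack_2bit(values: bytes) -> bytes:
    """Pack bytes whose values are all 0..3 into 2 bits each (4 per byte)."""
    if any(v > 3 for v in values):
        raise ValueError("every byte must be in the range 0..3")
    if len(values) % 4:
        raise ValueError("length must be a multiple of 4 in this sketch")
    out = bytearray()
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(out)


def unpack_2bit(packed: bytes) -> bytes:
    """Inverse of pack_2bit: expand each byte back into four 0..3 values."""
    out = bytearray()
    for byte in packed:
        out.extend(((byte >> 6) & 3, (byte >> 4) & 3, (byte >> 2) & 3, byte & 3))
    return bytes(out)


sample = bytes([0, 1, 2, 3, 3, 2, 1, 0])
packed = pack_2bit(sample)
assert unpack_2bit(packed) == sample
print(len(sample), "->", len(packed))  # 8 -> 2
```

    The gain comes entirely from the restriction on the input; feed it arbitrary bytes and it refuses.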

  404. "Practically random" (was Re:The current ratio,) by isdnip · · Score: 2

    They have funny wording in their release about data that is practically random. Well, that can be parsed to mean that in practice, the data is random and therefore it can be replaced by any other random string. After all, it's random! Not mathematically random in the entropy sense, but used by an application which wants any old string of random numbers. So sure, I can send a message saying, "generate me 1000 random digits". Great compression. Useless in practice, of course. In any case, these guys sound like a get-rich-quick scheme, trying to fool people, and not the only one of that type I can think of.

  405. Encoding to low Kolmogorov Complexity can work... by bgspence · · Score: 1

    ZeoSync says they encode their targets so that they will 'substantially occupy a space of low Kolmogorov Complexity Construct' (see their 'Technical process' page). This might mean that if you encode the most 'meaningful' bit patterns with low encoding values falling within a target compression range which spans only 1% of the encoding space of all possible bit patterns, you will get the kind of results they are touting.

    For example, there are many bit patterns which could be used as a jpg image, but most random bit patterns will simply look like noise. So, encode the ones with potential usefulness with small encoding values and encode the noisy images with the larger encoding values. They don't attempt to encode random bit messages losslessly, only the useful ones.

    The trick is sorting out the semantics of a bit string. Which ones are noise, which ones have meaning and which ones are p0rn.

  406. How big is the decoder? by Anonymous Coward · · Score: 0

    The answer is quite simple. The encoder and decoder are each about 5 terabytes big with every possible combination of 0s and 1s in them. Then, all the "zip file" has to store is the location of the file. Sure, 100-1 compression, but it requires about 3 copies of cdrom.com's computer to hold the thing :)

    1. Re:How big is the decoder? by Anonymous Coward · · Score: 0

      Nice idea, but I reckon more often than not the index giving the location of the file would be at least as long as the file itself! E.g. Say we have a file consisting of 3 random digits from 0 to 9. There are 1000 possible files that you can get - so you save all these to disk and index them 0 to 999 - trouble is the indices are the same lengths as the files they represent (except maybe for cases less than 100 - you could compress the file '003' to '3').
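      The 3-digit case is small enough to check exhaustively. A quick Python sketch (variable names are mine), counting how many characters you would save by replacing each file with its unpadded index:

```python
from itertools import product

DIGITS = "0123456789"

# Every possible 3-digit "file", in order: '000', '001', ..., '999'.
files = ["".join(p) for p in product(DIGITS, repeat=3)]
index_of = {content: i for i, content in enumerate(files)}

# Write each index with no padding, as in the '003' -> '3' case.
saved = sum(len(content) - len(str(i)) for content, i in index_of.items())
total = sum(len(content) for content in files)

print(index_of["003"])       # 3
print(saved, "of", total)    # 110 of 3000
```

      Even that small saving is optimistic, since an unpadded index only works if something else tells the decoder where the index ends.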

  407. Forgive me for being paranoid... by Anonymous Coward · · Score: 0

    Forgive me for being paranoid (hence AC, and yes I know this post will probably never be read), but what I see here is an investor scam site (the old snake-oil compression thing, *again*?!) with an unskippable Flash intro on the same day news about a Flash-borne COM virus breaks.

    Anyone care to virus-scan the Flash intro on their page? Is this a seed?

  408. Nice disclaimer... by jelle · · Score: 1

    "This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties."

    I've seen that a couple of times before, and each time it really meant this:

    "The claim made in this press release is bogus. The company does not posess the ability to perform what is promised above. The part of the statement above that is not a lie, is total bullshit. This press release was created as a mere attempt to get money from clueless investors"

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.
  409. I can think of an instance where 100:1 happens. by netik · · Score: 1

    If you consider video; If you are going between two frames of video, and you're using temporal compression, then only the sections of the image that change get updated, and not the entire image. The rest of the data is dropped, and movie size becomes that much smaller.

    Overall, though, you're never going to get 100:1 across the entire film even with throwing away data, so how does this work?

  410. Perhaps I am tired but... by Anonymous Coward · · Score: 0

    If one were to apply a series of (memorized) random algorithms to the random data of a file, could one set of the resulting data actually be easier to compress using more conventional methods?

    I doubt they have anything worked out but I was hoping to briefly spark some interest with a not-so-well-thought-out post :P

  411. Now.... by tahpot · · Score: 1

    where did I put my modem? Broadband's dead.

  412. Physically Possible? by modulus · · Score: 1

    Assuming "random" data is the hardest to compress, as I think it probably is, can this possibly be true?

    The following thought experiment:

    • You take your 10GiB DVD rip (or whatever) and compress it. If it were truly random, it would compress to something like 100MiB.
    • Compress the result. It's just another file, no harder to compress than a random one, right? The result is just 1MiB.

    So now you've got a DVD movie that fits on a floppy disk. I don't buy it. (Not that I wouldn't, if it were actually possible.)

    How can this be reasonable at all?

  413. most likely a scam by by2 · · Score: 1

    Compress random data 100 to 1 ? I'll bet you $1000 that it's either a exaggerated claim, or an outright scam.

  414. How about this sci-fi style solution? by acid_andy · · Score: 1

    OK imagine it's thousands of years in the future and humans can do funky things with space and time.
    1. You have the file you wish to compress on your hard drive.
    2. With your ultra-hi-tech science you open a wormhole to connect this point in space-time with say, a point 2 months in the future.
    3. You delete the file from your hard drive. It was the only copy in existence so you've effectively compressed it to 0 bytes and you can use this disk space for 2 months.
    4. 2 months later you need the file again so you fetch it via the wormhole.

    OK so this is kind of like just backing up the file to a tape drive or something and then copying it back when you need it, maybe it's not really compression - but the difference is, *after* you delete it, it really doesn't occupy any *space* at all until you need it again 2 months later.

    Now we just need to work out how to make a wormhole, hmmm...

    --
    Your ad here.
  415. Random Data Defined by Anonymous Coward · · Score: 0

    How do we define truly random? Is there any other definition than that it's data for which the shortest possible description is a list of the data itself? If not, then recoverably compressing truly random data is by definition impossible.

  416. mod him up to 5 by abdulla · · Score: 1

    i've never laughed so much at a post :)

  417. Do you realize the implications of this? by Anonymous Coward · · Score: 0

    I know someone else posted this but I'm just going to reiterate and provide an interesting example.

    Let's say you have a 1 gig file that happens to be the DIVX rip of the Gladiator DVD.

    soooo since this thing can do "any random bytes" you might be able to assume that you can zip over and over....

    1,000,000,000 bytes
    to 10,000,000 bytes
    to 100,000 bytes
    to 1,000 bytes
    to 10 bytes

    Then, it could be as easy as going to a messageboard and typing "this is the data for Gladiator--> 'S#j1LLzo0i'"

    but obviously I REALLLLLY doubt you can do it over and over.... like the article says "100:1 in some/most cases"

    but still if compression got down that far.... warez would truly be unstoppable. You could have every creation ever created on a 10 gig drive with ease.

    1. Re:Do you realize the implications of this? by SIGFPE · · Score: 2

      You could have every creation ever created on a 10 gig drive with ease.

      Er...you're not thinking hard enough. You could compress that 10 gig drive to 1 byte. In fact, here it is: X. That 'X' contains all the best warez ever written. Unfortunately I'm keeping the decompressor for myself.
      --
      -- SIGFPE
  418. Too simple for me.... by rocca · · Score: 1

    I need a compression routine with more than a Zero Space Tuner(tm) and BitRate Accelerator(tm) and Fake Article Compounder(tm), I need one with Sub-Space Intergalactical Holographic Nucleoumical Redifferenciator Protocol(tm) support.

  419. JAR compression. by Shanep · · Score: 2

    JAR, from the maker of ARJ, is substantially better than ZIP and RAR as far as compression goes and substantially slower also.

    Interesting thing I remember with JAR in DOS, is that the more memory you have to assign to the compression, the better the compression.

    http://www.arjsoft.com/jar.htm

    --
    War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
  420. [no coherent subject] by Anonymous Coward · · Score: 0

    Hmm... so we've seen a thousand times mathematical proofs that no, you CAN'T have 100:1. OK. Here's how I think some of it might work.

    They say they intentionally randomize some data (this is out of the New Scientist article). That means they can't be doing pattern-matching on it because they're deliberately destroying patterns.

    How about finding which patterns it DOESN'T match?

    I read about this in some article in some magazine about iris scanning. The author of the iris scanner software said (i paraphrase from memory), "The big breakthrough came when I stopped trying to find what patterns are in the image, and started trying to find what patterns *aren't* in the image".

    So perhaps instead of storing information about how the file is constructed, they store information about how the file ISN'T constructed. It seems lame and stupid -- but fits in perfectly with their claims. Great compression of random data - random data has no patterns. They don't mention repeating data - because I'll bet you that ZeoSync fails miserably at patterned data. Actually... no it wouldn't. A file that repeated the pattern 101010101010 isn't repeating the pattern 110011001100, so they could still use pattern-non-matching.

    In any case, I lost track of compression technology about five years back. This is random and incoherent mumbling and should be ignored. http://www.nitrozac.com/ for no more information. Hooray for Nitrozac!

  421. MOD PARENT UP by Platypii · · Score: 0, Redundant

    Dead on.... said it before i could

  422. Unlimited compression is easy... watch this: by waimate · · Score: 1
    Pi is an infinite length number that never repeats, right?

    Therefore all possible data sequences appear somewhere within the digits of Pi, right ?

    Therefore any file of any size can be represented by just two numbers - the position of the starting digit within the digits of Pi, and the length of the sequence. Presto. QED.

    Best of all, if you want to encrypt as well as compress, just use "e" instead of "Pi".

    (note: one of the two numbers resulting from the compression may be rather lengthy in nature, however, do not let this prevent you from IPOing your company)
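    To put a rough number on "rather lengthy", here is a Python sketch. It computes the first 1000 decimals of Pi with Machin's formula in integer arithmetic (my choice of method, with 10 guard digits assumed to be enough at this size) and then checks how many 3-digit strings can be found in them at all. The helper names are made up for this example.

```python
def arctan_inv(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, via the Taylor series in integer math."""
    term = unity // x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_decimals(count: int) -> str:
    """First `count` decimal digits of Pi (after the '3.'), by Machin's formula."""
    guard = 10  # extra digits to absorb truncation error
    unity = 10 ** (count + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)[1:count + 1]


digits = pi_decimals(1000)
# Every 3-digit window that actually occurs in the first 1000 decimals.
windows = {digits[i:i + 3] for i in range(len(digits) - 2)}
missing = 1000 - len(windows)
print(digits[:10], missing)
```

    There are only 998 starting positions in 1000 digits, so at least two of the 1000 possible 3-digit strings cannot be found there. To cover every k-digit string you need on the order of 10^k positions, and writing down a position that large takes about k digits, which is what you started with.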

    1. Re:Unlimited compression is easy... watch this: by kfogel · · Score: 1

      No -- the first boldface assumption is not true.

      Imagine that the expansion of Pi were infinite
      and non-repeating, but coincidentally never
      contained the digit `8'. As you can see,
      there's no contradiction there -- such a string
      is possible, although it happens that Pi is
      not one such.

      Therefore, any data with an `8' would not
      be locatable in Pi.

      The conclusion that all possible data sequences
      appear in X because X is infinite and
      non-repeating is the misstep. (I once came
      to that same wrong conclusion myself, but
      a fellow named Dave Kuhl corrected me, thanks
      Dave! :-)

      -Karl

      --
      http://www.red-bean.com/kfogel
  423. Re:how can this be? Answer: BitPerfectTM by composer777 · · Score: 1

    Ok, where do you store the information about where the error is in the 3 bit string? You have listed next to each 3 bit number an error and a position for that error. You also need to include information indicating whether or not the digits following the encoded 000 and 111 are significant. So here is your algorithm (the flaws in your reasoning should be apparent):
    000 0 00
    001 0 11
    010 0 10
    011 1 01
    100 0 01
    101 1 10
    110 1 11
    111 1 00

    Reordering your encoded strings we get 000, 001, 010, 011, 100, 101, 110, 111, which is the exact same amount of data that we were trying to compress, so you cannot compress a truly random 3 bit string. Nice try though.

    Note: we need to use 00 for the cases of 111 and 000 in order for the algorithm to be able to differentiate between cases such as 111 and 1,1,1.

  424. Far better than that. by Anonymous Coward · · Score: 0

    I have a method that works in theory, I just don't have a computer trillions of times faster to make it work.

    I think that the trick is not to think of the file as a stream of numbers, but instead as one or more incredibly large numbers or ILNs(tm). You've seen complicated equations that to resolve result in really large numbers - just build tools to analyze the number to find the shortest possible equation to represent it. Trade size for computing power.

    As for the compress the compressed number thing, you would perhaps be able to do such but it would provide diminishing returns.

    Maybe when I have a computer with the equivalent of 1 billion pentaflop chips my dream will be realized.

    History will forget me though.

    Paladin

  425. What would you ask Zeosync? by gloomyjoe · · Score: 1
    I work for an Internet research firm, and will be getting a briefing from Zeosync on Monday afternoon. I have a pretty good idea of what I'd like to ask them, but I'd be grateful for suggestions of questions to probe whether there's any reality to their claims (and to make me look smarter). Assuming I'm not bound by an NDA, I'll post a follow-up on the meeting later.

    Thanks.

  426. I've got it! by Anonymous Coward · · Score: 0

    Here's a scheme to get major compression of all files:

    caveat: it requires user input.

    Program saves number of bytes the file contains.

    Program randomly generates files of this size and asks the user "Is this the right one?"

  427. Re:how can this be? Answer: BitPerfectTM by Anonymous Coward · · Score: 0

    You are an idiot. He was saying the compression is lossy. This means that he doesn't store where the error bit is. Simply that they are probably erroring up/down to the nearest well compressible value, and then compressing. Not that the restore would make things exactly as they are supposed to be.

  428. no, i've got it. by eyenot · · Score: 1

    the whole universe decompresses at a ratio of 100:1 except for your file. you laugh, then you die. the end.

    --
    "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee