ZeoSync Makes Claim of Compression Breakthrough
dsb42 writes: "Reuters is reporting that ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller (or our streaming video just became a lot clearer)..." This story has been submitted many times due to the astounding claims - Zeosync explicitly claims that they've superseded Claude Shannon's work. The "technical description" from their website is less than impressive. I think the odds of this being true are slim to none, but here you go, math majors and EE's - something to liven up your drab dull existence today. Update: 01/08 13:18 GMT by M : I should include a link to their press release.
Excuse my lack of compression knowledge, but what's the current ratio? I'm assuming 100:1 is pretty damn good. =) btw...even though this *might* be a good compression algorithm and all that, how long would it take to decompress a file using your joe average computer??
I SURVIVED THE GREAT SLASHDOT BLACKOUT OF 2002!
Even lossless compression still relies on redundancy within the data, normally repeating patterns of data. Surely 100:1 on TRUE random data is impossible?
update comments set karma=-1, reason='offtopic' where sid=26315
They claim 100:1 compression for random data. The thing is, if that's true, then let's say we have data A (size 1,000,000).
compress(A) = B
Now, B is 1/100th the size of A, right (size 10,000), but it too is random, right?
On we go:
compress(B) = C (size is now 100)
compress(C) = D (size 1).
So everything compresses into 1 byte.
Or am I missing something?
Mr Thinly Sliced
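The repeated-compression argument above can be sketched in a few lines. This is only an illustration: `passes_to_one_byte` is a made-up helper, and the fixed ratio is the hypothetical "works on everything" claim, not anything ZeoSync has published.

```python
def passes_to_one_byte(size_bytes: int, ratio: int) -> int:
    """Count the passes needed to shrink size_bytes to 1 byte,
    assuming (hypothetically) that every pass achieves the same ratio."""
    passes = 0
    while size_bytes > 1:
        size_bytes = max(1, size_bytes // ratio)
        passes += 1
    return passes

# 1,000,000 -> 10,000 -> 100 -> 1
print(passes_to_one_byte(1_000_000, 100))  # 3
```

A single byte can only take 256 values, so it cannot stand in for every possible megabyte of input. That is the contradiction the comment is pointing at.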
Maybe they just needed more bandwidth for their terrible site?
"Ignorance more frequently begets confidence than does knowledge"
- Charles Darwin
The odds on a compression claim turning out to be true are always identical to the compression ratio claimed?
Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?
After some time a person will reasonably conclude that:
"One unique marker is required for each pigeon that flies through the hole, if there are one hundred pigeons in the group then the answer is one hundred markers". In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point, and believe that we are going to successfully recover either the square or cube from the single edge, we would be sorely mistaken.
This three-dimensional world limitation can however be resolved in higher dimensional space. In higher, multi-dimensional projective theory, it is possible to create string nodes that describe significant components of simultaneously identically yet different mathematical entities. Within this space it is possible and is not a theoretical impossibility to create a point that is simultaneously a square and also a cube. In our example all three substantially exist as unique entities yet are linked together. This simultaneous yet differentiated occurrence is the foundation of ZeoSync's Relational Differentiation Encoding(TM) (RDE(TM)) technology. This proprietary methodology is capable of intentionally introducing a multi-dimensional patterning so that the nodes of a target binary string simultaneously and/or substantially occupy the space of a Low Kolmogorov Complexity construct. The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression. The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down.
The punchline to the joke was always along the lines of
http://www.zeosync.com/flash/pressrelease.htm
- Derwen
http://fsfeurope.org/
Pure random data is impossible to compress. If you compress 1 MB of random data (proper random data, not pseudo-random) and you get, say, 100 KB of compressed output, what's stopping you feeding this 100 KB back through the algorithm again and reducing it even more... again and again, until the whole 1 MB is squashed into a byte? (Which, obviously, is a load of rubbish.)
ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss. That would be at least an order of magnitude beyond current known algorithms for compacting data.
ZeoSync announced today that the "random data" they were referencing is a string of all zeros. Technically this could be produced randomly, and our algorithm reduces this to just a couple of characters, a 100 times compression!!
ZEOSYNC'S MATHEMATICAL BREAKTHROUGH OVERCOMES LIMITATIONS OF DATA COMPRESSION THEORY
International Team of Scientists Have Discovered
How to Reduce the Expression of Practically Random Information Sequences
WEST PALM BEACH, Fla. - January 7, 2001 - ZeoSync Corp., a Florida-based scientific research company, today announced that it has succeeded in reducing the expression of practically random information sequences. Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.
Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]
"There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."
Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range.
Many types of digital communications channels and computing systems could benefit from this discovery. The technology could enable the telecommunications industry to massively reduce huge amounts of information for delivery over limited bandwidth channels while preserving perfect quality of information.
ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.
All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.
Dr. Piotr Blass, chief technology advisor at ZeoSync, said "Our recent accomplishment is so significant that highly randomized information sequences, which were once considered non-reducible by the scientific community, are now massively reducible using advanced single-bit-variance encoding and supporting technologies."
"The technologies that are being developed at ZeoSync are anticipated to ultimately provide a means to perform multi-pass data encoding and compression on practically random data sets with applicability to nearly every industry," said Jim Slemp, president of Radical Systems, Inc. "The evaluation of the complex algorithms is currently being performed with small practically random data sets due to the analysis times on standard computers. Based on our internally validated test results of these components, we have demonstrated a single-point-variance when encoding random data into a smaller data set. The ability to encode single-point-variance data is expected to yield multi-pass capable systems after temporal issues are addressed."
"We would like to invite additional members of the scientific community to join us in our efforts to revolutionize digital technology," said St. George. "There is a lot of exciting work to be done."
About ZeoSync
Headquartered in West Palm Beach, Florida, ZeoSync is a scientific research company dedicated to advancements in communications theory and application. Additional information can be found on the company's Web site at www.ZeoSync.com or can be obtained from the company at +1 (561) 640-8464.
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
I simply can't believe that this method of compression/encoding is so new that it requires a completely new dictionary (of words we presumably are not allowed to use).
100 to 1? Bah, that's only 99%.
The _real_ trick is getting 100% compression. It's actually really easy; there's a module built in to do it on your average unix.
Simply run all your backups to the New Universal Logical Loader and perfect compression is achieved. The device driver is, of course, loaded as /dev/null.
B is not random. It is a description (in some format) of A.
But what you say does have merit, and this is why compressing a ZIP doesn't do much: there is a limit on repeated compression because the particular algorithm will output data which it itself is very bad at compressing further (if it didn't, why not iterate once more and produce a smaller file internally?).
---- One button is the power button, the other is the Bender voice button "Bite My Shiny Metal Ass"
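The point about recompressing a ZIP is easy to try with Python's standard `zlib` module (the same DEFLATE family as ZIP). This is a rough sketch, not a benchmark: the seeded pseudo-random bytes are only a stand-in for true random data, and `repeated_compress` is a helper name made up for this example.

```python
import random
import zlib

def repeated_compress(data: bytes, passes: int) -> list:
    """Return the size after each zlib pass, starting with the input size."""
    sizes = [len(data)]
    for _ in range(passes):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes

rng = random.Random(0)  # seeded PRNG: an assumption, standing in for random data
noise = bytes(rng.getrandbits(8) for _ in range(100_000))
text = b"the quick brown fox " * 5_000  # highly redundant input

print("noise:", repeated_compress(noise, 3))
print("text: ", repeated_compress(text, 3))
```

On the redundant text the first pass does almost all of the work. On the noise input every pass should come out slightly larger than what went in, because the format overhead is added and there is no redundancy to pay for it.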
Many people may say this is bull, but think of it in another way.
Instead of assuming that data is static, think of it as constantly moving. Even in random data, moving data can be compressed because it is constantly moving along. It is sort of like when a herd of people file into a hall. Sure, everyone is unique, but you could organize and say, "Hey, five red shirts now", "ten blue shirts now".
And I think that is what they are trying to achieve: move the dimensions into a different plane. However, this is what I wonder about: how fast will it actually be? I am not referring to the mathematical requirements, but the data will stream and hence you will attempt to organize. Does that organization mean that some bytes have to wait?
"You can't make a race horse of a pig"
"No," said Samuel, "but you can make very fast pig"
What're they talking about? 20Gb of rand() output?
If so, they're a bunch of twits.
Government of the people, by corporate executives, for corporate profits.
There seems to be a company claiming to exceed, go around, obliterate Shannon every few years. In the early 90's there was a company called Web (before the WWW was really around by a year or so). They made claims of compressing any data, even data that had already been compressed. It is a sad story that you should be able to find in either the comp.compression FAQ or the renewed deja archives. It basically boils down to this: as they got closer to market, they found some problems... you can guess the rest.
This isn't limited to the field of compression of course. There are people that come up with "unbreakable" encryption, infinite gain amplifier (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well understood enough for these ideas to be killed before a company is started or stacked on the claims.
We already have lzip to compress the files down to 0% of their original size. ZeoSync doesn't catch up with latest technologies on /. it seems.
If you read the press release carefully, they claim to be able to compress practically random data, such as pictures of green grass, 100 : 1. They never claim to be able to do the same with true random data, since this is impossible.
There may be something about that. However, there are also many points that make me sceptical, but maybe the press release has not been reviewed carefully enough.
This new algorithm does not break Shannon's limit, which is impossible, so the phrase about the "historical limitations" is a hoax...
Screw ZeoSync, I've built a compression algorithm that is 1000:1 and is completely lossless. I've yet to demonstrate it in public though but please give me venture capital. Thank you.
The maximum compression ratio for random data is 1. That's no compression at all.
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
:)
I think they have made a buzzword compression routine; even our sales people have difficulty putting this many buzzwords in a press release.
Section 1.9 of the comp.compression FAQ is good background reading on this stuff. In particular, read the "WEB story".
Most random generation uses bytes as their unit.
Now, what if they look for bit sequences (not only 8-bit sequences but maybe odd lengths) in order to find patterns?
I guess this could be a way to significantly compress data, but it would imply reading a huge amount of data in order to achieve the best result possible.
Note they may also do this in more than one pass, but then their compression would be really lengthy.
Trolling using another account since 2005.
Never, *EVER* accept any advice from the Aberdeen Group. Apparently their analysts don't know shit.
"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize. I don't have an answer to which one it is yet," said David Hill, a data storage analyst with Boston-based Aberdeen Group.
Wonder which category he expects them to win in...
Physics, Chemistry, Economics, Physiology / Medicine, Peace or Literature
There is no Nobel category for pure mathematics, or computing theory.
... by compressing some VC's bank account, by a factor of greater than 100!
"It was just data, you know," the sobbing wretch was reportedly told, "just ones and zeros. And hey - you can look at it as a proof of principle. We'll have the general application out... real soon now, real soon".
yes, we have no bananas
Quite the contrary: if they had claimed to be achieving 100:1 compression on truly random data, they would be provably talking total rubbish. Consider the number of possible bit strings of length N. Now consider the number of possible bit strings of length N/100. There are fewer of the latter, right? Therefore, if you can compress every length-N string into a length-N/100 string, at least two inputs must map to the same output. Hence, you can't uniquely recover the input from the output - and the compression cannot be lossless.
The fact that they hedge and talk about "practically" random sequences is the only thing that makes it possible they're telling the truth!
ZeoSync is not claiming to reduce random data 100-to-1. They are claiming to reduce "practically random" data 100-to-1, and Reuters appears to have misreported it. What "practically random" data should mean is data randomly selected from that used in practice. What ZeoSync may mean by "practically random" is data randomly selected from that used in their intended applications. So their press release is not mathematically impossible; it just means they've found a good way to remove more information redundancy in some data.
The proof that 100-to-1 compression of random data is impossible is so simple as to be trivial: There are 2^N files of length N bits. There are 2^(N/100) files of length N/100 bits. Clearly not all 2^N files can be compressed to length N/100.
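The counting argument can be made concrete with exact arithmetic. A minimal sketch, assuming outputs may be any bit string up to the target length (which is more generous than the comment's exact-length count); `max_compressible_fraction` is a name invented here.

```python
from fractions import Fraction

def max_compressible_fraction(n_bits: int, max_out_bits: int) -> Fraction:
    """Upper bound on the share of n-bit files that a lossless scheme
    can map to outputs of at most max_out_bits bits."""
    # Bit strings of length 0..m: 2^0 + 2^1 + ... + 2^m = 2^(m+1) - 1
    outputs = 2 ** (max_out_bits + 1) - 1
    inputs = 2 ** n_bits
    return min(Fraction(1), Fraction(outputs, inputs))

# 100-byte (800-bit) files squeezed 100:1 down to 8 bits or fewer
bound = max_compressible_fraction(800, 8)
print(float(bound))
```

Even for files this small, the bound is vanishingly close to zero: almost no 800-bit input can have a one-byte lossless encoding, whatever the algorithm.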
The company's claims, which are yet to be demonstrated in any public forum...
Call the editors at Wired... I think we have an early nominee for the 2k2 vaporware list.
ZeoSync expects to overcome the existing temporal restraints of its technology
Ah... So even if it's not outright bullshit, it's too slow to use?
"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize," said David Hill...
Somehow I think this is going to turn out more Pons-and-Fleischmann than Watson-and-Crick. Almost anytime there's a press release with such startling claims but no peer review or public demonstration, someone has forgotten to stir the jar.
When they become laughingstocks, and their careers are forever wrecked, I hope they realize they deserve it. And I hope their investors sue them.
I should really post after I've had my coffee... I sound mean...
OK,
- B
http://www.bradheintz.com/
- updated
Compression, after all, is removing all redundancy from the original data.
So, if there is no redundancy, there is nothing to remove (if you want to remain lossless).
When you use some text, you may compres by remving some letter evn if tht lead to bad ortogrph. That is because English (like other languages) is redundant. When compressing a periodic signal, you may give only one period and say that the signal is then repeated. When compressing bytes, there are specific methods (RLE, Huffman trees, ...).
But, in all these situations, there was some redundancy to remove...
A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed). Then, recompressing with another compression algorithm (or sometimes, the same will do the trick) may improve the compression. But the information quantity inside the data is the lower limit.
Now, take a true random data stream of n+1 bits. Even if you know the value of the first n bits, you can't predict the value of bit n+1. In other words, there is no way to express these n+1 bits with n (or fewer) bits. By definition, true random data can't be compressed.
And, to finish, a compression ratio of 100:1 can easily be achieved with some data... take a sequence of 200 bytes of 0x00... It may be compressed to 0xC8 0x00. Compression ratio is really only meaningful when comparing different algorithms compressing the same data stream.
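The 200-zero-bytes example is plain run-length encoding. Here is a small sketch of one possible byte-oriented RLE, assuming (count, value) pairs with the count capped at 255; the function names are made up for illustration and real formats differ in the details.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; each run is capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for count, value in zip(encoded[0::2], encoded[1::2]):
        out += bytes([value]) * count
    return bytes(out)

print(rle_encode(bytes(200)).hex())          # c800
print(len(rle_encode(bytes(range(200)))))    # 400
```

The same scheme that gives 100:1 on the run of zeros doubles the size of 200 distinct bytes, which is why a ratio quoted without the input data says very little.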
Any technology distinguishable from magic, is insufficiently advanced.
Take very large prime numbers and the like, huge strings of almost random numbers that can often be written as a trivial (2^n)-1 type formula. Maybe the massaging of the figures is simply finding a very large number that can be expressed like the above with an offset other than "-1" to get the correct "BitPerfect" data. I was toying around with this idea when there was a fad for expressing DeCSS code in unusual ways, but ran out of math before I could get it to work.
The above theory may be bull when it comes to the crunch, but if it could be made to work, then the compression figures are bang in the ball park for this. They laughed at Goddard, remember? But I have to admit, I think replacing Einstein with the Monty Python foot better fits my take on this at present...
UNIX? They're not even circumcised! Savages!
Is it possible, at all, to trust a company whose home page has silly javascript that resizes your browser window?
A thought just occurred to me: If you can do 100:1 compression and compress something down to, say, 2 bytes, what would 'ab' expand to? My thought is "ZeoSync Rulz, Suckas"
Using time travel, high compression of arbitrary data is trivial. Simply record the location (in both space and time) of the computer with the data, and the name of the file, and then replace the file with a note saying when and where it existed. To decompress, you just pop back in time and space to before the time of the deletion and copy the file.
I was thinking about submitting the ZeoSync release, and then I thought, nah, it's just fluff, no one will be interested... It's true that a press release is usually written by suits, not scientists, so you can't expect too much real meat - but "ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information" is a real winner; if you're "reducing information", it's not lossless compression! I smell a rat. The whole thing sounds like it could have been written by the Onion, for Crom's sake.
First off, they don't say it can compress "random data", they say it can compress "practically random data", which I would take to be everyday sort of data like audio and video. And they don't say that data can be compressed infinitely. _If_ whatever they have does work, I suspect it'll be an enlightening moment for the rest of us if/when they release the details of their algorithm. Sort of like, if the only thing you're familiar with is the bubble sort, quick-sort is almost magical. Well, maybe the current schemes of run-length-encoding, and whatever other pattern matching we do, is akin to the bubble sort and these guys have put their heads together and created the quick-sort of data compression.
I'm not calling it either way, but all the "It can't be done! The world is flat!" comments are so typically... well... slashdot.
They're looking for investment money?
Just think of it as an innumeracy tax on
venture capitalists.
Proposed: a method for reducing any file down to 16 bytes and losslessly restoring it.
1. Create an MD5 hash of the file.
2. Share it on a Peer-to-Peer filesharing client.
3. Delete the original file.
4. Find it again!
Note: in trials, this method seems to work best for Britney Spears songs and videos; further research is being done on how to restore Barry Manilow songs and videos, and what to do about hash collisions (bug those uppity MD5 people again).
pb Reply or e-mail; don't vaguely moderate.
For example, at the top of the list Dr. Piotr Blass is listed as Chief Technical Adviser from Florida Atlantic University. But he seems to be missing from the faculty. Google doesn't turn up much on the guy either. Hmmm.
I've not even had time to check the rest yet.
Please don't all do so at once though :-)
It's essentially a collection of lecture notes for a course on information theory and neural networks given by the author (David MacKay), but has been much expanded since I took the course in 1997. It will certainly show how any claim for a compression technique which works consistently on random data is bogus.
I don't recall any of this crap about pigeons flying out of boxes. Or am I getting old?
...richie - It is a good day to code.
So, if practically random data can be compressed, I can compress the result again, and the result again, until I end up with one bit of data in the end? That's great! Imagine the implications: for example, every ordinary lamp is now a computer, because it holds exactly one bit of data, on or off. No wait, that can't be right.
Like science? Comics? Wicked...
Funny By Nature
If they can compress "random" data 100:1 then they can compress _anything_ 100:1
Which begs the question: have they tried compressing the compressed data again to get 10000:1? If not, why not? In fact, why not make the compression function iterate to get 100^n:1 compression?
Oh, I see. That's why. It's because this technology doesn't exist and never can. It's "ZeoSync vs Physics." I know where my money is.
-- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz
Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.
This is covered in great detail in the comp.compression FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 of the FAQ.
--JoeProgram Intellivision!
If it's truly random data, this compression/decompression is actually VERY easy. Compression: Strip 99 bytes out of every hundred.
Decompression: Insert 99 random bytes in between every byte.
What's that? You want the SAME data back? Why does it matter? It's pure random data anyway!
Oh yeah. Have they announced a DE-compression routine yet? (I know "lossless" sort of implies that they have one, but I didn't see anything about decompression, only compression)
Marketing rubbish as usual.
Ahh - My eye!
The doctor said I'm not supposed to get Slashdot in it!
navigating through the flash rubbish you can reach a list of team members that includes steve smale from berkeley and richard stanley from MIT who both are existing senior academics.
so either someone has lent their names to weirdoes without paying attention or there is something of substance hidden behind the PR ugliness. after all the PR is aimed toward investors, not toward sentient human beings, and is most probably not under the control of the scientific team.
Dev elpizw tipota, dev phoboumai tipota eimai lephteros http://euclidian.org
(Of course, this DOES create all sorts of other problems, but I'm going to ignore those, because they'd go and spoil things.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Don't bother compressing it, just delete it, and then get an infinite number of monkeys on an infinite number of typewriters to re-produce the original.
I was wondering as I read the headline and summary on slashdot "how can these sleazeballs possibly promote this scam, because it would be easy to show counterexamples?" This shows, once again, that I lack the imagination and chutzpah of a real con artist.
The beauty of this scam is that ZeoSync claims that they can't even do it themselves, yet. They've only managed to compress very short strings. So, they can't be called on to compress large random files because, well gosh, they just haven't gotten the big file compressor to work yet. So, you can't prove that they are full of shit.
Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.
thad
I love Mondays. On a Monday, anything is possible.
The proof goes like this:
- Assume someone claims a compressor that will compress any X-byte message to Y bytes where Y<X
- There are 2^(8*X) possible messages X bytes long.
- There are 2^(8*Y) possible messages Y bytes long.
- Since Y is smaller than X, this means that no 1 to 1 mapping between the two sets can exist, because they're not equally large.
You can see this simply if I claim a compressor that can compress any 2-byte message to 1 byte. There are then 65536 possible input messages, but only 256 possible outputs. So it is mathematically certain that at least 99.6% of the messages cannot be represented in 1 byte (regardless of how I choose to encode them).
These claims surface every so often. They're bullshit every time. It's even a FAQ entry on comp.compression.
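The 2-byte-to-1-byte case is small enough to check by brute force. In this sketch `toy_compress` is a purely hypothetical compressor (XOR of the two bytes) chosen for illustration; any other function with a 1-byte output hits the same ceiling of 256 distinct results.

```python
from itertools import product

def toy_compress(msg: bytes) -> bytes:
    """Hypothetical 2-byte -> 1-byte 'compressor': XOR of the two bytes."""
    return bytes([msg[0] ^ msg[1]])

# Group all 65536 two-byte inputs by their one-byte output.
buckets = {}
for a, b in product(range(256), repeat=2):
    buckets.setdefault(toy_compress(bytes([a, b])), []).append((a, b))

total = 256 * 256
recoverable = len(buckets)   # at most one input per distinct output can be decoded
lost = total - recoverable
print(recoverable, lost, lost / total)
```

Each output byte has to stand for 256 different inputs here, so a decompressor can get at most one of them right per output.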
This is along the lines of perpetual motion machines.
Every once in a while, some bozo claims to achieve ridiculous compression rates on random data. It's always bullshit meant to sucker in the gullible investors, or just to get some attention for some psycho loser who usually doesn't understand more than enough math needed to copy and deform a few compression theory equations out of a text book.
Skepticism is your friend.
Why are you letting these clowns ruin our country?
Ok, say I want to compress "foo" 100 times over:
bash$ for i in $(seq 1 100); do gzip foo; mv foo.gz foo; done
The company website is all Flash. Well, that blows my opinion of them completely. All glitz and no substance. That changed my opinion from 95% sure it was a pile of BS to 99.99%.
If one finds a way to predict (i.e. compress) "random" numbers, then they are no longer random. That means the data has some deeper mathematical structure.
What could happen is that so-called "random" information in human cultural datasets is far from random and highly compressible.
Mathematical breakthrough from the same county that gave us the Butterfly Ballot Ballyhoo? Hard to believe. ;-)
Anyway, they're still working on tiny "bit strings" due to not yet overcoming the "temporal constraint" barrier. So, don't get all excited just yet.
-- @rjamestaylor on Ello
[yoshi@ilp.ath.cx]# apt-get zeosync /dev/hda* HD_backup.zeo
[yoshi@ilp.ath.cx]# zeosync -compress
[yoshi@ilp.ath.cx]# ls
-rw------- 1 yoshi users 1 Jan 08 14:25 HD_backup.zeo
Oh, that's right never.
[windows users: the bold 1 would be the file size of all backed up partitions on the primary disk]
Get your Unix fortune now!
Seriously though, the comp.compression FAQ [faqs.org] is really worth a read, especially question #9 [faqs.org]
YES! Ditto. Seconded. Somebody mod this guy up.
Here's a bit to whet your appetite:
9.1 Introduction
It is mathematically impossible to create a program compressing without loss
*all* files by at least one bit (see below and also item 73 in part 2 of this
FAQ). Yet from time to time some people claim to have invented a new algorithm
for doing so. Such algorithms are claimed to compress random data and to be
applicable recursively, that is, applying the compressor to the compressed
output of the previous run, possibly multiple times. Fantastic compression
ratios of over 100:1 on random data are claimed to be actually obtained.
Such claims inevitably generate a lot of activity on comp.compression, which
can last for several months. Large bursts of activity were generated by WEB
Technologies and by Jules Gilbert. Premier Research Corporation (with a
compressor called MINC) made only a brief appearance but came back later with a
Web page at http://www.pacminc.com. The Hyper Space method invented by David
C. James is another contender with a patent obtained in July 96. Another large
burst occured in Dec 97 and Jan 98: Matthew Burch applied
for a patent in Dec 97, but publicly admitted a few days later that his method
was flawed; he then posted several dozen messages in a few days about another
magic method based on primes, and again ended up admitting that his new method
was flawed. (Usually people disappear from comp.compression and appear again 6
months or a year later, rather than admitting their error.)
Other people have also claimed incredible compression ratios, but the programs
(OWS, WIC) were quickly shown to be fake (not compressing at all). This topic
is covered in item 10 of this FAQ.
At least his scam was believable enough to fool a thousand people. ZeoSync got to choose a more believable scam to beat a 17 year old.
If they're talking about compressing what you find in a typical user's documents, or perhaps executable programs, it's *possible* that there's enough redundancy to come up with that kind of savings.
If they're talking about 100:1 compression of a pile of bytes out of /dev/random, I flat-out don't believe it.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
- Transforming the data to a complex vector space, C^n if you will.
- Using some very complicated seed and algorithm to generate randomish data in this complex domain that approximates the transformed data.
- Investigating the differences, and storing the differences with a "complex combinatorial series".
Yes it sounds like crap but it's not as empty as social texts.
I am in no way a compression specialist but:
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM).
;-)
in this phase we are going to randomize the hard work you want to send over the internet, effectively destroying it (unless you have the seed of course)
Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents
Now it's going to find patterns in the so called "randomized" data, and probably writing those down, now irreversibly destroying your data...
The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
and they are putting it off for a year too.... hmm...
"By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."
Jezus! these guys are geniuses!!! better compression REDUCES the cost of communications... damn... I wonder what else they envision?? that the files will be smaller too???
my conclusion
we can randomize any string, store 1 byte, then generate another random string... which, because it is random has a snowballs chance in hell of being the same
correct me if I'm wrong, but this really seems to be a load of crap to me. Plus they use WAY too many buzzwords
Fighting for peace is like fucking for virginity
What does the article mean with "random data"?
1) Data with maximal entropy?
2) Random file picked from Internet?
In case of 1), I'd say the article is crap. If bits in the data have absolutely no dependency between them, i.e., redundancy, (also between non-adjacent bits) it is absolutely impossible to compress them. It's not even good as a fairy tale.
In case of 2), ok, 100:1 may be possible for most non-compressed data. The new JPEG-2 algorithms can do 100:1, but they are lossy. Text compression algorithms might do 10:1 on typical text, but they are also quite fast and therefore don't find all redundancies. For example, Huffman encoding is at its simplest done with just single characters rather than much longer sequences, since searching for those takes a lot of time. The redundancies also do not have to be linear; for example "wDoRrOdW" ('word' written first in lower case, and then in upper case in the opposite direction) would be difficult to compress completely, although it clearly has high redundancy.
Removing all redundancies would require finding the shortest description, i.e., a program that prints the string. To find it, we have to go through all possible programs that are shorter than printf("wDoRrOdW"). Many of them don't even terminate (for example "while(1);"). Complete search is therefore impossible; all algorithms make guesses about the topology of the search landscape, and don't search everything.
I have absolutely no doubt that this method works well within the theoretical limits, although it's of course always possible that it approaches the limits more closely than any earlier method.
While this theoretically could work to reduce messages by up to 100:1, both the compression and the decompression would require huge amounts of CPU time for a message of any reasonable length. Even if you had pre-built a table of 'short unique-prime-factors integers' to make finding the optimal composite to send back easier, you'd still have to generate some huge N-digit number, and then the decoder would have to be able to recalculate that N-digit number from the prime representations.
So while I'm sure this is possible, computing speeds are nowhere near fast enough. And it would appear this company is trying to pitch it for compressing internet traffic. Maybe on 512-byte messages they can get something, but I doubt it's anything close to effective for internet use.
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
The flaw here is simple,
When you reorganize the string of data, and sort by value, you must retain information on how to restore the string to its original order. There is no efficient way to save this "undo" information without negating the benefit gained from compression.
For example:
Given a series of random numbers: 34, 8, 244, 127
If you reorganize them by value: 8,34,127,244
You can create redundancy if the string is large enough - for 8 bit values, a string of 25,600 values should produce a lot of repetition - in this example, there would be an average of 100 repetitions per value (100*256=25,600).
This is nice until you try to decompress the file. Without a record of how to reorganize the values, you are left with junk.
Even if you keep a record with info for reorganizing the data, the overhead needed to store the undo info outweighs the compression benefit.
If you did find an efficient way to store the undo information, it would be more effective to simply apply this algorithm directly to the random data!
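To put a number on the "undo" overhead, here is a rough sketch in Python. It assumes the best possible encoding of the original order, namely an index into the list of distinct arrangements of the sorted values; the function name order_bits and the fixed seed are just illustrative choices.

```python
import math
import random

def order_bits(values):
    """Bits needed to say which arrangement of the sorted values was the original.

    The number of distinct arrangements is n! / (c_1! * c_2! * ...), where the
    c_i are the repetition counts; log2 of that is the ideal "undo" record size.
    """
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    log_arrangements = math.lgamma(len(values) + 1)  # ln(n!)
    for c in counts.values():
        log_arrangements -= math.lgamma(c + 1)       # minus ln(c_i!)
    return log_arrangements / math.log(2)

rng = random.Random(2002)  # arbitrary fixed seed so the run is repeatable
data = [rng.randrange(256) for _ in range(25_600)]
raw_bits = 8 * len(data)
undo_bits = order_bits(data)
print(f"raw: {raw_bits} bits, undo record: {undo_bits:.0f} bits "
      f"({100 * undo_bits / raw_bits:.1f}% of raw)")
```

Even with an ideal encoding, the undo record for random bytes comes out at nearly the size of the original data, before counting the sorted values themselves.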
X
ZeoSync's HTML site will be available January 13, 2002 with costumer service agents providing chat assistance.
So they have a set of professionals in charge of "dressing up" their technology? Isn't that normally called the marketing department?
Our patent-pending technology ReductioAdAbsurdum (TM) will likely be infringed upon by this new technology, so rest assured that our lawyers will scrutinize their compression algorithm closely.
My car gets 40 rods to the hogshead, and that's the way I likes it!
First of all, it's impossible to achieve any type of ratio on random data. Good quality random data such as that from random.org simply can't be compressed. Period.
Data compression works by finding patterns in seemingly random data. A standard video stream really doesn't contain that much unique information. That's why we can compress it pretty well without losing too much data. However, random data is 100% unique and you must have, say, 8 bits representing 8 bits because there is no other way to represent it without losing information.
The claims by this company are impossible. I read their technical description and I'm still trying to wrap my head around it. It doesn't make sense. It's called the rule of limited entropy and no data compression breakthrough can break it. You can't just make data appear out of thin air.
Is it just me, or is this another company looking to swindle over a few VC investors? The only type of program I see here is the lie, buy, and sell high kind -- I don't buy it.
"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
There is one method that might work - on some data.
:(.
Godel encoding is an old technique for compression, with a fast decompress (P time). Unfortunately the compression stage is NP (maybe NP-hard, can't remember).
The method relies on expressing the number as an algebraic product, that can be expressed in less space than the result.
For example, in ASCII, the string (in RPN) "7 7 ^ 34 * 99 ^ - 7 p" has 18 characters. Its expansion has 740 characters. That's a compression ratio of, what, 35:1. [Ok, so you'd never actually do it in ASCII, but it shows the technique]
The advantages of the technique are that it gives better compression on larger numbers, in principle. In general, however, other factors come into play, and it bottoms out. My analysis suggested it bottoms out somewhere around 120-150:1.
However, the disadvantages of the scheme are numerous. Firstly, there is no known algorithm to encode efficiently. The system can't stream, like gzip and LZW can. Thus far, it's just an interesting idea.
I mention this because the multi-dimensional mathematics that they are referring to have a passing similarity to something I was playing with a couple of years ago, to look for faster algorithms (or any, really, other than brute force). It was cute, but always slower than brute force, save a few best cases.
Put my best guess at max compression together with the uncanny similarity of the maths: namely, you do a split into some expression, and then re-apply the algorithm to a sub-expression. Then throw it through a symbolic computation routine to optimise it a bit, and gzip the whole lot. It would only work well on some numbers, but you can pad it slightly to get a very different number, and try again until you get a good fit.
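A quick way to see the effect, sketched in Python. The expression below is only my assumed reading of the core of the RPN string, (7^7 * 34)^99, written in Python syntax; it is not necessarily the exact number the poster had in mind.

```python
# The "compressed" form: a short expression, kept as text only to measure its length.
expression = "(7**7 * 34)**99"

# The "decompressed" form: the full decimal expansion of the same number.
value = (7**7 * 34) ** 99
digits = str(value)

print(len(expression), "characters expand to", len(digits), "decimal digits")
print("ratio roughly", len(digits) // len(expression), ": 1")
```

Decompression is a single evaluation. The catch is the other direction: given an arbitrary 700-odd digit number, almost none of them have an expression this short, and finding one when it exists is the expensive part.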
So stepwise:
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner
Pad to a value that gives good compression
Once randomized, ZeoSync's BinaryAccelerator encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect equivalents
Godel encode.
The reference to many iterations suggests that they reapply the process to any large enough numbers left in the expression.
And that's a scary match, in my mind.
Of course, pinch of salt. There was a comment above about the odds of any compression technology being valid being equal to the claimed compression rate. I can't see how this might work. But I'm not writing this off just yet, it rings just true enough.
> Why couldn't it be possible to have the a single algorythmic solution that works on the entire dataset simultaneously?
Because if you have a "perfect" compression algorithm, then it is not always reversible. That's because it maps a larger set of files (all files of N bit length) to a smaller set of files (all files of N-1 bit length, or whatever). Therefore, the mapping can't be an injection (some two or more large files are mapped to the same small file), and therefore not reversible. So you can't uncompress the data.
But fortunately, you can always get by with a single bit of constant overhead. Simply set that bit to 0 if the rest of the stream is compressed, to 1 if it is uncompressed. Now, if your algorithm produces a larger file than the input, just leave it uncompressed and you have only lost 1 bit.
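A minimal sketch of that flag trick in Python, assuming zlib as the underlying compressor and a whole flag byte rather than a single bit to keep the code simple. The names pack, unpack, FLAG_ZLIB and FLAG_RAW are made up for the example.

```python
import os
import zlib

FLAG_ZLIB = b"\x00"  # rest of the stream is compressed
FLAG_RAW = b"\x01"   # rest of the stream is stored as-is

def pack(data: bytes) -> bytes:
    """Compress if it helps, otherwise store raw; costs one byte of overhead."""
    candidate = zlib.compress(data, 9)
    if len(candidate) < len(data):
        return FLAG_ZLIB + candidate
    return FLAG_RAW + data

def unpack(blob: bytes) -> bytes:
    flag, body = blob[:1], blob[1:]
    return zlib.decompress(body) if flag == FLAG_ZLIB else body

text = b"spam and eggs " * 200
noise = os.urandom(2800)
for name, sample in (("text", text), ("noise", noise)):
    packed = pack(sample)
    assert unpack(packed) == sample
    print(name, len(sample), "->", len(packed))
```

The repetitive text shrinks a lot; the random bytes come back exactly one byte longer, which is the constant overhead the scheme promises and also the reason it never gets "free" compression on random input.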
The argument about no "perfect" compression algorithm existing is overrated, IMO., though people like to point it out whenever a compression algorithm pops up. Of course a press release wouldn't mention that they sometimes increase file sizes by 1 bit!
(I still do think their technology is bullshit, though.)
It's true that you need to find patterns to compress data. What constitutes a pattern though, can be more complicated than what gzip offers.
For instance, I can come up with a number of statistically random sequences that can be compressed very small if you "know the pattern", but will fail completely to be compressed with gzip. For example, I could take 11 MB of the binary digits of pi -- a very short program can produce these, but gzip will totally fail in compressing it. Or I could encrypt 11 MB of zero bits with RC4; if I know the key then it is also extremely easy to compress -- otherwise, it will be nearly impossible.
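Here is a small Python sketch of the encrypted-zeros case. RC4 is not in the Python standard library, so this swaps in SHA-256 run in counter mode as a stand-in stream generator; the key and the keystream helper are hypothetical. Encrypting zero bits with a stream cipher just yields the keystream itself, so that is what gets generated.

```python
import hashlib
import zlib

def keystream(key: bytes, length: int) -> bytes:
    """SHA-256 in counter mode: a stand-in for RC4 applied to all-zero input."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

key = b"hypothetical key"
stream = keystream(key, 1_000_000)

# A general-purpose compressor sees no pattern at all...
print("zlib output:", len(zlib.compress(stream, 9)), "bytes")
# ...but someone who knows the generator only needs the key and the length.
print("key + 8-byte length:", len(key) + 8, "bytes")
```

The data is "compressible" only relative to a decompressor that already contains the right generator, which is the point about knowing the pattern.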
So the art, really, is in finding the patterns. I'm pretty sure that ZeoSync's stuff is bullshit, but it doesn't *necessarily* mean that this kind of thing is not possible (just... unlikely).
If you look at this sequence as a one-dimensional series: 00101101, it's pretty hard (at least for a processor) to distinguish a pattern there... it's a pseudo-random sequence. But if I paint it this way, in 2d: (0,0) (1,0) (1,1) (0,1), I can step back and see a square with sides of length one.
AFAIK, what these people are claiming is that they've developed a way to step WAY back, to n-dimensions, and have patterns emerge from seemingly random data.
It's not the random-number generation that's significant here... it's the purported ability to compress a seemingly random sequence. RLE typically doesn't fare very well with pure random data because it only looks for certain types of redundancy.
If I haven't missed the boat here, it's really a very interesting achievement.
That's just amazing! Let's test it. Here's an idea of a pretty good test :
I'll prepare 257 files containing random data, which are each 100 bytes in length. Then, they'll be able to compress each of those files into a corresponding lossless compressed file which is one byte long! (Remember, this is supposedly 100:1 lossless compression of random data.)
Oh, wait a sec... How can they possibly represent 257 different files, with only one byte each? That one byte can only represent 256 different possible values!
What about if the files that I asked them to compress were only 2 bytes in length, instead of 100 bytes in length? Still, 257 of them. Since they claim to be able to do 100:1 lossless compression of random data, they should be able to do 2:1 lossless compression of random data. I mean, that's 50 times less impressive! But, wait... They still have to express 257 different files with only 256 different possible values!
Huh... How many different files are 2-bytes long? I guess there's 65,536 of them. I only wanted them to compress 257 different files each into a byte. The task of compressing 65,536 different files into one byte is almost 256 times harder than what they already can't do!
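The same test can be run mechanically. This Python sketch takes any function from 2-byte files to 1-byte files and hunts for two different inputs with the same output; toy_compress is a made-up placeholder, and any other 2-to-1 byte function could be substituted.

```python
from itertools import product

def toy_compress(two_bytes: bytes) -> bytes:
    """Hypothetical 2:1 'compressor'; any function from 2 bytes to 1 byte will do."""
    return bytes([(two_bytes[0] * 31 + two_bytes[1]) % 256])

def find_collision(compress):
    """Return two distinct 2-byte inputs that compress to the same single byte."""
    seen = {}
    for a, b in product(range(256), repeat=2):
        original = bytes([a, b])
        packed = compress(original)
        if packed in seen:
            return seen[packed], original, packed
        seen[packed] = original
    return None  # cannot happen: 65,536 inputs do not fit in 256 outputs

first, second, packed = find_collision(toy_compress)
print(first.hex(), "and", second.hex(), "both compress to", packed.hex())
```

Whatever function is plugged in, a collision turns up within the first 257 inputs tried, and a decompressor handed that one byte has no way to know which original was meant.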
This is starting to sound like a theorem, or something!
Education is the silver bullet.
3.14159265...
2.71828182...
1.41421356...
I can understand people at Reuters being trolled by this crap, but Slashdot too? Wow.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Perhaps another kind of breakthrough could be made by leveraging the internet for the keyspace used in your compression. (Okay, I might not have the terminology quite right... that was one of my friend's realms of interest.)
The idea is that you have a token that is given to a remote server, which sends back a stream of data. As long as the tokens were significantly shorter than the data provided, then the observed local compression would be highly significant.
Or, put another way, you're NOT storing data on a remote server. But a remote server has a very well developed library of token/data combinations. So, when a client sends a stream of tokens to this server, they get the original stream of data back (even though the stream of data, itself, isn't recorded in whole at the server).
Again, not for random data. And perhaps better if the tokens at the main server were geared to particular types of data with a different tokenspace for each.
Is this idea very silly, or very good?
in There's Something About Mary. These guys will be in great shape until someone claims 200:1 compression. Then it's back to the claims drawing board.
-
For every data set that is compressible 100:1 (which I will grant them.. even a fool can do that), there are 99 which grow larger or the compression fails entirely.
So, they have figured out a way to compress difficult-to-compress data rather well, but cannot compress easy stuff that LZW works on? Rather dubious, but I'll eat my words with a smile if they can put all the Star Trek episodes on a floppy disk.
Any connection between your reality and mine is purely coincidental.
If it means they compress arbitrary random data it is just bullshit. It is easy to prove that there exists some file that will not be compressible, and not much harder to prove that there are actually many more incompressible random files than compressible ones (read any text about Kolmogorov complexity). But of course most computer files are not at all random. Compressing a *randomly picked* computer file is therefore something different altogether, but it is still hard to guarantee a certain compression if the type of information stored in the file is not known. That's the reason why different compression algorithms for different file types exist. All in all their claim is too fuzzy to say anything ... better compression is a certain thing of the future, guaranteeing compression for random files is just another cold fusion hoax.
Now here's the interesting part: they used to spell his name right in a previous version of their official bios section. This could just be sloppiness, of course.
Babar
It's over here (Question 9, search for 'WEB, Gilbert').
"If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
If you're talking about compressed video over uncompressed. A typical DVD movie would be 720 (horizontal) x 480 (vertical) x 16 (bit, YUV2) x 29.97 (fps) x 6300 (seconds in a 105 minute movie) / 8 (bits per byte) = ca. 130 gigabytes. In reality you'll get it as 5-6 gigabytes, while as a divx 2-pass (or similar mpeg4-codec) you will reach 100:1, at very little quality loss. Of course this is only possible because movies are *very* non-random both in each frame, and in one frame to the next.
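For anyone who wants to redo the arithmetic, a short Python sketch using the figures from the comment. The 5.5 GB value is just an assumed midpoint of the "5-6 gigabytes" quoted for the DVD itself.

```python
width, height = 720, 480      # pixels
bits_per_pixel = 16           # 16-bit YUV
fps = 29.97
seconds = 105 * 60            # 105 minute movie

raw_bytes = width * height * bits_per_pixel * fps * seconds / 8
raw_gb = raw_bytes / 1e9
print(f"uncompressed: {raw_gb:.1f} GB")

dvd_gb = 5.5                  # assumed midpoint of the 5-6 GB range
print(f"DVD at {dvd_gb} GB: about {raw_gb / dvd_gb:.0f}:1")
print(f"100:1 would be about {raw_gb / 100:.2f} GB")
```

So the DVD itself is already in the 20-25:1 range, and 100:1 corresponds to a file of a bit over a gigabyte, all of it lossy and all of it relying on the redundancy between and within frames.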
Kjella
Live today, because you never know what tomorrow brings
The binary representation of pi contains all sequences, so it is claimed.
If only we could predict what the Nth bit of pi was going to be, then we could just specify an offset into the bit sequence and a length and we could have any file compressed as two numbers.
One of the numbers would be pretty large, though... It could easily be as big as the bit representation of the file, but hey, who cares??
It's still a possible algorithm. These ZeoSync people don't seem to care about practical algorithms either...
Gimme some VC money!!!
Back in 1991 or 1992, in the days of 2400 bps modems, MS-DOS 5.0, and BBS'es, a "radical new compression tool" called OWS made the rounds. It claimed to have been written by some guy in Japan and use breakthroughs in fractal compression, often achieving 99% compression! "Better than ARJ! Better than PKzip!" Of course all my friends and I downloaded it immediately. Now we can send gam^H^H^Hfiles to each other in 10 minutes instead of 10 hours!
Now I was in the ninth grade, and compression technology was a complete mystery to me then, so I suspected nothing at first. I installed it and read the docs. The commands and such were pretty much like PKzip. I promptly took one of my favorite ga^H^Hdirectories, *copied it to a different place*, compressed it, deleted it, and uncompressed it without problems. The compressed file was exactly 1024 bytes. Hmm, what a coincidence!
The output looked kind of funny though:
Compressing file abc.wad by 99%.
Compressing file cde.wad by 99%.
Compressing file start.bat by 99%.
etc. Wait, start.bat is only 10 characters, that's like one bit! And why is *every* file compressed by 99%? Oh well, must be a display bug.
So I called my friend and arranged to send him this g^Hfile via Zmodem, and it took only a few seconds. But he couldn't uncompress it on the other side. "Sector Not Found", he said. Oh well, try it again. Same result. Another bug.
So I decided that this wasn't working out and stopped using OWS. Their user interface needed some work anyway, plus I was a little suspicious of compression bugs. The evidence was right there for me to make the now-obvious conclusion, but it didn't hit me until a few *weeks* later when all the BBS sysops were posting bulletins warning that OWS was a hoax.
As it turns out, OWS was storing the FAT information in the compressed files, so that when people do reality checks it will appear to re-create the deleted files, as it did for me. But when they try to uncompress a file that actually isn't there or has had its FAT entries moved around, you get the "Sector Not Found" error and you're screwed. If I hadn't tried to send a compressed file to a friend I might have been duped into "compressing" and deleting half my software or more.
All in all, a pretty cruel but effective joke. If it happened today somebody would be in federal pound-me-in-the-ass prison. Maybe it happened then too...
(Yes, this is slightly off-topic, but where else am I going to post this?)
LAMP hosting on Debian, SSH, no bandwidth cap, PayPal accepted - http://secondbrainhosting.com/
...was if they were powered by Blacklight Power. If you're not in the know, they're a "power company" run by a "scientist" who claimed that he had been able to reproduce something that sounded suspiciously like cold fusion in his Princeton, NJ-area laboratories. The Village Voice ran a story on them (where I read about these jokers) and a whole slew of investors were lined up (in the heady days a few months before the dot-com bubble popped) and last I checked, they still haven't actually, you know. Produced what they said they would two years ago (power).
If you've got a slow afternoon, take a gander at what physicists have to say about Blacklight...
I think the following statement in the press release pretty much says it all:
>We perceive this advancement as a significant
>breakthrough to the historical limitations of
>digital communications as it was originally
>detailed by
>Dr. Claude Shannon in his treatise on Information
>Theory."
How about algorithmic information theory? Kolmogorov, Solomonoff, Chaitin? The statement above indicates that the most recent word on compression is an old Bell Labs tech report by Claude Shannon... not to put Shannon down, that work *is* a landmark, but there has certainly been more work done since.
Try compressing the number Pi using Shannon's theory... you can't do it. On the other hand, using Kolmogorov complexity, you can compress it quite nicely.
The fact that this statement appears in the press release seems to indicate a great deal of ignorance on the part of this corporation's researchers. Part of any good research program is to familiarize yourself with previous work done in the field... and AIT is *not* some obscure backwater idea... there are several conferences on this topic every year and just about every CS graduate student has seen at least Kolmogorov complexity.
This is a pretty serious credibility robber. (Not to mention that from a mathematical standpoint, compressing totally random data is impossible under our current axioms... so if we *can* compress completely random data... it's time for a new theory of the foundations of mathematics. At the risk of sounding dogmatic: do you *really* think some dot-com startup is capable of this?
Perhaps they are, but I'm going to need to see the proofs written up nice and formally before I run out and buy snake-oi... I mean *stock*.)
ZeoSync: Ladies and gentlemen - observe! The random data goes in THUS, and run through our process, comes out 100 times smaller!
ZeoSync: Now, we carefully unpack and - voila! random data of the same size as before! This is due to our patented process and a little bit of magic we like to call "length of file stored in the header".
Investor: Hey - those first few bytes from the original and uncompressed file look totally different!
ZeoSync: Those bytes are in there somewhere - we only said LOSSLESS compression, not ordered!
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Next you'll be publishing stories about squaring the circle and trisecting an angle with straight edge and compasses. Claiming to be able to compress random data is the oldest joke in the CS book and you fell for it.
-- SIGFPE
The output from a pseudo-random number generator is usually considered "random enough for practical purposes." So if you define "practically random data" as "data that is random enough for practical purposes," you can compress it by storing the random seed and the string length. ;-)
I think I can beat their 100:1 compression ratio with this scheme.
To get something done, a committee should consist of no more than three persons, two of them absent.
100:1 ratio? On random data?
Considerations far more elementary than Shannon's limits rule out compression of statistically random data by even a single bit. Here's why:
There are 2^n bit strings of length n. Any compression method purporting to compress random strings (by even a single bit) must produce output of length at most n-1 for these 2^n inputs. But in that case the mapping is not unique, since there are only (2^n)-1 bit strings of length n-1 or less. (So decoding is not possible.)
Once every so often some "researchers" claim to have attained the holy grail of compression. Too bad we never hear of them again
From the comp.compression faq
this topic has generated and is still generating the greatest volume of news in the history of comp.compression
The advertized revolutionary methods have all in common their supposed ability to compress random or already compressed data. I will keep this item in the FAQ to encourage people to take such claims with great precautions
Yes it does.
Compressing random data is impossible!
I used to bulls-eye womp-rats in my pants
The reporter wrote glowing things about how when he decompressed his files, they had the right size and timestamp. There was a small matter of the contents being wrong, but the company had assured him that this was just a small glitch in the beta version that would be fixed in the final release.
I can imagine that some junior reporter might fall for this, but where the heck was the editor?
I imagine that the whole stunt was probably part of a scam to defraud some investors. Get it published in a magazine, and it must be legit, right? I wouldn't be the least bit surprised if this new "lossless compression algorithm" proved to be such a scheme.
BYTE went seriously downhill around 1985 or so. A friend seems to think that it was a result of Steve Ciarcia moving on, but I don't think that fully explains it. Before that, there were plenty of technical articles by other authors, but BYTE turned into a rag full of mostly non-technical reviews.
Note the results are "BitPerfectTM", rather than simply saying "perfect". They try to hide it, but they are using lossy compression. That is why repeated compression makes it smaller, more loss.
"Singular-bit-variance" and "single-point-variance" mean errors.
The trick is that they aren't randomly throwing away data. They are introducing a carefully selected error to change the data to a version that happens to compress really well. If you have 3 bits, and introduce a 1 bit error in just the right spot, it will easily compress to 1 bit.
000 and 111 both happen to compress really well, so...
000: leave as is. Store it as a single zero bit
001: add error in bit 3 turns it into 000
010: add error in bit 2 turns it into 000
011: add error in bit 1 turns it into 111
100: add error in bit 1 turns it into 000
101: add error in bit 2 turns it into 111
110: add error in bit 3 turns it into 111
111: leave as is. Store it as a single one bit.
They are using some pretty hairy math for their list of strings that compress the best. The problem is that there is no easy way to find the string almost the same as your data that just happens to be really compressible. That is why they are having "temporal" problems for anything except short test cases.
Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.
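The 3-bit table above is just a majority vote. A Python sketch of that guess at the scheme (this illustrates the table in this comment, not anything ZeoSync has published; lossy_pack and unpack are made-up names):

```python
from itertools import product

def lossy_pack(bits: str) -> str:
    """Keep only the majority bit of a 3-bit string (3 bits -> 1 bit)."""
    return "1" if bits.count("1") >= 2 else "0"

def unpack(bit: str) -> str:
    """Expand the stored bit back to 3 bits: 000 or 111."""
    return bit * 3

for bits in ("".join(p) for p in product("01", repeat=3)):
    restored = unpack(lossy_pack(bits))
    wrong = sum(a != b for a, b in zip(bits, restored))
    print(bits, "->", lossy_pack(bits), "->", restored, f"({wrong} bit wrong)")
```

Only 000 and 111 survive the round trip; the other six inputs come back with one bit flipped, which is tolerable for a pixel or an audio sample and fatal for an executable.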
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
A similar technique works with the output of drand48() and in fact for a long enough sequence this approach works with every random number generator algorithm available today.
In fact here's the compressed file for the rand() case:
int i;for (i = 0; i<1000000; ++i) printf("%d\n",rand());
Use gcc as decompressor.
-- SIGFPE
Scroll down to Incredible Claims for descriptions of the last four scams like this. Remember Pixelon?
100:1 average compression on all data is just impossible. And I don't mean "improbable" or "I don't believe that"; it is impossible. The reason is the pigeonhole principle. For simplicity assume that we are talking about 1000-bit files: although you can compress some of these 1000-bit files to just 10 bits, you cannot possibly compress all of them to 10 bits, as 10 bits allow just 1024 different configurations while 1000 bits call for representations of 2^1000 different configurations. If you can compress the first 1024, there is simply no room to represent the remaining 2^1000 - 1024 files.
So every lossless compression algorithm that can represent some files with other files shorter than the original must expand some other files. Higher compression on some files means the number of files that do not compress at all is also greater. An average compression rate other than 1 is only achievable if there is some redundancy in the original encoding. I guess you can call that redundancy "a pattern." Rar, zip, gzip etc. all achieve less than 1 compressed/original length on average because there is redundancy in the originals: programs that have some instructions and prefixes with common occurrence, pictures that are represented with a full dword although they use a few thousand colors, sound files almost devoid of very low and very high numbers because of recording conditions etc. No compression algorithm can achieve a ratio of less than 1 averaged over all possible strings. It is a simple consequence of the pigeonhole principle and cannot be tricked.
Gentlemen, you can't fight in here, this is the War Room!
Bear with me for a moment. This kind of 'compression technology' is EXACTLY the kind of thing the MPAA has been dreading. Imagine millions of people on Morpheus trading 5MB copies of The Matrix, Star Wars and everything else. Of course it's a hoax, but if they can keep it up long enough, then maybe they'll get bought out by the MPAA, RIAA, or whoever!
ZeoSync is ushering in the business model of the new millennium - fooling the tech-illiterate elite of today's content cartels into buying them out, then laughing all the way to the bank! I applaud ZeoSync for their initiative, and hope to see other such business ventures in the future.
Now, if you'll excuse me, I'm off to develop a program that uses fractal-temporal equations to randomly generate sequels to popular movies! (hint, hint)
[PowerPoint] is a tool for capitalist presentation
There are 2^N bit sequences of length N. There are 2^M sequences of length M. If M<N then 2^M<2^N. So you can't represent all sequences of length N using sequences of length M. You can't even represent most sequences of length N using sequences of length M. It doesn't matter if you can visualise infinite dimensional spaces with pretty purple knobs on. You can't have an algorithm that packs most sequences of length N into M bits.
-- SIGFPE
We have three native Polish speakers in my office. I asked one of them to translate the professor's reply. She said the gist of it is that he was upset they released his name, he didn't authorize any information release, etc. Apparently didn't deny or confirm the truth of the information but said something about having "more important things in my career" or something like that (not verbatim quote).
I know I'm posting late, but I hope someone reads this and comments.
I've had this recurring thought in my head regarding compression that I haven't been able to prove/disprove.
Disclaimer: I know absolutely nothing about compression other than what common sense tells me.
Now for my theory: Is it possible to make an analysis of a whole lot of data from a whole lot of sources over a certain period of time? Let's say I log every single bit of data that comes and goes from, say, AOL's network. I then run an analysis of the data and come up with, say, the 5 million most used 8-byte strings. You probably want to play with the string sizes and number of strings to see what makes mathematical sense. You then keep a copy of the 40MB indexed string dbase on every internet node, or at each end of a slow link, or whatever, and then run all incoming and outgoing data through a program that replaces index references with the actual data.
Would that work? Since a 5 million entry index requires a 3 byte key to access an 8 byte string, would I get a 3/8 lossless compression on top of whatever's in place right now, whenever I hit an indexed string?
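A quick sanity check of those numbers in Python. The 1-bit hit/miss flag and the function `average_ratio` are assumptions added for illustration: the post doesn't say how the receiving end would tell an index from literal data, and some marker like this seems unavoidable.

```python
import math

ENTRIES = 5_000_000     # indexed strings, figure from the post
STRING_BYTES = 8        # bytes per indexed string, figure from the post

key_bits = math.ceil(math.log2(ENTRIES))    # bits needed to number every entry
key_bytes = math.ceil(key_bits / 8)         # rounded up to whole bytes
dict_bytes = ENTRIES * STRING_BYTES         # size of the shared database

print(key_bits, key_bytes, dict_bytes)      # 23 3 40000000

def average_ratio(hit_rate: float) -> float:
    """Output bits per input bit, assuming a 1-bit hit/miss flag per 8-byte unit.

    The flag is an assumption added here, not part of the original idea.
    """
    raw_bits = 8 * STRING_BYTES
    hit_bits = 1 + 8 * key_bytes        # flag + 3-byte key
    miss_bits = 1 + raw_bits            # flag + the 8 literal bytes
    return (hit_rate * hit_bits + (1 - hit_rate) * miss_bits) / raw_bits
```

With no hits the ratio is above 1, so the scheme only pays off when the traffic really does repeat those strings, i.e. when there is redundancy to exploit.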
There are two kinds of people in the world: Those with good memory.
Truly unbreakable encryption has existed for many years: the one-time pad. The problems of unbreakable encryption aren't the theory, but the practice. (If you want truly secure communications among n people who each transmit x bytes of data through the group each day, how will you securely generate n*(n-1)*x bytes of random data each day, and securely distribute it to each of them?)
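For anyone who hasn't seen one, a minimal one-time pad sketch in Python. The names `otp_xor` and `daily_pad_bytes` are made up for illustration, and `secrets.token_bytes` is only a stand-in: it is a cryptographic PRNG, not the truly random source a real pad needs.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad of equal length; the same call encrypts and decrypts."""
    if len(pad) != len(data):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

def daily_pad_bytes(n: int, x: int) -> int:
    """Pad volume from the post: n*(n-1)*x bytes per day."""
    return n * (n - 1) * x

message = b"attack at dawn"
pad = secrets.token_bytes(len(message))   # stand-in for true randomness; never reuse a pad
ciphertext = otp_xor(message, pad)
assert otp_xor(ciphertext, pad) == message

print(daily_pad_bytes(10, 1_000_000))     # 90000000
```

Ten people sending a megabyte each already need 90 MB of fresh pad per day, which is the practical problem described above.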
Never play leapfrog with a unicorn. Or a juggernaut.
I have an infinite amplifier; I can sell it to you now. It has infinite gain, and infinite input impedance. Unfortunately, it has to rely on real power supplies, since I do not have an ideal power supply. Funny thing is, it always outputs the rail voltage.
Even Slashdot wants to hide some things
Maybe it's another implementation of RFC1149?
sulli
RTFJ.
All I need is the random seed and the random number generation formula....
A string of 'random' data that can be generated by a seed and an algorithm is pseudo random. For certain applications pseudo random is good enough, and it is used all over the place - from picking the next block in a tetris game to generating cipher streams.
Truly random data is an entirely different beast.
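A small illustration of the difference, using Python's standard `random` module (a Mersenne Twister, so definitely pseudo-random). The helper name `stream` is made up for this sketch.

```python
import random

def stream(seed: int, n: int) -> bytes:
    """n pseudo-random bytes, fully determined by the seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

a = stream(42, 100_000)
b = stream(42, 100_000)
print(a == b)       # True: same seed and algorithm, same 100,000 bytes
```

So those 100,000 bytes "compress" to the seed plus the generator, but only because they were never random in the first place.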
If J.K.R wrote Windows: Puteulanus fenestra mortalis!
If you read the Reuters article carefully, it does not say a digital -> digital compression of 1:100, but implies a better way of encoding / compressing digital -> analog -> digital, with the analog bandwidth being much greater than today.
That's all the stuff where they talk about Dr. Claude Shannon and information theory. (They could have been clearer about it, but that's PR flacks for you.)
examine the quote
'"What we've developed is a new plateau in communications theory," St. George said. "We are expecting to produce the enormous capacity of analog signaling, with the benefit of the noise-free integrity of digital communications."'
Sounds like they are trying to shove more data into an analog stream, using wacky math, than would normally be allowed.
rbb
Without reading their website, the claim MUST BE FALSE.
The proof is simple.
Suppose we have a 100 bit message. There are 2^100 different messages. Suppose you can compress them on average to 98 bits. Then there can only be 2^98 compressed messages. We lost a couple along the way!
This proves that if you compress SOME messages you will also have to make SOME longer. Not by much, but at least a little. (prepend 1 if "not compressible" prepend zero to the "compressed data stream" and you have a "worst case expansion" of "one bit")
Now compressing normal data is easy. There are a lot of repeats, and other redundancy. So the normal case is that you can compress them. The bad news is that if you enumerate ALL 100-bit messages, ALL compression methods are going to need on average 100 bits or more. This is pure mathematics.
The 2^100 number is a number that is quite large, but if you start talking about compressing a megabyte of data, then I'm already talking about enumerating all 2^8000000 possible messages. That is a thought experiment. But the argument still holds.
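The flag trick mentioned above, sketched in Python. To keep it simple this uses a whole flag byte instead of a single bit, and zlib stands in for "any compressor"; both are assumptions of this sketch, and `pack`/`unpack` are made-up names.

```python
import zlib

def pack(data: bytes) -> bytes:
    """Try zlib; fall back to storing the data raw. The first byte is the flag."""
    squeezed = zlib.compress(data, 9)
    if len(squeezed) < len(data):
        return b"\x00" + squeezed     # 0 = compressed payload follows
    return b"\x01" + data             # 1 = not compressible, stored as-is

def unpack(blob: bytes) -> bytes:
    """Invert pack() by reading the flag byte."""
    flag, payload = blob[0], blob[1:]
    return zlib.decompress(payload) if flag == 0 else payload
```

The worst case is one byte of expansion, never zero: the flag itself is the price paid by the messages that don't compress.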
-----
I read their press release. It's buzzword compliant bovine excrement. They will attract money and pay the existing people large salaries as long as they can keep up the charade.
Oh, and they have placed a tactical "practically" in front of the word "random". I can compress "practically random data" by enormous amounts.
If you take the MD5 hash of the string "hi there", and feed that back into the MD5 function, you can generate an endless stream of "practically random" data. Take the first 1Mb of this "practically random" data.
I compressed 1Mbyte of data into the 212 bytes of the previous paragraph! However this is not possible if I let someone else generate the random data any way he pleases, and then have to compress it. They can claim to be "technically correct" up to a point due to this phenomenon....
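Roughly what that looks like in Python. One assumption: the raw 16-byte digest is fed back in each round, since the description doesn't say whether to chain the raw digest or its hex form. `md5_chain` is a made-up name.

```python
import hashlib

def md5_chain(seed: bytes, n_bytes: int) -> bytes:
    """Feed each MD5 digest back into MD5 and concatenate the digests."""
    out = bytearray()
    block = hashlib.md5(seed).digest()
    while len(out) < n_bytes:
        out += block
        block = hashlib.md5(block).digest()
    return bytes(out[:n_bytes])

data = md5_chain(b"hi there", 1_000_000)
print(len(data))     # 1000000
```

The "decompressor" is just this function plus the seed string, which is why the trick says nothing about data someone else hands you.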
Roger.
Sounds like fractal compression to me.
The fractal transform that Barnsley's products use is merely vector quantization, mapping each 8x8 pixel block of an image onto a 4x4 pixel block of a reduced version of itself, plus an RGB offset for DC. It begins to converge to the desired image after a few iterations of the transform.
Will I retire or break 10K?
Exactly! But until you make that connection, it may as well be random!
I'll assume by "sequences" you mean "random sequences" because otherwise you are saying that lossless compression is impossible. =) I agree with you otherwise, given one crippling constraint: that you can't observe your data except as one-dimensional binary numbers.
After re-reading this press release a few times, I don't think these people have really accomplished much. Bear with me and I'll flesh out the point I'm trying to make - if someone could find a way to do this, I think it could work.
With PURE random data, this won't work. Why anyone would want to transmit gigabytes worth of pure random data is beyond me. A signal worth compressing isn't going to be purely random. It may look like it, but there is some information there. This is why signal processing people use random processes to model signals. Not because their signals are completely random - but because - given enough samples - they look like specific random processes (Gaussian, Rayleigh, Rician...).
Now, the technique I'm thinking of would do something along the lines of take a pseudo-random process and map it to an n-dimensional space. An algorithm then searches this space for (even just) simple patterns. Suppose it finds ten equally spaced points along a "line" in 12-dimensional space. That's 120 bytes that can be reduced to a significantly smaller vector (plus an offset to aid reconstruction in the right place), no?
I don't know... would this work? I think so. Would it be feasible given existing computing power? I'm not so sure...
These kinds of compression claims are the perpetual motion machine of the information age. Actually, they are less plausible than perpetual motion. For perpetual motion, there is at least the (very remote) possibility that there is some kind of undiscovered physics. Impossibility statements in compression only hinge on mathematics, with no physics or experiments needed.
It's called PSLQ lattice reduction...
You can get the details here...
http://www.mathsoft.com/asolve/plouffe/plouffe.html
http://www.lacim.uqam.ca/plouffe/Simon/articlepi.html
Note: this goes quite a long ways to showing that conventional wisdom about pi being random digits isn't actually true... Pseudo random is more like it...
However, it isn't really applicable to this multidimensional compression nonsense since the counting argument still applies.
Suspiciously, this looks to be similar to what the fractal folks were pushing in the '80s if you replace gems with iterators... Every once in a while you have to change the color of your snake oil label to confuse the masses...
-slew
...and bet that they meant "arbitrary data" rather than "random data". After all, who would want to compress random data? What possible benefit could there be to such a thing?
This is a common idea, and it might seem like it would work. However, this idea still fails to take into account the counting argument. For example, if the seed is limited to 64 bits, this algorithm can generate at most 2^64 different files, and thus is unable to compress *all* files longer than 8 bytes.
I used to bulls-eye womp-rats in my pants
This is more like Usenet Crank Robert E. McElwaine who published lots of articles with his (capital-preserving) tagline "UN-altered REPRODUCTION and DISSEMINATION of this IMPORTANT Information is ENCOURAGED."
And that may be giving them more credit than they deserve - it looks like a compression algorithm designed for use on digital wallets....
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
blah blah blah DISSERTATION blah blah blah
This word in all caps is used in doctoral programs at all universities I know of. Give me my PhD, right?
A signal worth compressing isn't going to be purely random
Absolutely, and that's why lossless compression works in practice.
But the thing you're describing is bogus. Look, take random data and look at it in a complicated enough way then you're sure to find patterns that can be compressed out. But you'll find that you'll also have to describe the complexity of your way of looking at it and that'll take up the same amount of space as you've just compressed out. That press release is 100% bogus. It's not even slightly real. Have you seen how many universities they claim to collaborate with? It's merely a scam to make money out of venture capitalists.
The way you speak, e.g. putting scare quotes about the word "line", suggests that you're not comfortable dealing with multi-dimensional spaces. The SF connotations suggest something cool and esoteric to get venture capital cash. Those of us who actually work with these things every day know there's no reason to see compressible patterns if you start embedding things in high dimensional spaces. People who do things like wavelet and DCT compression techniques quite happily represent data for compression in very high dimensional spaces. But there's no magic and certainly no way to do things that are provably false.
Would it be feasible given existing computing power?
It wouldn't be feasible with any computing power.
-- SIGFPE
In other news, the company which managed to remove redundancy from pure entropy also managed breaking the absolute-zero barrier. It was previously thought that you couldn't make something colder if it already had zero heat in it. But apparently this is not the case, according to ZeoSync.
I believe that Deborah Tannen pointed this up as a key problem in our society, as the fallacy of "false duality", the notion that because there are two differing points of view that they are both worthy of attention.
You say that "at best, this is revolutionary" but this is like saying "I have a great plan! Everyone takes off their shoes, switches them around, and somehow everyone winds up with a bigger pair! *At best*, everyone gets bigger shoes!" Well, no, just because someone's floated the fantasy doesn't mean it's even a vague possibility. These people are selling snake oil; it can be proved at home. To entertain their fraudulent notions simply because they bring them up is a mistake.
If people are to respect the law, perhaps the law should begin by respecting the people.
Then you could take the output files from this compression scheme, which would be pretty incompressible by traditional methods, and run THEM through the very same compression scheme, and make them smaller still. Repeat ad infinitum, and reduce all the data in the universe to one small file.
Better yet: To use your 10 bits example, feed every one of the 1024 combinations into the decompression program, and one of them is guaranteed to represent all the data in the universe. That's only a handful of combinations, we should be able to check them all before dinner. When someone decompresses the right 10-bit code, call me, since my phone number must be in the data somewhere.
There is a way to make compression like this work - for each string you want to compress, there's a compression program that losslessly compresses it to an arbitrarily short output string (one bit is fine...), but if the output string is N bits long, the program only works for 2**N input strings, and in general requires SIZE(INPUT) bits of program per input string (though for non-random strings, or for related strings, you can do better.) In other words, it's not useful for general-purpose compression, but you can use it for special-purpose compression - you can't design a small compression program to perfectly describe "Alice"'s or "Bob"'s appearance, but you can design a small program that outputs "Alice", "Bob", or "Somebody else".
Similarly, with pigeons, you can play Hundred-Pigeon Monte, and attract investors to your company, or use this to attract customers for your other products, or have a big crowd on the street intently watching you play hundred-pigeon monte with your shill while a pickpocket walks around behind the crowd.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
1.500.000.000 basepairs [are all the DNA coding for protein in the human genome]. So a 1.5 Gb file is enough to encode an entire human being.
The protein-coding chromosomal DNA is very, very far from encoding an entire human being. You've also got the DNA that controls which proteins are expressed (some unknown portion of that other 95% of the chromosomes), mitochondrial DNA, environmental effects during your whole life, and most of all some billions of neurons, each with up to a hundred semi-random connections to other neurons. No one yet has come anywhere near to giving a computer the equivalent of the life experience stored in your neurons. (Or at least my neurons -- some people never learn...)
Even more importantly, however, is that their "Technical Information" reeks so strongly of buzzwords and technobabble it's hard to read it without the urge to hold my nose. This alone discredits their entire proposition. I feel like I've just been subjected to corporate brainwashing ..er.. I mean marketing.
A solution to the problem with music today
It is possible to create "Infinite" compression, but it works like the laws of quantum mechanics, i.e. you never really get what you want. Here, I'll perform an experiment:
o I have a 1 byte file I want to send you.
o We start by synching our wrist-watches.
o I call you on the telephone and say "Start" and hang up.
o You and I start counting off the seconds.
o When the number of seconds have passed that are equal to the value of the byte, I call you back and say "Stop".
Now you have the value of the byte given to you in two bits of information (the "start" and "stop" bits).
Now we have an 8:2 ratio, which isn't bad. But I can do this again with a two-byte file and get 8:1. I can send you ANY length of file and only consume two bits of bandwidth... but at a terrible cost: time. Lots and lots of time.
But if you had something like a super-far-away satellite where bandwidth is hard to come by and time is not in short supply, it would be the answer.
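To put a number on "lots and lots of time", here is a back-of-the-envelope sketch in Python. It assumes the obvious generalisation of the scheme above (read the whole file as one big integer and wait that many seconds); the function names are made up.

```python
def seconds_needed(payload: bytes) -> int:
    """Read the whole file as one big-endian integer; that is the wait in seconds."""
    return int.from_bytes(payload, "big")

def worst_case_seconds(n_bytes: int) -> int:
    """Longest possible wait for an n-byte file."""
    return 2 ** (8 * n_bytes) - 1

print(worst_case_seconds(1))    # 255
print(worst_case_seconds(2))    # 65535

# Eight bytes can already take hundreds of billions of years in the worst case.
years = worst_case_seconds(8) / (365.25 * 24 * 3600)
print(f"{years:.3g} years")
```

The wait doubles with every extra bit, so the information hasn't gone anywhere; it has just moved from the wire into the clock.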
"Your superior intellect is no match for our puny weapons!"
This may not appear immediately relevant, but bear with me.
I'm not agreeing or disagreeing with ZeoSync's claims, but if you can impose a semblance of order on something that only appears chaotic, you can do some pretty cool stuff.
Take for example this little demo at this website in germany. (I realize what the domain looks like, there's nothing for sale or license, trust me). The actual download link is about halfway down the page.
This isn't "compression" in the conventional sense, but they still manage to fit a demo that contains hundreds of megs of textures and samples, in addition to the engine itself, in *64KB*. Now that's a hell of a ratio.
They do this not by storing the raw data, but instead storing the instructions needed to reconstruct the data as it is needed.
Granted, I realize that they only accomplished this with their own data, but I don't think taking this a step further to an arbitrary set of textures and sounds is impossible. Granted, this idea won't work for all types of data, and also cannot be considered "lossless" (hell, it's not even strictly compression), but I still think it's incredible that you can get results of this quality out of something this small.
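A toy version of "store the instructions, not the data", in Python. This is only an illustration of the general idea; the XOR pattern and the `texture` function are invented here and have nothing to do with how the demo actually builds its textures.

```python
def texture(width: int, height: int, seed: int) -> bytes:
    """Build a grayscale texture from a formula instead of stored pixels."""
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            # XOR pattern scaled by the seed; any deterministic rule would do.
            pixels.append(((x ^ y) * seed) & 0xFF)
    return bytes(pixels)

img = texture(512, 512, 7)
print(len(img))      # 262144 bytes produced by a few lines of code
```

It only works because the author picks data that the formula happens to produce, which is the opposite of being handed an arbitrary file.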
(Disclaimer: The above link is to a demo that requires directx 8.1 and I sincerely doubt will run under wine. It also doesn't work with every video card out there. I've scanned the binary, and it doesn't appear to have any viruses or trojans, but I won't guarantee it. If you can't accept the risk, don't download the binary.)
I'm not a math person by any means (still doing college algebra, which pretty much means everybody has a better understanding of math than I do), and I would appreciate people picking this apart.
:)
So, my idea for a "Kick Ass Compression"
Take a block of data - throw it against an algorithm that outputs a specific value (I'm thinking of CRC, MD5 hash or what not), do that several times against several different algorithms which generate a similar kind of value. Record the two (or more) values, then encapsulate the small block of data into larger blocks - I'm thinking only 3 or 4 levels of encapsulation would be needed, because if you calculated the CRC of the entire file, a program could decide which choice is correct when decoding a "block" yields multiple candidates (which I'm fairly sure it will).
Now people use md5 hashes/crc checks to verify whether the file they downloaded hasn't been modified, so I'm assuming that it is fairly difficult to get the exact value (especially with a known size). Using this "property" (I'm not sure if that is a correct word) you could decode the data into one of several (hundred??thousand??) byte streams (possibilities of uncompressed data) and by comparing byte streams between algorithm A and B, the byte streams would match at one (would it be possible to have more? I suppose it depends on the algorithms used) point, which would be the proper "uncompressed" (rather derived or something) data.
I'm pretty sure it would take a shitload of computing power in decompressing, but computers are fairly fast nowadays, and I think that this could be viable at some point. 100:1 probably not, and there would be a lower limit imposed on the file size based on the possible choices (I think the possible choices would reach a pretty large number pretty fast)
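A scaled-down experiment on how fast "the possible choices" grow, in Python. The 8-bit checksum (the first byte of MD5) and the 2-byte blocks are deliberately tiny assumptions so the whole space can be enumerated; `tiny_hash` is a made-up helper.

```python
import hashlib
from collections import defaultdict

def tiny_hash(block: bytes) -> int:
    """First byte of MD5: a deliberately tiny 8-bit checksum for the demo."""
    return hashlib.md5(block).digest()[0]

# Group every possible 2-byte block by its 8-bit checksum.
buckets = defaultdict(list)
for value in range(2 ** 16):
    block = value.to_bytes(2, "big")
    buckets[tiny_hash(block)].append(block)

print(len(buckets))                                  # at most 256 distinct checksums
print(max(len(blocks) for blocks in buckets.values()))  # at least 256 candidates for some checksum
```

In general an N-bit checksum of an M-bit block leaves about 2^(M-N) candidates, and each extra K-bit checksum only divides that by about 2^K, so pinning down one block takes roughly M bits of checksums in total.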
Maybe I'm just plain wrong - but could something like this be useable? Any abuse would be appreciated
Thanks!
1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcf
This may have already been posted, and if it has sorry, but I thought this may be of interest to some of you.
Jean-loup Gailly (one of the creators of gzip) has written an article on a patent that was granted for compression of truly random data, and how it is not mathematically possible. You can read it here for those that are interested.
man
No manual entry for
... to have catchy theme music, and pretty flash intros? That's how *I* can tell they're doing something real in the academic community. :)
...
If their technology is so earthshatteringly different and revolutionary but can use existing connections, why didn't their site download instantly? If it's only software and they already have a patent, one would think the easiest route to gain investors would be a small download and a mindblowing demo away.
Get off my virtual lawn, you damned virtual kids!
can be compressed to water ;)
The real breakthrough is the new discovery that the number of TMs and words capitalized in TheMiddle == the amount of money these folks will dupe from some silly investors.
I'm quite aware of that page. I work about 20' from the author :-).
I'd argue that there is no effective commercial one-time pad, only products that approach it. There have been a number of companies releasing similar press releases about OTPs for some time, but each time the generation method has resulted in it not being an OTP. Most of the time it has also been substantially worse than most existing algorithms.
The current BEST ratio for compressing truly random data is 1:1
In other words, you can't do it.
If you TRY, some compression software will end up making it bigger.
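Easy to try at home with Python's zlib. `os.urandom` is used here as a stand-in for "truly random" data, which is an assumption of this sketch; the exact sizes printed will vary from run to run.

```python
import os
import zlib

data = os.urandom(100_000)            # OS randomness as a stand-in for truly random data
once = zlib.compress(data, 9)         # first pass
twice = zlib.compress(once, 9)        # compress the compressed output again

print(len(data), len(once), len(twice))
```

On typical runs each pass comes out slightly larger than its input, because zlib has to add its own header and block overhead to data it cannot shrink.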
These guys are claiming 100:1 lossless on truly random data. This is difficult to believe on both fronts.
First, 100:1 lossless on any real-life data is unlikely. Add in the 'truly random' part...
So.. either they've violated the laws of the universe, or they are about to bring about one of the biggest mathematical discoveries in the world, or they are full of crap.
You can't compress every set of 1000 bits of data into 10 bits of data.
10 bits of data only allows for 1024 combinations.
1000 bits allows for a lot more.. so it's simply not possible.
They have funny wording in their release about data that is practically random. Well, that can be parsed to mean that in practice, the data is random and therefore it can be replaced by any other random string. After all, it's random! Not mathematically random in the entropy sense, but used by an application which wants any old string of random numbers. So sure, I can send a message saying, "generate me 1000 random digits". Great compression. Useless in practice, of course. In any case, these guys sound like a get-rich-quick scheme, trying to fool people, and not the only one of that type I can think of.
This is, of course, exactly what they WANT you to do. They only get money from the original sale of the stock. I'm presuming that this is a fly-by-night operation, so they're not going to care when (not if) their stock tanks. They've already got their money wired to a bank in the Bahamas. The person who will get hurt is the poor sod who doesn't understand that their claims are pure baloney.
Free Software: Like love, it grows best when given away.
No, I don't think hash is involved. Maybe LSD, but no hash.
Er...you're not thinking hard enough. You could compress that 10 gig drive to 1 byte. In fact, here it is: X. That 'X' contains all the best warez ever written. Unfortunately I'm keeping the decompressor for myself.
-- SIGFPE
So a 1.5 Gb file is enough to encode an entire human being.
Nope. 3 billion basepairs, four possible bases per pair. It takes two bits to describe four possible states, and so the unannotated sequence requires 6 billion bits of storage -> 750 million bytes -> about 715MB.
And genomic sequences generally aren't very random.... telomeric sequences, satellite DNA, common promoters, copied genes -- all of them can be easily abstracted and compressed out.
I'd expect that even with mapping annotations, the whole shebang would easily fit on a CD-ROM.
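The arithmetic, spelled out in Python. The 3 billion figure is a round number assumed for the estimate, and "MB" is taken to mean 2^20 bytes.

```python
base_pairs = 3_000_000_000      # roughly 3 billion base pairs (assumed round figure)
bits_per_base = 2               # four possible bases -> 2 bits each

total_bits = base_pairs * bits_per_base
total_bytes = total_bits // 8
mib = total_bytes / 2 ** 20     # megabytes counted as 2^20 bytes

print(total_bits)               # 6000000000
print(total_bytes)              # 750000000
print(round(mib, 1))            # 715.3
```

That is before any compression of repeats, so the CD-ROM estimate looks about right.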
JAR, from the maker of ARJ, is substantially better than ZIP and RAR as far as compression goes and substantially slower also.
Interesting thing I remember with JAR in DOS, is that the more memory you have to assign to the compression, the better the compression.
http://www.arjsoft.com/jar.htm
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?