ZeoSync Makes Claim of Compression Breakthrough
dsb42 writes: "Reuters is reporting that ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller (or our streaming video just became a lot clearer)..." This story has been submitted many times due to the astounding claims - Zeosync explicitly claims that they've superseded Claude Shannon's work. The "technical description" from their website is less than impressive. I think the odds of this being true are slim to none, but here you go, math majors and EE's - something to liven up your drab dull existence today. Update: 01/08 13:18 GMT by M : I should include a link to their press release.
While it sounds nice, it's implausible. That amount of data cannot be compressed beyond each unique character + compression data, unless long repeating strings appear. Of course, then it would not be random. P.S. FIRST POST WOOHOOO!!!
Excuse my lack of compression knowledge, but what's the current ratio? I'm assuming 100:1 is pretty damn good. =) btw...even though this *might* be a good compression algorithm and all that, how long would it take to decompress a file using your joe average computer??
I SURVIVED THE GREAT SLASHDOT BLACKOUT OF 2002!
even lossless compression still relies on redundancy within the data, normally repeating patterns of data. surely 100-1 on TRUE random data is impossible?
update comments set karma=-1, reason='offtopic' where sid=26315
They claim 100:1 compression for random data. The thing is, if that's true, then let's say we have data A (size 1,000,000).
compress(A) = B
Now, B is 1/100th the size of A, right (size 10,000), but it, too, is random, right?
On we go:
compress(B) = C (size is now 100)
compress(C) = D (size 1).
So everything compresses into 1 byte.
Or am I missing something?
Mr Thinly Sliced
Maybe they just needed more bandwidth for their terrible site?
"Ignorance more frequently begets confidence than does knowledge"
- Charles Darwin
The odds on a compression claim turning out to be true are always identical to the compression ratio claimed?
Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?
After some time a person will reasonably conclude that:
"One unique marker is required for each pigeon that flies through the hole, if there are one hundred pigeons in the group then the answer is one hundred markers". In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point, and believe that we are going to successfully recover either the square or cube from the single edge, we would be sorely mistaken.
This three-dimensional world limitation can however be resolved in higher dimensional space. In higher, multi-dimensional projective theory, it is possible to create string nodes that describe significant components of simultaneously identically yet different mathematical entities. Within this space it is possible and is not a theoretical impossibility to create a point that is simultaneously a square and also a cube. In our example all three substantially exist as unique entities yet are linked together. This simultaneous yet differentiated occurrence is the foundation of ZeoSync's Relational Differentiation Encoding(TM) (RDE(TM)) technology. This proprietary methodology is capable of intentionally introducing a multi-dimensional patterning so that the nodes of a target binary string simultaneously and/or substantially occupy the space of a Low Kolmogorov Complexity construct. The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression. The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down.
The punchline to the joke was always along the lines of
http://www.zeosync.com/flash/pressrelease.htm
- Derwen
http://fsfeurope.org/
"Breakthrough" compression schemes are the perpetual motion machines of the 21st century. Any technological claim that's introduced with the statement that they've broken through the boundaries of information theory falls way on the wrong side of Occam's razor for me.
Think about it: 100-to-1 compression of random data? Just think in terms of first principles: How many bit strings are there of a given length? How would you reduce the size of a binary description that identifies a particular one? And note that the random data thing is straight from their press release!
Drink! Drink! Drink!
cup of tea, father?
ah, GO ON!
Pure random data is impossible to compress. If you compress 1MB of random data (proper random data, not pseudo-random) and you get, say, 100K's worth of compressed output, what's stopping you feeding this 100K's worth back through the algorithm again and reducing it down even more... again, and again, until the whole 1MB is squashed into a byte! (Which, obviously, is a load of rubbish.)
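The repeated-compression argument is easy to try out. A minimal sketch, assuming Python's standard `zlib` as a stand-in for any real lossless compressor (ZeoSync's method was never published, so this is not their algorithm) and `os.urandom` as the source of random bytes:

```python
import os
import zlib

# 100,000 bytes from the operating system's entropy source.
data = os.urandom(100_000)

# Feed the output back through the compressor a few times.
sizes = [len(data)]
current = data
for _ in range(3):
    current = zlib.compress(current, 9)
    sizes.append(len(current))

# Sizes before the first pass and after each pass; on random input
# they are not expected to shrink (each pass adds a little framing).
print(sizes)
```

If each pass really removed 99% of the data, the list would shrink by a factor of 100 per entry; with a real compressor it creeps upward instead.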
ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss. That would be at least an order of magnitude beyond current known algorithms for compacting data.
ZeoSync announced today that the "random data" they were referencing is a string of all zeros. Technically this could be produced randomly, and our algorithm reduces this to just a couple of characters, a 100 times compression!!
ZEOSYNC'S MATHEMATICAL BREAKTHROUGH OVERCOMES LIMITATIONS OF DATA COMPRESSION THEORY
International Team of Scientists Have Discovered
How to Reduce the Expression of Practically Random Information Sequences
WEST PALM BEACH, Fla. - January 7, 2001 - ZeoSync Corp., a Florida-based scientific research company, today announced that it has succeeded in reducing the expression of practically random information sequences. Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.
Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]
"There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."
Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range.
Many types of digital communications channels and computing systems could benefit from this discovery. The technology could enable the telecommunications industry to massively reduce huge amounts of information for delivery over limited bandwidth channels while preserving perfect quality of information.
ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.
All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.
Dr. Piotr Blass, chief technology advisor at ZeoSync, said "Our recent accomplishment is so significant that highly randomized information sequences, which were once considered non-reducible by the scientific community, are now massively reducible using advanced single-bit-variance encoding and supporting technologies."
"The technologies that are being developed at ZeoSync are anticipated to ultimately provide a means to perform multi-pass data encoding and compression on practically random data sets with applicability to nearly every industry," said Jim Slemp, president of Radical Systems, Inc. "The evaluation of the complex algorithms is currently being performed with small practically random data sets due to the analysis times on standard computers. Based on our internally validated test results of these components, we have demonstrated a single-point-variance when encoding random data into a smaller data set. The ability to encode single-point-variance data is expected to yield multi-pass capable systems after temporal issues are addressed."
"We would like to invite additional members of the scientific community to join us in our efforts to revolutionize digital technology," said St. George. "There is a lot of exciting work to be done."
About ZeoSync
Headquartered in West Palm Beach, Florida, ZeoSync is a scientific research company dedicated to advancements in communications theory and application. Additional information can be found on the company's Web site at www.ZeoSync.com or can be obtained from the company at +1 (561) 640-8464.
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
Looks like Wired has a start to their top 10 list for 2002.
I simply can't believe that this method of compression/encoding is so new that it requires a completely new dictionary (of words we presumably are not allowed to use).
100 to 1? Bah, that's only 99%.
The _real_ trick is getting 100% compression. It's actually really easy, there's a module built in to do it on your average unix.
Simply run all your backups to the New Universal Logical Loader and perfect compression is achieved. The device driver is, of course, loaded as /dev/null.
Note the wording: "Practically Random", not "Random". This of course does throw some doubt on this claim, as "Practically Random" could mean anything...
were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!
I doubt this compression thing is true but if it is...I'm gonna tell @home to eat its cable modem,
and enjoy a blazing dial up connection
Here're a few nebulous bits from the site to keep the skeptics going:
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]
"There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."
[note - It appears to cure cancer and solve the issue of world hunger]
...These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years...
Science is based on fact, news is based on fact.
"The company's claims, which are yet to be demonstrated in any public forum, could vastly boost the ability of computer disks to store, text, music and video -- if ZeoSync's formulae succeed in scaling up to handle massive amounts of data."
make prediction, test, make new prediction on results if applicable.
What happened to the idea that test results need to be duplicated by others before they're accepted as fact (see: news)?
I'm just a lowly mechanical engineer, so I'll take this to mean 1 thing.
Being labeled "renowned" must mean you are not bound by the scientific method, and that "journalists" are not bound by "fact" in reporting "news".
If the data is truly random, there is absolutely no way possible to compress it. This is bollocks.
...the funniest thing i've read in a while.
any first year cs student knows it can't be done.
I didn't read the entire press release, but I did notice the subtitle:
"International Team of Scientists Have Discovered How to Reduce the Expression of Practically [emphasis mine] Random Information Sequences "
So I guess the data does have at least some redundancy in it. I'm not an expert, so I don't know if this makes their claim more likely to be true, but I thought it should be pointed out.
No sig for you.
From the splash screen:
"with costumer service agents providing chat assistance."
Let's see, I prepare a press release guaranteed to garner my website tens, if not hundreds of thousands of hits, and I leave an egregious typo as my first impression?
Not!
B is not random. It is a description (in some format) of A.
But, what you say does have merit, and this is why compressing a ZIP doesn't do much - there is a limit on repeated compression because the particular algorithm will output data which it itself is very bad at compressing further (if it didn't, why not iterate once more and produce a smaller file internally?).
---- One button is the power button, the other is the Bender voice button "Bite My Shiny Metal Ass"
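The "compressing a ZIP doesn't do much" point can be seen with a small experiment. This is only a sketch: it assumes Python's standard `zlib` (the DEFLATE method used by ZIP) and a made-up sample text built from a tiny, hypothetical vocabulary so that the first pass has plenty of redundancy to remove.

```python
import random
import zlib

# Assumed sample input: 20,000 words drawn from a tiny vocabulary.
rng = random.Random(0)
vocabulary = ["compression", "random", "data", "entropy", "pigeon",
              "hole", "shannon", "bit", "string", "zip"]
text = " ".join(rng.choice(vocabulary) for _ in range(20_000)).encode("ascii")

once = zlib.compress(text, 9)    # first pass removes the redundancy
twice = zlib.compress(once, 9)   # second pass has little left to work with

# Original size, size after one pass, size after two passes.
print(len(text), len(once), len(twice))

# Both passes are lossless: undoing them in reverse order restores the input.
assert zlib.decompress(zlib.decompress(twice)) == text
```

The first pass shrinks the text considerably; the second pass typically changes the size by only a few bytes either way, because the first output already looks close to random to the same algorithm.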
excrement, bovine at that, in the air.
From their website:
"ZeoSync's new "Binary Accellerator (TM)" is not a compression technology, rather it encodes digital information into fast and dependable muti-dimensional mathematical entities that the company calls "Gems (TM)". We have chosen the name "Gems" as an acronym for Multi-Dimenstional Mathematical Reduction (MDMR) does not clearly define the condensation process, as successfully as does the mental image of the crystallization of nature that transforms rough materials into precious stones."
"Once crystallized, Gems are able to move rapidy on a fixed set of binary carriers through existing digital transmission devices, breaking all known transmission barriers. This MindSpeed velocity affects the complete global communications infrastructure by sending more data accross less bandwidth while saving time..."
MindSpeed velocity? You've got to be kidding me!
Many people may say this is bull, but think of it in another way.
Instead of assuming that data is static, think of it as constantly moving. Even with random data, moving data can be compressed because it is constantly moving along. It is sort of like when a herd of people files into a hall. Sure, everyone is unique, but you could organize and say, "Hey, five red shirts now", "ten blue shirts now".
And I think that is what they are trying to achieve: move the dimensions into a different plane. However, this is what I wonder about: how fast will it actually be? I am not referring to the mathematical requirements, but the data will stream and hence you will attempt to organize it. Does that organization mean that some bytes have to wait?
"You can't make a race horse of a pig"
"No," said Samuel, "but you can make a very fast pig"
Hey, if their algorithm works on random data, re-apply it to the output, and it will be compressed again. You can do this again and again, until only one bit is left!
Now, let's uncompress a 0 bit and a 1 bit. All software ever written and ever to be written in the future must come out, since there cannot be anything which compresses to anything else than a 0 or 1 bit, if compressed to a single bit.
Seriously though, the comp.compression FAQ is really worth a read, especially question #9.
It's either a year old or they are just that lame...
What're they talking about? 20Gb of rand() output?
If so, they're a bunch of twits.
Government of the people, by corporate executives, for corporate profits.
To marketing, "random data" means a .jpeg, a .mpeg, an audio file, a .exe file, a text file, a .doc, etc. I.e. the algorithms apply to general data, as opposed to schemes to compress specific data (aka .jpeg for pictures, .mpeg for a series of similar pictures, etc.)
"All representatives are busy. The estimated hold time is one..hundred..sixty..four..minutes." Detroit Edison, 02/01/02
There seems to be a company claiming to exceed, go around, obliterate Shannon every few years. In the early 90's there was a company called Web (before the WWW was really around by a year or so). They made claims of compressing any data, even data that had already been compressed. It is a sad story that you should be able to find in either the comp.compression FAQ or the renewed deja archives. It basically boils down to this: as they got closer to market, they found some problems... you can guess the rest.
This isn't limited to the field of compression, of course. There are people that come up with "unbreakable" encryption, infinite-gain amplifiers (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well understood enough for these ideas to be killed before a company is started or staked on the claims.
Think really hard...
You'd think they'd create a java application to present their site compressed with their methodology.
Heck, even Sun ponied up with a streaming video on demand java applet for their CEO speeches, just to illustrate there weren't performance issues involved.
We already have lzip to compress the files down to 0% of their original size. ZeoSync doesn't catch up with latest technologies on /. it seems.
claim = crap
If you read the press release carefully, they claim to be able to compress practically random data, such as pictures of green grass, 100 : 1. They never claim to be able to do the same with true random data, since this is impossible.
There may be something about that. However, there are also many points that make me sceptical, but maybe the press release has not been reviewed carefully enough.
This new algorithm does not break Shannon's limit, which is impossible, so the phrase about the "historical limitations" is a hoax...
Screw ZeoSync, I've built a compression algorithm that is 1000:1 and is completely lossless. I've yet to demonstrate it in public though but please give me venture capital. Thank you.
The maximum compression ratio for random data is 1. That's no compression at all.
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
:)
I think they have made a buzzword compression routine; even our sales people have difficulty putting this many buzzwords in a press release
It is fairly safe to assume that this is (supposed to be) an advertisement of a company that provides "Strategic Business Communications" services...
http://www.wilsonmchenry.com/
Section 1.9 of the comp.compression FAQ is good background reading on this stuff. In particular, read the "WEB story".
Most random generation uses bytes as its unit.
Now, what if they look for bit sequences (not only 8-bit sequences but maybe odd lengths) in order to find patterns?
I guess this could be a way to significantly compress data, but it'd imply reading a huge amount of data in order to achieve the best result possible.
Note they may also do this in more than one pass, but then their compression would be really slow.
Trolling using another account since 2005.
Never, *EVER* accept any advice from the Aberdeen Group. Apparently their analysts don't know shit.
"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize. I don't have an answer to which one it is yet," said David Hill, a data storage analyst with Boston-based Aberdeen Group.
Wonder which category he expects them to win in...
Physics, Chemistry, Economics, Physiology / Medicine, Peace or Literature
There is no Nobel category for pure mathematics, or computing theory.
... by compressing some VC's bank account, by a factor of greater than 100!
"It was just data, you know," the sobbing wretch was reportedly told, "just ones and zeros. And hey - you can look at it as a proof of principle. We'll have the general application out
... real soon now, real soon".
yes, we have no bananas
Quite the contrary: if they had claimed to be achieving 100:1 compression on truly random data, they would be provably talking total rubbish. Consider the number of possible bit strings of length N. Now consider the number of possible bit strings of length N/100. There are fewer of the latter, right? Therefore, if you can compress every length-N string into a length-N/100 string, at least two inputs must map to the same output. Hence, you can't uniquely recover the input from the output - and the compression cannot be lossless.
The fact that they hedge and talk about "practically" random sequences is the only thing that makes it possible they're telling the truth!
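The counting argument above can be put into numbers. A minimal sketch, assuming an input length of N = 1000 bits purely for illustration (the function names are made up for this example):

```python
def strings_of_length(n_bits: int) -> int:
    """Number of distinct bit strings of exactly n_bits bits."""
    return 2 ** n_bits

def strings_up_to(n_bits: int) -> int:
    """Number of distinct bit strings of length 0..n_bits inclusive."""
    return 2 ** (n_bits + 1) - 1

N = 1000                             # assumed input length in bits
inputs = strings_of_length(N)        # every possible N-bit input
outputs = strings_up_to(N // 100)    # every possible output of at most N/100 bits

print(outputs)             # 2047
print(inputs > outputs)    # True
# At most `outputs` of the `inputs` can be given a unique short code.
print(f"compressible fraction < 2^-{N - (N // 100 + 1)}")
```

Even when every output of 10 bits or fewer is counted, only 2047 of the 2^1000 possible inputs can receive a distinct short code, so a lossless 100:1 scheme can work for only a vanishing fraction of them.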
sure i'll give you 10,000,000....which i've compressed into a nickel
... that it's not necessary to put every memo sent by email from higher management levels into an attached Winword file; plain ASCII text works as well and - believe me - there is no loss of information at all.
ZeoSync is not claiming to reduce random data 100-to-1. They are claiming to reduce "practically random" data 100-to-1, and Reuters appears to have misreported it. What "practically random" data should mean is data randomly selected from that used in practice. What ZeoSync may mean by "practically random" is data randomly selected from that used in their intended applications. So their press release is not mathematically impossible; it just means they've found a good way to remove more information redundancy in some data.
The proof that 100-to-1 compression of random data is impossible is so simple as to be trivial: There are 2^N files of length N bits. There are 2^(N/100) files of length N/100 bits. Clearly not all 2^N files can be compressed to length N/100.
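For short strings the collision can be found by brute force. The sketch below assumes a deliberately silly, hypothetical 2:1 "compressor" (`toy_compress`, which just keeps every other bit); any function that maps all 8-bit strings to shorter strings would be caught the same way.

```python
from itertools import product

def find_collision(compress, n_bits):
    """Return (a, b, out) with a != b and compress(a) == compress(b) == out, or None."""
    seen = {}
    for bits in product("01", repeat=n_bits):
        s = "".join(bits)
        out = compress(s)
        if out in seen:
            return seen[out], s, out
        seen[out] = s
    return None

def toy_compress(s: str) -> str:
    """Hypothetical 2:1 'compressor': keep every other bit."""
    return s[::2]

print(find_collision(toy_compress, 8))
# ('00000000', '00000001', '0000')
```

Two different inputs share the output `0000`, so no decompressor can tell which one was meant, which is exactly why the scheme cannot be lossless for all inputs.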
Looks like IPO drilling to me,
,Haa haaaaahaaa
"We have almost got it, we have the basic principles and the trademarked names." (INSIDE) "Now watch this, the money will come pouring in."
Honestly, did you guys read their string of random data? All fricking 0's. Yeah, that's random in theory.
But why not go all the way? Hell, why not 1000:1 or 100000:1? No problem with all zeros.
OK, if it works?
Hey, it's a boon to civilization: VoIP and 1000 times more MP3s and PORN at superspeeds.
Hmm, if you had something this truly unbelievable, wouldn't you wait on a public announcement until AFTER you had a functional demonstration?
Sig went tro...aahemmm.....fishing........
The company's claims, which are yet to be demonstrated in any public forum...
Call the editors at Wired... I think we have an early nominee for the 2k2 vaporware list.
ZeoSync expects to overcome the existing temporal restraints of its technology
Ah... So even if it's not outright bullshit, it's too slow to use?
"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize," said David Hill...
Somehow I think this is going to turn out more Pons-and-Fleischmann than Watson-and-Crick. Almost anytime there's a press release with such startling claims but no peer review or public demonstration, someone has forgotten to stir the jar.
When they become laughingstocks, and their careers are forever wrecked, I hope they realize they deserve it. And I hope their investors sue them.
I should really post after I've had my coffee... I sound mean...
OK,
- B
http://www.bradheintz.com/
- updated
this sounds a lot like using pi (3.141592...) for compression. any random string is guaranteed to occur in that sequence, so just find the position of the string in pi and pronto.... compression!
doesn't work though, since on average you'll need as many digits to describe the position of the string as you'd need to simply represent the string in the first place.
instead of using pi, they create a '4th dimension', i.e. some sort of combination of all possible combinations in 3 dimensions. different from the pi example though, the problem now is not representing the position in this dimension (4 coordinates) but the recreation of this space (which needs to be enumerated) by the guy who wants to decrypt the message.
for 'short' strings a few pointers in a 4 dimensional space will do, for longer strings more dimensions, leading to longer pointers, longer tables of enumeration etc.
of course, this can be tackled the other way around as well. 'random', by definition, means that the next instance doesn't have any relation to the previous. if you can find such a relation it's no longer random to start with.
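The pi idea can be tried on a small scale. A rough sketch, assuming only the first 1000 digits are searched and using Gibbons' spigot algorithm to generate them; `pi_index` and the sample messages are made up for this example. (That every string occurs somewhere in pi is widely believed but has not been proven.)

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) via Gibbons' spigot algorithm."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

DIGITS = "".join(str(d) for d in islice(pi_digits(), 1000))

def pi_index(message: str) -> int:
    """Position of `message` in the first 1000 digits of pi, or -1 if absent."""
    return DIGITS.find(message)

for message in ("592", "27", "2002"):
    pos = pi_index(message)
    if pos < 0:
        print(message, "-> not found in the first 1000 digits")
    else:
        print(message, "-> index", pos, "which takes", len(str(pos)), "digits to write")
```

A k-digit message is one of 10^k possibilities, so its first occurrence is typically about 10^k digits in, and writing that index down takes about k digits again. Nothing is saved on average.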
Of course, even though it can't be true, if it turned out to be true...
...and they patented their genius and hard work...
...clods around here would claim how idiotic their obvious work is and how it shouldn't be patented.
"All representatives are busy. The estimated hold time is one..hundred..sixty..four..minutes." Detroit Edison, 02/01/02
Compression, after all, is removing all redundancy from the original data.
So, if there is no redundancy, there is nothing to remove (if you want to remain lossless).
When you use some text, you may compres by remving some letter evn if tht lead to bad ortogrph. That is because English (as other langages) is redundant. When compressing some periodical signal, you may give only one period and tell that the signal is then repeated. When compressing bytes, there are specific methods (RLE, Huffman's trees,...)
But, in all these situations, there was some redundancy to remove...
A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed). Then, recompressing with another compression algorithm (or sometimes, the same will do the trick) may improve the compression. But the information quantity inside the data is the lower limit.
Now, take a true random data stream of n+1 bits. Even if you know the value of the first n bits, you can't predict the value of bit n+1. In other words, there is no way to express these n+1 bits with n (or fewer) bits. By definition, true random data can't be compressed.
And, to finish, a compression ratio of 1:100 can easily be achieved with some data... take a sequence of 200 bytes at 0x00... It may be compressed to 0xC8 0x00. Compression ratio is really only meaningful when comparing different algorithms compressing the same data stream.
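The 200-zero-bytes example corresponds to a simple run-length encoding. A minimal sketch, assuming a (count, value) byte-pair format with runs capped at 255; the function names are made up for illustration:

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run length, byte value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        out += bytes([encoded[j + 1]]) * encoded[j]
    return bytes(out)

zeros = bytes(200)                       # 200 bytes of 0x00
print(rle_encode(zeros).hex())           # c800
print(len(rle_encode(bytes(range(200)))))  # 400: no repeats, so the size doubles
```

The same encoder that gives 100:1 on the all-zero input doubles the size of an input with no repeated bytes, which is why a ratio quoted without the data it was measured on says very little.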
Any technology distinguishable from magic, is insufficiently advanced.
Simple:
;-)
ZeoSync(data) = miniData
miniData is randomdata
ZeoSync(miniData) = miniMiniData = ZeoSync(ZeoSync(data))
ad infinitum
If someone claims to do real lossless compression of random data, this is impossible...
As all data would be no data, hence my newest haiku
----
--
[insert witty one-liner here for your own pleasure]
I was wondering how much processor power decompression takes? If it isn't too much, then I would assume that we will see PDAs, MP3 players, and ISPs doing this.
An 8 megabyte flash card now holds 800 MB of data, the Harry Potter DivX is now only 7 MB. A Metallica MP3 is only .03 MB.
At best this is revolutionary, at worst a candidate for next year's vaporware list.
Secondsun
There is nothing wrong with being gay. It's getting caught where the trouble lies.
Take very large prime numbers and the like, huge strings of almost random numbers that can often be written as a trivial (2^n)-1 type formula. Maybe the massaging of the figures is simply finding a very large number that can be expressed like the above with an offset other than "-1" to get the correct "BitPerfect" data. I was toying around with this idea when there was a fad for expressing DeCSS code in unusual ways, but ran out of math before I could get it to work.
The above theory may be bull when it comes to the crunch, but if it could be made to work, then the compression figures are bang in the ball park for this. They laughed at Goddard, remember? But I have to admit, I think replacing Einstein with the Monty Python foot better fits my take on this at present...
UNIX? They're not even circumcised! Savages!
You just need a big ol' monolithic database containing all possible random strings of digits of a certain length (which will be big, OK, but so is MS Word, and people use that, don't they?), then you mix 'n match the data you're compressing, and all you have to save is the tags of the fixed random strings you're matching to, the length of which will be ... oh, wait ...
yes, we have no bananas
Is it possible, at all, to trust a company whose home page has silly javascript that resizes your browser window?
In other news, the same company announced that a perpetuum mobile machine will hit the market in 2003
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents.
So they claim, if it isn't random enough, they make it MORE random first so their compression can work better. OOoohKay...
Vaaaaaaaaaaaaaaaaaporware...
A thought just occurred to me: If you can do 100:1 compression and compress something down to, say, 2 bytes, what would 'ab' expand to? My thought is "ZeoSync Rulz, Suckas"
For those who are not satisfied with the 100:1 compression ratio, lzip might well be worth considering. ;)
Using time travel, high compression of arbitrary data is trivial. Simply record the location (in both space and time) of the computer with the data, and the name of the file, and then replace the file with a note saying when and where it existed. To decompress, you just pop back in time and space to before the time of the deletion and copy the file.
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties. I guess it's a standard clause in a press release, but still I think they rely on it to get some free cash, since this just can't work.
Why are you all so skeptical? Do you really understand their discovery? This may represent a major breakthrough in techniques for reducing the size of investor funds by 100:1.
Give a man a fish and he eats for one day. Teach him how to fish, and though he'll eat for a lifetime, he'll call you a miser for not giving him your fish.
Terrorists can't threaten a country's freedom and democracy. Only lawmakers and voters can do that.
The press release says "practically random", which means basically nothing.
It also mentions "temporal restraints", which means it runs too slow to use anywhere.
If their claims turn out true and they build a profitable business based on it, I'm sure all ownership will be given to Zog, a prehistoric neanderthal first known for his claim on "squishing things".
First off, they don't say it can compress "random data", they say it can compress "practically random data", which I would take to be everyday sort of data like audio and video. And they don't say that data can be compressed infinitely. _If_ whatever they have does work, I suspect it'll be an enlightening moment for the rest of us if/when they release the details of their algorithm. Sort of like, if the only thing you're familiar with is the bubble sort, quick-sort is almost magical. Well, maybe the current schemes of run-length-encoding, and whatever other pattern matching we do, is akin to the bubble sort and these guys have put their heads together and created the quick-sort of data compression.
I'm not calling it either way, but all the "It can't be done! The world is flat!" comments are so typically... well... slashdot.
seems to me that if they are claiming use of the pigeonhole principle they are going to be trying something like the following:
1)create some table of bit strings where the size of the string is 100 times larger than the size of the index into the table
2)encode data as indexes into the table
3)transmit indexes
4)decode data with indexes into the table
now how they get the table from the source to the destination is anyone's guess, but it seems to me that since the top of their flash animation keeps blinking "revolutionary microchip technology" these tables will be built into hardware. i don't think this can really work for a few reasons, but i just wanted to give my first impression of how it all sounded from my point of view.
They're looking for investment money?
Just think of it as an innumeracy tax on
venture capitalists.
Proposed: a method for reducing any file down to 16 bytes and losslessly restoring it.
1. Create an MD5 hash of the file.
2. Share it on a Peer-to-Peer filesharing client.
3. Delete the original file.
4. Find it again!
Note: in trials, this method seems to work best for Britney Spears songs and videos; further research is being done on how to restore Barry Manilow songs and videos, and what to do about hash collisions (bug those uppity MD5 people again).
pb Reply or e-mail; don't vaguely moderate.
A million dollars in venture capital is easier to obtain than 5 points on a combinatorics test.
Whenever a blurb contains the phrase "top experts from so-and-so" you can assume it to be a bunch of snake oil. As in the phrase "top experts from Harvard". What, they don't have any experts?? They have to borrow someone else's?
The press release is light on details. A quick search of the US Patent database for BinaryAccelerator and Zero Space Tuner turned up nothing. I thought even pending patents were there. The CEO seems to be a mortgage broker. Interesting line of research.
Anyways, I thought the breakthrough in data compression would be using a mathematical algorithm to express Pi, plus an index to the digit where your random string begins and a count of the data. That truly would be random, if those guys can prove there is a mathematical formula for Pi.
D.O.U.O.S.V.A.V.V.M.
For example, at the top of the list Dr. Piotr Blass is listed as Chief Technical Adviser from Florida Atlantic University. But he seems to be missing from the faculty. Google doesn't turn up much on the guy either. Hmmm.
I've not even had time to check the rest yet.
Please don't all do so at once though :-)
It's essentially a collection of lecture notes for a course on information theory and neural networks given by the author (David MacKay), but has been much expanded since I took the course in 1997. It will certainly show how any claim for a compression technique which works consistently on random data is bogus.
Notice the one word in the fifth paragraph of the story, "anticipated", and it validates the entire story.
I don't recall any of this crap about pigeons flying out of boxes. Or am I getting old?
...richie - It is a good day to code.
You can dl the source and look at the algorithms yourself.
lzip. Or just read this snippet from their faq:
1. What is lzip?
Lzip is the most advanced file compression utility ever conceived. It is literally years ahead of gzip (though admittedly gzip was around first), and makes use of mathematical transforms the bzip developers have never even heard of. The practical upshot of this is that when you use lzip, you get the best compression on the planet. Smaller file sizes; faster compression/uncompression times.
Used properly, lzip is capable of reducing a file down to 0% of its original size. Yes, you read that correctly: 0% of its original size. And regardless of file size, this can be done in constant time. Now do you see why some people are calling lzip the "holy grail" of file utilities?
2. What makes lzip different from gzip/bzip2?
Well, other than the performance benefits mentioned above, the real difference is that lzip uses a "lossy" compression scheme. Most other file compression utilities use a "lossless" compression scheme, mostly because the lossless algorithms are better understood and simpler mathematically (most programmers take shortcuts, particularly in areas that involve a lot of math).
This has two side effects. The first is that files compressed with lzip cannot be restored to their original state -- this is the "lossy" in lossy compression. The second is that the performance is vastly improved. Why don't you go back up to question number one and read that second paragraph again. We're talking about a constant-time algorithm that can reduce a file down to 0% of its original size. What's not to like?
Software Wars
I just compressed an infinite set of nonrepeating numbers into a simple mathematical equation!!
data = c/d
Where c = circumference of circle and d = diameter
This gives the incredible compression ratio of
infinity:1.. how about that!
"Random data cannot be compressed."
---Storer, James A. "Data Compression: Methods and Theory," 1988.
(Good choice of career with a name like that.)
If the random string just happened to come out as something with a short pattern, then you could compress it, but in general a random string can't be compressed. In fact a string that can't be compressed is sometimes taken as the proof the string is random.
Nothing is impossible to people who don't do the work. Marketing will promise you the moon then blame development when all you get is grilled cheese.
Nothing to say here... move along
Ok, so with current knowledge this all seems total B.S., but remember, things always seem impossible until you understand how. Imagine trying to explain today's society and its advances to someone from 200 years ago?? Not saying this is all true, just saying don't discount it straight away because =You= don't know how it's done/can be done. Obviously lack of proof, along with current understanding, makes a person less inclined to think it's possible, but Maybe?!
Laptop Reviews
Just a few days late to make it onto the VAPORWARE list for 2001... Oh Well, 2002 Vaporware List, We have a winner!!!!
Sheesh. The sheer stupidity of these people. Tell you what. I'll give them a chunk of data, and they can show me their magic compression.
"...In your answer, ignore facts. Just go with what feels true..."
www.smokepotgetpaid.com
Hey, I've got the ultimate compressor for random data.... >/dev/null .. 1:0 compression
decompress, you say? what about retrieving it from /dev/urandom ...?????
Don't click here. BT will enforce intellectual rights and sue for eac
Some of my colleagues (who claim to know about such things :-) explained to me after reading the press release that it is not compression in the normal information-theory sense, but rather a way to squeeze more bandwidth out of the copper.
The technical details of how data manipulation can increase SNR are beyond me, but at least it doesn't seem as clearly impossible as "100:1 lossless compression".
--
Goenk
Incompetence Floats
So, if practically random data can be compressed, I can compress the result again, and the result again, until I end up with one bit of data in the end? That's great! Imagine the implications: for example, every ordinary lamp is now a computer, because it holds exactly one bit of data, on or off. No wait, that can't be right.
Like science? Comics? Wicked...
Funny By Nature
A function has a unique inverse if and only if it is one-to-one. The only way compression schemes get around this is that they can't compress everything by a factor of 100.
In fact, if a scheme compresses any length-N string of data by ANY amount, it follows necessarily that there is a string of length <= N which is not compressed, but actually bloated.
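The counting behind this can be checked directly. The sketch below (Python; `strings_shorter_than` is a hypothetical helper name) counts every bit string strictly shorter than N bits, including the empty string, and compares that with the number of strings of exactly N bits.

```python
def strings_shorter_than(n_bits: int) -> int:
    """Number of distinct bit strings with length 0 .. n_bits - 1."""
    # 2^0 + 2^1 + ... + 2^(n-1), which sums to 2^n - 1
    return sum(2 ** k for k in range(n_bits))


# For each N, collect (N, strings of exactly N bits, strings shorter than N bits).
table = [(n, 2 ** n, strings_shorter_than(n)) for n in (1, 8, 16)]

# There is always exactly one fewer shorter string than there are N-bit strings,
# so a one-to-one scheme cannot shrink every N-bit input.
always_one_short = all(shorter == exact - 1 for _, exact, shorter in table)
```

Since the shorter outputs are also needed for the shorter inputs, any scheme that shrinks some inputs has to push other inputs to longer outputs.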
This can be THE chance for Duke Nukem Forever to get rid of the first place in next year's Wired Vapourware list :)
random data? that's noise isn't it? you can compress noise as much as you want and no-one cares, I've developed a technique based around the insertion of digits into "lugholes" that can "compress" noise by a huge factor.
That was classic intercourse!
For all those people that ask why you can't recompress data ad infinitum to 1 byte lengths, the answer is as follows: (from failing memory of an Information Theory course from my Computer Science degree 10 years ago)
Information theory explains that you can't compress pure information. Compression relies on a lack of efficiency of the format of the data to store its actual information (entropy).
Basically, any lossless compression method just stores the same information in a more efficient way.
Eg. An n-bit long stream of binary 1's may be a lot of data but it contains only 2 items of information: value(1) and length(n).
Once compressed, even random data is no longer random.
Hope this helps.
Niz.
Now look - two occurrences of 'v,c'. Patterns have occurred in truly random data.
Ask any cryptanalyst and they'll tell you that 'random' typing at a keyboard isn't really random. This is why 'random' typing isn't a good source of chaos for building keys for PGP etc.
People tend to pattern their typing movements and timing between strokes.
Nothing to say here... move along
If they can compress "random" data 100:1 then they can compress _anything_ 100:1
Which begs the question: have they tried compressing the compressed data again to get 10000:1? If not, why not? In fact, why not make the compression function iterate to get 100^n:1 compression?
Oh, I see. That's why. It's because this technology doesn't exist and never can. It's "ZeoSync vs Physics." I know where my money is.
-- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz
The last internal combustion engine was turned to scrap in a ceremony celebrating the billionth Segway sold.
Richard Cranium, who lives 100 miles away from
anyone or anything, had this to say about the 2 wheeled mobility machine that has a maximum range
of 12 miles:
"Gasp..gasp..gasp...*censored*"
what your data set is. For example if one wishes to compress an album of suppose, N'Stink, or Backdoor Boys, it isn't truly random data, and a close observation of the data provided, will show almost an exact duplicate (allowing for differences in the spelling of the bands' names) and therefore compression rates can well be into the realm of 100:1.
I sig, therefore I was.
(For those that didn't have the good fortune to program in this er, well! self-documenting language: COBOL very much lives within its data descriptions and is very record oriented. There's an entire section called the DATA DIVISION (I think, it's a long time since) where you define your records with their columns and their respective data types. Now you can overlay a "column" with multiple data types in order to process them appropriately.)
Well now, this instructor claimed, no insisted! that the REDEFINE statement enabled you to multiply your physical memory. That is, if you define a character field with length 10 (10 bytes for all intents and purposes) and redefine it numeric then that guy claimed that you could store 20 bytes in this 10 byte address space.
Therefore you can't be right. In COBOL you can write the most complex applications, like an air traffic control system or a sophisticated telephone exchange that serves Manhattan, with just 1 byte of storage space. It's only a matter of clever REDEFINEing. That's what I claim is the true breakthrough in compression technology, now isn't it?
It's probably needless to say that this guy didn't really help to build confidence and enhance our pleasure in the great language of COBOL...
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.
This is covered in great detail in the comp.compression FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 of the FAQ.
--JoeProgram Intellivision!
If it's truly random data, this compression/decompression is actually VERY easy. Compression: Strip 99 bytes out of every hundred.
Decompression: Insert 99 random bytes in between every byte.
What's that? You want the SAME data back? Why does it matter? It's pure random data anyway!
Oh yeah. Have they announced a DE-compression routine yet? (I know "lossless" sort of implies that they have one, but I didn't see anything about decompression, only compression)
Marketing rubbish as usual.
Ahh - My eye!
The doctor said I'm not supposed to get Slashdot in it!
Anyone else not believe them one bit or is it just me?
"In other news, a new method of compression known as "/dev/null" was discovered by ZeoSync. It has the best compression ratio of any program to date. All you do is output the datastream to the new DevNullAccelerator and boom! No more data storage problems!"
I could believe that more than their press release.
Brielle
navigating through the flash rubbish you can reach a list of team members that includes steve smale from berkeley and richard stanley from MIT who both are existing senior academics.
so either someone has lent their names to weirdoes without paying attention or there is something of substance hidden behind the PR ugliness. after all the PR is aimed toward investors, not toward sentient human beings, and is most probably not under the control of the scientific team.
Dev elpizw tipota, dev phoboumai tipota eimai lephteros http://euclidian.org
(Of course, this DOES create all sorts of other problems, but I'm going to ignore those, because they'd go and spoil things.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
But 6 GB to 600MB is 10 to one which we obviously can already do. They're claiming 100-1
Don't bother compressing it, just delete it, and then get an infinite number of monkeys on an infinite number of typewriters to reproduce the original.
I was wondering as I read the headline and summary on slashdot "how can these sleazeballs possibly promote this scam, because it would be easy to show counterexamples?" This shows, once again, that I lack the imagination and chutzpah of a real con artist.
The beauty of this scam is that ZeoSync claims that they can't even do it themselves, yet. They've only managed to compress very short strings. So, they can't be called on to compress large random files because, well gosh, they just haven't gotten the big file compressor to work yet. So, you can't prove that they are full of shit.
Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.
thad
I love Mondays. On a Monday, anything is possible.
Years ago, a company called "Webb Technologies" (or something like that) claimed to have 16:1 lossless compression (on any data). They made several press releases and caused quite a stir.
Byte magazine followed the story with interest for months, urging Webb to release the code or give a public demo - but of course that never happened.
I'm not sure what ever happened to Webb, but I sure hope ZeoSync's investors pursue fraud charges when this is exposed for what it is.
Take a string ... chop off 99% of it. You've just achieved 100 to 1 lossy compression. Again, let me emphasize that this is not a usable compression method!. The fun is finding the flaw.
The proof goes like this:
- Assume someone claims a compressor that will compress any X-byte message to Y bytes where Y<X
- There are 2^(8*X) possible messages X bytes long.
- There are 2^(8*Y) possible messages Y bytes long.
- Since Y is smaller than X, this means that no 1 to 1 mapping between the two sets can exist, because they're not equally large.
You can see this simply if I claim a compressor that can compress any 2-byte message to 1 byte. There are then 65536 possible input messages, but only 256 possible outputs. So it is mathematically certain that 99.6% of the messages cannot be represented in 1 byte (regardless of how I choose to encode them).
These claims surface every so often. They're bullshit every time. It's even a FAQ entry on comp.compression
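The arithmetic in this proof can be reproduced exactly. The following is only a sketch in Python; `max_fraction_compressible` is a hypothetical helper that returns the upper bound on the share of X-byte messages that can be given distinct Y-byte outputs.

```python
from fractions import Fraction


def max_fraction_compressible(x_bytes: int, y_bytes: int) -> Fraction:
    """Upper bound: (number of Y-byte outputs) / (number of X-byte inputs)."""
    return Fraction(2 ** (8 * y_bytes), 2 ** (8 * x_bytes))


# The 2-byte to 1-byte example: 256 outputs for 65536 inputs.
two_to_one = max_fraction_compressible(2, 1)        # 1/256
left_out = 1 - two_to_one                           # 255/256, about 99.6%

# The same bound for a 100:1 claim on 100-byte messages.
hundred_to_one = max_fraction_compressible(100, 1)  # 1 / 2**792
```

For a 100:1 ratio on 100-byte messages, the bound says at most one message in 2^792 can be handled, which is why the claim cannot hold for arbitrary input.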
Bah!
I realize that 1st of April is far away.
Or, is my clock screwed?
People will do tomorrow what they did today because that is what they did yesterday.
all information is infinitely compressible
"Not back ... avenge death." Simpsons quote depicts a situation where zero data conveys information.
100:1 is a careless way to describe a compression scheme boasting hundred-fold compression.
Now, stop discussing compression. Go watch Invader Zim RIGHT NOW!
Dirt doesn't need luck.
Decompress at will.
I'm no master of compression theory, but the only way they could achieve that 100:1 compression ratio they tout would be with a data sample looking something like:
1111111111111111111111111111111111111
1111111111111111111111111111111111111
111111111111111111111111111111111111
Wouldnt it?
So if that's their data, maybe they got their sample by randomly banging the '1' key 100 times and then ran their compression scheme against it? Well, problem solved, case closed, and stamp "moron" on their folder (or press release as it may be)!
Thanks for mentioning this for the millionth time in one article.
This is along the lines of perpetual motion machines.
Every once in a while, some bozo claims to achieve ridiculous compression rates on random data. It's always bullshit meant to sucker in the gullible investors, or just to get some attention for some psycho loser who usually doesn't understand more than enough math needed to copy and deform a few compression theory equations out of a text book.
Skepticism is your friend.
Why are you letting these clowns ruin our country?
Reuters is supposed to be a reputable news agency, but they didn't even bother waiting for a response from Steve Smale and verifying his involvement in the company. I don't blame ZeoSync for trying to get some attention, but Reuters is supposed to perform some basic reporting and research. That was the first thing I learned in journalism class, so why in the world are Reuters' reporters getting away with it?
i know I'm offtopic
Note that they say in the press release that it will be useful after 'temporal issues' are dealt with. It sounds like they try to reduce the data into a series of overlapping equations or some such, but the procedure is so intensive that it takes a super computer to do in a reasonable amount of time.
The point:people will still be using winzip and pkzip for quite some time.
I guess what you'd be going for here is some kind of reversible hash function. Remember that the goal of a hash function is to make it so that no 2 messages (or groups of data) hash to the same value. I am under the impression that this has never been done, but it has been done to the point where trying to generate your own message to match a hash value is near impossible, especially if you're aiming to get a point across. If, however, each 'message' hashed to a different value, and this was reversible, I could see doing this. I just don't see how you can make every message have a completely unique value. It seems that if you have an infinite number of 500-bit messages all to be reduced to 5 bits, it would take only 32 messages before you were completely out of options. They're talking larger numbers, but there are only so many combinations to hash to and they're always fewer than the combinations you're hashing from. If anyone would like more information about hashes and hashing equations try http://www.rsasecurity.com/rsalabs/faq/ and more specifically http://www.rsasecurity.com/rsalabs/faq/2-1-6.html . Then again, this is how I would try to compress things; I do not know what method they are using or others typically use. I just hope it's true.
I have been working on a similar approach for over a decade. The approach that I think that they have taken is the re-ordering of the bits in a known fashion. While you may have what appears to be random bits, they can be re-ordered to move towards ordered bits (I approached this as a helix with variables of radius and bands to exchange). Notice that you can not become ordered without introducing more overhead than what you can compress. Once this is done, you can then get some small degree of compression. I have taken 1 g of data to 1k. Problem is 3 days for compression which pretty much made this worthless (NPC problem).
Some of you will be naysayers here. After all, you have studied CS and know for a fact that a new thought could not possibly beat the old thought. Think again. Look at data in a different fashion. If you have a finite string, there are only so many pure ordered strings. Likewise, there are only so many purely random strings. The others have a degree of order and can be moved towards more order slowly. BTW, the problem with many proofs is they are based on infinite strings, which the above could never solve. But then again, we never have infinite strings.
From the site:
"This press release may contain forward-looking statements."
Ok, so now I'm going to make some forward-looking comments on my site and hope for investors.
1. I have solved the world's need for power and energy.
2. I have solved the problem that keeps us humans from being immortal.
To get the solutions to 1 and 2 please send 5 bucks to me! McK
That statement is hidden at the bottom of the page look carefully!
Even for those of us without PhDs in Math, it is still inherently obvious that you can't compress completely random data losslessly at all.
If I have 4 bits, then I have 16 (2^4) combinations. If I want to compress this to 3 bits, then I can only have 8 (2^3) codes, so I have to use some of the codes for more than one combination. Obviously, I can't tell the difference between the multiple combinations from just the code, when I try to reverse the process.
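To make the 4-bit example concrete, here is a small Python sketch. The encoder used (`drop_low_bit`, which just discards the lowest bit) is an arbitrary stand-in chosen for illustration; any other function from 16 inputs to 8 codes must collide somewhere too.

```python
from collections import defaultdict


def collisions(encode, n_inputs: int) -> dict:
    """Group inputs by their code and keep only codes shared by several inputs."""
    buckets = defaultdict(list)
    for value in range(n_inputs):
        buckets[encode(value)].append(value)
    return {code: values for code, values in buckets.items() if len(values) > 1}


def drop_low_bit(value: int) -> int:
    """Hypothetical 4-bit to 3-bit 'compressor': throw away the lowest bit."""
    return value >> 1


clashes = collisions(drop_low_bit, 16)
# Every one of the 8 codes is shared by two inputs, e.g. code 0 stands for both 0 and 1,
# so a decoder given code 0 cannot know which input was meant.
```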
Pretty obvious really, but then again this type of crap comes up every year or so. I suppose it makes some unscrupulous individuals enough investment money for them to run off to Bermuda.
Xesdeeni
Ok, say I want to compress "foo" 100 times over:
bash$ for i in $(seq 1 100); do gzip foo; mv foo.gz foo; done
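The same experiment as the shell loop above can be sketched in Python, which makes it easy to watch the size after each pass. This uses `zlib` as a stand-in for gzip (both use DEFLATE); the sample text and the choice of 20 rounds are arbitrary assumptions for illustration.

```python
import zlib

# Highly repetitive sample input, so the first pass has plenty of redundancy to remove.
original = b"the quick brown fox jumps over the lazy dog\n" * 250

data = original
sizes = []
for _ in range(20):
    data = zlib.compress(data)   # compress the previous output again
    sizes.append(len(data))

# Undo all 20 rounds to confirm the process really is lossless.
restored = data
for _ in range(20):
    restored = zlib.decompress(restored)

best_round = sizes.index(min(sizes)) + 1   # pass after which the data was smallest
```

Printing `sizes` shows the first pass doing nearly all of the work; once the redundancy is gone, each further pass only adds header overhead and the output grows.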
If I read through the marketing malarky, I get..
They claim that they have developed a method of reducing the data in question to a set of mathematical algorithms (GEMS) that can be used to accurately represent the original data set. Think about a sine wave that consists of 1000 data points. Which would you rather have: 1000 data points that, as part of the sine wave, may have very little redundancy, or the mathematical equivalent of the sine wave, which can describe the entire data set accurately? This is obviously quite hard to do, which is why they talk about temporal constraints and limited bit strings.
This has been one of the holy grails of image compression for quite some time.
Their press release claimed that this worked on 'practically random' data. Even these guys aren't crazy enough to say they can compress true random data. The question is what does 'practically random' mean? Does that include 98% of the files on my harddrive? Or is this something that is only going to work on text files?
At least PGP uses various timing values for random data as well (the timing of typing in addition to some other timing sources, I believe). If it was just typing then that would be scary: How many "random" keystrokes seem to always have "asdf"? There is nothing random about typing at a keyboard.
The company website is all Flash. Well, that blows my opinion of them completely. All glitz and no substance. That changed my opinion from 95% sure it was a pile of BS to 99.99%.
Need a Python, C++, Unix, Linux develop
The reason this might work is that 99.99% of the data you are surrounded with from day to day is NOT truly random - things like images and sound are nearly random in their nature, but neither of them is truly random. (Because in that case you'd be looking at an image of static and listening to white noise.)
So even though their algorithms won't work on truly random data, they will work on the 99.99% that is not that random - and if they're correct in what they say, they've developed new techniques for exploiting this un-randomness. I still don't believe their 100:1 ratios are believable, but if they are only 5% better than the current best algorithms, that's still a major step forward.
If one finds a way to predict (i.e. compress) "random" numbers, then they are no longer random. That means the data has some deeper mathematical structure.
What could happen is that so-called "random" information in human cultural datasets is far from random and highly compressible.
Mathematical breakthrough from the same county that gave us the Butterfly Ballot Ballyhoo? Hard to believe. ;-)
Anyway, they're still working on tiny "bit strings" due to not yet overcoming the "temporal constraint" barrier. So, don't get all excited just yet.
-- @rjamestaylor on Ello
If so, this is actually on-topic.
[yoshi@ilp.ath.cx]# apt-get zeosync /dev/hda* HD_backup.zeo
[yoshi@ilp.ath.cx]# zeosync -compress
[yoshi@ilp.ath.cx]# ls
-rw------- 1 yoshi users 1 Jan 08 14:25 HD_backup.zeo
Oh, that's right never.
[windows users: the bold 1 would be the file size of all backed up partitions on the primary disk]
Get your Unix fortune now!
At least his scam was believable enough to fool a thousand people. ZeoSync has got to choose a more believable scam to beat a 17-year-old.
This is not a troll. It's simply a question of why so many slashdot headline descriptions start off sensibly and subject-oriented but end with some completely off-topic, blatantly offensive and generally incorrect remark?
Taco and Michael do it the most, but I've seen other posters doing it as well, and I can't help but think that it's probably intentional to some degree just to stimulate conversation in the comments section. A prime example on this post would be, math majors and EE's - something to liven up your drab dull existence today. If I was a math major or EE, that would piss me off. Period.
Come on guys, if you want a website with less than a 20/80 percent signal to noise ratio respectively then climb out of your little sandbox and start acting like professional adults. P.S., if this gets moderated as troll or flamebait, then you're completely brain-dead and didn't read it... so go ahead and mod down appropriately(I know you will).
A musician without the RIAA, is like a fish without a bicycle.
Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.
i.e. - some guy in research made a little doohickey that compressed "Your Mom!" to the bit string 100101010.
"Nothing to see here, folks! Move along, move along!"
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
After reading through their press release, I don't think ZeoSync is claiming anything impossible. First off, I don't think the input data is random at all. I think it's normal data.
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences
It seems like this is the first step in which the input data is transformed into "random" data which isn't actually random, because each byte (or word) has a difference of one bit with the bytes (or words) before it and after it.
Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents
I'm not sure what a "complex combinatorial series" is, but it sounds to me like they might be trying to first convert the input data string into single-bit-variance strings and then look at these strings as values of a function representable by a "combinatorial series", which might be a Taylor series, or the frequency domain output of an FFT or something like that. Of course it might be difficult to find a sufficiently compactly representable function whose values are the intermediate single-bit-variance strings, but presumably that's why they say
Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology
It would seem to me that these guys have taken the idea of the standard compression scheme and turned it around - so it takes in random data and spits out non-random data. Somehow (possibly through high-speed buzzword factoring) this new data is smaller, though.
I mean the new data must be non-random, right? Because otherwise you could just shove it back in and make it smaller. Oh, but wait, they have a universal first stage that makes any data random. This is all very humorous.
Anyway, does anyone know, how random is the output of a standard compressor like gzip?
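One rough way to poke at that question is to measure the byte entropy of a compressor's output and then feed the output back in. The sketch below uses Python's zlib (the same DEFLATE family as gzip, not gzip itself) on made-up word soup; the word list, the seed and the sizes are all assumptions picked for illustration.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 would be perfectly uniform)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Made-up sample: 50,000 words drawn from a tiny vocabulary (illustrative only).
WORDS = [b"data", b"random", b"compress", b"entropy", b"bit", b"string",
         b"pattern", b"lossless", b"ratio", b"shannon", b"the", b"of"]
rng = random.Random(0)  # fixed seed so the run is repeatable
text = b" ".join(rng.choice(WORDS) for _ in range(50_000))

once = zlib.compress(text, 9)    # first pass: plenty of redundancy to remove
twice = zlib.compress(once, 9)   # second pass: fed the already-compressed bytes

print("sizes:", len(text), len(once), len(twice))
print("bits/byte before:", round(byte_entropy(text), 3))
print("bits/byte after: ", round(byte_entropy(once), 3))
```

Byte entropy is only a crude test of randomness, but it does show the first pass pushing the data toward "looks random", which is why a second pass has little left to work with.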
Justin
"Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
Lossy compression of data is possible by two methods: dropping data, and recognising specific patterns. All compression routines are specialised for some form of data. That's why JPG and GIF files can vary so much at storing the same picture, both in quality and file size.
Lossless compression of truly random data is impossible. Take a random 5 digit number. You can ONLY represent 100000 different numbers using five digits. If you're using less than five digits then there aren't 100000 discrete combinations and you've lost data.
The only way a five digit number could be compressed is if you had either nonrandom data (EG: a 5 digit number using only even digits) OR if you accepted data loss (EG: round off to nearest 10 - compressing by one digit).
There's no way around this; it can't now - and never will - be done.
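The counting behind the five-digit example can be checked by brute force. This is only a sketch: `toy_compress` is a made-up rule (keep the last four digits), and any other rule mapping five digits to four would hit the same wall.

```python
def toy_compress(s: str) -> str:
    """Hypothetical 'compressor': squeeze five digits into four by dropping the first."""
    return s[-4:]

seen = {}        # compressed form -> first input that produced it
collisions = 0   # inputs whose compressed form was already taken

for n in range(100_000):          # every five-digit string 00000..99999
    s = f"{n:05d}"
    c = toy_compress(s)
    if c in seen:
        collisions += 1           # two different inputs now share one output
    else:
        seen[c] = s

print(len(seen), "distinct outputs,", collisions, "inputs that cannot be told apart")
```

With only 10,000 four-digit outputs available, 90,000 of the 100,000 inputs have to share an output with something else, so they cannot be recovered.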
If they are such a great and groundbreaking company then why don't they check the spelling and grammar on their site?
ZeoSync's HTML site will be available January 13, 2002 with costumer service agents providing chat assistance.
Or will there be online help for Halloween costumes? I am sorry but I think this is a ruse just like the Seti@Home Accelerator
If they're talking about compressing what you find in a typical user's documents, or perhaps executable programs, it's *possible* that there's enough redundancy to come up with that kind of savings.
If they're talking about 100:1 compression of a pile of bytes out of /dev/random, I flat-out don't believe it.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
The company and its associates are in South Florida, a renowned area for illegal stock activity. The stock of one of the two associates listed on the page has gone from 0.05 to 0.10 in the last day on unusually high volume.
This is no different than IAUS from a few years back. They claimed to have a new data encoding that could significantly increase channel throughput. Nothing came of it.
Their website is an irritating mass of fussy flash animations, and their html site is apparently down until January 13th.
- Transforming the data to a complex vector space, C^n if you will.
- Using some very complicated seed and algorithm to generate randomish data in this complex domain that approximates the transformed data.
- Investigating the differences, and storing the differences with a "complex combinatorial series".
Yes it sounds like crap but it's not as empty as social texts.
2k2 actually means "2200", not "2002". The k marks the decimal point and supplies the metric multiplier; i.e. 2M2 is 2,200,000; 2G2 is 2,200,000,000; 2n2 is 0.0000000022.
Not my standard, by the way; it's been used in electronics as a shorthand for component values for years.
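For anyone who wants to see the shorthand spelled out, here is a minimal sketch of a parser for it. The function name and the exact set of multiplier letters are my own assumptions, not taken from any standard's text.

```python
# Assumed multiplier letters; the letter also stands in for the decimal point.
MULTIPLIERS = {"n": 1e-9, "u": 1e-6, "m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def parse_value(code: str) -> float:
    """Parse shorthand such as '2k2' (2.2 thousand) into a plain number."""
    for i, ch in enumerate(code):
        if ch in MULTIPLIERS:
            whole, frac = code[:i], code[i + 1:]
            # The letter sits where the decimal point would be.
            return float(f"{whole or '0'}.{frac or '0'}") * MULTIPLIERS[ch]
    return float(code)  # no multiplier letter: an ordinary number

for code in ("2k2", "2M2", "2n2", "2002"):
    print(code, "->", parse_value(code))
```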
Uhhh, this sounds like bunk to me, anyone want to comment. They claim to be doing "Multi Dimensional" encoding:
<meta name="description" content="ZeoSync's mission is to improve all existing and traditional communication systems. As the world's first and only provider of multi-dimensional encoding technologies, we will introduce affordable microchips and software to the global telecommunications community. We will radically improve network performance while simultaneously creating excellent equity participation for our shareholders.">
<meta name="keywords" content="technology, chip, microchip, satllite, binary code, code, compression, multi-dimensional, encoding, transmission, invest, broadcast quality">
Hmm...didn't I just leave 2001 behind a few days ago? Maybe I just drank too much Bud Light that night.
I am in no way a compression specialist but: ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM).
;-)
in this phase we are going to randomize the hard work you want to send over the internet, effectively destroying it (unless you have the seed of course)
Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents
Now it's going to find patterns in the so called "randomized" data, and probably writing those down, now irreversibly destroying your data...
s. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
and they are putting it off for a year too.... hmm...
"By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."
Jesus! these guys are geniuses!!! better compression REDUCES the cost of communications... damn... I wonder what else they envision?? that the files will be smaller too???
my conclusion
we can randomize any string, store 1 byte, then generate another random string... which, because it is random, has a snowball's chance in hell of being the same
correct me if I'm wrong, but this really seems to be a load of crap to me. Plus they use WAY too many buzzwords
Fighting for peace is like fucking for virginity
What does the article mean with "random data"?
1) Data with maximal entropy?
2) Random file picked from Internet?
In case of 1), I'd say the article is crap. If bits in the data have absolutely no dependency between them, i.e., redundancy, (also between non-adjacent bits) it is absolutely impossible to compress them. It's not even good as a fairy tale.
In case of 2), ok, 1:100 may be possible for most non-compressed data. The new JPEG-2 algorithms can do 1:100, but it's lossy. Text compression algorithms might do 10:1 on typical text, but they are also quite fast and don't therefore find all redundancies. For example, Huffman encoding is at simplest done with just single characters, and not much longer sequences, the searching of which takes a lot of time. The redundancies do not also have to be linear; for example "wDoRrOdW" ('word' written first in lower case, and then with upper case to opposite direction) would be difficult to compress completely, although it clearly has high redundancy.
Removing all redundancies would require finding the shortest description, i.e., a program that prints the string. To find it, we have to go through all possible programs that are shorter than printf("wDoRrOdW"). Many of them don't even terminate (for example "while(1);"). Complete search is therefore impossible; all algorithms make guesses about the topology of the search landscape, and don't search everything.
I have absolutely no doubt that this method works well within the theoretical limits, although it's of course always possible that it gets closer to the limits than any earlier method.
While this theoretically could work to reduce messages by upwards of 100:1, both the compression and the decompression would require huge amounts of CPU time for a message of any reasonable length. Even if you had pre-built a table of 'short unique-prime-factors integers' to make finding the optimal composite to send back easier, you'd still have to generate some huge N-digit number, and then the decoder would have to be able to recalculate that N-digit number from the prime representations.
So while I'm sure this is possible, computing speeds are nowhere near fast enough. And it would appear this company is trying to pitch it for use in compressing internet traffic. Maybe on 512-byte messages they can get something, but I doubt if it's anything close to effective for internet use.
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
given: they've succeeded in losslessly compressing random data (right...)
given: compressed data looks like random data
does this mean they can compress compressed data? So I can run their algorithm over and over again and compress [arbitrary large thing] to [something really small]? Something's not right here. Though it would be fun to have this exchange:
me: I've got a linux distro on this floppy disk
foo: Wait, this disk is empty
me: No...see that bit there?
The flaw here is simple,
When you reorganize the string of data, and sort by value, you must retain information on how to restore the string to its original order. There is no efficient way to save this "undo" information without negating the benefit gained from compression.
For example:
Given a series of random numbers: 34, 8, 244, 127
If you reorganize them by value: 8,34,127,244
You can create redundancy if the string is large enough - for 8 bit values, a string of 25,600 values should produce a lot of repetition - in this example, there would be an average of 100 repetitions per value (100*256=25,600).
This is nice until you try to decompress the file. Without a record of how to reorganize the values, you are left with junk.
Even if you keep a record with info for reorganizing the data, the overhead needed to store the undo info outweighs the compression benefit.
If you did find an efficient way to store the undo information, it would be more effective to simply apply this algorithm directly to the random data!
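The size of that "undo" record can be estimated directly. A sketch, assuming 25,600 uniformly random 8-bit values and a generous 16 bits for each of the 256 counts that describe the sorted data; the lower bound on the undo record is log2 of the number of distinct arrangements, n! / (c_0! c_1! ... c_255!).

```python
import math
import random
from collections import Counter

def log2_factorial(k: int) -> float:
    """log2(k!) via the log-gamma function, so huge factorials stay manageable."""
    return math.lgamma(k + 1) / math.log(2)

def bits_to_undo_sort(values) -> float:
    """Minimum bits needed to say which arrangement of the sorted values was the original."""
    n = len(values)
    return log2_factorial(n) - sum(log2_factorial(c) for c in Counter(values).values())

rng = random.Random(0)  # fixed seed, only so the run is repeatable
data = [rng.randrange(256) for _ in range(25_600)]

raw_bits = 8 * len(data)              # storing the values as they are
undo_bits = bits_to_undo_sort(data)   # cost of remembering the original order
sorted_bits = 256 * 16                # sorted data = 256 counts, 16 bits each (assumed)

print("raw:", raw_bits)
print("sorted + undo:", round(sorted_bits + undo_bits))
```

The sorted data itself shrinks to almost nothing, but the undo record takes back essentially everything that was saved.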
X
wow, that's amazing. i bet the person you're replying to didn't know that. see, they thought that perl actually consists of purely random uncompressable data. they weren't making a jab at perl's supposed unreadability, no they were not. it's a good thing you came along to educate us all.
ZeoSync's HTML site will be available January 13, 2002 with costumer service agents providing chat assistance.
So they have a set of professionals in charge of "dressing up" their technology? Isn't that normally called the marketing department?
...Come on, write the right year at least.
Oh wait! We got it all wrong! It's not 100:1, it's 0b100 to 0b1! That makes more sense.
They have 1 line of text in their opening page, and it refers to Costumer support. Or do they sell costumes? Don't fall for this fraud. What a joke.
Lossy compression... although I believe I will stick with LZip for now, as I find it MUCH faster when compressing large files.
Hmm... Press release...
Hey! There's some contact info... Let's see...
@wilsonmchenry.com, must be http://www.wilsonmchenry.com/ , I wonder what they do...
Let's see: "Strategic Business Communications"
Doesn't that mean they write press releases like that? To get maximal publicity?
They seem to be doing a pretty good job!
(Come on people! You can't be that ignorant!)
"In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point..."
Last I checked, an edge is one dimensional, and a point is zero dimensional. 1337 math skilz dude!
Things fall apart, it's scientific.
This fact burned the Russians when they were
trying to generate long series of random numbers
for their operatives. They got a bunch of typists
to bang away for a while, the Americans realised
this, and broke their code.
Check out "Code Breaking" by Kippenhahn (sp.?)
throwing money down a hole.
A lot turns on what they mean by "random". If you think sound, and can extract a white noise component, you could mathematically say X% are truly random bits (where any bit string can be replaced by [nearly] any other).
All compression is the creation of virtual machines that have instructions like "write a zero" "write a one" "copy 8 bits from 24 bits ago". More instructions need more bits to specify. Truly random data would require random instructions.
Aha! Now I get it, it's a shell script to run gzip multiple times! Too bad I got prior art:
#!/bin/bash
# supercompress filename number_of_iterations
# Each pass gzips the file into a temp file, then gzips that back over the original.
x=$2
while (( x > 0 )); do
  gzip -c "$1" > "/tmp/$$"
  gzip -c "/tmp/$$" > "$1"
  let x=x-1
done
Yay! All your codecs are belong to us!
Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
None of their words marked (TM) are trademarked, search at http://tess.uspto.gov/bin/gate.exe?f=tess&state=h0 9ukh.1.1 and see.
Our patent-pending technology ReductioAdAbsurdum (TM) will likely be infringed upon by this new technology, so rest assured that our lawyers will scrutinize their compression algorithm closely.
My car gets 40 rods to the hogshead, and that's the way I likes it!
pi
In two bytes I've exceeded a 500000:1 ratio on transmitting truly random compressed data to you -- were I to use the ASCII code character for pi I could double that throughput.
This proves that, in theory at least, one can compress random data sequences.
https://www.gnu.org/philosophy/free-sw.html
I have a simple algorithm for lossless compression down to one byte. Now the key for de-compressing it happens to be the same size as the original data.....
This signature is a waste of 42 characters
First of all, it's impossible to achieve any type of ratio on random data. Good quality random data such as that from random.org simply can't be compressed. Period.
Data compression works by finding patterns in seemingly random data. A standard video stream really doesn't contain that much unique information. That's why we can compress it pretty well without losing too much data. However, random data is 100% unique and you must have, say, 8 bits representing 8 bits because there is no other way to represent it without losing information.
The claims by this company are impossible. I read their technical description and I'm still trying to turn that around in my head. It doesn't make sense. It's called the rule of limited entropy and no data compression breakthrough can break it. You can't just make data appear out of thin air.
Is it just me, or is this another company looking to swindle over a few VC investors? The only type of program I see here is the lie, buy, and sell high kind -- I don't buy it.
"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
There is one method that might work - on some data.
:(.
Godel encoding is an old technique for compression, with a fast decompress (P time). Unfortunately the compression stage is NP (maybe NP-hard, can't remember).
The method relies on expressing the number as an algebraic product, that can be expressed in less space than the result.
For example, in ASCII, the string (in RPN) "7 7 ^ 34 * 99 ^ - 7 p" has 18 characters. Its expansion has 740 characters. That's a compression ratio of, what, 35:1. [OK, so you'd never actually do it in ASCII, but it shows the technique]
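The expansion is easy to try with big integers. This sketch assumes the leading part of the RPN string, "7 7 ^ 34 * 99 ^", means ((7 ** 7) * 34) ** 99; the trailing "- 7 p" is not modelled, so the digit count may differ a little from the figure quoted above.

```python
# Assumed reading of "7 7 ^ 34 * 99 ^": ((7 ** 7) * 34) ** 99.
expression = "(7**7*34)**99"       # the short description
value = (7 ** 7 * 34) ** 99        # the big number it stands for

expanded_digits = len(str(value))  # how many decimal digits the expansion takes
ratio = expanded_digits / len(expression)

print(len(expression), "characters expand to", expanded_digits, "digits")
print("ratio roughly", round(ratio), ": 1")
```

Decoding is a handful of multiplications; the hard direction is being handed an arbitrary 700-odd digit number and finding a short expression for it, which almost all such numbers do not have.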
The advantage of the technique is that it gives better compression on larger numbers, in principle. In general, however, other factors come into play, and it bottoms out. My analysis suggested it bottoms out somewhere around 120-150:1.
However, the disadvantages of the scheme are numerous. Firstly, there is no known algorithm to encode efficiently. The system can't stream, like gzip and LZW can. Thus far, it's just an interesting idea.
I mention this because the multi-dimensional mathematics that they are referring to has a passing similarity to something I was playing with a couple of years ago, to look for faster algorithms (or any, really, other than brute force). It was cute, but always slower than brute force, save a few best cases.
Put my best guess at max compression together with the uncanny similarity of the maths, and it might go like this. Namely, you do a split into some expression, and then re-apply the algorithm to a sub-expression. Then throw it through a symbolic computation routine to optimise it a bit, and gzip the whole lot. It would only work well on some numbers, but you can pad it slightly to get a very different number, and try again until you get a good fit.
So stepwise:
ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner
Pad to a value that gives good compression
Once randomized, ZeoSync's BinaryAccelerator encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect equivalents
Godel encode.
The reference to many iterations suggests that they reapply the process to any large enough numbers left in the expression.
And that's a scary match, in my mind.
Of course, pinch of salt. There was a comment above about the odds of any compression technology being valid being equal to the claimed compression rate. I can't see how this might work. But I'm not writing this off just yet, it rings just true enough.
I have an algorithm which will compress any random data down to one bit. Here goes:
1: Represent the data as one big integer (this is easy, treat all the bytes in the file as one number).
2: Subtract 1 from the data.
This algorithm can be reapplied any number of times, until the data that is left is a single bit representing zero. Hell, since you know it's zero, why not get rid of it? So my algorithm can losslessly compress any data to 0k.
What do you mean you need to know how many times the algorithm was performed? How many bytes do you reckon that will take to express?
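For the record, here is the bookkeeping written out. A sketch only: the subtraction is done in one jump rather than one step at a time, because stepping would take as many iterations as the value of the file itself.

```python
def compress_by_subtracting(data: bytes) -> tuple[int, int]:
    """Subtract 1 until the number is zero; return (what is left, how many steps)."""
    n = int.from_bytes(data, "big")   # step 1: the file as one big integer
    return 0, n                        # step 2, repeated n times, leaves 0

def decompress(steps: int, length: int) -> bytes:
    """Add 1 back `steps` times (again in one jump) to rebuild the file."""
    return steps.to_bytes(length, "big")

original = b"any data at all"
leftover, steps = compress_by_subtracting(original)

counter_bytes = (steps.bit_length() + 7) // 8   # bytes needed to write the step count
print("leftover:", leftover, "| counter needs", counter_bytes, "bytes",
      "| original was", len(original), "bytes")
```

The step count is the file, so the "compressed" zero plus its counter is exactly as big as what you started with.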
"I think he was truly surprised at how little I cared about how big a market the Mac had" - Linus on Jobs
See the comp.compression FAQ:
http://www.faqs.org/faqs/compression-faq/part1/section-8.html
From the press release:
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without
limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
Anyone care to wager whether a group of guys sitting around at lunch had the following conversation?:
dude 1 - H3y! | g0+ +h|z 31337 |d34! 13+'z pu+ 0u+ 4 pr3$z r3134$3 +h4+ \/\/3 c4n c0mpr3$z d4+4 4+ 100:1!
dude 2 - You're trippin man.
dude 3 - No...that's a seriously cool idea. We can get that VC we always wanted and then skip the country.
dude 2 - d00d! *starts writing press release*
What is your Slash Rating?
> Why couldn't it be possible to have a single algorithmic solution that works on the entire dataset simultaneously?
Because if you have a "perfect" compression algorithm, then it is not always reversible. That's because it maps a larger set of files (all files of N bit length) to a smaller set of files (all files of N-1 bit length, or whatever). Therefore, the mapping can't be an injection (some two or more large files are mapped to the same small file), and therefore not reversible. So you can't uncompress the data.
But fortunately, you can always get by with a single bit of constant overhead. Simply set that bit to 0 if the rest of the stream is compressed, to 1 if it is uncompressed. Now, if your algorithm produces a larger file than the input, just leave it uncompressed and you have only lost 1 bit.
The argument about no "perfect" compression algorithm existing is overrated, IMO., though people like to point it out whenever a compression algorithm pops up. Of course a press release wouldn't mention that they sometimes increase file sizes by 1 bit!
(I still do think their technology is bullshit, though.)
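The constant-overhead trick looks like this in practice. A minimal sketch using zlib as the underlying compressor and a whole flag byte rather than a single bit, purely to keep the code short; the flag values are arbitrary choices.

```python
import os
import zlib

STORED, DEFLATED = b"\x00", b"\x01"   # flag values are an arbitrary choice

def pack(data: bytes) -> bytes:
    """Compress if it helps; otherwise store the data untouched behind a flag."""
    body = zlib.compress(data, 9)
    if len(body) < len(data):
        return DEFLATED + body
    return STORED + data               # worst case: one byte of overhead

def unpack(blob: bytes) -> bytes:
    """Read the flag and undo whichever branch pack() took."""
    flag, body = blob[:1], blob[1:]
    return zlib.decompress(body) if flag == DEFLATED else body

text = b"spam and eggs " * 1000        # very redundant
noise = os.urandom(14_000)             # nothing for zlib to find

for name, sample in (("text", text), ("noise", noise)):
    print(name, len(sample), "->", len(pack(sample)))
```

Every input round-trips, redundant input shrinks, and random input grows by exactly the size of the flag.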
OWS was a Lossless Fractal Compression Program that had similar claims of unheard of compression.
It turned out that all the program did was create an encrypted directory listing of the files you were compressing. The program would then use the listing to search your hard drive for the same file you thought you had compressed when uncompressing the archive. It then copied the file over to the directory in which you were uncompressing your fake archive.
When it could not find a file matching the listing it simply told you it had a disk read/write error.
Here is a link to the google archive about it.
http://groups.google.com/groups?q=+OWS+compression&selm=3131e2b5.39555131%40news.az.com&rnum=1
I think people just like this type of press releases.
People... people... quite simply... in science, breakthroughs are very rarely "sudden". Usually, research has been going on for a while and intermediate results are found along the way and published.
That's how research works. You very rarely start from scratch and find the holy grail on day one.
Of course, there are exceptions, but when a surprising discovery comes along, it is usually a long way from being cooked and needs further research.
Hard work is needed almost all the time. Period.
this could be achieved is by first running the 'random' data through some kind of preprocessor and doing some hocus pocus. That said, this claim still smells funny.
I got an early copy of their compression technology and fed their press release into it. I was surprised how well it did compress it. I attach the output below so that anybody else who has the decompressor can read the press release in its entirety without having to download it all:
"Bullshit".
There's just not a usable decompression method.
For any algorithm to be of practical use, it must be relatively speedy on current hardware. If it takes a gazillion horsepower to compress and decompress the data, it's of no practical application until processor technology can support it inexpensively.
Here are a couple of "good" compression ideas friends have presented to me:
1. Since PI theoretically contains every possible string of digits that can exist, why not use some index into PI to compress data? For example, surely the string, "Hello, World", when converted to ASCII numerical values or some other numerical sequence occurs somewhere within the value of PI (maybe, say, starting at the 10 trillionth digit). I pointed out that most likely, the index # into PI (and you can encode it however you please) would average out to be as big, if not bigger, than the data to be compressed. He didn't believe me, worked on it for a year before discovering that, indeed, the index # into PI ends up being as big, if not bigger, than the data being compressed.
2. Same thing with random number generators. If you have an algorithm that is good enough to generate all possible strings of numbers, then why not just store a random number generator "seed" that represents the data to be compressed. Feed the seed into the random number generator, and generate "random" numbers until the file is restored. I pointed out that the size of the seed predetermines the possible number of sequences that can be represented. For example, a 16-bit seed can only possibly represent 65k different outputs, with most of those outputs not being useful to represent actual data. Thus, the seed size required to represent any and all data would be so great that the seed would end up being as big, if not bigger, than the original data. He attempted to prove me wrong, and came up with an algorithm that broke a file into chunks that could be represented by random number seeds, but the algorithm ended up, at best, producing output about 25% smaller than the original, and averaged files bigger than the original uncompressed data.
The list goes on... when we dress up such things in really fancy and complicated mathematical clothing, it just takes us a little longer to realize that, indeed, it really isn't going to compress things better. Sometimes, it takes enough longer that people will build up a company to sell the soon expected product, only to die when the product cannot be delivered.
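The seed limit from idea 2 can be counted directly. A sketch, with Python's built-in Mersenne Twister standing in for "a random number generator" and 4-byte "files" as the target; both are assumptions chosen to keep the run short.

```python
import random

SEED_BITS = 16      # the 16-bit seed from the example above
OUTPUT_BYTES = 4    # try to reproduce arbitrary 4-byte "files"

def expand(seed: int) -> bytes:
    """Stand-in generator: Python's random.Random seeded with `seed`."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(OUTPUT_BYTES))

# Every output any seed can ever produce.
reachable = {expand(seed) for seed in range(2 ** SEED_BITS)}
possible = 256 ** OUTPUT_BYTES
coverage = len(reachable) / possible

print(len(reachable), "of", possible, "files are reachable")
print("coverage:", f"{coverage:.6%}")
```

However good the generator, 65,536 seeds can name at most 65,536 files, so nearly every 4-byte file has no seed at all, and covering all of them needs a seed as long as the file.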
Well...having never seen my 14.4 modem move data that quickly I decided to do a test. A friend and I dialed into each other's modem. We created a 300k file consisting of nothing but upper-case "P". Guess what? We got that fabled 150,000 bytes/sec!
It goes to show, you can do amazing things with random data as long as your random data is carefully selected. Don't tell me that an infinite number of monkeys couldn't produce a 300k file of upper-case "P".
Kind thoughts do not change the world
''randomly'' typing on your keyboard is a pretty crummy source of randomness... ;)
Otherwise known as XOR.
_O_
.|< The named which can be named is not the true named
If we look at what they are actually saying (not much actually) it seems there might be a misunderstanding. I don't think they mean to actually compress random files. The multitude of the word analogue and some other wording make me want to wait for more details before saying 'no way'.
It seems that the articles passed so many hyping filters that nothing meaningful can be discerned.
Two witches watch two watches, which witch watch which watch,and which watch does which witch watch?
New Compression technique gets 100:1 compression on random data.
Caveat: if Random is defined as an arbitrarily long series of identical values. Deviation from this may cause less than optimal amounts, specifically a nominal compression ratio of 1:1.
MORAL: just redefine the question and all problems in science and technology go away.
100:1 lossless compression on any data is impossible. Why? Here's why:
What are the requirements every (lossless) compression software must follow? It must be able, for every array of bits of length N, to produce another array of length M, which can then be translated back into N.
Look at it this way:
You have an array of 1000 bits. They can be combined in 2^1000 ways. Because we imply that it is possible to go N -> M -> N, we also must have (at least) 2^1000 different M arrays, and that is only possible if you use at least 1000 bits, so no compression is possible.
So how does software compress stuff? Well, it uses repetitive data and shortens it. In the extreme scenario, when none of the data is repetitive, the file is even longer (because of compression software overhead).
Therefore it is impossible to compress (any)data 100:1 losslessly. Yes, you can compress it 100:1 if you just have an 1000-bits long array of ones, but not in any other case...
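Written out, the counting step looks like this (a sketch of the standard argument, with N the input length in bits; the 10-bit figure is just 1000/100 from the example above):

```latex
\#\{\text{bit strings shorter than } N\}
  \;=\; \sum_{k=0}^{N-1} 2^{k}
  \;=\; 2^{N} - 1
  \;<\; 2^{N}
  \;=\; \#\{\text{bit strings of length } N\}

% For N = 1000 and a 100:1 ratio, outputs are at most 10 bits long:
\sum_{k=0}^{10} 2^{k} \;=\; 2^{11} - 1 \;=\; 2047 \;\ll\; 2^{1000}
```

So at most 2047 of the 2^1000 possible inputs can come back out intact from a 100:1 scheme.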
Boky
boky
Maybe it is random data. See, they just send over the seed number, the number of repetitions, and a key defining the machine/random number algorithm used. Cuts anything down to 12 or 16 tops.!
Sounds like a winner idea to me. Sign me up.
The following is a proof that a perfect compression algorithm has an average compression rate of ONE. Yes, ONE. That translates into NO COMPRESSION WHATSOEVER. A short aside on why compression is used if it "doesn't do anything" follows. I'm not doing this rigorously because I don't remember the rigorous proof, before anyone asks. This is sometimes referred to as "the enumeration proof."
Assume any given data N to be compressed can be viewed as a binary number. Assuming the algorithm works on any given data (sure I can say that your file is "1" compressed and refuse to compress anything else, but do you need my help for that?), it must be able to compress all numbers from 0 to N. It must also give UNIQUE compressed answers (I can say that ALL files are "1"...decompression's tricky, though). Therefore, if the algorithm is used to compress all numbers from 0 to N, it will return the numbers 0 to N in a different order, in the BEST CASE. If the algorithm isn't perfect, it will return numbers GREATER than N as well.
So why do we use compression? Because we don't compress numbers from 0 to N. We compress things that have patterns. Lots of them. Because of that, algorithms can make additional assumptions (some quantity of repeated data will be present in the data set being the usual one). Because of this, the average compression of an algorithm, when used on files picked at random from real use and not enumerated numbers, is usually less than one (i.e. usually makes the file smaller). If you custom-write a file in binary that contains little to no patterns, you'll find that most compression algorithms will either make it larger or leave it the same. The last thing I'll mention is an example of where compression works really well: text documents. Since most letters in a document are within a certain ASCII range, the document can be reduced in size. For example, if you use only 7-bit ASCII characters, i.e. no character above 127 (and your document has no header. Shh...it's an example), the first bit of every byte in the file will be 0. The compression algorithm can see this, and remove all these zeros. It will have to add a couple bytes at the end mentioning how the file was compressed, but you'll be getting rid of 1/8th of the file for the cost of a couple bytes. That's pretty good.
I'm sure no one will mod this up, because no one likes anonymous cowards, but it might as well be here for posterity's sake.
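The text example can be sketched for plain 7-bit ASCII, where the top bit of every byte is 0 and can simply be dropped. The 4-byte length header and the function names are my own choices for illustration, standing in for the "couple bytes" of bookkeeping.

```python
def pack7(text: bytes) -> bytes:
    """Drop the always-zero top bit of each 7-bit ASCII byte."""
    if any(b > 127 for b in text):
        raise ValueError("input is not 7-bit ASCII")
    bits = "".join(f"{b:07b}" for b in text)
    bits += "0" * (-len(bits) % 8)                  # pad to a whole byte
    packed = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return len(text).to_bytes(4, "big") + packed    # 4-byte length header

def unpack7(blob: bytes) -> bytes:
    """Read the length header, then rebuild each character from 7 bits."""
    n = int.from_bytes(blob[:4], "big")
    bits = "".join(f"{b:08b}" for b in blob[4:])
    return bytes(int(bits[i * 7:i * 7 + 7], 2) for i in range(n))

sample = b"Lossless compression relies on patterns in the data. " * 40
packed = pack7(sample)
print(len(sample), "->", len(packed))
```

It only wins because the input was promised to be ASCII; hand it arbitrary bytes and it has to refuse, which is the enumeration argument again.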
In the face of everybody, quite logically, saying this will not work, I would like to propose a means by which it could. It goes like this:
Let's try to compress a long string of bits, say 1 billion of them.
Well, as everyone knows, there is an incredibly large number of ways of setting zeros and ones in 1 billion bits. I would like to propose that a large number of those combinations will NEVER be used in any meaningful way by mankind in its whole history. Nor by any intelligent beings in the whole history of the universe. There just isn't going to be enough space or time for all those combinations to be used. Let's say that only 1 percent of the combinations will be used by some being, somewhere at some time. All we need to do is assign numbers to those useful combinations and forget the rest. BINGO we have just compressed the data by a factor of one hundred.
Now if 1 billion bits is not long enough to introduce the required redundancy, I'm sure a longer string would. I will leave the calculation of the shortest string where all combinations are used by someone eventually in the universe as an exercise for the reader.
So how do we find out which combinations are now, or are going to be, useful? Well, for this we need some serious physics. You know, quantum mechanics has a way of exploring all possibilities pretty quickly. However that's a bit out of my league.
Cheers.
It's true that you need to find patterns to compress data. What constitutes a pattern though, can be more complicated than what gzip offers.
For instance, I can come up with a number of statistically random sequences that can be compressed very small if you "know the pattern", but will fail completely to be compressed with gzip. For example, I could take 11 MB of the binary digits of pi -- a very short program can produce these, but gzip will totally fail in compressing it. Or I could encrypt 11 MB of zero bits with RC4; if I know the key then it is also extremely easy to compress -- otherwise, it will be nearly impossible.
So the art, really, is in finding the patterns. I'm pretty sure that ZeoSync's stuff is bullshit, but it doesn't *necessarily* mean that this kind of thing is not possible (just... unlikely).
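The encrypted-zeros case is easy to reproduce. In this sketch SHA-256 in counter mode stands in for RC4 (my substitution, not what the comment used), the key is made up, and the size is 1 MB rather than 11 MB to keep the run short.

```python
import hashlib
import zlib

def keystream(key: bytes, length: int) -> bytes:
    """Pseudo-random bytes from SHA-256 in counter mode (a stand-in for RC4)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

key = b"hypothetical key"
size = 1_000_000
data = keystream(key, size)      # same bytes as "all-zero file XOR keystream"

zlib_size = len(zlib.compress(data, 9))   # what a pattern-blind compressor manages
description = (key, size)                 # what someone who knows the pattern needs

print("zlib:", zlib_size, "bytes")
print("key + length:", len(key) + 8, "bytes (counting 8 bytes for the length)")
```

The data is anything but random, since a few bytes regenerate all of it, but nothing in zlib's toolbox can find that pattern.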
That's just amazing! Let's test it. Here's an idea of a pretty good test :
I'll prepare 257 files containing random data, which are each 100 bytes in length. Then, they'll be able to compress each of those files into a corresponding lossless compressed file which is one byte long! (Remember, this is supposedly 100:1 lossless compression of random data.)
Oh, wait a sec... How can they possibly represent 257 different files, with only one byte each? That one byte can only represent 256 different possible values!
What about if the files that I asked them to compress were only 2 bytes in length, instead of 100 bytes in length? Still, 257 of them. Since they claim to be able to do 100:1 lossless compression of random data, they should be able to do 2:1 lossless compression of random data. I mean, that's 50 times less impressive! But, wait... They still have to express 257 different files with only 256 different possible values!
Huh... How many different files are 2-bytes long? I guess there's 65,536 of them. I only wanted them to compress 257 different files each into a byte. The task of compressing 65,536 different files into one byte is almost 256 times harder than what they already can't do!
This is starting to sound like a theorem, or something!
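The counting argument above can be checked mechanically. This is only a sketch: `toy_compress` is a hypothetical 2-bytes-to-1-byte "compressor" invented for the example, and any other function with a one-byte output would hit the same wall.

```python
def count_files(n_bytes):
    """Number of distinct files of exactly n_bytes bytes."""
    return 256 ** n_bytes

def find_collision(compress, inputs):
    """Return two different inputs with the same compressed output, or None."""
    seen = {}
    for item in inputs:
        packed = compress(item)
        if packed in seen:
            return seen[packed], item
        seen[packed] = item
    return None

def toy_compress(two_bytes):
    # Hypothetical 2:1 scheme: XOR the two bytes into a single byte.
    return bytes([two_bytes[0] ^ two_bytes[1]])

# 257 distinct two-byte files, one more than a single byte can label.
files = [bytes([i // 256, i % 256]) for i in range(257)]
collision = find_collision(toy_compress, files)

print("two-byte files:", count_files(2), "one-byte outputs:", count_files(1))
print("colliding pair:", collision)
```

Two different files landing on the same output means the decompressor cannot return both, so the scheme is not lossless on that set.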
Education is the silver bullet.
stop modding me up and save some points for the other people
update comments set karma=-1, reason='offtopic' where sid=26315
_Maybe_ they've got an algorithm which compresses _random_ data by 100:1. The methodology, as I read it, sounds suspicious. They want to take seemingly 2-dimensional (serial, basically) data and encode it so it looks 3-dimensional, and somehow that helps compression.
Doesn't this encoding take some extra space? Seems to me as if there would have to be position data for this 2D stuff to become 3D stuff.
But honestly, the terminology aside, the mere fact that their website looks so fancy and "flashy" and that they've got as little technical detail as possible leads me to believe that this is not worth my time looking into. Anybody that puts this much effort into appearance has no substance to back it up, in my experience.
We'll know they went wrong once the SEC starts investigating for their wild claims. And if they've got the next compression method that beats all else, hey - cool. But I doubt it.
- Chris
The last company I heard of that claimed this (about 10 years ago) ALSO claimed to be able to compress their compressed files at 100 to 1 ... which could also then be compressed at 100 to 1 ... in fact, they claimed to be able to fit the Library of Congress (potentially) on a floppy!
While they WERE, in fact, able to compress data in the ratios described, the reporter pointed out that they were still working on the problem of data "de-compression".
god i wish i could remember that company...
It's no wonder...they were forced to develop something on this order of magnitude by their marketing department just to get their forty-umpteenth bazillion byte flash-enhanced website to be browsable over a lowly T1....
moto411.com
3.14159265...
2.71828182...
1.41421356...
Their press release ends with the following fine print. Enjoy!
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
This sounds of course like complete hot air to me. I wonder what the guys at random.org think.
I can understand people at Reuters being trolled by this crap, but Slashdot too? Wow.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
These algorithms always try to get close to the theoretical Shannon limit, but that includes things like allocating 1.25 bits to a specific bit sequence, hence room for optimization.
The 100:1 ratio can be achieved by using extremely lossy compression (they mentioned DCT and FFT, which are useful in lossy compression schemes).
Given the marketing hype, this may actually be what lies beneath.
Two witches watch two watches, which witch watch which watch,and which watch does which witch watch?
Most people here seem to have skipped their Information Science lessons, methinks. There's a lot of throwing around of the words "random", "information" and "compression", but most people don't seem to know what they mean.
In fact, TRUE RANDOM DATA (e.g. white noise) has the most information (the highest entropy) and is the most difficult to compress. Losslessly, that is. Funny thing here is that if you count lossy compression (such as mp3 coding), random data is in fact easy to "compress", because you simply code it badly and upon decoding obtain new (but different) random data (e.g. white noise).
So, you cannot compress TRUE RANDOM DATA. It has the highest entropy of all data (highest information density).
If you compress file A into a file B of 50% the size of A, then file B is MORE RANDOM than file A. The entropy per bit of A is lower than that of file B (the total information is the same, it is just packed into fewer bits). You could also view this as: the information density of file B is higher than that of file A.
The problem is that if A is a legible text file, we seem to think that A has information and B hasn't.
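A rough way to see "B is more random than A" is to measure the byte-histogram entropy of a file before and after lossless compression. This is a sketch under loud assumptions: the order-0 histogram is only a crude estimate of entropy, the word list is a made-up stand-in for legible text, and zlib stands in for whatever compressor is used.

```python
import math
import random
import zlib
from collections import Counter

def bits_per_byte(data):
    """Order-0 empirical entropy of the byte histogram, in bits per byte (max 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Hypothetical stand-in for a legible text file: words from a tiny vocabulary.
WORDS = ["the", "data", "is", "random", "file", "entropy", "bits", "compress",
         "noise", "signal", "claim", "press", "release", "theory", "code", "limit"]
rng = random.Random(0)
file_a = " ".join(rng.choice(WORDS) for _ in range(20_000)).encode("ascii")
file_b = zlib.compress(file_a, 9)           # lossless "file B"

assert zlib.decompress(file_b) == file_a    # nothing was lost

print("A:", len(file_a), "bytes,", round(bits_per_byte(file_a), 2), "bits/byte")
print("B:", len(file_b), "bytes,", round(bits_per_byte(file_b), 2), "bits/byte")
```

File B is smaller but each of its bytes carries more, which is also why feeding B back into the compressor gains next to nothing.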
My conclusion is that this company can not compress TRUE RANDOM DATA 100 to 1. If you read their press release they talk about "Practically Random Information Sequences". With this they probably mean things like audio, video, and the lot. Not random at all.
Somehow this reminds me of a joke from my student days. When a transmitter sends a string of zeros and ones to a receiver, there's a fat chance that some of the ones turn into zeros and the other way around. This is a problem. But there's a simple solution: simply make the transmitter and receiver so terribly bad that ALL data is received badly. All ones turn into zeros and the other way around. Place an inverter after the receiver, and there you have it, perfect transmission!
The flaw here of course is that under the worst possible conditions only 50% of the data is received badly....
Do you really trust lossless compression from a place that can't even count their own election ballots?
"Our data was completely random. It's just an odd coincidence that it came out as all 0s, really!"
ZeoSync "expects" to overcome the restraints? I can see those guys now. "Well, let's see. What if we came up with some mumbo-jumbo about 100:1 compression or cold fusion or what-not, and make an impressive-sounding press release that says we're right on the brink of a breakthrough. Then people will get excited and give us money, and we can sit around for a couple years pretending to work on the problem only to discover a previously unknown fundamental flaw."
Wow, I think I've just found a way to not work and still get people to give me money!
I hope I'm wrong though, because my poor 56k connection is in serious need of a boost.
It is impossible to losslessly compress all sets of N bits down to M bits where M < N.
Proof: Suppose there was a machine COMPRESSOR that took N bits of input and returned M bits of output. Now suppose there was another machine DECOMPRESSOR that took M bits of input and returned N bits of output. There are 2^N sets of N bits, ranging from 000...0 to 111...1. There are 2^M sets of M bits. Thus, DECOMPRESSOR has to produce 2^N different outputs from only 2^M < 2^N different inputs. At least one input must map to multiple outputs, and DECOMPRESSOR cannot decide which is the correct output without extra bits of information.
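Written out, the counting step looks like this. The second line is a small extension beyond the comment (an assumption for illustration): it lets the compressor pick any output length shorter than N, not just a fixed M, and the count still falls short.

```latex
\[
\#\{\text{inputs of } N \text{ bits}\} = 2^{N}, \qquad
\#\{\text{outputs of } M \text{ bits}\} = 2^{M}, \qquad
M < N \;\Longrightarrow\; 2^{M} < 2^{N}.
\]
\[
\#\{\text{bit strings shorter than } N \text{ bits}\}
  = \sum_{m=0}^{N-1} 2^{m} = 2^{N} - 1 < 2^{N}.
\]
```

So even with variable-length output, at least one N-bit input has no shorter code of its own.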
As I understand it, normal data is compressed by finding repeating strings and shoving them in a dictionary. This obviously won't work for truly random data. Since we all know that truly random data is a myth, why don't we use mathematics? Compare the string to something like pi, or e, or other equations that produce commonly occurring 'random' data. Of course the time it takes to compress would be very long indeed. But, thankfully, processor power is fairly cheap, and big businesses surely won't mind running their 64 teraflop cluster for a week if it'll get Nike commercials over a 56k in a second.
Since my views are obviously misguided, could someone share some links about compression theory?
--Roy
A full delphion search of zeosync and Piotr Blass turns up no patents at all, issued or applied.
There is a reference to Kolmogorov which is so hyperbolic (as is the rest of their "technical" text) as to be nearly incomprehensible. It may be obfuscatory or it may have been written by a mathematician unfamiliar with human communication.
Kolmogorov is a relevant reference, but possibly a trivial one: for example to compress a pseudo random stream of arbitrary length one need know only the length of the stream, the algorithm, and seed. This is obviously not generally applicable; though, indeed, it is not addressed by Shannon's theorem.
Piotr Blass appears to be an actual mathematician at Palm Beach Atlantic College and apparently edits the Ulam Quarterly, an on-line mathematics journal.
Unless and until they show a self-contained archive of a small size that can be brought to a standalone computer and expanded into a standardized benchmark data compression corpus, they can be ignored.
I pity the poor VPs who gave them money.
Perhaps another kind of breakthrough could be made by leveraging the internet for the keyspace used in your compression. (Okay, I might not have the terminology quite right... that was one of my friend's realms of interest.)
The idea is that you have a token that is given to a remote server, which sends back a stream of data. As long as the tokens were significantly shorter than the data provided, then the observed local compression would be highly significant.
Or, put another way, you're NOT storing data on a remote server. But a remote server has a very well developed library of token/data combinations. So, when a client sends a stream of tokens to this server, they get the original stream of data back (even though the stream of data, itself, isn't recorded in whole at the server).
Again, not for random data. And perhaps better if the tokens at the main server were geared to particular types of data with a different tokenspace for each.
Is this idea very silly, or very good?
in There's Something About Mary. These guys will be in great shape until someone claims 200:1 compression. Then it's back to the claims drawing board.
-
their press release reads like a bad onion article
Wherever they picked their terminology from, it's *not* from coding theory articles, or topology articles, or probability theory, or any other branch of Math I've run into. (IAAM - I Am A Mathematician).
Whoever wrote that text has never read any of these articles. And is not a mathematician. Nor a programmer.
Draw your own conclusions about the company.
--
I refuse to use
You cannot compress a truly random data set. Every compression algorithm relies on some level of data redundancy in order to achieve the compression. Here is an excellent site about a $5000 data compression challenge that has yet to be won:
http://www.geocities.com/patchnpuki/other/compression.htm
Take care,
Brian
--
100% Linux Web Hosting Services - We Don't Do Windows!
--
A few years ago I read a translated article by a Russian theoretical physicist that dealt with compression of data. He was working with some computer programmers and mathematicians to compress data efficiently for use on their older computers and for transfer between sites doing research... the logical extension and application was also military as well as encryption.
While the math was over my head a bit, he went on to explain that it was theoretically possible to compress a standard binary data string of randomized computer data by a factor of 200 or more; central to this was the use of a "key" similar to encryption. This key would be generated by the software in advance of compression and used in the decompression of the data. The key was small... 10000 times smaller than the data stream itself on average when the data sent exceeded 1 MB.
The military had expectations of incorporating this compression into target coordination software used in datalinks between the latest generation of PVO type fighters, as well as in targeting data for nuclear missile equipped vessels and launch facilities.
It's compressed and encrypted at the same time...
The article was in Janes and a few obscure math sites. I remember the guy's last name as Stynavovich.
The scheme does work - I was given restricted access to the code and tried it. And since it works on random data, by iterative use of the scheme you can achieve almost unbelievable compression. However I've been informed by a reliable source that the algorithm is being quickly suppressed by order of OSHA and the Consumer Product Safety Commission - with a sufficiently large file and sufficiently many iterations of the algorithm, the data can become so compressed that it explodes, not only wrecking the computer but creating a serious risk of injury to life and limb.
We regret that we cannot provide investment opportunities for any resident of Kansas.
What, Kansas has a law about selling perpetual motion machines or something?
Random data is fake data, so what's your point? Practically random data is compressible. Truly random data would involve no patterns and is only theory, useless to the person who simply wants to compress some file. Truly random data would have no number occurring twice, which limits it to the bit length defining its basic unit... 100:1? Let's get real. And if you come out with a bald-faced lie such as that, you will be quickly exposed. But sometimes marketing rules over intelligence. They're taking a gamble. If they had an IPO tomorrow I'd buy, but I'd sell the next day.
For every data set that is compressible 100:1 (which I will grant them.. even a fool can do that), there are 99 which grow larger or the compression fails entirely.
So, they have figured out a way to compress difficult-to-compress data rather well, but cannot compress easy stuff that LZW works on? Rather dubious, but I'll eat my words with a smile if they can put all the Star Trek episodes on a floppy disk.
Any connection between your reality and mine is purely coincidental.
Your using the word 'description' lit the proverbial light bulb -
Let's say I want to describe my car. I say, "my car," which 'compresses' the idea of my car, really really well, but there is no way to get back to my car. It's like some compression ideas that were suggested, really really good compression, just no way to 'get back.'
Now, let's say I use more description - "1990 geo storm hatchback." It still doesn't work, because there are cars like it out there - we still don't have the complete idea of my car. In fact - to completely describe my car, I would need to describe every atom on it - so compression would seem to be impossible, my car is too random, as all real objects are, to be compressed.
What compression does, is it describes an object A in another object, B. The description must be complete, otherwise there is loss. That is how JPEG's and MP3's work - they knock out some of the 'unimportant' data. The data 'we can't see.' But we don't want that on data, we want lossless compression.
Back to my car - real world objects could be 'compressed' by a complete description, that is exact in its detail but includes patterns to model the object - probably using shape descriptions to model panels and engines, describing one basic piston, and then including each piston's differences from my basic piston description. In this way, the entire car could be described, with no loss to the original, complete idea.
Hope this helps - thinking about it this way kinda sorted it out, for me.
Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
If it's not true random noise, so what? Random noise contains no information; why would you want to compress it?
So long as the input data isn't hand picked for their claims, but rather is representative of an actual application...what's the problem?
--an unbreakable toy is useful for breaking other toys--
Some said that you can't compress truly random data. Well, for a given set of sample data you might or you might not be able to compress it. You can't just generalize by saying 'no true random data can be compressed'. If it's random and you've got a decent number of samples, something in between will have to work.
:-)
:-)
;-)
About the 1 bit discussion, I don't think they meant they can compress something more than one time. Hey, if my 100 mega file gets to 10 mega, I (and the rest of the world) don't need to get to 1 mega, 100k etc... for this technology to be a breakthrough.
I have noted that lately Slashdot's comments keep drifting into the 'vaporware' discussion. I am under the (correct) impression that, from time to time, a certain topic drives all the others here. A couple of weeks back it was M$ bugs. We could talk about biogenetics and someone would comment about 'what if M$ has a bug and our DNA gets shared', etc...
So, the new 'wave' is vaporware. While I don't believe someone has found a way to achieve close to 100:1 compression either, I think we should be reasonable: if they do, we don't need to get down to a final file of 1 byte.
Which in my humble theories is somewhat possible... But that's another topic. Just for the sake of the exercise imagine that a compressor has a large dictionary with levels. So in level one 'a' equals to '1234'. In level two, 'a' equals to '0987'.
I have never worked with compression but I believe it works something like this:
1) You find patterns.
2) You replace the patterns with smaller combinations from your dictionary, and repeat until it's not possible to find any more patterns.
Well, if you go the other way and take a bit, and the dictionary says that one bit at level one means 'abc', and 'abc' at level two means '1235jdjlh', which in turn means 'mary had a little lamb', well, you could achieve a 1 bit compression. BUT ONLY IF YOU KNOW what the output is. It's the other way around: you 'decompress' something whose result you already know.
Because in your dictionary, one bit at level one will always have to mean something fixed.
UNLESS there are two pattern dictionaries. One in the program itself and another in the file.
If you have another one in the file, it can interoperate with the decompressor/compressor to discover what 1 bit means.
But then again, the compressed part would get to one bit, at least MAYBE (like I said, I never saw how a compression system REALLY works). But it would require that a dictionary be attached to it, making it a larger file.
Er... can anyone with experience on this field just reply to me if this is how compression systems work? Just curious. And if there's this concept of two dictionaries files interoperating. Thanks
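For a concrete picture of dictionary-based compression, here is a minimal sketch of LZW, one classic dictionary scheme (chosen for illustration; it is not claimed to be what ZeoSync or any particular tool uses). The codes are kept as a plain list of integers rather than packed into bits. Note that no dictionary travels with the file: the decompressor rebuilds the same table as it goes.

```python
def lzw_compress(data):
    """Turn bytes into a list of integer codes, growing the table as patterns repeat."""
    table = {bytes([i]): i for i in range(256)}   # start with every single byte
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate                   # keep extending a known pattern
        else:
            codes.append(table[current])          # emit code for the longest match
            table[candidate] = len(table)         # learn the new, longer pattern
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the original bytes, reconstructing the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):                  # pattern defined by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

text = b"mary had a little lamb " * 20
codes = lzw_compress(text)
print(len(text), "bytes ->", len(codes), "codes")
print("round trip ok:", lzw_decompress(codes) == text)
```

The one-bit-means-a-whole-sentence idea only works if the decoder already holds the sentence, which moves the data into the decoder instead of compressing it.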
Okay, now, what impact would this kind of technology (if it exists) have on our lives? Even if it's not 100:1, say it's 50:1 or 75:1? Slashdot readers and their comments are extremely good at pointing out mistakes and flaws in almost everything.
But some day or another someone somewhere will come up with this (or something close). Maybe it's too much marketing talk, but it will change a lot of things... And I didn't see many comments about the impact.
Where are the scifi fans when we need them?
Buy a Nintendo DS Lite
If you already have the entire set of codes locally, that means you essentially have every pr0n video ever made, already on your hard drive. Not only that, but you'll have every non-existent one ever not made, including the one with Natalie Portman getting it on with the midgets from the Wizard of Oz. I have got to get this technology!!
There's no way http://www.zeosync.com/index.htm was written by anyone who cares about /saving/ space :p
But on a serious note - random data doesn't necessarily mean unique data - all data is pseudo random, and could occur, and if you watch any random data stream long enough, and with a big enough past-data buffer, patterns will emerge.
Mj
If a google search is done on ZeoSync, there is no mention of anything other than ZeoSync website. No links, no references from other web sites.
This is a HOAX .
Definition of random data is data with an AIC (information content) of 0.
So I doubt it.
________ semper ubi sub ubi
I remember back in DOS and BBS days someone came out with a "radical new compression" program that people tried and went nuts over. You'd test it, of course: hand it a 400k file (which back then was pretty big :), have it compress it, and you'd get something like a 100 byte archive (wowie!). You'd then delete the original, decompress the archive and SHAZAAM! there's your 400k original file back!
What it was really doing was copying your original file to a hidden directory and embedding the path to the copy in the "archive". Decompressing of course copied the hidden file back. If you were smart enough to find and delete the hidden copy the decompressor would report mysterious Drive C: data errors.
Wish I could remember the name of that hoax...
Not that I'm saying this one is a hoax <cough>
Can we assume then that a file compressed at 100:1 would be random data? If so, could you not just recompress that same file and get 10000:1 compression? ...or would this compressed file no longer be random data, but a sequence of mathematical equations that we could compress even further with pkzip? Hmmm... but that would make the data appear random again and we will be able to get 100:1 compression on our already twice compressed file.
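What happens when you recompress is easy to try with an ordinary compressor. This is only a sketch with zlib standing in for the compressor and a made-up word-salad text as input (both are assumptions for the illustration): the first pass removes the redundancy, and the later passes have nothing left to work with.

```python
import random
import zlib

def recompress_sizes(data, rounds):
    """Compress the output of the previous round again, recording each size."""
    sizes = [len(data)]
    for _ in range(rounds):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes

# Hypothetical redundant input: words drawn from a small vocabulary.
WORDS = ["zeosync", "claims", "lossless", "random", "data", "shannon",
         "limit", "press", "release", "compress", "again", "and"]
rng = random.Random(1)
text = " ".join(rng.choice(WORDS) for _ in range(20_000)).encode("ascii")

sizes = recompress_sizes(text, 4)
for round_number, size in enumerate(sizes):
    print("after", round_number, "passes:", size, "bytes")
```

If a second pass really did give another 100:1, the argument from the top of the thread applies and everything would end up in one byte.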
If it means they compress arbitrary random data, it is just bullshit. It is easy to prove that there exists some file that will not be compressible, and not much harder to prove that there are actually many more incompressible random files than compressible ones (read any text about Kolmogorov complexity). But of course most computer files are not at all random. Compressing a *randomly picked* computer file is therefore something different altogether, but it is still hard to guarantee a certain compression ratio if the type of information stored in the file is not known. That's the reason why different compression algorithms exist for different file types. All in all their claim is too fuzzy to say anything ... better compression is a sure thing in the future, but guaranteeing compression for random files is just another cold fusion hoax.
I have invented a 1000x lossless compression scheme, too. I'm still working on the decompression, tho.
<grub> Reading
Now here's the interesting part: they used to spell his name right in a previous version of their official bios section. This could just be sloppiness, of course.
Babar
Ooops, they just missed the Wired Vaporware awards, maybe they can catch them next year. This is almost like that other compression system (I think it was Australian) a few months back that claimed full screen, high quality, lossy compressed video down a modem - they just didn't say how lossy, and there was only one demonstration (that no-one saw). Where is the demonstration for this? I know that if I'd invented it, I would want to prove to the world it was real as fast as I could. On the other hand it could be real - I always thought you could do it with dictionary-based compression, if the dictionary was stored in the decoder software and was very, very large so as to include the most likely bit-patterns. Of course fractal compression for any data could be possible - it hasn't been disproved (I don't think). The article looks sus though, and who would call themselves ZeoSync? It sounds like a fake movie name...
This comment does not represent the views or opinions of the user.
It's over here (Question 9, search for 'WEB, Gilbert').
"If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
If they claim it only works with "small sequences" then maybe they are just compressing the predictability in the flawed random number sequence they are using as test data.
Maybe they need that patch that adds network generated entropy into /dev/random 9-)
"Oh no, not again"
If their website is any indication of the quality of their compression algorithm, I wouldn't invest a single penny. Resizing my browser window and only working with Flash. I am so tired of that bullshit.
I think I'm missing something. How can the size of my pr0n files give people in a third world country a better life?
- MayorQ
Someone please move this article to the humor section. No really, before someone here tries to compress Natalie Portman and a bowl of grits, then shove it all down their pants.
If you're talking about compressed video over uncompressed. A typical DVD movie would be 720 (horizontal) x 480 (vertical) x 16 (bit, YUV2) x 29.97 (fps) x 6300 (seconds in a 105 minute movie) / 8 (bits per byte) = ca. 130 gigabytes. In reality you'll get it as 5-6 gigabytes, while as a divx 2-pass (or similar mpeg4-codec) you will reach 100:1, at very little quality loss. Of course this is only possible because movies are *very* non-random both within each frame and from one frame to the next.
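The arithmetic is quick to check. The sketch below just multiplies out the figures quoted in the comment; the 5.5 GB value is an assumed midpoint of the "5-6 gigabytes" range, and the variable names are made up for the example. Worked through, the raw size comes out at roughly 130 GB (decimal).

```python
width, height = 720, 480       # pixels per frame
bits_per_pixel = 16            # 16-bit YUV, as quoted
fps = 29.97                    # frames per second
seconds = 105 * 60             # 105 minute movie = 6300 s

raw_bytes = width * height * bits_per_pixel * fps * seconds / 8
raw_gb = raw_bytes / 1e9       # decimal gigabytes

dvd_gb = 5.5                   # assumed midpoint of "5-6 gigabytes"
print("uncompressed:  %.1f GB" % raw_gb)
print("DVD ratio:     about %.0f:1" % (raw_gb / dvd_gb))
print("size at 100:1: %.2f GB" % (raw_gb / 100))
```

Both ratios are for lossy video codecs, which is a different game from lossless compression of random data.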
Kjella
Live today, because you never know what tomorrow brings
The binary representation of pi contains all sequences, so it is claimed.
If only we could predict what the Nth bit of pi was going to be, then we could just specify an offset into the bit sequence and a length and we could have any file compressed as two numbers.
One of the numbers would be pretty large, though... It could easily be as big as the bit representation of the file, but hey, who cares??
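A small experiment shows why the offset eats the savings. This sketch makes several assumptions: it uses decimal digits instead of bits for readability, it generates digits with Gibbons' unbounded spigot algorithm, it only looks at the first 3000 digits, and the sample patterns are arbitrary. For each pattern it compares the pattern's length with the number of digits needed to write down where it first appears.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) using Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

digits = "".join(str(d) for d in islice(pi_digits(), 3000))

def offset_cost(pattern):
    """Return (offset, digits needed to write the offset), or (None, None) if absent."""
    pos = digits.find(pattern)
    if pos < 0:
        return None, None
    return pos, len(str(pos))

for pattern in ["26", "535", "2002", "12345"]:
    pos, cost = offset_cost(pattern)
    print("pattern", pattern, "length", len(pattern), "offset", pos, "offset digits", cost)
```

On average a k-digit pattern first shows up somewhere around offset 10^k, so the offset needs about k digits too, and that is before storing the length.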
It's still a possible algorithm. These ZeoSync people don't seem to care about practical algorithms either...
Gimme some VC money!!!
Why is slashdot giving free publicity to these frauds? It's not like there's any chance in hell they've done something useful. How much brain power does it take to realize that you can't beat elementary math? We've seen this same scam a dozen times before, and it's always a fraud just like all the reasoning people point out.
This is the software equivalent of perpetual motion machines, or snake-oil elixirs. Giving them coverage will only encourage more people to try the same scam. These folks need to be reported to the justice department, not slashdot.
--Lee Daniel Crocker : http://www.etceterology.com My life is in the public domain.
The first thing I thought of when I saw this was an old BBS hoax ...
The time was somewhere around 1985.
The hoax was a program that claimed to have (drum roll please) 100:1 file compression. So sure, I downloaded the thing on my lovely 1200 baud modem, installed it and tested it on a 512K file.
Sure enough, the resulting file was less than 5K ... slightly -better- than 100:1 compression. I was impressed.
Then I took a close look at the program and, after investigation, found that even though this file was 5K, my disk space available had not decreased ... in fact ... it had increased by ... 5K.
Of course, the hoax was that this program simply renamed and hid the old file and installed a TSR (Terminate and Stay Resident ... old DOS terminology for essentially a background daemon, though a real ugly method) that, whenever you went to touch the file, it intercepted the call and fed you the renamed old version.
A variation on this didn't do the TSR ... it just pretended to compress the file and you had to "uncompress" (ie, unhide the old version and rename it over the faked file) with the same program.
I would love for this new technology to work, but chances are in real life applications it's going to be about as productive as the BBS hoax was.
It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
Back in 1991 or 1992, in the days of 2400 bps modems, MS-DOS 5.0, and BBS'es, a "radical new compression tool" called OWS made the rounds. It claimed to have been written by some guy in Japan and use breakthroughs in fractal compression, often achieving 99% compression! "Better than ARJ! Better than PKzip!" Of course all my friends and I downloaded it immediately. Now we can send gam^H^H^Hfiles to each other in 10 minutes instead of 10 hours!
Now I was in the ninth grade, and compression technology was a complete mystery to me then, so I suspected nothing at first. I installed it and read the docs. The commands and such were pretty much like PKzip. I promptly took one of my favorite ga^H^Hdirectories, *copied it to a different place*, compressed it, deleted it, and uncompressed it without problems. The compressed file was exactly 1024 bytes. Hmm, what a coincidence!
The output looked kind of funny though:
Compressing file abc.wad by 99%.
Compressing file cde.wad by 99%.
Compressing file start.bat by 99%.
etc. Wait, start.bat is only 10 characters, that's like one bit! And why is *every* file compressed by 99%? Oh well, must be a display bug.
So I called my friend and arranged to send him this g^Hfile via Zmodem, and it took only a few seconds. But he couldn't uncompress it on the other side. "Sector Not Found", he said. Oh well, try it again. Same result. Another bug.
So I decided that this wasn't working out and stopped using OWS. Their user interface needed some work anyway, plus I was a little suspicious of compression bugs. The evidence was right there for me to make the now-obvious conclusion, but it didn't hit me until a few *weeks* later when all the BBS sysops were posting bulletins warning that OWS was a hoax.
As it turns out, OWS was storing the FAT information in the compressed files, so that when people do reality checks it will appear to re-create the deleted files, as it did for me. But when they try to uncompress a file that actually isn't there or has had its FAT entries moved around, you get the "Sector Not Found" error and you're screwed. If I hadn't tried to send a compressed file to a friend I might have been duped into "compressing" and deleting half my software or more.
All in all, a pretty cruel but effective joke. If it happened today somebody would be in federal pound-me-in-the-ass prison. Maybe it happened then too...
(Yes, this is slightly off-topic, but where else am I going to post this?)
LAMP hosting on Debian, SSH, no bandwidth cap, PayPal accepted - http://secondbrainhosting.com/
...was if they were powered by Blacklight Power. If you're not in the know, they're a "power company" run by a "scientist" who claimed that he had been able to reproduce something that sounded suspiciously like cold fusion in his Princeton, NJ-area laboratories. The Village Voice ran a story on them (where I read about these jokers) and a whole slew of investors were lined up (in the heady days a few months before the dot-com bubble popped) and last I checked, they still haven't actually, you know. Produced what they said they would two years ago (power).
If you've got a slow afternoon, take a gander at what physicists have to say about Blacklight...
Easy does it!
interesting the way this article appears just above the one about wired magazine's vaporware list!
Just use RLE on a bunch of zeroes!
The technology never materialized. They were then touting a lossy compression scheme. Unfortunately, I am only guessing at the 1000:1 ratio for images. I very highly doubt these guys can do what they claim either.
Here's an approach: A random number generator can generate a series of random numbers. So, given a series of random numbers, couldn't you work backwards to find the seed (and perhaps a few other parameters) for the random number generator that would generate that series of random numbers? Clearly, this is lossless. Clearly, the job gets harder as the series of random numbers gets longer. Perhaps 100 is the practical limit for finding the right seed. One nice feature: this is very asymmetric compression. Compression is very slow, but decompression would be very fast. Only problem: I don't know if it is feasible to find the seed.
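The seed-search idea can be tried on a toy scale. Everything here is an assumption made for the sketch: Python's `random.Random` plays the generator, the seed space is only 16 bits, and `generate`, `find_seed` and `SECRET` are names invented for the example. The search works when the data really came from the generator, and the count at the end shows why it cannot work for arbitrary data.

```python
import random

def generate(seed, n_bytes):
    """n_bytes from a PRNG seeded with `seed` (the 'decompressor')."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))

def find_seed(target, max_seed):
    """Brute-force 'compressor': try every seed below max_seed."""
    for seed in range(max_seed):
        # Cheap prefix check first, full comparison only on a prefix hit.
        if generate(seed, 4) == target[:4] and generate(seed, len(target)) == target:
            return seed
    return None

SEED_SPACE = 1 << 16                  # toy 16-bit seed space
SECRET = 4242                         # hypothetical seed used to make the data
target = generate(SECRET, 100)        # 100 "random" bytes that do have a seed

found = find_seed(target, SEED_SPACE)
print("recovered seed:", found)
print("round trip ok: ", generate(found, 100) == target)

# Counting: the seeds can reach at most SEED_SPACE of all 100-byte strings.
reachable = SEED_SPACE
possible = 256 ** 100
print("fraction of 100-byte strings with a seed: at most 1 in", possible // reachable)
```

Making the seed space bigger helps coverage only until the seed is as long as the data, at which point nothing has been compressed.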
The use of the term "practically random" in the press release is of course an exception that's large enough for you to drive a truck through it.
...
The press release has enough marketing hype in it that it's hard to see if there's any substance behind it, but it's quite possible that what is meant here is that the data is "randomly selected" from a "typical computer user's system." In other words, it's not random in the mathematical sense, but a random sample of a highly nonrandom space. (Or at least you would hope that the typical computer disk contains highly nonrandom data, otherwise you can't get much real work done with it!).
This would be a very different claim and one that might very well be true; Microsoft Word and Excel files, for example, are highly redundant files, as are executable images, email messages etc. It's even reasonable to think of possible ways to do better than zip/gzip by determining the file type and applying different compression algorithms to different file types; the power of programs like zip/gzip is that they do pretty well with no a priori knowledge about the structure of any particular type of file.
However the amount of hype in the press release does sound suspiciously like they're trying to figure out a way to separate naive investors from their money
In the future people will laugh back at the tremendous waste of time and money for trying to break Shannon's law as much as we laugh back at the people who tried to break the law of conservation of energy (eg. engines that did more work than the energy inputted).
I think maybe there is some sort of entropy in some multi-dimensional space for certain kinds of data that gives us enough redundancy to compress 1:100. Of course, the data cannot be random (otherwise there are no frequent patterns or redundancy) and would only theoretically be compressible to our new measure of entropy. But, what kind of data? Even Shannon's law allows for 1:100 given the right kind of data.
I think the following statement in the press release pretty much says it all:
> We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory.
How about algorithmic information theory? Kolmogorov, Solomonoff, Chaitin? The statement above indicates that the most recent word on compression is an old Bell Labs tech report by Claude Shannon... not to put Shannon down, that work *is* a landmark, but there has certainly been more work done since.
Try compressing the number Pi using Shannon's theory... you can't do it. On the other hand, using Kolmogorov complexity, you can compress it quite nicely.
The fact that this statement appears in the press release seems to indicate a great deal of ignorance on the part of this corporation's researchers. Part of any good research program is to familiarize yourself with previous work done in the field... and AIT is *not* some obscure backwater idea... there are several conferences on this topic every year and just about every CS graduate student has seen at least Kolmogorov complexity.
This is a pretty serious credibility robber. (Not to mention that from a mathematical standpoint, compressing totally random data is impossible under our current axioms... so if we *can* compress completely random data... it's time for a new theory of the foundations of mathematics. At the risk of sounding dogmatic: do you *really* think some dot-com startup is capable of this?
Perhaps they are, but I'm going to need to see the proofs written up nice and formally before I run out and buy snake-oi... I mean *stock*.)
These people who come up with these recursive lossless compression algorithms that can compress any file are stupid. They just don't see the real possibilities of such an algorithm.
If I had such an algorithm I would decompress 0 and 1 repeatedly until I generated every possible piece of content in the universe and then sue the shit out of anyone who dared to create or copy anything without my express permission!!!!! BWAHAAHAHAHAHAAA!!!!
Best. Comment. Ever. Enjoy!
ZeoSync: Ladies and gentlemen - observe! The random data goes in THUS, and run through our process, comes out 100 times smaller!
ZeoSync: Now, we carefully unpack and - voilà! Random data of the same size as before! This is due to our patented process and a little bit of magic we like to call "length of file stored in the header".
Investor: Hey - those first few bytes from the original and uncompressed file look totally different!
ZeoSync: Those bytes are in there somewhere - we only said LOSSLESS compression, not ordered!
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Yeah, from their techno-nonsense filled press release, I'd say this is fraud, but not aimed against anyone with technical knowledge.
I could just imagine these people going down to retirement communities and poor neighborhoods and asking for "investments". Then, they dazzle the poor victims with this press release and their flashy web site, get their money, and run!
" Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. "
It's scientific fact. There's no real evidence for it, but it is scientific fact...
Typical press
Next you'll be publishing stories about squaring the circle and trisecting an angle with straight edge and compasses. Claiming to be able to compress random data is the oldest joke in the CS book and you fell for it.
-- SIGFPE
they say the compression software should be available sometime in 2003. however, there doesn't seem to be any mention of the decompression software...
or am i just too cynical...?
-duncan
> no pattern == no compression
No pattern may be quite well compressed, but only in special cases. That's what some people call 'fractal compression', which AFAIK means replacing data with a formula (and optional initial data). Decompression is simply iterative application of the formula to the initial data. There are three problems though:
(a) generally it's lossy
(b) it's .. err .. very hard to find the formula :)
(c) not every data set can be compressed
Otherwise it's fine
3.243F6A8885A308D313
Reading through the comments has caused me to fire off some "practically" random thoughts:
The better the compression ratio, the smaller the data set that can be compressed to this size is.
One piece of random data is not the worst case, as it may represent the best case for a compression ratio.
The average compression ratio for a fixed length of random data in general is not the worst case for an algorithm, but merely the average compression ratio for the algorithm over all data sets.
Algorithms and formulae contain a lot of information, much/most of it implied. e.g. the addition operator has an implied meaning, which when fully stated takes up much more space than '+'.
The statement "Cyan, magenta and yellow are primary colours in negative colour mixing." implies lots of information about the structure of our eyes and of light. The amount of information that is implied increases as the reader's understanding of the words grows.
If the universe is built on a few small pieces of information, then maybe everything has patterns coming from these that allow it to be compressed back to something very small in size.
Small exercise I thought through:
100:1 compression ratio on 1000 bits:
1000 bits have 2^1000 combinations.
10 bits have 2^10 combinations.
So each combination of 10 bits would need to represent 2^990 combinations of 1000 bits.
BUT, say we have 2^100 compression algorithms, each of which is optimal at compressing 2^100 separate patterns of 1000 bits of data to 100 bits. The patterns do not overlap.
We add 100 bits to the compressed data to tell us which algorithm to choose. Therefore final compressed data becomes: 110 bits long.
New compression ratio is:
1000/110 = 9.1 (1 d.p.)
This can be achieved on any 1000 bits we have. 9.1:1 compression is a pretty good general compression ratio for all cases.
BIG PROBLEM: compressor/decompressor would need knowledge of each algorithm. If each algorithm only took up 1 byte, 1,125,899,906,842,624 Petabytes of storage would be needed.
Suicidal optimism: There may be one algorithm that can generate all these other algorithms from a very small data set though.
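A quick sanity check of the counting in the exercise above, as a hedged sketch. The text can be read two ways (each algorithm emits 10 bits, giving the stated 110-bit total, or each emits 100 bits), so the hypothetical helper covered_exponent below takes both numbers as parameters. All names here are made up for illustration.

```python
INPUT_BITS = 1000  # every input in the exercise is 1000 bits long


def covered_exponent(selector_bits, payload_bits):
    """log2 of how many distinct inputs a selector+payload code can name."""
    return selector_bits + payload_bits


def selector_bits_needed(payload_bits):
    """Selector size required before all 2^1000 inputs get their own code."""
    return INPUT_BITS - payload_bits


for payload in (10, 100):
    exp = covered_exponent(100, payload)
    need = selector_bits_needed(payload)
    print("payload %3d bits: covers 2^%d of 2^%d inputs; "
          "full coverage needs a %d-bit selector, %d bits total"
          % (payload, exp, INPUT_BITS, need, need + payload))
```

Under either reading, 2^100 algorithms reach only a sliver of the 2^1000 possible inputs. Covering all of them pushes the selector up until selector plus payload is back at 1000 bits, so the 9.1:1 figure only holds for the inputs that happen to be covered.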
> * If the data was represented a different way (say, using bits instead of bytesize data) then patterns might emerge ..
Then it was not 'random' data. As far as I remember from my early University days, random data is data that has zero self-correlation, so it does not matter whether it's bit- or octet-encoded.
3.243F6A8885A308D313
Some years ago I "invented" a pretty nifty compression algorithm. tried to implement it, too. turned out to be unusable of course, but only because of its computational complexity (it would take more time than the life of the universe on reasonably large data :)
who knows, maybe quantum computing will make it possible though, so here's the deal:
1.) take a GOOD pseudo random generator (one that is as random as possible, but can reproduce the same random string if started from the same "seed").
2.) run this generator, and try matching strings from the generated random string with the data you want to compress.
3.) upon reaching a significant match, record the position in the random stream, and the length of the match
some bits need some polishing, but if this method didn't take ages, it could actually be usable
cheers,
mitch
--
// "If human beings don't keep exercising their lips,
// their brains start working." -- Ford Prefect
The output from a pseudo-random number generator is usually considered "random enough for practical purposes." So if you define "practically random data" as "data that is random enough for practical purposes," you can compress it by storing the random seed and the string length. ;-)
I think I can beat their 100:1 compression ratio with this scheme.
To get something done, a committee should consist of no more than three persons, two of them absent.
You don't need to come up with a formula though. All you need to do is to find *where in PI* your random string starts. Yet it may start very far from the beginning, and thus its position number may happen to be larger than the source string itself :)
3.243F6A8885A308D313
100:1 ratio? On random data?
Considerations far more elementary than Shannon's limits rule out compression of statistically random data by even a single bit. Here's why:
There are 2^n bit strings of length n. Any compression method purporting to compress random strings (by even a single bit) must produce output of length at most n-1 for these 2^n inputs. But in that case the mapping is not unique, since there are only (2^n)-1 bit strings of length n-1 or less. (So decoding is not possible.)
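The counting argument above is easy to check by brute force for small n. A minimal sketch, assuming nothing beyond the standard library (bit_strings and count_shorter are names invented here):

```python
from itertools import product


def bit_strings(length):
    """All bit strings of exactly the given length, as text."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def count_shorter(n):
    """How many bit strings have length 0 .. n-1 (empty string included)."""
    return sum(len(bit_strings(k)) for k in range(n))


n = 8
inputs = len(bit_strings(n))    # strings that need compressing
outputs = count_shorter(n)      # strictly shorter strings available as codes

print(inputs, "inputs,", outputs, "shorter outputs")
```

There is always exactly one output too few, so at least two inputs must share a compressed form and the decoder cannot tell them apart.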
Once every so often some "researchers" claim to have attained the holy grail of compression. Too bad we never hear of them again
From the comp.compression faq
this topic has generated and is still generating the greatest volume of news in the history of comp.compression
The advertized revolutionary methods have all in common their supposed ability to compress random or already compressed data. I will keep this item in the FAQ to encourage people to take such claims with great precautions
"The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
Rather, this problem can only be successfully resolved through the solution of what is commonly understood within the mathematical community as the "Pigeonhole Principle."
Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?
I'm pretty sure that's not the pigeonhole principle. As I understand it, the pigeonhole principle is the following:
Given N pigeons, and M holes to stuff them in, at least the ceiling of N/M pigeons must be in one of the holes. If M is N-1, then this number is 2 (i.e. rolling 7 six-sided dice leads to at least one pair).
This looks like a hoax to me.
here is my solution
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i;
    /* "Decompress" one million random bytes. */
    for (i = 0; i < 1000000; i++) {
        printf("%c", rand() % 256);
    }
    return 0;
}
since real random data carries no information, i have achieved lossless compression of 100 to 1.
PS. Just add a couple more zeroes to achieve an even better compression
Now do i get my own story on slashdot.
Didn't think so
badness 10000
Maybe they have a whole host of irregular numbers stored on some sort of massive file array, all figured out to a crud load of bits in binary.
A cluster of supercomputers then offsets the data against the irregular number that allows for the least amount of offset. All that would have to be transmitted is the offset, which would seem like a prime candidate for scientific notation.
"Uh, yah, that video file there is 121^(^3434^173) offset into e . . . :)"
Of course having the destination computer DECOMPRESS this data would be another matter entirely, hehe. But with home computers getting faster and faster, and preferably only easy to calculate irregular numbers being used, it likely wouldn't be /too/ bad. Heh. Maybe a few hours only for decompression time? I'm sure that the modem users would love it at least. :)
It is just the ultimate trade-off between the end size of the compressed file and the time/power it takes to compress/decompress it.
i know of a smart way to compress files
<fraggle> no you dont
<fraggle> I can guarantee its something retarded hes gonna say
<fraggle> Ljung: I find it hard to believe that you, an idiot, can think of a better encryption algorithm than zip and other widely used compression methods
<Ljung> heh there's a compression program called godzillaCrunch
<Ljung> it makes files like 0.2 %
<Ljung> but they can't get decompressed
The reporter wrote glowing things about how when he decompressed his files, they had the right size and timestamp. There was a small matter of the contents being wrong, but the company had assured him that this was just a small glitch in the beta version that would be fixed in the final release.
I can imagine that some junior reporter might fall for this, but where the heck was the editor?
I imagine that the whole stunt was probably part of a scam to defraud some investors. Get it published in a magazine, and it must be legit, right? I wouldn't be the least bit surprised if this new "lossless compression algorithm" proved to be such a scheme.
BYTE went seriously downhill around 1985 or so. A friend seems to think that it was a result of Steve Ciarcia moving on, but I don't think that fully explains it. Before that, there were plenty of technical articles by other authors, but BYTE turned into a rag full of mostly non-technical reviews.
Note the results are "BitPerfect(TM)", rather than simply "perfect". They try to hide it, but they are using lossy compression. That is why repeated compression makes it smaller: more loss.
"Singular-bit-variance" and "single-point-variance" mean errors.
The trick is that they aren't randomly throwing away data. They are introducing a carefully selected error to change the data to a version that happens to compress really well. If you have 3 bits, and introduce a 1 bit error in just the right spot, it will easily compress to 1 bit.
000 and 111 both happen to compress really well, so...
000: leave as is. Store it as a single zero bit
001: add error in bit 3 turns it into 000
010: add error in bit 2 turns it into 000
011: add error in bit 1 turns it into 111
100: add error in bit 1 turns it into 000
101: add error in bit 2 turns it into 111
110: add error in bit 3 turns it into 111
111: leave as is. Store it as a single one bit.
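The table above is just a majority vote over the three bits. A small sketch of that reading (the function names are invented here, and this is only my interpretation of the table, not anything ZeoSync has published):

```python
from itertools import product


def lossy_compress(bits):
    """Keep only the majority bit of a 3-bit string."""
    return "1" if bits.count("1") >= 2 else "0"


def decompress(bit):
    """Expand the stored bit back to 000 or 111."""
    return bit * 3


def bit_errors(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


for triple in ("".join(p) for p in product("01", repeat=3)):
    stored = lossy_compress(triple)
    restored = decompress(stored)
    print(triple, "->", stored, "->", restored,
          "errors:", bit_errors(triple, restored))
```

Every input comes back with at most one flipped bit, which is a 3:1 ratio but only two of the eight inputs survive unchanged. That is fine for a pixel and fatal for an executable.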
They are using some pretty hairy math for their list of strings that compress the best. The problem is that there is no easy way to find the string almost the same as your data that just happens to be really compressible. That is why they are having "temporal" problems for anything except short test cases.
Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
I wonder why they are warning that they are uncertain they'll complete it. Anyone here emailed them?
Here's what they are claiming to use. Seems like this is a way to describe data multidimensionally in a way that isn't readily assimilated digitally.
note that last line of the excerpt i give below.
what i think is, if you're going to use binary data, you've got to follow the rules of the road -- shannon's law.
check it out on this website
Examples of Kolmogorov Complexity
1. Pi is an infinite sequence of seemingly random digits, but it contains only a few bits of information: the size of the short program that can produce the consecutive bits of pi forever. Informally we say the descriptional complexity of pi is a constant. Formally we say K(pi) = O(1), which means "K(pi) does not grow".
2. A truly random string is not significantly compressible; its description length is within a constant offset of its length. Formally we say K(x) = Theta(|x|), which means "K(x) grows as fast as the length of x".
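To make point 1 concrete, here is about the shortest "compressed pi" I know of: a Python port of Gibbons' unbounded spigot algorithm. It is offered as an illustration of a short program standing in for an endless digit stream, not as part of the quoted excerpt, and the name pi_digits is my own.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time, forever."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet: fold in another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


print("".join(str(d) for d in islice(pi_digits(), 30)))
```

The source stays the same few hundred bytes however many digits you ask for, which is the sense in which K(pi) is a constant. No such short program exists for a typical random string.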
-- too cruel for schuel
I'm not saying that I believe these guys but I can think of one means of compression that is a possibility. This has been a background thought of mine for years but I never did anything with it.
Pseudorandom number generators generate a lot more output than they occupy in storage space. If one were to find a way to either derive generators and coefficients from the target data or to match generators with the target data then a patchwork of generators could provide huge compression ratios on seemingly random data.
Two more points regarding such a scheme :
1) It would be highly asymmetrical, compression could take a gazillion years, decompression would be extremely fast. This would be acceptable in the content delivery business.
2) What is "random" anyway? A 1600 byte segment of output from a generator represented by 16 bytes can appear to be "random" when subjected to statistical tests. If you get lucky and find a generator that can reproduce part of the sequence, you've compressed it.
So, I'm probably more skeptical than the next guy but it's a big world out there so I don't pretend to know everything.
Coniine ( forgot passwd )
First, you have one or more random-looking number generators of some sort that net you something to compare to the data, probably VERY carefully chosen. You pretend that the seed data doesn't count against your total data. Their indecipherably obtuse hypercube example makes you think that they coax this pattern many times from various "angles" so that they get something shaped like the original data out of the other end.
I'm not buying this claim of "lossless." If they are comparing it to existing compression at 10:1, then they mean JPG or MP3 or DIVX or things like that... none of which are truly lossless. Or, as this is a "temporally-challenged" unproven multi-pass system, perhaps they have found a way to get the above situation to work for certain data losslessly, and are praying to the mathematical gods that zipping a zip file won't just add another 20K.
If they are attempting to compress visual data: aren't most broadcast images lighter on the top than the bottom? Don't they involve stick-figurey thingies? Why not just send texture and position data to a computer and let us all watch poser-o-vision. After all, we're already dancing like puppets to these posers.
 
This Sig is a mnemonic device designed to allow you to recognize this author in the future.
dd if=/dev/zero of=bigfile bs=1k count=1000k
bzip2 -9 bigfile
ls -l bigfile.bz2
753 bigfile.bz2
Nice huh?? Now put a few of these big files in the kernel tree and see what people say.
But really, this sounds like the next wave of .com's and VCs has hit. And it is just what America needs to pull out of this slump that was caused by the last bunch of .com's and VCs.
And hey everyone, don't forget to vote this one for Wired's vaporware top 10.
It was dark and I didn't have my contacts...
Advancing my general theory that Reuters reporters are idiots, the article took 10 years off the life of the estimable Claude Shannon. Sadly enough--and well known to /. readers, Dr. Shannon died last year (2001), not in 1991. This obscure bit of knowledge was buried away in technical journals like the _NYT_ and _Entertainment Weekly_, so one can see how Reuters missed it.
Buy Text Processing in Python
All compression does is maximize bit entropy; that is, compression CANNOT occur on random data!!
Sounds like bullshit marketing of someone looking for VC funding.
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
A similar technique works with the output of drand48() and in fact for a long enough sequence this approach works with every random number generator algorithm available today.
In fact here's the compressed file for the rand() case:
int i;for (i = 0; i<1000000; ++i) printf("%d\n",rand());
Use gcc as decompressor.
-- SIGFPE
Scroll down to Incredible Claims for descriptions of the last four scams like this. Remember Pixelon?
1:100 average compression on all data is just impossible. And I don't mean "improbable" or "I don't believe that"; it is impossible. The reason is the pigeonhole principle. For simplicity, assume that we are talking about 1000-bit files. Although you can compress some of these 1000-bit files to just 10 bits, you cannot possibly compress all of them to 10 bits: 10 bits give just 2^10 = 1024 different configurations, while 1000 bits call for representations of 2^1000 different configurations. If you can compress the first 1024, there is simply no room to represent the remaining 2^1000 - 1024 files.
So every lossless compression algorithm that can represent some files with other files shorter than the original must expand some other files. Higher compression on some files means the number of files that do not compress at all is also greater. An average compression rate other than 1 is only achievable if there is some redundancy in the original encoding. I guess you can call that redundancy "a pattern." Rar, zip, gzip etc. all achieve a compressed/original length below 1 on average because there is redundancy in the originals: programs that have some instructions and prefixes that occur commonly, pictures that are represented with a full dword per pixel although they use a few thousand colors, sound files almost devoid of very low and very high numbers because of recording conditions, etc. No compression algorithm can achieve a ratio below 1 averaged over all possible strings. It is a simple consequence of the pigeonhole principle and cannot be tricked.
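Anyone can try the redundancy point at home. A rough sketch using Python's zlib (the same DEFLATE family as zip/gzip); the seed, the sizes and the helper name ratio are arbitrary choices of mine, and the generator is only pseudo-random, which is plenty to defeat zlib:

```python
import random
import zlib


def ratio(data):
    """Compressed size divided by original size (below 1 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


SIZE = 100_000
rng = random.Random(2002)  # arbitrary fixed seed so runs are repeatable

random_bytes = bytes(rng.getrandbits(8) for _ in range(SIZE))
zero_bytes = bytes(SIZE)                                   # all zeroes
text_bytes = b"no pattern == no compression\n" * 3000      # very redundant

for name, data in (("random", random_bytes),
                   ("zeroes", zero_bytes),
                   ("repeated text", text_bytes)):
    print("%-14s ratio %.4f" % (name, ratio(data)))
```

The zeroes and the repeated text collapse to a fraction of a percent, while the random-looking bytes come out slightly larger than they went in because of the container overhead.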
Gentlemen, you can't fight in here, this is the War Room!
Bear with me for a moment. This kind of 'compression technology' is EXACTLY the kind of thing the MPAA has been dreading. Imagine millions of people on Morpheus trading 5MB copies of The Matrix, Star Wars and everything else. Of course it's a hoax, but if they can keep it up long enough, then maybe they'll get bought out by the MPAA, RIAA, or whoever!
ZeoSync is ushering in the business model of the new millennium - fooling the tech-illiterate elite of today's content cartels into buying them out, then laughing all the way to the bank! I applaud ZeoSync for their initiative, and hope to see other such business ventures in the future.
Now, if you'll excuse me, I'm off to develop a program that uses fractal-temporal equations to randomly generate sequels to popular movies! (hint, hint)
[PowerPoint] is a tool for capitalist presentation
You know, and I know, that a "zero bandwidth" transmitter makes as much sense as 1 + 1 == 3 (or, for that matter, compression of "random" data). For this reason, despite a working prototype, the poor man had been unable to obtain a patent for his invention, despite 10 years of trying. (The Patent Office seems to be a lot looser when it comes to software.) He was very bitter and convinced that everyone else in the world was an idiot.
However, when the invention was described to me, it turns out that by "zero bandwidth", he meant "undetectable by FCC compliance measuring equipment", and that what he had really invented was a "Spread Spectrum" transmitter! What a sad story. Someone else got the patent because they could communicate it better.
So, even though compression of "random" data is mathematical nonsense, it is likely that "random" is not being used in the standard mathematical sense, but in the Marketroid sense - and the new compression algorithm might actually be useful.
Anyway, cryptographers, known for being pessimists in measurements of entropy, say that there is 2-3 bits of entropy per word in english text. So I'd say that there is at least bit as an absolute minimum.
Now if we are generous and call each word 7 characters of 7 bits each, we find that every 50 bits of English text contain at least 2 bits of randomness.
Which means that it is mathematically impossible to compress 'random' strings of English text by more than 25 to one. Note that I've been extremely conservative in this estimate.
Mathematics once proven is always true. If they really found a way around it, they would be able to explain which assumption by Shannon they avoid - not just which result.
There are a number of ways that the ZeoSync press release tips its hand as the nonsense it is. One just needs to read carefully.
Rhetorically, the reference to unnamed "experts" from Harvard, MIT, Berkeley, etc. is quite telling. If someone at those places had genuinely done this research, they would be named with credentials. The absence of that speaks loudly. I suspect the actual collaboration amounts to some former undergraduate of those schools calling a professor and asking some inane question ("Hi Dr. Jones, what do you think of Claude Shannon?").
But even more telling than the rhetorical lacunae are the ten-dollar words they include to try to wow the reader. In an allegedly lossless compression algorithm, the release brags about advancements of fractal, wavelet, FFT etc. techniques... in other words, a bunch of LOSSY compression techniques. Put simply, if you are happy with lossy compression (which you often are, but there is a clear difference), you can get whatever compression ratio you want (at the cost of correspondingly reduced fidelity).
So the ZeoSync claim is either directly false, or it is about lossy compression, and is worth a big yawn.
Buy Text Processing in Python
In fact, they've been zipping all of the Internet content for years with their Echelon technology. Every e-mail, webpage, Slashdot post etc. is currently stored on a half-full CDROM that every NSA employee carries a copy of.
That's what I'm wondering; isn't it a little late for someone to get rich quick with a vapourware IPO?
What the heck else is the technobabble B.S. good for ?
Hi,
// registration needed - dutch // https://www.standaard.be/Archief/zoeken/DetailNew.asp?articleID=DMF02112001_010&trefwoord=dvd
I ran across an article discussing a Belgian inventor who has invented a hardware/software solution that enables 5 FULL DVDs to be put on ONE SINGLE CD. No ripping required. Unbelievable? YES. So far I have not found anything related to this, neither by searching on the inventor's name nor on the name of the technology (DCS).
- - - BLURB Translated from a Belgian/Flemish Newspaper on-line archives - - -
Antwerp citizen invents new digital compression technique. 02/11/2001 spi - belga
BRUSSEL - Guillaume Defossé has developed a lossless compression system with which he can record five DVDs onto one single 650MB CDROM. The inventor has called this system DGS. It is possible to record a 30 minute television fragment onto one single floppy disk. Defossé is a composer living in Antwerp, Belgium; he also studied electronics.
URL :
from the Compression FAQ at:
http://www.faqs.org/faqs/compression-faq/part1/
http://www.faqs.org/faqs/compression-faq/
.
9.1 Introduction
It is mathematically impossible to create a program compressing without loss
*all* files by at least one bit (see below and also item 73 in part 2 of this
FAQ). Yet from time to time some people claim to have invented a new algorithm
for doing so. Such algorithms are claimed to compress random data and to be
applicable recursively, that is, applying the compressor to the compressed
output of the previous run, possibly multiple times. Fantastic compression
ratios of over 100:1 on random data are claimed to be actually obtained.
Such claims inevitably generate a lot of activity on comp.compression, which
can last for several months. Large bursts of activity were generated by WEB
Technologies and by Jules Gilbert. Premier Research Corporation (with a
compressor called MINC) made only a brief appearance but came back later with a
Web page at http://www.pacminc.com. The Hyper Space method invented by David
C. James is another contender with a patent obtained in July 96. Another large
burst occured in Dec 97 and Jan 98: Matthew Burch applied
for a patent in Dec 97, but publicly admitted a few days later that his method
was flawed; he then posted several dozen messages in a few days about another
magic method based on primes, and again ended up admitting that his new method
was flawed. (Usually people disappear from comp.compression and appear again 6
months or a year later, rather than admitting their error.)
Other people have also claimed incredible compression ratios, but the programs
(OWS, WIC) were quickly shown to be fake (not compressing at all). This topic
is covered in item 10 of this FAQ.
...
A common flaw in the algorithms claimed to compress all files is to assume that
arbitrary bit strings can be sent to the decompressor without actually
transmitting their bit length. If the decompressor needs such bit lengths
to decode the data (when the bit strings do not form a prefix code), the
number of bits needed to encode those lengths must be taken into account
in the total size of the compressed data.
Another common (but still incorrect) argument is to assume that for any file,
some still to be discovered algorithm might find a seed for a pseudo-random
number generator which would actually generate the whole sequence of bytes
contained in the file. However this idea still fails to take into account the
counting argument. For example, if the seed is limited to 64 bits, this
algorithm can generate at most 2^64 different files, and thus is unable to
compress *all* files longer than 8 bytes. For more details about this
"magic function theory", see http://www.dogma.net/markn/FAQ.html#Q19
...
So far no one has accepted this challenge (for good reasons).
Mike Goldman makes another offer:
I will attach a prize of $5,000 to anyone who successfully meets this
challenge. First, the contestant will tell me HOW LONG of a data file to
generate. Second, I will generate the data file, and send it to the
contestant. Last, the contestant will send me a decompressor and a
compressed file, which will together total in size less than the original
data file, and which will be able to restore the compressed file to the
original state.
With this offer, you can tune your algorithm to my data. You tell me the
parameters of size in advance. All I get to do is arrange the bits within
my file according to the dictates of my whim. As a processing fee, I will
require an advance deposit of $100 from any contestant. This deposit is
100% refundable if you meet the challenge.
...
-- laws are the opinions of politicians --
Best comment so far on this article. The rest of you need to stop masturbating over compression theory.
Maybe they just discovered written language, or started writing their tribal history on hides instead of stone tablets.
Another company tried to pull this one about ten years ago. They claimed (I believe) a 12 to 1 compression on "random" data, and you could "recompress" that data stream as many times as you wanted until it was less than 4k.
uh-huh.
Given that this really IS mathematically impossible, and people have tried for years to figure out ways around it, it's just another company trying to sell snake oil to investors. It's too bad this stuff makes it to slashdot and to the media in general, because the company doesn't deserve the attention.
Is there any way to compress the forums so that everybody with a witty idea that's been posted by 100 other people will have their posts all condensed down into one post. Would save us all a lot of time. Example below :)
Here's a new one: What if you compressed something down to 10:1 then compressed that down to 10:1 etc. etc. until it was compressed down to one byte! Ha! God I'm witty!
As the author of secondlaw.com and his related sites takes great pains to point out, "information entropy" is an unrelated concept to thermodynamical entropy. This has nothing to do with the 2nd law.
"Back in the days of non-quantum computing everyone thought we were bullshiting them!" -- CEO, ZeroSpace
Second, Information Theory says that you cannot compress data of n-bits of entropy to less than n bits. Data is said to be 'random' if it is n bits long and it has n bits of entropy (that is not accurate, I know).
You can cheat and invent an algorithm that compresses *1* random string of data to a byte and adds one byte to the rest, so you can undo that transformation easily. There you are, you have an algorithm that compresses random data!
The compression FAQ (and I guess Kolmogorov says so too, but I don't know) rules out those tricks by requiring that the compressed size plus the size of the decompressor be less than the uncompressed size.
The only point I can find there is what you call random data. Suppose a text is encoded in ASCII, in bytes. That is supposed to have low entropy. Now take that data in 9-bit chunks and measure the entropy. It will be higher. So entropy and randomness depend on how you look at the data.
This takes us to what's a random file. I'm sure that the guy with the compression faq challenge would give you a *very* random file, with a really even distribution of characters, little repeated sequences, no long streams of 77's, etc. Is that a random file...? It has been doctored. In fact, if you count how many possible files he could give you, it would be less than all the possible files of that length, therefore, you wouldn't need as many bits to represent them all.
The problem (and I'm sure someone who knows more combinatorics than me could work it out) is that this can't work very well... I'd say that the saving must be less than one bit (intuitive reasoning)...
well...so much for all that technobabble on their site, and it obviously being some sort of hoax, scam, something to impress mommy, etc. or whatever you want to call it.
the point is, that it ALL depends on what the application is. for example, sometimes you don't care about the order of the data you get back, you just want the same discrete chunks you started with. and you sometimes also don't care how long it takes to COMPRESS, as long as uncompression is fast. for example, in the case of hardware testing, you usually want to test a bunch of input to make sure you have the correct output. if you are testing this capability, then you DON'T care how long it takes to compress, you only need to do it once, what you care about is the QUICK transmission of the compressed data and uncompressing it as fast as possible.
so what you do in this case, is divide the data into chunks of the test codes, and try to solve the TSP of the test codes as coordinates of an N-dimensional space where N=# of bits in code. although solving TSP for a large number of codes is impractical, you can use, say, a kohonen self organizing neural net to approximate. and at every step of the way, you would have data that can be compressed via a run-length compression if you express the differences between the test codes (that is what the TSP is for). the longer you wait, the better the compression is. at least it is only O(x) and not anymore (not including the actual updating time for each node, which is negligible anyway compared to the actual solving of the problem).
getting it back is trivial, just add all the differences one by one to get the test codes back in some order and feed them to the hardware to be tested. works well, doesn't it??? it's more than 100:1 if you wait long enough. and it is very practical for the application at hand. however, it is clearly impractical for every day use. so...can this BS company claim what they have done is real?
SURE! if they dont mention the details of what they are doing...
QED
BSD is for people who love UNIX. Linux is for those who hate Microsoft.
I did a search on Google and I got exactly three hits. Their own site twice and the Reuters article.
It's like they just popped in out of nowhere with an unbelievable technology looking for investors/suckers.
We have three native Polish speakers in my office. I asked one of them to translate the professor's reply. She said the gist of it is that he was upset they released his name, he didn't authorize any information release, etc. Apparently didn't deny or confirm the truth of the information but said something about having "more important things in my career" or something like that (not verbatim quote).
I know I'm posting late, but I hope someone reads this and comments.
I've had this recurring thought in my head regarding compression that I haven't been able to prove/disprove.
Disclaimer: I know absolutely nothing about compression other than what common sense tells me.
Now for my theory: Is it possible to make an analysis of a whole lot of data from a whole lot of sources for a certain period of time? Let's say I log every single bit of data that comes and goes from, say, AOL's network. I then run an analysis of the data and come up with, say, the 5 million most used 8-byte strings. You probably want to play with the string sizes and number of strings to see what makes mathematical sense. You then keep a copy of the 40MB indexed string dbase on every internet node, or at each end of a slow link, or whatever, and then run all incoming and outgoing data through a program that translates between index references and actual data.
Would that work? Since a 5 million entry index requires a 3 byte key to access an 8 byte string, would I get a 3/8 lossless compression on top of whatever's in place right now, whenever I hit an indexed string?
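A rough C sketch of the arithmetic in this proposal, for anyone who wants to play with the numbers. The function names and the one-flag-bit-per-chunk framing are assumptions made up for illustration; the post does not say how the decoder would tell an index from literal data.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of whole bytes that can index `entries` table slots. */
unsigned key_bytes(uint64_t entries)
{
    unsigned bytes = 1;
    assert(entries > 0);
    while (bytes < 8 && ((entries - 1) >> (8 * bytes)) != 0)
        bytes++;
    return bytes;
}

/* Encoded size in bits for a stream cut into fixed 8-byte chunks.
   Assumption (not from the post): every chunk carries one flag bit that
   says "table index follows" or "literal 8 bytes follow". */
uint64_t encoded_bits(uint64_t chunks, uint64_t hits, unsigned key_len_bytes)
{
    assert(hits <= chunks);
    return chunks                          /* one flag bit per chunk   */
         + hits * 8u * key_len_bytes       /* chunks found in the table */
         + (chunks - hits) * 64u;          /* chunks sent as literals   */
}
```

With these assumptions, 100 chunks that all hit the table cost 2500 bits instead of 6400, while 100 chunks that all miss cost 6500 bits, slightly more than sending them raw. The gain depends entirely on the hit rate, and 5 million entries cover only a tiny fraction of the 2^64 possible 8-byte strings.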
There are two kinds of people in the world: Those with good memory.
Truly unbreakable encryption has existed for many years: the one-time pad. The problems of unbreakable encryption aren't the theory, but the practice. (If you want truly secure communications among n people who each transmit x bytes of data through the group each day, how will you securely generate n*(n-1)*x bytes of random data each day, and securely distribute it to each of them?)
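For reference, a minimal C sketch of the one-time pad operation and of the key-volume formula quoted above. The function names are hypothetical, and the pad is assumed to be truly random, as long as the message, and never reused; the code does nothing to enforce that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR each message byte with the matching pad byte.
   The same call encrypts and decrypts. */
void otp_xor(const uint8_t *in, const uint8_t *pad, uint8_t *out, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = (uint8_t)(in[i] ^ pad[i]);
}

/* Pad bytes needed per day by the formula in the post: n*(n-1)*x. */
uint64_t pad_bytes_per_day(uint64_t n, uint64_t x)
{
    assert(n >= 1);
    return n * (n - 1) * x;
}
```

By that formula, ten people sending 1 MB each per day already need 90 MB of fresh pad material daily, which is the practical problem the post is pointing at.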
Never play leapfrog with a unicorn. Or a juggernaut.
It means you can compress it 100 times without data loss.
gzip file;mv file.gz file
lather, rinse, repeat
It is theoretically possible to get 100:1 or better compression. Assign a number to every file that has ever existed. It would surely take less than 64 bits to represent the number of files out there, and CERTAINLY less than 128.
Now, put every one of those files into the compressor (compressed if you like) and index them with numbers.
The compressed file would simply have a number or numbers of the files within. Even a full debian installation wouldn't exceed a few MB if even that.
The decompressor would take even more space than WindowsXP, and this would not work for newly created files, but it gets theoretical possibility out of the way. Now for practical possibility...
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
They have academics from all over the US and Europe. One notable is Dr. Steve Smale, Professor Emeritus at UC Berkeley and 1966 Fields Medal winner.
We are talking about a team of brilliant mathematicians here. If they think this is possible, it deserves at the very least serious consideration.
Whether their ideas will come to fruition and withstand peer scrutiny is another thing. But to claim they don't know what they are talking about is a long stretch.
The most obvious problem with this ?
Well if you can compress completely random data 1:100, then obviously there is nothing preventing you from RECOMPRESSING the compressed data over and over... And thus you obviously run into some VERY serious PROBLEMS!!!!!!!!!!!!!! Because that would mean an infinite compression ratio, the universe down to a single BIT-- though there might be a minimum size so it would be the universe in 50k--but nonetheless completely absurd.
I have an infinite amplifier; I can sell it to you now. It has infinite gain, and infinite input impedance. Unfortunately, it has to rely on real power supplies, since I do not have an ideal power supply. Funny thing is, it always outputs the rail voltage.
Even Slashdot wants to hide some things
Maybe it's another implementation of RFC1149?
sulli
RTFJ.
So Huffman compression has been an industry standard for 50 years has it?
@Article{huffman-1952,
author = {David A. Huffman},
title = {A method for the construction of minimum-redundancy codes},
journal = "Proceedings of the IRE",
year = {1952},
volume = {40},
pages = {1098--1101},
}
That's pretty bloody quick uptake on the part of the industry, then.
-- Arm yourself when the Frog God smiles.
Notice where this company is based, West Palm Beach, Fl.
That's prime retirement community. Lots and lots of seniors without enough technical knowledge to know that they are full of crap.
They're hoping that their website and technomumbo will convince some old people to give them their money.
I noticed they never talked about a decompressor in their press release... I suppose they are still working on that. I seem to remember a similar story, where someone had achieved 100% compression in a compressor, and even released source code. This company is lagging far behind what the open source community can provide (again though, the open source community is still working on a decompressor)
The wonders of fractal compression are a "dirty lie" of compression techniques. It happens to work well for classes of images with natural (self-repeating) subjects. Furthermore, it takes forever to find the right algorithm, and sometimes the parameters/algorithm description is very large. Of course, sometimes you can never find the right algorithm to produce what you want.
Black holes are where the Matrix raised SIGFPE
I have three words for them: Pigeon hole principle.
It looks like somebody at ZeoSync changed their major from computer science to marketing.
So if you define "practically random data" as "data that is random enough for practical purposes," you can compress it by storing the random seed and the string length. ;-)
However, the seed may be almost as long as the string itself, if not longer. In the worst case, you're expanding the string by the 48-bit integer necessary to hold the string length.
Will I retire or break 10K?
If you read the Reuters article carefully, it does not say a digital -> digital compression of 1:100, but implies a better way of encoding / compressing digital -> analog -> digital, with the analog bandwidth being much greater than today.
That's all the stuff where they talk about Dr. Claude Shannon and information theory. (They could have been clearer about it, but that's PR flacks for you.)
examine the quote
'"What we've developed is a new plateau in communications theory," St. George said. "We are expecting to produce the enormous capacity of analog signaling, with the benefit of the noise-free integrity of digital communications."'
Sounds like they are trying to shove more data into an analog stream, using wacky math, than would normally be allowed.
rbb
Without reading their website, the claim MUST BE FALSE.
The proof is simple.
Suppose we have a 100 bit message. There are 2^100 different messages. Suppose you can compress them on average to 98 bits. Then there can only be 2^98 compressed messages. We lost a couple along the way!
This proves that if you compress SOME messages you will also have to make SOME longer. Not by much, but at least a little. (Prepend a 1 to a message that is "not compressible" and a 0 to the "compressed data stream", and you have a "worst case expansion" of "one bit".)
Now compressing normal data is easy. There are a lot of repeats, and other redundancy. So the normal case is that you can compress them. The bad news is that if you enumerate ALL 100-bit messages, ALL compression methods are going to need on average 100 bits or more. This is pure mathematics.
The 2^100 number is a number that is quite large, but if you start talking about compressing a megabyte of data, then I'm already talking about enumerating all 2^8000000 possible messages. That is a thought experiment. But the argument still holds.
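A small C sketch of the counting behind this argument, for illustration only. The function names are made up here, and the lengths are capped at 63 bits so the counts fit in a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct bit strings of exactly n bits. */
uint64_t strings_of_length(unsigned n)
{
    assert(n <= 63);
    return (uint64_t)1 << n;
}

/* Number of distinct bit strings strictly shorter than n bits, counting
   the empty string: 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1. */
uint64_t strings_shorter_than(unsigned n)
{
    assert(n <= 63);
    return ((uint64_t)1 << n) - 1;
}

/* Output length of the flag-bit scheme: one flag bit, then either the
   compressed form or the untouched n-bit message, whichever is shorter. */
unsigned flagged_length(unsigned n, unsigned compressed_len)
{
    return 1 + (compressed_len < n ? compressed_len : n);
}
```

For any n there is always one fewer shorter string than there are n-bit messages, so at least one message cannot shrink. The flag-bit scheme caps the damage at one extra bit: flagged_length(100, 120) is 101, while flagged_length(100, 40) is 41.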
-----
I read their press release. It's buzzword compliant bovine excrement. They will attract money and pay the existing people large salaries as long as they can keep up the charade.
Oh, and they have placed a tactical "practically" in front of the word "random". I can compress "practically random data" by enormous amounts.
If you take the MD5 hash of the string "hi there", and feed that back into the MD5 function, you can generate an endless stream of "practically random" data. Take the first 1Mb of this "practically random" data.
I compressed 1Mbyte of data into the 212 bytes of the previous paragraph! However this is not possible if I let someone else generate the random data any way he pleases, and then have to compress it. They can claim to be "technically correct" up to a point due to this phenomenon....
Roger.
You are all thinking like bitheads. You need to step outside the box and think less logically, and more abstractly.
Perhaps this TunerAcceleratorTM technology is a scam. Assuming it's not, let's examine the idea of mathematical non-repeating representations.
Mathematics can represent any number (and approach the representation of irrational and imaginaries) by any infinite combination of other numbers and operators.
5 + 20 = 25
5 * 5 = 25
100 / 4 = 25
5^2 = 25
et cetera
These numbers are simply symbols and we all understand that complex datastreams cannot be represented symbolically, right?
Wrong. Fractal geometry can -- rather a variant of fractals can -- probably something in the ballpark of true complexity, where "chaos" is reduced to an "equation" fed a set of "initial conditions". [Quotes because I'm just borrowing from the language of mathematics to relate to complexity theory].
Example: DNA. You are a giant decompressed fractal. In fact, just about everything on this planet is a giant, decompressed fractal.
Neither of my analogies is perfect, but I hope they get the point across. The universe, strange as it may seem, is not represented by 1s and 0s. Data, funny as it may sound, is not all digital.
-k
fear@fearstudios.com
Sounds like fractal compression to me.
The fractal transform that Barnsley's products use is merely vector quantization, mapping each 8x8 pixel block of an image onto a 4x4 pixel block of a reduced version of itself, plus an RGB offset for DC. It begins to converge to the desired image after a few iterations of the transform.
Will I retire or break 10K?
Actually, if you change the domain you can get what appears to be impressive compression. Consider a bitmapped picture of a child's line drawing of a house. Replace that by a description of the drawing commands. Of course you have not violated Shannon's theorem because the amount of information in the original drawing is actually low.
And if you manage to express all drawing commands in terms of "draw a horizontal line," you've re-invented run-length encoding that MacPaint and PCX files have been using for ages.
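For readers who have not seen run-length encoding, here is a minimal C sketch. It uses plain (count, value) byte pairs, which is an assumption chosen for brevity and is not the exact format of MacPaint or PCX files; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `in` as (count, value) pairs with counts from 1 to 255.
   Returns the number of bytes written, or 0 if `out` is too small
   (0 is also the result for empty input). */
size_t rle_encode(const uint8_t *in, size_t in_len,
                  uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        uint8_t value = in[i];
        size_t run = 1;
        /* extend the run while the next byte repeats */
        while (i + run < in_len && in[i + run] == value && run < 255)
            run++;
        if (o + 2 > out_cap)
            return 0;
        out[o++] = (uint8_t)run;
        out[o++] = value;
        i += run;
    }
    return o;
}
```

The bytes 7 7 7 7 9 encode to 4 7 1 9, one byte shorter, but the bytes 1 2 3 encode to six bytes. Data without runs doubles in size under this scheme, which is the same trade-off the thread keeps returning to.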
Will I retire or break 10K?
their claim is for 10:1 on something 'random' and short. And for ca. 10 x better sometime in the future.
10:1 on the 'average' traffic that passes 'net channels or stores on disk would be a surprising thing. I don't think it's the level of impossible that /. consensus is hanging on it.
My guess is .. If it's real it's something that needs to be implemented in hardware to be fast.
certainly all the existing algorithms suck plenty of cpu cycles. If there's a solution here that's that much more space-efficient I very much doubt it's gonna be time-efficient.
or maybe it's just smoke & mirrors
fw
Linux is Linux, if One need clarify their dist: <Dist>/GNU Linux
bsds are of course just BSD
These kinds of compression claims are the perpetual motion machine of the information age. Actually, they are less plausible than perpetual motion. For perpetual motion, there is at least the (very remote) possibility that there is some kind of undiscovered physics. Impossibility statements in compression only hinge on mathematics, with no physics or experiments needed.
(Random Marketing Compression): The meaning of which is completely lost because we've dropped lots of big names and wowed you with random buzzwords and incorrect explanations.
My absolute favorite was the explanation of the Pigeon hole Principle. These people are claiming up to 100:1 on a *RANDOM* number. The fallacy here is that compressed data is not random. It is incredibly deliberate. It might seem random statistically, but it is indeed a very compact representation of other data. So trying to create random sequences does no good. Once data is statistically compressed or compressed by pattern, the compressed data should have no redundancy. Any redundancy left over in the compressed data is really just evidence that the compression algorithm used in the first case didn't recognize a higher level pattern in the data and encode it efficiently.
The pigeon hole principle is used in the classic proof that there is no way to reduce ALL messages of size X to size Y, where Y < X. That doesn't mean that there isn't a good method for reducing a small subset of the messages of size X, though.
I propose that there is some subset of messages of size X that can be compressed to size <= Y, but that the compression ratio depends on the ratio of the number of messages of size Y to the number of messages of size X. Indeed, the obvious compression method would be to make a lookup table with one entry per message of size Y, each entry holding a message of size X.
So for example, if a compressor compresses 10:1 in bytes and we apply the method to strings of length 300, then only 256^30 of the 256^300 strings are represented. It seems like we can only represent 1/256^270th of these messages that way.
Heh, and I used to hate CS Theory in college.
I'm betting dollars-to-donuts Peter St. George is the only one who works at ZeoSync, and he wants your gullible-ass money.
Black holes are where the Matrix raised SIGFPE
Say the human genome consists of 3.000.000.000 basepairs. There are 4 different basepairs (A,T,G and C). 5% of this is coding for protein: 150.000.000 basepairs. So a 150 Mb file is enough to encode an entire human being. I don't think 100:1 compression ratio is a lot.
IANAL, but imagine a beowulf cluster of in Soviet Russia all your belong are base to us welcoming the new SCO overlords.
I can't remember the book, but it was by Pohl. A bunch of people on a generational starship develop superhuman mental abilities -- eventually discovering FTL travel and returning to Earth quickly.
Anyway, during the trip they send back encoded messages in a highly compressed form, using powers of primes. All you have to do is factor them to decode them, but we don't have the mathematical skills to do so. A sample message might be:
987823^234970213.3^8237.234
Given that this is computationally impossible to solve unless we discover some really cool quantum computing tricks, it is useless, but conceptually you could encode any unique string of bits into a very small package.
It's called PSLQ lattice reduction...
You can get the details here...
http://www.mathsoft.com/asolve/plouffe/plouffe.html
http://www.lacim.uqam.ca/plouffe/Simon/articlepi.html
Note: this goes quite a long ways to showing that conventional wisdom about pi being random digits isn't actually true... Pseudo random is more like it...
However, it isn't really applicable to this multidimensional compression nonsense since the counting argument still applies.
Suspiciously, this looks to be similar to what the fractal folks were pushing in the '80s if you replace gems with iterators... Every once in a while you have to change the color of your snake oil label to confuse the masses...
-slew
[quote]Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.[endquote]
The key here is it's on "very small bit strings." It is significantly harder to reduce larger bit strings in this fashion.
What they appear to do, from the limited documentation they have provided, is transform the random data and then use a formula to express it. This has been proven ineffective. If you could find some sort of polynomial that would represent a file, you may indeed find polynomials shorter than the file, but at least half of them are going to be longer than the file, and more difficult to find.
I've dabbled enough with data compression to at least be able to spot this.
What the end result always turns out to be is that you *can* compress random data, but you can't always compress all random data. It may actually be possible to compress random data, but not at a 100:1 ratio, consistently. I've written some code that (IMHO) is a novel approach to compressing random data with moderate success, but never at 100:1.
"You can compress all data half the time, or you can compress half the data all the time."
Greater than 1.
If it wasn't (i.e. it guaranteed a loss), you could take an input, run it through the compressor several times, and end up with a single byte, or even 0 bytes.
Obviously the problem would appear at the decompression stage, since there aren't quite a lot of things you can get from decompressing a single byte.
economics is not a Nobel Prize, it's a Bank of Sweden prize.
The only downside of this incredible algorithm is that you have to keep the original file, nuts.
"And we have seen and do testify that the Father sent the Son to be the Savior of the World"
1 John 4:14
I once compressed a string of randomly chosen zeros with a 106:1 ratio with practically no loss of information. If only I could remember how I did it. :)
they mentioned not using the traditional 'redundant' data searching approach. From the description they are simply looking for patterns in the bits that they can generate mathematically.
If the signal is wave related then i'm sure they will find lots.
Here is sample 'C' code to illustrate a 1000000000:1 compression of random data.
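#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    unsigned long long i;  /* a 32-bit int cannot count to 4000000000 */
    if (argc < 2)
        return 1;          /* the seed must be given on the command line */
    srand((unsigned)atoi(argv[1]));
    for (i = 0; i < 4000000000ULL; i++)
        putchar(rand() % 256);
    return 0;
}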
As you can see you simply have to supply a 4byte number, and you can generate a 4GB file.
If you first generate a 4GB file in this manner, then call it 'practically random'. Then run an algorithm that compares it with the sequences starting from all possible 'seeds' - and outputs the 4byte number that matches, you have 1000000000:1 compression!!
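The "compare against every seed" step can be sketched in C as well. To keep the search tiny this uses a toy generator with a 16-bit seed instead of rand() and a 4-byte seed; the generator, its constants, and the function names are all assumptions for illustration, not anything ZeoSync described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy generator: expand a 16-bit seed into `len` bytes with a small
   linear congruential step (constants picked for illustration). */
void generate(uint16_t seed, uint8_t *buf, size_t len)
{
    uint32_t state = seed;
    size_t i;
    for (i = 0; i < len; i++) {
        state = state * 1103515245u + 12345u;
        buf[i] = (uint8_t)(state >> 16);
    }
}

/* "Compressor": try every seed and return the first one whose output
   equals `data`, or -1 if no seed reproduces it. Handles up to 64 bytes. */
long find_seed(const uint8_t *data, size_t len)
{
    uint8_t candidate[64];
    unsigned long seed;
    size_t i;
    assert(len <= sizeof candidate);
    for (seed = 0; seed <= 0xFFFFul; seed++) {
        generate((uint16_t)seed, candidate, len);
        for (i = 0; i < len && candidate[i] == data[i]; i++)
            ;
        if (i == len)
            return (long)seed;
    }
    return -1;
}
```

A 32-byte file made by generate() comes back as a 2-byte seed, which looks like 16:1 compression. But only 65536 of the 2^256 possible 32-byte files can be produced this way, so a file chosen by someone else will almost certainly get -1.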
The best current RNG testing suite is DIEHARD. It uses a large number of tests to make sure that the numbers are random enough for most purposes. More information about RNGs and testing them can be found at the pLab, which is one of the most comprehensive sites on RNGs on the net.
'BS'
A 33% improvement over your already impressive compression!
-Cybrex
Boundless Expansion, Self-Transformation, Dynamic Optimism, Intelligent Technology, Spontaneous Order- BEST DO IT SO!
...and bet that they meant "arbitrary data" rather than "random data". After all, who would want to compress random data? What possible benefit could there be to such a thing?
This is more like Usenet Crank Robert E. McElwaine who published lots of articles with his (capital-preserving) tagline "UN-altered REPRODUCTION and DISSEMINATION of this IMPORTANT Information is ENCOURAGED."
And that may be giving them more credit than they deserve - it looks like a compression algorithm designed for use on digital wallets....
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
OK, I admit it. It is very easy to crucify these guys. Even if they have something interesting, they deserve bashing because of the foolish PR job. But, is it *impossible* that there is something to this? I am not implying that Information Theory is in peril. Certainly not. But, as a scientist, I have to be objective enough to accept that they might have something.
I took a quick look at Prof. Smale's publications. He has been working recently in complexity theory and other related fields. If he is actively involved, there may be something worthwhile here.
One of the most interesting things about complexity theory, fractals, and cellular automata is the incredible amount of detail that can be evidenced by "simple" systems. See some of Steve Wolfram's work and his upcoming book for more on this. One of the still-born technologies of image compression was based on fractals. The self-symmetry principle was able to accurately capture image details with only a handful of data passed to generator functions. In fact, these compression techniques could even produce "more" detail (zooming in) than the original image possessed by extrapolating these generators further.
Of course, at first blush, this seems foolishness too. How can there be *more* detail after compression!? But, the essential fact is that natural "detail" is not as random as we might think. In still images, textures are often non-repeating but still highly correlated in some sense. In moving images, frame-by-frame correlation is typically very high. In executable code, only a small fraction of all possible arrangements of bytes can actually be executed. Perhaps Shannon's only weakness is believing too strongly in absolute randomness. Information Theoretic calculations leverage the pure statistical nature of the data stream to make calculations. Practical problems don't address purely random data and in making that purely random assumption, we may be shackling ourselves to Shannon's limits unnecessarily.
While Shannon does and will likely continue to hold, I am willing to admit that "most" data of interest contains less entropy per bit than it could. (Notable exceptions would be strongly encrypted data...encryption is *designed* to exhibit statistical randomness!) Huffman and arithmetic coding work by pattern matching techniques and allow lossless compression of many data types. JPEG uses the quantization of Discrete Cosine Transforms of 8x8 pixel blocks to compress images. MPEG uses DCT quantization with motion compensation and lots of other techniques to try to capture frame to frame correlations. All of these are practical data streams. None are obviously correlated, but none are truly random either. With fairly "simple-minded" encoding tricks, we are able to significantly reduce the size of data files (gzip), sound (mp3), images (jpeg) and video (divx). Is it not possible to build a more general mechanism with which to ferret out more hidden symmetries and thus increase compression?
I guess I am willing to accept that there is something more to this than simple charlatanism. Perhaps these folks have come up with an effective way to leverage complexity theory to establish a general framework for the construction of generator functions, etc. It would be a landmark discovery if they are able to uncover self-similarity or some other self-generation principles from various data streams. Note that I didn't say random data streams.
...We are talking about a team of brilliant mathematicians here. If they think this is possible, it deserves at the very least serious consideration...to claim they don't know what they are talking about is a long stretch. Lest we forget the cold fusion fiasco that the brilliant people on both sides of the Atlantic gave us. There's a reason that you don't know the author(s) of a scientific paper when it's being refereed by other scientists. Ideas are meant to be evaluated on their own merit, or in this case lack of merit.
http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
I have been following this discussion, and had the following thought - It is not possible to have truly random data.
No matter how you get a sequence of data, lava lamps, radioactive decay, etc etc, there are always conditions that cause the data to be as it is.
Isn't this more of a chaos theory issue? No matter what data we have, there must be something that caused it to be as it is. It is like predicting the weather. If we could model everything, then maybe we could do it, but modelling everything would be impossible, as we would have to model our modelling etc etc.
Basically, what do others think? I believe that it is not possible to have random data, just data for which we do not know the context from which it came, or cannot model its context completely.
It says
"Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range."
As you can see, it says "once fully developed", i.e. they have not done it yet, and it says "anticipated", which means that it may just be vapor speak.
In other news, the company which managed to remove redundancy from pure entropy also managed to break the absolute-zero barrier. It was previously thought that you couldn't make something colder if it already had zero heat in it. But apparently this is not the case, according to ZeoSync.
everything in the universe can be compressed to the following byte...
in big endian:
00101010
This false claim seems to keep resurfacing every few years. Here is a simple way to see that it cannot be true: ...
Suppose we make the less amazing claim that we can compress any random file just 2:1.
Let's consider files of just two bits in length. There are 4: "00", "01", "10", and "11".
Let's suppose our magic compressor function is called C. Obviously, a 2 bit file must compress to only one bit. Since there are only two choices for one-bit files, C("00") must be "0" or "1". C("01") must be the other choice; this is because for compression to be lossless, no two different files can compress to the same result. (or else, how would the decompressor know which one was originally compressed???) So we have constrained the function so far to be
(C("00") => "0" and C("01") => "1") or
(C("00") => "1" and C("01") => "0")
Now, what will happen when we try to compress "10"? This is where the contradiction occurs. There are no other unused 1-bit files left and so the compressor cannot possibly succeed in its claim of achieving 2:1 lossless compression even for the trivial case of 2-bit files. This same counting argument can be used to formally show that it is impossible to make a general lossless compressor that can compress any more than half of all random files of a given length by even a single bit. "Real world" compressors like Zip expand the vast majority of random files -- they only happen to do well on "typical, useful" files that we use which contain less entropy than most random files. To see this for yourself, write a small program in your favorite language to make a pseudorandom file of bytes, then run it through your compressor. You will see that when you run it through PKZip, gzip, or whatever, it almost always gets bigger. (If you see compression, this probably indicates a problem in your pseudo random number generator.) The frauds at ZeoSync are just trying to confuse the issue by invoking technical-sounding jargon from information theory. Their crazy claims do not even stand up to the simplest analysis by counting, much less real-world testing.
Rudi Cilibrasi
I believe that Deborah Tannen pointed this up as a key problem in our society, as the fallacy of "false duality", the notion that because there are two differing points of view that they are both worthy of attention.
You say that "at best, this is revolutionary" but this is like saying "I have a great plan! Everyone takes off their shoes, switches them around, and somehow everyone winds up with a bigger pair! *At best*, everyone gets bigger shoes!" Well, no, just because someone's floated the fantasy doesn't mean it's even a vague possibility. These people are selling snake oil; it can be proved at home. To entertain their fraudulent notions simply because they bring them up is a mistake.
If people are to respect the law, perhaps the law should begin by respecting the people.
Then you could take the output files from this compression scheme, which would be pretty much incompressible by traditional methods, and run THEM through the very same compression scheme, and make them smaller still. Repeat ad infinitum, and reduce all the data in the universe to one small file.
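For contrast, here is roughly what happens when a real compressor is fed its own output over and over. A minimal sketch, assuming zlib as the "traditional method"; the function name recompress and the sample text are invented for the example.

```python
import zlib

def recompress(data: bytes, rounds: int) -> list:
    """Return the size after each successive round of zlib compression."""
    sizes = [len(data)]
    for _ in range(rounds):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes

text = b"Repeat ad infinitum, and reduce all the data in the universe. " * 2000
sizes = recompress(text, 5)
print(sizes)
```

On an input like this the first round does nearly all of the work, and the sizes bottom out quickly instead of heading toward one byte. Every zlib stream carries a few bytes of header and checksum, so further rounds cannot keep shrinking.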
Better yet: To use your 10 bits example, feed every one of the 1024 combinations into the decompression program, and one of them is guaranteed to represent all the data in the universe. That's only a handful of combinations, we should be able to check them all before dinner. When someone decompresses the right 10-bit code, call me, since my phone number must be in the data somewhere.
There is a way to make compression like this work - for each string you want to compress, there's a compression program that losslessly compresses it to an arbitrarily short output string (one bit is fine...), but if the output string is N bits long, the program only works for 2**N input strings, and in general requires SIZE(INPUT) bits of program per input string (though for non-random strings, or for related strings, you can do better.) In other words, it's not useful for general-purpose compression, but you can use it for special-purpose compression - you can't design a small compression program to perfectly describe "Alice"'s or "Bob"'s appearance, but you can design a small program that outputs "Alice", "Bob", or "Somebody else".
Similarly, with pigeons, you can play Hundred-Pigeon Monte, and attract investors to your company, or use this to attract customers for your other products, or have a big crowd on the street intently watching you play hundred-pigeon monte with your shill while a pickpocket walks around behind the crowd.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
For uniformly distributed random data (white noise), the average compression ratio has to be 1:1. The "2:1 for 8-bit random" in the parent article is silly, so it puzzles me why it's been modded up. 2:1 is a common rule of thumb for data that's typically stored in computer file systems, but that's far from random; it has lots of ASCII text, executable programs, etc.
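A quick way to see the difference between white noise and typical file contents is to measure the entropy of the byte histogram. The sketch below is an assumption-laden illustration: it only looks at single-byte frequencies (so it ignores longer patterns), uses os.urandom as the noise source, and the function name is made up.

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

noise = os.urandom(1_000_000)  # stand-in for white noise
text = b"the quick brown fox jumps over the lazy dog " * 1000

print("noise:", round(entropy_bits_per_byte(noise), 3))  # close to 8
print("text: ", round(entropy_bits_per_byte(text), 3))   # well under 8
```

Noise sits at essentially 8 bits per byte, which leaves a byte-level coder nothing to remove. ASCII text uses only a few dozen symbols with uneven frequencies, which is where rules of thumb like 2:1 come from.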
Even more important, however, is that their "Technical Information" reeks so strongly of buzzwords and technobabble it's hard to read it without the urge to hold my nose. This alone discredits their entire proposition. I feel like I've just been subjected to corporate brainwashing ..er.. I mean marketing.
A solution to the problem with music today
The ultimate solution by far would be a decompression algorithm that, instead of some screwed up checksum and data mapping crap, randomly generates different sets of outputs and lets the user select which is correct.
It is possible to create "Infinite" compression, but it works like the laws of quantum mechanics, i.e. you never really get what you want. Here, I'll perform an experiment:
o I have a 1 byte file I want to send you.
o We start by synching our wrist-watches.
o I call you on the telephone and say "Start" and hang up.
o You and I start counting off the seconds.
o When a number of seconds equal to the value of the byte has passed, I call you back and say "Stop".
Now you have the value of the byte given to you in two bits of information (the "start" and "stop" bits).
Now we have an 8:2 ratio, which isn't bad. But I can do this again with a two-byte file and get 8:1. I can send you ANY length of file and only consume two bits of bandwidth... but at a terrible cost: time. Lots and lots of time.
But if you had something like a super-far-away satellite where bandwidth is hard to come by and time is not in short supply, it would be the answer.
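To put a number on "lots and lots of time", here is a small sketch of the worst-case wait. It assumes the file is read as one big unsigned integer and counted off at one tick per second; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def worst_case_wait_seconds(num_bytes: int) -> int:
    """Longest wait between "Start" and "Stop" for a file of num_bytes bytes.

    The file is read as one big unsigned integer, so the largest value
    is 256**num_bytes - 1 seconds.
    """
    return 256 ** num_bytes - 1

for n in (1, 2, 4, 8):
    wait = worst_case_wait_seconds(n)
    print(n, "byte(s):", wait, "seconds, about", wait // SECONDS_PER_YEAR, "years")
```

One byte takes at most a few minutes, but four bytes already takes about 136 years in the worst case. The information has not gone away; it has moved into the timing of the second call.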
"Your superior intellect is no match for our puny weapons!"
Actually, RLE is different [from horizontal line vector encoding of a bitmap]. RLE is when the file says X is repeated Y times as the basic way to compress.
Doesn't "a pixel colored X is repeated Y times" sound like "draw a horizontal 1-pixel-wide line in color X from the current position, Y pixels to the right"?
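To make the comparison concrete, here is a minimal run-length encoding sketch in Python. The function names and the sample row of pixels are made up for illustration; real formats pack the pairs into bytes rather than tuples.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

row = ["red"] * 5 + ["blue"] * 3 + ["red"] * 2
encoded = rle_encode(row)
print(encoded)  # [('red', 5), ('blue', 3), ('red', 2)]
```

Each pair reads equally well as "X repeated Y times" or as "a horizontal line of color X, Y pixels long", which is the point being made above.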
Will I retire or break 10K?
i would be much more worried about their patent than their technique.
why worry? i don't know; i guess because you can get away with patenting one-click-purchasing and a-book-on-a-disk. those patent clerks wouldn't know obvious if it was spraypainted in blood on the side of harsh goatsex.
my gut tells me they are trying to patent the idea that you can derive a seed (for an algorithm) from a string of bytes -- and then turn around, seed the algorithm and get your bytes back.
pretty damn obvious, right? but if you also say you're doing it "to cause compression", and you also shroud it in a fit of higher function theoretical math theories, then voilà. if you're hyper-slick you can even get away with charging royalties on every inclusion of rnd(). at least until some nerds get jobs as patent clerks, lawyers, and justices, and show up in high orders on jury duty. that could take decades, meanwhile they'll be living the high life and stowing it all away in corporate sponsorships or wtfe.
"Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
I'm not a math person by any means (still doing college algebra, which pretty much means everybody has a better understanding of math than I do), and I would appreciate people picking this apart.
:)
So, my idea for a "Kick Ass Compression"
Take a block of data and run it through an algorithm that outputs a specific value (I'm thinking of a CRC, an MD5 hash or whatnot). Do that several times with several different algorithms that generate a similar kind of value. Record the two (or more) values, then encapsulate the small blocks of data into larger blocks. I'm thinking only 3 or 4 levels of encapsulation would be needed, because if you calculated the CRC of the entire file, a program could decide which choice is correct when decoding a "block" has multiple candidates (which I'm fairly sure it will).
Now people use md5 hashes/crc checks to verify whether the file they downloaded hasn't been modified, so I'm assuming that it is fairly difficult to get the exact value (especially with a known size). Using this "property" (I'm not sure if that is a correct word) you could decode the data into one of several (hundred??thousand??) byte streams (possibilities of uncompressed data) and by comparing byte streams between algorithm A and B, the byte streams would match at one (would it be possible to have more? I suppose it depends on the algorithms used) point, which would be the proper "uncompressed" (rather derived or something) data.
I'm pretty sure it would take a shitload of computing power to decompress, but computers are fairly fast nowadays, and I think that this could be viable at some point. 100:1 probably not, and there would be a lower limit imposed on the file size based on the possible choices (I think the possible choices would reach a pretty large number pretty fast).
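How fast the "possible choices" pile up can be checked by brute force on a toy case. The sketch below is an assumption, not the scheme as proposed: blocks are only 2 bytes long so all of them can be enumerated, and each "algorithm" is cut down to 8 bits (the first byte of MD5 and the low byte of CRC-32). The helper names are invented.

```python
import hashlib
import zlib
from collections import Counter

def md5_byte(block: bytes) -> int:
    """First byte of the MD5 digest: a toy 8-bit stand-in for 'algorithm A'."""
    return hashlib.md5(block).digest()[0]

def crc_byte(block: bytes) -> int:
    """Low byte of CRC-32: a toy 8-bit stand-in for 'algorithm B'."""
    return zlib.crc32(block) & 0xFF

# Every possible 2-byte block: 65536 inputs.
blocks = [bytes([a, b]) for a in range(256) for b in range(256)]

per_md5 = Counter(md5_byte(blk) for blk in blocks)
per_pair = Counter((md5_byte(blk), crc_byte(blk)) for blk in blocks)

print("blocks sharing the most common MD5 byte:", max(per_md5.values()))
print("distinct (MD5 byte, CRC byte) pairs:", len(per_pair), "of", len(blocks))
```

A single 8-bit value leaves at least 256 candidate blocks for some value, since 65536 inputs land on at most 256 outputs. Storing both values costs 16 bits, the same as the block itself, so nothing is saved, and whenever the pair count comes out below 65536 some blocks are still ambiguous. Scaling up the block size or the hash size does not change the counting.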
Maybe I'm just plain wrong - but could something like this be useable? Any abuse would be appreciated
Thanks!
1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcf
Why do people ignore mathematically sound proofs? This has been proven to be impossible in many ways. Hello? Earth to morons?
Music speeds up when you yawn, but does not change pitch.
This is quite different than ...the meaning of life...
I like to keep things clear. ;-)
These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.
Haha! Arithmetic encoding *is* the improvement to Huffman encoding. Arithmetic encoding is mathematically perfect. Its ratio cannot be improved. Only its speed / memory use during encoding could be improved. This is obviously tripe, since they don't even realize that AC is just the back-end to the compressor, and the front-end (the model) is what can be improved. Why do these 100:1 lossless compression stories even make Slashdot?
Music speeds up when you yawn, but does not change pitch.
I won't go into whether the compression ratios claimed are possible or not as this has been thrashed to death already. The one thing they do admit is that their technology is extremely compute-intensive. It appears they've only been able to run their algorithms on very small data sets (running on very large computers) due to this problem. Whether this technology proves to be a complete farce or not it doesn't appear that it will ever be practical for live streaming at any rate. I know some oil companies that wouldn't mind finding a way to compress all their seismic data though....
Works by redefining the word byte to equal 2 trillion bits.
Thus, my entire hard drive fits in under 1K.
And they claim it to be lossless... But if the data is truly random would it matter if you lose some when uncompressing?
Never attribute to stupidity what can be construed as a monopoly preservation tactic.
This whole discussion reminded me of an idea I had the last time compression made Slashdot's front page. If you compressed a file and threw away the dictionary/hash, so all you had was the compressed data stream, couldn't you use that as a source of entropy for PRNGs and OpenSSL and such? I mean, theoretically, it's supposed to be identical to random noise. It should be really high quality entropy.
Is this insightful, or is there some obvious flaw that I'm missing because I don't know how PRNGs work?
I'm proud of my Northern Tibetian Heritage
All you have to do is break the data into chunks... 4 bytes for example. Next calculate the sum of the bytes (using their ASCII codes) in the chunk. Then (this is the hard part) you determine the permutation (from a list of possible permutations) of 4-byte chunks of data for that sum. All one has to do is transmit the sum and the permutation... two numbers... and you can use all sorts of fear inducing math to compress it even more. For better compression, change your chunk size to a bigger number. Also, you can re-compress your data (or even other compressed data) for smaller size. You can even stream it. The one drawback though... it takes lots and lots of cycles to compress and even more to decompress.... (sigh). At least it got me an A in one of my early CS classes!
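It is worth counting what "the sum and the permutation" cost to transmit. The sketch below is a toy version under stated assumptions: chunks are 2 bytes instead of 4 so every chunk can be enumerated, and both numbers are sent as fixed-width fields. The variable names are made up.

```python
from collections import defaultdict
from math import ceil, log2

CHUNK_LEN = 2  # toy size so every chunk can be enumerated; the post uses 4

# Group every possible chunk by the sum of its byte values.
by_sum = defaultdict(list)
for a in range(256):
    for b in range(256):
        by_sum[a + b].append((a, b))

largest_group = max(len(group) for group in by_sum.values())
bits_for_sum = ceil(log2(len(by_sum)))      # which sum it is
bits_for_index = ceil(log2(largest_group))  # which chunk within that sum

print("distinct sums:", len(by_sum))
print("largest group:", largest_group)
print("fixed-width cost:", bits_for_sum + bits_for_index, "bits vs", 8 * CHUNK_LEN)
```

With fixed-width fields the pair costs 17 bits for a 16-bit chunk. A variable-width code for the index can claw that back to 16 bits on average but not below, because the (sum, index) pairs are just a relabeling of the same 65536 chunks.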
Live wrong, impostor.
A binary string is said to be random in algorithmic information theory (my area of mathematics) if it is not significantly compressible. (I won't get into exactly what this means here but you get the idea...)
Anyway, anything that is compressible by a factor of 100 must have a huge amount of structure for the compressor to take advantage of, and so is highly non-random by definition. Clearly then their "virtually random" data is not random in the slightest. In fact in order to be compressible by this factor it must be EXTREMELY non-random!
Their press release states:
All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.
The claims about their compression performance are clearly false - but even legitimate companies are known to exaggerate and oversell. However this list of academic collaborators is the most damning evidence of a hoax, IMHO. Having worked on quite a few projects at the interface of commercialisation and academia, I can promise you that there is no way any project can run with such a long list of partners. Maybe 2 or 3 would be believable.
Karma police, I've given all I can, it's not enough, I've given all I can, but we're still on the payroll.
Hmmph. Since they didn't bother to run the press release through a spell checker ("Berkeley"), I suspect they have bugs that could randomly change your compressed data into listings from the 1937 Manhattan phone book. (Or perhaps old episodes of "Who's The Boss?").
"She's really terrible at math," said a member of the Mathematics/Computer Science Department at Macalester, who wishes to remain nameless. "Her belief that she shattered Godel's Incompleteness Theorem by stating that `Um, humans don't think like machines, so like, it can't be right,' once again completely brings the mathematics community down to its foundations. Or rather, it doesn't , because she doesn't have clue one about what she's talking about." Other voices dissent:
"It's important to think outside of the box," said the chair of the Classics department. "Her complete lack of mathematical knowledge only makes her a better candidate for seeing the inherent flaws in centuries of mathematical reasoning."
"So like, some scary guys were arguing about this thing on some website they saw, saying it was impossible, that you could compress some arbitrarily random something-or-other, and I said, `Look, like, in Honey, I Shrunk the Kids, they could make anything smaller because there was space in-between particles, or Barbi dolls, or something, so like, we just have to find the space in-between your data.' That shut them up really quick."
Hastily scrawling her new idea on a cafeteria napkin, the "idiot prodigy," as dubbed by the "scary guys" (the local chapter of the ACM), has come to the conclusion that the only way to reach this kind of compression is to use rational numbers. "Like, 1/2 is smaller than one- why don't they just use that? I mean, I've found the space in-between their `data,' like, why don't they believe me? A whole `nother department at this college does!"
In posthumous response, Huffman, of Huffman-encoding fame, is now spinning in his grave- along with Turing, Godel, and Church.
Other than changing a few titles and names, this event actually happened in a class I was taking- Godel's Incompleteness Theorem was summarily "disproved", along with the Church-Turing thesis, and the entire idea of P(!)=NP, by a classics major in my Advanced Symbolic Logic class.
I suppose that, in whatever context, the lesson for today is that it's easy for any one person to disprove an untenable law of mathematics or computer science, simply by being really bad at math.
quite amazing: that ridiculous claim of doing something that has been proven impossible many years ago not only got them into slashdot, but also into the computer column of my local austrian newspaper (sigh). even more amazing: the number of people posting their own wonderful algorithms for compressing random data here. most amazing: i waste my time bothering.
1. Take a file, any file. The aforementioned Matrix movie, for example. Now, line up ALL the bits in the wonderfully huge thing.
2. So you have a massively long string of 1s and 0s. Resolve into a decimal number. (I know this achieves nothing, but bear with me).
3. Create a mathematical algorithm to which the answer will be this number. (Clearly can be quite small; x(to the power of)21 + 4 or something.)
4. Convert this algorithm into binary (imagine it in decimal for simplicity's sake).
5. Go to 3, and repeat at will.
No doubt there is a very fine reason why this is idiotic and won't work, but I'm not much of a geek so please tell it to me. Nicely.
Security through promiscuity is no better than security through obscurity.
To get 100:1 compression, you'd only need to use
1 65 212124436344046800\
9 55 216520806400000000\
89002814716285851175021399360911083689418762459
23192713729439634462735763217762589807312090608
0000000000000000
megs of storage. You could cut that down some if you realize that some of those strings have repeating elements, but that would ruin the elegant simplicity of such an approach. Of course, it wouldn't work. It would be 100:log_2(that number up there). It's close enough, though.
This may have already been posted, and if it has sorry, but I thought this may be of interest to some of you.
Jean-loup Gailly (one of the creators of gzip) has written an article on a patent that was granted for compression of truly random data, and how it is not mathematically possible. You can read it here for those that are interested.
man
No manual entry for
This is not the first time so called unlimited lossless compression has surfaced.
Some of you may recall an article that appeared in Byte Magazine a few years ago:
April 20, 1992 Byte Week Vol 4. No. 25:
"In an announcement that has generated high interest - and more than a
bit of skepticism - WEB Technologies (Smyrna, GA) says it has
developed a utility that will compress files of greater than 64KB in
size to about 1/16th their original length. Furthermore, WEB says its
DataFiles/16 program can shrink files it has already compressed."
[...]
"A week after our preliminary test, WEB showed us the program successfully
compressing a file without losing any data. But we have not been able
to test this latest beta release ourselves."
[...]
"WEB, in fact, says that virtually any amount of data can be squeezed
to under 1024 bytes by using DataFiles/16 to compress its own output
multiple times."
The product did not work as advertised (surprise) and does not seem to have made many inroads into the data transmission industry.
More of this can be seen at:
http://www.faqs.org/faqs/compression-faq/part1/
So if the packet is sent via tachyons (or sent in an alternate universe) and arrives at the exact moment it is sent, transmission time = 0, therefore the packet has been "compressed losslessly." Cool. I understand. :)
Get off my virtual lawn, you damned virtual kids!
postings to this topic are so redundant that a compression rate of about 1:800 should be easily achievable.
... to have catchy theme music, and pretty flash intros? That's how *I* can tell they're doing something real in the academic community. :)
...
If their technology is so earthshatteringly different and revolutionary but can use existing connections, why didn't their site download instantly? If it's only software and they already have a patent, one would think the easiest route to gain investors would be a small download and a mindblowing demo away.
Get off my virtual lawn, you damned virtual kids!
I got about 6:1 compression on the pizza I ate last night.
If they got anything more than about 20:1 compression, I'd suggest eating food with more fibre.
But 100:1 lossless compression? Guys, call yourselves an ambulance. Healthy digestion should include some loss of information.
http://pcblues.com - Digits and Wood
Surely they mean arbitrary data, not random data. A fundamental property of random data is that it is its own shortest description - i.e. incompressible.
The whole thing stinks, really. Even if they mean arbitrary non-random data the 100:1 compression factor is just not achievable all the time: can you achieve such a compression factor on a 50 bit string?
>> > To ogol uzywamy makes " you " ( very like, with Panu zalezy :-) >> Not zalezy me to formie " Mr. ". More responds me shape " you" poprostu >> reads when notki of thee on internecie as well I see, with you've at least tytul >> doctor, meeting ex szacunku zwrocilem sie shape " Mr. ". Yes by way of student >> zostalem wonted . Ex that tides bede returns sie to " you ". > > what, znalaz?em is not wr?cz a kick in the pants?ce... >pardon me On?odku, ?e not wiedzia?em :-(( > >"Dr. Wlodzimierz Holsztynski Dr. Holsztynski became > and full professor of mathematics at Warsaw University > at the age of 22, uniquely combining pure applied ... " > >I am sorry, bank is not niedost?pna, and quotation this balance ex googli. >Id? seeks onward:-)and mo?e, On?odku, title?by? what wi?cej...? > >salutes, ?K To wit clotted nonsense . not wiedzialem via who downtime, with this outlet pozwolila yourselves to quotation jakichkolwiek informacji of me . Zadnych not autoryzowalem,,nie upowaznilem them, not pozwolilem, and yet to upshot zabronilem - when a few days ago by accident zauwazylem what title, this spot napisalem until them a man of law, zeby those " informations " usuneli. I`ve istotniejsze successes, niz those nieprawdziwe, listed to that stronie, and yet if not mial, not wants tommy rot . (they such belongings does by, wherebyprzez co soots, przyciagnac investor ). Bylem them consultant, moze anew bede, and moze not, ex nimi it is anyone's guess . Pardon me too those niepotrrzebna misinfoprmacje, whereas naprawde there are not therein neither troche my guilts . Salutes, Wlodek I'm sorry, with yes sie steel, choc in the main this smieszne. -- ============= P of l N E ON S ============== cartulary as well rummaging newsów http:/www.polnews.pl ---- ex 28.08 nowa, lepsza version ----
I'm glad I took the time for that
Being a mathematics grad student myself familiar with Smale's work, believe me this stuff is legit.
... just my two cents worth. I'm personally not at all surprised that this would be possible.
Clearly they are not all the way there yet. A fellow named Michael Barnsley at Georgia Tech (author of Fractals Everywhere) has been working on this kind of stuff too (under the moniker Iterated Function Systems). I think the catch is that compression times have been astronomical (although that could be coming down due to recent advances in computational horsepower and theoretical breakthroughs) and the decompression time isn't exactly snappy either.
Anyway, sorry if this is redundant
can be compressed to water ;)
By definition, random data can not be compressed because it contains no repeating patterns.
Once again, slashdot falls for another BS story.
You guys suck.
We need a slashdot poll:
A random set of numbers means:
(1) Any and all possible sets
(2) Only those sets which have no visible patterns
(3) RAANNNDY!, Baby!
(4) Cowboy Neal
Fellowship 9/11
The real breakthrough is the new discovery that the number of TMs and words capitalized in TheMiddle == the amount of money these folks will dupe from some silly investors.
1:1, and so will remain.
Nough said.
Steve Smale is a real mathematician, one of the great ones of the 20th century (he'd be in his mid 60's now). I had some classes from him at UC Berkeley in the 90's and know him slightly. He's not a computer guy, but there's no bullshit about him and I'm amazed if he's actually been pulled into a scam like this. He retired from UC a few years ago and last I heard he was teaching in Hong Kong. I'll see if I can find an email address for him and ask him what the story is.
I can get 1,000,000:1 compression... Just store a very large file at, say:
www.placeoflargefile.com/bigfile
There! 32 bytes. Unfortunately, the "decompression algorithm" can take quite a while, depending on your connection...
I'm quite aware of that page. I work about 20' from the author :-).
I'd argue that there is no effective commercial one-time pad, only products that approach it. There have been a number of companies releasing similar press releases about OTPs for some time, but each time the generation method has resulted in it not being an OTP. Most of the time it has also been substantially worse than most existing algorithms.
...a lot of other hoaxes out there (that numerous people have already mentioned), however, it also sounds like another "new" technology.
;)
There's a company out there called Datagistics that is also claiming magic compression of pretty much all data, using a technique they call "Random Access Para-Integrated Data", or RAPID for short. They're not claiming 100:1, but rather 20:1, so I guess their technology has a 5 times better chance of being real
The site is unfortunately a little light on the details, not even offering a techno-babble pseudo-explanation like these ZeoSync guys...
Yes, I really believe that they can compress random data. Even though the various mathematical definitions of the word "random" all essentially mean "noncompressible." As pointed out many many times already, for any compression scheme to work at all, there HAVE to be noncompressible strings. The word "random" simply refers to such strings.
For a ratio of 100:1, a fantastically high proportion of strings must be noncompressible. Information theory is not my field, but I would assume it'd be something like this: Given an arbitrary string, there's a 1 in 2^100 chance it's compressible. Yay, the world's bandwidth problems are solved.
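The proportion can be bounded by counting rather than guessed. A minimal sketch, assuming "compressible at 100:1" means an n-bit string maps to some output of at most n/100 bits and that every output can serve only one input; the function name is hypothetical.

```python
from fractions import Fraction

def max_fraction_compressible(n_bits: int, ratio: int) -> Fraction:
    """Upper bound on the share of n-bit strings a lossless scheme can
    shrink to at most n_bits // ratio bits.

    There are 2**(m + 1) - 1 binary strings of length 0..m, and each one
    can be the output for at most one input.
    """
    m = n_bits // ratio
    outputs = 2 ** (m + 1) - 1
    return Fraction(outputs, 2 ** n_bits)

bound = max_fraction_compressible(1000, 100)
print(float(bound))  # vanishingly small
```

For 1000-bit strings the bound is already below 1 in 2^989, and it gets worse as the strings get longer, so "1 in 2^100" is generous.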
I can't believe Slashdot's editors even bothered to post this load of bull.
The original Howling Frog is a fictional character and has no UID.
Had a quick brainwave.. it's probably wrong, but I thought I'd throw it out in case there's something in it.
Why *can't* you use equations to represent long streams of data?
If you ever wrote a compiler or studied random number theory, you'll know that the only 'random' numbers a computer generates are pseudo-random. Most 'random' number generators use quirky equations to produce their output.
What if you could match the output to a number of equations in some way?
Now, it's a well known fact that for the amount any compression routine can compress 'random' data.. it must also expand an equal amount of data. That's fine. What if there was a routine that would *only* compress 20% of files to a twentieth of their size? You could run the algorithm over it, keep the output in the cases where the compression was efficient, and you're still up on the deal.
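Whether a routine could shrink 20% of all files to a twentieth of their size is a counting question, so it can be checked without writing any compressor. A minimal sketch, assuming "files" means all bit strings of a given length and that every shrunken output must be distinct; the function name is made up.

```python
def can_shrink_share(n_bits: int, percent: int, divisor: int) -> bool:
    """Could `percent` of all n-bit files map to distinct outputs of at most
    n_bits // divisor bits? Pure counting, no algorithm involved."""
    files_to_shrink = (2 ** n_bits) * percent // 100
    short_outputs = 2 ** (n_bits // divisor + 1) - 1  # lengths 0..n/divisor
    return files_to_shrink <= short_outputs

for n in (20, 40, 80, 160):
    print(n, "bits:", can_shrink_share(n, 20, 20))
```

Every line prints False: even for 20-bit files, 20% is over two hundred thousand inputs competing for three possible outputs. A routine can do that well only on a vanishing share of all files, which is fine as long as those happen to be the files people actually have.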
Before you say anything.. I know all about the 'Question 9' and pigeon-holing blah blah blah.. just throwing this out.
I think the Reuters people were wrong, Shannon died last year, not a decade ago .... if you read the report in Reuters site
Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.
This will be lossy, so the above and this might be offtopic.
You know, in fact most audio/video is compressed using lossy algorithms, most of the time exploiting redundancies imperceptible to the audience, such as using the YUV colorspace instead of RGB and fancy transformations from the spatial domain into the frequency domain.
In the end the binary soup is often encoded using Huffman and runlength encoding. Add to this trivial (not really) motion detection and compensating algorithms and you arrive at MPEG1/MPEG2.
Dedicated circuits handle the encoding and decoding (where encoding is much more expensive).
My point is that you cannot discard the application domain when talking about compressing A/V if you want to achieve a cost-effective encoding+decoding solution. Mere non-informed compression is here a waste of time. Use what you know!
If (that's a very big if) the universe is completely deterministic then, in theory, everything in it could be calculated by knowing the initial conditions of the Big Bang and all the physical laws that acted to change those conditions. In that case, nothing in the universe would be truly random, and provided you knew a thing's position in spacetime you could calculate its data (except that something tells me your simulator would have to be larger than the universe itself to run the code). On a more realistic scale, if you knew exactly the conditions under which the data to be compressed was obtained, you could run a simulation of the process again to regenerate the data. E.g. a script for a raytracing program will be many times smaller than anything an algorithm such as JPEG could ever make of the resulting image. Basically what I'm saying is: IF there's no such thing as truly random numbers, and you know how the random data was generated in the first place, you can run the generation process again to get the data again. Hopefully the data you would need about the initial conditions would be less than the data the process generated.
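The "run the generation process again" idea is easy to show for pseudorandom data. A minimal sketch, assuming the data really did come from a known seeded generator (here Python's random module); the function name, seed and length are made up.

```python
import random

def generate(seed: int, length: int) -> bytes:
    """Re-run the 'process' that made the data: a seeded PRNG."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))

original = generate(seed=2002, length=100_000)

# The "compressed file" is just the recipe: a seed and a length.
recipe = (2002, 100_000)
restored = generate(*recipe)

print(restored == original)  # True
```

This only works because the recipe is known in advance. Handed 100,000 bytes from an unknown source, there is in general no short recipe to find, which is exactly what "truly random" means here.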
Your ad here.
The word Berkeley is actually quite correct. Check this link to University of California Berkeley's Website: http://www.berkeley.edu/
The current BEST ratio for compressing truly random data is 1:1
In other words, you can't do it.
If you TRY, some compression software will end up making it bigger.
These guys are claiming 100:1 lossless on truly random data. This is difficult to believe on both fronts.
First, 100:1 lossless on any real-life data is unlikely. Add in the 'truly random' part...
So.. either they've violated the laws of the universe, or they are about to bring about one of the biggest mathematical discoveries in the world, or they are full of crap.
You can't compress every set of 1000 bits of data into 10 bits of data.
10 bits of data only allows for 1024 combinations.
1000 bits allows for a lot more.. so it's simply not possible.
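For a sense of how much more, the numbers can be printed directly. A small sketch using Python's arbitrary-precision integers; the variable names are only for illustration.

```python
inputs = 2 ** 1000   # distinct 1000-bit files
outputs = 2 ** 10    # distinct 10-bit files

print(outputs)            # 1024
print(len(str(inputs)))   # 302 (decimal digits in 2**1000)

# At best, this many inputs would have to share each output:
print(inputs // outputs == 2 ** 990)  # True
```

With 2^990 different files landing on every 10-bit code, a decompressor has no way to tell which one was meant.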
What? You don't remember this? Oh, that's because it is worse than traditional public key encryption schemes. 1) It's security through obscurity; you need a method to transfer the time to grab the private key out of all of the possible ones. 2) What's worse is that for regular private key encryption to transfer the data, instead of having the entire key length as possible combinations, now there are only the ones transmitted by the common source. (No matter how many there are, it will always be much less than 2^128.)
We all know this compression method will follow a similar path.
rm -Rf /
Do your best, hope for the best, suspect the worst.
I'm not kidding here. Let's take a look at a few possibilities.
We take "random" data, and convert it into an image. We somehow "convert" this image into vectored images. Then an entire image is saved as descriptors, not as data itself. So it can happen. A combination of different sin or cos waves can yield any shape wave. Suppose we draw a line, number marker lines above it, 1-9. Then below it, number it A-F. Then we progress and plot the HEX code. After that is all plotted, it creates a wave graph. Suppose they found a way to efficiently describe the wave graph. Then compression ratios of 100:1 is quite possible. We at that point aren't actually saving data, but a DESCRIPTION of data, which might be shorter. I can have a string of numbers 1 to 1,000,000; and that would not compress that much; but I can say;
for (i=1;i<=1000000;i++)
    printf("%d\n", i);
I can just save that, and I will have saved 1,000,000 lines of numbers. I am not saving any data, just a smaller description of it. Hell, in my example, I have achieved 41666:1 compression ratio! So.. before you doubt, shut your trap and think outside the box...
The second thing that makes me think this might actually be true is the pigeonhole theory they described. If you rearrange data and then "move it up a dimension" you can group things in such a way that a smaller subset of information is stored. You add noise to it. When you decompress, you WILL lose something in the dimensional transition, BUT, as long as what you lose is not along the data path, or is not part of the dimension quadrant that actual data sits on, you are fine. Sooo... it _IS_ possible.
Now did these guys in Florida do it? I don't know, I doubt it. Is it possible to? Yes.
Do your "bits of entropy" math calculations and you will find it's not possible... blah blah blah.. BUT, if you think OUTSIDE THE BOX, then it should be possible.
Two notes:
1) There was a similar claim by a program called OWS, back in 1992. It was a hoax.
2) Nothing will compress to 1 byte because there is always algorithm overhead, so at some point in time, if this does compress random data, it might GROW slightly in size, past a certain point.
My guess is, such high compression ratios, if they are to be achieved, will require a lot of horsepower, both to compress, and decompress; with the teeter-totter tilting heavier on the compression.
BlinkBlink
If we think a bit closer about the quote:
"The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down."
Bringing the multi-dimensional aspect down to a plain byte level, what they are saying is that as long as the byte only contains values between 0 and 3, they can achieve astonishing compression levels. Hell, even I could do a 64:1 with such an assumption =)
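A minimal sketch of what that assumption buys you, in Python. If every byte only holds a value from 0 to 3, each one carries 2 bits of information, so plain bit-packing fits four of them into one byte, which is 4:1 (the 64 is presumably 256 possible values divided by 4). The function names `pack_2bit` and `unpack_2bit` are invented for this example.

```python
def pack_2bit(values):
    # Pack values in the range 0..3 four to a byte, most significant pair first.
    if any(v < 0 or v > 3 for v in values):
        raise ValueError("every value must be between 0 and 3")
    packed = bytearray()
    for start in range(0, len(values), 4):
        chunk = values[start:start + 4]
        byte = 0
        for v in chunk:
            byte = (byte << 2) | v
        # Left-align a short final chunk so unpacking reads it in the same order.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed, count):
    # Reverse pack_2bit; `count` is the original number of values.
    values = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            values.append((byte >> shift) & 3)
    return values[:count]


if __name__ == "__main__":
    data = [0, 1, 2, 3, 3, 2, 1]
    packed = pack_2bit(data)
    print(len(data), "values ->", len(packed), "bytes")
    print(unpack_2bit(packed, len(data)) == data)
```

Note that the original count has to be stored somewhere too, and the trick stops working the moment a single byte falls outside 0 to 3.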
They have funny wording in their release about data that is practically random. Well, that can be parsed to mean that in practice, the data is random and therefore it can be replaced by any other random string. After all, it's random! Not mathematically random in the entropy sense, but used by an application which wants any old string of random numbers. So sure, I can send a message saying, "generate me 1000 random digits". Great compression. Useless in practice, of course. In any case, these guys sound like a get-rich-quick scheme, trying to fool people, and not the only one of that type I can think of.
ZeoSync says they encode their targets so that they will 'substantially occupy a space of low Kolmogorov Complexity Construct' (see their 'Technical process' page). This might mean that if you encode the most 'meaningful' bit patterns with low encoding values falling within a target compression range which spans only 1% of the encoding space of all possible bit patterns, you will get the kind of results they are touting.
For example, there are many bit patterns which could be used as a jpg image, but most random bit patterns will simply look like noise. So, encode the ones with potential usefulness with small encoding values and encode the noisy images with the larger encoding values. They don't attempt to encode random bit messages losslessly, only the useful ones.
The trick is sorting out the semantics of a bit string. Which ones are noise, which ones have meaning and which ones are p0rn.
The answer is quite simple. The encoder and decoder are each about 5 terabytes big with every possible combination of 0s and 1s in them. Then, all the "zip file" has to store is the location of the file. Sure, 100-1 compression, but it requires about 3 copies of cdrom.com's computer to hold the thing :)
Forgive me for being paranoid (hence AC, and yes I know this post will probably never be read), but what I see here is an investor scam site (the old snake-oil compression thing, *again*?!) with an unskippable Flash intro on the same day news about a Flash-borne COM virus breaks.
Anyone care to virus-scan the Flash intro on their page? Is this a seed?
"This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties."
I've seen that a couple of times before, and each time it really meant this:
"The claim made in this press release is bogus. The company does not posess the ability to perform what is promised above. The part of the statement above that is not a lie, is total bullshit. This press release was created as a mere attempt to get money from clueless investors"
--- Hindsight is 20/20, but walking backwards is not the answer.
If you consider video; If you are going between two frames of video, and you're using temporal compression, then only the sections of the image that change get updated, and not the entire image. The rest of the data is dropped, and movie size becomes that much smaller.
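The "only update what changed" idea can be sketched in a few lines of Python. This is a toy, assuming each frame is a flat list of pixel values and both frames have the same length; the names `frame_delta` and `apply_delta` are hypothetical, and unlike real video codecs this version throws nothing away.

```python
def frame_delta(previous, current):
    # Record (position, new_value) only for pixels that differ between frames.
    return [(i, c) for i, (p, c) in enumerate(zip(previous, current)) if p != c]


def apply_delta(previous, delta):
    # Rebuild the current frame from the previous frame plus the recorded changes.
    frame = list(previous)
    for position, value in delta:
        frame[position] = value
    return frame


if __name__ == "__main__":
    frame_a = [0, 0, 5, 5, 5, 0, 0, 0]
    frame_b = [0, 0, 5, 7, 5, 0, 0, 1]
    delta = frame_delta(frame_a, frame_b)
    print("changed pixels:", delta)
    print("rebuilt correctly:", apply_delta(frame_a, delta) == frame_b)
```

The saving comes purely from consecutive frames being similar. Two unrelated frames would produce a delta as big as the frame itself.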
Overall, though, you're never going to get 100:1 across the entire film even with throwing away data, so how does this work?
If one were to apply a series of (memorized) random algorithms to the random data of a file, could one set of the resulting data actually be easier to compress using more conventional methods?
:P
I doubt they have anything worked out, but I was hoping to briefly spark some interest with a not-so-well-thought-out post.
where did I put my modem? Broadband's dead.
Assuming "random" data is the hardest to compress, as I think it probably is, can this possibly be true?
The following thought experiment:
So now you've got a DVD movie that fits on a floppy disk. I don't buy it. (Not that I wouldn't, if it were actually possible.)
How can this be reasonable at all?
Compress random data 100 to 1? I'll bet you $1000 that it's either an exaggerated claim, or an outright scam.
OK imagine it's thousands of years in the future and humans can do funky things with space and time.
1. You have the file you wish to compress on your hard drive.
2. With your ultra-hi-tech science you open a wormhole to connect this point in space-time with say, a point 2 months in the future.
3. You delete the file from your hard drive. It was the only copy in existence so you've effectively compressed it to 0 bytes and you can use this disk space for 2 months.
4. 2 months later you need the file again so you fetch it via the wormhole.
OK so this is kind of like just backing up the file to a tape drive or something and then copying it back when you need it, maybe it's not really compression - but the difference is, *after* you delete it, it really doesn't occupy any *space* at all until you need it again 2 months later.
Now we just need to work out how to make a wormhole, hmmm...
Your ad here.
How do we define truly random? Is there any definition other than that it is data for which the shortest possible description is a list of the data itself? Then recoverably compressing truly random data is by definition impossible.
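You can see the practical side of this with any off-the-shelf compressor. A small sketch using Python's standard `zlib` module follows; the helper names `compressed_ratio` and `pseudo_random_bytes` are made up here. One caveat, stated loudly: the test data comes from a seeded pseudo-random generator so that runs are repeatable, which means it is not "truly random" in the sense above (the seed plus the generator is a short description of it). zlib just has no way of finding that description.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    # Compressed size divided by original size; above 1.0 means the data grew.
    return len(zlib.compress(data, 9)) / len(data)


def pseudo_random_bytes(n: int, seed: int = 0) -> bytes:
    # Seeded PRNG output, used as a repeatable stand-in for random data.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    n = 100_000
    print("pseudo-random:", compressed_ratio(pseudo_random_bytes(n)))
    print("repetitive:   ", compressed_ratio(b"10" * (n // 2)))
```

The repetitive input shrinks to a tiny fraction of its size, while the random-looking input comes out no smaller and typically a few bytes larger because of format overhead.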
i've never laughed so much at a post :)
I know someone else posted this but I'm just going to reiterate and provide an interesting example.
Let's say you have a 1 gig file that happens to be the DIVX rip of the Gladiator DVD.
soooo since this thing can do "any random bytes" you might be able to assume that you can zip over and over....
1,000,000,000 bytes
to 10,000,000 bytes
to 100,000 bytes
to 1,000 bytes
to 10 bytes
Then, it could be as easy as going to a messageboard and typing "this is the data for Gladiator--> 'S#j1LLzo0i'"
but obviously I REALLLLLY doubt you can do it over and over.... like the article says "100:1 in some/most cases"
but still if compression got down that far.... warez would truly be unstoppable. You could have every creation ever created on a 10 gig drive with ease.
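A quick back-of-the-envelope check of that chain, as a Python sketch with invented helper names (`repeated_sizes`, `distinct_files`). The point is a counting one: a 10-byte string can only take 256^10 different values, so at most that many different files could ever decompress from one, no matter what the algorithm is.

```python
def repeated_sizes(size, ratio=100, floor=10):
    # Sizes after each hypothetical 100:1 pass, stopping once `floor` bytes is reached.
    sizes = [size]
    while sizes[-1] // ratio >= floor:
        sizes.append(sizes[-1] // ratio)
    return sizes


def distinct_files(n_bytes):
    # Number of different byte strings that are exactly n_bytes long.
    return 256 ** n_bytes


if __name__ == "__main__":
    print(repeated_sizes(1_000_000_000))
    print("distinct 10-byte strings:", distinct_files(10))
    # The count of distinct 1 GB files has billions of digits, so compare
    # the number of bits needed to tell files apart instead.
    print("bits to name any 10-byte string:", 8 * 10)
    print("bits to name any 1,000,000,000-byte file:", 8 * 1_000_000_000)
```

Eighty bits against eight billion bits: almost every 1 GB file has no 10-byte name available to it.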
I need a compression routine with more than a Zero Space Tuner(tm) and BitRate Accelerator(tm) and Fake Article Compounder(tm), I need one with Sub-Space Intergalactical Holographic Nucleoumical Redifferenciator Protocol(tm) support.
JAR, from the maker of ARJ, is substantially better than ZIP and RAR as far as compression goes and substantially slower also.
Interesting thing I remember with JAR in DOS, is that the more memory you have to assign to the compression, the better the compression.
http://www.arjsoft.com/jar.htm
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
Hmm... so we've seen a thousand times mathematical proofs that no, you CAN'T have 100:1. OK. Here's how I think some of it might work.
They say they intentionally randomize some data (this is out of the New Scientist article). That means they can't be doing pattern-matching on it because they're deliberately destroying patterns.
How about finding which patterns it DOESN'T match?
I read about this in some article in some magazine about iris scanning. The author of the iris scanner software said (I paraphrase from memory), "The big breakthrough came when I stopped trying to find what patterns are in the image, and started trying to find what patterns *aren't* in the image".
So perhaps instead of storing information about how the file is constructed, they store information about how the file ISN'T constructed. It seems lame and stupid -- but fits in perfectly with their claims. Great compression of random data - random data has no patterns. They don't mention repeating data - because I'll bet you that ZeoSync fails miserably at patterned data. Actually... no it wouldn't. A file that repeated the pattern 101010101010 isn't repeating the pattern 110011001100, so they could still use pattern-non-matching.
In any case, I lost track of compression technology about five years back. This is random and incoherent mumbling and should be ignored. http://www.nitrozac.com/ for no more information. Hooray for Nitrozac!
Dead on.... said it before i could
Therefore all possible data sequences appear somewhere within the digits of Pi, right ?
Therefore any file of any size can be represented by just two numbers - the position of the starting digit within the digits of Pi, and the length of the sequence. Presto. QED.
Best of all, if you want to encrypt as well as compress, just use "e" instead of "Pi".
(note: one of the two numbers resulting from the compression may be rather lengthy in nature, however, do not let this prevent you from IPOing your company)
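To put a number on "rather lengthy", here is a sketch in Python that computes digits of Pi with Machin's formula and looks up where a few short digit strings first appear. All function names are made up for the example, and the sample targets are arbitrary. Also worth saying: whether every sequence really appears in Pi is an unproven conjecture, so the premise itself is an assumption.

```python
def arctan_inverse(x, unity):
    # arctan(1/x) scaled by `unity`, summed from its alternating series in integers.
    total = term = unity // x
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_decimals(n):
    # First n decimal digits of pi (after the "3."), using Machin's formula
    # pi = 16*arctan(1/5) - 4*arctan(1/239), computed with 10 guard digits.
    unity = 10 ** (n + 10)
    scaled_pi = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    return str(scaled_pi // 10 ** 10)[1:]


def first_offset(digits, target):
    # Offset of the first occurrence of `target`, or -1 if it does not occur.
    return digits.find(target)


if __name__ == "__main__":
    digits = pi_decimals(20_000)
    for target in ("7", "42", "314", "2002", "65536"):
        offset = first_offset(digits, target)
        if offset < 0:
            print(f"{target!r}: not found in the first {len(digits)} digits")
        else:
            print(f"{target!r}: offset {offset} ({len(str(offset))} digits long)")
```

If the digits behave like random ones, a k-digit string typically first shows up somewhere around position 10^k, so the offset takes about k digits to write down and nothing is saved.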
Ok, where do you store the information about where the error is in the 3 bit string? You have listed next to each 3 bit number an error and a position for that error. You also need to include information indicating whether or not the digits following the encoded 000 and 111 are significant. So here is your algorithm (the flaws in your reasoning should be apparent):
000 0 00
001 0 11
010 0 10
011 1 01
100 0 01
101 1 10
110 1 11
111 1 00
Reordering your encoded strings we get 000, 001, 010, 011, 100, 101, 110, 111, which is the exact same amount of data that we were trying to compress, so therefore you cannot compress a truly random 3 bit string, nice try though.
Note: we need to use 00 for the case of 111 and 000 in order for the algorithm to be able to differentiate between cases such as 111 and 1,1,1.
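The table above is small enough to check mechanically. A Python sketch, with the table transcribed from the post as input, then 1 code bit, then 2 position bits (the names `TABLE` and `shorter_strings` are made up for the example):

```python
from itertools import product

# Transcribed from the table above: 3-bit input -> 1 code bit + 2 position bits.
TABLE = {
    "000": "0" + "00",
    "001": "0" + "11",
    "010": "0" + "10",
    "011": "1" + "01",
    "100": "0" + "01",
    "101": "1" + "10",
    "110": "1" + "11",
    "111": "1" + "00",
}


def shorter_strings(n):
    # Every bit string with fewer than n bits, including the empty string.
    return [
        "".join(bits)
        for length in range(n)
        for bits in product("01", repeat=length)
    ]


if __name__ == "__main__":
    codes = sorted(TABLE.values())
    print("codes, reordered:", codes)
    print("all codes distinct:", len(set(codes)) == len(TABLE))
    print("code lengths:", {len(code) for code in codes})
    print("3-bit inputs:", len(TABLE), "vs. strings shorter than 3 bits:",
          len(shorter_strings(3)))
```

Eight inputs map to eight distinct 3-bit codes, so nothing was saved, and there are not enough shorter bit strings to give each of the eight inputs its own code.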
I have a method that works in theory, I just don't have a computer trillions of times faster to make it work.
I think that the trick is not to think of the file as a stream of numbers, but instead as one or more incredibly large numbers or ILNs(tm). You've seen complicated equations that to resolve result in really large numbers - just build tools to analyze the number to find the shortest possible equation to represent it. Trade size for computing power.
As for the compress the compressed number thing, you would perhaps be able to do such but it would provide diminishing returns.
Maybe when I have a computer with the equivalent of 1 billion pentaflop chips my dream will be realized.
History will forget me though.
Paladin
Thanks.
Here's a scheme to get major compression of all files:
caveat: it requires user input.
Program saves number of bytes the file contains.
Program randomly generates files of this size and asks the user "Is this the right one?"
You are an idiot. He was saying the compression is lossy. This means that he doesn't store where the error bit is. Simply that they are probably erroring up/down to the nearest well compressible value, and then compressing. Not that the restore would make things exactly as they are supposed to be.
the whole universe decompresses at a ratio of 100:1 except for your file. you laugh, then you die. the end.
"Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee