GZipping Life Forms: Deflate Reveals Bare-Bones
An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today that they have used the standard gzip compression utility. Using compression as a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils yields higher image compression ratios [more data redundancy] than that of their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online. This adds biogenic fossil detection to the simple but powerful uses of gzip, alongside spam filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."
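A minimal sketch of the compression-ratio measure described above, using only Python's standard `gzip` module. It assumes the input is raw, uncompressed 8-bit pixel data (compressing an already-compressed JPEG would say little about the scene), and the two 64 x 64 "images" are hypothetical stand-ins, not the researchers' data. The story does not describe the USC/JPL preprocessing or decision threshold, so this only shows how such a ratio is computed, not their procedure.

```python
import gzip
import random

def gzip_ratio(data: bytes) -> float:
    """Original size divided by gzip-compressed size; higher means more redundancy."""
    # mtime=0 fixes the header timestamp so repeated runs give identical output.
    compressed = gzip.compress(data, compresslevel=9, mtime=0)
    return len(data) / len(compressed)

# Hypothetical stand-ins for raw 8-bit grayscale pixels (64 x 64 = 4096 bytes).
width = height = 64
# Regular vertical stripes: 8 black pixels, then 8 white pixels, repeating.
patterned = bytes(((x // 8) % 2) * 255 for y in range(height) for x in range(width))
# Seeded pseudo-random noise: essentially no redundancy for gzip to exploit.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(width * height))

print(f"patterned: {gzip_ratio(patterned):.1f}")
print(f"noisy:     {gzip_ratio(noisy):.2f}")
```

The noisy buffer comes out at a ratio just under 1 (gzip's header and framing add a few bytes), while the striped one shrinks dramatically; real fossil images would of course fall somewhere in between.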
BTW, if you want to be file-name independent, you can feed the file to gzip on standard input. This way, gzip doesn't see the file name and therefore doesn't include it in the compressed output.
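The exact command in the comment above has been lost, so here is a small Python sketch (an illustration, not the commenter's command) of why the file name matters. Python's `gzip.GzipFile` writes the name passed as `filename` into the gzip header's FNAME field, NUL-terminated, and omits the field when the name is empty; `sample.txt` is a hypothetical name.

```python
import gzip
import io

def gzip_size(data: bytes, stored_name: str) -> int:
    """Size of a gzip stream for `data` whose header records `stored_name`.

    An empty `stored_name` means no FNAME field is written at all.
    """
    buf = io.BytesIO()
    # mtime=0 keeps the 4-byte timestamp field constant between the two runs.
    with gzip.GzipFile(filename=stored_name, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    # Closing the GzipFile flushes the trailer but leaves `buf` open.
    return len(buf.getvalue())

text = b"the quick brown fox jumps over the lazy dog\n" * 100
with_name = gzip_size(text, "sample.txt")
without_name = gzip_size(text, "")
print(with_name - without_name)  # 11: len("sample.txt") plus the NUL terminator
```

Everything except the header is byte-for-byte identical, so the whole size difference is the stored name.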
Read about Kolmogorov complexity in _the_ book (http://www.cwi.nl/~paulv/kolmogorov.html) or check out this web site (http://www.hutter1.de/kolmo.htm). For a more succinct idea of the approach, see these articles by a couple of the gurus on the topic (http://www.cs.ucsb.edu/~mli/focs.ps and http://www.cwi.nl/~paulv/papers/ecml97.ps).
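One compression-based measure associated with this line of work is the normalized compression distance (NCD), which approximates uncomputable Kolmogorov complexity with a real compressor; whether the linked articles use exactly this form is an assumption here. A minimal sketch with gzip, applied to hypothetical random DNA-like strings in the spirit of the "DNA sequence comparisons" mentioned in the story:

```python
import gzip
import random

def clen(data: bytes) -> int:
    """Compressed length in bytes, used as a stand-in for Kolmogorov complexity."""
    return len(gzip.compress(data, mtime=0))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

rng = random.Random(42)
a = "".join(rng.choice("ACGT") for _ in range(2000))
# b: a copy of a with a point mutation every 200 bases.
b_list = list(a)
for i in range(0, len(b_list), 200):
    b_list[i] = "A" if b_list[i] != "A" else "C"
b = "".join(b_list)
# c: an unrelated random sequence of the same length.
c = "".join(rng.choice("ACGT") for _ in range(2000))

a, b, c = a.encode(), b.encode(), c.encode()
print(f"ncd(a, b) = {ncd(a, b):.2f}, ncd(a, c) = {ncd(a, c):.2f}")
```

gzip only looks back 32 KB, so this works for short inputs like these; for long sequences the concatenation trick stops seeing the shared structure, and a compressor with a larger window would be needed.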
This is not surprising at all, really. Gzip and other compression utilities can be used to get an upper bound on the real (non-redundant) information content.
I'm not sure whether the above is public knowledge, but I have used it as one additional feature for certain pattern recognition tasks for a while.
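Stated a little more precisely: gunzip is one fixed decompression program, so by the invariance theorem for plain Kolmogorov complexity $C(x)$, measured in bits, the gzip output $z = \mathrm{gzip}(x)$, measured in bytes, gives

$$
C(x) \le 8\,\lvert z \rvert + c,
$$

where the constant $c$ depends only on the decompressor, not on $x$. The bound only goes one way: gzip can miss structure that a shorter program would capture (a long run of digits of $\pi$ looks nearly incompressible to gzip), so a poorly compressing file is not proof of high information content.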
Could this be what you're after?
A pkzip file (the WinZip default format) is not equivalent to a gzipped file; it is more analogous to a gzipped tar archive! Pkzip stores all that wonderful file information (path, permissions, owner, and so on) along with the compressed data. Gzip, by contrast, just compresses a single stream and stores little beyond the original file name and modification time; it leaves the rest of the archival information in the filesystem. If you tar.gz'd the file, the size of the .tgz would be closer to that of the pkzip file.
The difference between your file size and his is likely due to the different lengths of the path names of the respective text files, not to a difference in the size of the compressed data. Remember that zip files store the path name uncompressed, once in the local file header and again in the central directory; gzip stores at most the bare file name, and none at all when it reads from standard input.
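A quick way to check the path-name explanation above, sketched with Python's standard `zipfile` module: archive the same bytes under a short and a longer (hypothetical) path and compare the archive sizes. The compressed data is identical in both, so the difference comes entirely from the name being stored twice.

```python
import io
import zipfile

def zip_size(arcname: str, data: bytes) -> int:
    """Size in bytes of a one-member, deflate-compressed zip holding `data` as `arcname`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(arcname, data)
    # ZipFile does not close a file object it was handed, so `buf` is still readable.
    return len(buf.getvalue())

data = b"same text in both archives\n" * 200
short_size = zip_size("a.txt", data)
long_size = zip_size("home/someone/docs/a.txt", data)
# The 18 extra characters appear in the local header and in the central directory.
print(long_size - short_size)  # 36
```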
Whenever a user hit the web page, a CGI script took pictures at several shutter speeds. Working up from the slowest shutter speed, the first JPEG larger than 20 KB was taken to be the correctly exposed one and was shown on the page.
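The selection rule above can be sketched as follows, presumably relying on the fact that a washed-out, overexposed frame has little detail and therefore compresses to a small JPEG. The function name, the `(exposure_seconds, jpeg_bytes)` input format, and reading "20K" as 20,000 bytes are all assumptions; the actual camera capture is not shown, so the shots below are fake byte strings of the right sizes.

```python
from typing import Optional, Sequence, Tuple

Shot = Tuple[float, bytes]  # (exposure time in seconds, JPEG file contents)

def pick_exposure(shots: Sequence[Shot], min_bytes: int = 20_000) -> Optional[Shot]:
    """Hypothetical helper: scan from the slowest shutter speed (longest exposure)
    and return the first shot whose JPEG is larger than `min_bytes`."""
    for exposure, jpeg in sorted(shots, key=lambda s: s[0], reverse=True):
        if len(jpeg) > min_bytes:
            return exposure, jpeg
    return None  # no frame had enough detail

# Fake captures: the sizes stand in for how much detail each JPEG holds.
shots = [
    (1 / 1000, b"x" * 8_000),   # underexposed, mostly dark
    (1 / 250, b"x" * 26_000),
    (1 / 60, b"x" * 31_000),
    (1 / 15, b"x" * 12_000),    # overexposed, washed out
]
chosen = pick_exposure(shots)
print(chosen[0] if chosen else None)  # 1/60 s is the first over the threshold
```

Scanning from the slow end matters: 1/250 s also exceeds the threshold here, but the rule as described picks the first qualifying frame starting from the longest exposure.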