Slashdot Mirror


GZipping Life Forms: Deflate Reveals Bare-Bones

An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today using the standard gzip compression utility. As a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils produces higher image compression ratios [more data redundancy], compared to their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online here. This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."

6 of 243 comments

  1. Re:The same image... by maxwell+demon · · Score: 2, Informative
    Hmmm.... really as "image1" and "image2", and not as "img1" and "image_2_with_an_incredibly_long_file_name"?

    BTW, if you want to be file name independent, you can use
    cat file | gzip -c9 | wc -c
    This way, gzip doesn't see the file name, and therefore doesn't include it in the .gz file.
    --
    The Tao of math: The numbers you can count are not the real numbers.
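A rough way to check the file-name point from the comment above, sketched in Python rather than the shell: the gzip header has an optional name field, so the same bytes come out slightly larger when a name is recorded. The helper `gz_size` and the sample data are made up for illustration; `gzip.GzipFile` is the standard-library class.

```python
import gzip
import io


def gz_size(data: bytes, name: str = "") -> int:
    """Size in bytes of a gzip stream for data, optionally recording a file name."""
    buf = io.BytesIO()
    # mtime=0 keeps the header identical from run to run.
    with gzip.GzipFile(filename=name, fileobj=buf, mode="wb",
                       compresslevel=9, mtime=0) as gz:
        gz.write(data)
    return len(buf.getvalue())


data = b"the same image bytes " * 200   # stand-in for an image file (assumption)
short_name = "img1"
long_name = "image_2_with_an_incredibly_long_file_name"

anonymous = gz_size(data)               # comparable to: cat file | gzip -c9 | wc -c
print(anonymous, gz_size(data, short_name), gz_size(data, long_name))
```

Each named size should exceed the anonymous one by the length of the name plus one terminating zero byte, which is why piping through `cat` makes the comparison fair.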
  2. Operating Principle? Kolmogorov Complexity by fygment · · Score: 3, Informative

    Read about it in _the_ book (http://www.cwi.nl/~paulv/kolmogorov.html) or check out the web site here (http://www.hutter1.de/kolmo.htm). For a more succinct idea of the approach, see these articles by one of the gurus on the topic (http://www.cs.ucsb.edu/~mli/focs.ps and http://www.cwi.nl/~paulv/papers/ecml97.ps).

    --
    "Consensus" in science is _always_ a political construct.
  3. gzip == measure of information content by firecode · · Score: 2, Informative

    This is not surprising at all, really. Gzip and other compression utilities can be used to get an upper bound on the real (non-redundant) information content of the data.

    I'm not sure if the above is common knowledge, but I have used it as an additional feature for certain pattern recognition tasks for a while.
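A minimal sketch of the idea in the comment above, assuming Python's standard `gzip` module stands in for the command-line tool: the compressed-to-original size ratio serves as a crude redundancy feature. The name `gzip_ratio` and both test inputs are illustrative, not anything from the researchers' test.

```python
import gzip
import random


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(gzip.compress(data, compresslevel=9)) / len(data)


rng = random.Random(0)
redundant = b"abcd" * 1024                               # highly patterned input
noisy = bytes(rng.getrandbits(8) for _ in range(4096))   # no pattern gzip can find

print(f"redundant: {gzip_ratio(redundant):.3f}")
print(f"noisy:     {gzip_ratio(noisy):.3f}")
```

The "upper bound" caveat matters here: the noisy bytes come from a seeded generator, so a very short program reproduces them, yet gzip cannot shrink them. The ratio only reflects the kinds of redundancy that deflate happens to detect.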
  4. Re:and language detection. by spot35 · · Score: 3, Informative

    Could this be what you're after?

  5. Irony? No... by Anonymous Coward · · Score: 1, Informative

    A pkzip file (aka Winzip default) is not equivalent to a gzipped file, but more analogous to a gzipped tar archive! Pkzip stores all that wonderful file information (full path, permissions, owner, and so on) with the compressed data. Gzip, by contrast, only compresses and doesn't store archival information; it leaves that in the filesystem. If you tar.gz'd the file, the size of the .tgz would be similar to that of the pkzip file.

    The difference between your file size and his is likely the difference in the lengths of the pathnames to the respective text files, not a difference in the size of the compressed data. Remember that pkzip files store the full pathname in the file uncompressed; gzip stores at most the original base name, never the path, and no name at all when it reads from stdin.

  6. Did something like this years ago by rasper99 · · Score: 2, Informative
    I used a technique like this to do a web cam way back in 1997 before web cams were an easy thing to do. I was supporting Silicon Graphics workstations at the time. One of the models came with a digital camera. The cameras did not have automatic exposure.

    Using CGI, each time a user hit the web page it took pictures at different shutter speeds. Working up from the slowest shutter speed, the first JPG over 20K bytes was taken to be the right exposure and was shown on the page.
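The selection loop described above might look roughly like this in Python. Everything named here is hypothetical: `pick_exposure`, the `capture` callback, and `fake_capture` with its made-up sizes stand in for the SGI camera, and "20K bytes" is read as 20,000 bytes, which is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

THRESHOLD_BYTES = 20_000  # assumption: "20K bytes" taken as 20,000


def pick_exposure(speeds: Iterable[str],
                  capture: Callable[[str], bytes],
                  threshold: int = THRESHOLD_BYTES) -> Optional[Tuple[str, bytes]]:
    """Return the first (speed, jpeg) whose JPEG is larger than threshold bytes."""
    for speed in speeds:
        jpeg = capture(speed)
        # A badly exposed frame is nearly uniform, so it compresses to a small file.
        if len(jpeg) > threshold:
            return speed, jpeg
    return None  # nothing in the sequence carried enough detail


def fake_capture(speed: str) -> bytes:
    """Stand-in camera: returns dummy bytes with invented sizes per shutter speed."""
    sizes = {"1/15": 6_000, "1/30": 14_000, "1/60": 23_000, "1/125": 25_000}
    return b"\xff" * sizes[speed]


result = pick_exposure(["1/15", "1/30", "1/60", "1/125"], fake_capture)
print(result[0] if result else "no usable frame")
```

Stopping at the first frame over the threshold, rather than taking the largest, keeps the number of captures per page hit small, which presumably mattered for a CGI script running on every request.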