GZipping Life Forms: Deflate Reveals Bare-Bones

Posted by Hemos on Monday March 31, 2003 @02:31AM from the getting-to-the-core-of-the-matter dept.

An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today using the standard gzip compression utility. As a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils produces higher image compression ratios [more data redundancy], compared to their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online here. This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."
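
As a rough illustration of the measurement described above, the sketch below gzips two byte buffers and reports how much each one shrinks. It is a minimal sketch, not the researchers' code: `compression_percent`, `more_compressible`, and the two synthetic buffers are hypothetical names and data chosen for illustration, and a real test would feed in the pixel data of actual images.

```python
import gzip
import random


def compression_percent(data: bytes) -> float:
    """Percentage by which gzip shrinks `data` (negative if it grows)."""
    compressed = gzip.compress(data, compresslevel=9)
    return 100.0 * (1.0 - len(compressed) / len(data))


def more_compressible(a: bytes, b: bytes) -> str:
    """Name whichever input gzip shrinks more; ties go to 'a'."""
    return "a" if compression_percent(a) >= compression_percent(b) else "b"


# Hypothetical stand-ins for raw grayscale pixels, not real fossil images.
rng = random.Random(0)
patterned = bytes((x % 16) * 16 for x in range(64 * 64))   # repeating ramp
noisy = bytes(rng.getrandbits(8) for _ in range(64 * 64))  # pixel noise

print(f"patterned: {compression_percent(patterned):.1f}%")
print(f"noisy:     {compression_percent(noisy):.1f}%")
print("shrinks more:", more_compressible(patterned, noisy))
```

Following the summary's reading, the buffer that shrinks more would be the likelier candidate for a biological origin, though several comments in the thread give reasons to be cautious about that inference.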

Makes sense... by Anonymous Coward · 2003-03-31 02:33 · Score: 4, Insightful

Lifeforms seem to be built on patterns after all. Patterns are easily compressible.
1. Re:Makes sense... by jolyonr · 2003-03-31 03:27 · Score: 5, Interesting
  
  Unfortunately it's not that simple: inorganic systems can have as much visual complexity as organic things. For example.. um.. (looks out of window here in Toronto).. a snowflake! Fractal complexity, such as that seen in the branches of a tree, is frequently mirrored in the inorganic world. The snowflake is one example; another, less well-known example is manganese dendrites, which look just like fossil plants but are totally inorganic, such as these [Victoria Museum]. The patterns of frost on a frozen windscreen are another example. I can't see how a computer program can distinguish whether such complex patterns are signs of life or not. Still, if it helps NASA get more funding, then who am I to argue! Jolyon
  
  --
  
  Please read my Canon EOS tech blog at http://www.everyothershot.com
I compress.. by mr.+methane · 2003-03-31 02:35 · Score: 4, Funny

... therefore I am.

I'm not sure I should be flattered that the best way to tell a picture of me from a picture of a rock is that I have more redundant image data. :-)
1. Re:I compress.. by DShard · 2003-03-31 02:44 · Score: 5, Funny
  
  That actually should flatter you. You have less entropy so you are of a higher order than the rock. You can brag to all your non-rock friends that those stupid rocks have high entropy.
A-ha! by grub · 2003-03-31 02:36 · Score: 4, Funny

So when we compress the ultimate, super-duper intelligent life form we get a two byte file containing "42"

--
Trolling is a art,
Excellent... by Anonymous Coward · 2003-03-31 02:37 · Score: 5, Funny

No more sniffing when I'm checking items in the refrigerator - is it 'alive'? gzip is the answer!
Be Humble by hugesmile · 2003-03-31 02:39 · Score: 4, Funny

OK, so if I have this right: Life is less random, and more predictable (more compressible) than non-life.

So that tells me that life contains less data than non-life.

Perhaps sophisticated life (human life?) contains even less data than non-sophisticated life. So the smarter we get, the more predictable we get, and the less data we contain.

Perhaps we will someday get smart enough to be totally compressed to one bit. In the time I thought about this concept, I think my gzip file got even more compressed. Hmm....
1. Re:Be Humble by javatips · 2003-03-31 02:50 · Score: 5, Insightful
  
  > So that tells me that life contains less data than non-life.
  
  No, it means that life contains less noise than non-life.
The fractal geometry of nature? by RNG · 2003-03-31 02:43 · Score: 4, Interesting

Although I'm certainly no compression expert, I think this makes sense. Many (most?) natural systems have fractal structures on some level, so it only makes sense for them to compress better (i.e., have more self-similar features) than systems which don't have this feature.

Then again, what do I know? Maybe someone more immersed in this field can tell us whether there's a seed of truth to my ramblings ...

Greetings
--> R
this might have a few glitches by jj_johny · 2003-03-31 02:45 · Score: 4, Funny

When I compressed the transcript of the Osbournes, it got incredibly high compression, but I don't think they are intelligent life forms. Or maybe I am really wrong.
This post can't be compressed.
The Mars fossil IS made by life; my wife is not. by Saint+Aardvark · 2003-03-31 02:46 · Score: 5, Funny
In a true first for extraterrestrial biotic research, I decided to compare two pictures:
- The Mars meteorite fossil
- and my wife
at the comparison page attached to the article that lets you run the same test on images that the researchers tried. In a startling discovery that is sure to earn me a Nobel Prize for Physics, Chemistry, Biology and Marital Relations, I was told the following:
"Answer: Image 1 [the Mars image] (1.43702451394759 % compression) has a higher complexity measure than image 2 [the image of my wife] (0.773501341151519 % compression), and thus image 1 is more probably biogenic."
Not only does this prove that there was once life on Mars, but it also proves that my wife is some sort of robot. Further research will be undertaken pending receipt of my prize money.
--
Carousel is a lie!
Kolmogorov Complexity by MarkWatson · 2003-03-31 02:53 · Score: 4, Interesting

This seems like a "sort of" restatement of Kolmogorov Complexity.
Roughly, Kolmogorov Complexity is a measure of randomness - the measure is how long a computer program needs to be to reproduce data (pardon an oversimplification).
-Mark
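
The connection to gzip can be stated a little more precisely. Kolmogorov complexity itself cannot be computed, but any lossless compressor gives an upper bound on it. In the notation assumed here (not taken from the comment), $K(x)$ is the Kolmogorov complexity of the data $x$, $C(x)$ is the compressed output, and $c$ is the fixed size of the decompressor:

```latex
K(x) \le |C(x)| + c,
\qquad
\text{percent compression}(x) = 100 \left( 1 - \frac{|C(x)|}{|x|} \right)
```

So a high gzip percentage shows that the data is simple in this sense, while a low one shows only that gzip found no short description, not that none exists.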
Re:The Mars fossil IS made by life; my wife is not by Anonymous Coward · 2003-03-31 02:53 · Score: 5, Funny

The problem here is that your wife is wearing clothes. Clothes are man made.

If you send me a picture of your unclothed wife, I'll be happy to, uhm, test this theory.
Operating Principle? Kolmogorov Complexity by fygment · 2003-03-31 02:57 · Score: 3, Informative

Read about it in _the_ book (http://www.cwi.nl/~paulv/kolmogorov.html) or check out the web site here (http://www.hutter1.de/kolmo.htm). For a more succinct idea of the approach, see these articles by one of the gurus on the topic (http://www.cs.ucsb.edu/~mli/focs.ps and http://www.cwi.nl/~paulv/papers/ecml97.ps).

--
"Consensus" in science is _always_ a political construct.
Re:The Mars fossil IS made by life; my wife is not by (startx) · 2003-03-31 02:58 · Score: 3, Interesting

ahh, but the picture of your wife contains a lot of inanimate objects. I'm sure if you cropped the picture down to just her (or reasonably close) she would fare better in this comparison.
I am not by Karpe · 2003-03-31 02:58 · Score: 4, Funny

I compress to binary 0, therefore I am not.. :(
Biological clocks in unicorns... by dpbsmith · 2003-03-31 02:59 · Score: 4, Interesting

zip is a fine thing, but it's not a pattern-recognition program!

This is the loopiest thing I've heard of since Rosenblatt reported that his Perceptrons could distinguish between music composed by Bach and music composed in imitation of Bach.

Good heavens, any picture that's slightly out of focus will now be declared to be evidence of "biological processes."

I'm guessing that the researchers are not as nutty as they sound and that they've done more than is being reported, but still...

Reminds me of the researchers in the sixties who were publishing analyses of data that supposedly showed "biological clocks." It turned out that they were using smoothing algorithms that, basically, were filters that had a 24-hour peak in the frequency domain--so their analysis was creating the patterns they claimed to be detecting. A debunking article was published in Science in which another researcher used data from a random number table (the "unicorn" data) and showed that the same analysis techniques showed that the unicorn had a biological clock.

--
"How to Do Nothing," kids activities, back in print!
gzip - the swiss army knife utility by kinnell · 2003-03-31 03:07 · Score: 5, Funny

I myself have successfully used gzip for factoring large prime numbers, sorting the men from the boys, unblocking the kitchen sink and cracking safes. I'm currently trying to locate Osama Bin Laden by compressing Al Jazeera footage, but all I come up with are reports of Elvis sightings.

--
If I seem short sighted, it is because I stand on the shoulders of midgets
Slightly Dodgy by jolyonr · 2003-03-31 03:10 · Score: 5, Interesting

This whole thing is slightly dodgy, and I begin to wonder whether it was released a day early by mistake.

The big problem is the use of JPEG source images. Unless you've set the quality to maximum, the JPEG artifacting (which is in effect repeating blocks of image data after transitions) will probably mask any hidden level of complexity in the images - the human brain is a much better tool at pattern recognition than most computer algorithms (especially those algorithms not designed for the task!).

Throw high-resolution bitmap files at it, and I'd be more persuaded that there is a genuine effect. Until then, I suspect it's more of a happy coincidence that the files they've thrown at it give results they are excited about.

Jolyon

--

Please read my Canon EOS tech blog at http://www.everyothershot.com
Re:why no bzip2 ? by bill_mcgonigle · 2003-03-31 03:10 · Score: 5, Interesting

> Doesn't bzip2 outperform gzip?

gzip might be preferable because it works more locally. It only keeps track of the last n bytes of data and does substitutions based on patterns seen in those n bytes.

bzip2 uses a block-sorting (Burrows-Wheeler) transform over blocks of up to 900 kB, far more context than gzip's window, so the compression is less local. That's great if you're going for compression but for this work, it might be misleading.

That said, gzip doesn't know about image formats, so I wonder if these guys are getting some false positives on scanline wraps and other non-image data.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
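
The locality point in the comment above can be checked with a small experiment. This is a minimal sketch under stated assumptions: it uses Python's `zlib` (the same DEFLATE algorithm gzip uses, with a 32 KB look-back window) and `bz2` (which works on blocks of up to 900 kB), and the test data is synthetic noise, not an image. The 100,000-byte block size is an arbitrary choice, picked only so that the repeat lies outside DEFLATE's window.

```python
import bz2
import random
import zlib

rng = random.Random(0)

# 100,000 bytes of noise: incompressible on its own.
block = bytes(rng.getrandbits(8) for _ in range(100_000))

# Two identical copies; the repeat starts 100,000 bytes after the original,
# which is well beyond DEFLATE's 32 KB window but inside one bzip2 block.
data = block + block

deflate_size = len(zlib.compress(data, 9))
bzip2_size = len(bz2.compress(data, 9))

print("original:", len(data))
print("zlib    :", deflate_size)
print("bz2     :", bzip2_size)
```

DEFLATE cannot see the long-range repeat and so leaves the data essentially unshrunk, while bzip2 can exploit it. Which behaviour is preferable for judging the structure of an image is exactly the open question the comment raises.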
Compression to measure semantic content by KingRamsis · 2003-03-31 03:20 · Score: 3, Interesting

It was an interesting coffee break discussion with one of my professors: we were arguing whether there is a neat way to estimate the semantic content of a neural network after training it. I recall suggesting compressing the values of the weights of all layers; the less compressible they are, the more trained the network is.
Re:and language detection. by spot35 · 2003-03-31 03:21 · Score: 3, Informative

Could this be what you're after?
Bzip2? Bah , new fangled rubbish! by Viol8 · 2003-03-31 03:53 · Score: 3, Funny

What about compress? Or even good old "compact". Ah I remember the days when we had 20% compression
and were glad of it and some of the old timers could have been confused with non living processes
even without the help of gzip anyway!
this can also detect PHB's by IDigUNIX · 2003-03-31 04:05 · Score: 4, Funny
As an alternative to this hypothesis, consider:
feed a business technology proposal through gzip
- A very high compression ratio indicates that the proposal was likely written by consultants, as supported by the fact that they usually re-use the same buzz phrases over and over.
- A moderate compression ratio indicates that the proposal was written by engineers. Typically they use large words and unique phrases that are already compressed, e.g. SNMP, J2EE, WWW, and so on.
- A zero to negative compression ratio indicates that the proposal was likely written by a PHB, and is hence void of all indications of intelligent life. Most PHBs have a hard time using buzz phrases and keywords in context, so they won't recycle enough words to form a good compression dictionary.
Re:Cool by tijnbraun · 2003-03-31 05:12 · Score: 3, Interesting

A similar technique has been used by Italian mathematicians to differentiate pages from various authors by using zip. A Nature article can be found here. After a request from a Dutch newspaper they were able to identify one author (Marek van der Jagt, who was making his debut) as being the same as an already well-known author (Arnon Grunberg).
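
One common way to attribute text with a compressor is to append the unknown sample to a reference text from each candidate and see which reference makes the sample cheapest to encode. The sketch below illustrates that idea only; it is not claimed to be the exact procedure from the article, and `extra_bytes`, `closest_author`, and the toy corpora are hypothetical names and data invented for illustration.

```python
import zlib
from typing import Mapping


def extra_bytes(reference: bytes, sample: bytes) -> int:
    """Growth in compressed size when `sample` is appended to `reference`."""
    with_sample = len(zlib.compress(reference + sample, 9))
    alone = len(zlib.compress(reference, 9))
    return with_sample - alone


def closest_author(corpora: Mapping[str, bytes], sample: bytes) -> str:
    """Pick the candidate whose reference text makes `sample` cheapest."""
    return min(corpora, key=lambda name: extra_bytes(corpora[name], sample))


# Toy corpora (assumption): real use would need long texts by each author.
corpora = {
    "author_a": b"the rover sends images of rocks and the rover sends images of dust " * 20,
    "author_b": b"my wife compares pictures of fossils while my wife drinks coffee " * 20,
}
sample = b"the rover sends images of rocks"

for name, text in corpora.items():
    print(name, extra_bytes(text, sample))
print("closest:", closest_author(corpora, sample))
```

The sample costs fewer extra bytes next to a reference that already contains its phrases, because the compressor can encode it as back-references instead of new literals.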