GZipping Life Forms: Deflate Reveals Bare-Bones
An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today using the standard gzip compression utility. As a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils produces higher image compression ratios [more data redundancy], compared to their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online here. This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."
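A minimal sketch of the measurement in Python. It assumes the score is simply the fraction of bytes DEFLATE removes; Python's `zlib` module implements the same DEFLATE algorithm gzip uses. This is not the researchers' code, since how their online test prepares images is not described here, and the helper names `compression_ratio` and `more_probably_biogenic` are hypothetical.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Fraction of the input removed by DEFLATE (0.0 means no savings)."""
    if not data:
        raise ValueError("cannot measure an empty input")
    compressed = zlib.compress(data, 9)  # level 9 = best compression
    return 1.0 - len(compressed) / len(data)


def more_probably_biogenic(image1: bytes, image2: bytes) -> int:
    """Return 1 or 2, whichever input compresses better (hypothetical decision rule)."""
    return 1 if compression_ratio(image1) >= compression_ratio(image2) else 2


if __name__ == "__main__":
    rng = random.Random(0)
    patterned = bytes([10, 20, 30, 40]) * 5000               # highly redundant "pixels"
    noisy = bytes(rng.randrange(256) for _ in range(20000))  # no structure at all
    print(f"patterned: {compression_ratio(patterned):.3f}")
    print(f"noisy:     {compression_ratio(noisy):.3f}")
    print("more probably biogenic: image", more_probably_biogenic(patterned, noisy))
```

The ratio can come out slightly negative for incompressible input, because the compressed output still carries a few bytes of header and checksum.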
Life forms seem to be built on patterns after all. Patterns are easily compressible.
Bad pun at the end of the original post notwithstanding, this is pretty cool stuff. Wonder why nobody thought of using compression in this manner before? This has all sorts of potential uses.
... therefore I am.
:-)
I'm not sure I should be flattered that the best way to tell a picture of me from a picture of a rock is that I have more redundant image data.
So when we compress the ultimate, super-duper intelligent life form we get a two-byte file containing "42"
Trolling is a art,
that this has something to do with patterns and image continuity. If so (enlighten me!), then it would be a decent filtering tool, but reliability would be a major problem. Geological (or whatever) patterns could fool the algorithm. Finally, the most compressible image is a single solid color - is it alive?
(Mods: the last line was a joke, intended to point out a particularly simple example of a problem - not a troll)
It is true that many pictures of life forms compress better or worse than inanimate objects. Just because a picture of something compresses similarly to a life form doesn't mean it is a life form. This is simply coincidence.
Just zip me up and email me to Scotty. Too bad I'll probably have to do it naked to save a few bytes.
No more sniffing when I'm checking items in the refrigerator - is it 'alive'? gzip is the answer!
Doesn't gzip only look for patterns in one dimension? Assuming they are using these for pictures, they are missing the boat on at least one more area of complexity!
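DEFLATE does scan an image as one flat byte stream, so it only notices vertical structure when whole rows repeat within its 32 KB window. Formats such as PNG work around this by filtering rows before compressing; the "Up" filter stores each byte as its difference from the byte directly above. A minimal sketch, assuming an 8-bit grayscale image stored row by row; the synthetic image and the helper names are illustrative, not taken from the study.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 300


def make_image(seed: int = 0) -> list[bytes]:
    """Synthetic image: a random first row, each later row brightened by 1 (mod 256)."""
    rng = random.Random(seed)
    base = [rng.randrange(256) for _ in range(WIDTH)]
    return [bytes((v + y) % 256 for v in base) for y in range(HEIGHT)]


def up_filter(rows: list[bytes]) -> list[bytes]:
    """PNG-style 'Up' filter: each byte becomes its difference from the byte above."""
    out = [rows[0]]
    for prev, cur in zip(rows, rows[1:]):
        out.append(bytes((c - p) % 256 for c, p in zip(cur, prev)))
    return out


if __name__ == "__main__":
    rows = make_image()
    raw = b"".join(rows)
    filtered = b"".join(up_filter(rows))
    print("raw:     ", len(zlib.compress(raw, 9)))
    print("filtered:", len(zlib.compress(filtered, 9)))
```

Here identical rows only recur every 256 rows (51,200 bytes back, out of DEFLATE's reach), yet after the Up filter every row but the first is a run of identical bytes.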
then we will find out if he truly is the Borg!
this sig steers like a cow. and i can prove it
OK, so if I have this right: Life is less random, and more predictable (more compressible) than non-life.
So that tells me that life contains less data than non-life.
Perhaps sophisticated life (human life?) contains even less data than non-sophisticated life. So the smarter we get, the more predictable we get, and the less data we contain.
Perhaps we will someday get smart enough to be totally compressed to one bit. In the time I thought about this concept, I think my gzip file got even more compressed. Hmm....
So does this go to show that life is anti-entropy?
May I please have my frontal lobotomy if I bring back the ashtrays?
The Magic School Bus is true!
From excellent karma to terrible karma with a single +5 funny post...
... if it could find life forms in my Doom WADs?
....as image 1 and image 2 seem to have different complexity........???
I tried it with two jpgs off my little home site, and the poor thing died with a div by 0 error.
Either I'm doing something wrong with my jpg compression, or this is slightly flakey - a successful pair of test pics would be most helpful.
"I Know You Are But What Am I?"
Has anyone checked if bzip2 is better or worse in detecting biological products?
After all, they have quite different compression characteristics (on one hand, compression of a megabyte of zeroes is much better in bzip2; OTOH adding the same file on top of itself and then compressing gives much less additional compressed size with gzip than with bzip2 - tested with /usr/src/linux/kernel/sys.c, 24957 bytes uncompressed).
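Both observations can be re-checked with Python's standard `gzip` and `bz2` modules. This sketch substitutes 20,000 seeded pseudo-random bytes for the `sys.c` file (an assumption, so the result does not depend on a local kernel tree) and needs Python 3.9+ for `randbytes`.

```python
import bz2
import gzip
import random


def sizes(data: bytes) -> tuple[int, int]:
    """Compressed sizes of data as (gzip, bzip2), both at maximum level."""
    return len(gzip.compress(data, 9)), len(bz2.compress(data, 9))


zeros = bytes(1_000_000)
print("1 MB of zeroes (gzip, bzip2):", sizes(zeros))

sample = random.Random(1).randbytes(20_000)  # stand-in for a ~25 KB source file
single = sizes(sample)
doubled = sizes(sample + sample)
extra_gzip = doubled[0] - single[0]
extra_bzip2 = doubled[1] - single[1]
print("extra bytes for a second copy (gzip, bzip2):", extra_gzip, extra_bzip2)
```

gzip can encode the second copy as back-references because it starts only 20,000 bytes back, well inside DEFLATE's 32 KB window.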
The Tao of math: The numbers you can count are not the real numbers.
...that I am not a unix user.
.sit? I thought you people became mac users! :P
Where is
Although I'm certainly no compression expert, I think this makes sense. Many (most?) natural systems have fractal structures on some level so it only makes sense for them to compress better (i.e., have more self-similar features) than systems which don't have this feature.
...
Then again, what do I know? Maybe someone more immersed in this field can tell us whether there's a seed of truth to my ramblings.
Greetings
--> R
It would be nice if someone would play with gzip and the various genomes available.
That would probably have more relevance (being in a single dimension) than images, IMHO.
Every one of us is incredibly redundant, and I don't just mean in our posts on slashdot!
Simply consider that you can have a reasonably good duplicate of yourself, with only the DNA contained in a single cell!
You may need most of your parts to be functional but, information-wise, it all comes down to 1 germ cell (say, a spermatozoid) and the apparatus needed to move it into proximity of another compatible germ cell ;)
Hmmm, since life forms are generally based on geometric patterns, I would think fractal compression would be even more conclusive in terms of detecting life.
Now gzip stands for GenomeZip.
So an RFC1437 bodypart could fit on a floppy?
Companies like Image Metrics use a mathematical translation into n-dimensional space similar to a compression algorithm to perform some interesting kinds of image recognition and processing. Examples are medical diagnosis, facial recognition, crystal growth monitoring and the like.
http://www.image-metrics.com/pages/technology.asp
This post can't be compressed.
at the comparison page attached to the article that lets you run the same test on images that the researchers tried. In a startling discovery that is sure to earn me a Nobel Prize for Physics, Chemistry, Biology and Marital Relations, I was told the following:
"Answer: Image 1 [the Mars image](1.43702451394759 % compression) has a higher complexity measure than image 2[the image of my wife] (0.773501341151519 % compression), and thus image 1 is more probably biogenic."
Not only does this prove that there was once life on Mars, but it also proves that my wife is some sort of robot. Further research will be undertaken pending receipt of my prize money.
Carousel is a lie!
.. thought of being gzipped is quite disturbing.
Mad Scientist: "Fire up the GZip Continuum Transfunctioner!"
Operator: "Okay, Boss"
*Bizzzttt*
"Never let the truth get in the way of a good story..."
I'm dropping pictures in from Stileproject's webcam site versus pictures of various cars. I consistently get results showing that the anime is more biogenic than a car.
Great, I think I just figured out a new method for pr0n detection. Unless we're talking anime, of course.
What if this weren't a hypothetical question?
The creators of WinZip filed suit stating that they have a better assembled compression utility and will use it not only to distinguish between living and non-living, but make the living encased in a tiny plastic cell on a keychain that kids can take with them, feed, and keep healthy.
Business \Busi"ness\, n.;
A scam in which all people involved perceive as beneficial...
Hubba Hubba!
Using that live test, I gave it one image of my face, and another of some rocks.
Apparently, the rocks are "more probably biogenic" than I am. Bastards.
One of the posters brings up an interesting point. Although meaningful data has more information than pure noise, it also has less than a blank signal. When you download pictures, regardless of the "meaning" they have to you, their compression can vary a considerable amount. And you've probably heard the statistic that the English language is 50 percent redundant. That figure may vary a bit too, but the point is that English's meaning to us is independent of its information content. And the probability that an image of a life form with more information will also have more "meaning" is probably just as uncertain.
Roughly, Kolmogorov complexity is a measure of randomness: the length of the shortest computer program that reproduces the data (pardon an oversimplification).
-Mark
The problem here is that your wife is wearing clothes. Clothes are man made.
If you send me a picture of your unclothed wife, I'll be happy to, uhm, test this theory.
It was pretty simple... Images over a certain size contained lightning, the others were mostly black, therefore smaller. Once I filtered it that way, manually filtering out the better images was easy.
Ok, I might very well be confused, and not using the tool right, but I plopped two of the only pictures I could find into this thing: an aerial shot of a bunch of houses, and a picture of a really good looking woman. The thing told me that the houses were much more likely biological. Strange. Unless it can see implants?
~Jon~
This space for rent, inquire within.
Read about it in _the_ book (http://www.cwi.nl/~paulv/kolmogorov.html) or check out the web site here (http://www.hutter1.de/kolmo.htm). For a more succinct idea of the approach, see these articles by one of the gurus on the topic (http://www.cs.ucsb.edu/~mli/focs.ps and http://www.cwi.nl/~paulv/papers/ecml97.ps).
"Consensus" in science is _always_ a political construct.
Couldn't we use a similar system to see if an e-mail is spam or not?
Mark
ahh, but the picture of your wife contains a lot of inanimate objects. I'm sure if you cropped the picture down to just her (or reasonably close) she would fare better in this comparison.
I compress to binary 0, therefore I am not.. :(
zip is a fine thing, but it's not a pattern-recognition program!
This is the loopiest thing I've heard of since Rosenblatt reported that his Perceptrons could distinguish between music composed by Bach and music composed in imitation of Bach.
Good heavens, any picture that's slightly out of focus will now be declared to be evidence of "biological processes."
I'm guessing that the researchers are not as nutty as they sound and that they've done more than is being reported, but still...
Reminds me of the researchers in the sixties who were publishing analyses of data that supposedly showed "biological clocks." It turned out that they were using smoothing algorithms that, basically, were filters that had a 24-hour peak in the frequency domain--so their analysis was creating the patterns they claimed to be detecting. A debunking article was published in Science in which another researcher used data from a random number table (the "unicorn" data) and showed that the same analysis techniques showed that the unicorn had a biological clock.
"How to Do Nothing," kids activities, back in print!
Isn't that conclusion the opposite of CmdrTaco's use of compression to weed out "lame" postings? More noise is apparently more valuable discussion, while less noise is somehow considered likely spam? How many good postings have you seen with a line "this has been added to get past the lameness filter"?
This must have been the solution in the second Fly film. The researchers kept using lame when they should have used gzip!
gzip seems to be good for every sort of pattern detection. There was an article, but I forget where I read it, a couple of months ago, on how gzip was used to detect the language used in a few written words. I know it's OT, but could somebody who remembers please answer me with a link ?
Erm, it's a photograph. It's all inanimate. Just like my ex-wife, though she was inanimate in real life too...
I myself have successfully used gzip for factoring large prime numbers, sorting the men from the boys, unblocking the kitchen sink and cracking safes. I'm currently trying to locate Osama Bin Laden by compressing Al Jazeera footage, but all I come up with are reports of Elvis sightings.
If I seem short sighted, it is because I stand on the shoulders of midgets
...that this item was posted a day early.
Sigs are bad for your health.
This whole thing is slightly dodgy, and I begin to wonder whether it was released a day early by mistake.
The big problem is the use of JPEG source images. Unless you've turned the quality setting up to the maximum, the JPEG artifacts (which are in effect repeating blocks of image data after transitions) will probably mask any hidden level of complexity in the images - the human brain is a much better tool for pattern recognition than most computer algorithms (especially those algorithms not designed for the task!).
Throw high-resolution bitmap files at it, and I'd be more persuaded that there is a genuine effect. Until then, I suspect it's more of a happy coincidence that the files they've thrown at it give results they are excited about.
Jolyon
Please read my Canon EOS tech blog at http://www.everyothershot.com
Doesn't bzip2 outperform gzip?
gzip might be preferable because it works more locally. It only keeps track of the last 32 KB of data (its sliding window) and does substitutions based on patterns seen in those bytes.
bzip2 instead applies the Burrows-Wheeler transform to whole blocks of 100-900 KB, so the context it exploits is much longer than gzip's and the compression is less local. That's great if you're going for compression, but for this work, it might be misleading.
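The locality difference can be shown directly with Python's `zlib` (gzip's DEFLATE): a repeated block costs almost nothing when the repeat lies within 32 KB of the original and costs roughly its full size when it lies farther back. The block and gap sizes below are arbitrary choices for illustration; bzip2 would still see both copies as long as they fall inside the same 100-900 KB block.

```python
import random
import zlib


def repeat_cost(gap: int, block_len: int = 4096, seed: int = 2) -> int:
    """Extra compressed bytes needed to repeat a random block after `gap` unrelated bytes."""
    rng = random.Random(seed)
    block = rng.randbytes(block_len)  # incompressible on its own
    filler = rng.randbytes(gap)       # unrelated random bytes in between
    base = len(zlib.compress(block + filler, 9))
    repeated = len(zlib.compress(block + filler + block, 9))
    return repeated - base


for gap in (1_000, 40_000):
    # The back-reference distance is block_len + gap; DEFLATE cannot look back past 32 KB.
    print(f"gap {gap:>6}: repeat costs {repeat_cost(gap)} bytes")
```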
That said, gzip doesn't know about image formats, so I wonder if these guys are getting some false positives on scanline wraps and other non-image data.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
April Fools isn't until tomorrow.
Did someone perhaps get caught a day early?
Wait wait wait you have a wife? Dude, this is Slashdot; are you sure you're not a misdirected user???
Karma whorin' since 1999
42 is one byte.
BZZZIIIPPP2 it, haha.
This isn't funny.
But not quite. It detects patterns but it does use gzip in a similar manner.
Simphile uses the gzip program to detect patterns in two files. It has been used to determine things ranging from whether two sonnets were written by Shakespeare to whether certain sound files came from the same source.
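Compression-based comparison of two files is commonly expressed as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length and xy is the concatenation. Whether Simphile uses exactly this formula is an assumption; the sketch below uses `zlib` as the compressor and short made-up strings as data.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed length, used as a stand-in for information content."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


a = b"Shall I compare thee to a summer's day? Thou art more lovely and more temperate. " * 3
b = b"Shall I compare thee to a winter's night? Thou art more lovely and more temperate. " * 3
d = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor. " * 3

print(f"a vs b: {ncd(a, b):.3f}")
print(f"a vs d: {ncd(a, d):.3f}")
```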
This is not surprising at all, really. Gzip and other compression utilities can be used to get an upper bound on the real (nonredundant) information content.
I'm not sure if the above is public knowledge, but I have used it as one additional feature for certain pattern recognition tasks for a while.
It came up in an interesting coffee-break discussion with one of my professors. We were debating whether there is a neat way to estimate the semantic content of a neural network after training it, and I recall suggesting compressing the weights of all layers: the less compressible they are, the more trained the network is.
- Your wife is hot.
- Can I have her number?
Godspeed!
Now imagine a Beowulf cluster of these....
'42', EOF
In order for this to work, the tester would need to eliminate a whole spectrum of other variables that would affect the outcome of the test. Image format (JPEG compresses less than BMP), image size, JPEG "resolution" (pixels per inch), color depth, etc. Of course, I assume they have some way of standardizing their input images, but it's unlikely to become an automated process....
I am alone, yet I also surf the universal backwash of undifferentiated Being, which is LOVE.
I envision a whole array of compression algorithms.
Each algorithm could be fine-tuned for a particular type of pattern.
Is that an elephant or a giraffe?
Does it compress better with the elephant algorithm or the giraffe algorithm?
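One cheap way to approximate an "elephant algorithm" and a "giraffe algorithm" with a general-purpose compressor is to prime DEFLATE with a class-specific preset dictionary (Python's `zlib.compressobj` accepts one via `zdict`) and pick the class whose dictionary gives the smallest output. The class names and dictionary strings below are invented for illustration; real dictionaries would be built from many examples.

```python
import zlib

# Hypothetical per-class "training" samples; real ones would be much larger.
DICTIONARIES = {
    "elephant": b"grey wrinkled skin, large ears, long trunk, tusks, thick legs",
    "giraffe": b"long neck, spotted coat, ossicones, long legs, tall and slender",
}


def compressed_size(data: bytes, zdict: bytes) -> int:
    """Size of `data` compressed by DEFLATE primed with the preset dictionary `zdict`."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return len(comp.compress(data) + comp.flush())


def classify(data: bytes) -> str:
    """Return the class whose dictionary compresses `data` best."""
    return min(DICTIONARIES, key=lambda name: compressed_size(data, DICTIONARIES[name]))


print(classify(b"an animal with a long trunk, tusks and large ears"))
print(classify(b"a tall animal with a long neck and a spotted coat"))
```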
The Slashdot spam filter was the first thing that came to mind on reading this. Are we seeing excuses? ;)
Have fun. Or failing that, be miserable with style.
Reread the story:
Now, there are several questions you should ask yourself:
(1) Are you calling your wife a fossil?
(2) Does she read Slashdot?
(3) If she could mod, would she mark it +1, Funny, or -1, Won't_get_some_for_a_while?
This is of course ignoring the effect of the environment which almost certainly reduces the symmetry (and thus the redundancy) produced by the genetics of a life-form.
But then again, maybe DNA would also have a high degree of repetition, at least in sexually reproducing life-forms, since DNA also has to be sensibly combinable.
And the wife wasn't man made? Wow!
out of the tar and into the gzip, eh?
It is a well-known fact that any lossless compression algorithm will cause some files to increase in size when 'packed'. If this were not the case, then '42' would be the compressed version of some other file, say 'Wdugiu*6x9', which in turn would be the compressed version of DNA's DNA, which in turn might be the compressed version of the answer to life, the universe, and everything. Furthermore, everybody's DNA would compress down to the same file '42' (since we all contain the answer within ourselves; presumably mice would compress down to something else), which would mean we were all clones, which means that I am the Pope and you are CowboyNeal (and vice versa). QED.
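The counting argument behind that fact is short: there are 2^n bit strings of length n but only 2^n - 1 strictly shorter ones, so no lossless compressor can map every length-n input to something shorter. A quick check in Python, which also shows `zlib` making seeded random data slightly bigger (Python 3.9+ for `randbytes`):

```python
import random
import zlib


def shorter_strings(n: int) -> int:
    """Number of bit strings of length 0..n-1, which equals 2**n - 1."""
    return sum(2**k for k in range(n))


for n in (1, 8, 16):
    # There is always one fewer shorter string than there are strings of length n.
    print(n, 2**n, shorter_strings(n))

noise = random.Random(3).randbytes(1000)
print(len(noise), "->", len(zlib.compress(noise, 9)))  # the output is larger than the input
```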
- Nevermind.
- She kinda looks like Dick Nixon.
Retreat!
I doubt this is very accurate for marking photos as hits or misses directly. This kind of thing may be more useful for detecting the lack of life than the presence of it. If compression rates are low, maybe you don't have to look at this photo so much. If they're high, maybe you want to examine it more closely. If you're dealing with truckloads of data and you're looking for a needle in a haystack, a mechanism for ruling out uninteresting data is invaluable.
That having been said, it sounds good in theory that 'organisms are highly patterned and therefore compress better', but then why would you use gzip? Why not take that theory and build something a little more adept at locating particular types of patterns you're interested in, or ruling out the ones you know are going to create false positives?
So, THAT having been said, I'm forced to wonder if somebody forgot that March has 31 days. Lord knows I can never keep track.
Interesting. For genome analysis Hidden Markov Models have been used in a lot of software.
Maybe if you could have an image recognition system do the Hard Machine Vision problem of generating a schematic of the picture, and then fed the "leg bone is connected to the hip bone" kinda data into an HMM, you could work out which fossils are ancient Cambrian crustaceans and which ones are Trogdor the Burninator.
Can you hear me now?
Actually, it would surprise me if an advanced civilization would send all their signals uncompressed and waste radio bandwidth.
There are 2 kinds of people in this world: Those who write in decimal and those who don't
I wonder if viruses (sorry - didn't RTFA) would compress like living life forms or if they would be more similar to nonliving.
Just a thought.
Mathematician, n.:
Someone who believes imaginary things appear right before your i's.
A pkzip file (aka WinZip default) is not equivalent to a gzipped file, but more analogous to a gzipped tar archive! Pkzip stores all that wonderful file information - full path, permissions, owner, and so on - with the compressed data. Gzip, by contrast, only compresses: its header holds at most the original file name and a timestamp, and it leaves the rest of the archival information in the filesystem. If you tar.gz'd the file, the size of the .tgz would be similar to the pkzip file.
The difference between your file size and his is likely the difference in the lengths of the pathnames to the respective text files and not a difference in the size of the compressed data. Remember that pkzip files store the full pathname uncompressed; gzip stores at most the bare file name, without any directory components.
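The container overhead can be measured with Python's standard `zlib`, `gzip` and `zipfile` modules: the same DEFLATE payload is either left bare, wrapped in gzip's 10-byte header and 8-byte trailer (Python's `gzip.compress` stores no file name), or wrapped in ZIP's local header plus central directory, both of which store the path uncompressed. The path `some/deeply/nested/notes.txt` and the sample text are made up.

```python
import gzip
import io
import zipfile
import zlib

text = b"GZipping life forms: the more the file shrinks, the more likely it was alive.\n" * 50


def raw_deflate(data: bytes) -> int:
    """Bare DEFLATE stream with no container (negative wbits selects raw output)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return len(comp.compress(data) + comp.flush())


def gzip_size(data: bytes) -> int:
    """gzip member: 10-byte header + DEFLATE data + 8-byte CRC32/length trailer."""
    return len(gzip.compress(data, compresslevel=9))


def zip_size(data: bytes, arcname: str) -> int:
    """Single-entry ZIP archive; the path appears in both the local and central headers."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        zf.writestr(arcname, data)
    return len(buf.getvalue())


print("raw deflate:", raw_deflate(text))
print("gzip:       ", gzip_size(text))
print("zip:        ", zip_size(text, "some/deeply/nested/notes.txt"))
```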
This gives new meaning to the phrase:
"Honey do I look fat in this?" Put on Gzip glasses. "Of course not dear."
There are other techniques for measuring the level of chaos in a set of data, and they'd probably yield more consistent results than running the data through an algorithm meant for an entirely different purpose.
What about compress? Or even good old "compact"? Ah, I remember the days when we had 20% compression and were glad of it, and some of the old-timers could have been confused with non-living processes even without the help of gzip anyway!
What was said was that life forms tend to organize their environment, reducing entropy, and the reduced entropy yields more compressible images. This is a very logical progression, since compressibility is inversely related to entropy: the greater the entropy (disorganization), the less compressible a file is.
To say that compressibility implies life is not the case. To say that life implies compressibility is the case, and can certainly reduce the number of images one must cull through. Better to look at the interesting image files that might contain life signs than at the less compressible areas that definitely do not.
Short form of above: I agree.
The comment is implying that gzip removes most of the surplus information. So it's quite a good zip.
compress well, and only structures which contain a great deal of order. So it's probably interesting, even if it's not life.
feed a business technology proposal through gzip
But seriously, I wonder what weird pics people have uploaded :)
Get your own free personal location tracker
I don't understand . . . Does an image having a fractal structure really compress better than one without? I can see that it might compress really well if you could detect the underlying algorithm: "Hey, that's region X of the Mandelbrot set", so its Kolmogorov complexity would be pretty low. But does gzip really detect this? As an image, that bit of the Mandelbrot set might be pretty hard to compress.
I just find it strange that I keep reading comments nodding at the assumption that being fractalish means easy compression . .
In addition to the artificial object problem mentioned above, there is also the far more likely problem of resolution and size. A 12' by 12' picture using 600 dpi resolution is FAR more complex than a 4' by 4' using 200 dpi.
excitingthingstodo.blogspot.com
I tried out the link and uploaded a pic of Kermit, and another of a tree frog. Guess who's biogenic?
This is not the first time zip (or other compression algorithms) have been used as an identification tool.
:)
You can do a form of crude language identification using it too. It's a bit long-winded and you can do far better with other metrics, but...
1) Create a load of compressed samples, each containing a large amount of text in one language - one sample per language
2) Append the file to be identified to each sample in turn, compress the result, and see which one grows in size by the smallest amount. (With real archives this needs a solid format such as .tar.gz, because ZIP compresses each member separately.)
Because compression is based on patterns, the new file will compress the most alongside the sample that contains the most similar text.
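A minimal sketch of the steps above, using `zlib` on concatenated text rather than .zip archives. The two tiny corpora are invented for illustration; real corpora should be much larger, although DEFLATE can only match against the last 32 KB of a corpus.

```python
import zlib

# Hypothetical tiny corpora, one per language.
CORPORA = {
    "english": b"the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 20,
    "french": b"le renard brun rapide saute par-dessus le chien paresseux et le chat dort " * 20,
}


def growth(corpus: bytes, sample: bytes) -> int:
    """Extra compressed bytes the sample adds when appended to the corpus."""
    return len(zlib.compress(corpus + sample, 9)) - len(zlib.compress(corpus, 9))


def identify(sample: bytes) -> str:
    """Pick the language whose corpus absorbs the sample most cheaply."""
    return min(CORPORA, key=lambda lang: growth(CORPORA[lang], sample))


print(identify(b"the cat sat on the lazy dog and the quick brown fox"))
print(identify(b"le chat dort et le renard brun rapide saute"))
```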
So all we need to do to spot life on other planets is to fill zip files with images from good sci-fi films (Mars Attacks! seems appropriate) and away we go
Spell checker (c) creative spelling inc. (aka my dyslexic brain)
What are you, a Ferengi? "UUUman FEEmales wear clothes!"
Dude, you got a fembot!
The linked article points out some problems with this approach.
Really, why is this news?
Sounds like someone at NASA got a little carried away with their new toy
You can accomplish anything you set your mind to. The impossible just takes a little longer.
this?!?!?
"If I have been able to see so far, It is because I went out and bought a damn binoculars" - Ze da Esquina
It would seem that the same approach could be used to distinguish potential intelligent radio signals from those of random or astronomical origin. Though perhaps you would want a pattern to be present resulting in a more compressible file? I think it would depend whether the signal that is picked up is a deliberate simple pattern meant to be a "hello, are you out there?" broadcast by an E.T, or if it is normal communications between E.T.'s not realizing (or not concerned) that they are being overheard.
Work for Change & GET PAID!
Because of the mutation factor inherent in meiosis, her genes are not fully determined by other humans. Ergo, she can make humans but humans did not make her. QED. :)
Is this a sigs-optional kind of place? 'Cause I am totally down with that if you know what I mean.
Oh great... wait until the cost-cutting management gets ahold of this. I can just hear it, "We can save money and data storage space if you'll just stop using full-data compression and use a lossy compression algorithm instead! The savings will be huge, and the end-users won't know the difference."
No wonder Dr. McCoy didn't want to use the transporters.
Someone pointed out that using JPEGs as source images is tainting the results. That's definitely true, but the basic concept here is valid. Anyone who's studied both physical entropy and information theory can understand that the two are highly related. I'd like to see them do this study again with the *original* images -- I suspect the result will still come out. I'm not sure gzip is entirely the most appropriate algorithm to use here, but it could work.
Very cool...
tried it with a pic of a posing pr0nmodel & a TVR & Porsche 911.
Try it with a pic of an amateur.:)
Hey, maybe GZIP will help Stallman find his home planet!
:-)
{ just kidding, Richard--you da man. }
I believe the first step is to test pictures of yourself against her, so that you can rule out who's really the cyborg.
Using CGI as the user hit the web page it took pictures at different shutter speeds. Working up from the slowest shutter speed the first JPG over 20K bytes was the right exposure and was shown on the page.
So I guess if you squeeze hard enough you literally CAN get blood from a stone!
;)
(And I deserve "most obscure joke" points for that one)
"This extends the simple, but powerful, uses of gzip to biogenic fossil detectors..."
The problem with gzip is that doesn't preserve data very well. Now tar, it preserves fossil data quite well.
And the jury says:
There you have it, CowboyNeal is more biogenic than Taco.
Cheers. Lucky for you, I am man-made and I can be produced on demand. You didn't think Saint Aardvark actually wooed me in a conventional fashion? Nay, he had me crafted from bits left over from his home made beowulf cluster.
- The posting-from Sunday-apparently Wife
..."a fascinating new paper on the use of data compression algorithms for allowing a machine to quickly determine aspects of a document like language and authorship." For those less mathematically inclined, see the report on Economist.com.
here she is
I want to delete my account but Slashdot doesn't allow it.
In some ways this technique is meant to defeat systematic biases like the ones you mention. Compression tools make few assumptions about the data they process, so they serve as a check against more tailored filters which may introduce artifacts, or be defeated in some way. This problem may occur because they look for pre-selected "features" in the data rather than looking at the distribution of the data as a whole.
gzip isn't perfect, but it will find repetitive byte sequences of any kind, regardless of the type of data. It's more of a sanity check than a knowledge extraction method.
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
Now, I know I will either be flamed or derided for mentioning this text. I don't claim to be an expert on it (in fact, the scope and breadth of the reading convinced me that one time through is nowhere near enough - I will probably re-read it several more times in my life). I also know that his work both extrapolates and builds upon previous work - he mentions this repeatedly throughout the book.
If you are at all interested in this sort of thing (and let me tell you this, his book covers much more than just using compression algorithms to determine patterns created by biological processes), you owe it to yourself to read, in full, the book at least once.
Reason is the Path to God - Anon
Well, I don't know if this is a result of iteration or merely a fact of nature, but I tried the "biogenic image detector", and when comparing the same image, file 1 was always more compressed than the second, despite being the same... Can anyone explain this to me?
Wesley Willis writes:
I told you about the time
that life was differentiated with GZip.
I was very greatful to hear about that stuff.
Take it to me. It'll be the best test to the life.
Tell it all to Jean-loup Gailly. Tell it to Mark Adler with all your mite. I'm saying this again to let you know:
GZIP IS THE ANSWER
GZIP IS THE ANSWER
GZIP IS THE ANSWER
GZIP IS THE ANSWER!
Tell it like it is, Jack
It was so alike to be compressed.
I told it the same way.
I have to compress it the same way so it is to be alive.
Make it great for the grant money.
It should be for the grant money.
I'm saying this again to let you know.
GZIP IS THE ANSWER
GZIP IS THE ANSWER
GZIP IS THE ANSWER
GZIP IS THE ANSWER!
Tell it like it is, Jack
Everywhere you go, GZip is helping you.
GZip that file. It'll sound the trumpet for you like it'll sound the trumpet for me.
GZip it well for you and I'll gunzip it well for mell.
I'm saying this again to let you know.
GZIP IS THE ANSWER
GZIP IS THE ANSWER
GZIP IS THE ANSWER
GZIP IS THE ANSWER!
I'm letting you know that GZip is the answer.
Rock on Jean-loup and roll on Mark.
Otherwise, you'll just get a worthless puddle of protoplasm when you uncompress people on the other end of the teleport. Also, don't compress humans and insects together, although you might get a better ratio that way.
You didn't read the article carefully enough. The seventh paragraph of the article clearly states they used TIFF images, not JPEG.
tried it on goatse.sx yet? he has no clothes, so...
Surely WinZip would be a better solution; after all, everyone knows that there are more Windows users than any other platform.
;^>
This would give a much better chance of finding the existence of life through sheer numbers.
Please tell me this story is an early April Fool!
I uploaded one image of war casualties (incinerated bodies from the 1991 Gulf War) and an image of a friend of mine. The result:
Answer: Image 2 (24.1070556640625 % compression) has a higher complexity measure than image 1 (0.861397377968705 % compression), and thus image 2 is more probably biogenic.
I guess it's doing what it's supposed to...
Did a test run with some default images in Windows XP. Windows XP's "Purple Flower.jpg" is apparently more "alive" than Windows XP's "Tulips.jpg", but "Windows XP.jpg" is more alive than both of them!
--- If we knew half the things we shouldn't we'd stop wishing we knew it all
What's so special about (g)zip? Would not any good archiver (or, rather, any good archiving algorithm) do?
Rhetorical questions, of course -- what good is an article if it does not mention GNU and/or Linux...
In Soviet Washington the swamp drains you.
"regions of similar, predictable data while preserving aspects of dissimilar, unpredictable detail.." The article posted doesn't (if you did any reading) claim to do anything other than that, plus suggest that there might be good correlations between that and life forms.
What does what you do with gzip have to do with the article?
The basic idea is that the probability of highly compressible data resulting from an algorithmic process is much higher than that of it resulting from a random process.
http://saintaardvarkthecarpeted.com/images/snapshot.png
http://saintaardvarkthecarpeted.com/images/tacobell.jpg
will somebody please moderate that fossil -1 redundant?
Random data may not be meaningful but they are full of information by definition.
Consider the sequence 123123123123. The sequence is highly ordered, and therefore is probably meaningful (at least to someone, somewhere), but it contains very little information. In contrast the random sequence 196390244187 is highly disordered, totally meaningless, yet contains more information than the 123123123123 sequence.
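Compressed length gives a rough, computable upper bound on that information content (true Kolmogorov complexity cannot be computed exactly). A minimal sketch with `zlib`, scaling both example sequences up to 3,000 digits; the "random" digits are seeded pseudo-random, an assumption made for reproducibility.

```python
import random
import zlib

ordered = b"123" * 1000  # like 123123123123...
rng = random.Random(4)
disordered = bytes(rng.choice(b"0123456789") for _ in range(3000))  # like 196390244187...

for name, seq in (("ordered", ordered), ("random", disordered)):
    size = len(zlib.compress(seq, 9))
    # 8 * compressed bytes / digits = rough upper bound on bits of information per digit
    print(f"{name:8} {len(seq)} digits -> {size} bytes, <= {8 * size / len(seq):.2f} bits/digit")
```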
The technical definition of "information" is counterintuitive, not as simple as "noise vs. signal."
Long ago engineers learned that nature evolves life to adapt to its environment. We often look to nature for inspiration.
What drives nature? Survival for one. Survival often depends on having a backup, so it's no surprise that nature tends to adopt redundant systems.
I have 2 hands in which I can hold a spear or club. I have 2 eyes to identify my potential predators. I have 2 ears to hear them coming. I have two...well you get the picture.
All of this is externally visible. As such, a JPEG or GIF image is likely to capture some quantifiable amount of this externally visible redundancy.
GZip is being used here to measure visual redundancy in an ingenious manner, so it's not entirely surprising that it works.
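The online test reports a "% compression" figure without giving its formula. Below is a minimal Python sketch that assumes percent compression means 100 * (1 - gzip size / raw size). It uses synthetic 256x256 grayscale pixel buffers, which are hypothetical stand-ins for real photographs, to show gzip picking up repeated structure.

```python
import gzip
import random

WIDTH, HEIGHT = 256, 256  # hypothetical image size


def percent_compression(raw: bytes) -> float:
    """Assumed definition: 100 * (1 - gzip size / raw size)."""
    return 100.0 * (1 - len(gzip.compress(raw, compresslevel=9)) / len(raw))


def noise_image(seed: int = 0) -> bytes:
    # Every pixel independent: no redundancy for gzip to exploit.
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def paired_image(seed: int = 0) -> bytes:
    # Left half of each row is noise; the right half is an exact copy (a crude "two of everything").
    rng = random.Random(seed)
    rows = []
    for _ in range(HEIGHT):
        half = bytes(rng.randrange(256) for _ in range(WIDTH // 2))
        rows.append(half + half)
    return b"".join(rows)


print(f"noise:  {percent_compression(noise_image()):.1f} %")
print(f"paired: {percent_compression(paired_image()):.1f} %")
```

One caveat for the "two hands" argument: DEFLATE only matches byte runs that repeat in the same order, so a left-right mirrored half (each row's bytes reversed) would not be detected as redundancy the way the copied half is here.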
-- Good judgement comes with experience. -- Experience comes with bad judgement.
This same information could be achieved by doing a frequency histogram on the data to measure entropy or redundancy. Why not do that directly? A program to measure 8-bit entropy is no more than a few dozen lines of C, or one could simply "apt-get install ent".
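A histogram-based 8-bit entropy, in the spirit of what `ent` reports as bits per byte, takes only a few lines. Below is a minimal Python sketch alongside a DEFLATE measure for comparison. The two test buffers are made up for illustration. They have identical byte histograms but different orderings, which shows one difference between the two approaches: a histogram ignores where bytes occur, while gzip exploits repeats and runs.

```python
import random
import zlib
from collections import Counter
from math import log2


def byte_entropy(data: bytes) -> float:
    """Order-0 entropy in bits per byte, computed from a 256-bin frequency histogram."""
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())


def deflate_bits_per_byte(data: bytes) -> float:
    """Bits of zlib (DEFLATE) output per input byte."""
    return 8 * len(zlib.compress(data, 9)) / len(data)


rng = random.Random(1)
shuffled = bytes(rng.randrange(256) for _ in range(65536))
sorted_ = bytes(sorted(shuffled))  # same histogram, very different ordering

for name, d in (("shuffled", shuffled), ("sorted", sorted_)):
    print(f"{name:8s} histogram: {byte_entropy(d):.3f}  deflate: {deflate_bits_per_byte(d):.3f}")
```

The histogram measure gives both buffers essentially 8 bits per byte, while DEFLATE squeezes the sorted one to well under one bit per byte.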
Marklar: marklar
too bad not many will get the joke :-)
Add some entropy to your life; write drunk
But would writing drunk increase or decrease entropy overall?
Working for necessity's mother.
"player 4 hit player 1 with 0 stroms"
Don't Gzip the living, that's a felony.
With some users complaining about receiving up to 1,000 unsolicited e-mails a day, there is no shortage of innovative solutions to stop the spam. Blacklists of known spammers and keyword filtering have been tried with mixed results. Another promising approach, pioneered by companies like MailFrontier, MailBlocks and others, is called challenge-response. It works as follows. When a customer receives a new message from an unknown correspondent, the system intercepts the message and automatically returns to the sender a form to fill out. The typical form contains graphic images, simple pictures, geometric shapes, colored check-boxes or other objects that are easily recognizable by humans but hard for computers to make sense of. Once a human being views those images and types the response into the form, demonstrating that she is a person and not an automated mass-mailing machine, the system forwards the e-mail to the intended recipient.
DyedBlond, a secretive Silicon Valley artificial intelligence startup, is rumored to have been working on an advanced version of a challenge-response spam blocker. Whereas existing challenge-response spam blockers discriminate between mass-mailing machines and humans, DyedBlond discriminates between intelligent and not-so-intelligent humans. "Counting daisies and bunnies is too simple," says Alex Brodenschmuck, a renowned AI expert. "Sooner or later the machines would learn how to do it. You need more sophisticated tests for human intelligence. An ability to maintain a conversation has always been considered a test for true intelligence (the so-called Turing test)."
But DyedBlond goes beyond small talk. Want to send an e-mail to a physics professor? Be ready to take an integral or solve a differential equation. Sending your resume to Wall Street? Prepare to price an exotic option. Not only does the DyedBlond solution eliminate spam, it prioritizes mail by sorting correspondents by their intelligence.
"If the guy cannot solve the Schroedinger equation for the hydrogen atom, you probably don't want to hear from him," says Tiev Resle, a Cornell physics professor. "I am looking forward to the day when the American Physical Society makes DyedBlond a mandatory filter for all e-mail sent to its members. This will kill spam once and for all, and improve students' performance."
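The basic challenge-response flow described in the comment above can be sketched in a few steps: hold mail from unknown senders, send them a challenge, and release their mail once a correct answer comes back. Below is a hypothetical, in-memory Python sketch. The class and method names are made up, and a simple arithmetic question stands in for the image-based puzzles that real products used.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    body: str


@dataclass
class ChallengeResponseFilter:
    """Hypothetical sketch: hold mail from unknown senders until they answer a challenge."""
    allowed: set[str] = field(default_factory=set)
    held: dict[str, list[Message]] = field(default_factory=dict)  # sender -> held messages
    answers: dict[str, str] = field(default_factory=dict)         # sender -> expected answer
    inbox: list[Message] = field(default_factory=list)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def receive(self, msg: Message):
        """Deliver mail from known senders; hold the rest and return a challenge text."""
        if msg.sender in self.allowed:
            self.inbox.append(msg)
            return None
        self.held.setdefault(msg.sender, []).append(msg)
        if msg.sender not in self.answers:
            # Stand-in challenge (real systems used pictures humans recognize easily).
            a, b = self.rng.randint(1, 9), self.rng.randint(1, 9)
            self.answers[msg.sender] = str(a + b)
            return f"To deliver your mail, reply with the sum {a} + {b}."
        return None  # a challenge is already outstanding for this sender

    def respond(self, sender: str, reply: str) -> bool:
        """On a correct reply, whitelist the sender and release all held mail."""
        if self.answers.get(sender) != reply.strip():
            return False
        del self.answers[sender]
        self.allowed.add(sender)
        self.inbox.extend(self.held.pop(sender, []))
        return True
```

Once a sender has answered correctly, later messages from that address go straight to the inbox without a new challenge.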
1. This seems like it might be a special case. The samples in question are layered and thus have a greater chance of being compressible. I wonder how well a tree would do.
2. As far as I can tell, they are not pictures of life but rather of the effects of life. The results may come out differently if the pictures were of actual living things.
3. If for some reason an image gets compressed by a lossy compression scheme then the data has to be thrown out.