Using gzip As A Spam Filter
captainclever writes "Kuro5hin have an interesting article on detecting spam using gzip." Here's a sample: "Loosely speaking, the LZ (Zip) and the related gzip compression algorithms look for repeated strings within a text, and replace each repeat with a reference to the first occurrence. The compression ratio achieved therefore measures how many repeated fragments, words or phrases occur in the text."
This article will make much more sense if you look at the raw data in tabular form.
http://www.paulgraham.com/spam.html
http://www.archive.org/details/ThePowerOfNightmares