Slashdot Mirror


Using gzip As A Spam Filter

captainclever writes "Kuro5hin have an interesting article on detecting spam using gzip." Here's a sample: "Loosely speaking, the LZ (Zip) and the related gzip compression algorithms look for repeated strings within a text, and replace each repeat with a reference to the first occurrence. The compression ratio achieved therefore measures how many repeated fragments, words or phrases occur in the text."

3 of 268 comments (clear)

  1. Excellent by Phosphor3k · · Score: 5, Funny

    Slashdot can use it to filert out duplicate stories.

  2. Yay! by Anonymous Coward · · Score: 5, Funny

    What an idea!

    I could use this to avoid those people who keep saying the same thing all the time, over and over again...

    Now, how can I convince my mother to use e-mail?

  3. Re:Grep it instead! by Walterk · · Score: 5, Funny

    Just egrep for '(penis|enlarge|money|auction|cash|advance|fortune )'. And hope no hot babes email you complimenting your penis, or mention they want their breasts enlarged, offer you money, auction off your award winning lego collection or anything like that.