Slashdot Mirror


Using gzip As A Spam Filter

captainclever writes "Kuro5hin have an interesting article on detecting spam using gzip." Here's a sample: "Loosely speaking, the LZ (Zip) and the related gzip compression algorithms look for repeated strings within a text, and replace each repeat with a reference to the first occurrence. The compression ratio achieved therefore measures how many repeated fragments, words or phrases occur in the text."
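
The idea in the quoted passage can be tried with a minimal Python sketch. It is only an illustration of "compression ratio as a measure of repetition", not necessarily the method the Kuro5hin article uses. The function name `compression_ratio` and both sample strings are made up for this example, and no spam threshold is implied.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size (lower = more repetition)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text has no meaningful ratio")
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical sample messages, invented for illustration only.
repetitive = "Buy now! Limited offer! Buy now! Act today! " * 20
varied = (
    "Meeting moved to Thursday at 3pm; bring the quarterly budget draft, "
    "Jake's laptop charger, and whichever vendor quotes arrived."
)

repetitive_ratio = compression_ratio(repetitive)
varied_ratio = compression_ratio(varied)

print(f"repetitive: {repetitive_ratio:.2f}")
print(f"varied:     {varied_ratio:.2f}")
```

Note that gzip adds a fixed header and trailer, so very short messages can come out with a ratio above 1 regardless of content; a ratio like this is probably only informative on longer texts.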

18 of 268 comments

  1. Grep it instead! by WestieDog · · Score: 2, Funny

    Forget about gzip, all the 'cool' geeks use grep! :)

    1. Re:Grep it instead! by Walterk · · Score: 5, Funny

      Just egrep for '(penis|enlarge|money|auction|cash|advance|fortune)'. And hope no hot babes email you complimenting your penis, or mention they want their breasts enlarged, offer you money, auction off your award winning lego collection or anything like that.

  2. It's all spam by amigaluvr · · Score: 4, Funny

    Hey if you compress all of your mail with gzip then it all looks like foreign spam anyway!

  3. Excellent by Phosphor3k · · Score: 5, Funny

    Slashdot can use it to filter out duplicate stories.

  4. It won't work for businesses by autocracy · · Score: 4, Funny

    Anything from mid-level management or the marketing department would immediately be marked as spam and trashed. Maybe not very important in the first place, but you'd at least need to be able to say "yeah, I saw the memo on the TPS reports."

    --
    SIG: HUP
    1. Re:It won't work for businesses by blibbleblobble · · Score: 2, Funny

      "Anything from mid-level management or the marketing department would immediately be marked as spam and trashed."

      And the problem?

  5. In addition by some+homeless+guy · · Score: 0, Funny

    In comments submitted on Kuro5hin, a question (see comment) is raised on whether or not Slashdot employs a similar technique (as presented in the article) to foil spam-flooders.

  6. Don't compress by Fuzzums · · Score: 3, Funny

    Usually I don't compress my spam.

    I delete it.

    This will save me a lot more space ;-)

    --
    Privacy is terrorism.
  7. Dupes by BESTouff · · Score: 1, Funny

    Do you mean that each time you can find dupes, that's spam? Oh my god, poor /. ...

  8. Yay! by Anonymous Coward · · Score: 5, Funny

    What an idea!

    I could use this to avoid those people who keep saying the same thing all the time, over and over again...

    Now, how can I convince my mother to use e-mail?

  9. What is spam, though? by Big+Mark · · Score: 4, Funny

    "The compression ratio achieved therefore measures how many repeated fragments, words or phrases occur in the text."

    Ah. I thought to detect really useless, annoying, pointless, bandwidth-sapping and time-consuming email all you had to do was look for "fwd:" in the subject line.

    -Mark
  10. Re:Quantitative, not qualitative by Anonymous Coward · · Score: 0, Funny

    or.....

    1 (ham) "Winning a brand new convertible you have, from entered in the contest you were."

    and

    2 (spam) "Winning the convertible you can, enter you have?"

    Both would immediately be recognized as from recent lame movie, and dumped by the filter...this problem isn't as easy as it appears.

  11. Re:Maybe I am missing something here by 6Yankee · · Score: 3, Funny

    the text in each is quite varied; e.g. longer xxx

    The text in each of my spams seems to have more XXX...

  12. Re:Legislation by liquidsin · · Score: 2, Funny

    That's pretty harsh. Once the death sentence has been carried out, I see no reason not to parole them. Have some compassion.

    --
    do not read this line twice.
  13. Email to my girlfriend by FroBugg · · Score: 4, Funny

    Unfortunately, using this my girlfriend would never get any of my emails.

    "I'm sorry. Really, really, really, really sorry. I'm so very, very, very sorry. I'm sorry..."

  14. Re:Quantitative, not qualitative by Anonymous Coward · · Score: 1, Funny


    Yoda filter then, this is like?

  15. Messages from teenagers would be spam by Adam9 · · Score: 4, Funny

    Don't use this filtering if you're a high school teacher or something else that involves getting messages from teenagers.

    [E-mail from skittles9333@some.email marked as spam and deleted] So like, I was like sick, and like, I didn't go to school today. So like, I was told like, that Jim like said, that like you might like, have some homework due like tomorrow. Could you like, tell me what like that homework would like be?

  16. Re:Slashdot filter by fredrikj · · Score: 2, Funny

    Oops. Well, my experience from my troll accounts is that the filter does a lousy job, I could never have guessed that something that sophisticated was behind it ;)

    Err, ignore the troll account part, I never said that.