Using gzip As A Spam Filter
captainclever writes "Kuro5hin have an interesting article on detecting spam using gzip." Here's a sample: "Loosely speaking, the LZ (Zip) and the related gzip compression algorithms look for repeated strings within a text, and replace each repeat with a reference to the first occurrence. The compression ratio achieved therefore measures how many repeated fragments, words or phrases occur in the text."
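To make the quoted idea concrete, here is a minimal sketch of the measurement, assuming the "compression ratio" means compressed size divided by original size. The function name `compression_ratio` and the sample strings are illustrative and do not come from the article. It only shows that repetitive text compresses far better than varied text. It is not the article's actual filter.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Return gzip-compressed size / original size (lower = more repetition)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text has no meaningful compression ratio")
    return len(gzip.compress(raw)) / len(raw)


# Highly repetitive text: LZ-style back-references replace each repeat.
repetitive = "I'm sorry. Really, really, really, really sorry. " * 20

# Varied prose with few long repeats compresses much less.
varied = (
    "Loosely speaking, the LZ and gzip algorithms look for repeated "
    "strings within a text and replace each repeat with a reference "
    "to the first occurrence."
)

print(f"repetitive: {compression_ratio(repetitive):.2f}")
print(f"varied:     {compression_ratio(varied):.2f}")
```

gzip adds at least 18 bytes of header and trailer, so very short messages can show ratios near or above 1 whatever their content. Any spam/ham cutoff based on this number would have to be tuned on real mail.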
Forget about gzip; all the 'cool' geeks use grep! :)
Hey if you compress all of your mail with gzip then it all looks like foreign spam anyway!
Slashdot can use it to filter out duplicate stories.
Anything from mid-level management or the marketing department would immediately be marked as spam and trashed. Maybe not very important in the first place, but you'd at least need to be able to say "yeah, I saw the memo on the TPS reports."
SIG: HUP
In comments submitted on Kuro5hin, a question (see comment) is raised about whether Slashdot employs a similar technique (as presented in the article) to foil spam-flooders.
Usually I don't compress my spam.
;-)
I delete it.
This will save me a lot more space.
Privacy is terrorism.
Do you mean that every time you find dupes, it's spam? Oh my god, poor /. ...
What an idea!
I could use this to avoid those people who keep saying the same thing all the time, over and over again...
Now, how can I convince my mother to use e-mail?
-Mark
or.....
1 (ham) "Winning a brand new convertible you have, from entered in the contest you were."
and
2 (spam) "Winning the convertible you can, enter you have?"
Both would immediately be recognized as coming from a recent lame movie and dumped by the filter... this problem isn't as easy as it appears.
the text in each is quite varied; e.g. longer xxx
The text in each of my spams seems to have more XXX...
That's pretty harsh. Once the death sentence has been carried out, I see no reason not to parole them. Have some compassion.
do not read this line twice.
Unfortunately, using this my girlfriend would never get any of my emails.
"I'm sorry. Really, really, really, really sorry. I'm so very, very, very sorry. I'm sorry..."
Yoda filter then, this is like?
Don't use this filter if you're a high school teacher or anyone else who gets messages from teenagers.
[E-mail from skittles9333@some.email marked as spam and deleted] So like, I was like sick, and like, I didn't go to school today. So like, I was told like, that Jim like said, that like you might like, have some homework due like tomorrow. Could you like, tell me what like that homework would like be?
Oops. Well, my experience from my troll accounts is that the filter does a lousy job; I never would have guessed that something that sophisticated was behind it ;)
Err, ignore the troll account part, I never said that.