Researchers Claim "Effectively Perfect" Spam Blocking Discovery
A team of computer scientists from the International Computer Science Institute in Berkeley, CA is claiming to have found an "effectively perfect" method for blocking spam. The new system deciphers the templates a botnet is using to create spam and then teaches filters what to look for. "The system ... works by exploiting a trick that spammers use to defeat email filters. As spam is churned out, subtle changes are typically incorporated into the messages to confound spam filters. Each message is generated from a template that specifies the message content and how it should be varied. The team reasoned that analyzing such messages could reveal the template that created them. And since the spam template describes the entire range of the emails a bot will send, possessing it might provide a watertight method of blocking spam from that bot."
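To illustrate the core idea (invariant text shared by many samples, plus gaps where the template varies, yields a matcher for the whole family), here is a toy sketch. It is not the researchers' actual inference algorithm, which the summary doesn't describe; the function names and the "tokens every sample shares, in order" rule are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def common_subsequence(a, b):
    """Return the tokens of `a` that are matched, in order, against `b`."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    common = []
    for block in matcher.get_matching_blocks():
        common.extend(a[block.a:block.a + block.size])
    return common


def infer_template(samples):
    """Toy inference (an assumption): tokens shared, in order, by every sample."""
    template = samples[0].split()
    for message in samples[1:]:
        template = common_subsequence(template, message.split())
    return template


def template_regex(template):
    """Compile the template; any run of whole tokens may fill each gap."""
    if not template:
        raise ValueError("samples share no tokens; nothing to match on")
    gap = r" (?:.*? )?"  # zero or more whole tokens between fixed anchors
    return re.compile(" " + gap.join(map(re.escape, template)) + " ")


def matches(pattern, message):
    # Normalize whitespace and pad with spaces so anchors align with token boundaries
    return pattern.search(" " + " ".join(message.split()) + " ") is not None
```

A template this loose would also match legitimate mail that happens to contain the same anchor words in the same order, so a practical system presumably needs more structure (for example, which values each varying slot may take) than this sketch infers.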
As a co-author of this work, I should be clear that we never suggested that we have a perfect spam filter per se, simply a new tool that has the benefit of being orthogonal to existing techniques. For _existing_ botnets, our filters are extremely good, but the paper is also quite clear about the variety of ways that spammers might try to evade the approach.
Spam filtering isn't very hard if you see the email for a large number of accounts, as Gmail does. The one characteristic that spam must have is that it's sent in bulk, and the commonality across receiving email accounts gives it away. The only hard part is recognizing that commonality, which existing filters already do rather well. This is just a new technique for recognizing it.
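A minimal sketch of the bulk-detection idea, assuming a provider that sees mail for many recipients: reduce each body to a fingerprint and flag any fingerprint that reaches a hypothetical `threshold` of distinct recipients. The normalization rules and the names `fingerprint` and `BulkDetector` are illustrative assumptions, not any provider's real method. Exact hashing like this is easily defeated by the per-message variation described in the story, which is the gap that fuzzier matching (or template inference) tries to close.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(body: str) -> str:
    # Crude normalization (an assumption): lowercase, replace digit runs, collapse whitespace
    text = body.lower()
    text = re.sub(r"\d+", "#", text)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class BulkDetector:
    """Flags a message once its fingerprint has reached `threshold` distinct recipients."""

    def __init__(self, threshold: int = 50):
        self.threshold = threshold
        self.recipients = defaultdict(set)  # fingerprint -> set of recipient addresses

    def observe(self, recipient: str, body: str) -> bool:
        fp = fingerprint(body)
        self.recipients[fp].add(recipient)  # repeats to the same recipient don't count twice
        return len(self.recipients[fp]) >= self.threshold
```

Counting distinct recipients rather than raw messages is what captures the "sent in bulk" property: one person receiving the same newsletter repeatedly doesn't trip it, but many accounts receiving near-identical text does.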
Recognizing spam for a single account is tougher, because you don't get to see the "bulk" property.
I'd bet a nice sum of money that if you give a traditional learning spam filter 1,000 e-mails sent by the same bot and flag them all as spam, it will then recognize the bot's further e-mails as spam.
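To make the claim concrete, here is a minimal multinomial naive Bayes filter, one common kind of "traditional, learning" filter. It is an illustrative sketch, not Thunderbird's actual classifier; the class name `NaiveBayes`, the labels `"spam"`/`"ham"`, and the add-one smoothing are assumptions. It learns purely from token frequencies, so it can only catch a bot's later mail to the extent that mail reuses the tokens it was trained on.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Python 3's \w is Unicode-aware, so Cyrillic words are tokenized too
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        if not vocab:
            raise ValueError("train the filter before classifying")
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Smoothed class prior, then add-one smoothed per-token likelihoods
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in tokens(text):
                score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert log scores to P(spam | text) without underflow
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

If a bot's messages share few tokens with each other (heavy per-message variation, or a tokenizer that drops the script they're written in), the counts this model learns from stay sparse, which is one way such a filter can keep missing mail it has "seen" many times.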
If that were true, Thunderbird's filter would by now have stopped missing all the Russian spam I get. I have no idea what the spam says, as I don't know Russian, and I never get legitimate mail in Russian; all the Russian spam I get looks very similar in format and length. I'm quite certain that over a thousand such e-mails have been marked as spam in Thunderbird over the last few years, and yet it consistently fails to flag new ones.
Point being: traditional learning filters are not sufficient.
This is anecdotal evidence, YMMV, etc., etc.
I originally posted it here in 2002. Note how dated it is (e.g. no smartass comment about CAPTCHA).
Some mathematician (I forget who) had his graduate students send back cards with forms like these to people who sent in attempted proofs of Fermat's Last Theorem.