Researchers Claim "Effectively Perfect" Spam Blocking Discovery
A team of computer scientists from the International Computer Science Institute in Berkeley, CA is claiming to have found an "effectively perfect" method for blocking spam. The new system deciphers the templates a botnet is using to create spam and then teaches filters what to look for. "The system ... works by exploiting a trick that spammers use to defeat email filters. As spam is churned out, subtle changes are typically incorporated into the messages to confound spam filters. Each message is generated from a template that specifies the message content and how it should be varied. The team reasoned that analyzing such messages could reveal the template that created them. And since the spam template describes the entire range of the emails a bot will send, possessing it might provide a watertight method of blocking spam from that bot."
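A toy illustration of the template idea quoted above. It assumes a made-up `{a|b|c}` slot syntax (not the format real bots or the researchers actually use), and the names `TEMPLATE`, `expand`, and `template_to_regex` are hypothetical. The point it shows: once you hold the template, you can both enumerate every message it can produce and compile it into a rule that matches exactly that set. Real templates vary far more (random strings, URL lists, per-campaign dictionaries), so treat this as a sketch of the reasoning, not of the actual system.

```python
import itertools
import re

# Hypothetical template and slot syntax, for illustration only:
# {x|y|z} marks a slot where the bot picks one of the alternatives.
TEMPLATE = "{Hi|Hello|Dear} friend, get {cheap|discount} meds at {shop1|shop2}.example"

SLOT = re.compile(r"\{([^{}]*)\}")

def expand(template):
    """Yield every message the template can produce."""
    # With one capture group, re.split alternates literal text and slot bodies.
    parts = SLOT.split(template)
    literals = parts[0::2]
    slots = [p.split("|") for p in parts[1::2]]
    for choice in itertools.product(*slots):
        out = [literals[0]]
        for alt, lit in zip(choice, literals[1:]):
            out.append(alt)
            out.append(lit)
        yield "".join(out)

def template_to_regex(template):
    """Compile the template into a regex matching exactly its output set."""
    pieces = []
    pos = 0
    for m in SLOT.finditer(template):
        pieces.append(re.escape(template[pos:m.start()]))  # fixed text
        alts = m.group(1).split("|")
        pieces.append("(?:" + "|".join(re.escape(a) for a in alts) + ")")
        pos = m.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces))

msgs = list(expand(TEMPLATE))
rule = template_to_regex(TEMPLATE)
print(len(msgs))                                               # 12
print(all(rule.fullmatch(m) for m in msgs))                    # True
print(rule.fullmatch("Hello friend, lunch at noon?") is None)  # True
```

Because the rule is derived from the template rather than from individual messages, it covers variants the filter has never seen, which is the "watertight" property the summary describes, and it only holds for as long as the bot keeps using that template.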
As a co-author of this work, I should be clear that we never suggested that we have a perfect spam filter per se, simply a new tool that has the benefit of being orthogonal to existing techniques. For _existing_ botnets, our filters are extremely good, but the paper is also quite clear about the variety of ways that spammers might try to evade the approach.
Divining the template seems to depend on analyzing numerous messages. Presumably, only very large mail servers (or an aggregated network of smaller servers) would be able to collect enough messages to divine the various templates quickly. A small or medium site probably could not benefit from running the analysis software itself; it would not see enough spam from each template to infer that template rapidly.
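To illustrate why sample volume matters, here is a deliberately naive inference sketch (a simplification of my own, not the paper's algorithm; `infer_pattern` is a hypothetical name). It assumes all messages from one template have the same number of whitespace-separated tokens, keeps tokens on which every sample agrees, and turns every other position into a wildcard. With too few samples, a slot that happens not to vary within the sample gets frozen into the pattern, so the inferred rule misses later variants.

```python
import re

def infer_pattern(samples):
    """Toy template inference. Assumes every sample has the same number of
    whitespace-separated tokens (a strong simplification)."""
    token_lists = [s.split() for s in samples]
    length = len(token_lists[0])
    if any(len(t) != length for t in token_lists):
        raise ValueError("toy inference needs equal-length samples")
    parts = []
    for column in zip(*token_lists):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # looks invariant in this sample
        else:
            parts.append(r"\S+")                # observed to vary: wildcard slot
    return re.compile(r"\s+".join(parts))

few = ["Hi friend get cheap meds now", "Hello friend get cheap meds now"]
many = few + ["Hi friend get discount meds now"]
unseen = "Dear friend get discount meds now"

# With two samples, "cheap" never varied, so it is wrongly treated as fixed.
print(infer_pattern(few).fullmatch(unseen) is not None)   # False
# One more sample reveals the slot, and the unseen variant now matches.
print(infer_pattern(many).fullmatch(unseen) is not None)  # True
```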
and then the researchers discovered the halting problem and pretended it didn't exist.
I don't quite see your point - the halting problem proves that you cannot create an algorithm that will tell whether an arbitrary program will ever halt. It has no significance for this particular program, since it would be trivial to ensure that it does halt.
Spam filtering isn't very hard if you see the email for a large number of accounts, as Gmail does. The one characteristic that spam must have is that it's sent in bulk, and the commonality across receiving accounts gives it away. The only hard part is recognizing that commonality, and existing filters already do that rather well. This is just a new technique for recognizing commonality.
Recognizing spam for a single account is tougher, because you don't get to see the "bulk" property.
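A sketch of the "commonality across accounts" idea, assuming a crude normalization (lowercase, strip digits, collapse whitespace) followed by an exact SHA-256 hash. Production systems use fuzzier fingerprints, and the function names and threshold here are hypothetical. The last check shows the single-account point: seen from one mailbox, nothing reaches the distinct-recipient threshold.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(body):
    """Crude normalization, then hash (an illustrative assumption, not a
    real product's scheme): lowercase, drop digits, collapse whitespace."""
    text = body.lower()
    text = re.sub(r"\d+", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def bulk_fingerprints(messages, min_recipients):
    """messages: iterable of (recipient, body) pairs. Return the fingerprints
    that reached at least min_recipients distinct recipients."""
    seen = defaultdict(set)
    for recipient, body in messages:
        seen[fingerprint(body)].add(recipient)
    return {fp for fp, rcpts in seen.items() if len(rcpts) >= min_recipients}

mail = [
    ("alice", "Buy now! Order #123"),
    ("bob", "Buy now!  Order #456"),
    ("carol", "buy NOW! order #789"),
    ("alice", "Lunch tomorrow?"),
]

print(len(bulk_fingerprints(mail, min_recipients=3)))  # 1
# A single mailbox never sees the bulk property:
alice_only = [m for m in mail if m[0] == "alice"]
print(bulk_fingerprints(alice_only, min_recipients=3))  # set()
```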
There is a final solution: ...
Your post advocates a
(x) technical ( ) legislative (x) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(x) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(x) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(x) Requires immediate total cooperation from everybody at once
(x) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
(x) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
(x) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
(x) Jurisdictional problems
(x) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
(x) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(x) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(x) Countermeasures must work if phased in gradually
(x) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(x) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
I just don't trust anything that bleeds for five days and doesn't die.
Seriously, am I the only one that thought of the Trace-Buster-Buster-Buster from The Big Hit?
Now that I think about it, I'm pretty sure everything I just said is completely wrong.
I could bet a nice sum of money that if you give a traditional, learning spam filter 1000 e-mails sent by the same bot and flag those all as spam, it can then recognize the bot's further e-mails as spam.
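For reference, here is roughly what a minimal learning filter of the kind this comment has in mind might look like: a multinomial naive Bayes classifier with add-one smoothing over lowercase word tokens. This is a generic textbook sketch, not Thunderbird's actual filter; the class name and the synthetic training data are hypothetical. As the reply that follows argues, a toy like this says little about how a deployed filter behaves on real mail.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"\w+")  # word characters; Unicode-aware for str patterns

def tokens(text):
    return TOKEN.findall(text.lower())

class NaiveBayesFilter:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        v = len(vocab) + 1  # +1 leaves probability mass for unseen tokens
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Smoothed class prior from document counts.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + v)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + v)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0

f = NaiveBayesFilter()
for i in range(1000):  # 1000 synthetic "same bot" messages
    f.train(f"Cheap meds discount offer code {i}", "spam")
for msg in ["Lunch tomorrow at noon?", "Meeting notes attached", "Can you review my patch"]:
    f.train(msg, "ham")

print(f.is_spam("Discount meds offer code 5000"))  # True
print(f.is_spam("Lunch meeting tomorrow?"))        # False
```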
If that were true, then by now Thunderbird's filter would have stopped missing all the Russian spam I get. I have no idea what the spam says, as I don't know Russian, and I never get legitimate mail in Russian; all the Russian spam I get looks very similar in format and length. I'm quite certain that Thunderbird has had over a thousand such e-mails marked as spam over the last few years, and yet it consistently fails to flag them.
Point being: traditional learning filters are not sufficient.
This is anecdotal evidence, YMMV, etc., etc.
I originally posted it here in 2002. Note how dated it is (e.g. no smartass comment about CAPTCHA).
Some mathematician (I forget who) had his graduate students send back cards with forms like these to people who sent in attempted proofs of Fermat's Last Theorem.