The Next Step In Spam Filtering
simeonbeta2 writes "Paul Graham (of "A Plan for Spam" fame) has a couple of new articles up. The first one details the success of Bayesian spam filters despite various circumvention techniques by spammers. While the success of Bayesian spam filtering is encouraging, it certainly hasn't seemed to stem the flow of spam in the last year or so.
His second article, however, suggests finally taking the anti-spam battle to the spammers!
Paul proposes that spam filtering packages automatically spider links contained in probable spam.
Not only will this increase the accuracy of filters (by running the retrieved content through the spam filter as well) but this would effectively be a massive distributed DOS attack on spammers.
This isn't a new idea nor is it without its problems but I think it's definitely an idea whose time has come."
Feel free to read the comments from when this article was posted to Slashdot in August.
Really, you can take quite a bit of browser code out of the browser and use it in a filter.
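For the curious, the link-spidering idea in the submission could start with something like the following. This is only a rough sketch under my own assumptions, not Paul Graham's implementation: the names `extract_links` and `links_to_spider` and the 0.9 threshold are made up for illustration, and it only decides which links to follow. Fetching them and feeding the pages back into the filter is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the http(s) targets of <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Ignore mailto:, javascript:, relative links, and so on.
            if name == "href" and value and urlparse(value).scheme in ("http", "https"):
                self.links.append(value)


def extract_links(html_body):
    """Return every http(s) link in the body, in document order."""
    parser = LinkCollector()
    parser.feed(html_body)
    parser.close()
    return parser.links


def links_to_spider(html_body, spam_probability, threshold=0.9):
    """Hypothetical policy: only follow links in mail that already looks like spam."""
    if spam_probability < threshold:
        return []
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(extract_links(html_body)))
```

The threshold matters: following links only in mail the filter already scores as probable spam keeps legitimate sites from being hit.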
The recognizable words (neonatal, pedant, betsy) might be a weak attempt at that in addition to creating non-identical subjects, although they'd need a lot more non-spammy words buried in the article to get through... which they usually do, surrounded with HTML to make them invisible.
Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
Correct, clickable link here: Boston Globe
"I can not bring myself to believe that if knowledge presents danger, the solution is ignorance" - Isaac Asimov
Eventually, a good filter will have to mimic what the browser does very closely. Maybe it'd be better to actually use a browser that the user can't see.
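As a very rough illustration of "mimicking the browser", here is a sketch that keeps only the text a reader would plausibly see before tokenizing. It is an assumption-laden toy: `visible_text` is a made-up name, it only understands inline `display:none` and `visibility:hidden` styles, it assumes reasonably balanced tags, and it knows nothing about tricks like white-on-white text or external stylesheets.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Only these two inline CSS rules are recognized in this sketch.
HIDING_STYLES = ("display:none", "visibility:hidden")


class VisibleText(HTMLParser):
    """Collect text that is not inside a hidden element, script, or style."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden element
        self.chunks = []

    def _hides(self, tag, attrs):
        if tag in ("script", "style"):
            return True
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        return any(rule in style for rule in HIDING_STYLES)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Everything nested inside a hidden element is hidden too.
        if self.hidden_depth or self._hides(tag, attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def visible_text(html_body):
    """Return the visible text with runs of whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html_body)
    parser.close()
    # Join without separators so "che<b>ap</b>" reads as one word, as rendered.
    return " ".join("".join(parser.chunks).split())
```

HTML comments are dropped as well, since the parser's default comment handler ignores them.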
Or set up a filter, and just stop accepting HTML mail altogether. Life is so much better when all of your incoming email is plain text. Most legitimate incoming mail is sent as multipart, so mail from your friends still gets through, even when they use mail clients that want to send out formatted mail.
The spammers sometimes send multipart messages with a text part that says something like "There is no plain text version of this message", but that's still better to see than a picture I didn't ask for.
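A plain-text-only policy like the one described above might look roughly like this with Python's standard `email` package. The function name `plain_text_or_none` is my own, and treating "no text/plain part" as grounds for rejection is the assumption here, not a recommendation from any particular mail server.

```python
from email import message_from_string, policy


def plain_text_or_none(raw_message):
    """Return the text/plain body of a message, or None if it has none.

    A multipart message with a plain-text alternative passes; an
    HTML-only message yields None, which the caller can treat as a reject.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    return part.get_content()
```

A "There is no plain text version of this message" part would still be returned as-is, which matches the behavior described above: you see that one line instead of the HTML.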
In order to render the image it would have to be downloaded.
This is how spammers know that they found a working e-mail address.
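To make the tracking concern concrete, a filter or mail client could at least list the images a message would pull from a remote server before deciding whether to render it. A minimal sketch, with `remote_images` as a hypothetical helper name and the URL in the usage note invented for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that would trigger an http(s) download."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # cid: and data: images are carried inside the message itself.
        if urlparse(src).scheme in ("http", "https"):
            self.remote_images.append(src)


def remote_images(html_body):
    """Return the remote image URLs referenced by the body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images
```

For a body containing `<img src="http://spammer.example/pixel.gif?id=12345">`, the returned URL carries a per-recipient identifier, so fetching it would tell the sender that the address is live.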
EA David Gardner -"... but the consumers have proven that actually what they want is fun."