Comparison of Bayesian POP3 Spam Filters
kreide writes "Spam e-mail has become an ever increasing problem, and these days it is next to impossible to use e-mail without receiving it in large amounts. Although various techniques exits to combat the problem, spammers seemed to be winning the war - until a new, powerful weapon appeared on the scene: Bayesian filters, our last, best hope for spam-free inboxes. In this review I compare POP3 based bayesian spam filters." We did an Ask Slashdot on this a few weeks ago.
I have long been an advocate of Bayesian or keyword based spam filters, but have recently been forced to change my outlook, and to argue that MULTIPLE SIMULTANEOUS solutions are the answer.
I encountered a very simple but unique spam system which works entirely on the sender's address. Simply, you create a small database with the domains/addresses you want to whitelist. Then, a program screens your mail, and if the sender is not in your whitelist, it sends an e-mail BACK to the sender with a simple URL (or even an actual link for HTML e-mail clients) which states that they REALLY want to send the e-mail to its destination. When this is done, they are added to the whitelist. Therefore, mails from forged remote addresses are no longer a problem, and neither are mails from trusted sources. And, better than SPEWS or similar blacklists, the sender gets a SECOND CHANCE to send their mail to you.
There's a commercial solution using this system right now, although the URL escapes me.
Of course, one could encounter problems when ordering online, say. Droids at Amazon will not be clicking your links to make sure your order receipt got through. One could argue that you'd put things like Amazon.com in the whitelist, but what if someone used amazon.com as a spoofed e-mail domain/address? Ay, there's the rub. But if this system were tied in with a Bayesian system, it'd be pretty unbeatable. What's more the Bayesian system would have extra data for negative matches, in the form of e-mails that were never 'approved', and positive data in the form of those that were.
So, I'd be more interested in producing a homebrew system that used MULTIPLE weaker systems, than one supposed 'sure fire' method.. as I feel no one method is perfect, whereas multiple systems can approach this nirvana.
Speaking from experience, I know for a fact that many of the harvesting programs (written in perl, running on linux, written by geeks) are very robust at deciphering most email obfuscation methods. You all sit and shake your fists, and the spamware writers are laughing their asses off.
You have the easy answer: don't obfuscate your email, don't even bother putting it on your posts.
spammers should love Bayesian filtering, it takes the presure off them while allowing them to reach exactly the same number of marks with a mailing.
I'm afraid you've made the cardinal mistake of thinking that spammers follow logic.
First question: Why do people install filters on their mailboxes?
Answer: To stop spam.
Now, take a look at any interview with any spammer.. you'll note that when they're asked, the spammer will say "I don't send it to people who don't want it."
They'll also say "we're always coming up with ways to bypass filters."
Now, you'd think that with the two statements, that one of them is false - however (besides the fact that spammers lie), any sociologist will tell you that the spammer actually believes he's telling the truth in each of these statements..
How he justifies it in his mind is that he believes that even though someone has installed a spam filter, that this person only wants to filter spam from other spammers - that his spam is somehow "special".
Spammers are sociopaths, and like all sociopaths, they believe the rules do not apply to them.
If spammers weren't sociopaths, and were capable of applied logic, then they'd realize that any filter (not just Bayseian) would benefit them.. but then, if they weren't sociopaths, they wouldn't be spammers in the first place.