Bayesian Filter Testing?
pu33y asks: "Since the publication of Paul Graham's A Plan For Spam, several programs that perform Bayesian filtering having become available, including CRM114 and Bogofilter. But missing is any serious testing to see how they perform in relation to themselves and to other, non-Bayesian filters.Searching Google has turned up nothing and when I asked Paul Graham, he was unaware of any such testing, as well. Can anyone point to any such testing or provide the results of their own personal experiences with Bayesian filters?"
Dspam (http://www.networkdweebs.com) rocks !
Some impressive stats were posted to the mailing list.
It's main feature is that it's completely maintainance free, and that even dumb people can use it (I know, I am).
My personnal stats are 2 false positives actually (one from PayPal, one from a company I work with), 280 spams learnt (I told it they were spam), 2877 spam catched and 4354 innocent.
Votez ecolo : Chiez dans l'urne !
Ideally, someone, probably an academic, should make a repository of spam available for testing. Software spam filters can say things like, "Correctly classified 99.9% of the email in the UCI spambase 1999-08-20 repository"
Something like say, the UCI Machine Learning Repository. In fact, look at the UCI spambaseA couple of problems with the UCI spambase. Too old / out of date. And too small.
I looks like there is a more recent community effort going on over a SpamArchive
Looks like you should have googled.
I use Ella from OpenField Software. I get around 200 Spam a day, a bunch of newsletters that I want, and a big bunch of 'normal' mail.
I have had it for about 2 weeks. In the last 3 days I have had 2 false +'s (messge in Spam that shouldn't be there) and 4 that went to the newsletter folder that shouldn't have.
Gavin Fischer
For years, the only spam filter I used was a very simple one: if the mail's not from a list I'm on, and not addressed to me, it's spam. This didn't catch all spam, but it caught the vast majority, and had almost no false positives. (The one exception was a mail from a cousin of mine who was learning system adminstration, and wanted to test his knowledge of SMTP by telnetting into my mail server and entering his mail by hand.)
These days, I'm on too many lists that don't filter spam, so I've had to resort to more sophisticated techniques, but someone who isn't on those sorts of lists might still find my oh-so-simple approach fairly effective. Not to disparage Bayesian filtering, but if you want something to compare against...