Slashdot Mirror


Proving Which Spam Filters work Best

pirateninja writes "Dr. Gord Cormack decided to find and prove what the best spam filter is. In his study he looked at the major spam filters (DSPAM, SpamAssassin, etc.) along with those submitted by various academics. The results are quite surprising, with a previously unheard-of spam filter, which uses ideas from various compression algorithms, performing the best overall. He recently presented the results and methodology used in a presentation titled 'Spam Filters, Do they Work? and Can you prove it?'" Note that this is a video of his presentation.

2 of 263 comments (clear)

  1. from china with love by nihaopaul · · Score: 0, Redundant

    44% [===============> ] 220,996,832 89.21K/s ETA 48:16

    almost got it!

  2. Re:In my experience... by Anonymous Coward · · Score: 0, Redundant

    I have a simple, foolproof idea to help eliminate spam.

    Email certification.

    If you want to be able to send Certified Email (CE), you apply for Certification from the company that gives you internet connectivity. They check you out, and 'Certify' you as being a legitimate emailer (ie: not a spammer). Then, you generate a private/public key pair and give them the public one. In the headers of all your email, is their certification, and an encrypted header line that's createdusing your private key.

    When email arrives at the recipients server (or this could be done at the client level, as well), the server sees the certification, and connects to the certifying server to get your public key. It attempts to decrypt the header line. If it does it marks the email as 'certified', if it cannot, it marks the email as 'uncertified', and the email client can be programmed to filter messages based on that.

    Due to the public/private key cryptography, there can be no certified email spoofing. (Assuming the private keys are secure, the keys are of decent length, etc.) All emails are traceable back to the originating server. CORRECTION- all CERTIFIED emails are traceable. Anonymous email is still possible. People can still set up email servers for mailing lists without "having" to get them certified. And people can still receive non-certified mail.

    If an email server sends out spam, the complaints go to it's certifier. They can drop the certification, deleting the public key from their server. When this happens, ALL the email from the spamming server is now 'uncertified', and gets handled accordingly by email clients. If nothing is done, complaints go to THEIR upstream, etc. Individuals and groups can keep their own blacklists, if they wish, and anyone can choose to filter emails according to those lists.

    Now, I've looked over that 'form email' that people like to post to shoot down anti-spam ideas. And nothing applies to this idea. (If something seems to apply, it's because I either left out details, or explained something wrong.) This idea does NOT need to be universally adopted, nor does it need to be adopted by everyone all at once. It's primarily a way of reliably tracing (certified) emails back to their originating server. The anti-spam part comes later: if you receive certified spam, complain and get the server un-certified. If you receive un-certified spam... well, just have your email client dump all uncertified emails in the trash. (Not nessisarilly, you could just use it's un-certifedness as a factor in filtering your email.)

    This idea does not require anything be changed with SMTP. It simply requires a second connection be made to the certifying server. Now, before you bitch about the extra bandwidth, I'd like to remind you that, once this idea catches on, spam will be greatly reduced. This reduction will MORE than make up for the slight increase in bandwidth created in querying the certifying servers. Also, the certifying servers can set time limits on when the certifications expire, and need to be re-downloaded (kind of like DHCP leases). A 'new' company that just applied for certification might have it's certificate set to expire almost instantly. This way, every email they send requires a download of the certificate. This allows the certificate to be pulled rapidly if they start spamming. After a month or two, it could be set to expire weekly or monthly.

    To sum up: Email Certification is reliable way of tracing the certified emails back to their originating server. This allows spammers to be identified unequivocally, and have their certification pulled. Email servers are NOT required to be certified, and anonymous email is still possible. Email recipients can, if they choose, set up their client to send uncertified emails to the trash, or to handle them however they wish. White lists and black lists are still possible. 'Hobby mailing lists' are still possible, certified or not. The extra bandwidth is minimal, and easily overshadowed by the reduction in spam being send once spammers realize no one is even seeing, much less reading or replying to their spam.