Two Spam Filters 10 Times As Accurate As Humans

Posted by timothy on Monday February 23, 2004 @01:13PM from the dev/null-is-getting-fatter dept.

Nuclear Elephant writes "The authors of two spam filters, CRM114 and DSPAM, announced recently that their filters have achieved accuracy rates ten times better than a human is capable of. Based on a study by Bill Yerazunis of CRM114, the average human is only 99.84% accurate. Both filters report accuracy levels between 99.983% and 99.984% (1 misclassification in 6250 messages), using completely different approaches: CRM114 touts Markovian classification, while DSPAM implements a Dolby-type noise reduction algorithm called Dobly. If you're looking for a way to rid your inbox of spam, roll on over to one of these authors' websites."
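The "ten times" claim is about error rates, not accuracy percentages: 99.84% accuracy means 16 errors per 10,000 messages, while 1 in 6250 is 1.6 per 10,000. A quick Python check of that arithmetic, using only the figures quoted above (the variable names are ours):

```python
# Convert the accuracy figures quoted in the summary into error rates.
human_accuracy = 0.9984                 # 99.84%, the figure from the Yerazunis study
filter_error = 1 / 6250                 # "1 misclassification in 6250 messages"

human_error = 1 - human_accuracy        # 0.0016, i.e. 0.16% of messages
filter_accuracy = 1 - filter_error      # 0.99984, i.e. 99.984%

print(f"filter accuracy: {filter_accuracy:.3%}")                            # 99.984%
print(f"human error rate / filter error rate: {human_error / filter_error:.1f}")  # 10.0
```

At the 99.983% end of the quoted range the error rate is 0.017%, and the ratio drops to roughly 9.4, so "ten times" is the best case.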

5 of 487 comments

IM Spam by jeffskyrunner · 2004-02-23 13:15 · Score: 5, Interesting

Once email spam is eliminated, IM spam will begin...

--
Jeff

Obligatory Q... When will mozilla/TB have them? by sisukapalli1 · 2004-02-23 13:21 · Score: 5, Interesting

I reached the same "two filters are better than humans" conclusion by running two filters in sequence: server-side SpamAssassin and a couple of simple procmail recipes. Together they have kept almost all the spam away.
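A minimal Python sketch of two filters run in sequence. It assumes the server-side SpamAssassin keeps its stock behaviour of adding an "X-Spam-Flag: YES" header to spam, and it uses a few hypothetical subject-line patterns as a stand-in for the procmail recipes (these are not the poster's actual rules):

```python
from email import message_from_string
from email.message import Message

# Hypothetical stand-in for "a couple of simple procmail recipes".
LOCAL_SUBJECT_PATTERNS = ("viagra", "mortgage", "enlarge")


def server_side_says_spam(msg: Message) -> bool:
    """Stage 1: trust the flag the server-side filter already added.

    Assumes SpamAssassin's stock "X-Spam-Flag: YES" header on spam.
    """
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def local_rules_say_spam(msg: Message) -> bool:
    """Stage 2: cheap client-side rules on the Subject line."""
    subject = msg.get("Subject", "").lower()
    return any(pattern in subject for pattern in LOCAL_SUBJECT_PATTERNS)


def is_spam(msg: Message) -> bool:
    # Sequential filtering: a message is spam if either stage flags it,
    # so stage 2 only has to catch what stage 1 let through.
    return server_side_says_spam(msg) or local_rules_say_spam(msg)


raw = "From: a@example.com\nSubject: Cheap VIAGRA now\n\nbuy now\n"
print(is_spam(message_from_string(raw)))  # True (caught by stage 2)
```

Because either stage can flag a message, chaining filters catches more spam but also adds up each stage's false positives.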

Still, it is good to see such techniques becoming available, and we can hope to see them packaged as straightforward, usable tools.

So, when will Mozilla/Thunderbird (or your favourite server-side or client-side filter) get them?

S

Re:Huh? Aren't humans 100%? by Elwood+P+Dowd · 2004-02-23 13:31 · Score: 5, Interesting

No, humans are not 100%.

If you see a strange name in your inbox with an odd title, that might be a Nigerian businessman, or it might be your long lost Nigerian brother.

I recently tried to order a T-shirt from a guy for a band he used to be in. I had found his band because we share the same (semi-uncommon) name, so he got an email that appeared to be From: himself. I had to send him two emails because he deleted the first one, assuming it was spam.

I ordered some RAM for my dad a while back. He gets 200 spam emails a day (his address is on his résumé and web page), and he deleted the confirmation email from the RAM vendor. The RAM never shipped, and it took us a week to figure out that there was a problem.

People make mistakes all the time. Why is this an unexpected result? People are jackasses. This should be obvious.

--

There are no trails. There are no trees out here.

Could somebody explain this to me... by heldlikesound · 2004-02-23 13:32 · Score: 5, Interesting

I order all kinds of stuff online; wouldn't the receipt emails look like spam? My current spam solution is very simple:

1. display my email address online as little as possible

2. use a number of addresses that all filter into one account, then filter by the sent-to address. This has turned up some VERY interesting results. For instance, I used dellorders@mydomain.com for an order from Dell and NEVER used it or even typed it anywhere again, and I started getting spam at it about 6 months later. I mean the nasty stuff, not just innocent stuff from Dell resellers...

3. I built a rudimentary filter that looks for viagra, free, debt, enlarge, etc. If the sender is not in my address book and the email contains these words, it is sent to a "check these out" folder.
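The address-per-vendor trick and the keyword rule in the list can be sketched in Python with the standard email package. MY_DOMAIN, ADDRESS_BOOK and SUSPICIOUS_WORDS are hypothetical placeholders, and the word match is a naive substring test, about as rudimentary as the filter described:

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses, parseaddr
from typing import Optional

# Hypothetical placeholders: substitute your own domain, contacts and words.
MY_DOMAIN = "mydomain.com"
ADDRESS_BOOK = {"mom@example.org"}
SUSPICIOUS_WORDS = ("viagra", "free", "debt", "enlarge")


def body_text(msg: Message) -> str:
    """Concatenate the text payloads of a (possibly multipart) message."""
    if msg.is_multipart():
        return " ".join(body_text(part) for part in msg.get_payload())
    payload = msg.get_payload()
    return payload if isinstance(payload, str) else ""


def recipient_tag(msg: Message) -> Optional[str]:
    """Item 2: which per-vendor address on MY_DOMAIN was this sent to?"""
    for _name, addr in getaddresses(msg.get_all("To", [])):
        local, _, domain = addr.lower().partition("@")
        if domain == MY_DOMAIN:
            return local  # e.g. "dellorders"
    return None


def choose_folder(msg: Message) -> str:
    """Apply item 3 first, then item 2, to pick a folder for one message."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    text = (msg.get("Subject", "") + " " + body_text(msg)).lower()
    # Unknown sender plus any suspicious word (naive substring match).
    if sender not in ADDRESS_BOOK and any(w in text for w in SUSPICIOUS_WORDS):
        return "check these out"
    # Otherwise file by the address it was sent to, so a leaked address
    # such as dellorders@... shows up as its own folder.
    return recipient_tag(msg) or "inbox"


order = message_from_string(
    "From: orders@dell.example\nTo: dellorders@mydomain.com\n"
    "Subject: Your order has shipped\n\nTracking number enclosed.\n"
)
print(choose_folder(order))  # dellorders
```

An order confirmation with none of the listed words is filed by its recipient address, while the same address receiving "FREE pills" from an unknown sender goes to "check these out".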

How might a spam filter help me out without zapping confirmation-type emails?
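Statistical filters address this by learning from your own mail: if receipts are trained as good mail ("ham"), words like "order" and "receipt" start pulling similar messages away from the spam folder. A toy multinomial naive Bayes classifier in Python illustrates the idea; it is a generic textbook technique with made-up training data, not CRM114's Markovian classifier or DSPAM's algorithm:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; deliberately simple."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text):
        """Model log-odds of spam versus ham for this text."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            n_words = sum(counts.values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (n_words + vocab_size))
            scores[label] = score
        return scores["spam"] - scores["ham"]

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0


f = NaiveBayesFilter()
f.train("cheap viagra free offer enlarge now", "spam")
f.train("free money debt relief act now", "spam")
f.train("your order has shipped tracking number receipt", "ham")
f.train("order confirmation receipt thank you for your purchase", "ham")

print(f.is_spam("receipt for your order free shipping"))  # False
print(f.is_spam("free viagra offer"))                      # True
```

Unlike a fixed keyword list, the word "free" here does not condemn a receipt on its own; it is outweighed by the hammy tokens around it.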

--

Cloud City Digital: DVD Production at its cheapest/finest

Share the luxury by bigberk · 2004-02-23 18:40 · Score: 5, Interesting

Having such a powerful statistical spam filter is definitely a luxury. I have no difficulty believing the accuracy values presented here. I have used SpamProbe, CRM114, bogofilter, SpamBayes, and SpamAssassin, and all of them do such an amazing job that spam effectively no longer exists (for you).

Which leads me to plug a little project called WPBL that uses exactly these types of statistical spam filters to spot spam sources in a distributed fashion. Every hour, each project member uploads the IPs they have seen relaying spam and non-spam, where the 'decision' is made by these extremely reliable filters. This effectively turns your regular mail account into an intelligent spam trap that feeds a central blocklist.

The more members we get, the better we can identify active spam sources around the world. This information is then used by some sites for quite large-scale blocking. Since you're doing all this filtering processing anyway, why not also share "what you learn" (the IPs that are spamming you)?
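A minimal Python sketch of a member-side hourly tally and a central merge. The report format, the min_spam threshold, and the "never seen relaying ham" rule are hypothetical, not the actual WPBL protocol, and extracting the relaying IP from Received: headers is left out:

```python
from collections import Counter


def hourly_report(verdicts):
    """Member side: per relaying IP, count messages the local filter
    classified as spam and as ham. `verdicts` is an iterable of
    (relay_ip, is_spam) pairs; the dict format is hypothetical."""
    spam, ham = Counter(), Counter()
    for ip, is_spam in verdicts:
        (spam if is_spam else ham)[ip] += 1
    return {ip: {"spam": spam[ip], "ham": ham[ip]} for ip in set(spam) | set(ham)}


def merge_reports(reports, min_spam=3):
    """Central side: list IPs reported for spam at least `min_spam` times
    and never reported as relaying ham (an illustrative rule)."""
    totals = {}
    for report in reports:
        for ip, counts in report.items():
            t = totals.setdefault(ip, {"spam": 0, "ham": 0})
            t["spam"] += counts["spam"]
            t["ham"] += counts["ham"]
    return sorted(ip for ip, t in totals.items()
                  if t["spam"] >= min_spam and t["ham"] == 0)


member_a = hourly_report([("192.0.2.10", True), ("192.0.2.10", True),
                          ("198.51.100.7", False)])
member_b = hourly_report([("192.0.2.10", True), ("198.51.100.7", True)])
print(merge_reports([member_a, member_b]))  # ['192.0.2.10']
```

Reporting ham as well as spam is what lets the central side tell a dedicated spam source apart from a shared relay that also carries legitimate mail.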

If this grabs your interest, read up on the reporting scripts or, alternatively, the open WPBL data upload protocol if you want to code your own report generator. Bandwidth usage is minimal.