Live spam-catching contest at CEAS

← Back to Stories (view on slashdot.org)

Live spam-catching contest at CEAS

Posted by CmdrTaco on Wednesday April 11, 2007 @04:19AM

noodleburglar writes "The 2007 Conference on Email and Anti-Spam (CEAS) will feature a live spam-catching contest. Entrants will be treated to a torrent of spam and must use their spam filtering technique to filter out as much as possible, while also letting legitimate messages. My money's on Spam Assassin." This ought to be a sweeps week television spectacular.

2 of 126 comments (clear)

Min score:

Reason:

Sort:

The prize list :) by davidwr · 2007-04-11 04:37 · Score: 5, Funny

1st prize: Job offer from a security-software vendor
2nd prize: Lifetime supply of Hormel meat products
3rd prize: Commemorative tin of SPAM meat product
Last place: Inheritance from Nigerian Prince

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Re:Group spam detection by kebes · 2007-04-11 04:43 · Score: 5, Interesting

You're right--but the size of Gmail gives them another advantage. In those marginal cases where the spam filter isn't sure about an email (is this spam or a mailing list?) it has the advantage of having a huge number of people checking all the emails. That is, the users do the final check.

I have received a spam to my gmail account exactly once. And when I did, shocked, I clicked the "mark as spam" button. The point is that this spam was probably sent to millions of Gmail users, and the algorithm wasn't sure how to categorize it. But because I clicked "spam" (and probably a few other people did, too), it was marked as spam for everyone. So most users never say it in their inbox. Thus only a dozen out of the million recipients was ever bothered by the spam. Conversely, an email list would receive no (or very few) "mark as spam" clicks, and would be allowed to pass. So basically the Gmail userbase acts the workforce to continually train the spam filter, and moreover to detect new spam within minutes of it being sent.

It's hard to beat a system like that. But the point is that it relies on the large number of users who are all (effectively) sharing their spam training sets with each other in realtime.

This is not to say that the baseline algorithm that Gmail implements isn't quite effective, but the point is that Gmail can use the users to resolve those tricky false-positive and false-negative situations.