Live spam-catching contest at CEAS

Posted by CmdrTaco on Wednesday April 11, 2007 @04:19AM

noodleburglar writes "The 2007 Conference on Email and Anti-Spam (CEAS) will feature a live spam-catching contest. Entrants will be treated to a torrent of spam and must use their spam filtering technique to filter out as much as possible, while also letting legitimate messages through. My money's on SpamAssassin." This ought to be a sweeps week television spectacular.

CRM114 by sageFool · 2007-04-11 04:22 · Score: 4, Informative

http://crm114.sourceforge.net/ using the hyperspace classifier! It's been working better than SpamAssassin for me.
Group spam detection by Animats · 2007-04-11 04:32 · Score: 4, Informative

Gmail, like SpamCop, has a group spam filter system. It looks at mail sent to a large number of recipients. The defining characteristic of spam is that it's sent to a large number of recipients, after all. If you're in a position to watch the incoming mail of a few million mailboxes, detecting spam is easy.
1. Re:Group spam detection by kebes · 2007-04-11 04:43 · Score: 5, Interesting
  
  You're right--but the size of Gmail gives them another advantage. In those marginal cases where the spam filter isn't sure about an email (is this spam or a mailing list?) it has the advantage of having a huge number of people checking all the emails. That is, the users do the final check.
  
  I have received a spam to my gmail account exactly once. And when I did, shocked, I clicked the "mark as spam" button. The point is that this spam was probably sent to millions of Gmail users, and the algorithm wasn't sure how to categorize it. But because I clicked "spam" (and probably a few other people did, too), it was marked as spam for everyone. So most users never saw it in their inbox. Thus only a dozen out of the million recipients were ever bothered by the spam. Conversely, an email list would receive no (or very few) "mark as spam" clicks, and would be allowed to pass. So basically the Gmail userbase acts as the workforce that continually trains the spam filter, and moreover detects new spam within minutes of it being sent.
  
  It's hard to beat a system like that. But the point is that it relies on the large number of users who are all (effectively) sharing their spam training sets with each other in realtime.
  
  This is not to say that the baseline algorithm that Gmail implements isn't quite effective, but the point is that Gmail can use the users to resolve those tricky false-positive and false-negative situations.
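
A rough Python sketch of the shared "mark as spam" idea described above. This is only an illustration, not how Gmail actually works: the class name `SharedReportFilter`, the `report_threshold` value, and the exact-match fingerprint are all assumptions made for the example (a real system would need fuzzy matching, since spammers vary each copy).

```python
import hashlib
from collections import defaultdict


class SharedReportFilter:
    """Toy model of a spam signal shared across many mailboxes."""

    def __init__(self, report_threshold=3):
        # report_threshold is an illustrative value, not a real setting.
        self.report_threshold = report_threshold
        # fingerprint -> set of users who clicked "mark as spam"
        self.reporters = defaultdict(set)

    @staticmethod
    def fingerprint(body):
        # Normalise case and whitespace so trivially identical copies match.
        normalised = " ".join(body.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def report_spam(self, user, body):
        # Using a set means one user clicking twice still counts once.
        self.reporters[self.fingerprint(body)].add(user)

    def is_spam(self, body):
        # Once enough distinct users report a message, it is spam for everyone.
        reports = self.reporters.get(self.fingerprint(body), set())
        return len(reports) >= self.report_threshold
```

A mailing list message would collect few or no reports and stay under the threshold, while a mass mailing that annoys even a handful of recipients gets blocked for the rest.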
Curious:When urologists email each other... by dpbsmith · 2007-04-11 04:33 · Score: 4, Interesting

... are they able to refer to Pfizer's brand name for sildenafil, Lilly's name for tadalafil, or Bayer's brand name for vardenafil without getting caught in the spam filters?

--
"How to Do Nothing," kids activities, back in print!
1. Re:Curious:When urologists email each other... by kebes · 2007-04-11 04:53 · Score: 3, Informative
  
  Suffice it to say that a doctor is likely to write an email like:
  
  "Ted, I just read the news about Viagra in the New England Journal of Medicine. Very interesting results, though the error bars are a bit large to draw any major conclusions just yet. What do you think?"
  
  Whereas a doctor rarely writes email like:
  
  "NoW ava ilable is generic V1AGRA at low price! Generic, quality, all low price now!"
  
  The point is that modern spam filters don't just look for "bad words" but consider relative word frequencies, the sender and receiver fields, word correlations, formatting elements, URLs, etc. Spam filters in your email client will be trained against email you typically send/receive, and so can be even more precise. Spammers of course try to make their emails include words so that they end up looking like real email, but if the filter is good enough, then the only way to get past it is to send an email that now lacks those critical spam elements (like the link you're supposed to click to buy the generic drug or whatever)...
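
To make the "relative word frequencies" point concrete, here is a minimal naive Bayes sketch in Python. It only looks at word counts (real filters also weigh headers, URLs, formatting and so on), and the names `tokenize`, `NaiveBayesFilter` and `spam_log_odds` are made up for this example rather than taken from any particular filter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word-ish tokens; obfuscations like "v1agra" stay distinct.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Positive means 'looks like spam', negative means 'looks like ham'."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        # Prior from how many messages of each kind were seen (add-one smoothed).
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            for tok in tokenize(text):
                # Add-one smoothing so unseen words do not give log(0).
                p = (self.counts[label][tok] + 1) / (total + len(vocab) + 1)
                score += sign * math.log(p)
        return score
```

Trained on the two sample emails above, the word "Viagra" by itself does not decide anything; what matters is whether the rest of the message looks like the mail you normally get or like the spam you normally get.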
The prize list :) by davidwr · 2007-04-11 04:37 · Score: 5, Funny

1st prize: Job offer from a security-software vendor
2nd prize: Lifetime supply of Hormel meat products
3rd prize: Commemorative tin of SPAM meat product
Last place: Inheritance from Nigerian Prince

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Professional spammers in attendance? by MobyDisk · 2007-04-11 04:40 · Score: 4, Interesting

I wonder if professional spammers will attend the conference to learn how to get through the next generation of filters. Maybe it would be like playing spot the Fed at hacker conferences.
SpamAssassin? by raddan · 2007-04-11 04:41 · Score: 3, Interesting

Ha ha, silly admin. My money's on greylisting.

We use both SpamAssassin and OpenBSD's spamd, to great effect. spamd does most of the work, though. Daniel Hartmeier (site down ATM, unfortunately) has an example of how to tie SA scores back into spamd for blacklisting, which is just awesome. I'd implement it here, but our current setup is effective enough that it isn't worth my time.
The First Annual Greased Spammer Contest! by Penguinisto · 2007-04-11 04:47 · Score: 4, Funny

(cue Monster Truck Rally announcer guy voice...) THIS SATURDAY AT THE EXPO CENTER! The Best admins and the worst spammers come together in a throwdown-showdown-lowdown Greased Spammer Contest! We kidnap, strip, and grease down every known spammer we can find on Planet Earth! We bring 'em here, then we give our lucky mail server admins (as determined by lottery) a chance to catch 'em! The spammers will be released into a large pit, where the admins may use any method to catch and immobilize spammers (firearms and other projectile weapons are excluded). Points will be given for the number of spammers caught, the methods of capture, and the level of eye-rattling violence applied to each spammer after their capture! Watch as the winning admin gets to publicly execute the dreaded Sanford Wallace by any method that he or she can dream up! Any method at all! You'll buy a ticket for the whole seat, but you will only need the edge! Get your tickets at the Mondotix - DON'T MISS IT!(/voice)
/P

--
Quo usque tandem abutere, Nimbus, patientia nostra?
Re:Flawed by gvc · 2007-04-11 06:30 · Score: 3, Interesting

So here's the issue. If you are going to try to discriminate among filters using several thousand messages, you have to send them all the same messages. To send them the same messages you have to capture and redistribute them. You can pass on all the info from the capture, including all SMTP commands, but you can't do intrusive protocol probes. And since this is *real spam* you can't very well ask the sender to act in an obliging way by repeating its message and behavior for each participant.

I'd be very interested to hear of a design that would allow greylisting to be tested. The best I can come up with is to fail the message after transmission, then to try to simulate the behavior of the sender in response to this failure. But that would be catering to one very specific method of perturbing the protocol. And it would be necessary to do a fair amount of work to spoof the IP address presented to the participant filters.

For this reason, we chose to exclude all SMTP interactions, and simulate a second-in-the-chain filter appliance application. The reasons are practical, not policy.
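
For readers unfamiliar with greylisting, a minimal Python sketch of the usual idea shows why it needs a live sender: the first attempt from an unknown (client IP, sender, recipient) triplet gets a temporary failure, and the message is only accepted if the sender retries after a delay. The class name `Greylist`, the 300-second `min_delay`, and the choice of 451 as the temporary-failure code are assumptions for the example; real implementations such as spamd differ in detail.

```python
class Greylist:
    def __init__(self, min_delay=300):
        # min_delay in seconds; 300 is an illustrative default.
        self.min_delay = min_delay
        self.first_seen = {}  # triplet -> time of first delivery attempt

    def check(self, client_ip, mail_from, rcpt_to, now):
        """Return an SMTP-style reply code for this delivery attempt."""
        triplet = (client_ip, mail_from.lower(), rcpt_to.lower())
        # Record the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return 250  # sender came back after the delay: accept
        return 451  # temporary failure: a real mail server will retry later
```

The decision depends on what the sending host does after the 451, which is exactly the behavior that cannot be replayed from a captured message stream.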
Error rate (false positives) isn't the whole story by InakaBoyJoe · 2007-04-11 14:04 · Score: 3, Insightful

From TFCFP (call for participation):
"Filters will be evaluated based on a weighted combination of the percentage of spam blocked and its false positive percentage."
From a theoretical standpoint, a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.
Take, for example, a message from a long-lost friend, whose current address isn't yet in your whitelist, and who would have no other way of contacting you should the message get spamboxed. Here's an example of a message that's important to a user but gets lost among the everyday messages when simply talking about the percentage of false positives.
There are lots of other examples, too -- if you run your own domain, your messages are likely to be spamboxed, etc. Furthermore, the lower the false-positive rate, the less likely a user is to actually *check* their spambox, thus making a single false-positive even worse.
Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM.
Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.
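
A small Python sketch of the difference between the two measures. The CFP text quoted above does not give the actual weights, so `fp_weight=10.0` is purely an assumption, as are the function names and the per-message `importance` value (which stands in for "how much would the user mind losing this message").

```python
def rates(results):
    """results: list of (is_spam, flagged_as_spam, importance) tuples."""
    spam = [r for r in results if r[0]]
    ham = [r for r in results if not r[0]]
    spam_blocked = sum(1 for r in spam if r[1]) / len(spam)
    false_positive = sum(1 for r in ham if r[1]) / len(ham)
    return spam_blocked, false_positive


def weighted_score(results, fp_weight=10.0):
    # Aggregate measure: reward blocked spam, penalise false positives.
    blocked, fp = rates(results)
    return blocked - fp_weight * fp


def importance_lost(results):
    # Share of the user's important legitimate mail that got spamboxed.
    ham = [r for r in results if not r[0]]
    total = sum(r[2] for r in ham)
    lost = sum(r[2] for r in ham if r[1])
    return lost / total
```

Two filters can spambox the same number of legitimate messages and so get the same `weighted_score`, yet one loses a newsletter while the other loses the message from the long-lost friend; only `importance_lost` tells them apart.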