Slashdot Mirror


Live spam-catching contest at CEAS

noodleburglar writes "The 2007 Conference on Email and Anti-Spam (CEAS) will feature a live spam-catching contest. Entrants will be treated to a torrent of spam and must use their spam filtering technique to filter out as much as possible, while also letting legitimate messages through. My money's on Spam Assassin." This ought to be a sweeps week television spectacular.

26 of 126 comments

  1. CRM114 by sageFool · · Score: 4, Informative

    http://crm114.sourceforge.net/ using hyperspace! It's been working better than spam assassin for me.

  2. Sweeps by cyphercell · · Score: 2, Funny

    This ought to be a sweeps week television spectacular.

    I think I've seen people catching spam on tv, just not the kind you're talkin' 'bout. http://www.spam.com/

    --
    Under the influence of Post-Cyberpunk Gonzo Journalism
  3. Re:My money by rodney+dill · · Score: 2, Funny

    Well let's just find out, just what is your gmail address, hmmmm?

    ;)

    --

    Use your head, can't you, use your head,
    You're on earth, there's no cure for that
    - S. Beckett
  4. Group spam detection by Animats · · Score: 4, Informative

    Gmail, like SpamCop, has a group spam filter system. It looks at mail sent to a large number of recipients. The defining characteristic of spam is that it's sent to a large number of recipients, after all. If you're in a position to watch the incoming mail of a few million mailboxes, detecting spam is easy.

    1. Re:Group spam detection by kebes · · Score: 5, Interesting

      You're right--but the size of Gmail gives them another advantage. In those marginal cases where the spam filter isn't sure about an email (is this spam or a mailing list?) it has the advantage of having a huge number of people checking all the emails. That is, the users do the final check.

      I have received a spam to my gmail account exactly once. And when I did, shocked, I clicked the "mark as spam" button. The point is that this spam was probably sent to millions of Gmail users, and the algorithm wasn't sure how to categorize it. But because I clicked "spam" (and probably a few other people did, too), it was marked as spam for everyone. So most users never saw it in their inbox. Thus only a dozen out of the million recipients were ever bothered by the spam. Conversely, an email list would receive no (or very few) "mark as spam" clicks, and would be allowed to pass. So basically the Gmail userbase acts as the workforce to continually train the spam filter, and moreover to detect new spam within minutes of it being sent.

      It's hard to beat a system like that. But the point is that it relies on the large number of users who are all (effectively) sharing their spam training sets with each other in realtime.

      This is not to say that the baseline algorithm that Gmail implements isn't quite effective, but the point is that Gmail can use the users to resolve those tricky false-positive and false-negative situations.
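
      The two ideas in this sub-thread (bulk detection plus user reports as the tie-breaker) can be sketched roughly as below. This is a toy model only, not how Gmail or SpamCop actually work: the class name, the fingerprinting scheme, and both thresholds are assumptions made up for illustration.

```python
from collections import defaultdict
import hashlib


class SharedSpamTracker:
    """Toy model of a group spam filter shared by many mailboxes (hypothetical)."""

    def __init__(self, min_recipients=1000, min_reports=10):
        # Both thresholds are made-up illustration values.
        self.min_recipients = min_recipients
        self.min_reports = min_reports
        self.recipients = defaultdict(set)  # fingerprint -> mailboxes that received it
        self.reports = defaultdict(set)     # fingerprint -> mailboxes that clicked "spam"

    @staticmethod
    def fingerprint(body):
        # Normalise case and whitespace so trivially altered copies collide.
        normalised = " ".join(body.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def deliver(self, mailbox, body):
        """Record a delivery; return True if it should go to the spam folder."""
        fp = self.fingerprint(body)
        self.recipients[fp].add(mailbox)
        return self.is_spam(fp)

    def report_spam(self, mailbox, body):
        """Record that this mailbox's user clicked "mark as spam"."""
        self.reports[self.fingerprint(body)].add(mailbox)

    def is_spam(self, fp):
        bulk = len(self.recipients[fp]) >= self.min_recipients
        reported = len(self.reports[fp]) >= self.min_reports
        # Bulk mail alone could be a mailing list; user reports break the tie.
        return bulk and reported
```

      With these rules a mailing list that nobody reports keeps flowing no matter how many subscribers it has, while a bulk message is spamboxed for every later recipient once a handful of early recipients have reported it.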

  5. Curious:When urologists email each other... by dpbsmith · · Score: 4, Interesting

    ... are they able to refer to Pfizer's brand name for sildenafil, Lilly's name for tadalafil, or Bayer's brand name for vardenafil without getting caught in the spam filters?

    1. Re:Curious:When urologists email each other... by kebes · · Score: 3, Informative

      Suffice it to say that a doctor is likely to write an email like:

      "Ted, I just read the news about Viagra in the New England Journal of Medicine. Very interesting results, though the error bars are a bit large to draw any major conclusions just yet. What do you think?"

      Whereas a doctor rarely writes email like:

      "NoW ava ilable is generic V1AGRA at low price! Generic, quality, all low price now!"

      The point is that modern spam filters don't just look for "bad words" but consider relative word frequencies, the sender and receiver fields, word correlations, formatting elements, URLs, etc. Spam filters in your email client will be trained against email you typically send/receive, and so can be even more precise. Spammers of course try to make their emails include words so that they end up looking like real email, but if the filter is good enough, then the only way to get past it is to send an email that now lacks those critical spam elements (like the link you're supposed to click to buy the generic drug or whatever)...
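
      As a rough illustration of the "relative word frequencies" part only, here is a minimal naive Bayes sketch. It ignores sender fields, URLs, and formatting, and it is not the algorithm of any particular filter; the class and function names are invented for this example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Word-frequency classifier trained on labelled mail (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokens(text))
        self.message_counts[label] += 1

    def spam_log_odds(self, text):
        """Log-odds that text is spam; positive means more likely spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        score = 0.0
        for label, sign in (("spam", 1), ("ham", -1)):
            total_words = sum(self.word_counts[label].values())
            # Class prior, smoothed so an untrained class never gives log(0).
            score += sign * math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            for word in tokens(text):
                count = self.word_counts[label][word]
                # Add-one smoothing, with one extra slot for unseen words.
                score += sign * math.log((count + 1) / (total_words + len(vocab) + 1))
        return score
```

      Trained on the two example messages above (the doctor's note as "ham", the V1AGRA pitch as "spam"), a new message scores as spam or not according to which vocabulary it resembles, rather than on the mere presence of one "bad word".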

  6. I wish the contest was.... by ruffnsc · · Score: 2, Interesting

    physically catching the spammers! (your imagination can do the rest)

  7. The prize list :) by davidwr · · Score: 5, Funny

    1st prize: Job offer from a security-software vendor
    2nd prize: Lifetime supply of Hormel meat products
    3rd prize: Commemorative tin of SPAM meat product
    Last place: Inheritance from Nigerian Prince

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  8. that's easy. Yahoo mail! by number6x · · Score: 2, Funny

    Just open a yahoo mail account, and start posting with the e-mail address all over the internet.

    You'll catch more spam than anyone else!

    Oh, you want me to filter out spam, not just get spam, nevermind.

    Still, it might be the fastest way to build a database of spam.

  9. Professional spammers in attendance? by MobyDisk · · Score: 4, Interesting

    I wonder if professional spammers will attend the conference to learn how to get through the next generation of filters. Maybe it would be like playing spot the Fed at the hacker's conferences.

  10. SpamAssassin? by raddan · · Score: 3, Interesting

    Ha ha, silly admin. My money's on greylisting.

    We use both SpamAssassin and OpenBSD's spamd, to great effect. spamd does most of the work, though. Daniel Hartmeier (site down ATM, unfortunately) has an example of how to tie SA scores back into spamd for blacklisting, which is just awesome. I'd implement it here, but our current setup is effective enough as to not make it worth my time.

  11. Re:My money by 0100010001010011 · · Score: 2, Informative

    Set up a catchall on your domain. You'll start getting stuff through. Especially the image ones. Some of the newer "make it look like a real e-mail" ones get through.

    Every website I have gets its own e-mail account, e.g. slashdot@myhost.com.
    One day I started getting spam to site@myhost.com. So I set up dreamhost to bounce everything sent to that e-mail address.

    Then I started getting flooded with:
    otehoenut-site@myhost.com
    cgjwbmkh-site@myhost.com

    Google has, thankfully, let me delete *site@myhost.com, but for a time I was still getting them.

  12. New packaging? by davmoo · · Score: 2, Funny

    A torrent of spam? It doesn't come in cans anymore?!

    The cans were so much easier to catch, too.

    --
    I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
  13. Re:West Virginia by UnknowingFool · · Score: 2, Funny

    Back in West Virginia we'all used to go spam catchin' every weekend while they was in season! Them spam made good eatin'.

    Don't lie. You and your buddies got drunk and would go spam tipping. There was no hunting involved.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  14. The First Annual Greased Spammer Contest! by Penguinisto · · Score: 4, Funny

    (cue Monster Truck Rally announcer guy voice...) THIS SATURDAY AT THE EXPO CENTER! The Best admins and the worst spammers come together in a throwdown-showdown-lowdown Greased Spammer Contest! We kidnap, strip, and grease down every known spammer we can find on Planet Earth! We bring 'em here, then we give our lucky mail server admins (as determined by lottery) a chance to catch 'em! The spammers will be released into a large pit, where the admins may use any method to catch and immobilize spammers (firearms and other projectile weapons are excluded). Points will be given for the number of spammers caught, the methods of capture, and the level of eye-rattling violence applied to each spammer after their capture! Watch as the winning admin gets to publicly execute the dreaded Sanford Wallace by any method that he or she can dream up! Any method at all! You'll buy a ticket for the whole seat, but you will only need the edge! Get your tickets at the Mondotix - DON'T MISS IT!(/voice)

    /P

    --
    Quo usque tandem abutere, Nimbus, patientia nostra?
  15. Greylisting? by schmiddy · · Score: 2, Insightful

    I can't help but wonder how realistic this scenario is.. They're basically going to have a single server dumping a whole ton of spam at your filtering package, and you're supposed to be able to filter on.. what, just the content of the messages? Real world techniques use many more subtle hacks, such as greylisting, or actually looking at the domains the messages are coming from. If their server is going to be dumping millions of messages at you in a short amount of time, I don't think they'll let you use greylisting or similar techniques.

    --
    http://cltracker.net -- powerful craigslist multi-city search
  16. Re:Mateo_LeFou, prepare yourself... by Zephyros · · Score: 2, Funny

    Translation: "You have no chance to survive. Make your time."

  17. Boring. by bmo · · Score: 2, Funny

    Couldn't we just have a contest where actual live spammers are fed to lions?

    To quote Bill Mattocks...

    "My sense of personal integrity is none of your concern."
                                                    -thus spake Walt "Pickle Jar" Rines
    "I'm going to pound your balls flat with a wooden mallet."
                                                    -thus respondeth Bill Mattocks

  18. Kobayashi Maru by Kozar_The_Malignant · · Score: 2, Funny

    Find a creative and unique solution (cheat):

    • Hunt through CEAS conference hall
    • Find contest spammers
    • Drag spammers back to contest area
    • Spammers are beaten to death by audience
    • Win!!!
    • ...Oh, wait, they weren't real spammers?
    • Sorry
    --
    Some mornings it's hardly worth chewing through the restraints to get out of bed.
  19. Agile and evolutionary versus ergodic spam by goombah99 · · Score: 2, Insightful

    The trouble I can see with a test like this is that it's a static test. It assumes a key feature of spam which is not true, namely that the spam signature is constant over time, or at least it makes an ergodic assumption. The thing about spam is that it is evolutionary. Not only does its signature vary, but the spammers learn what is getting through and shift to sending more of that flavor.

    To see why this matters, consider two hypothetical spam filters. One blocks 99% of the test set spam but lets a particular form of spam comprising only 1% of the test set through. And contrast this with another program that is adaptive but, to avoid false positives, has to err on the side of letting through 20% of the spam it flags (making it only 80% effective).

    While the former method would smoke the latter in a static trial, in the real world spammers would just shift to exclusively sending the kind of spam that gets through the first filter.

    To make this a real contest they should make it adversarial. Give the spam script a feedback signal on which spam is getting through and let it adjust its mix of spam and chaff to try to maximize the rate it can push spam through (or bust the filter by chaffing to minimize the number of legit e-mails that survive).
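
    The arithmetic behind this argument can be worked through with a small sketch. It reads the example as a static filter that misses only a flavor making up 1% of the test set, and an adaptive filter that lets 20% of every flavor through. The flavor names and the "spammer shifts everything to the leakiest flavor" rule are assumptions made up for illustration.

```python
def pass_rate(mix, block_rate):
    """Fraction of spam that reaches the inbox.

    mix: flavor -> share of total spam (shares sum to 1)
    block_rate: flavor -> fraction of that flavor the filter blocks
    """
    return sum(share * (1.0 - block_rate[flavor]) for flavor, share in mix.items())


def shift_to_best_flavor(mix, block_rate):
    """Spammer with feedback: send only the flavor that is blocked least often."""
    best = min(mix, key=lambda flavor: block_rate[flavor])
    return {flavor: (1.0 if flavor == best else 0.0) for flavor in mix}


# Static test set: 99% of the spam is a flavor the first filter knows.
test_set_mix = {"common": 0.99, "rare": 0.01}
static_filter = {"common": 1.0, "rare": 0.0}    # misses only the rare flavor
adaptive_filter = {"common": 0.8, "rare": 0.8}  # lets 20% of everything through

for name, flt in (("static", static_filter), ("adaptive", adaptive_filter)):
    before = pass_rate(test_set_mix, flt)
    after = pass_rate(shift_to_best_flavor(test_set_mix, flt), flt)
    print(f"{name}: {before:.0%} gets through in the trial, {after:.0%} after the spammer adapts")
```

    Under these assumptions the static filter wins the one-shot trial (about 1% versus 20% getting through) but loses completely once the spammer sends only the flavor it misses, while the adaptive filter's rate does not move.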

    --
    Some drink at the fountain of knowledge. Others just gargle.
  20. Isn't this already on TV? by Minwee · · Score: 2, Funny

    "This ought to be a sweeps week television spectacular."

    I think that it already is, but it's only on in Japan and uses real SPAM.

  21. Re:Greylisting no longer works by raddan · · Score: 2, Interesting

    It doesn't work? Maybe you should tell that to my 300-strong userbase!

    I'm certain that there are differences in implementation between different greylisters. I've never tried Postfix's, for example, because OpenBSD's works fine for me. A small point wrt OpenBSD's spamd: you actually need to try thrice. The first time you're rejected. The second time you're marked as OK, but still rejected. The third time you get through. Maybe it's the third time, or some of the time limits, or some other things that spamd is doing (BTW, we do not use *any* blacklists), but it works great. I probably see a spam in my inbox once a month, maybe. The rest of my users who complain about the "spam" they're still getting are really getting email they've signed up for (listservs aren't spam, people!), in which case, it's usually just a simple matter of education.
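
    The three-attempt behaviour described above can be sketched as a small state table. This follows the description in this comment only; it is not spamd's code, and the class name, the minimum retry delay, and the attempt threshold are illustrative assumptions rather than spamd defaults.

```python
class Greylist:
    """Toy greylister following the three-attempt description (hypothetical)."""

    def __init__(self, min_delay=300, passes_needed=3):
        # min_delay (seconds) and passes_needed are illustrative values.
        self.min_delay = min_delay
        self.passes_needed = passes_needed
        # (ip, sender, recipient) -> (counted attempts, time of last counted attempt)
        self.seen = {}

    def check(self, ip, sender, recipient, now):
        """Return 'accept' or 'tempfail' for a delivery attempt at time now (seconds)."""
        key = (ip, sender, recipient)
        if key not in self.seen:
            self.seen[key] = (1, now)
            return "tempfail"  # first attempt: always rejected
        attempts, last = self.seen[key]
        if attempts >= self.passes_needed:
            return "accept"    # this triple has already earned its way through
        if now - last < self.min_delay:
            return "tempfail"  # retried too quickly; the attempt does not count
        attempts += 1
        self.seen[key] = (attempts, now)
        return "accept" if attempts >= self.passes_needed else "tempfail"
```

    A real mail server queues the message and retries after each temporary failure, so it gets through on the third spaced-out attempt. A fire-and-forget spam cannon that never retries stays stuck at the first "tempfail".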

    I don't know where your greylisting system failed, but it works wonders for us. When I implemented it, I was a sysadmin rock star for a week. Who knew there were anti-spam groupies? Now it's back to picking the crud out of the VP's keyboard ;^)

    (You're spot-on about one thing though: defense in depth. That principle is in effect for EVERYTHING, which is why I want to administer electric shocks to our Mac users when they try to call the Help Desk.)

  22. Flawed by lazarus · · Score: 2, Informative

    "This ought to be a sweeps week television spectacular."
    This ought to be ignored as the contest is flawed.

    "Ha ha, silly admin. My money's on greylisting."
    They're sending a stream of spam from where? Sounds like a real mail server...

    From TFA: "Live email stream, delivered by standard protocols (SMTP, IMAP, POP)"
    [One wonders how else they would deliver e-mail if it was not from standard protocols. I also wonder how they plan on delivering e-mail using POP... The mind boggles...]

    In any case if I read this correctly this effectively eliminates anti-spam technologies which work on the premise that the spam is coming from illegitimate mail servers. One of these techniques is greylisting. Meaning, greylisting will not work. So if I were you, I wouldn't put your money on it.

    GENERAL JUNK E-MAIL FILTERING RANT (You've been warned): If you're using an anti-spam technique which takes more cpu cycles to execute than it takes for the spammer to send the damn spam in the first place, you've already lost this war. In other words, as long as it's costing you more than it is costing him/her you will always be on the losing end of the deal.

    And I would like to add that despite my post above, I agree with you that greylisting and its derivatives when properly deployed are excellent techniques for eliminating UBE. But I think this contest is engineered to ignore that fact.

    --
    I am not interested in articles about life extension advancements.
    1. Re:Flawed by gvc · · Score: 3, Interesting

      So here's the issue. If you are going to try to discriminate among filters using several thousand messages, you have to send them all the same messages. To send them the same messages you have to capture and redistribute them. You can pass on all the info from the capture, including all SMTP commands, but you can't do intrusive protocol probes. And since this is *real spam* you can't very well ask the sender to act in an obliging way by repeating its message and behavior for each participant.

      I'd be very interested to hear of a design that would allow greylisting to be tested. The best I can come up with is to fail the message after transmission, then to try to simulate the behavior of the sender in response to this failure. But that would be catering to one very specific method of perturbing the protocol. And it would be necessary to do a fair amount of work to spoof the IP address presented to the participant filters.

      For this reason, we chose to exclude all SMTP interactions, and simulate a second-in-the-chain filter appliance application. The reasons are practical, not policy.

  23. Error rate (false positives) isn't the whole story by InakaBoyJoe · · Score: 3, Insightful

    From TFCFP (call for participation):
    Filters will be evaluated based on a weighted combination of the percentage of spam blocked and its false positive percentage.

    From a theoretical standpoint, a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.

    Take, for example, a message from a long-lost friend, whose current address isn't yet in your whitelist, and who would have no other way of contacting you should the message get spamboxed. Here's an example of a message that's important to a user but gets lost among the everyday messages when simply talking about the percentage of false positives.

    There's lots of other examples, too -- if you run your own domain, your messages are likely to be spamboxed, etc. Furthermore, the lower the false-positive rate, the less likely a user is to actually *check* their spambox, thus making a single false-positive even worse.

    Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM.

    Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.
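
    The difference between the two measures can be made concrete with a small sketch. The contest's actual weighting is not given in the quoted text, so `fp_weight` and the per-message importance values below are invented purely for illustration.

```python
def aggregate_score(results, fp_weight=10.0):
    """Percent of spam blocked minus a weighted false-positive percentage.

    results: list of (is_spam, flagged_as_spam, importance) tuples.
    fp_weight is a made-up weight, not the contest's.
    """
    spam = [r for r in results if r[0]]
    ham = [r for r in results if not r[0]]
    blocked = 100.0 * sum(1 for r in spam if r[1]) / len(spam)
    false_pos = 100.0 * sum(1 for r in ham if r[1]) / len(ham)
    return blocked - fp_weight * false_pos


def importance_weighted_loss(results):
    """Share of legitimate-mail importance that ended up in the spam folder."""
    ham = [r for r in results if not r[0]]
    total = sum(r[2] for r in ham)
    lost = sum(r[2] for r in ham if r[1])
    return lost / total


# 100 spam (all caught) and 100 legitimate messages, one of them from a
# long-lost friend with a hypothetical importance of 50.
spam = [(True, True, 0)] * 100
routine = [(False, False, 1)] * 98
filter_a = spam + routine + [(False, False, 1), (False, True, 50)]  # loses the friend
filter_b = spam + routine + [(False, True, 1), (False, False, 50)]  # loses a routine mail

print(aggregate_score(filter_a), aggregate_score(filter_b))
print(importance_weighted_loss(filter_a), importance_weighted_loss(filter_b))
```

    Both filters have the same 1% false-positive rate and therefore the same aggregate score, yet under these assumed importance values filter A loses about a third of what the user cared about and filter B loses well under 1%.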