Live spam-catching contest at CEAS
noodleburglar writes "The 2007 Conference on Email and Anti-Spam (CEAS) will feature a live spam-catching contest. Entrants will be treated to a torrent of spam and must use their spam filtering technique to filter out as much as possible, while also letting legitimate messages through. My money's on SpamAssassin." This ought to be a sweeps week television spectacular.
http://crm114.sourceforge.net/ using hyperspace! It's been working better than SpamAssassin for me.
I think I've seen people catching spam on TV, just not the kind you're talkin' 'bout. http://www.spam.com/
Under the influence of Post-Cyberpunk Gonzo Journalism
Well let's just find out, just what is your gmail address, hmmmm?
;)
Use your head, can't you, use your head,
You're on earth, there's no cure for that - S. Beckett
Gmail, like SpamCop, has a group spam filter system. It looks at mail sent to a large number of recipients. The defining characteristic of spam is that it's sent to a large number of recipients, after all. If you're in a position to watch the incoming mail of a few million mailboxes, detecting spam is easy.
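To make the "sent to a large number of recipients" idea concrete, here is a minimal sketch of a bulk detector. It is not how Gmail or SpamCop actually work; the `BulkDetector` class, the exact-hash fingerprint, and the threshold of 3 are all assumptions made up for illustration. A real system would need fuzzier matching, and would have to avoid flagging legitimate mailing lists.

```python
import hashlib
from collections import defaultdict


class BulkDetector:
    """Flags a message body once it has shown up in enough distinct mailboxes.

    Hypothetical sketch: the class name, fingerprinting scheme, and threshold
    are illustrative assumptions, not any provider's real implementation.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold  # distinct recipients before we call it bulk
        self.recipients_by_fingerprint = defaultdict(set)

    @staticmethod
    def fingerprint(body):
        # Normalise case and whitespace so trivial variations hash alike.
        normalised = " ".join(body.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def observe(self, recipient, body):
        """Record one delivery; return True if the body now looks like bulk mail."""
        seen = self.recipients_by_fingerprint[self.fingerprint(body)]
        seen.add(recipient)  # a set, so repeat deliveries to one mailbox count once
        return len(seen) >= self.threshold


detector = BulkDetector(threshold=3)
for mailbox in ("a@example.com", "b@example.com", "c@example.com"):
    looks_bulk = detector.observe(mailbox, "Cheap pills, buy now!")
print(looks_bulk)
```

With only a handful of mailboxes the counter never gets very high, which is the point of the comment above: the approach only works when you can watch a very large number of inboxes.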
... are they able to refer to Pfizer's brand name for sildenafil, Lilly's name for tadalafil, or Bayer's brand name for vardenafil without getting caught in the spam filters?
"How to Do Nothing," kids activities, back in print!
physically catching the spammers! (your imagination can do the rest)
1st prize: Job offer from a security-software vendor
2nd prize: Lifetime supply of Hormel meat products
3rd prize: Commemorative tin of SPAM meat product
Last place: Inheritance from Nigerian Prince
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Just open a Yahoo mail account, and start posting with the e-mail address all over the internet.
You'll catch more spam than anyone else!
Oh, you want me to filter out spam, not just get spam. Never mind.
Still, it might be the fastest way to build a database of spam.
I wonder if professional spammers will attend the conference to learn how to get through the next generation of filters. Maybe it would be like playing spot the Fed at the hacker's conferences.
Ha ha, silly admin. My money's on greylisting.
We use both SpamAssassin and OpenBSD's spamd, to great effect. spamd does most of the work, though. Daniel Hartmeier (site down ATM, unfortunately) has an example of how to tie SA scores back into spamd for blacklisting, which is just awesome. I'd implement it here, but our current setup is effective enough as to not make it worth my time.
Set up a catchall on your domain. You'll start getting stuff through. Especially the image ones. Some of the newer "make it look like a real e-mail" spam gets through.
Every website I have gets its own e-mail account, e.g. slashdot@myhost.com.
One day I started getting spam to site@myhost.com. So I set up DreamHost to bounce everything sent to that e-mail address.
Then I started getting flooded with:
otehoenut-site@myhost.com
cgjwbmkh-site@myhost.com
Google has, thankfully, let me delete anything sent to *site@myhost.com, but for a time I was still getting them.
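For anyone who wants the same wildcard rule in their own delivery script, a rough sketch follows. The function name `is_poisoned_address` is made up, and this says nothing about how Google or DreamHost implement their filters; it only shows shell-style matching of the `*site@myhost.com` pattern using Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Pattern taken from the comment above; the leading * matches any prefix,
# including an empty one.
POISONED_PATTERN = "*site@myhost.com"


def is_poisoned_address(recipient):
    """Return True if mail to this recipient should be dropped (hypothetical helper)."""
    # Lowercase first so the match does not depend on how the sender cased it.
    return fnmatchcase(recipient.strip().lower(), POISONED_PATTERN)


for address in ("otehoenut-site@myhost.com", "site@myhost.com", "slashdot@myhost.com"):
    print(address, is_poisoned_address(address))
```

One caveat with a pattern this broad: it would also catch an address like website@myhost.com, so it only works if no legitimate address ends in "site".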
A torrent of spam? It doesn't come in cans anymore?!
The cans were so much easier to catch, too.
I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
Don't lie. You and your buddies got drunk and would go spam tipping. There was no hunting involved.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Quo usque tandem abutere, Nimbus, patientia nostra?
I can't help but wonder how realistic this scenario is. They're basically going to have a single server dumping a whole ton of spam at your filtering package, and you're supposed to be able to filter on... what, just the content of the messages? Real-world techniques use many more subtle hacks, such as greylisting, or actually looking at the domains the messages are coming from. If their server is going to be dumping millions of messages at you in a short amount of time, I don't think they'll let you use greylisting or similar techniques.
http://cltracker.net -- powerful craigslist multi-city search
Translation: "You have no chance to survive. Make your time."
Couldn't we just have a contest where actual live spammers are fed to lions?
To quote Bill Mattocks...
"My sense of personal integrity is none of your concern."
-thus spake Walt "Pickle Jar" Rines
"I'm going to pound your balls flat with a wooden mallet."
-thus respondeth Bill Mattocks
Find a creative and unique solution (cheat):
Some mornings it's hardly worth chewing through the restraints to get out of bed.
The trouble I can see with a test like this is that it's a static test. It assumes a key feature of spam which is not true, namely that the spam signature is constant over time, or at least it makes an ergodic assumption. The thing about spam is that it is evolutionary. Not only does its signature vary, but the spammers learn what is getting through and shift to sending more of that flavor.
To see why this matters, consider two hypothetical spam filters. One blocks 99% of the test set spam but lets through a particular form of spam comprising only 1% of the test set. Contrast this with another filter that is adaptive but, to avoid false positives, has to err on the side of letting through 20% of the spam (making it only 80% effective).
While the former method would smoke the latter in a static trial, in the real world spammers would just shift to exclusively sending the kind of spam that gets through the first filter.
To make this a real contest they should make it adversarial. Give the spam script a feedback signal on which spam is getting through and let it adjust its mix of spam and chaff to try to maximize the rate at which it can push spam through (or bust the filter by chaffing to minimize the number of legit e-mails that survive).
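The arithmetic behind that argument can be sketched in a few lines. Everything here is a toy model with assumed numbers: spam comes in two made-up types ("common" and "rare"), the first filter blocks all of the common type and none of the rare type, and the second filter lets 20% of everything through regardless of type.

```python
def spam_delivered(block_rate_by_type, mix):
    """Fraction of sent spam reaching inboxes, given the spammer's mix of types."""
    return sum(share * (1.0 - block_rate_by_type[kind]) for kind, share in mix.items())


# Filter A: blocks 99% of the test set by stopping every "common" message,
# but is blind to the "rare" form (assumed numbers for illustration).
filter_a = {"common": 1.0, "rare": 0.0}
# Filter B: the adaptive one, assumed to let 20% of any spam type through.
filter_b = {"common": 0.8, "rare": 0.8}

static_mix = {"common": 0.99, "rare": 0.01}   # what the static test set looks like
adapted_mix = {"common": 0.0, "rare": 1.0}    # spammer shifts to what gets through A

for label, mix in (("static test", static_mix), ("after spammers adapt", adapted_mix)):
    a = spam_delivered(filter_a, mix)
    b = spam_delivered(filter_b, mix)
    print(f"{label}: filter A lets through {a:.0%}, filter B lets through {b:.0%}")
```

In the static mix filter A leaks about 1% against filter B's 20%. Once the spammer sends only the rare form, filter A leaks everything while filter B still leaks 20%, which is the reversal a static contest cannot measure.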
Some drink at the fountain of knowledge. Others just gargle.
"This ought to be a sweeps week television spectacular."
I think that it already is, but it's only on in Japan and uses real SPAM.
It doesn't work? Maybe you should tell that to my 300-strong userbase!
;^)
I'm certain that there are differences in implementation between different greylisters. I've never tried Postfix's, for example, because OpenBSD's works fine for me. A small point with respect to OpenBSD's spamd: you actually need to try thrice. The first time you're rejected. The second time you're marked as OK, but still rejected. The third time you get through. Maybe it's the third time, or some of the time limits, or some other things that spamd is doing (BTW, we do not use *any* blacklists), but it works great. I probably see a spam in my inbox once a month, maybe. The rest of my users who complain about the "spam" they're still getting are really getting email they've signed up for (listservs aren't spam, people!), in which case, it's usually just a simple matter of education.
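A toy model of the three-try behaviour described above, for anyone who hasn't seen greylisting in action. This is not spamd's code: the `Greylist` class, the (IP, sender, recipient) key, and the 300-second minimum delay are assumptions chosen for illustration, and timestamps are passed in explicitly to keep the example deterministic.

```python
class Greylist:
    """Temp-rejects unknown senders until they have retried patiently enough.

    Hypothetical sketch following the description above, not spamd itself.
    """

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a sender must wait before a retry counts
        self.first_seen = {}         # triplet -> timestamp of the first attempt
        self.whitelisted = set()     # triplets that have earned delivery

    def attempt(self, triplet, now):
        """Return True if the message is accepted, False if temp-rejected."""
        if triplet in self.whitelisted:
            return True
        if triplet not in self.first_seen:
            self.first_seen[triplet] = now   # try 1: remember it and reject
            return False
        if now - self.first_seen[triplet] >= self.min_delay:
            self.whitelisted.add(triplet)    # try 2: marked OK, but still rejected
        return False


greylist = Greylist(min_delay=300)
sender = ("192.0.2.10", "friend@example.org", "me@example.com")
for timestamp in (0, 400, 500):
    print(timestamp, greylist.attempt(sender, timestamp))
```

A real mail server queues the message and retries on its own, so it gets through on the third attempt; fire-and-forget spam software that never retries stays rejected forever.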
I don't know where your greylisting system failed, but it works wonders for us. When I implemented it, I was a sysadmin rock star for a week. Who knew there were anti-spam groupies? Now it's back to picking the crud out of the VP's keyboard.
(You're spot-on about one thing though: defense in depth. That principle is in effect for EVERYTHING, which is why I want to administer electric shocks to our Mac users when they try to call the Help Desk.)
"This ought to be a sweeps week television spectacular."
This ought to be ignored as the contest is flawed.
"Ha ha, silly admin. My money's on greylisting."
They're sending a stream of spam from where? Sounds like a real mail server...
From TFA: "Live email stream, delivered by standard protocols (SMTP, IMAP, POP)"
[One wonders how else they would deliver e-mail if it was not from standard protocols. I also wonder how they plan on delivering e-mail using POP... The mind boggles...]
In any case if I read this correctly this effectively eliminates anti-spam technologies which work on the premise that the spam is coming from illegitimate mail servers. One of these techniques is greylisting. Meaning, greylisting will not work. So if I were you, I wouldn't put your money on it.
GENERAL JUNK E-MAIL FILTERING RANT (You've been warned): If you're using an anti-spam technique which takes more cpu cycles to execute than it takes for the spammer to send the damn spam in the first place, you've already lost this war. In other words, as long as it's costing you more than it is costing him/her you will always be on the losing end of the deal.
And I would like to add that despite my post above, I agree with you that greylisting and its derivatives when properly deployed are excellent techniques for eliminating UBE. But I think this contest is engineered to ignore that fact.
I am not interested in articles about life extension advancements.
From TFCFP (call for participation):
Filters will be evaluated based on a weighted combination of the percentage of spam blocked and its false positive percentage.
From a theoretical standpoint, a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.
Take, for example, a message from a long-lost friend, whose current address isn't yet in your whitelist, and who would have no other way of contacting you should the message get spamboxed. Here's an example of a message that's important to a user but gets lost among the everyday messages when simply talking about the percentage of false positives.
There are lots of other examples, too -- if you run your own domain, your messages are likely to be spamboxed, etc. Furthermore, the lower the false-positive rate, the less likely a user is to actually *check* their spambox, thus making a single false-positive even worse.
Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM.
Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.
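To show the difference between the two measures, here is a rough sketch comparing a plain false-positive percentage with one weighted by how much each message matters to the user. The `importance` field and its values are entirely made up (the CFP does not define any such weight), and the numbers are a constructed example, not contest data.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    is_spam: bool
    flagged: bool             # True if the filter put it in the spambox
    importance: float = 1.0   # hypothetical weight: how much the user cares


def false_positive_rate(outcomes):
    """Plain aggregate measure: share of legitimate messages that were flagged."""
    ham = [o for o in outcomes if not o.is_spam]
    return sum(o.flagged for o in ham) / len(ham)


def weighted_loss_rate(outcomes):
    """Share of total legitimate-mail importance that ended up in the spambox."""
    ham = [o for o in outcomes if not o.is_spam]
    total = sum(o.importance for o in ham)
    lost = sum(o.importance for o in ham if o.flagged)
    return lost / total


# 99 routine messages delivered fine, plus one note from a long-lost friend
# that gets spamboxed. The importance of 50 is an arbitrary assumption.
mailbox = [Outcome(is_spam=False, flagged=False) for _ in range(99)]
mailbox.append(Outcome(is_spam=False, flagged=True, importance=50.0))

print(f"false positive rate: {false_positive_rate(mailbox):.1%}")
print(f"importance-weighted loss: {weighted_loss_rate(mailbox):.1%}")
```

The plain rate comes out at 1%, which looks fine on a leaderboard, while the weighted figure says the user lost roughly a third of what mattered to them (50 out of 149).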