Live spam-catching contest at CEAS
noodleburglar writes "The 2007 Conference on Email and Anti-Spam (CEAS) will feature a live spam-catching contest. Entrants will be treated to a torrent of spam and must use their spam filtering technique to filter out as much as possible, while also letting legitimate messages through. My money's on Spam Assassin." This ought to be a sweeps week television spectacular.
http://crm114.sourceforge.net/ using hyperspace! It's been working better than spam assassin for me.
My money would be on greylisting + RFC compliance checking except for the fact that those are very hard to do in a testbed.
is on whatever Gmail uses. I've not yet seen a spam message in my inbox, nor have I missed any mail, even from auto-mailing scripts at websites I'm building...
My turnips listen for the soft cry of your love
I think I've seen people catching spam on tv, just not the kind you're talkin' 'bout. http://www.spam.com/
Under the influence of Post-Cyberpunk Gonzo Journalism
My money is on whoever rigs up an Amazon Mechanical Turk-based system fast enough.
I wonder if I use bold in my signature, people will notice my posts.
Wonder what the SPAM messages are?
One of the funnier ones I think I ever got was from Oliver Kloshoff for "Male Enhancement".
No department? Come on taco, you can keep up the tradition. A lame one is better than no one.
Damn. I was hoping they'd be launching phone-book sized printed copies of spam at the contestants, complete with blood, with each week adding a few pounds. Add some half naked chicks and dudes (cater to multiple markets) dancing around, maybe some buckets of slime and you've got yourself a show worthy of running on Fox.
Gmail, like SpamCop, has a group spam filter system. It looks at mail sent to a large number of recipients. The defining characteristic of spam is that it's sent to a large number of recipients, after all. If you're in a position to watch the incoming mail of a few million mailboxes, detecting spam is easy.
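A minimal sketch of the "watch many mailboxes" idea above, not a description of what Gmail or SpamCop actually do. The class name, the threshold, and the normalise-then-hash fingerprint are all assumptions made for illustration; a real system would need fuzzier matching to survive per-recipient variations.

```python
import hashlib
from collections import defaultdict


def fingerprint(body: str) -> str:
    """Hash of the lower-cased, whitespace-normalised body (an illustrative choice)."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class BulkDetector:
    """Flags a message body once it has reached `threshold` distinct mailboxes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold            # hypothetical cut-off
        self.recipients = defaultdict(set)    # fingerprint -> mailboxes seen

    def observe(self, recipient: str, body: str) -> bool:
        """Record one delivery; return True if the body now looks like bulk mail."""
        seen = self.recipients[fingerprint(body)]
        seen.add(recipient)
        return len(seen) >= self.threshold
```

With only a handful of mailboxes the counts never get high enough to mean much, which is the point of the comment: the signal only appears at scale.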
Every email address variation of "Mateo_LeFou" is now being generated and gmail is now being bombarded using my army of hijacked PCs. It's just a matter of time. You will have 50GB of spam within the hour...
... are they able to refer to Pfizer's brand name for sildenafil, Lilly's name for tadalafil, or Bayer's brand name for vardenafil without getting caught in the spam filters?
"How to Do Nothing," kids activities, back in print!
physically catching the spammers! (your imagination can do the rest)
It'd be interesting to see a solid setup that handles a combination of the two, then publish the results (yes, spammers can read those results/settings to try to foil the setup, but many settings would make it patently unprofitable for them to do so).
Quo usque tandem abutere, Nimbus, patientia nostra?
1st prize: Job offer from a security-software vendor
2nd prize: Lifetime supply of Hormel meat products
3rd prize: Commemorative tin of SPAM meat product
Last place: Inheritance from Nigerian Prince
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Just open a yahoo mail account, and start posting with the e-mail address all over the internet.
You'll catch more spam than anyone else!
Oh, you want me to filter out spam, not just get spam, nevermind.
Still, it might be the fastest way to build a database of spam.
I wonder if professional spammers will attend the conference to learn how to get through the next generation of filters. Maybe it would be like playing spot the Fed at the hacker's conferences.
Ha ha, silly admin. My money's on greylisting.
We use both SpamAssassin and OpenBSD's spamd, to great effect. spamd does most of the work, though. Daniel Hartmeier (site down ATM, unfortunately) has an example of how to tie SA scores back into spamd for blacklisting, which is just awesome. I'd implement it here, but our current setup is effective enough as to not make it worth my time.
Back in West Virginia we'all used to go spam catchin' every weekend while they was in season! Them spam made good eatin'.
One ring to bind them - should probably have more fiber and less rings in their diet.
Could this actually be a fair contest though?
The first thing that came to my mind was: are they using scripts to send out "legit" emails to everyone? Is there someone going through legit domains with legit accounts typing/copy-pasting legit letters and sending INDIVIDUALLY to EACH contestant?
LEGIT e-mails vary about as much as SPAM e-mails do. So how do they plan on rigging up sending LEGIT e-mails on a massive competitive level in ALL variations?
I'm going to take a page from the Veruca Salt needle-in-a-haystack problem and outsource this to a million peasants in India.
To pay for it I'll be spamming the world with my stock pump-and-dump scheme.
This just in: DAVI (OTC) NOW $0.02 TARGET $0.25!
A torrent of spam? It doesn't come in cans anymore?!
The cans were so much easier to catch, too.
I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
A couple of years ago, I wrote a prototype for a video game called "Spam Rage Rampage" -- a first-person shooter where you roamed a Tron-like world, killing spam zombies and rescuing real people (== legitimate mail) while you searched for clues to the location of the nefarious spam kingpin, Ospama Bin Sendin. Each zombie represented a different class of spam... prostitute zombies for porn, business-suited zombies for stocks, pharmacist zombies for pill ads, etc.
Upon seeing a demo, one of my friends commented that I should hook it up to a real e-mail inbox, so you could kill your own spam messages, perhaps even in real time. Unfortunately I have never had the time to complete it... maybe after the kids are out of the house.
$nice = $webHosting + $domainNames + $sslCerts
I can't help but wonder how realistic this scenario is.. They're basically going to have a single server dumping a whole ton of spam at your filtering package, and you're supposed to be able to filter on.. what, just the content of the messages? Real world techniques use many more subtle hacks, such as greylisting, or actually looking at the domains the messages are coming from. If their server is going to be dumping millions of messages at you in a short amount of time, I don't think they'll let you use greylisting or similar techniques.
http://cltracker.net -- powerful craigslist multi-city search
I think email in its current form will eventually die. There is no way with increased information transparency that a global network of email will continue to function efficiently. Simply too many senders and too much spam.
It could work better if we migrate to an invite-only system on top of email (extending the email-related RFCs) -- one where mail delivery only occurs to individuals from those who hold a key (the public half of a keypair between the two people).
Such a migration will require minimal additional functionality by both existing email clients and servers. I wrote up some thoughts on this idea here http://biocontact.org/pmail/ but I've received no response.
Pure content scanning would probably trigger those ... unless you had previously manually approved similar messages.
Other approaches use multiple tests such as checking whether the sending server's IP address is on a blacklist or whether any of the links in the message (should it contain links) were on blacklists.
Try slutting your address around a bit. Mine is only publicly readable here on /. and I get plenty of spam in my gmail inbox. Yahoo seems to do a better job based on my experience.
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
Greylisting was designed on the single proposition that spam mailers wouldn't "call back" if they got a "call back later" code from the site they were spamming. And maybe that was true for awhile. In my last job I had to add spam filtering to our email and greylisting was one of the first things I tried.
The spammers just kept trying until they got through.
Spamming has evolved past greylisting and it is now worthless.
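For anyone who hasn't run it, here is a rough sketch of the greylisting mechanism described above. The class, the 300-second delay, and the choice of 451 as the temporary-failure code are assumptions for illustration, not taken from any particular implementation.

```python
import time


class Greylist:
    """Defers the first delivery attempt for each (client IP, sender, recipient) triplet."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay   # seconds a sender must wait before a retry counts
        self.first_seen = {}         # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return an SMTP-style reply code: 451 (try again later) or 250 (accept)."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return 451   # temporary failure: a legitimate MTA will queue and retry
        return 250       # the sender came back after the delay, so let it through
```

The sketch also shows the weakness complained about here: any sender that simply retries after the delay gets a 250, spammer or not.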
Bayesian keyword filtering is decent, but is constantly attacked by images or hiding the spam content in random text. If you train it well, you can eventually just pass through the sort of mail you normally get and spam that doesn't mesh with your normal mail might get blocked, but when you take this to a company level it fails unless everyone separates their spam from their real mail and makes appropriate filters and rules -- which they won't do.
It's a tough problem, and there's no one solution that can do the whole job. A well-trained Bayesian along with an RBL like Spamcop can get about 80% of them.
But I doubt that they have a hundred thousand systems that they'll be using to send the test spam.
A big part of the system I use at work is based upon IP addresses and rDNS. I block a HUGE amount of spam just by rejecting all connections from Comcast that aren't from their SMTP servers.
I know, some people want to run SMTP servers at home. But so far none of them have attempted to send email to my system.
So it really depends upon how they configure the test spam servers. Personally, I don't see this as being a very useful competition. But I may be wrong.
Couldn't we just have a contest where actual live spammers are fed to lions?
To quote Bill Mattocks...
"My sense of personal integrity is none of your concern."
-thus spake Walt "Pickle Jar" Rines
"I'm going to pound your balls flat with a wooden mallet."
-thus respondeth Bill Mattocks
I certainly hope that after this senseless hunt, they'll re-release the poor SPAMs back into the wild where they belong.
Find a creative and unique solution (cheat):
Some mornings it's hardly worth chewing through the restraints to get out of bed.
Many of the questions asked here are answered in the Challenge Call for Participation
Or the overview talk that Rich Segal gave at the MIT Spam Conference.
The guidelines are scheduled to be finalized May 1.
The trouble I can see with a test like this is that it's a static test. It assumes a key feature of spam which is not true, namely that the spam signature is constant over time, or at least it makes an ergodic assumption. The thing about spam is that it is evolutionary. Not only does its signature vary, but the spammers learn what is getting through and shift to sending more of that flavor.
To see why this matters, consider two hypothetical spam filters. One blocks 99% of the test-set spam but lets through a particular form of spam comprising only 1% of the test set. Contrast this with another program that is adaptive but, to avoid false positives, has to err on the side of letting through 20% of the spam it flags (making it only 20% effective).
While the former method would smoke the latter in a static trial, in the real world spammers would just shift to exclusively sending the kind of spam that gets through the first filter.
To make this a real contest they should make it adversarial. Give the spam script a feedback signal on which spam is getting through and let it adjust its mix of spam and chaff to try to maximize the rate it can push spam through (or bust the filter by chaffing to minimize the number of legit e-mails that survive).
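A toy version of that feedback loop, to make the static-versus-adversarial point concrete. Everything here is an assumption for illustration: the two spam types, the per-type block rates, and the rule that the spammer reweights each type in proportion to how much of it got through.

```python
def blocked_fraction(mix, block_rate):
    """Overall share of spam blocked, given the spammer's mix of spam types."""
    return sum(share * block_rate[kind] for kind, share in mix.items())


def adapt(mix, block_rate):
    """Spammer feedback step: reweight each type by how much of it got through."""
    passed = {kind: share * (1.0 - block_rate[kind]) for kind, share in mix.items()}
    total = sum(passed.values())
    if total == 0:
        return dict(mix)   # nothing got through, so there is no signal to adapt to
    return {kind: p / total for kind, p in passed.items()}


# Static test set: 99% of the spam is a type the first filter always catches.
test_mix = {"common": 0.99, "rare": 0.01}
static_filter = {"common": 1.0, "rare": 0.0}     # blocks 99% of the test set
cautious_filter = {"common": 0.2, "rare": 0.2}   # the "20% effective" filter

shifted_mix = adapt(test_mix, static_filter)
before = blocked_fraction(test_mix, static_filter)
after = blocked_fraction(shifted_mix, static_filter)
```

In this toy model one round of feedback is enough: `shifted_mix` is entirely the "rare" type, so the first filter goes from blocking nearly everything to blocking nothing, while the cautious filter's score does not move when the mix changes.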
Some drink at the fountain of knowledge. Others just gargle.
"This ought to be a sweeps week television spectacular."
Is there an ESPN 6 or 7 cable channel? I'm thinking this is below Cheerleading and Dog Agility, but perhaps above Lumberjack competitions.
Vincent J. Murphy
Spandex Justice
"This ought to be a sweeps week television spectacular."
I think that it already is, but it's only on in Japan and uses real SPAM.
Sigh. And I had such hopes. Pictures of a team of people, with a spam and tennis ball loaded tennis ball launcher at the other end of a court. When something gets fired at you, determine if you should let the ball go by, or whack the spam from the air. Alas, it's not to be. Dan
"This ought to be a sweeps week television spectacular."
This ought to be ignored as the contest is flawed.
"Ha ha, silly admin. My money's on greylisting."
They're sending a stream of spam from where? Sounds like a real mail server...
From TFA: "Live email stream, delivered by standard protocols (SMTP, IMAP, POP)"
[One wonders how else they would deliver e-mail if it was not from standard protocols. I also wonder how they plan on delivering e-mail using POP... The mind boggles...]
In any case if I read this correctly this effectively eliminates anti-spam technologies which work on the premise that the spam is coming from illegitimate mail servers. One of these techniques is greylisting. Meaning, greylisting will not work. So if I were you, I wouldn't put your money on it.
GENERAL JUNK E-MAIL FILTERING RANT (You've been warned): If you're using an anti-spam technique which takes more cpu cycles to execute than it takes for the spammer to send the damn spam in the first place, you've already lost this war. In other words, as long as it's costing you more than it is costing him/her you will always be on the losing end of the deal.
And I would like to add that despite my post above, I agree with you that greylisting and its derivatives when properly deployed are excellent techniques for eliminating UBE. But I think this contest is engineered to ignore that fact.
I am not interested in articles about life extension advancements.
I know I've removed myself from a few mailing lists by simply having gmail count them as spam.
These aren't really spam, they are companies that I did business with once and can't be bothered to find my username and password to change my email subscription settings. But gmail seems to happily block everything else from that sender without my interaction.
Surely other users do want these particular emails so there must be some kind of per user dynamic as well.
Issue hunting permits for the spammers themselves. Whoever wastes the most spammers, wins.
Evidence of wasted spammers can be in the form of complete heads, or ears.
Lawyers, MBA's, RIAA? A jedi fears not these things!
The loser will have to wear a dress to the after-contest party?
> A couple of years ago, I wrote a prototype for a video game called "Spam Rage Rampage"
> -- a first-person shooter where you roamed a Tron-like world, killing spam zombies and
> rescuing real people (== legitimate mail) while you searched for clues to the location
> of the nefarious spam kingpin, Ospama Bin Sendin. Each zombie represented a different
> class of spam... prostitute zombies for porn, business-suited zombies for stocks,
> pharmacist zombies for pill ads, etc.
>
> Upon seeing a demo, one of my friends commented that I should hook it up to a real e-mail
> inbox, so you could kill your own spam messages, perhaps even in real time....
Um, doesn't the system already have to know whether messages are spam or not?
> Unfortunately I have never had the time to complete it... maybe after the kids are out of the house.
Release the code under GPL with a couple screenshots of the demo and I'm sure
others will finish it for you! It's a cool enough idea...
them train as many ignorant users to catch spam as possible in the allotted time and be judged on how well the users did.
Relevant to nothing, but this is the first time I can remember seeing an article on /. without the requisite department tag in the story header.
Anyone want to try their hand at making up their own?
Soylens viridis homines es
How's this for a plan:
Seed a few thousand fake email addresses all across the net. Put some on big sites. Put some on small sites. Put some on USENET. Change half the list every month.
If anyone emails two addresses with similar content, the content and the originating IP addresses get marked as likely spam, and used for realtime blackhole-list systems. The more of those fake addresses it hits in a short period of time, the greater its spammishness.
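A bare-bones sketch of that plan, under simplifying assumptions: it only counts how many distinct trap addresses each source IP has mailed, and leaves out the content-similarity check, the time window, and the monthly rotation. The class and method names are made up for illustration.

```python
from collections import defaultdict


class SpamTrap:
    """Counts how many distinct seeded (fake) addresses each source IP has mailed."""

    def __init__(self, trap_addresses):
        self.traps = set(trap_addresses)
        self.hits = defaultdict(set)   # source IP -> trap addresses it has hit

    def observe(self, source_ip, recipient):
        """Record a delivery attempt; only mail to trap addresses is counted."""
        if recipient in self.traps:
            self.hits[source_ip].add(recipient)

    def score(self, source_ip):
        """'Spammishness': number of distinct traps this IP has hit so far."""
        return len(self.hits.get(source_ip, ()))

    def blocklist(self, min_hits=2):
        """IPs that have hit at least `min_hits` traps (two, per the plan above)."""
        return {ip for ip, traps in self.hits.items() if len(traps) >= min_hits}
```

The output of `blocklist()` is what would be fed to a realtime blackhole list; nobody legitimate has any reason to mail even one of these addresses, let alone two.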
I hope it's not a Catch and Release internet stream.
"The universe is my dwelling place and my house is my only clothes! Why are you entering into my pants?" - Liu Ling
Okay, here's the first question I have, and I'm sure many others wonder the same. How will spam be combatted when it's not real spam? For example, Spam Assassin checks actual mail server names and addresses to see if they are on known spammer lists and so on. Won't extremely useful/effective features like these be overridden by the fact that these spam emails are intentionally sent and won't be from any known spam-relaying mail servers??
Problem solved.
:P
Now get people and free email services like Hotmail and Gmail to turn off their URL signatures in the bottom of their outgoing emails and you will stamp the spam email menace out in one bold stroke.
Moves the spam back to USENET, which is already spammed-out....
If people you don't know want to start a meaningful email conversation with you, they WON'T try to get you to visit the URL of some 'paysite' contained in their email.
Then something has to be done about spammers bouncing their crap 'back' to their victims as 'undeliverable email'. Can something be done about that without too much overhead or breaking the SMTP protocol?
Extend the URL-in-unsolicited-email ban to email addresses and you can quash '419 spam' as well.
Slashdot CAPTCHA: forging -- how apt!
then what happens when I click on the NOT SPAM button... *snicker*
Perhaps this is because I have email forwarded to my gmail account from several other accounts. And those accounts are probably not in the same spam batches as all the gmail recipients. So maybe I'm really lucky and get the latest spam before the rest of you gmailers, even without paying for a subscription!
Yes yes, of course I do my duty and mark those 0-day mails as spam, even though I don't seem to get anything out of it myself...
From TFCFP (call for participation):
Filters will be evaluated based on a weighted combination of the percentage of spam blocked and its false positive percentage.
From a theoretical standpoint, a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.
Take, for example, a message from a long-lost friend, whose current address isn't yet in your whitelist, and who would have no other way of contacting you should the message get spamboxed. Here's an example of a message that's important to a user but gets lost among the everyday messages when simply talking about the percentage of false positives.
There are lots of other examples, too -- if you run your own domain, your messages are likely to be spamboxed, etc. Furthermore, the lower the false-positive rate, the less likely a user is to actually *check* their spambox, thus making a single false-positive even worse.
Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM.
Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.
The data will be available to participants to do whatever post-hoc analysis they like. They may, for example, wish to classify misclassified mail into genres, as discussed here.
But if you're going to rank systems you need some sort of simple summary measure and the logistic mean of false positive and false negative rates works pretty well. Have a look at TREC 2005 or TREC 2006 summary results, for example.
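For concreteness, a small sketch of that summary measure, assuming "logistic mean" means averaging the two error rates on the log-odds (logit) scale and mapping the result back to a rate. The function names are my own; this is not the evaluation code used by TREC or the contest.

```python
import math


def logit(p):
    """Log-odds of a rate p, which must be strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a rate."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_mean(fp_rate, fn_rate):
    """Average the false-positive and false-negative rates on the logit scale."""
    return inv_logit((logit(fp_rate) + logit(fn_rate)) / 2.0)
```

A filter with a 0.1% false-positive rate and a 10% false-negative rate comes out at roughly 1%, so neither error type can be traded away to nothing without it showing up in the score. Rates of exactly 0 or 1 have no logit and would need special handling.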
Subject: Interesting phenomenon related to Viagra use
Hi, Dr. Smith-
I just wanted to write you to let you know that I really enjoyed the article you wrote in the New England Journal of Medicine about the side effects of Cialis, Viagra, and Levitra. It turns out a patient of mine experienced debilitating nausea while on Levitra, so I prescribed Viagra in its place, as you recommend.
In addition, I thought you might be interested to know that this patient suffers from Raynaud's disease, and he reported a 50% reduction in the frequency of his attacks after switching from Levitra to Viagra. Curious, I found an article in PubMed detailing this phenomenon and I thought I'd pass it along to you.
I hope your knee is healing up nicely, I'm sure you can't wait to get back on the tennis court.
Best Regards,
Dr. Gerald Jones
Then, I sent it from my business email to my personal email, which is protected by SpamAssassin. The results were as follows: I knew that the email would likely be delivered because, at the very least, the AutoWhiteList would knock the score down based on the low scores of previous messages. What I found to be pretty remarkable was that the Bayesian classifier scored the email at a 0 despite my gratuitous use of erectile drug names both in the subject as well as the body.
Conclusion: The work done by the SpamAssassin folks is top-notch and should be recognized as such. Thank you SpamAssassin for making my email bearable.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
I use both, and I have to say that greylisting catches a metric boatload of spam. On the other hand, spammers have wised up and many are now retrying.
Sure does take a lot of load off of spamassassin, though.
I agree, and was pondering how to exploit that fact. I couldn't think of a good answer, so I decided to just let the Bayesian classifier figure it out for me.
I use a routine that can quickly determine the origin country of an IP address and just insert that origin country into the headers of the message in an X- header. Then, it's just one more thing for the Bayesian classifier to decide what to do with. It realizes that I don't get much ham from Latvia, so when it sees X-Origin-Country: Latvia, that spam probability goes through the roof.
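A sketch of what that tagging step could look like, using Python's standard email package. The lookup table is a hypothetical stand-in for whatever IP-to-country routine is actually in use, and the function name is made up; only the X-Origin-Country header name comes from the description above.

```python
from email import message_from_string

# Hypothetical lookup table; a real setup would query an IP-to-country database.
COUNTRY_BY_IP = {"192.0.2.10": "Latvia"}


def tag_origin_country(raw_message: str, client_ip: str) -> str:
    """Return the message with an X-Origin-Country header for the connecting IP."""
    msg = message_from_string(raw_message)
    country = COUNTRY_BY_IP.get(client_ip, "Unknown")
    # Remove any copy already present so a sender cannot forge the value.
    del msg["X-Origin-Country"]
    msg["X-Origin-Country"] = country
    return msg.as_string()
```

The tagged message then goes to the Bayesian classifier as usual, which treats the header as one more token to learn from, provided the classifier is configured to look at headers.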