Live spam-catching contest at CEAS

CRM114 by sageFool · 2007-04-11 04:22 · Score: 4, Informative

http://crm114.sourceforge.net/ using hyperspace! It's been working better than spam assassin for me.

Re:CRM114 by Anonymous Coward · 2007-04-11 05:01 · Score: 1, Funny

Unlike many other "filters", CRM114's default action is to read all of input, and put NOTHING onto output.

This is either:

1) "automatic" white-listing?
2) Not healthy and you should eat more fibre.

Greylisting by Anonymous Coward · 2007-04-11 04:23 · Score: 0

My money would be on greylisting + RFC compliance checking except for the fact that those are very hard to do in a testbed.

My money by Mateo_LeFou · 2007-04-11 04:23 · Score: 1

is on whatever Gmail uses. I've not yet seen a spam message in my inbox, nor have I missed any mail, even from auto-mailing scripts at websites I'm building...

--
My turnips listen for the soft cry of your love

Re:My money by rodney+dill · 2007-04-11 04:29 · Score: 2, Funny

Well let's just find out, just what is your gmail address, hmmmm?

;)

--

Use your head, can't you, use your head,
You're on earth, there's no cure for that - S. Beckett
Re:My money by Anonymous Coward · 2007-04-11 04:40 · Score: 0

italiasw@gmail.com , it's go time!
Re:My money by 0100010001010011 · 2007-04-11 04:42 · Score: 2, Informative

Set up a catchall on your domain. You'll start getting stuff through. Especially the image ones. Some of the newer "make it look like a real e-mail" ones get through.

Every website I have gets its own e-mail account, e.g. slashdot@myhost.com.
One day I started getting spam to site@myhost.com. So I set up DreamHost to bounce everything sent to that e-mail address.

Then I started getting flooded with:
otehoenut-site@myhost.com
cgjwbmkh-site@myhost.com

Google has, thankfully, let me delete *site@myhost.com, but for a time I was still getting them.
Re:My money by hpavc · 2007-04-11 05:23 · Score: 1

The Google Gmail newsgroup says otherwise for many other people, and for me the filtering seems practically non-existent.

--
members are seeing something, your seeing an ad
Re:My money by Afecks · 2007-04-11 05:27 · Score: 1

My money is on whoever rigs up an Amazon Mechanical Turk-based system fast enough.

Because you'd really want thousands of random people reading your emails looking for spam?
Re:My money by thePowerOfGrayskull · 2007-04-11 05:53 · Score: 1

is on whatever Gmail uses. I've not yet seen a spam message in my inbox, nor have I missed any mail, even from auto-mailing scripts at websites I'm building...
I will agree that it's great for spam; but when it comes to 419 emails, it sucks. Badly. I'm not sure how I got on the 419ers lists, but I get at least 10-12 of them a day, none of which are caught by gmail filters. On the other hand, the 50-60 regular spam emails are correctly filtered. If only I could perform regex filtering in gmail, I could catch the 419 emails myself very easily, as they all have very common attributes.
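A minimal sketch of the kind of regex filtering wished for above. The patterns, the `looks_like_419` name, and the threshold are all assumptions made up for illustration; they are not taken from any real 419 corpus or from Gmail. Requiring several patterns to hit, instead of one, is meant to keep a single innocent word from flagging a message.

```python
import re

# Hypothetical patterns for phrases that advance-fee ("419") scams tend to reuse.
PATTERNS_419 = [
    re.compile(r"\b(?:next of kin|beneficiary|inheritance)\b", re.IGNORECASE),
    re.compile(r"\b\d[\d,.]*\s*million\b|\bmillion\s+(?:us\s+)?dollars\b", re.IGNORECASE),
    re.compile(r"\b(?:barrister|prince|late (?:husband|father|client))\b", re.IGNORECASE),
    re.compile(r"\b(?:strictly confidential|utmost secrecy|urgent assistance)\b", re.IGNORECASE),
]


def looks_like_419(text, threshold=2):
    """Return True when at least `threshold` distinct patterns match the text."""
    hits = sum(1 for pattern in PATTERNS_419 if pattern.search(text))
    return hits >= threshold


scam = ("I am Barrister John, attorney to my late client. He left "
        "18.5 million dollars and you are the next of kin.")
work = ("Can we move the budget meeting to Friday? The inheritance chart "
        "in the OOP slides needs work.")
print(looks_like_419(scam))  # True: three patterns match
print(looks_like_419(work))  # False: only "inheritance" matches
```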
Re:My money by gvc · 2007-04-11 05:59 · Score: 1

You're welcome to use Gmail -- or any other filter you like, animal, vegetable, or mineral -- to participate in the Live Challenge.
Re:My money by SL+Baur · 2007-04-11 06:50 · Score: 1

My bet would be on the gmail filter too. I've had my old xemacs.org email address (which has been harvested to death) forwarded through there for some months now. It's not perfect, but it still only lets through about as much spam as my old handcrafted .procmailrc did 8 or 9 years ago. Which is really good considering how much more spam there is today.

If I could tell it to junk everything except text in certain languages it would work even better. It seems to miss a lot of Korean and Russian spam.
Re:My money by martin-boundary · 2007-04-11 13:39 · Score: 1

That's not surprising. It's mathematically impossible for a single filter to classify emails correctly for a large group of people, because any large group is inconsistent. Someone believes X is spam, but another one truly believes X is not spam. Whatever the filter does, it's going to be wrong on one group of people. You're part of that crowd of Gmail's users.
You'll be much better off with a personal filter, that learns what you like, not what the majority of Gmail users like.

Sweeps by cyphercell · 2007-04-11 04:25 · Score: 2, Funny

This ought to be a sweeps week television spectacular.

I think I've seen people catching spam on tv, just not the kind you're talkin' 'bout. http://www.spam.com/

--
Under the influence of Post-Cyberpunk Gonzo Journalism

Re:Sweeps by session_start · 2007-04-11 06:19 · Score: 1

The trick is to try to catch the spam in a net with such velocity that the spam "squishes" through the net to fall on the ground, leaving you with only valid "message" hidden amongst the spam.

My money by TodMinuit · 2007-04-11 04:27 · Score: 1

My money is on whoever rigs up an Amazon Mechanical Turk-based system fast enough.

--
I wonder if I use bold in my signature, people will notice my posts.

Wonder what the SPAM messages are? by willie_nelsons_pigta · 2007-04-11 04:27 · Score: 0

Wonder what the SPAM messages are?
One of the funnier ones I think I ever got was from Oliver Kloshoff for "Male Enhancement".

What... by Anonymous Coward · 2007-04-11 04:27 · Score: 0

No department? Come on taco, you can keep up the tradition. A lame one is better than no one.

Re:What... by Anonymous Coward · 2007-04-11 05:42 · Score: 0

lolled!

Damn. by daeg · 2007-04-11 04:28 · Score: 0

Damn. I was hoping they'd be launching phone-book sized printed copies of spam at the contestants, complete with blood, with each week adding a few pounds. Add some half naked chicks and dudes (cater to multiple markets) dancing around, maybe some buckets of slime and you've got yourself a show worthy of running on Fox.

Group spam detection by Animats · 2007-04-11 04:32 · Score: 4, Informative

Gmail, like SpamCop, has a group spam filter system. It looks at mail sent to a large number of recipients. The defining characteristic of spam is that it's sent to a large number of recipients, after all. If you're in a position to watch the incoming mail of a few million mailboxes, detecting spam is easy.

Re:Group spam detection by ProfessionalCookie · 2007-04-11 04:41 · Score: 1

Yeah- I'm waiting to see algorithmically generated spam where no two messages are alike. Bleh! That being said gmail does a tremendous job of letting through legitimate messages (which is no doubt the hardest part of making a spam filter these days).
Re:Group spam detection by kebes · 2007-04-11 04:43 · Score: 5, Interesting

You're right--but the size of Gmail gives them another advantage. In those marginal cases where the spam filter isn't sure about an email (is this spam or a mailing list?) it has the advantage of having a huge number of people checking all the emails. That is, the users do the final check.

I have received a spam to my gmail account exactly once. And when I did, shocked, I clicked the "mark as spam" button. The point is that this spam was probably sent to millions of Gmail users, and the algorithm wasn't sure how to categorize it. But because I clicked "spam" (and probably a few other people did, too), it was marked as spam for everyone. So most users never saw it in their inbox. Thus only a dozen out of the million recipients were ever bothered by the spam. Conversely, an email list would receive no (or very few) "mark as spam" clicks, and would be allowed to pass. So basically the Gmail userbase acts as the workforce to continually train the spam filter, and moreover to detect new spam within minutes of it being sent.

It's hard to beat a system like that. But the point is that it relies on the large number of users who are all (effectively) sharing their spam training sets with each other in realtime.

This is not to say that the baseline algorithm that Gmail implements isn't quite effective, but the point is that Gmail can use the users to resolve those tricky false-positive and false-negative situations.
Re:Group spam detection by Anonymous Coward · 2007-04-11 04:56 · Score: 0

A lot of spam is generated algorithmically. The key, though, is that a spammer typically cannot send each email individually. It sends its spam to a relay, and the relay is the one that actually does the mass mailing. If a spammer were actually required to send each email that it sent, that would GREATLY increase its operating costs.
Re:Group spam detection by Animats · 2007-04-11 05:10 · Score: 1

Yeah- I'm waiting to see algorithmically generated spam where no two messages are alike.
We've had that for years. The latest variant is in those Viagra spams with a faint pattern of background noise in the images, different for each spam.
Re:Group spam detection by Anonymous Coward · 2007-04-11 05:27 · Score: 0

If it's so easy, then why hasn't Earthlink mastered it?
Re:Group spam detection by Anonymous Coward · 2007-04-11 05:30 · Score: 0

They use DCC?

Anyone can use DCC.
Re:Group spam detection by iminplaya · 2007-04-11 06:00 · Score: 1

This doesn't lead to the possibility that a group of users could mark a legitimate sender as a spammer? I think this is an old question, but I don't remember the answer. And if it is possible, how do you defend against it?

--
What?
Re:Group spam detection by Anonymous Coward · 2007-04-11 06:37 · Score: 0

The key, though, is that a spammer typically cannot send each email individually. It sends its spam to a relay, and the relay is the one that actually does the mass mailing. If a spammer were actually required to send each email that it sent, that would GREATLY increase its operating costs.

Huh? Have you been living in a cave? Most spam that my spam filter catches, even the same message to 40 different users in my domain, comes from 40 different addresses. Botnets baby. Get with the times man.
Re:Group spam detection by asninn · 2007-04-11 07:15 · Score: 1

Thus only a dozen out of the million recipients were ever bothered by the spam. Conversely, an email list would receive no (or very few) "mark as spam" clicks, and would be allowed to pass. So basically the Gmail userbase acts as the workforce to continually train the spam filter, and moreover to detect new spam within minutes of it being sent.

This probably plays a role, but it will not be the only thing GMail relies on (and probably not even the most important factor), and it will likely require more than a dozen people, too. Think about it - otherwise, a spammer could just set up twelve fake GMail accounts, send the spam message in question to those as well, and mark them as "Not Spam" there when the filter catches them after the dozen users you refer to tell the system that it's indeed spam.

Needless to say, this is probably still being done - and given that email is a pretty private matter, I don't think there's much webmail providers can do about it, either. After all, it's not like you can just have an employee look at someone's account after the system flagged it as "suspicious" to see if they are a legitimate user or not; doing so would be a rather crass invasion of people's privacy and their right to private communication.

So the spammers *are* doing it (why wouldn't they, after all?), and GMail etc. can't really do all that much about it - and therefore, the system will probably not depend on user input quite as much as you think.

--
butter the donkey
Re:Group spam detection by Matt+Perry · 2007-04-11 08:27 · Score: 1

I have received a spam to my gmail account exactly once.
I wish my Gmail account was like that. Maybe you're new to Gmail. I get several spams in my inbox per week. Mostly these are spam messages in Russian and Chinese but I still get a lot of spam in English as well. I always use the button to mark them as spam, but Gmail doesn't seem to get the message that I don't want anything written in Russian. It's also disappointing that I can't create a filter to mark messages as spam. The best I can do is catch emails with Russian or Chinese characters and filter them off to a folder where I later go and mark them as spam.

--
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
Re:Group spam detection by Anonymous Coward · 2007-04-11 10:11 · Score: 0

It's probably computed as a percentage of the people who see the spam. If >95% of the people who get it in their inbox mark it as spam, then it's spam. If it's 10%, then it's not (maybe it's a mailing list and some people mark it as spam, but not everyone does).

A spammer can't really beat this system. Even if he registers 100 GMail accounts and clicks "not spam" on all of them, this only gets him the ability to spam 100 other people... if he spams lots more people, eventually their "this is spam" clicks win.
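The percentage idea in the comment above can be sketched as follows. This is a guess at the logic, not Gmail's actual method. The 95% and 10% cut-offs come from the comment; the function name, the `min_votes` floor, and the choice to count explicit clicks rather than everyone who saw the message are assumptions for illustration.

```python
def classify_by_feedback(spam_clicks, not_spam_clicks,
                         spam_ratio=0.95, ham_ratio=0.10, min_votes=20):
    """Classify one message from user feedback counts.

    Returns "spam", "ham", or "undecided". All thresholds are illustrative.
    """
    votes = spam_clicks + not_spam_clicks
    if votes < min_votes:
        # Too little feedback to overrule the baseline filter.
        return "undecided"
    ratio = spam_clicks / votes
    if ratio > spam_ratio:
        return "spam"
    if ratio <= ham_ratio:
        return "ham"
    return "undecided"


# 100 fake accounts clicking "not spam" can hold off 100 real reports...
print(classify_by_feedback(spam_clicks=100, not_spam_clicks=100))    # undecided
# ...but are outvoted once the mailing reaches many more real users.
print(classify_by_feedback(spam_clicks=10000, not_spam_clicks=100))  # spam
```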
Re:Group spam detection by ProfessionalCookie · 2007-04-11 11:30 · Score: 1

Botnets. Now they're programmable!

Mateo_LeFou, prepare yourself... by Anonymous Coward · 2007-04-11 04:32 · Score: 0

Every email address variation of "Mateo_LeFou" is now being generated and gmail is now being bombarded using my army of hijacked PC's. It's just a matter of time. You will have 50GB of spam within the hour...

Re:Mateo_LeFou, prepare yourself... by Zephyros · 2007-04-11 05:05 · Score: 2, Funny

Translation: "You have no chance to survive. Make your time."

Curious:When urologists email each other... by dpbsmith · 2007-04-11 04:33 · Score: 4, Interesting

... are they able to refer to Pfizer's brand name for sildenafil, Lilly's name for tadalafil, or Bayer's brand name for vardenafil without getting caught in the spam filters?

--

"How to Do Nothing," kids activities, back in print!

Re:Curious:When urologists email each other... by kebes · 2007-04-11 04:53 · Score: 3, Informative

Suffice it to say that a doctor is likely to write an email like:

"Ted, I just read the news about Viagra in the New England Journal of Medicine. Very interesting results, though the error bars are a bit large to draw any major conclusions just yet. What do you think?"

Whereas a doctor rarely writes email like:

"NoW ava ilable is generic V1AGRA at low price! Generic, quality, all low price now!"

The point is that modern spam filters don't just look for "bad words" but consider relative word frequencies, the sender and receiver fields, word correlations, formatting elements, URLs, etc. Spam filters in your email client will be trained against email you typically send/receive, and so can be even more precise. Spammers of course try to make their emails include words so that they end up looking like real email, but if the filter is good enough, then the only way to get past it is to send an email that now lacks those critical spam elements (like the link you're supposed to click to buy the generic drug or whatever)...
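The "relative word frequencies" part of the comment above can be sketched with a tiny naive Bayes classifier. This is a toy under stated assumptions: the class name, tokenizer, and add-one smoothing are illustrative choices, the training data is just the two example messages from the comment, and real filters also weigh headers, URLs, and formatting as the comment says.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Toy word-frequency spam filter (illustrative, not any real product)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            # Log prior with add-one smoothing over the two classes.
            score = math.log((self.messages[label] + 1) / (total + 2))
            n_tokens = sum(self.counts[label].values())
            for token in tokenize(text):
                if token not in vocab:
                    continue  # words never seen in training carry no evidence
                # Add-one smoothed per-class token likelihood.
                score += math.log((self.counts[label][token] + 1)
                                  / (n_tokens + len(vocab)))
            scores[label] = score
        # Turn the two log scores into P(spam); clamp to avoid overflow.
        diff = max(min(scores["ham"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


nb = NaiveBayesFilter()
nb.train("NoW ava ilable is generic V1AGRA at low price! "
         "Generic, quality, all low price now!", "spam")
nb.train("Ted, I just read the news about Viagra in the New England Journal "
         "of Medicine. Very interesting results, though the error bars are a "
         "bit large to draw any major conclusions just yet. What do you think?",
         "ham")
print(nb.spam_probability("generic V1AGRA low price"))  # close to 1
print(nb.spam_probability("the error bars are large"))  # close to 0
```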
Re:Curious:When urologists email each other... by misleb · 2007-04-11 05:19 · Score: 1

Only if they write things like:

Hey, I just pre sc ribed V.1.4.G.R.A to a patient today.

The monk said to the fox, why don't the squirrels to be or not to be, that is my answer. The fog was as thick as umbrellas in the wind thought the old maid.

--
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
Re:Curious:When urologists email each other... by cgrayson · 2007-04-11 05:41 · Score: 1

See, er, listen to this hilarious Onion Radio News story from Feb. 8: Brilliant Scientist Trying To Get Word Out About Penis-Enlargement Breakthrough (warning: page may auto-play audio).

--
Cool funny t-shirts for geeks, gamers and everyone else
Re:Curious:When urologists email each other... by mutterc · 2007-04-11 07:15 · Score: 1

Happened with a lame spam filter my company used to have. This was a year or so ago.

I emailed my wife "can you stop by and pick up the Strattera and Effexor from the pharmacy?" once. Her reply, containing my message, got plonked by the filters.
Re:Curious:When urologists email each other... by Anonymous Coward · 2007-04-11 08:07 · Score: 0

I always assumed that spammers wrote "Hey, I just pre sc ribed V.1.4.G.R.A to a patient today." precisely because they couldn't write "Hey, I just prescribed Viagra to a patient today." without getting caught in the spam filters.

The mostly-grammatical Zen nonsense at the end would presumably improve the chances of being accepted (by lowering the Viagra-centrism of the message).
Re:Curious:When urologists email each other... by Atario · 2007-04-11 10:52 · Score: 1

... are they able to refer to Pfizer's brand name for sildenafil, Lilly's name for tadalafil, or Bayer's brand name for vardenafil without getting caught in the spam filters?
I would hope they use the real names and not the brand names.

--
"A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt

I wish the contest was.... by ruffnsc · 2007-04-11 04:35 · Score: 2, Interesting

physically catching the spammers! (your imagination can do the rest)

Re:I wish the contest was.... by HTH+NE1 · 2007-04-11 05:03 · Score: 1

I wish the contest was physically catching the spammers!
Only as long as it is not catch-and-release.

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?

Will SMTP server settings count as well? by Penguinisto · 2007-04-11 04:36 · Score: 1

...or just the filter software/daemon performance/stats alone? There's lots you can do to the MTA itself to stop spam before it even has to be examined by the filters (mostly by monkeying w/ the SMTP session handling and timeouts).

It'd be interesting to see a solid setup that handles a combination of the two, then publish the results (yes, spammers can read those results/settings to try to foil the setup, but many settings would make it patently unprofitable for them to do so).

/P

--
Quo usque tandem abutere, Nimbus, patientia nostra?

Re:Will SMTP server settings count as well? by gvc · 2007-04-11 06:08 · Score: 1

Envelope information will be preserved, so you can determine the purported sender, multiple recipients, HELO IP, actual IP, etc. But you can't play interactive games with the SMTP protocol because the same email must be delivered to all participants.
Re:Will SMTP server settings count as well? by pe1chl · 2007-04-11 06:20 · Score: 1

I agree. I filter the majority of spam by just doing strict RFC compliance testing in the SMTP engine. It rejects almost everything sent via botnets. What comes through is mostly 419 scamming, because that is sent via bona fide mail servers. But that is easily filtered with SpamAssassin.
Re:Will SMTP server settings count as well? by SCHecklerX · 2007-04-11 09:07 · Score: 1

That's my plan (I want to see how well my stuff works without customizing it too much just for the contest). Let's hope more details arrive soon...

The prize list :) by davidwr · 2007-04-11 04:37 · Score: 5, Funny

1st prize: Job offer from a security-software vendor
2nd prize: Lifetime supply of Hormel meat products
3rd prize: Commemorative tin of SPAM meat product
Last place: Inheritance from Nigerian Prince

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:The prize list :) by LearnToSpell · 2007-04-11 05:22 · Score: 1

2nd prize: Lifetime supply of Hormel meat products

Which is about 4 1/2 days if that's all you eat.

--
Haida Manga

that's easy. Yahoo mail! by number6x · 2007-04-11 04:38 · Score: 2, Funny

Just open a yahoo mail account, and start posting with the e-mail address all over the internet.

You'll catch more spam than anyone else!

Oh, you want me to filter out spam, not just get spam, nevermind.

Still, it might be the fastest way to build a database of spam.

Re:that's easy. Yahoo mail! by CrazyTalk · 2007-04-11 04:45 · Score: 1

Actually that's not a bad idea - have a contest to see how much spam you can ATTRACT with a fresh email account in a given time period. My Verizon account would win hands down. (And to you spammers out there - no, my email address is NOT CrazyTalk@verizon.net)
Re:that's easy. Yahoo mail! by Kozar_The_Malignant · 2007-04-11 05:43 · Score: 1

Actually that's not a bad idea - have a contest to see how much spam you can ATTRACT with a fresh email account in a given time period. My Verizon account would win hands down. (And to you spammers out there - no, my email address is NOT CrazyTalk@verizon.net)
The poor bastard who actually does have CrazyTalk@verizon.net is really, really pissed about now.

--
Some mornings it's hardly worth chewing through the restraints to get out of bed.

Professional spammers in attendance? by MobyDisk · 2007-04-11 04:40 · Score: 4, Interesting

I wonder if professional spammers will attend the conference to learn how to get through the next generation of filters. Maybe it would be like playing spot the Fed at the hacker's conferences.

SpamAssassin? by raddan · 2007-04-11 04:41 · Score: 3, Interesting

Ha ha, silly admin. My money's on greylisting.

We use both SpamAssassin and OpenBSD's spamd, to great effect. spamd does most of the work, though. Daniel Hartmeier (site down ATM, unfortunately) has an example of how to tie SA scores back into spamd for blacklisting, which is just awesome. I'd implement it here, but our current setup is effective enough as to not make it worth my time.

Re:SpamAssassin? by Anonymous Coward · 2007-04-11 07:37 · Score: 0

I do all this using MIMEDefang tied to a MySQL database. High SpamAssassin scores or viruses get the sending IP blacklisted for an increasing amount of time per incident.

Despite predictions of gloom and doom about greylisting and other SMTP-validity checks (legal argument to HELO, sendmail greet-pause, etc.), they still do a great job stopping a lot of spam before the data phase. Add in SA with auto-update rules, and only about 1.5% of e-mail delivered to my domain is spam, with only about 40% of attempted e-mail ever needing to be scanned by SpamAssassin...the other 60% is junked before that.

Oh, yeah, I also don't use any DNS-based blocking lists like SpamCop, XBL, Spamhaus, etc., because of too many false positives, slow DNS queries, and no local control over what I was blocking.
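The "blacklisted for an increasing amount of time per incident" policy described above might look roughly like this. It is only a sketch: the commenter's setup uses MIMEDefang and MySQL, while this uses an in-memory dict, and the class name, the doubling factor, and the one-hour base are assumptions.

```python
import time


class EscalatingBlacklist:
    """Block an IP for longer each time it offends (illustrative policy)."""

    def __init__(self, base_seconds=3600, factor=2, max_seconds=30 * 24 * 3600):
        self.base_seconds = base_seconds
        self.factor = factor
        self.max_seconds = max_seconds
        self._entries = {}  # ip -> (incident_count, blocked_until)

    def report(self, ip, now=None):
        """Record an incident and return the block duration in seconds."""
        now = time.time() if now is None else now
        count, _ = self._entries.get(ip, (0, 0.0))
        count += 1
        duration = min(self.base_seconds * self.factor ** (count - 1),
                       self.max_seconds)
        self._entries[ip] = (count, now + duration)
        return duration

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(ip)
        return entry is not None and now < entry[1]


bl = EscalatingBlacklist()
print(bl.report("192.0.2.1", now=0))         # 3600: first incident, one hour
print(bl.is_blocked("192.0.2.1", now=1800))  # True
print(bl.report("192.0.2.1", now=4000))      # 7200: second incident, doubled
```

In a real deployment the table would live in a shared database so that every mail host sees the same blocks.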
Re:SpamAssassin? by Sentry21 · 2007-04-11 10:58 · Score: 1

I'll second greylisting. I set up a new mail system on our mail server last year to replace our crufty and pathetic qmail installation. I started with RBLs in postfix and spamassassin/clamav via amavisd, and that was all well and good. A week after adding in greylisting, however, I took out spamassassin filtering by default (users can still enable it on a per-account basis). The reason? RBLs block out the most prolific hosts, and greylisting blocks the vast majority of everything else. The only mail that was ever hitting spamassassin was all legitimate (and was slowing the server down immensely).

SpamAssassin may win this contest or it may not, but either way, I don't need it anymore. Sorry guys.
Re:SpamAssassin? by tacocat · 2007-04-11 13:21 · Score: 1

I'm not that impressed with SpamAssassin. Too much overhead in trying to keep all the static filtering rules up to date. Eventually, it gets dumb.

The best spam filters I've seen in terms of effectiveness are bogofilter and dspam. Both of these are extensions of the Bayes statistical filtering.

bogofilter is awesome but it can't manage tokens from a database. Hence you can't have multiple machines very easily and users cannot share a database. Virtual hosting makes it harder and eventually you kind of get frustrated with it if trying to deploy a large number of users with user-specific token lists.

dspam does a nice job of per-user token lists but it loses email. I found a lot of cases where mail checked in via the postfix logs, but it never came out of dspam or was recorded anywhere as an error or anything. It just got lost. It's a nice approach but scores 0.0 for reliability.

If only bogofilter would run with postgresql, I would be happy.

West Virginia by ehaggis · 2007-04-11 04:42 · Score: 0

Back in West Virginia we'all used to go spam catchin' every weekend while they was in season! Them spam made good eatin'.

--
One ring to bind them - should probably have more fiber and less rings in their diet.

Re:West Virginia by UnknowingFool · 2007-04-11 04:45 · Score: 2, Funny

Back in West Virginia we'all used to go spam catchin' every weekend while they was in season! Them spam made good eatin'.

Don't lie. You and your buddies got drunk and would go spam tipping. There was no hunting involved.

--
Well, there's spam egg sausage and spam, that's not got much spam in it.

Fair Contest? by Anonymous Coward · 2007-04-11 04:42 · Score: 0

Could this actually be a fair contest though?

The first thing that came to my mind was; are they using scripts to send out "legit" emails to everyone. Is there someone going through legit domains with legit accounts typing/copy-pasting legit letters and sending INDIVIDUALLY to EACH contestant?

The amount of variations of LEGIT e-mails varies about as much as SPAM e-mails. So how do they plan on rigging up sending LEGIT e-mails on a massive competitive level in ALL variations?

Re:Fair Contest? by Anonymous Coward · 2007-04-11 05:15 · Score: 0

yes
Re:Fair Contest? by TFGeditor · 2007-04-11 08:06 · Score: 1

I was wondering how the test-spam generator will handle headers, especially origin IP address. That alone is often 75 percent accurate in determining spam. If it sources from an IP in Korea, South America, or Europe and is destined for a North American inbox, odds are it is spam.

Not flaming, just an observation based on my own experiences.

--
Ignorance is curable, stupid is forever.
Re:Fair Contest? by HomelessInLaJolla · 2007-04-11 12:07 · Score: 1

especially origin IP address
Gmail, and possibly other new webmail services, no longer include the X-Originating-IP field and actually go the opposite route--all e-mail I receive from gmail accounts appears to originate from an internal 10. IP address.

I cannot possibly come up with any viable justification for this. I can think of plenty of excuses and all of them rely on idiotic fallacies.

--
the NPG electrode was replaced with carbon blac

My entry: Human computers by davidwr · 2007-04-11 04:42 · Score: 1

I'm going to take a page from the Veruca Salt needle-in-a-haystack problem and outsource this to a million peasants in India.

To pay for it I'll be spamming the world with my stock pump-and-dump scheme.

This just in: DAVI (OTC) NOW $0.02 TARGET $0.25!

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

New packaging? by davmoo · 2007-04-11 04:44 · Score: 2, Funny

A torrent of spam? It doesn't come in cans anymore?!

The cans were so much easier to catch, too.

--
I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.

Spam Rage Rampage by Dekortage · 2007-04-11 04:44 · Score: 1

A couple of years ago, I wrote a prototype for a video game called "Spam Rage Rampage" -- a first-person shooter where you roamed a Tron-like world, killing spam zombies and rescuing real people (== legitimate mail) while you searched for clues to the location of the nefarious spam kingpin, Ospama Bin Sendin. Each zombie represented a different class of spam... prostitute zombies for porn, business-suited zombies for stocks, pharmacist zombies for pill ads, etc.

Upon seeing a demo, one of my friends commented that I should hook it up to a real e-mail inbox, so you could kill your own spam messages, perhaps even in real time. Unfortunately I have never had the time to complete it... maybe after the kids are out of the house.

--
$nice = $webHosting + $domainNames + $sslCerts

The First Annual Greased Spammer Contest! by Penguinisto · 2007-04-11 04:47 · Score: 4, Funny

(cue Monster Truck Rally announcer guy voice...) THIS SATURDAY AT THE EXPO CENTER! The Best admins and the worst spammers come together in a throwdown-showdown-lowdown Greased Spammer Contest! We kidnap, strip, and grease down every known spammer we can find on Planet Earth! We bring 'em here, then we give our lucky mail server admins (as determined by lottery) a chance to catch 'em! The spammers will be released into a large pit, where the admins may use any method to catch and immobilize spammers (firearms and other projectile weapons are excluded). Points will be given for the number of spammers caught, the methods of capture, and the level of eye-rattling violence applied to each spammer after their capture! Watch as the winning admin gets to publicly execute the dreaded Sanford Wallace by any method that he or she can dream up! Any method at all! You'll buy a ticket for the whole seat, but you will only need the edge! Get your tickets at the Mondotix - DON'T MISS IT!(/voice)

/P

--
Quo usque tandem abutere, Nimbus, patientia nostra?

Re:The First Annual Greased Spammer Contest! by NewbieV · 2007-04-11 07:48 · Score: 1

You forgot to mention that it's being held on

SUNDAY! SUNDAY! SUNDAY!

Be There!

--

"For every right, an equal responsibility..."

Greylisting? by schmiddy · 2007-04-11 04:49 · Score: 2, Insightful

I can't help but wonder how realistic this scenario is.. They're basically going to have a single server dumping a whole ton of spam at your filtering package, and you're supposed to be able to filter on.. what, just the content of the messages? Real world techniques use many more subtle hacks, such as greylisting, or actually looking at the domains the messages are coming from. If their server is going to be dumping millions of messages at you in a short amount of time, I don't think they'll let you use greylisting or similar techniques.

--
http://cltracker.net -- powerful craigslist multi-city search

Re:Greylisting? by blhack · 2007-04-11 06:12 · Score: 1

No. They give the nerds an email address, then reverse the web filter so that it ONLY allows them to go to porn sites.

after a few minutes their email servers should reach critical mass.

--
NewslilySocial News. No lolcats allowed.
Re:Greylisting? by martin-boundary · 2007-04-11 13:57 · Score: 1

Read the rules. You can use any technique you like, you're getting each message delivered to you in real time transparently as if you were hooked to the net yourself. If you need POP, you get it, if you need SMTP, you get it. You can use external RBLs if you like, you can use a commercial filter from work (just pipe the data you receive through the work filter and report the result, assuming you have permission of course) etc. Even greylisting shouldn't be an issue in principle.
Just pretend you're an admin who is told to fix the spam problem with the mail server.

email migrating to pmail (permission mail) by drDugan · 2007-04-11 04:52 · Score: 0, Redundant

I think email in its current form will eventually die. There is no way with increased information transparency that a global network of email will continue to function efficiently. Simply too many senders and too much spam.

It could work better if we migrate to an invite-only system on top of email (extending the email-related RFCs) -- one where mail delivery only occurs to individuals from those who hold a key (the public half of a keypair between the two people).

Such a migration will require minimal additional functionality by both existing email clients and servers. I wrote up some thoughts on this idea here http://biocontact.org/pmail/ but I've received no response.

That depends upon the method used. by khasim · 2007-04-11 04:55 · Score: 1

Pure content scanning would probably trigger those ... unless you had previously manually approved similar messages.

Other approaches use multiple tests such as checking whether the sending server's IP address is on a blacklist or whether any of the links in the message (should it contain links) were on blacklists.

Gmail's filtering is not that great by winkydink · 2007-04-11 04:59 · Score: 1

Try slutting your address around a bit. Mine is only publicly readable here on /. and I get plenty of spam in my gmail inbox. Yahoo seems to do a better job based on my experience.

--

"I'd rather be a lightning rod than a seismometer." -Ken Kesey

Re:Gmail's filtering is not that great by jfengel · 2007-04-11 06:09 · Score: 1

Huh. I'm using GMail to host my domain. My email addresses are pretty slutty (a combination of supporting the catchall, some public "info@" addresses that get forwarded to me, and a few mailing lists with lousy privacy or security policies.)

I do see perhaps three spams a day that actually make it into the inbox, and about 300 or so that are shunted to the spam folder.

There may be false positives in there, but with 300 per day I'm not going to find out. I've never noticed one in there, or had a friend tell me about an email that never reached me.

Greylisting no longer works by Tipa · 2007-04-11 05:00 · Score: 1

Greylisting was designed on the single proposition that spam mailers wouldn't "call back" if they got a "call back later" code from the site they were spamming. And maybe that was true for a while. In my last job I had to add spam filtering to our email and greylisting was one of the first things I tried.

The spammers just kept trying until they got through.

Spamming has evolved past greylisting and it is now worthless.

Bayesian keyword filtering is decent, but is constantly attacked by images or hiding the spam content in random text. If you train it well, you can eventually just pass through the sort of mail you normally get and spam that doesn't mesh with your normal mail might get blocked, but when you take this to a company level it fails unless everyone separates their spam from their real mail and makes appropriate filters and rules -- which they won't do.
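For anyone who hasn't seen one, a Bayesian keyword filter can be sketched in a few lines of Python. This is a toy under stated assumptions, not any particular product: the class name `TinyBayes`, the tokenizer, and the add-one smoothing are all choices made for the example, and it only scores tokens that are present in the message.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy naive-Bayes scorer over the set of tokens present in a message."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Lowercased words and digits; each token counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        self.docs[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_log_odds(self, text):
        # Positive means "looks like spam", negative means "looks like ham".
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch on monday?", "ham")
```

The sketch also shows why the attacks mentioned above work: text inside an image produces no tokens at all, and padding a message with random ordinary words adds ham-leaning tokens that pull the score down.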

It's a tough problem, and there's no one solution that can do the whole job. A well-trained Bayesian along with an RBL like Spamcop can get about 80% of them.

Re:Greylisting no longer works by LurkerXXX · 2007-04-11 05:16 · Score: 1

Graylisting is worthless? Umm, no.

It's certainly not perfect, but it reduces the load on my spam-filter. A *lot*. More than 90% of SMTP connections don't make it through spamd here. I hardly call that worthless.

Last year it was more like 99+%. Here's some stats from someone else last year: http://undeadly.org/cgi?action=article&sid=20060217105149
Re:Greylisting no longer works by raddan · 2007-04-11 05:44 · Score: 2, Interesting

It doesn't work? Maybe you should tell that to my 300-strong userbase!

I'm certain that there are differences in implementation between different greylisters. I've never tried Postfix's, for example, because OpenBSD's works fine for me. A small point wrt OpenBSD's spamd: you actually need to try thrice. The first time you're rejected. The second time you're marked as OK, but still rejected. The third time you get through. Maybe it's the third time, or some of the time limits, or some other things that spamd is doing (BTW, we do not use *any* blacklists), but it works great. I probably see a spam in my inbox once a month, maybe. The rest of my users who complain about the "spam" they're still getting are really getting email they've signed up for (listservs aren't spam, people!), in which case, it's usually just a simple matter of education.
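The "try thrice" behavior described above can be sketched as a count-based greylist. This Python sketch follows the description in this comment only; it is not spamd's actual logic, it leaves out the time limits mentioned above, and the class name, the `(ip, sender, recipient)` key, and the reply strings are assumptions for the example.

```python
from collections import defaultdict


class CountGreylist:
    """Temp-fail a delivery until it has been attempted a set number of times."""

    def __init__(self, passes_on_attempt=3):
        self.passes_on_attempt = passes_on_attempt
        self.attempts = defaultdict(int)  # (ip, sender, recipient) -> tries so far

    def check(self, ip, sender, recipient):
        key = (ip, sender, recipient)
        self.attempts[key] += 1
        if self.attempts[key] >= self.passes_on_attempt:
            return "250 OK"
        # A 4xx reply tells a well-behaved mailer to queue the message and retry.
        return "451 try again later"
```

With the default setting the first and second attempts get a temporary failure and the third is accepted, so a sender that never retries is never let in.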

I don't know where your greylisting system failed, but it works wonders for us. When I implemented it, I was a sysadmin rock star for a week. Who knew there were anti-spam groupies? Now it's back to picking the crud out of the VP's keyboard ;^)

(You're spot-on about one thing though: defense in depth. That principle is in effect for EVERYTHING, which is why I want to administer electric shocks to our Mac users when they try to call the Help Desk.)
Re:Greylisting no longer works by Tipa · 2007-04-12 00:15 · Score: 1

Well, maybe I should have rejected them twice instead of just once. But I looked at those logs every day, and they would just call back. I was hoping for a slam-dunk when I added greylisting and didn't get that. Spamcop and other RBLs were the magic bullet -- they'd get nearly everything, and SpamAssassin would get most of the rest.

I didn't think of rejecting twice. So maybe that's the root of my issues with greylisting.
Re:Greylisting no longer works by Slashdot+Parent · 2007-04-12 03:35 · Score: 1

I made my own greylisting implementation because none of the ones I found did exactly what I wanted.

Mine is time-based, not rejection count based. In other words, if your IP isn't whitelisted, I do some tests on your IP to see how long you have to wait to get through.

First, I try to do a reverse DNS lookup on your IP. No result means I don't like your IP.

Then, I look to see if I can find your IP address anywhere in the reverse-DNS result (indicating a dynamic IP). If I find it forwards or backwards, I don't like your IP.

Then I look to see if your IP is based in North America. If not, you guessed it. I don't like your IP.

Then, as a last resort, I run rblcheck on your IP to see if your IP is listed in their default set of RBLs (DUL, SC, etc.). Listed? Then I don't like your IP.

Ok, now that I know if I like your IP or not, I can determine how long you must sit on the greylist until I let you through. If I like your IP, you have to wait 10 seconds (you can retry 1000 times, but you won't get through until 10 seconds have elapsed).

On the other hand, if I do not like your IP, then you have to wait 60 minutes. Again, you can retry as many times as you want during those 60 minutes, but you won't get through until 60 minutes have gone by. This way, the RBLs have 60 minutes to get your IP onto their lists if you're a spammer, and then SpamAssassin makes quick work of your email if you are spamming.

I think greylisting is a great tool, but only if you use it properly.
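The scheme above can be sketched roughly as follows in Python. This is not the poster's code: the function and class names are invented for illustration, and the reverse-DNS, geolocation, and rblcheck results are passed in as plain values rather than looked up. Only the 10-second and 60-minute waits come from the description.

```python
import time


def ip_in_hostname(ip, hostname):
    """True if the IP's octets appear in the rDNS name, forwards or backwards."""
    octets = ip.split(".")
    for sep in (".", "-"):
        if sep.join(octets) in hostname or sep.join(reversed(octets)) in hostname:
            return True
    return False


def disliked(ip, rdns_name, in_north_america, rbl_listed):
    """Apply the four tests in order; any failure means the IP is disliked."""
    if not rdns_name:
        return True  # no reverse DNS result
    if ip_in_hostname(ip, rdns_name):
        return True  # looks like a dynamic address
    if not in_north_america:
        return True
    return rbl_listed


class TimedGreylist:
    """Let a sender through only after its waiting period has elapsed."""

    def __init__(self, short_wait=10, long_wait=3600, clock=time.time):
        self.short_wait = short_wait
        self.long_wait = long_wait
        self.clock = clock
        self.first_seen = {}  # ip -> time of first attempt

    def check(self, ip, is_disliked):
        now = self.clock()
        first = self.first_seen.setdefault(ip, now)
        wait = self.long_wait if is_disliked else self.short_wait
        return (now - first) >= wait  # retries before then are all refused
```

Note that the substring match in `ip_in_hostname` is deliberately crude and can misfire on unrelated digits, and that a real version would also need the whitelist mentioned above plus some way to expire old entries.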

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

I can't tell from the write up. by khasim · 2007-04-11 05:15 · Score: 1

But I doubt that they have a hundred thousand systems that they'll be using to send the test spam.

A big part of the system I use at work is based upon IP addresses and rDNS. I block a HUGE amount of spam just by rejecting all connections from Comcast that aren't from their SMTP servers.

I know, some people want to run SMTP servers at home. But so far none of them have attempted to send email to my system.

So it really depends upon how they configure the test spam servers. Personally, I don't see this as being a very useful competition. But I may be wrong.

Boring. by bmo · 2007-04-11 05:16 · Score: 2, Funny

Couldn't we just have a contest where actual live spammers are fed to lions?

To quote Bill Mattocks...

"My sense of personal integrity is none of your concern."
-thus spake Walt "Pickle Jar" Rines
"I'm going to pound your balls flat with a wooden mallet."
-thus respondeth Bill Mattocks

Re:Boring. by Anne+Thwacks · 2007-04-11 05:23 · Score: 1

Mod parent up +10 Wonderful Idea

--
Sent from my ASR33 using ASCII

Cruel and inhumane by Anonymous Coward · 2007-04-11 05:26 · Score: 0

I certainly hope that after this senseless hunt, they'll re-release the poor SPAMs back into the wild where they belong.

Kobayashi Maru by Kozar_The_Malignant · 2007-04-11 05:40 · Score: 2, Funny

Find a creative and unique solution (cheat):

Hunt through CEAS conference hall
Find contest spammers
Drag spammers back to contest area
Spammers are beaten to death by audience
Win!!!
...Oh, wait, they weren't real spammers?
Sorry

--
Some mornings it's hardly worth chewing through the restraints to get out of bed.

CEAS Call for Participation by gvc · 2007-04-11 05:40 · Score: 1

Many of the questions asked here are answered in the Challenge Call for Participation

Or the overview talk that Rich Segal gave at the MIT Spam Conference.

The guidelines are scheduled to be finalized May 1.

Re:CEAS Call for Participation by SL+Baur · 2007-04-11 07:29 · Score: 1

Participants will compete in filtering a live 24-hour e-mail stream
Looks like greylisting is acceptable.

Simulated user-feedback will be provided to train learning-based filters.
And it looks like gmail-type filters are acceptable.

Good job guys. The results will be interesting to read.

Agile and evolutionary versus ergodic spam by goombah99 · 2007-04-11 05:41 · Score: 2, Insightful

The trouble I can see with a test like this is that it's a static test. It assumes a key feature of spam which is not true, namely that the spam signature is constant over time, or at least it makes an ergodic assumption. The thing about spam is that it is evolutionary. Not only does its signature vary, but the spammers learn what is getting through and shift to sending more of that flavor.

To see why this matters, consider two hypothetical spam filters. One blocks 99% of the test set spam but lets a particular form of spam comprising only 1% of the test set through. Contrast this with another program that is adaptive but, to avoid false positives, has to err on the side of letting through 20% of the spam (making it only 80% effective).

While the former method would smoke the latter in a static trial, in the real world spammers would just shift to exclusively sending the kind of spam that gets through the first filter.

To make this a real contest they should make it adversarial. Give the spam script a feedback signal on which spam is getting through and let it adjust its mix of spam and chaff to try to maximize the rate at which it can push spam through (or bust the filter by chaffing to minimize the number of legit e-mails that survive).
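To put rough numbers on the argument above, here is a small Python sketch. The two spam "flavors" and the pass rates are the hypothetical figures from this comment (a 99%/1% mix, and an adaptive filter that lets 20% through); the function names and the idea that the spammer reweights the mix in proportion to what got through are assumptions for illustration.

```python
def overall_pass_rate(mix, pass_rate):
    """Fraction of all spam sent that reaches the inbox."""
    return sum(mix[kind] * pass_rate[kind] for kind in mix)


def spammer_shift(mix, pass_rate):
    """Reweight the spam mix toward whatever got through last round."""
    delivered = {kind: mix[kind] * pass_rate[kind] for kind in mix}
    total = sum(delivered.values())
    return {kind: delivered[kind] / total for kind in mix}


mix = {"common": 0.99, "evasive": 0.01}
static_filter = {"common": 0.0, "evasive": 1.0}    # blocks 99% of the test set
adaptive_filter = {"common": 0.2, "evasive": 0.2}  # lets 20% of everything through

static_before = overall_pass_rate(mix, static_filter)
static_after = overall_pass_rate(spammer_shift(mix, static_filter), static_filter)
adaptive_after = overall_pass_rate(spammer_shift(mix, adaptive_filter), adaptive_filter)
```

Under these assumptions the static filter goes from passing 1% of spam to passing all of it after a single round of feedback, while the adaptive filter stays at 20% because no flavor does better than any other against it.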

--
Some drink at the fountain of knowledge. Others just gargle.

Re:Agile and evolutionary versus ergodic spam by gvc · 2007-04-11 05:55 · Score: 1

The trouble I can see with a test like this is that it's a static test.

No it isn't. Hence the name Live Spam Challenge.
Re:Agile and evolutionary versus ergodic spam by goombah99 · 2007-04-11 06:05 · Score: 1

No, you are mistaken, I believe. The term "live" is meant in the sense of real-time, sequentially delivered spam: an on-line test, not a test where one has the entire corpus of spam to train and filter. But the spam signature waveform is, unless I'm wrong, not going to be reactive to the filters. I'd even bet that all filters will be delivered the same message sets for ease of comparison. I doubt the spam will evolve its signature in an intelligent reactive manner to evade the filter. But that's the hallmark of real spam--it not only varies but it adapts.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:Agile and evolutionary versus ergodic spam by gvc · 2007-04-11 06:42 · Score: 1

I meant live to mean that the spam was captured and delivered in real time. If one or more spam filters adds the spam to Razor, or an RBL, or whatever, that'll be observable -- by spammers and filters alike.
Re:Agile and evolutionary versus ergodic spam by martin-boundary · 2007-04-11 13:26 · Score: 1

This contest is testing filters on a live short window of time. What you want has already been done many times in the past (look up the work done by NIST for example).
In the past, filters have been tested on spam data collected over literally a year or more, which captures the natural variation of the spam stream. Note that in these tests, filters aren't given the full dataset immediately, they have to learn the new spam patterns as the test progresses. That's what you're talking about, and it's been done (your other idea of giving direct feedback to a spam source on what works and what doesn't is meaningless, as real world spammers don't get feedback from individual filters either).
Re:Agile and evolutionary versus ergodic spam by goombah99 · 2007-04-11 17:17 · Score: 1

Everything you say is completely wrong.

This contest is testing filters on a live short window of time. What you want has already been done many times in the past (look up the work done by NIST for example).
I'm sorry, but you have utterly misunderstood what I was saying, or you don't understand the reference you linked to. The reference you link to is an on-line tracking filter for spam. The spam itself can vary or not, but it is not co-evolving in response to the filter itself, which is what real spam does.

In the past, filters have been tested on spam data collected over literally a year or more, which captures the natural variation of the spam stream.
Now I'm certain you don't understand the difference between spam varying and spam co-evolving. In simple terms, the first is game theory when your opponent does not change his strategy in response to yours. The second is game theory when the opponent adapts to changes in your strategy.

your other idea of giving direct feedback to a spam source on what works and what doesn't is meaningless, as real world spammers don't get feedback from individual filters either
No, that's not even wrong. Just about all spammers do is see what works. They stop using strategies that no longer work. It's not hard at all for them to test what is working. Three techniques:
1) look at the response rate to the ad as it varies with modality of the spam delivery
2) include a tracking gif. A certain fraction of people have html mail so you get a response.
3) open a gmail account and spam yourself to see what gets through.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:Agile and evolutionary versus ergodic spam by martin-boundary · 2007-04-11 19:09 · Score: 1

I understand perfectly your point and simply disagree.

The spam itself can vary or not, but it is not co-evolving in response to the filter itself which is what real spam does.
There is no such thing as realtime coevolving spam in response to the filter. A filter doesn't give feedback to a spammer. There is no direct information path from the decision taken by a filter and the subsequent decisions taken by spammers on future spam campaigns. To believe there is is like believing in the tooth fairy.
The best you can argue along those lines is that feedback occurs indirectly over weeks and months, through market forces. That's true to some extent, but is meaningless in a real world filtering test, because no single organisation operating a spam filter controls those market forces which might influence the spammers.
The other thing you can argue is that spammers might install standard filters and obtain feedback through test runs in that way. While this does occur, it's trivial that if you give the public an identical copy of your filter plus settings, anyone can eventually bypass it. What is there to test?

Now I'm certain you don't understand the difference between spam varying and spam co-evolving. In simple terms the first is game theory when your opponent does not change his strategy in response to yours. The second is game theory when the opponent adapts to changes in your strategy.
Once again, you have the wrong premise. There is no direct adaptation path from filters back to spammers that might be worth adding into a test of single filters. It's all indirect market forces on a global scale which, while interesting, has no relevance to testing the performance of a single filter against another.

Just about all spammers do is see what works. They stop using strategies that no longer work.
That's incorrect. The cost is so small to the spammer that it's perfectly acceptable to continue to do things which no longer work. And if you look at spam message corpora, you'll see that while new techniques do appear, the old techniques don't actually disappear, they are just diluted in the mass of extra messages.

It's not hard at all for them to test what is working. three techniques 1) look at the response rate to the ad as it varies with modality of the spam delivery 2) include a tracking gif. A certain fraction of people have html mail so you get a response. 3) open a gmail account and spam yourself to see what gets through.
Right.
1) is related to global market forces, good luck deducing from this how well a given filter is blocking the spam arriving towards it. It's certainly not a basis for the scientific comparison of two filters.
2) If you need to assume that security holes are available in a filtering setup, how will this lead to a meaningful test of two filters? Tracking gifs are in the same category as client side scripting security issues, btw.
Let's say you model a filtering test with 20% security holes so that you can claim that spam will adapt over time in response to the tested filter(s). How is this interesting? Will you report that filter A has half the error rate of filter B as long as the organisation deploying it makes sure that they have 20% security holes to allow the spammers to adapt to their filter? I don't know who might be interested in such a study.
3) There are really two extreme cases (and weighted combinations thereof). Either the spammer's account on gmail is personalized, in which case having an account there is useless for direct feedback purposes, or gmail uses the same filter for everybody, which is frankly a stupid thing to do.
Is it interesting to test filters with the modeling assumption that one or more users are spies for the spammers? That's pretty unconvincing for a generic filtering test. It might have value if yo

On ESPN... by vjmurphy · 2007-04-11 05:42 · Score: 1

"This ought to be a sweeps week television spectacular."

Is there an ESPN 6 or 7 cable channel? I'm thinking this is below Cheerleading and Dog Agility, but perhaps above Lumberjack competitions.

--
Vincent J. Murphy
Spandex Justice

Isn't this already on TV? by Minwee · 2007-04-11 05:43 · Score: 2, Funny

"This ought to be a sweeps week television spectacular."

I think that it already is, but it's only on in Japan and uses real SPAM.

Visions of tennis ball machine gone.... by zippoiii · 2007-04-11 05:45 · Score: 1

Sigh. And I had such hopes. Pictures of a team of people, with a spam and tennis ball loaded tennis ball launcher at the other end of a court. When something gets fired at you, determine if you should let the ball go by, or whack the spam from the air. Alas, it's not to be. Dan

Flawed by lazarus · 2007-04-11 06:02 · Score: 2, Informative

"This ought to be a sweeps week television spectacular."
This ought to be ignored as the contest is flawed.

"Ha ha, silly admin. My money's on greylisting."
They're sending a stream of spam from where? Sounds like a real mail server...

From TFA: "Live email stream, delivered by standard protocols (SMTP, IMAP, POP)"
[One wonders how else they would deliver e-mail if it was not from standard protocols. I also wonder how they plan on delivering e-mail using POP... The mind boggles...]

In any case if I read this correctly this effectively eliminates anti-spam technologies which work on the premise that the spam is coming from illegitimate mail servers. One of these techniques is greylisting. Meaning, greylisting will not work. So if I were you, I wouldn't put your money on it.

GENERAL JUNK E-MAIL FILTERING RANT (You've been warned): If you're using an anti-spam technique which takes more cpu cycles to execute than it takes for the spammer to send the damn spam in the first place, you've already lost this war. In other words, as long as it's costing you more than it is costing him/her you will always be on the losing end of the deal.

And I would like to add that despite my post above, I agree with you that greylisting and its derivatives when properly deployed are excellent techniques for eliminating UBE. But I think this contest is engineered to ignore that fact.

--
I am not interested in articles about life extension advancements.

Re:Flawed by gvc · 2007-04-11 06:30 · Score: 3, Interesting

So here's the issue. If you are going to try to discriminate among filters using several thousand messages, you have to send them all the same messages. To send them the same messages you have to capture and redistribute them. You can pass on all the info from the capture, including all SMTP commands, but you can't do intrusive protocol probes. And since this is *real spam* you can't very well ask the sender to act in an obliging way by repeating its message and behavior for each participant.

I'd be very interested to hear of a design that would allow greylisting to be tested. The best I can come up with is to fail the message after transmission, then to try to simulate the behavior of the sender in response to this failure. But that would be catering to one very specific method of perturbing the protocol. And it would be necessary to do a fair amount of work to spoof the IP address presented to the participant filters.

For this reason, we chose to exclude all SMTP interactions, and simulate a second-in-the-chain filter appliance application. The reasons are practical, not policy.
Re:Flawed by lazarus · 2007-04-11 07:39 · Score: 1

Gordon,

Thanks for your response. I just sent your counterpart at IBM a lengthy probing e-mail about this which I can summarize as:

1. Real stream or fake stream?
2. Points for cost effectiveness?
3. Points for scalable/redundant architecture?

I applaud what you are doing and I wish you the best success (contests like this are good at stimulating inventiveness). I've been racking my brain trying to figure out how you could do this in a way that wouldn't be discriminatory. The best I could come up with would be to create a bunch of new domains (ceas-t1.org to ceas-t1000.org or whatever), then seed e-mail addresses on these domains with the spamming community. During the contest you insert legitimate e-mails into the stream by sending them from previously-undisclosed servers. The problem is how do you gauge success if you don't know what junk e-mail has been sent to the domain. You can't relay it because that instantly makes the test invalid. This technique would require previous statistics and a long lead time. Even then it would be possible for rival competitors to sabotage other entrants' tests if they knew the domains being used...

Alternatively you would have to use an existing domain with known stats and perform the contest in a sequential fashion. Again, this becomes very time consuming and there are risks associated with doing it fairly. In short, I cannot think of a good way to do this.

I have developed a technique called GDSA which is quite effective, scalable, and cost effective, but which in part relies on spammers' need to remain anonymous. This technique will not work in your contest (despite its effectiveness in the real world), and I will be unable to enter (unfortunately).

That said, it is easy to criticize, but difficult to be constructive. If I can think of a legitimate technique I will let you know.

Thanks.

--
I am not interested in articles about life extension advancements.
Re:Flawed by Thundersnatch · 2007-04-11 09:18 · Score: 1

GENERAL JUNK E-MAIL FILTERING RANT (You've been warned): If you're using an anti-spam technique which takes more cpu cycles to execute than it takes for the spammer to send the damn spam in the first place, you've already lost this war. In other words, as long as it's costing you more than it is costing him/her you will always be on the losing end of the deal.

Yes, the spammer will always win, since his CPU cycles and bandwidth are free. But those costs don't matter at all.

Bayesian and other resource-intensive spam filtering techniques are popular because they save people's time, which is far more expensive and valuable than CPU cycles. So you want your filter to catch the most spam. But false positives cost far more in people-time and lost opportunity than a missed spam, so reducing those is nearly as important as catching spam.

RBLs, SPF, greylisting, and most other protocol-oriented filtering techniques have comparatively high false positive rates. They also make recovering from false positives very difficult. A false positive will likely go unnoticed by the sender if they don't recognize the bounce message (there's lots of "bounce spam") and take appropriate action to call by phone or use a website form or whatever. And then mail administrators need to get involved and whitelist the addresses.

Most businesses, including mine, find a high false positive cost unacceptable. So we accept delivery of just about everything, and stick it in a "junk mail" folder if the content and protocol filters really, really don't like it. Yeah, it costs us in CPU cycles, storage, and bandwidth, but it's the best tradeoff at the moment.
Re:Flawed by richi · 2007-04-11 21:18 · Score: 1

Quite.

Fair comparative testing of spam control technologies is extremely difficult -- by some measures, it's impossible. Because some promising filter techniques rely on examining the real-time behaviour of the sending machine, it proves tricky to provide the exact same stream of email to all the filters at the same time.

For example, some filters attempt to fingerprint the sending machine's operating system -- the idea being that, say, a Windows 98 PC has no business submitting email direct-to-MX.

More at richij.com
Re:Flawed by raddan · 2007-04-13 03:09 · Score: 1

OK, so for the purposes of this contest, which does not model the real world, greylisting does not work. So what's the purpose of the contest, then?

Here's why greylisting will continue to work in the real world:

1. If a spammer adopts RFC-compliant mailers, greylisting will prevent them from pumping out huge numbers of mails. They will have to burn CPU cycles on their end in order to push mail through. This increases the cost of sending mail, and reduces their margins since they will be hitting fewer hosts with the same resources.

2. If a spammer stays with RFC-defying mailers, hosts with greylisting won't get the spammer's mail. The strategy here would be for a spammer to adopt mailers that detect greylisting early, get out, and move on in order to keep successful mail delivery high.

Obviously, we try to practice defense-in-depth. We use greylisting, Bayesian filtering, heuristic filtering, distributed checksum clearinghouses, and probably something else I'm forgetting about. Greylisting makes the biggest impact for us.

To answer my own question above, if the purpose of the contest is to find new approaches, then great. But I think greylisting will continue to work as long as the good guys stick to the RFCs.
Re:Flawed by gvc · 2007-04-14 02:52 · Score: 1

So what's the purpose of the contest, then?

First, the contest will establish a baseline against which greylisting may be compared. It is much more difficult to measure false positive and false negative rates for intrusive techniques like greylisting and challenge-response. Too difficult to be done in an open competition. But the open competition can show what other techniques can do, and then there will be some onus on the greylisters and challenge-responders to show that their techniques really are a value-add. It would be feasible, for example, to compare the best approach from the open competition against a select few greylisting etc. techniques.

Second, spam filters cannot always be deployed upstream of the first SMTP relay. This test models the situation in which the filter is immediately downstream from a trusted relay. In that situation, the filter has access to all the SMTP information from the sender; it just doesn't have the opportunity to perturb the SMTP session by, for example, issuing a false error message.

Third, many people and organizations find the delay, intrusion and risk of methods like greylisting to be unacceptable. I'm not sure that I would consent to replacing or modifying my well-tested "blah.com" mail gateway server to summarily bounce messages. At least not without very strong evidence as to the reliability and efficacy of the replacement, now and in the long run. "I think greylisting will continue to work as long as the good guys stick to the RFCs" wouldn't be enough of an assurance for me, were I CIO of blah.com

I wonder how they deal with pseudo-spam by grahamsz · 2007-04-11 06:02 · Score: 1

I know I've removed myself from a few mailing lists by simply having gmail count them as spam.

These aren't really spam, they are companies that I did business with once and can't be bothered to find my username and password to change my email subscription settings. But gmail seems to happily block everything else from that sender without my interaction.

Surely other users do want these particular emails so there must be some kind of per user dynamic as well.

I got a better idea by Indy1 · 2007-04-11 06:03 · Score: 1

Issue hunting permits for the spammers themselves. Whoever wastes the most spammers, wins.

Evidence of wasted spammers can be in the form of complete heads, or ears.

--
Lawyers, MBA's, RIAA? A jedi fears not these things!

So by Anonymous Coward · 2007-04-11 06:08 · Score: 0

The loser will have to wear a dress to the after-contest party?

how to finish [Re:Spam Rage Rampage] by nil0lab · 2007-04-11 07:17 · Score: 1

> A couple of years ago, I wrote a prototype for a video game called "Spam Rage Rampage"
> -- a first-person shooter where you roamed a Tron-like world, killing spam zombies and
> rescuing real people (== legitimate mail) while you searched for clues to the location
> of the nefarious spam kingpin, Ospama Bin Sendin. Each zombie represented a different
> class of spam... prostitute zombies for porn, business-suited zombies for stocks,
> pharmacist zombies for pill ads, etc.
>
> Upon seeing a demo, one of my friends commented that I should hook it up to a real e-mail
> inbox, so you could kill your own spam messages, perhaps even in real time....

Um, doesn't the system already have to know whether messages are spam or not?

> Unfortunately I have never had the time to complete it... maybe after the kids are out of the house.

Release the code under GPL with a couple screenshots of the demo and I'm sure
others will finish it for you! It's a cool enough idea...

i would rather see... by ushering05401 · 2007-04-11 07:23 · Score: 1

them train as many ignorant users to catch spam as possible in the allotted time and be judged on how well the users did.

From the not-from-a-dept dept. by etherlad · 2007-04-11 08:02 · Score: 1

Relevant to nothing, but this is the first time I can remember seeing an article on /. without the requisite department tag in the story header.

Anyone want to try their hand at making up their own?

--
Soylens viridis homines es

Global greytrapping by davidwr · 2007-04-11 08:09 · Score: 1

How's this for a plan:

Seed a few thousand fake email addresses all across the net. Put some on big sites. Put some on small sites. Put some on USENET. Change half the list every month.

If anyone emails two addresses with similar content, the content and the originating IP addresses get marked as likely spam, and used for realtime blackhole-list systems. The more of those fake addresses it hits in a short period of time, the greater its spammishness.
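A minimal Python sketch of that plan, under several assumptions: the class name `TrapNetwork` and its methods are invented for the example, "similar content" is reduced to an exact match after stripping everything except letters (a real system would want fuzzier matching), and the one-hour window is an arbitrary default.

```python
import hashlib
import re
from collections import defaultdict


class TrapNetwork:
    """Score content by how many distinct trap addresses it hits in a time window."""

    def __init__(self, trap_addresses, window_seconds=3600):
        self.traps = set(trap_addresses)
        self.window = window_seconds
        self.hits = defaultdict(list)  # fingerprint -> [(timestamp, trap, sender_ip)]

    @staticmethod
    def fingerprint(body):
        # Crude normalization: lowercase, keep letters only, collapse the rest.
        normalized = re.sub(r"[^a-z]+", " ", body.lower()).strip()
        return hashlib.sha256(normalized.encode()).hexdigest()

    def record(self, recipient, sender_ip, body, timestamp):
        """Return the spammishness: distinct traps hit recently by this content."""
        if recipient not in self.traps:
            return 0  # mail to real users is not scored here
        fp = self.fingerprint(body)
        self.hits[fp].append((timestamp, recipient, sender_ip))
        recent = {trap for ts, trap, _ in self.hits[fp]
                  if timestamp - ts <= self.window}
        return len(recent)

    def suspect_ips(self, min_traps=2):
        """IPs that sent content seen at `min_traps` or more distinct traps."""
        ips = set()
        for entries in self.hits.values():
            if len({trap for _, trap, _ in entries}) >= min_traps:
                ips.update(ip for _, _, ip in entries)
        return ips
```

The output of `suspect_ips` is what would feed the blackhole list; rotating half the trap addresses each month, as suggested above, would just mean rebuilding `traps` on a schedule.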

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Catch and Release? by fractilian · 2007-04-11 08:51 · Score: 1

I hope it's not a Catch and Release internet stream.

--
"The universe is my dwelling place and my house is my only clothes! Why are you entering into my pants?" - Liu Ling

How to test against spam that isn't REAL spam? by necro2607 · 2007-04-11 08:57 · Score: 1

Okay, here's the first question I have, and I'm sure many others wonder the same. How will spam be combatted when it's not real spam? For example, Spam Assassin checks actual mail server names and addresses to see if they are on known spammer lists and so on. Won't extremely useful/effective features like these be overridden by the fact that these spam emails are intentionally sent and won't be from any known spam-relaying mail servers??

Re:How to test against spam that isn't REAL spam? by gvc · 2007-04-11 09:17 · Score: 1

The mail messages will contain header information from which the sending IP may be derived. Of course, spammers try to forge this info, but the most recent header is guaranteed to be correct.
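To illustrate, here is a short Python sketch that pulls the connecting IP out of the topmost (most recent) Received header using the standard `email` module. The sample message, the host names, and the assumption that the IP appears in square brackets are all made up for the example; real Received headers vary in format, and the contest's exact header layout is not given here.

```python
import re
from email import message_from_string

# Assumed format: the relay records the peer address as "[a.b.c.d]".
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def connecting_ip(raw_message):
    """Return the IP from the topmost Received header, or None if absent."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None
    # Headers are prepended in transit, so index 0 is the most recent hop,
    # the one written by the relay we trust. Lower ones may be forged.
    match = IP_IN_BRACKETS.search(received[0])
    return match.group(1) if match else None


SAMPLE = "\n".join([
    "Received: from mail.example.net (unknown [203.0.113.7]) by mx.contest.example with SMTP",
    "Received: from trusted.bank.example ([192.0.2.1]) by mail.example.net with SMTP",
    "From: someone@example.net",
    "Subject: hello",
    "",
    "body text",
])
```

Here `connecting_ip(SAMPLE)` picks up the address from the first header and ignores the second one, which the sender could have written itself. That address is what an RBL check would then be run against.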
Re:How to test against spam that isn't REAL spam? by necro2607 · 2007-04-11 11:13 · Score: 1

That's exactly what I'm saying. Since these "contest" spams will be from the contest organization (as in, not from actual spammers), I would imagine they won't have the headers that indicate the mails were from spam-relaying servers out there on the net. So how are contestants supposed to use filtering-based-on-host-IP measures in their spam filtering application??
Re:How to test against spam that isn't REAL spam? by gvc · 2007-04-11 11:56 · Score: 1

I don't think you understood the parent. The messages are from actual spammers, not from the contest organization. The spams are merely relayed, and they are relayed accurately.

Just dump unsolicited email with URLs in them. by iamcf13 · 2007-04-11 09:26 · Score: 1

Problem solved.

Now get people and free email services like Hotmail and Gmail to turn off their URL signatures in the bottom of their outgoing emails and you will stamp the spam email menace out in one bold stroke.

Moves the spam back to USENET, which is already spammed out.... :P

If people you don't know want to start a meaningful email conversation with you, they WON'T try to get you to visit the URL of some 'paysite' contained in their email.

Then something has to be done about spammers bouncing their crap 'back' to their victims as 'undeliverable email'. Can something be done about that without too much overhead or breaking the SMTP protocol?

Extend the URL-in-unsolicited-email ban to email addresses and you can quash '419 spam' as well.

Slashdot CAPTCHA: forging -- how apt!

If what you say is true... by tknd · 2007-04-11 10:09 · Score: 1

then what happens when I click on the NOT SPAM button... *snicker*

Eh, you're lucky to count your spams by the week! by blanne · 2007-04-11 12:34 · Score: 1

I wish my Gmail account was like that. Maybe you're new to Gmail. I get several spams in my inbox per week. I wish my Gmail account was like that. Maybe you only have geeky friends that don't forward your address to the nearest known spammer. I get several spams in my inbox every day. In plain English, even. With lovely "Hot pictures of paris hilton nude" and everything.

Perhaps this is because I have email forwarded to my gmail account from several other accounts. And those accounts are probably not in the same spam batches as all the gmail recipients. So maybe I'm really lucky and get the latest spam before the rest of you gmailers, even without paying for a subscription!

Yes yes, of course I do my duty and mark those 0-day mails as spam, even though I don't seem to get anything out of it myself...

Error rate (false positives) isn't the whole story by InakaBoyJoe · 2007-04-11 14:04 · Score: 3, Insightful

From TFCFP (call for participation):
Filters will be evaluated based on a weighted combination of the percentage of spam blocked and its false positive percentage.

From a theoretical standpoint, a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.

Take, for example, a message from a long-lost friend, whose current address isn't yet in your whitelist, and who would have no other way of contacting you should the message get spamboxed. Here's an example of a message that's important to a user but gets lost among the everyday messages when simply talking about the percentage of false positives.

There's lots of other examples, too -- if you run your own domain, your messages are likely to be spamboxed, etc. Furthermore, the lower the false-positive rate, the less likely a user is to actually *check* their spambox, thus making a single false-positive even worse.

Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM.

Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.

Re:Error rate (false positives) isn't the whole st by gvc · 2007-04-11 18:57 · Score: 1

a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.

A 1% false positive rate is not OK. The good systems will misclassify at most a couple of good emails per thousand, and the vast majority of those will lie in the grey area between ham and spam. A few will be internet transactions -- sign-up messages, receipts, and the like -- and a vanishingly small number will be personal communications.

Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM.

Only two of the three authors are from Microsoft, and they are from Microsoft Research, not Hotmail. The methods described are not in particular those deployed at Hotmail. None of the organizers of CEAS 2007 or the Live Spam Challenge is from Microsoft. The Microsoft venue is convenient and economical and made available to CEAS as a courtesy.

Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.

The data will be available to participants to do whatever post-hoc analysis they like. They may, for example, wish to classify misclassified mail into genres, as discussed here:

Caution should be exercised in treating ham misclassification as a simple proportion. Extremely large samples would be needed to estimate it with any degree of statistical confidence, and even so, it is not clear what effect differences in proportion would have on the overall probability of catastrophic loss.

But if you're going to rank systems you need some sort of simple summary measure and the logistic mean of false positive and false negative rates works pretty well. Have a look at TREC 2005 or TREC 2006 summary results, for example.
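
For anyone who hasn't met the logistic mean: it averages the two error rates on the log-odds scale and maps the result back to a rate. A small Python sketch follows (function names are mine; rates are fractions strictly between 0 and 1, since the logit is undefined at the endpoints).

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_mean(false_positive_rate, false_negative_rate):
    """Average the two error rates in log-odds space."""
    mean_log_odds = (logit(false_positive_rate) + logit(false_negative_rate)) / 2.0
    return inv_logit(mean_log_odds)
```

Because the averaging happens in log-odds space, a filter cannot buy a good summary score by driving one error rate to near zero while letting the other balloon; the result always lies between the two rates.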

Spamassassin scored -1.3 by Slashdot+Parent · 2007-04-12 03:11 · Score: 1

I thought your question was intriguing, so I composed the following message:
Subject: Interesting phenomon related to Viagra use

Hi, Dr. Smith-

I just wanted to write you to let you know that I really enjoyed the article you wrote in the New England Journal of Medicine about the side effects of Cialis, Viagra, and Levitra. It turns out a patient of mine experienced debilitating nausea while on Levitra, so I prescribed Viagra in its place, as you recommend. In addition, I thought you might be interested to know that this patient suffers from Raynaud's disease, and he reported a 50% reduction in the frequency of his attacks after switching from Levitra to Viagra. Curious, I found an article in PubMed detailing this phenomenon and I thought I'd pass it along to you. I hope your knee is healing up nicely, I'm sure you can't wait to get back on the tennis court.

Best Regards,
Dr. Gerald Jones

Then, I sent it from my business email to my personal email, which is protected by SpamAssassin. The results were as follows:

X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,DRUGS_ERECTILE, HTML_MESSAGE autolearn=no version=3.1.7-deb

I knew that the email would likely be delivered because, at the very least, the AutoWhiteList would knock the score down based on the low scores of previous messages. What I found to be pretty remarkable was that the Bayesian classifier scored the email at a 0 despite my gratuitous use of erectile drug names both in the subject as well as the body.
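
If you want to repeat this kind of experiment over a batch of messages, a header like the one above is easy to pick apart. This Python sketch assumes the layout shown in this post (a verdict, then space-separated `key=value` fields); the function name is made up, and the header carries only the total score, not the per-rule contributions.

```python
import re


def parse_spam_status(value):
    """Split an X-Spam-Status value into verdict, numeric fields and test names."""
    verdict = value.split(",", 1)[0].strip()
    # A field's value runs until the next " key=" or the end of the string.
    fields = dict(re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", value))
    return {
        "is_spam": verdict.lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": [t.strip() for t in fields.get("tests", "").split(",") if t.strip()],
    }
```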

Conclusion: The work done by the SpamAssassin folks is top-notch and should be recognized as such. Thank you SpamAssassin for making my email bearable.

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

Why Not Use Both? by Slashdot+Parent · 2007-04-12 03:15 · Score: 1

Ha ha, silly admin. My money's on greylisting.

Why not use both?

I use both, and I have to say that greylisting catches a metric boatload of spam. On the other hand, spammers have wised up and many are now retrying.

Sure does take a lot of load off of spamassassin, though.
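
For those who haven't set it up, the core of greylisting fits in a few lines. This is a toy Python sketch, not any particular implementation: the class name, the 300-second delay, and the in-memory dictionary are all assumptions.

```python
class Greylist:
    """Temporarily reject the first attempt from each (IP, sender, recipient) triplet."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # assumed retry delay, in seconds
        self.first_seen = {}        # triplet -> timestamp of first attempt

    def check(self, ip, sender, recipient, now):
        """Return an SMTP-style reply for a delivery attempt at time `now`."""
        key = (ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # Temporary failure: a well-behaved mail server will retry later.
            return "451 try again later"
        return "250 ok"
```

A sender that retries after the delay gets through, which is exactly why spammers who have learned to retry defeat it, and why it still pays to run a content filter behind it.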

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

Exploiting This by Slashdot+Parent · 2007-04-12 03:20 · Score: 1

I agree, and was pondering how to exploit that fact. I couldn't think of a good answer, so I decided to just let the Bayesian classifier figure it out for me.

I use a routine that can quickly determine the origin country of an IP address and just insert that origin country into the headers of the message in an X- header. Then, it's just one more thing for the Bayesian classifier to decide what to do with. It realizes that I don't get much ham from Latvia, so when it sees X-Origin-Country: Latvia, that spam probability goes through the roof.
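
Roughly, the tagging step looks like the Python sketch below. The lookup table is a hypothetical stand-in for a real IP-to-country database (the prefixes are documentation ranges and the country assignments are invented); only the header name comes from my setup.

```python
from email import message_from_string

# Hypothetical stand-in for a real IP-to-country lookup; the mappings are invented.
COUNTRY_BY_PREFIX = {"203.0.113.": "Latvia", "198.51.100.": "Canada"}


def country_for_ip(ip):
    """Return the (made-up) origin country for an IP, or "Unknown"."""
    for prefix, country in COUNTRY_BY_PREFIX.items():
        if ip.startswith(prefix):
            return country
    return "Unknown"


def tag_origin_country(raw_message, source_ip):
    """Return the message text with a fresh X-Origin-Country header."""
    msg = message_from_string(raw_message)
    del msg["X-Origin-Country"]  # drop any copy the sender forged; no error if absent
    msg["X-Origin-Country"] = country_for_ip(source_ip)
    return msg.as_string()
```

The tagged message then goes to the Bayesian classifier as usual, which treats the header value as just another token to learn from.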

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

126 comments