Spamassassin Beats CRM-114 In Anti-Spam Shootout

Posted by timothy on Tuesday June 22, 2004 @03:24PM from the hawaii-alaska-and-utah dept.

Simon Lyall writes "A new study of antispam software shows that Spamassassin performed well in various configurations. Spamprobe, Bogofilter and Spambayes also came out well, while CRM-114 failed to live up to its previous claims. The study shows: 'The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.'"

20 of 330 comments

Invasion by artlu · 2004-06-22 15:31 · Score: 1, Insightful

I must admit that I am not up to date on these new anti-spam software packages, which operate on the server side. However, what is the probability of real mail getting rejected by these things? It seems almost like an invasion of privacy to block my own email, even if it is from a "benevolent big brother" perspective.
I guess that is why there are privacy policies though.

aj

GroupShares Inc. - A Free and Interactive Stock Market community!

--
-------
artlu.net
Re:Best anti-spam code by britneys+9th+husband · 2004-06-22 15:36 · Score: 0, Insightful

How exactly does the US (or other first world country) go about writing a code of law that puts Nigerian spammers in jail?

--
Hear recorded Slashdot headlines on your phone! New service beta testing. Just call (248) 434-5508
Re:Real way to block spam by PornMaster · 2004-06-22 15:39 · Score: 2, Insightful

Is to do away with current email protocols and go with new ones with verification. That should take care of the problems. The gov is now concentrating on this.

Except for making a new standard that's a requirement for doing business with federal agencies, just what do you think government's capable of doing regarding replacing protocols?

-PM

--
500GB of disk, 5TB of transfer, $5.95/mo
Re:in related news by bigberk · 2004-06-22 15:42 · Score: 4, Insightful

Content-based spam filtering is a waste of time. . . RBLs WORK
But content-based filters can very accurately determine what is spam and what's not, and so they can feed RBLs/DNSBLs. Let real spam sent to real user accounts form the blocklist! One such project is WPBL.
Isn't Human Accuracy always 100% by PetoskeyGuy · 2004-06-22 15:43 · Score: 4, Insightful

From the CRM-114 site...

News Flash: As of Feb 1 through March 1, 2004, 8738 messages (4240 spam, 4498 nonspam), and my total error rate was ONE. That translates to better than 99.984% accuracy, which is over ten times more accurate than human accuracy

Maybe I'm missing something, but isn't human accuracy always going to be 100%? I tell the computer what is spam, it learns. I may decide that regardless of what it thinks, this last message is OK. So aside from clicking too fast or changing your mind (which is a common thing to do), how can a filter ever suggest it is better than people at deciding what people want to see?
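
The arithmetic behind the quoted CRM-114 figure can be checked directly. This is a minimal sketch that uses only the numbers in the quote (8738 messages, one error); the function name is made up for illustration.

```python
# Figures quoted from the CRM-114 site in the comment above.
total_messages = 8738   # 4240 spam + 4498 nonspam
errors = 1


def accuracy_percent(total, wrong):
    """Return classification accuracy as a percentage."""
    return 100.0 * (total - wrong) / total


acc = accuracy_percent(total_messages, errors)
print(f"accuracy: {acc:.4f}%")
```

One error in 8738 messages is consistent with the quoted "better than 99.984%". The quote does not say which human error rate it is comparing against, so that part cannot be checked from these numbers.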
1. Re:Isn't Human Accuracy always 100% by sholden · 2004-06-22 15:50 · Score: 4, Insightful
  
  People make mistakes.
  
  Yes, given one message to classify as spam or ham you are going to get it right 100% of the time.
  
  Given 8000 messages to classify, the wonders of boredom are going to mean you make a mistake every so often (not an "oops I clicked the wrong button" mistake, but an "oops I put it in the wrong folder because the subject looked spammy and I couldn't be bothered checking the body" mistake).
  
  In practice though, those stats on human accuracy are provided by having one person classify email that has been classified by others - which of course means some of the mistakes may in fact be disagreements...
Re:in related news by plasm4 · 2004-06-22 15:43 · Score: 2, Insightful

Filtering tools work fairly well, but more importantly they work right now. Waiting for the authorities to "wake from their slumber" might take years, if it ever even happens.
DSPAM by More+Trouble · 2004-06-22 15:48 · Score: 4, Insightful

In real-world deployments of statistical filters, something like DSPAM's "global user" feature is necessary. The ability to begin with a relatively mature dictionary is critical to the user experience. Personally, DSPAM is filtering around 200 SPAMs per day for me, allowing one through every few days. It's 99.985% effective for me.

:w
No, REAL MEN... by Dimensio · 2004-06-22 16:16 · Score: 2, Insightful

...hammer the spammer's ISP with complaints until the advertised website is DEAD, DEAD, DEAD.

--
STOP MISUSING APOSTROPHES, YOU MORONS!!!
I'm running SpamAssassin at work. by khasim · 2004-06-22 16:21 · Score: 4, Insightful

People LOVE it.

There are some false positives and some false negatives.

But I have it set to delete anything 12+. That gets rid of the worst of the worst spam. So far, not a single complaint of any email being deleted.

Everything else has the subject re-written so people can run their own rule set against it.

In the past 8 hours
1867 messages received
375 messages deleted
1266 messages flagged as spam

So, only a few hundred actual, good emails.

Of course, that's only 4 hours during the regular work day (and 4 hours after work). But you can see the proportions. It saves people a TON of time.

And it makes them happier when they don't have to constantly dig through crap to see if any real messages have arrived.

Now, those spam messages are NOT distributed evenly. Our HR manager had her email address posted on the website. So she gets about 20-25% of the spam.

It's not exactly Big Brother 'cause no human sees the deleted spam.
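
The policy and counts described in this comment can be sketched as follows. The delete threshold of 12 comes from the comment; the flag threshold of 5.0 is an assumption (it is SpamAssassin's default required_score, and the comment does not say what this site uses), and the function name is hypothetical.

```python
DELETE_THRESHOLD = 12.0  # from the comment: "delete anything 12+"
FLAG_THRESHOLD = 5.0     # assumption: SpamAssassin's default required_score


def action_for(score):
    """Map a spam score to what the server does with the message."""
    if score >= DELETE_THRESHOLD:
        return "delete"   # dropped, never seen by a human
    if score >= FLAG_THRESHOLD:
        return "flag"     # delivered with a rewritten subject
    return "deliver"      # delivered untouched


# Counts reported for the 8-hour window.
received, deleted, flagged = 1867, 375, 1266
clean = received - deleted - flagged
spam_share = (deleted + flagged) / received

print(f"clean messages: {clean}")
print(f"share deleted or flagged: {spam_share:.1%}")
```

By these counts, a little under nine in ten messages were deleted or flagged, which matches the "only a few hundred actual, good emails" remark.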
1. Re:I'm running SpamAssassin at work. by Robmonster · 2004-06-22 20:57 · Score: 2, Insightful
  
  So far, not a single complaint of any email being deleted
  
  How do they know they are missing any emails to complain about it?
  
  I had a recent argument with my email provider. They introduced blacklist filtering to eliminate the worst of their spam. In the process it also blacklisted some legitimate email. (The mails in question were Topic Reply notifications from a message board)
  
  I don't have a problem with filtering, as long as there is a way to review undelivered mails.
  
  In my case I only realised something was wrong when the mails I regularly received stopped being delivered. I went right up the admin ladder of the message board as I assumed the problem was at their end (after all, my mail provider was supposed to tell me about any changes they make to my mail settings).
  
  My mail provider eventually found the problem and amended the blacklist settings and all was fine. However, without them providing me with a method of finding out if any of my mail is being blocked, I have no way of knowing if I am missing any further legitimate mails. Even something as simple as a notification that they blocked a mail, with the sender's email address included, would be enough.
  
  Spam filtering either needs to be done Client Side, as who better to judge which of my email is spam than me, or Server Side with a mechanism to view and check undelivered emails. Programs like K9 (http://keir.net/k9.html) work very well and are easily trainable. Mine runs at 99.5% accuracy.
  
  If servers HAVE to delete mail that is intended for me then it should be at the strictest possible setting.
  
  --
  I have no sig yet I must scream.
2. Re:I'm running SpamAssassin at work. by sTeF · 2004-06-23 01:02 · Score: 2, Insightful
  
  I'm also running SpamAssassin, but I am absolutely not satisfied with its performance. How long does it take for your SA to scan one message? My mailserver is only an Athlon 600, but still this does not justify a hit of a few seconds per message.
  
  Other than the performance, I'm really happy with SA.
Re:Why don't people use catch-all accounts? by FrenZon · 2004-06-22 16:26 · Score: 3, Insightful

Why don't people use catch-all accounts?

Because you will always have one main 'obvious' address - be it something that goes on your business card, or something you tell to people you meet. For example, I use glen at glenmurphy.com.

Now all it takes is one slip - someone you know gets a virus, whatever - and your address is 'out there' for the taking. Your only possible recourse then is to stop using that address, but for some people that's just not an option, and it's just a bit defeatist to sit there surrendering email address after email address.
Re:Issues with testing corpus by PlusFiveTroll · 2004-06-22 16:40 · Score: 2, Insightful

Not exactly fair.

Huh, since when did spammers start playing fair? This is about winning, not software political correctness.

Also on the unbalanced dataset, I train my filter with spam corpuses that reflect what I receive in my email. Many accounts receive 10 spams for every ham. The biggest thing that I've had to retrain on is receipts for airplane tickets; spamassassin seems to think they are spam the first time I receive them, and from the article, they had the same issues too.
Re:Why am I so Blessed? by dasmegabyte · 2004-06-22 17:18 · Score: 3, Insightful

Because you don't put it into weird text boxes, you don't use newsgroups, you don't have any enemies, you don't have any domains, and you don't have it in plaintext on your website.

I do all 4. I get my share of spam. It's not a HUGE deal, but it made it worth my while to get a spam filter.

--
Hey freaks: now you're ju
Re:in related news by dubl-u · 2004-06-22 17:56 · Score: 2, Insightful

Content-based spam filtering is a waste of time. [...] It's a never-ending battle of updating filters and formulas.

I update my SpamAssassin config file once a year or so. This hardly seems burdensome. And generally my updates have to do with which RBLs it uses for assigning point values. Other than that, I use the defaults plus the Bayesian filter.

Since the filter self-trains based in part on the RBL scores, it autoadjusts to new spam. And if you have spamtrap addresses, you can feed those back in, too.

My setup is well over 99% accurate, with no false positives in months.

RBLs WORK.

Yes, and I use those, too. Some I use for outright rejection of connections, and some count toward the spamminess score. As soon as they get the URL-based RBLs working, I'll use those, too. Why wouldn't you use all the tools at your disposal?
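
The two uses of RBLs mentioned here (outright rejection versus adding to the spamminess score) can be sketched as below. The zone names and point values are hypothetical, and the actual DNS query is left out; the sketch only shows how a DNSBL lookup name is built for an IPv4 address (octets reversed, then the zone appended) and how hits on different lists might be treated.

```python
import ipaddress

# Hypothetical zones: the comment does not name the lists it uses.
REJECT_ZONES = {"reject.dnsbl.example"}
SCORED_ZONES = {"score-a.dnsbl.example": 2.0, "score-b.dnsbl.example": 1.5}


def dnsbl_query_name(ip, zone):
    """Build the DNS name a DNSBL lookup queries for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def apply_dnsbl_hits(listed_zones):
    """Decide what to do given the zones that list the connecting IP."""
    if any(zone in REJECT_ZONES for zone in listed_zones):
        return ("reject", 0.0)
    points = sum(SCORED_ZONES.get(zone, 0.0) for zone in listed_zones)
    return ("score", points)


print(dnsbl_query_name("192.0.2.99", "score-a.dnsbl.example"))
print(apply_dnsbl_hits(["score-a.dnsbl.example", "score-b.dnsbl.example"]))
```

In a real setup the name would be resolved, and a successful answer means the address is listed. The points from the scored lists would then be added to whatever the content rules and the Bayesian filter contribute.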
Re:The Mozilla ThunderBird SPAM filter by norton_I · 2004-06-22 20:01 · Score: 5, Insightful

Better to do spam filtering with your MTA/MDA anyway, if possible. That way, the same filter is used no matter which email client you use from which computer. Plus, it means you don't have to download spams to your MUA when on a slow connection.

Now if only I could get the rest of my mail configuration to be shared between evolution, mutt, and squirrelmail.
Is SpamAssassin being counterattacked? by jcjewell · 2004-06-22 20:06 · Score: 2, Insightful

I've been getting spams lately that seem to be trying to get around the highly effective statistical solutions, such as SpamAssassin, that have been implemented. Spammers seem to be adding random, or possibly even carefully selected, dictionary words to skew their statistical rating. Here is an example from the several I've received lately--has anyone seen information about this on /. or elsewhere?

[spammer's irritating message snipped]

Thu, 17 Jun 2004 19:42:34 -0500

No Thanks

beatify

sacred atom drank deprecate cathodic thermionic sherman delinquent hanley swum wooster asteroidal bilayer haiti saudi wink bijective reserpine baronial gloss ambrose threadbare chianti predatory earmark bilingual angora palazzi chartres alveolar phosphate civet radish barricade diem laurie minutem! en crusty

camilla jade lineman bendix masonic dublin incontrovertible defecate generous buddhist yesterday endow bitten conley trunk pitchfork beret bloat gelatine dovetail gambia medea niggardly blackburn suey dialogue ilyushin anastigmatic berth abort bodied contractor of ridden embarcadero corset trademark

ID: W993gt72

carnation

constructor maltese bantam airfield pique douglas pungent criterion cloudburst illiterate sausage career stile pebble bonnie shim carbonium

magnesite pembroke abrade jogging dynast physiochemical stochastic sumac conference obtain villain midwinter incompetent eradicable madhouse airline antony household cursory instinctual gratuitous clown shaven des cornflower
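
Why padding a message with extra words might skew a statistical rating can be shown with a toy model. This is a minimal sketch of the naive-Bayes style combining rule popularised by Paul Graham's "A Plan for Spam", not SpamAssassin's actual implementation; the token probabilities are invented, and treating unseen words as neutral (0.5) is an assumption.

```python
from math import prod

# Hypothetical per-token spam probabilities a trained filter might hold.
TOKEN_SPAM_PROB = {
    "viagra": 0.99, "mortgage": 0.95, "unsubscribe": 0.90,
    "meeting": 0.05, "patch": 0.03, "kernel": 0.02,
}
UNKNOWN_PROB = 0.5  # assumption: words never seen in training are neutral


def spam_probability(tokens):
    """Combine per-token probabilities into one spam probability."""
    probs = [TOKEN_SPAM_PROB.get(token, UNKNOWN_PROB) for token in tokens]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


pitch = ["viagra", "mortgage", "unsubscribe"]
plain = spam_probability(pitch)
# Padding with words the filter has never seen, as in the sample above.
padded_random = spam_probability(pitch + ["sherman", "haiti", "chianti"])
# Padding with words that are common in this user's legitimate mail.
padded_hammy = spam_probability(pitch + ["meeting", "patch", "kernel"])

print(f"{plain:.4f} {padded_random:.4f} {padded_hammy:.4f}")
```

Under these assumptions, truly random words change nothing, while words that happen to be common in the recipient's legitimate mail pull the rating down sharply. That would explain why a spammer might prefer "carefully selected" words, although they cannot know any one recipient's vocabulary.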
Re:Quit acting like goddamn babies... by Technician · 2004-06-22 21:04 · Score: 2, Insightful

just a different button...

I assume you are not referring to the delete key. ;-) There is more to life than hitting the delete key.

--
The truth shall set you free!
Re:Why don't people use catch-all accounts? by sfe_software · 2004-06-22 23:05 · Score: 2, Insightful

Wait till the spammers decide to spam your whole domain.

That's exactly when I decided to disable the "catch-all" and allow only specific addresses. Some spammer sent several hundred identical messages, in a few hours, to made-up names at my domain.

Catch-all is no longer a good idea in my opinion...
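
The difference between a catch-all and an explicit address list comes down to the recipient check. A minimal sketch, with a hypothetical domain and hypothetical valid addresses:

```python
DOMAIN = "example.com"                               # hypothetical domain
VALID_LOCAL_PARTS = {"info", "sales", "postmaster"}  # hypothetical addresses


def accept_recipient(address, catch_all=False):
    """Return True if mail for this recipient should be accepted."""
    local, _, domain = address.lower().rpartition("@")
    if not local or domain != DOMAIN:
        return False
    # With catch-all on, any made-up name at the domain gets through.
    return catch_all or local in VALID_LOCAL_PARTS


made_up = ["bob@example.com", "alice@example.com", "zz123@example.com"]
print(sum(accept_recipient(a, catch_all=True) for a in made_up))   # all accepted
print(sum(accept_recipient(a, catch_all=False) for a in made_up))  # none accepted
```

With the catch-all disabled, a run of several hundred messages to invented names is refused at the recipient check instead of landing in a mailbox.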

--
NGWave - Fast Sound Editor for Windows