Spamassassin Beats CRM-114 In Anti-Spam Shootout

The Mozilla ThunderBird SPAM filter by k.ellsworth · 2004-06-22 15:30 · Score: 5, Interesting

the mozilla spam filter does a very good job too; once it learns enough it becomes over 95% accurate. i dropped evolution for it, and never looked back

--
Putting a windows cd backwards, plays evil messages, but it gets worse, putting it right, installs windows.

Re:The Mozilla ThunderBird SPAM filter by Cyb3rBull3ts · 2004-06-22 15:37 · Score: 2, Interesting

If you use the Mozilla TB spam filter with your ISP filter it's near 99% accurate.

I have gone from a whopping 200 spam messages a day (a very old e-mail address) to the occasional spam message once a week.

Lemme do the math. 200*7 = 1400. 1399/1400 = 0.9992857 accuracy. Not TOO bad :D
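
The arithmetic in this post can be checked with a short sketch. It assumes the post's own figures (200 spam a day before filtering, about one a week getting through), and the function name is made up for illustration. Note that this measures the share of spam caught; it says nothing about false positives.

```python
from fractions import Fraction

# Figures taken from the post: 200 spam/day before filtering, about 1/week after.
SPAM_PER_DAY = 200
MISSED_PER_WEEK = 1


def weekly_catch_rate(spam_per_day: int, missed_per_week: int) -> Fraction:
    """Fraction of one week's spam that the combined filters stopped."""
    total = spam_per_day * 7
    return Fraction(total - missed_per_week, total)


rate = weekly_catch_rate(SPAM_PER_DAY, MISSED_PER_WEEK)
print(f"{rate} = {float(rate):.7f}")  # prints 1399/1400 = 0.9992857
```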

Re:The Mozilla ThunderBird SPAM filter by Mark_MF-WN · 2004-06-22 15:40 · Score: 3, Interesting

It works with IMAP too -- which is something most other spam filters aren't capable of.

I didn't RTFPDF... by john_smith_45678 · 2004-06-22 15:32 · Score: 3, Interesting

The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.

How many false positives though?

--
John Kerry is a Joke!

I use two... by hkfczrqj · 2004-06-22 15:33 · Score: 2, Interesting

I use SpamAssassin. Surviving mail then goes through CRM-114. At least in my case, the combination works better than either filter on its own.

Mozilla Messenger / Thunderbird Performance? by Mark_MF-WN · 2004-06-22 15:34 · Score: 5, Interesting

I wonder how Mozilla Messenger/Thunderbird's spam filtering stacks up against these filters? I've heard some negative comments about the Mozilla filtering system, but it's worked wonders for me.

Re:Mozilla Messenger / Thunderbird Performance? by dasmegabyte · 2004-06-22 17:00 · Score: 2, Interesting

From personal experience, it works pretty well (I think Mail.App is good too, but the management of the junk once marked needs to be customized). But since it's not really a server-side program, you can't run a server-side test on it. Hence why it wasn't included in this test.

Some anecdotal "evidence" for you: some of the users at my office run their own spam engines on their desktops because they're control freaks. I let them pass by SpamAssassin entirely. In my observation, SpamAssassin works WAY better. It cleans about 90% of the spam we get, whereas most of the add-on desktop clients I've seen are 60-70% effective. Meaning about every third email gets through.

Either way, I would never run an email address "in the wild" without some kind of spam software. Not any more. I resisted for YEARS, but when I started pulling up Squirrelmail...and the first three PAGES of mail were all spam missed by the (SLOWWWWW) Squirrelmail bayesspam plugin...I moved on to using only IMAP client apps with SOME KIND of spam detection built in.

--
Hey freaks: now you're ju

Real way to block spam by DRWHOISME · 2004-06-22 15:35 · Score: 2, Interesting

Is to do away with current email protocols and go with new ones with verification.

That should take care of the problems. The gov is now concentrating on this.

Why don't people use catch-all accounts? by mattkinabrewmindspri · 2004-06-22 15:44 · Score: 5, Interesting

When you register with a hosting company, very frequently, they set up what's called a catch-all account, and any email to your domain that's not addressed to a real address goes there. This is how I use it:

I only use my main email address with friends and family, and never post it online.
Whenever I post an email address or register for anything online, I put thatsite@mydomain.com as my email address.
All email is received by one account, but each message can have a different "to:" header. I set my filters to filter mail to different boxes. Email sent to amazon@mydomain.com goes to the amazon folder. Same with ebay, slashdot, whatever.
Any time I start receiving spam, I just set my mail server to disregard email sent to whatever email address is getting the spam, and I can stop doing business with the company that sold my email address.

I receive on average 0 spams per day.
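
A minimal sketch of the routing described above, assuming the filtering is done on the "To:" header as the post says (a real mail server would normally act on the envelope recipient instead). The domain is the post's placeholder, and the `BLOCKED` set and `route` function are hypothetical names, not part of any mail server.

```python
from email import message_from_string
from email.utils import parseaddr

DOMAIN = "mydomain.com"   # placeholder domain from the post
BLOCKED = {"leakysite"}   # hypothetical: per-site addresses that started getting spam


def route(raw_message: str) -> str:
    """Return a folder name for a message, or 'reject' for a burned address."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("To", ""))
    local, _, domain = addr.lower().partition("@")
    if domain != DOMAIN or not local:
        return "inbox"    # not one of the per-site addresses
    if local in BLOCKED:
        return "reject"   # this address was leaked or sold, so drop the mail
    return local          # e.g. amazon@mydomain.com goes to the "amazon" folder


print(route("To: amazon@mydomain.com\nSubject: Your order\n\nhi"))
print(route("To: LeakySite@mydomain.com\nSubject: pills\n\nbuy"))
```

The first call returns "amazon" and the second "reject". Keeping the block list as plain data means burning an address is a one-line change.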

--
Albuquerque PC

Re:Why don't people use catch-all accounts? by Anonymous Coward · 2004-06-22 16:54 · Score: 2, Interesting

I do that too. Works great (0/day). The problem is, unlike you, for my job, I have to have a public e-mail address.
I even got spam from the president of the university I work for. (Why spam? Because it was a political response to a newspaper article that had nothing to do with my job.) When I asked to be removed, I was told I couldn't opt out, since I worked for the university. So I removed my e-mail address from the official database. I was lucky. It got worse. I know five other people who did the same thing over the next few years. Our university has a pro-spam policy (from a committee of course). Anyone who works at your level or above can spam the entire list below for any reason as long as they don't break any existing rules. I could send three a day to thousands of people without breaking the rules. I'm not required to have an e-mail address in the official database.
I can't remove my e-mail address from my webpage. I work with lots of people all over the world. I don't think that just because I need an accessible address that I should have to put up with spam. It's not like I'm going to buy from someone selling child-incest-porn e-mailed to a .edu account, yet I get that every month. I've never gotten a single UCE related to my job.
Your solution works great for you, but it doesn't work for me. I wish it did.
BTW, I don't use a catch-all. I only forward specific addresses (300 max). One day, you'll find that once they get your domain, you'll get e-mail for john@yourdomain.com, even though no one ever thought of that address. I have john@mydomain forwarded to uce@ftc.gov.

Re:Why don't people use catch-all accounts? by Anonymous Coward · 2004-06-22 19:46 · Score: 1, Interesting

In Postfix you can set it up so all mail to user+anything@mydomain.com is sent to user@mydomain.com. This way you still limit the damage from directory attacks somewhat (you can catch them with some smart greylisting anyway), and you can still track the addresses you use elsewhere.
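
In Postfix itself this behaviour comes from the `recipient_delimiter` setting in main.cf (for example `recipient_delimiter = +`). The matching rule can be sketched in Python; the function name below is hypothetical and only illustrates how the tag is separated from the real mailbox.

```python
from typing import Tuple


def split_plus_address(address: str, delimiter: str = "+") -> Tuple[str, str]:
    """Split 'user+tag@domain' into ('user@domain', 'tag'); tag is '' if absent."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    user, _, tag = local.partition(delimiter)
    return f"{user}@{domain}", tag


mailbox, tag = split_plus_address("user+slashdot@mydomain.com")
print(mailbox, tag)  # prints: user@mydomain.com slashdot
```

The tag is what lets you see which site an address was given to, while delivery still depends only on the part before the delimiter.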

Another data point. by juuri · 2004-06-22 15:45 · Score: 4, Interesting

OS X's built-in Mail seems to be pretty close to the accuracy numbers listed in the above summary. I tend to have one to three pieces of spam slip through, which are almost always entirely image-based with some poetry or equivalent attached.

I must say I've been pleasantly surprised with the spam filtering it provides, and it has been a lot easier than the hoops I used to jump through to clean out my inbox.

--
--- I do not moderate.

No DSPAM by XMichael · 2004-06-22 15:50 · Score: 2, Interesting

It's unfortunate that DSPAM was left out of this otherwise very good report. I have personally used SpamAssassin, SpamProbe and DSPAM.

After using each for a couple of months at a time, I found DSPAM to be by far the most effective (after it was properly trained).

DSPAM's claim, "DSPAM (as in De-Spam) is an extremely scalable, open-source statistical hybrid anti-spam filter. While most commercial solutions only provide a mere 95% accuracy (1 error in 20), a majority of DSPAM users frequently see between 99.95% (1 error in 2000) all the way up to 99.991% (2 errors in 22,786). DSPAM is currently effective as both a server-side agent for UNIX email servers and a developer's library for mail clients, other anti-spam tools, and similar projects requiring drop-in spam filtering. DSPAM has been implemented on many large and small scale systems with the largest systems being reported at about 125,000 mailboxes.", was quite accurate for me.

Also check out some priceless photos.

--
Gamblers Forum

the true cause of the majority of spam... by Etaipo · 2004-06-22 15:58 · Score: 3, Interesting

users. those silly, silly users. i was in charge of spam for my company for the greater part of a year. using an outdated KEYWORD-based system, I was forced to read every.caught.message to look for false positives. ... did you catch that? yeah...i had to go through EVERY 'spam'-tagged e-mail that went through the company. needless to say, after the first week i was ready to gouge my eyes out. but hey, at least i earned that 'i read your e-mail' sticker! anyways, the point that i'm failing to make here is the cause of the spam... the damn users. whether it be responding to spam, putting their e-mail address in every single webform they encounter while surfing instead of working, signing up for spam voluntarily, or whatever the cause may be.. i ran some numbers on the logs, and came to an astounding find. a few people were getting literally a thousand messages blocked per month. i, on the other hand, had maybe one or two a month. and i'm not a nazi with my e-mail address....but i do take some care in what places i type it in. an ounce of prevention goes a long way, folks.

SpamAssassin used to work but recently... by squisher · 2004-06-22 15:58 · Score: 3, Interesting

SpamAssassin used to be super-good for me, but recently it has become a nightmare... even with Bayes filters on and after training it with almost 2000 spam messages that had escaped it before, I STILL get an enormous amount of spam every day... maybe I'm doing something wrong with the config, I admit that I haven't spent that much time on that, but it seems like it should be working better :-((.

Spam sucks. Everyone stop buying the products advertised and it'll be over. But then again, people will always be too dumb for an easy solution like that (reminds me of the Goobacks South Park episode...)

Issues with testing corpus by w_mute · 2004-06-22 16:00 · Score: 5, Interesting

I haven't read everything in detail yet, but one of the things that stands out is that their 'gold standard' representing the best result consists of 9,038 ham messages (18.4%) and 40,048 spams (81.6%). While large, the dataset is unbalanced. One of the things that is recommended by many of the filters is training on equal proportions of ham/spam in order to prevent biasing (overfitting).

Their train-on-errors approach may simulate what goes on with some filters, but it doesn't reflect the scenario where there is an initial dataset to be trained on _before_ new messages are processed. Instead, each message is in essence 'new'. So in their tests the machine learning filters start out knowing nothing, but SpamAssassin starts out with its inbuilt ruleset. Not exactly fair.
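
A quick check of the class balance quoted above, as a sketch. The two counts come from the post; the `balanced_size` helper is a hypothetical illustration of the "equal proportions" advice, not something taken from the study.

```python
# Corpus sizes as quoted in the post.
HAM = 9_038
SPAM = 40_048

total = HAM + SPAM
print(f"total={total} ham={HAM / total:.1%} spam={SPAM / total:.1%}")
print(f"spam:ham ratio = {SPAM / HAM:.2f}:1")


def balanced_size(ham: int, spam: int) -> int:
    """Largest per-class count that gives an equal ham/spam training set."""
    return min(ham, spam)


n = balanced_size(HAM, SPAM)
print(f"a balanced training set would use {n} ham and {n} spam")
```

With these counts a balanced set is capped by the ham side, so roughly three quarters of the spam would go unused for initial training.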

-Greg

Re:Issues with testing corpus by dubl-u · 2004-06-22 17:49 · Score: 2, Interesting

So in their tests the machine learning filters start out knowing nothing, but SpamAssassin starts out with its inbuilt ruleset. Not exactly fair.

Perhaps for some definitions of "fair". That strikes me as a reasonable scenario for real-world use, which seems pretty fair to me.

why I don't use spam filters by Begemot · 2004-06-22 16:08 · Score: 2, Interesting

just my humble opinion...

i use email for business and receive many letters from clients. i'm just afraid to lose any of these because of a spam filter. therefore even when i used one, i checked all the emails anyway.

Re:in related news by Crudely_Indecent · 2004-06-22 16:24 · Score: 4, Interesting

I can certainly see how waiting on our government will decrease the number of messages transmitted through my mail servers daily.

It's reassuring to know that the "authorities" have effectively reduced the number of messages through my server by 10-14k per day......What great guys, those 'authorities', aren't they thoughtful and quick to respond. We've only been waiting for a spam-relief law for....10 years and they finally gave one to us. Oh wait....SpamAssassin is what reduced those messages.

The reason we don't wait for the gov to step in and take care of business is that THEY'VE DONE NOTHING SO FAR. You expect me to believe the government will solve my spam problems? I'm not holding my breath.

A combination of RBLs, DNSBLs, F-Prot, and SpamAssassin is what reduced the number of messages sent through my servers. I'm interested in results NOW, not legislation tomorrow.

--

"Lame" - Galaxar

I've been using SpamAssassin about 6 months by cool_st_elizabeth · 2004-06-22 16:29 · Score: 2, Interesting

And it has just now learned to filter out almost all the spam. IIRC, SpamAssassin said it would learn what to mark as spam after a couple hundred obvious spams and the same number of obvious non-spams. I still get the occasional false positive.

Re:Holy Shit.... by fdiskne1 · 2004-06-22 16:29 · Score: 2, Interesting

It's getting just plain ridiculous. When I started keeping track about a year ago, the email filtering system I set up was blocking about 10,000 spams per week for just under 1500 users. Last week, it blocked over 170,000. That is an average of over 100 spams per user and the vast majority of my users don't get any at all. There are a couple dozen that get the vast majority of it. Of course, these are addresses that would be a major pain in the ass to change because of all the people that would have to be notified, and only if I could convince the user they want to. Of course, with this many users, I can't get a good grasp on the number of spams that make it through, but I do know it's enough to have several people continually complaining about it. It's just plain sickening how many resources and how much bandwidth get wasted. I use three different black-hole lists, so about 110,000 of those don't get any further than the initial HELOs, but still. Disgusting. Bring on the protocol change. I've told everyone that I would be willing to work 24 hours a day for an entire weekend to implement a server and/or gateway that uses a new email protocol if it meant most spam would disappear.
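
The figures in this post can be put side by side with a small sketch. All four inputs are the post's round numbers ("just under 1500" users is taken as 1500, so the per-user average is slightly understated), and the variable names are made up for illustration.

```python
# Round numbers quoted in the post.
blocked_year_ago = 10_000        # spams blocked per week a year ago
blocked_last_week = 170_000      # spams blocked last week
users = 1_500                    # "just under 1500", so this is an upper bound
stopped_by_blacklists = 110_000  # rejected at the initial HELO by black-hole lists

per_user = blocked_last_week / users
growth = blocked_last_week / blocked_year_ago
blacklist_share = stopped_by_blacklists / blocked_last_week

print(f"average per user per week: {per_user:.0f}")
print(f"growth over the year: {growth:.0f}x")
print(f"share stopped by black-hole lists: {blacklist_share:.0%}")
```

Since a couple dozen addresses receive most of the spam, the per-user average says little about a typical mailbox; it is only a measure of total load.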

--
But why is the rum gone?

Re:Okay, but what about... by dasmegabyte · 2004-06-22 16:48 · Score: 3, Interesting

Here's how you assuage false positives:

You keep one account for people who don't know you. You spam check that one. You put that on business cards, use it to sign up for porn sites, and post it on slashdot.

You keep another account for responding to email. You set that as your reply-to. You do not spam check it.

This way, there is a way to reach you for customers, clients and friends that will ALWAYS work. Call it the direct line. And, there's a way for people to introduce themselves to you. Call it the "front desk." Anyhow, with SpamAssassin (which includes a Bayesian filter, btw, which can be autotrained to learn spam-like language from other mail it sets up), most of the bullshit calls will be correctly tagged and most of the incoming calls will get to you. I haven't had a false positive in months. But I train the thing like Rocky Balboa.

--
Hey freaks: now you're ju

Bayes SHOULD be better than vanilla SpamAssassin by khasim · 2004-06-22 16:55 · Score: 2, Interesting

For an INDIVIDUAL, a Bayesian filter works far better than just the regular SpamAssassin rulesets.

That's because the Bayesian system will LEARN from you what you consider to be spam and ham.

I use SpamAssassin with Bayesian filtering turned on and it catches over 90% of the spam. But then I've fed it a decent-sized corpus.

POPFile? by gmuslera · 2004-06-22 17:19 · Score: 2, Interesting

I've been using POPFile for months and it has an accuracy of 99.75% over 17k messages. It's not very dependent on the client; it just sits as a POP3 proxy, and it classifies mail into buckets that you can define (so there's no need to just split mail into spam/ham; for some time I even had categories for viruses, Nigerian-like scams, automated reports, etc).

It would be interesting to see how that message sample fares against more spam filtering technologies, or even webmail services with integrated spam protection.

Re:POPFile? by puppetman · 2004-06-22 17:53 · Score: 4, Interesting

Yah, I ran this for about a year before I switched ISPs (and got a new, spam-free email account).

It was amazingly accurate, with about one mistake per thousand emails once I had it trained. I'll go back to it if I start to get a bunch of crap in my in-box. I remember reading that spammers would test their emails against the most popular anti-spam filters, but they still almost never got through POPFile.

I tried SpamAssassin as well, after I had some issues with POPFile (it would stop responding after a large volume of email), and it was more difficult to set up, and didn't have the nice configuration options of POPFile.

Re:I've had CRM114 running for a few months . . . by fferreres · 2004-06-22 17:47 · Score: 2, Interesting

Me too. I couldn't check email for about a week and accumulated 4200 or so spam messages and 300 ham ones. 1 spam misclassified...(but some false positives also).

I try to teach the program as little as possible (if a message doesn't look like spam to me, even if it is, I do not teach it).

I also delete the ADV: prefix in the subject and the CRM114 spam metadata (TAG), and fix the message up in general so the filter doesn't get confused when learning spam.

Bad teaching at the beginning leads to lower quality filtering (I did this at the beginning, not cleaning tags among other mistakes).

I tried SpamAssassin and got fed up. The rules system made SpamAssassin pass as ham everything that spoofed a PINE filter. WTF...I deleted the entry, then one day upgraded and voila, lots and lots of spam again. And accuracy was much lower (the PINE problem reproduced with a lot of other "whitelisting rules" that I never needed).

After a week with CRM114, I deleted SpamAssassin preprocessing for my account.

--
unfinished: (adj.)

Re: SpamSieve by hondo77 · 2004-06-22 17:57 · Score: 3, Interesting

I'd like to second SpamSieve. If more than one piece of spam gets through in a day (where each day I receive > 500 pieces of email), I am truly surprised. My stats for June are:

1007 Good Messages
13729 Spam Messages (93%)
1 False Positives
24 False Negatives (96%)
99.8% Correct

Works for me. Oh, the false positive was a list that I just signed up for. They sent a confirmation mail, I checked to see if it was caught (it was), and marked it as "good". Piece of cake.
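
The percentages in the stats above can be reproduced from the four raw counts. This is only a sketch: it assumes the "(96%)" next to the false negatives means their share of all errors, which is an interpretation and is not stated in the post.

```python
# Raw counts from the June stats in the post.
good, spam = 1007, 13729
false_positives, false_negatives = 1, 24

total = good + spam
errors = false_positives + false_negatives

spam_share = spam / total                 # share of all mail that was spam
fn_share = false_negatives / errors       # assumed meaning of the "(96%)"
correct = (total - errors) / total        # overall accuracy

print(f"spam share: {spam_share:.0%}")
print(f"false negatives as share of errors: {fn_share:.0%}")
print(f"correct: {correct:.1%}")
```

Under that assumption all three printed figures match the post: 93%, 96% and 99.8%.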

--
I live ze unknown. I love ze unknown. I am ze unknown.

Counterintuitive Advertising by KalvinB · 2004-06-22 19:36 · Score: 4, Interesting

Some guy a few stories back mentioned he was getting 3000 ad impressions and 15 clicks a day or so with AdSense. Which is terrible. At first I assumed he was just oversaturating his visitors with ads. But his ad placement is also terrible. It's at the very bottom of the page where few are going to see it. But he is also oversaturating. His pages are very busy with information and the ads are on every single page.

What happens when you constantly shove something in someone's face is that they learn to ignore it. Either consciously or subconsciously. In the case of advertising if someone is shown an ad and they aren't interested and another ad is shown there's a very good chance they won't even notice it. Even if they would have been interested in what it was offering. This is because they were annoyed by the first ad so they just mentally block any additional ads.

This is why the response rate to spam is so terrible. People for the most part just subconsciously ignore it. It's just noise.

Advertisers like radio stations because it tends to be a captive audience. People are very unlikely to turn the station when ads come on. However there is one local station that I've learned to turn the channel on when the ads start because I know I'm going to get to my destination before another song comes on. There are other stations that I don't change the channel on because I know it's just a short break.

Just like the guy pumping out 2985 ads that no one clicks on, spammers would benefit immensely by pulling a large chunk of the ads. People are more likely to notice when they aren't bombarded by ads and the response percentage goes up.

It seems counterintuitive that less advertising means a greater response but that's actually the case.

I normally notice the ad banners on Slashdot because that's pretty much all the advertising there is. I rarely ever notice the text ads. Even though they're placed on the left side in the best position as anyone who scrolls the page is probably going to see them. Slashdot's problem is that the ads blend in with the web-site's color scheme too well so they're pretty much invisible to anyone with a scroll wheel.

On GameDev the site is so littered with advertising that I never notice it anymore. By the time I close the stupid popup ads that circumvent Google's pop up blocker using evil little tricks I'm too annoyed to even look at the other ads.

Web-sites get desperate and think more ads == more money. And the actual result is less valuable ad space because the click thru rate is so low and fewer clicks because users tune the ads out which results in less money than if they had focused on the click thru percentage rather than the number of impressions. If you have a web-site with a high click thru rate advertisers are more likely to pay more because they know that if they show an ad there's a very good chance they'll get a click thru.

But then I'm guessing spammers have never taken a course in marketing or bothered to think about things from their potential customers' perspective.

Keeping ineffective ads visible hurts the effectiveness of the better ads. Spammers are in effect destroying themselves in that area. As are ad happy web-sites.

Ben

--
Work Safe Porn

DSPAM. by asackett · 2004-06-22 20:20 · Score: 4, Interesting

I've been using DSPAM for nearly a year now, and it's just kept on getting better. I can't imagine life without it now.

I have 17 DNS-based blacklists in front of it, because I would rather block the messages at the network interface than filter them with my own resources, but those that slip through don't stand much of a chance of reaching my inbox. I have had my current email address out there on the web and in Usenet for six years, so I see a lot of junk -- DSPAM stops all but one or two per month. SpamAssassin can't even come close to that.

--

Warning: This signature may offend some viewers.

Slashdot Mirror
