Spamassassin Beats CRM-114 In Anti-Spam Shootout

Correct link to CRM-114 by athakur999 · 2004-06-22 15:27 · Score: 5, Informative

CRM-114

The link in the article points to SpamBayes again.

--
"People that quote themselves in their signatures bother me" - athakur999

Re:Correct link to CRM-114 by magefile · 2004-06-23 01:49 · Score: 1

Maybe that's why it didn't work ... they didn't install it!

The Mozilla ThunderBird SPAM filter by k.ellsworth · 2004-06-22 15:30 · Score: 5, Interesting

The Mozilla spam filter does a very good job too; once it has learned enough it becomes over 95% accurate. I dropped Evolution for it and never looked back.

--
Putting a windows cd backwards, plays evil messages, but it gets worse, putting it right, installs windows.

Re:The Mozilla ThunderBird SPAM filter by Cyb3rBull3ts · 2004-06-22 15:37 · Score: 2, Interesting

If you use the Mozilla TB spam filter together with your ISP's filter, it's near 99% accurate.

I have gone from a whopping 200 spam messages a day (a very old e-mail address) to the occasional spam message once a week.

Lemme do the math. 200*7 = 1400. 1399/1400 = 0.9992857 accuracy. Not TOO bad :D
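
The arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted in the comment (200 spam a day, one slipping through per week). Note that it measures the spam catch rate alone and says nothing about false positives.

```python
# Figures taken from the comment: 200 spam/day, 1 spam reaching the inbox per week.
spam_per_day = 200
days = 7
missed_per_week = 1

total = spam_per_day * days          # 1400 spam messages in a week
caught = total - missed_per_week     # 1399 stopped by the two filters combined
catch_rate = caught / total

print(f"{caught}/{total} = {catch_rate:.7f}")   # prints "1399/1400 = 0.9992857"
```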
Re:The Mozilla ThunderBird SPAM filter by ImpTech · 2004-06-22 15:39 · Score: 2, Informative

Of course it's pretty easy to hook spamassassin, bogofilter, or what-have-you into Evolution. Tutorials abound if you search Google. Thunderbird's nice, but IMO Evolution's still a bit nicer, so it was worth my time to plug in a spam filter manually.
Re:The Mozilla ThunderBird SPAM filter by Mark_MF-WN · 2004-06-22 15:40 · Score: 3, Interesting

It works with IMAP too -- which is something most other spam filters aren't capable of.
Re:The Mozilla ThunderBird SPAM filter by k.ellsworth · 2004-06-22 15:53 · Score: 1

As some posts further down this thread say, Thunderbird does spam filtering on IMAP accounts. I used to love Evolution, but MozTB is waaaaay better: it is lighter, faster, and smarter for many tasks (email-related, not groupware-related).

--
Putting a windows cd backwards, plays evil messages, but it gets worse, putting it right, installs windows.
Re:The Mozilla ThunderBird SPAM filter by norton_I · 2004-06-22 20:01 · Score: 5, Insightful

Better to do spam filtering with your MTA/MDA anyway, if possible. That way, the same filter is used no matter which email client you use from which computer. Plus, it means you don't have to download spams to your MUA when on a slow connection.

Now if only I could get the rest of my mail configuration to be shared between evolution, mutt, and squirrelmail.
Re:The Mozilla ThunderBird SPAM filter by jelwell · 2004-06-23 07:18 · Score: 1

What is the rest of your mail configuration? Filters/rules? Use procmail. I have all my rules done through procmail, that way no matter if I'm using squirrelmail, Mail.app or crack cocaine, my filters/rules always get processed before I see my email.

Squirrelmail even has a plugin with an interface to edit your procmail rules: "procfilter"
Joseph Elwell.
Re:The Mozilla ThunderBird SPAM filter by CritterNYC · 2004-06-23 08:38 · Score: 1

The Mozilla spam filter does a very good job too; once it has learned enough it becomes over 95% accurate. I dropped Evolution for it and never looked back.

Personally, I've found the Mozilla filter ineffective. Even with Thunderbird 0.7, training data reset, freshly trained on 5000 messages, I only see around 60 to 70%. Spambayes gives me far greater than 90%. Even though Mozilla's algorithms have improved greatly, its tokenizer still needs improvement.

--
Portable versions of Firefox, GIMP, LibreOffice, etc
Re:The Mozilla ThunderBird SPAM filter by cbreaker · 2004-06-23 11:54 · Score: 1

Although I agree, sometimes it can be difficult if not impossible to set per-user settings. One user might *want* all their mail, and another user wants none of the spam.

If only my Spamassassin/Postfix combo had better control for users to set for themselves.

--
- It's not the Macs I hate. It's Digg users. -
Re:The Mozilla ThunderBird SPAM filter by CuppaJoe · 2004-06-23 13:08 · Score: 1

This is why I use SpamAssassin: I'm not willing to give up Evolution yet, and I need to use it on my client end. However, it works great for about a day after I train it. Then, within two days, I'm getting a hundred spam a day in my inbox. So I retrain again, no spam for about a day, and then it forgets everything it knows. Lather, rinse, repeat. I'm tired of spending more time futzing with my spam filter than I would spend just manually deleting the spam myself!
Re:The Mozilla ThunderBird SPAM filter by norton_I · 2004-06-23 13:20 · Score: 1

All my mail filtering is done by procmail and spamprobe.

What still annoys me are things like address books, signatures, SMTP server settings, "do not send html email" checkboxes, and special mail folders (Sent, Drafts, and Trash).

What I am looking for is something where I can sit down in front of any computer with a mail client installed, type in my IMAP server, username, and password, and all other settings will be set up automatically. Then, if I (for instance) add or change address book entries, it will automatically propagate to every other mail client I use.

Obviously not every configuration item applies to every client, but it seems the majority of them could be standardized. Ideally, I would also like to be able to edit my mail filters from the IMAP client, rather than using a shell account, but that is really a minor issue for me. If I were to try to set my parents up with a system like that, they would need to have the equivalent of the Mozilla or Evolution mail filters, but have them control procmail on the server.

It is certainly a lot of work to get all that to work together, but IMHO, email is *the* internet application worth putting all the resources into, even more than WWW clients.

Invasion by artlu · 2004-06-22 15:31 · Score: 1, Insightful

I must admit that I am not up to date on these new anti-spam software packages, which operate on the server side. However, what is the probability of real mail getting rejected by these things? It seems almost like an invasion of privacy to block my own email, even if it is from a "benevolent big brother" perspective.
I guess that is why there are privacy policies though.

aj

GroupShares Inc. - A Free and Interactive Stock Market community!

--
-------
artlu.net

Re:Invasion by Arial+Sharon,+10pt. · 2004-06-22 15:39 · Score: 1

Yes, there can be false positives, which is why suspected spam is usually moved to a different folder (rather than deleted) that users can check every now and again. Another approach is to insert an extra header to indicate the message's probability of being spam so that the user agent can selectively filter it.
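
As a rough sketch of the header approach, a mail client or delivery script can look for the tag the server-side filter added and file the message accordingly. The example assumes the server adds an `X-Spam-Flag: YES` header (SpamAssassin's usual marker; other filters use different headers), and the folder names are hypothetical.

```python
from email import message_from_string

SPAM_FOLDER = "Junk"   # hypothetical folder names, for illustration only
INBOX = "INBOX"

def choose_folder(raw_message: str) -> str:
    """Pick a folder based on a spam tag added upstream; nothing is deleted."""
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return SPAM_FOLDER if flag == "YES" else INBOX

tagged = "From: a@example.com\nSubject: hi\nX-Spam-Flag: YES\n\nBuy now\n"
clean = "From: b@example.com\nSubject: lunch\n\nNoon?\n"
print(choose_folder(tagged), choose_folder(clean))   # prints "Junk INBOX"
```

Because suspected spam is only moved, a false positive can still be rescued from the Junk folder.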

Your privacy concerns are, as always, more complicated than the technology.

--
Am I dead yet?
Re:Invasion by Anonymous Coward · 2004-06-22 15:45 · Score: 0

They are called false positives. And you will find the study includes this side effect.

I suspect 'aj' is actually complaining since his spam stock tips are being blocked.
Re:Invasion by p2sam · 2004-06-22 15:56 · Score: 1

The point of automated mail sorting isn't about having 0 false negatives. It's about having a lower false negative than if YOU were to sit down and sort the hundreds of spam yourself.
Re:Invasion by Anonymous Coward · 2004-06-22 16:06 · Score: 0

If I was to sit down and filter all my mail myself, I would have 0 false negatives/positives simply because I decide what is spam and what's not. If you were to sit down and filter my mail for me then I would expect some false negatives/positives as your idea of what is and isn't spam may be different from mine. Automated systems are designed to aid in helping YOU decide which messages are spam, not deciding for you. This is the reason many consider "learning" systems the best.
Re:Invasion by halowolf · 2004-06-22 17:14 · Score: 1

I must admit that I am not up to date on these new anti-spam software packages, which operate on the server side. However, what is the probability of real mail getting rejected by these things?
My ISP introduced SpamAssassin on their mail server, which each user could selectively turn on and off. If your mail was classified as SPAM, it would insert a SPAM tag into the mail, which your local mail client could use to classify the mail and move it out of the inbox.
However, I found that their configuration of it was too broad to be useful for my purposes. Most of my mail from mailing lists (HTML formatted, remove-me links) would be classified as SPAM. Normally this wouldn't be such a bad thing, as I still got my mail and I could train Mozilla to recognise this mail as legitimate mail.
However, my mail looked like it was uuencoded before it was sent to me, and all I got was a nice block of characters that I couldn't read. And it's not like I was going to unencode every SPAM mail I wanted to read. So in the end it became absolutely useless to me.
Of course I contacted my ISP's support and was subsequently told that nothing was wrong and it was probably just me. Of course, having actually done SMTP programming, I knew that was complete bollocks, but there was no point arguing any further. They weren't interested in my problem. I haven't used it since.
What was good for the masses wasn't good for a sophisticated mail user. If you want to call getting HTML mail from a mailing list sophisticated :) Mozilla Junk Mail classification is good enough for me.
Re:Invasion by Anonymous Coward · 2004-06-22 17:59 · Score: 0

If I was to sit down and filter all my mail myself, I would have 0 false negatives/positives simply because I decide what is spam and what's not.

You're ignoring the "I'm bored and can't be bothered to actually look at the message body because I've already looked at 8000 messages today" factor.
Re:Invasion by fferreres · 2004-06-22 18:15 · Score: 1

CRM114 and SpamAssassin can be used on desktop computers, or even on remote accounts where you can get a shell. There is an extra risk if you use "as good as it gets" external email accounts like hotmail, yahoo or gmail.

--
unfinished: (adj.)
Re:Invasion by aka-ed · 2004-06-22 18:27 · Score: 1

It's a matter of point of view.
Your POV fails to recognize that, if the header does not look like spam, one's curiosity wrt the content of the email renders it non-spam, at least until it is viewed and a differing evaluation is made. It is not the content of the mail that makes a mail spam, but the user's feeling about that content.
It doesn't matter that the reason you judge it non-spam is because of fatigue. IMO, it just ain't Spam until I say it is. YMMV, of course.

--
I survived the Dick Cheney Presidency 7 to 9 AM 7-21-07
Re:Invasion by KjetilK · 2004-06-23 01:36 · Score: 1

However, what is the probability of real mail getting rejected by these things.

In my case, with about 30000 messages processed since SA 2.62 was released, that number would be 3 messages. Two of these were from Amnesty International (join!), and were blocked because they are actually using spamware for their mailings, for some mysterious reason. The other was from a friend who got some really bad spammer worm, and consequently got on every block list there is.
These were however not blocked, they just landed among the other spam I let through to a spam folder. I do examine the SA summary of rejected spams occasionally, never seen anything there, and given these numbers, it seems just extremely unlikely that SA will reject any legit mail falsely in my current configuration. One in a million, perhaps...

--
Employee of Inrupt, Project Release Manager and Community Manager for Solid

Okay, but what about... by Anonymous Coward · 2004-06-22 15:31 · Score: 0

...false positives?

Re:Okay, but what about... by dasmegabyte · 2004-06-22 16:48 · Score: 3, Interesting

Here's how you assuage false positives:

You keep one account for people who don't know you. You spam check that one. You put that on business cards, use it to sign up for porn sites, and post it on slashdot.

You keep another account for responding to email. You set that as your reply-to. You do not spam check it.

This way, there is a way to reach you for customers, clients and friends that will ALWAYS work. Call it the direct line. And, there's a way for people to introduce themselves to you. Call it the "front desk." Anyhow, with SpamAssassin (which includes a bayesian filter, btw, which can be autotrained to learn spam-like language from other mail it sets up), most of the bullshit calls will be correctly tagged and most of the incoming calls will get to you. I haven't had a false positive in months. But I train the thing like Rocky Balboa.

--
Hey freaks: now you're ju
Re:Okay, but what about... by Rhesus+Piece · 2004-06-22 18:10 · Score: 1

Y'know, I did essentially that, and I still get maybe 10 spam per day.

Know why?

My friends have worms.

Yep, all those blasted addressbook-reading pieces of crap tried to propagate themselves using my email, and as a result spammers are very much aware of my existence.

Moral of the story: Don't make friends with insecure people.
Re:Okay, but what about... by alain1234 · 2004-06-23 01:56 · Score: 1

> You keep another account for responding to email.
> You set that as your reply-to. You do not spam
> check it.

You tried it ?

Some Outlook users will end up with your email in their address books. A virus sends mail from you to someone else, then to a mailing list which is archived online, and your "private" address is now on the web. Game over.
Re:Okay, but what about... by dasmegabyte · 2004-06-23 03:08 · Score: 1

See, I've done this for about five years, and haven't had too much of a problem with viruses. In fact, I've helped a couple of friends track down exactly who has the virus.

--
Hey freaks: now you're ju

Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 15:32 · Score: 5, Funny

Baysian, gaysian. Real men hit delete.

Re:Quit acting like goddamn babies... by fireman+sam · 2004-06-22 16:30 · Score: 4, Funny

Pfft, Real men have this as the ~/.bashrc

#!/bin/sh
rm -f /var/spool/mail/$USER

Who needs email.

--
it is only after a long journey that you know the strength of the horse.
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 16:39 · Score: 0

A real hacker would do:

/etc/init.d/sendmail stop
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 17:04 · Score: 0

You'll still get mail. You need

rm -f $MAIL
ln -s /dev/null $MAIL
Re:Quit acting like goddamn babies... by idiotnot · 2004-06-22 17:13 · Score: 1

killall -TERM sendmail
echo 'SENDMAIL="NONE"' >> rc.conf

*real men* don't do sysv, or so I've heard.

I do sysv and I don't run sendmail.

I also am typing this on a Macintosh. /me seriously questioning masculinity at the moment.....
Re:Quit acting like goddamn babies... by Technician · 2004-06-22 17:20 · Score: 1

Baysian, gaysian. Real men hit delete.

Real men have a life instead of spending the day poking a small button over and over.

--
The truth shall set you free!
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 17:27 · Score: 0

Most mail servers refuse to deliver to a device symlinked to a mailbox.
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 19:09 · Score: 0

Real men have a life instead of spending the day poking a small button over and over.

Really? I thought the goal of all real het men was to poke at a small "button" over and over.
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 19:12 · Score: 0

I guess that goes for gay guys, too - just a different button...
Re:Quit acting like goddamn babies... by Technician · 2004-06-22 21:04 · Score: 2, Insightful

just a different button...

I assume you are not referring to the delete key. ;-) There is more to life than hitting the delete key.

--
The truth shall set you free!
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-22 22:33 · Score: 0

Your signature is 404.
Re:Quit acting like goddamn babies... by Too+Much+Noise · 2004-06-23 00:45 · Score: 2, Funny

Silly rabbit! all you need is

ln -s /dev/null /var/spool/mail/$USER

and you will have email peace forever. ^_^
Re:Quit acting like goddamn babies... by Wdomburg · 2004-06-23 01:09 · Score: 1

That's great until you get:

ln: /var/spool/mail/toomuchnoise: File exists

I think what you *really* want is:

ln -sf /dev/null /var/spool/mail/$USER
Re:Quit acting like goddamn babies... by Too+Much+Noise · 2004-06-23 01:21 · Score: 1

no-no, you only do it once - I should have made it clearer, no ba/k/tcsh script. Otherwise there's no net gain over the OP's solution of deleting in .bashrc. You don't want to worry about email AT ALL, right?

(besides, .bashrc gets sourced way too often to be efficient here).
Re:Quit acting like goddamn babies... by Ronald+Dumsfeld · 2004-06-23 02:21 · Score: 1

Baysian, gaysian. Real men hit delete.
Nonononono!

You report it to spamcop then you go to the spammer's unsubscribe page and enter the FCC complaints email address to get it "removed" from their database.

--
Where's the Kaboom?
There's supposed to be an Earth-shattering Kaboom.
Re:Quit acting like goddamn babies... by Wdomburg · 2004-06-23 03:04 · Score: 1

I was assuming that if you've had the account long enough to get sick of spam that there would be a mail folder already, so you'd either need to delete it and THEN do the command you gave, or just pass the force flag so it'll delete the mail folder and replace it with the symlink. :)
Re:Quit acting like goddamn babies... by Yer+Mom · 2004-06-23 03:51 · Score: 1

Pfft.
Real men use zsh.

--
Never mind Spamassassin. When's Spammerassassin coming out?
Re:Quit acting like goddamn babies... by Anonymous Coward · 2004-06-23 04:59 · Score: 0

"rm -f /var/spool/mail/$USER"

And you can use the same arguments that some people always use!
99.x% accuracy, no false positives etc.

I didn't RTFPDF... by john_smith_45678 · 2004-06-22 15:32 · Score: 3, Interesting

The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.

How many false positives though?

--
John Kerry is a Joke!

Re:I didn't RTFPDF... by Anonymous Coward · 2004-06-22 15:40 · Score: 0

Fuck off.
Re:I didn't RTFPDF... by Malc · 2004-06-22 15:58 · Score: 1

Why's this moderated "troll"? It's a very good question. I'd rather receive some spam than have just one valid message blocked. I use Yahoo and they piss me off sometimes with their false positives.
Re:I didn't RTFPDF... by timeOday · 2004-06-22 16:26 · Score: 1

Yup, I can easily reduce spams to fewer than 2 per day. Just redirect all mail to /dev/null.
Re:I didn't RTFPDF... by Anonymous Coward · 2004-06-22 19:33 · Score: 0

Oh how fucking clever. But I guess a fucktard like yourself never gets any real email.
Re:I didn't RTFPDF... by Daniel_Staal · 2004-06-23 02:47 · Score: 1

I've been using Spamassassin for several years at home. I've got it set so it lets through about 10 spam emails a week. (On a semi-bad week.)

I normally have in the range of 2-10 false positives a year, just about all of them automated replies of one sort or another. (Webcards, emailed receipts, that sort of thing.)

Oh, I get around 60-150 spam emails a day. (Lately it has been going down. I used to get 150 a day quite regularly.)

--
'Sensible' is a curse word.

I use two... by hkfczrqj · 2004-06-22 15:33 · Score: 2, Interesting

I use Spamassassin. Surviving mail then goes through CRM-114. At least in my case, it works better than each of the filters on its own.
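
A two-stage setup like this amounts to chaining classifiers: a message reaches the inbox only if every stage lets it through. The sketch below uses two hypothetical stand-in functions, not the real SpamAssassin or CRM-114 interfaces.

```python
from typing import Callable, Iterable

Classifier = Callable[[str], bool]   # returns True when a message looks like spam

def chain(filters: Iterable[Classifier]) -> Classifier:
    """Combine filters so that a message is spam if ANY stage flags it."""
    stages = list(filters)

    def is_spam(message: str) -> bool:
        # any() stops at the first stage that flags the message,
        # so later stages only see mail that survived the earlier ones.
        return any(stage(message) for stage in stages)

    return is_spam

# Toy stand-ins for the two stages (assumptions for illustration only).
def first_pass(message: str) -> bool:
    return "viagra" in message.lower()

def second_pass(message: str) -> bool:
    return "lottery" in message.lower()

combined = chain([first_pass, second_pass])
print(combined("You won the LOTTERY"), combined("Lunch at noon?"))   # prints "True False"
```

The trade-off is that the chain also inherits the false positives of every stage, so it catches more spam only at the risk of flagging more legitimate mail.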

No HTML, Just ps or pdf, conclusions inside by randyest · 2004-06-22 15:34 · Score: 5, Informative

And a long document it is (funny placeholder images though). Here are the conclusions for those who are impatient but interested in a little more than the summary:

Supervised spam filters are effective tools for attenuating spam. The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day. The corresponding risk of mail loss, while minimal, is difficult to quantify. The best-performing filters misclassified a handful of spam messages early in the test suite; none within the second half (25,000 messages). A larger study will be necessary to distinguish the asymptotic probability of ham misclassification from zero.

Most misclassified ham messages are advertising, news digests, mailing list messages, or the results of electronic transactions. From this observation, and the fact that such messages represent a small fraction of incoming mail, we may conclude that the filters find them more difficult to classify. On the other hand, the small number of misclassifications suggests that the filter rapidly learns the characteristics of each advertiser, news service, mailing list, or on-line service from which the recipient wishes to receive messages. We might also conjecture that these misclassifications are more likely to occur soon after subscribing to the particular service (or soon after starting to use the filter), a time at which the user would be more likely to notice, should the message go astray, and retrieve it from the spam file. In contrast, the best filters misclassified no personal messages, and no delivery error messages, which comprise the largest and most critical fraction of ham.

A supervised filter contributes significantly to the effectiveness of Spamassassin's static component, as measured by both ham and spam misclassification probabilities. Two unsupervised configurations also improved the static component, but by a smaller margin. The supervised filter alone performed better than the static rules alone, but not as well as the combination of the two.

The choice of threshold parameters dominates the observed differences in performance among the four filters implementing methods derived from Graham's and Robinson's proposals. Each shows a different tradeoff between ham accuracy and spam accuracy. ROC analysis shows that the differences not accountable to threshold setting, if any, are small and observable only when the ham misclassification probability is low (i.e. hm

CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of threshold setting. Both exhibit substantial learning throughout the email stream, leading us to conjecture that their performance might asymptotically approach that of the other filters. From a practical standpoint, this learning rate would be too slow for personal email filtering as it would take several years at the observed rate to achieve the same misclassification rates as the other systems. Both these systems were designed to be used in a train-on-error configuration, and do not self-train. This configuration could account for a slow learning rate as each system avails itself of the information in only about 1,000 of the 50,000 test messages. In an effort to ensure that we had not misinterpreted the installation instructions, we ran CRM-114 in a train-on-everything configuration and, as predicted by the author, the result was substantially worse.

Spam filter designers should incorporate interfaces making them amenable for testing and deployment in the supervised configuration (figure 4). We propose the three interface functions used in algorithm 1 - filterinit, filtereval, and filtertrain - as a standardized interface. Systems that self-train should provide an option to self-train on everything (subject to correction via filtertrain) as in algorithm 2.
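
The excerpt names the three interface functions but does not reproduce algorithm 1, so the following is only a guess at the shape of such a supervised loop. The signatures and the toy token-count filter behind them are assumptions, not the paper's definitions.

```python
from collections import Counter

def filterinit():
    """Create empty filter state (toy version: token counts per class)."""
    return {"spam": Counter(), "ham": Counter()}

def filtereval(state, text):
    """Return True if the toy filter judges the text to be spam."""
    tokens = text.lower().split()
    spam_hits = sum(state["spam"][t] for t in tokens)
    ham_hits = sum(state["ham"][t] for t in tokens)
    return spam_hits > ham_hits

def filtertrain(state, text, is_spam):
    """Feed the user's true judgement back into the filter."""
    state["spam" if is_spam else "ham"].update(text.lower().split())

def run_supervised(stream):
    """Classify each message first, then train on its true label."""
    state = filterinit()
    ham_errors = spam_errors = 0
    for text, is_spam in stream:
        verdict = filtereval(state, text)
        if verdict and not is_spam:
            ham_errors += 1        # legitimate mail flagged as spam
        elif is_spam and not verdict:
            spam_errors += 1       # spam delivered to the inbox
        filtertrain(state, text, is_spam)
    return ham_errors, spam_errors

stream = [
    ("cheap pills online", True),
    ("meeting notes attached", False),
    ("cheap pills today", True),
    ("notes from the meeting", False),
]
print(run_supervised(stream))   # prints "(0, 1)": only the very first spam is missed
```

Keeping evaluation and training as separate calls is what lets a test harness replay a mail stream in order and count ham and spam errors separately.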

Ham and spam misclassification proportions should be reported separately. Accuracy, weighted accuracy, and precision should be avoided as primary evaluation measures as th

--
everything in moderation

Mozilla Messenger / Thunderbird Performance? by Mark_MF-WN · 2004-06-22 15:34 · Score: 5, Interesting

I wonder how Mozilla Messenger/Thunderbird's spam filtering stacks up against these filters? I've heard some negative comments about the Mozilla filtering system, but it's worked wonders for me.

Re:Mozilla Messenger / Thunderbird Performance? by k.ellsworth · 2004-06-22 15:47 · Score: 2, Informative

100% agreed. I use the Mozilla Thunderbird spam filter (after some human teaching) and it works marvelously. On my spam-me account (the one I use on Usenet, on some forums, and for anything I know will become a spam source but that still needs a valid email address) I receive ~38K spams a month, and Thunderbird only misses 3 or 4 per day. Sometimes I look through its Junk folder, and I haven't seen any false positives in it so far. Thunderbird is THE email client: it works on Linux and Windoze, the spam filter works better than 99%, and it has many other tricks.

--
Putting a windows cd backwards, plays evil messages, but it gets worse, putting it right, installs windows.
Re:Mozilla Messenger / Thunderbird Performance? by mbourgon · 2004-06-22 15:54 · Score: 1

Mozilla 1.8 has (had?) a real problem with its Junk Mail controls... namely, they don't (didn't?) work nearly as well as 1.7's. Someone feel free to karma whore the details, but I think the problem is that they're using a bunch of different spam filters, and it's not as powerful as whatever was used in 1.7.

--
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
Re:Mozilla Messenger / Thunderbird Performance? by darkmeridian · 2004-06-22 16:41 · Score: 1

I used Thunderbird and the SpamBayes proxy concurrently for a while. SB kicks the crap out of the Thunderbird.

Just one example. I get spam from VIPClubber. I don't know why and I'm afraid to click the "Cancel Me" link because I didn't sign up for anything. Anyway, they don't spoof their headers. Everything from VIPClubber.com is spam. Thunderbird, after ~30 from VIPClubber, still lets some through. SB does not.

Perhaps the TB should integrate SB. This demonstrates the power of open-source software. Just imagine.

--
A NYC lawyer blogs. http://www.chuangblog.com/
Re:Mozilla Messenger / Thunderbird Performance? by dasmegabyte · 2004-06-22 17:00 · Score: 2, Interesting

From personal experience, it works pretty well (I think Mail.app is good too, but the management of the junk once marked needs to be customized). But since it's not really a server-side program, you can't run a server-side test on it. Hence why it wasn't included in this test.

Some anecdotal "evidence" for you: some of the users at my office run their own spam engines on their desktops because they're control freaks. I let them pass by SpamAssassin entirely. In my observation, SpamAssassin works WAY better. It cleans about 90% of the spam we get, whereas most of the add-on desktop clients I've seen are 60-70% effective. Meaning about every third email gets through.

Either way, I would never run an email address "in the wild" without some kind of spam software. Not any more. I resisted for YEARS, but when I started pulling up Squirrelmail...and the first three PAGES of mail were all spam missed by the (SLOWWWWW) Squirrelmail bayesspam plugin...I moved on to using only IMAP client apps with SOME KIND of spam detection built in.

--
Hey freaks: now you're ju
Re:Mozilla Messenger / Thunderbird Performance? by tilrman · 2004-06-22 17:27 · Score: 1

According to the article, Mozilla was "initially chosen for evaluation but later excluded because a prohibitive effort would have been required to isolate the interfaces...."
Re:Mozilla Messenger / Thunderbird Performance? by Anonymous Coward · 2004-06-22 17:45 · Score: 1, Informative

I have measured Mozilla at 97% accurate and SpamProbe at 99.6% accurate. My mail is very skewed, since I get about 20 times more spam than ham.

Mozilla is OK if you only get about 100 spams a day, but I get about 4000 spams a day - and less than 20 legit messages, so I need something better than Mozilla.

For me, Spamprobe had zero false positives, after 18 months of use, so I think if it ever does make a mistake, it would be a message so close to spam that I would not want to read it anyway.
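
With mail this skewed, a single accuracy figure can hide a lot. The sketch below takes the volumes mentioned in the comment (about 4000 spam and 20 legitimate messages a day) and compares two made-up filters; the error counts are hypothetical, chosen only to show the effect.

```python
def rates(ham_total, ham_wrong, spam_total, spam_wrong):
    """Return (ham misclassification, spam misclassification, accuracy)."""
    hm = ham_wrong / ham_total      # share of legitimate mail flagged as spam
    sm = spam_wrong / spam_total    # share of spam that reached the inbox
    accuracy = 1 - (ham_wrong + spam_wrong) / (ham_total + spam_total)
    return hm, sm, accuracy

# Filter A (hypothetical): flags everything as spam, so all 20 real messages are lost.
flag_everything = rates(ham_total=20, ham_wrong=20, spam_total=4000, spam_wrong=0)

# Filter B (hypothetical): loses no real mail but lets 40 spam through.
careful = rates(ham_total=20, ham_wrong=0, spam_total=4000, spam_wrong=40)

for name, (hm, sm, acc) in [("flag everything", flag_everything), ("careful", careful)]:
    print(f"{name}: hm={hm:.3f} sm={sm:.3f} accuracy={acc:.4f}")
```

Filter A scores the higher accuracy (about 99.5% against about 99.0%) even though it throws away every legitimate message, which is why the two error rates are worth reporting separately.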
Re:Mozilla Messenger / Thunderbird Performance? by jrumney · 2004-06-22 19:21 · Score: 1

I recently switched to Thunderbird 0.7 from Mozilla 1.7, and I'd say the same. Mozilla 1.7's spam filter caught about 80% of spam with no false positives. Thunderbird 0.7's catches about 50%, but more disturbingly it also marks a lot of genuine mail as spam.
Re:Mozilla Messenger / Thunderbird Performance? by Anonymous Coward · 2004-06-22 22:51 · Score: 0

I wonder how Mozilla Messenger/Thunderbird's spam filtering stacks up against these filters? I've heard some negative comments about the Mozilla filtering system, but it's worked wonders for me.

I think its effectiveness depends on a lot of factors. For me, I started using Moz mail exclusively the first version that had Bayesian filtering (I was hooked on the idea after first reading Paul Graham's article).

At first, it was great. Now that Bayesian (and similar) filtering is commonplace, however, I'm finding a LOT of spam slipping through. Most that does will contain intentional misspellings (v1agra, etc). Now that it's in common use spammers are specifically working around this type of filtering.
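
To see why a misspelling helps the spammer, here is a simplified sketch of the probability-combining step described in Paul Graham's article. It is not Mozilla's actual code: the per-token probabilities are invented, and unlike Graham's description it uses every token rather than only the most telling ones.

```python
from math import prod

# Hypothetical per-token spam probabilities learned from earlier mail.
TOKEN_SPAM_PROB = {"viagra": 0.99, "cheap": 0.90, "meeting": 0.05}
UNKNOWN = 0.4   # value given to tokens the filter has never seen (assumption)

def spam_probability(text: str) -> float:
    """Combine per-token probabilities into one score between 0 and 1."""
    probs = [TOKEN_SPAM_PROB.get(tok, UNKNOWN) for tok in text.lower().split()]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)

print(round(spam_probability("cheap viagra"), 3))   # both tokens are known spam words
print(round(spam_probability("cheap v1agra"), 3))   # "v1agra" is unseen, so the score drops
```

Until the filter has been trained on a few messages containing the new spelling, the unseen token contributes nothing incriminating, which matches the behaviour described above.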

On top of that, I've had my own newsletter marked as junk, as well as things like domain renewal notices, etc... so I always double-check the filtering, and likely will never rely on any filtering to the point of letting it delete messages automatically.

Worse though is when large ISPs implement server-side filtering without informing their customers. It seems Earthlink is one, but AOL has some kind of junk filtering that (I believe) is enabled by default. My business sends out software licenses via email, and some ISP's filters block the very thing their customer has paid for... it's become quite a support issue for us.

To Earthlink's credit, they email back a message with a link; once it is clicked/confirmed, the original email makes its way to the customer. And I believe most ISPs offer some way for the user to disable the filtering, or to be able to see/retrieve the blocked messages -- but in my experience most of these users have no idea such filtering is in place.
Re:Mozilla Messenger / Thunderbird Performance? by WuphonsReach · 2004-06-23 01:35 · Score: 2, Informative

I used Thunderbird and the SpamBayes proxy concurrently for a while. SB kicks the crap out of the Thunderbird.

Definitely agree.

I use the SpamBayes MS Outlook plugin for my work e-mail and it is extremely good at discriminating spam from ham. I use Thunderbird for my non-corporate e-mail. SpamBayes has two additional (and rather important) features that Thunderbird/Mozilla just don't have:

1. SpamBayes (at least the Outlook plug-in) actually has three levels of classification... definite ham, maybe, and definite spam; and you can route the "maybe" and "definite spam" to two different folders. That means, instead of having to sift through 229 spam messages for false positives, I really only have to closely examine the 29 "maybes". The other 200 I can just give a cursory glance at.

2. SpamBayes keeps track of the folder where a spam message was found. Then, if you click the "you goof! that's ham!" button, SpamBayes is smart enough to put the message back into that folder. Moz's junk mail filter just turns off the junk flag and leaves the message to rot in the junk folder. Sounds like a small thing, but it's a big usability issue.
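
Both features are easy to picture in code. The sketch below is not SpamBayes itself: the two cutoff values, the folder names, and the `Message` class are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

HAM_CUTOFF = 0.20    # assumed thresholds; scores run from 0 (ham) to 1 (spam)
SPAM_CUTOFF = 0.90
FOLDERS = {"unsure": "Junk Suspects", "spam": "Junk"}   # hypothetical folder names

@dataclass
class Message:
    subject: str
    folder: str
    origin: Optional[str] = None   # where the message sat before it was filed as junk

def classify(score: float) -> str:
    """Three-way verdict: definite ham, unsure, or definite spam."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"

def file_message(msg: Message, score: float) -> None:
    """Move suspected junk, remembering the folder it came from."""
    verdict = classify(score)
    if verdict != "ham":
        msg.origin = msg.folder
        msg.folder = FOLDERS[verdict]

def recover(msg: Message) -> None:
    """Undo a wrong verdict: put the message back where it was found."""
    if msg.origin is not None:
        msg.folder, msg.origin = msg.origin, None

note = Message("Invoice attached", folder="Work")
file_message(note, 0.55)
print(note.folder)    # prints "Junk Suspects"
recover(note)
print(note.folder)    # prints "Work"
```

Only the messages that land between the two cutoffs need close inspection, and recovering one of them restores it to its original folder instead of leaving it in the junk folder.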

--
Wolde you bothe eate your cake, and have your cake?
Re:Mozilla Messenger / Thunderbird Performance? by Salamander · 2004-06-24 07:53 · Score: 1

I found Thunderbird's spam filtering to be utterly useless. When I realized that as a result of using it I just had to sort through my junk folder as well as my inboxes, with the former just as likely to contain real mail as any of the latter, I just turned it off. I think my problem was the same as others have reported: a significant percentage of the spam I get is specifically designed to fool Bayesian filters, and as soon as the filters crank up to catch the spam they start catching ham as well. It's an arms race, and Thunderbird's filters lost.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Mozilla Messenger / Thunderbird Performance? by drinkypoo · 2004-06-24 09:25 · Score: 1

If you are sorting your email into folders based on rules, you can just un-junk the mail in question, then use options in the tools menu to delete everything still marked as junk, and then apply the filters to the junk mail box and re-file that stuff. It's not as nice as just remembering where the mail belongs but it works.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Spamassasin is great! by JohnFromCanada · 2004-06-22 15:35 · Score: 2, Informative

I have been using SpamAssassin in conjunction with Evolution and it has cut my spam to virtually nothing. I wish it was built right into Evolution so that it was a little faster; however, it is worth the wait, as I barely ever get any spam in my Inbox anymore. I set it up with Evolution very similar to how it is shown here. I really like using it with Evolution, but I am curious whether anyone knows of anything that would work faster and as efficiently in conjunction with Evolution?

Re:Spamassasin is great! by Anonymous Coward · 2004-06-23 01:12 · Score: 0

I set up SpamAssassin with MailScanner, MailWatch, razor, dcs, bayes, and several other options. Our corporate-wide spam level has dropped to almost nothing, and zero virus-infected emails slipped through.

Everything flagged as spam gets quarantined for ten days, then deleted. MailWatch makes it easy to generate reports and release any false positives from quarantine. I have had only two false positives brought to my attention out of 300,000 emails in the last 60 days. After I did the initial setup I personally scanned over 1000 quarantined messages, and didn't find one I wouldn't consider spam. I informed all email users in the company what we are doing and to contact me if they are expecting something and don't get it, but I never get a call.

So yeah, I think it is great too.
Re:Spamassasin is great! by WhiteDragon · 2004-06-24 00:33 · Score: 1

I really like using it with Evolution however I am curious if anyone knows of anything that would work faster and as efficient in conjunction with Evolution?
I don't use Evolution, but I do use bogofilter, which is very fast. I have heard that it does work with Evolution.

--
Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?

Real way to block spam by DRWHOISME · 2004-06-22 15:35 · Score: 2, Interesting

Is to do away with current email protocols and go with new ones with verification.

That should take care of the problems. The gov is now concentrating on this.

Re:Real way to block spam by PornMaster · 2004-06-22 15:39 · Score: 2, Insightful

Is to do away with current email protocols and go with new ones with verification. That should take care of the problems. The gov is now concentrating on this.

Except for making a new standard that's a requirement for doing business with federal agencies, just what do you think government's capable of doing regarding replacing protocols?

-PM

--
500GB of disk, 5TB of transfer, $5.95/mo
Re:Real way to block spam by Anonymous Coward · 2004-06-22 15:43 · Score: 0

They'll say "Hey, look at our new anti-spam list!" and the list will only be available for users of the new protocol. People will want this and demand it from their ISPs.
Re:Real way to block spam by wmacgyver · 2004-06-22 16:03 · Score: 1

Color me skeptical, but I'm not sure government is the magical solution to this problem. Just look at how much good the new anti-spam law they passed is doing. :)
Re:Real way to block spam by Technician · 2004-06-22 17:30 · Score: 1

Already done that. I have a geocaching account. It doesn't permit bulk mail of any kind. To mail me, get an account, choose send mail to another user, and fill in the online form. This type of mail so far has been spam free and works. I know for those on bulk lists, it doesn't work for you. But it's a place my family can reach me without having to weed out a stuffed inbox and possibly lose the important stuff.

Mailboxes and bulk mail just don't mix. Newsgroup notifications and such should use a protocol other than mail. Mail should be personal, person-to-person, like a phone call. That is why IM has begun to replace e-mail. You can close your input to a select group that you trust and reject everyone else. E-mail needs to do the same.

--
The truth shall set you free!

Re:Best anti-spam code by britneys+9th+husband · 2004-06-22 15:36 · Score: 0, Insightful

How exactly does the US (or other first world country) go about writing a code of law that puts Nigerian spammers in jail?

--
Hear recorded Slashdot headlines on your phone! New service beta testing. Just call (248) 434-5508

A little advice by Anonymous Coward · 2004-06-22 15:37 · Score: 5, Funny

You don't want to face an assassin in a shootout. Maybe a pie eating contest, or a spelling bee... but not a shootout.

I've had CRM114 running for a few months . . . by klevin · 2004-06-22 15:38 · Score: 4, Informative

CRM114's best was about 80%, which lasted for a few weeks (weeks 3-5). Before and after that, it was doing well to catch 25% of the spam. I'm not sure why, but for the last month it's only been catching about 10%. When one gets through, I run it through mailfilter.crm with the learnspam switch. It'll say it's learned it, but if I have it check the spam again, it still lets it past.

Re:I've had CRM114 running for a few months . . . by CoolGopher · 2004-06-22 16:19 · Score: 2, Informative

I've been running CRM114 for about a year now, and it's performing extremely well. Far better than my Mozilla filter. In fact, just the other week I scrapped Mozilla's junk filter completely and am now relying on CRM alone. It's very rare that I get any misses in either direction.

If I was to make an estimate, I'd say that the error rate is something like .1%, quite possibly less (say 1 miss/5 days, with 200 mails per day). This is having started with clean corpus files and train-on-error only.
Re:I've had CRM114 running for a few months . . . by Bakaneko · 2004-06-22 17:04 · Score: 1

My problem with CRM114 (and I'm using it exclusively lately) is its likelihood of false positives.

It catches nearly all spam now, but I have to be careful to watch for and relearn anomalies a lot.

For instance, I rarely receive JPGs as attachments. Family either mails me photos or zips them up first, work obviously never has the need to send photos... normally.

About a month ago I had a project which required quite a few people who never send me email to send me a brief email and an attached photo. EVERY single one of them CRM114 threw away, and since I only tend to check the spam pile every few days or so, I had thought people were ignoring my requests.

It also seems a bit cyclic. Every few days, I get one or two through that I can't figure for the life of me how they made it, but they're in the single digits in terms of non-spam classification. So I retrain them as spam, but then a few days later it marks as spam an email that really isn't... So I retrain... but then a few days later. Etc... The distinctions seem to be very fine (as in narrow) for it.
Re:I've had CRM114 running for a few months . . . by Amoeba+Protozoa · 2004-06-22 17:45 · Score: 1

I concur. CRM114 has been working /extremely/ well for me as well.

This month I had 1295 messages, 210 of which were spam (my mail account is relatively new). 2 out of the 210 were false negatives and 1 was a false positive. That is a failure rate at worst of 3/210, or ~1.4%.
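
For anyone checking the arithmetic, here is a quick sketch using the counts from this post. Which denominator to use (spam only, or all mail) is a judgment call, so both are shown; the helper name is just for illustration.

```python
def rate(errors, total):
    """Fraction of messages that were misclassified."""
    return errors / total


false_negatives, false_positives = 2, 1
spam, total = 210, 1295
errors = false_negatives + false_positives

# Worst case: all three errors counted against the spam volume only.
print(f"{rate(errors, spam):.2%}")
# Errors counted against every message received that month.
print(f"{rate(errors, total):.2%}")
```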

Not too bad in my book.

-AP
Re:I've had CRM114 running for a few months . . . by fferreres · 2004-06-22 17:47 · Score: 2, Interesting

Me too. I couldn't check email for about a week and accumulated 4200 or so spam messages and 300 ham ones. 1 spam misclassified... (but some false positives also).

I try to teach the program as little as possible (if a message doesn't look like spam to me, even if it is, I do not teach it).

I also delete the ADV: (prefix) in the subject and the crm114 spam metadata (TAG) and fix it in general so it doesn't get confused when learning spam.

Bad teaching at the beginning leads to lower quality filtering (I did this at the beginning, not cleaning tags among other mistakes).

I tried SpamAssassin and got fed up. The rules system made SpamAssassin pass as ham everything that spoofed a PINE filter. WTF... I deleted the entry, then one day upgraded and voila, lots and lots of spam again. And accuracy was much lower (the PINE problem reproduced with a lot of other "whitelisting rules" that I never needed).

After a week with CRM114, I deleted SpamAssassin preprocessing for my account.

--
unfinished: (adj.)
Re:I've had CRM114 running for a few months . . . by fferreres · 2004-06-22 18:27 · Score: 1

Same problem here, though less intensive.

What I do is delete the parts of spam messages that I do not want the spam filter to consider. For example??? MY FUCKING NAME! It seems they got my name from Whois, so whenever I get a spam message that has my name on it, I delete the name. Look at the message bodies and headers to see if that is causing you trouble, and delete anything common sense says you would likely not want tagged as spam (name, address, text that is normally expected).

Maybe it can help you.

--
unfinished: (adj.)
Re:I've had CRM114 running for a few months . . . by Anonymous Coward · 2004-06-23 00:28 · Score: 0

When I ran Spamassassin it would also block about 80% of my spam.

I've been running DSPAM for a couple of months and it is blocking 100% of my spam. I haven't gotten a single spam message and it has not blocked a single ham message. DSPAM was the other along with CRM that they said was the worst. Somebody screwed up if you ask me.
Re:I've had CRM114 running for a few months . . . by ansa · 2004-06-23 01:35 · Score: 1

I've been running DSPAM too for 6 months, and after some time it starts letting some spam through... it's happening now with German spam and it happened often with non-English languages; no false positives though, and no configuration needed!
I used spamassassin for about 1 year and I always got a lot of false positives, and it needed constant tweaking on the ruleset to let some blocked new mail in.

--
"The crux of the biscuit is the Apostrophe(*)" - FZ

Good results with spamprobe by bigberk · 2004-06-22 15:38 · Score: 2, Informative

I have been using spamprobe for some time, with the webfilt front-end, and I'm very pleased with the speedy spamprobe program (written in C++).

I receive approximately 10 legit emails/day and about 300 spam/day. I have only had 2 false positives overall (that's 2 out of about 100,000 total emails received) and on average only 2 spams/day slip past the filter. Now I'm testing Spambayes on one of my most spammed accounts, but it's definitely much slower than spamprobe and not more accurate as far as I can tell.

compute farms for anti-spam AI? by potus98 · 2004-06-22 15:39 · Score: 4, Informative

From page 24: Hidalgo suggests the use of ROC curves, originally from signal detection theory and used extensively in medical testing, as better capturing the important aspects of spam filter performance.

Perhaps a distributed analysis system (similar to SETI@home) could be used to combat spam. Not only could the idle time of bazillions of CPUs be leveraged to improve "signal" analysis, but perhaps the clients could analyze local incoming mail to correlate new trends in spam originators and then share that information with all of the other clients. Then you could combine that with the genetic evolution improvements of the F1 sim-cars recently mentioned on /.

So there's the high-level idea, now you smart people go make it work. :-)

--
This one gang kept wanting me to join cause I'm pretty good with a bo staff.

Re:compute farms for anti-spam AI? by damiangerous · 2004-06-22 16:38 · Score: 4, Informative

There are already spam packages that do this, at least the collaborative part. Vipul's Razor (which is under the Artistic license) at the personal level and Brightmail (which is closed and not free) at the enterprise/ISP level, off the top of my head.
Re:compute farms for anti-spam AI? by ZorbaTHut · 2004-06-22 16:57 · Score: 1

See gmail.

No, I don't *know* that that's what they're doing . . . but it wouldn't surprise me :)

--
Breaking Into the Industry - A development log about starting a game studio.
Re:compute farms for anti-spam AI? by Clovert+Agent · 2004-06-23 00:18 · Score: 1

I just tested a bunch of anti-spam tools. Brightmail was the best of the bunch in terms of detection, and had zero false positives. Not the most flexible though, and far from being user-friendly to admin and monitor.

YMMV - spam is not only subjective, it varies enormously from one organisation to another, and even between groups within organisations.
Re:compute farms for anti-spam AI? by Anonymous Coward · 2004-06-23 08:53 · Score: 0

I took your idea to the Smart People and they told me to tell you this:

"It won't work because the distribution of processing clients among vast, unknown entities results in easy tampering by malicious parties."

Then they said something like "ineffectual to harmful."
Re:compute farms for anti-spam AI? by drinkypoo · 2004-06-24 09:32 · Score: 1

I can't help but notice that the install documentation for vipul's razor, well, sucks. Is there such a thing as a mozilla plugin for the razor?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:in related news by sqlrob · 2004-06-22 15:41 · Score: 1

Content RBLs have been working fairly well for me

Re:in related news by bigberk · 2004-06-22 15:42 · Score: 4, Insightful

Content-based spam filtering is a waste of time. . . RBLs WORK

But content-based filters can very accurately determine what is spam and what's not, and so they can feed RBLs/DNSBLs. Let real spam to real user accounts form the blocklist! One such project is WPBL.
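
For reference, a DNSBL is queried by reversing the octets of the sender's IPv4 address and looking that name up under the list's zone; an answer means "listed". A minimal sketch, where the zone name `dnsbl.example.org` is a placeholder (not WPBL's real zone) and the `resolve` parameter is only there so the lookup can be swapped out:

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blocklist answers for this IP, False if it has no record."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

A content filter feeding such a list would simply publish the connecting IPs of mail it has classified as spam with high confidence.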

Isn't Human Accuracy always 100% by PetoskeyGuy · 2004-06-22 15:43 · Score: 4, Insightful

From the CRM-114 site...

News Flash: As of Feb 1 through March 1, 2004, 8738 messages (4240 spam, 4498 nonspam), and my total error rate was ONE. That translates to better than 99.984% accuracy, which is over ten times more accurate than human accuracy

Maybe I'm missing something, but isn't human accuracy always going to be 100%? I tell the computer what is spam, it learns. I may decide that regardless of what it thinks, this last message is OK. So aside from clicking too fast or changing your mind (which is a common thing to do), how can a filter ever suggest it is better than people at deciding what people want to see?

Re:Isn't Human Accuracy always 100% by sholden · 2004-06-22 15:50 · Score: 4, Insightful

People make mistakes.

Yes, given one message to classify as spam or ham you are going to get it right 100% of the time.

Given 8000 messages to classify, the wonders of boredom mean you are going to make a mistake every so often (not an "oops I clicked the wrong button" mistake, but an "oops I put it in the wrong folder because the subject looked spammy and I couldn't be bothered checking the body" mistake).

In practice though, those stats on human accuracy are provided by having one person classify email that has been classified by others - which of course means some of the mistakes may in fact be disagreements...
Re:Isn't Human Accuracy always 100% by fireman+sam · 2004-06-22 16:28 · Score: 4, Funny

Remember, an email being classified as spam is subjective. For example, you might consider a message from a Nigerian bank manager spam, but I may consider it a way to pay off the house :)

Or, personally, I consider all email I get from hotmail.com to be spam. But that is my opinion.

OT: btw, a friend at work actually got a Nigerian scam letter in the post. Because it was not email, he thought it was real.

--
it is only after a long journey that you know the strength of the horse.
Re:Isn't Human Accuracy always 100% by Anonymous Coward · 2004-06-22 16:35 · Score: 4, Funny

OT: you need smarter friends.
Re:Isn't Human Accuracy always 100% by Surazal · 2004-06-22 16:36 · Score: 0, Offtopic

For example, you might consider a message from a Nigerian bank manager spam, but I may consider it a way to pay off the house :)

Nope. Once you fall for that, it will be the Nigerian scammer who will pay off your house. Under his name.

--
--- Journals are boring; Go to my web page instead
Re:Isn't Human Accuracy always 100% by Anonymous Coward · 2004-06-22 16:42 · Score: 1, Funny

Of course, the grandparent could be a Nigerian bank manager, making you both correct.
Re:Isn't Human Accuracy always 100% by norton_I · 2004-06-22 19:59 · Score: 1

If I give you a batch of 8000 emails and ask you to classify them, then do it again a week later, you will not make the same partition. If you then go look at the few messages you assigned differently between the two trials, you will (usually) decide that it is either spam or not, in a (mostly) repeatable fashion.

The difference is, you don't look at each message carefully when scanning in bulk.

When I first trained spamprobe, I trained it on about 3000 messages, then ran it over those messages to classify them. It came up with about two spams and one non-spam that I had misclassified, plus one ambiguous case (marketing email from a company I buy stuff from at work on a regular basis, and whom I let scan my badge at a trade show).
Re:Isn't Human Accuracy always 100% by AnotherBlackHat · 2004-06-23 03:51 · Score: 1

In practice though, those stats on human accuracy are provided by ...

I'd bet that like 97.3% of all statistics, this one was made up on the spot too.

-- not a .sig
Re:Isn't Human Accuracy always 100% by Jerf · 2004-06-23 06:30 · Score: 1

Maybe I'm missing something, but isn't human accuracy always going to be 100%?

I use Mozilla's filter. Yesterday, somebody I've never heard of sent me an email entitled, simply, "Hello".

This is not a rare event for me. It was unusual that my mail filter didn't label it as spam. So I "corrected" it.

Then I thought I should at least check it out. And lo, it was a 100% serious email from somebody trying to find an old friend, who had good reason to believe I might know something. (I didn't, but the only way to find out was to ask.)

One case where the spam filter was more right than me, at least superficially.

It can be observed, correctly, that I just looked at the title and sender, and it "read" the whole message, but generally, that is enough for me to beat my filter, so it is at least somewhat fair to hold this up as an example of a filter beating a human practically, even though I obviously control the definition of "spam".
Re:Isn't Human Accuracy always 100% by Anonymous Coward · 2004-06-23 07:51 · Score: 0

Remember, an email being classified as spam is subjective. For example, you might consider a message from a Nigerian bank manager spam, but I may consider it a way to pay off the house :)
I know you're making a joke, but I still have to point out that this is not true. Spam is unsolicited commercial email. There's a little bit of leeway in "unsolicited"[*], but not enough that Nigerian bank manager emails would ever be considered ham. Even if you want spam for some reason, you should at least agree what it is.
[*] If I'm a Qwest customer, is a message from them advertising new services still spam? That's debatable. But if I don't have any business relationship with someone and have never talked with them before, that's not: a message from them is unsolicited.

Re:in related news by plasm4 · 2004-06-22 15:43 · Score: 2, Insightful

Filtering tools work fairly well, but more importantly they work right now. Waiting for the authorities to "wake from their slumber" might take years, if it ever even happens.

Spamassassin uses collaborative spam-tracking by vivek7006 · 2004-06-22 15:43 · Score: 2, Informative

Razor: Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database -- at which point everyone else will automatically block it.

This is really cool.
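
A toy sketch of the collaborative idea described above: the first recipient reports a message, and everyone else checks incoming mail against the shared set of signatures. Everything here is an assumption for illustration. In particular, the signature is a plain SHA-256 of the whitespace-normalized, lowercased body, which is much cruder than whatever Razor really computes, and the class and method names are made up.

```python
import hashlib


def signature(body):
    """Hash a normalized copy of the body so trivial spacing/case changes still match."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedSpamDB:
    """In-memory stand-in for a shared signature database."""

    def __init__(self):
        self._reports = {}  # signature -> number of reports received

    def report(self, body):
        sig = signature(body)
        self._reports[sig] = self._reports.get(sig, 0) + 1

    def is_known_spam(self, body, min_reports=1):
        return self._reports.get(signature(body), 0) >= min_reports
```

Raising `min_reports` is one simple guard against a single user reporting legitimate mail by mistake.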

Re:Spamassassin uses collaborative spam-tracking by Anonymous Coward · 2004-06-22 15:51 · Score: 0

What protection does it have against users (intentionally or unintentionally) adding non-spam to the database, thus blocking legitimate e-mail to everyone who uses Razor?
Re:Spamassassin uses collaborative spam-tracking by bigberk · 2004-06-22 15:53 · Score: 4, Informative

It gets better. Vernon Schryver, networking genius, is responsible for the Distributed Checksum Clearinghouse which does something similar, but as I understand it, is much more efficient for large servers. When our university turned on DCC filtering combined with greylisting, the daily spam to inboxes dropped from hundreds daily to ZERO (I kid you not). I am not aware of any false positives, at least on my account. DCC blew my mind.
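
For readers who haven't met greylisting: as it is commonly described, the server temporarily rejects the first delivery attempt from an unseen (client IP, sender, recipient) triplet and accepts a retry after some delay, on the theory that real mail servers retry and much spamware does not. A minimal sketch under that assumption; the class name and the 300-second delay are illustrative, not what the university above actually ran.

```python
class Greylist:
    """Temporarily reject unseen (ip, sender, recipient) triplets."""

    def __init__(self, delay_seconds=300):
        self.delay = delay_seconds
        self.first_seen = {}  # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient, now):
        """Return 'accept' or 'tempfail' for a delivery attempt at time `now` (seconds)."""
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "tempfail"
```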
Re:Spamassassin uses collaborative spam-tracking by Anonymous Coward · 2004-06-22 16:18 · Score: 1, Informative

What protection does it have against users (intentionally or unintentionally) adding non-spam to the database, thus blocking legitimate e-mail to everyone who uses Razor?

People have done this before by adding mailing list posts to Razor. But SpamAssassin doesn't automatically block messages listed in Razor, it just assigns them a higher spam score.

Razor has some protection too, like the truth evaluation system - see this page for info.
Re:Spamassassin uses collaborative spam-tracking by Anonymous Coward · 2004-06-22 19:23 · Score: 0

I've used DCC and similar systems (Razor/Pyzor) in the past. I eventually abandoned them due to high false positive rates. It seems many people classify legitimate receipts, newsletters, mailing list messages, etc., as spam.
Re:Spamassassin uses collaborative spam-tracking by gonk · 2004-06-22 21:49 · Score: 1

It wasn't DCC, it was the greylisting.

robert

So I'm not the only one... by sholden · 2004-06-22 15:44 · Score: 4, Informative

I did a *much* smaller test of spam filters earlier this year (which was published in hakin9 but not in English).

I also found that crm114 gave poor results in comparison to other filters - but figured I must have set something up incorrectly...

Why don't people use catch-all accounts? by mattkinabrewmindspri · 2004-06-22 15:44 · Score: 5, Interesting

When you register with a hosting company, very frequently, they set up what's called a catch-all account, and any email to your domain that's not addressed to a real address goes there. This is how I use it:

I only use my main email address with friends and family, and never post it online.
Whenever I post an email address or register for anything online, I put thatsite@mydomain.com as my email address.
All email is received by one account, but each message can have a different "to:" header. I set my filters to filter mail to different boxes. Email sent to amazon@mydomain.com goes to the amazon folder. Same with ebay, slashdot, whatever.
Any time I start receiving spam, I just set my mail server to disregard email sent to whatever email address is getting the spam, and I can stop doing business with the company that sold my email address.
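
The scheme above boils down to routing on the local part of the "to:" address plus a list of burned addresses. A rough sketch, with a hypothetical function name and a plain set standing in for whatever the mail server's "disregard" setting really is:

```python
def route_by_recipient(to_address, domain, blocked):
    """Return the folder name for a message, or None if it should be dropped.

    `blocked` holds local parts (e.g. "ebay") that have started receiving spam.
    """
    local, _, addr_domain = to_address.lower().partition("@")
    if addr_domain != domain:
        return None  # not addressed to our domain at all
    if local in blocked:
        return None  # this per-site address has been burned
    return local     # e.g. amazon@mydomain.com -> "amazon" folder
```

For example, `route_by_recipient("amazon@mydomain.com", "mydomain.com", {"ebay"})` files the message under "amazon", while anything sent to ebay@mydomain.com is dropped.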

I receive on average 0 spams per day.

--
Albuquerque PC

Re:Why don't people use catch-all accounts? by Anonymous Coward · 2004-06-22 15:50 · Score: 0

Why can't a spammer just start spoofing different popular sites you may have done business with? You should work a secret code system.
Re:Why don't people use catch-all accounts? by YrWrstNtmr · 2004-06-22 15:59 · Score: 1

Because not everyone has a mydomain.com
Re:Why don't people use catch-all accounts? by mattkinabrewmindspri · 2004-06-22 16:00 · Score: 1

I don't think it's likely that spambots will pick up on more than one of my addresses within several months. I'm probably only registered at about 30-40 sites (about 10 of which I visit really frequently), and most of them can be set to hide your email address. I haven't had to block any of the addresses I've used at popular sites so far.
Even if they did, I could knock the spam I received back down to zero just by having my server disregard any mail sent to that address and then if I'm still visiting that site, changing my address in that site's preferences.

--
Albuquerque PC
Re:Why don't people use catch-all accounts? by sr180 · 2004-06-22 16:13 · Score: 4, Informative

Wait till the spammers decide to spam your whole domain. They can start with aaaaaaaa@yourdomain.com and keep going till they get to zzzzzzzz@yourdomain.com, and your mailserver will accept and pass on every single one of these emails.
I would recommend not using a catch all account, but if you have the domain, create, delete and rename email accounts as you need to...

--
In Soviet Russia the insensitive clod is YOU!
Re:Why don't people use catch-all accounts? by burns210 · 2004-06-22 16:24 · Score: 1

What if it isn't eBay that sold the address, but rather a random-generation spammer who sent to ebay@DOMAIN.TLD? Or what if the company (or you, by accident) posted the email address to the web, and a spider grabbed it and added it to a spammer's list?

How many CORP_X accounts do you go through? ebay1@DOMAIN.TLD, ebay2@, ebay3@... ditching each once it starts to receive spam.

A most interesting approach, though.
Re:Why don't people use catch-all accounts? by FrenZon · 2004-06-22 16:26 · Score: 3, Insightful

Why don't people use catch-all accounts?

Because you will always have one main 'obvious' address - be it something that goes on your business card, or something you tell to people you meet. For example, I use glen at glenmurphy.com.

Now all it takes is one slip - someone you know gets a virus, whatever - and your address is 'out there' for the taking. Your only possible recourse then is to stop using that address, but for some people that's just not an option, and it's just a bit defeatist to sit there surrendering email address after email address.
Re:Why don't people use catch-all accounts? by mrpuffypants · 2004-06-22 16:27 · Score: 1

alas, that also equates to you receiving 0 emails total per day :(

Some of us don't use spam filters to give us a feeling of life...
Re:Why don't people use catch-all accounts? by videodriverguy · 2004-06-22 16:31 · Score: 1

Very true. This happened to me recently and my spam count went from around 30 to over 400!

Thankfully, my host has a 'blackhole' option for the default account. Turned that on and the spam volume dropped back to the previous level.
Re:Why don't people use catch-all accounts? by someguy456 · 2004-06-22 16:46 · Score: 1

I do the exact same thing, except I don't have my own domain. Instead, I have a free subdomain at cjb.net, which goes something like: somesite.cjb.net I can get every e-mail sent to *@somesite.cjb.net from one login, and can sort and filter it accordingly.

--
Robert Bindler
A Computer Science student's views on technology.
Re:Why don't people use catch-all accounts? by Anonymous Coward · 2004-06-22 16:51 · Score: 0

> I only use my main email address with friends and family, and never post it online.

And when one of your friends or family's PC gets 0wned, and that email addr gets sent back to the master IRC bot, then what? Get a new addr, reprint all the business cards?..
Re:Why don't people use catch-all accounts? by Anonymous Coward · 2004-06-22 16:54 · Score: 2, Interesting

I do that too. Works great (0/day). The problem is, unlike you, for my job, I have to have a public e-mail address.
I even got spam from the president of the university I work for. (Why spam? Because it was a political response to a newspaper article that had nothing to do with my job.) When I asked to be removed, I was told I couldn't opt out, since I worked for the university. So I removed my e-mail address from the official database. I was lucky. It got worse. I know five other people who did the same thing over the next few years. Our university has a pro-spam policy (from a committee, of course). Anyone who works at your level or above can spam the entire list below for any reason as long as they don't break any existing rules. I could send three a day to thousands of people without breaking the rules. I'm not required to have an e-mail address in the official database.
I can't remove my e-mail address from my webpage. I work with lots of people all over the world. I don't think that just because I need an accessible address that I should have to put up with spam. It's not like I'm going to buy from someone selling child-incest-porn e-mailed to a .edu account, yet I get that every month. I've never gotten a single UCE related to my job.
Your solution works great for you, but it doesn't work for me. I wish it did.
BTW, I don't use a catch-all. I only forward specific addresses (300 max). One day, you'll find that once they get your domain, you'll get e-mail for john@yourdomain.com, even though no one ever thought of that address. I have john@mydomain forwarded to uce@ftc.gov.
Re:Why don't people use catch-all accounts? by lewko · 2004-06-22 17:00 · Score: 4, Informative

I used to do the same. Now I'm paying for it.
Several viruses were sent to jane@mydomain, pete@mydomain, sedlskjl@mydomain etc.

Inevitably these same addresses are now being used for Spam and viruses as the source OR destination address (meaning I get bounce messages as well).

I HATE it when moron anti-Virus gateway administrators set them up to return confirmed viruses to sender with a polite note - except I am NOT the sender, my address was spoofed.

Unfortunately I have been using the catch-all trick for so long (e.g. ebay.com@mydomain etc.) that it's not as simple as turning it off or setting up filters - I don't even know what all the 'legit' addresses are as I used to create them on the fly and may only get email to some of them once a year or so.

I only ever busted one person for passing on the account details which was satisfying, but I am getting PLENTY of Spam/viruses now instead.

I use the excellent Spam Gourmet now for instantly creating disposable addresses with the added advantage that they can actually die when I want/need them to.

--
Do you or your partner snore? - Visit www.snoring.com.au
Re:Why don't people use catch-all accounts? by dasmegabyte · 2004-06-22 17:06 · Score: 2, Informative

Why would I wait until spammers did that?

Already, if a server tries to send the same email to more than three fake addresses at my company, I blacklist the IP for two days. Not just for email, but for any IP traffic. I did this to prevent trojans, but it's a somewhat effective spam deterrent as well.
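
Roughly, the rule looks like the sketch below. This is an illustration, not the poster's actual setup: the class name is made up, the check for "the same email" is left out (it just counts distinct bad recipients per IP), and actually dropping all IP traffic would be a firewall's job.

```python
class BadRecipientTracker:
    """Ban an IP once it has tried more than `max_bad` nonexistent recipients."""

    def __init__(self, valid_addresses, max_bad=3, ban_seconds=2 * 24 * 3600):
        self.valid = set(valid_addresses)
        self.max_bad = max_bad
        self.ban_seconds = ban_seconds
        self.bad = {}           # ip -> set of fake recipients tried so far
        self.banned_until = {}  # ip -> time (seconds) at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record(self, ip, recipient, now):
        """Record a delivery attempt; return True if the IP is (now) banned."""
        if self.is_banned(ip, now):
            return True
        if recipient in self.valid:
            return False
        tried = self.bad.setdefault(ip, set())
        tried.add(recipient)
        if len(tried) > self.max_bad:
            self.banned_until[ip] = now + self.ban_seconds
            del self.bad[ip]  # start counting afresh once the ban expires
            return True
        return False
```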

--
Hey freaks: now you're ju
Re:Why don't people use catch-all accounts? by sr180 · 2004-06-22 17:22 · Score: 1

Now that is a kick arse idea....

--
In Soviet Russia the insensitive clod is YOU!
Re:Why don't people use catch-all accounts? by dubl-u · 2004-06-22 17:46 · Score: 1

Wait till the spammers decide to spam your whole domain. They can start with aaaaaaaa@yourdomain.com and keep going till they get to zzzzzzzz@yourdomain.com, and your mailserver will accept and pass on every single one of these emails.

Here's my tip: look at your logs and take the few hundred most popular guesses from dictionary attacks like this. Now feed mail sent to those addresses directly into your spam filter's training input. Odds are that the spammers will feed several shiny new spams to your traps for every one that gets through to an actual account.
Re:Why don't people use catch-all accounts? by bruthasj · 2004-06-22 18:59 · Score: 1

Since most spam harvesters just take an address as-is, I have taken to adding the site name to the username for mail where I cannot set up a catch-all address: username+thatsite@mydomain.com.

However, my question is how do you deal with mailing lists and cross-posting to members-only mailing lists? Maybe you should only use "list" for the 'thatsite'...
Re:Why don't people use catch-all accounts? by houghi · 2004-06-22 19:01 · Score: 1

All email is received by one account, but each message can have a different "to:" header

I do not use the to: to filter, I use the from: to filter.
Also I have turned off the catch-all account, especially because suddenly I got spam to dave@example.com, john@example.com and a set of other names. Also bounces from these names that spammers used. It reduced my spam from hundreds to tens a day. I am the sole and lonely user of my domain and I am sure I never, ever, used those names.

Luckily I am with a provider that allows me to manage my own mailboxes. Naturally I also add aliases for different reasons, like job applications where it is less cool to have an address like h0ax3r@example.com

(and people, please use example.com if you use a domain name as an example, just as RFC 2606 intended it)

--
Don't fight for your country, if your country does not fight for you.
Re:Why don't people use catch-all accounts? by Anonymous Coward · 2004-06-22 19:46 · Score: 1, Interesting

In Postfix you can set it up so all mail to user+anything@mydomain.com is sent to user@mydomain.com. This way you still limit the damage from the directory attacks somewhat (you can catch them with some smart greylisting anyway), and you can still track the addresses you use elsewhere.
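
For anyone who wants to do the tracking side in a script, here is a rough sketch of splitting a plus-tagged address back into the real mailbox and the tag. The function name is made up for illustration; on the Postfix side the delimiter is, as far as I know, configured with the `recipient_delimiter` setting, and this code does not touch Postfix at all.

```python
def split_plus_address(address, delimiter="+"):
    """Split 'user+tag@domain' into ('user@domain', 'tag').

    Returns (address, None) when no tag is present.
    Hypothetical helper, not part of any mail server.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: %r" % address)
    # Everything after the FIRST delimiter is treated as the tag.
    base, _, tag = local.partition(delimiter)
    return base + "@" + domain, (tag or None)
```

Logging the tag for every incoming spam then shows which site leaked or sold the address.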
Re:Why don't people use catch-all accounts? by morten+poulsen · 2004-06-22 20:51 · Score: 1

Well, great for you. It is just not 100% foolproof. See http://www.schneier.com/crypto-gram-0305.html#6
Re:Why don't people use catch-all accounts? by nacturation · 2004-06-22 21:12 · Score: 1

Same here. Two incorrect email accounts from the same IP blackholes the IP address. Combine that with sbl-xbl.spamhaus.org, blocking all of China and Korea, various cable/adsl blocks, subject-based blocking (eg: for common virus email subjects), and blocking any IP address which dares to identify itself as being my domain and the spam gets cut down *very* dramatically.

To manage spam client-side, I use POPFile. All spam gets tagged with [spam] in the subject and is moved to the junk folder. It usually runs at 99.7%+ accuracy rate. A quick scan through the junk folder to see if it missed anything and I select all and delete. Very effective.

--
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Re:Why don't people use catch-all accounts? by sfe_software · 2004-06-22 23:05 · Score: 2, Insightful

Wait till the spammers decide to spam your whole domain.

That's exactly when I decided to disable the "catch-all" and allow only specific addresses. Some spammer sent several hundred identical messages, in a few hours, to made-up names at my domain.

Catch-all is no longer a good idea in my opinion...

--
NGWave - Fast Sound Editor for Windows
Re:Why don't people use catch-all accounts? by jeavis · 2004-06-23 01:28 · Score: 1

mattkinabrewmindspri wrote:
I only use my main email address with friends and family, and never post it online.
This no longer matters. Virus infections already do thorough searches of infected computers for email addresses to send themselves to. Some viruses (e.g. Sobig) appear tailor-made for spammers to abuse as SMTP relays. Given this cozy relationship between virus writers and spammers, it seems reasonable to me that those viruses are (or could be) harvesting the addresses they find for later sale to spammers. What better way to get a large number of deliverable addresses than from the victims' own computers?
In other words, you'd better hope none of your friends or family members, or anyone they forward mail to, ever gets infected with such a virus. If viruses aren't yet doing this, they will be soon.
Re:Why don't people use catch-all accounts? by Bert64 · 2004-06-23 01:37 · Score: 1

I do something similar, but i use a subdomain for each service i sign up to.. i also use a different username, so mail to spamuser@domain.com won't work but mail to spamuser@spamsite.domain.com will.. The beauty of this approach is that i can remove the dns record for a subdomain, or point it somewhere invalid, such as the website that sold me out.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Re:Why don't people use catch-all accounts? by neosake · 2004-06-23 03:29 · Score: 1

I used to use catch-all addresses, until I found out about self-destructing addresses.

The idea is that you make up an address whenever you need one, and then the address is created the first time it is used, and destroyed after the specified limit is reached.
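
Roughly, the bookkeeping behind such a service could look like the sketch below. This is only a guess at the general idea (create on first use, count down, then refuse); the class, its names and the default limit of 3 are invented for illustration and do not describe any particular provider.

```python
class DisposableAddresses:
    """Hypothetical tracker for self-destructing addresses."""

    def __init__(self, default_limit=3):
        self.default_limit = default_limit
        self.remaining = {}   # address -> number of messages still allowed

    def accept(self, address):
        """Return True if a message to this address should be delivered."""
        address = address.lower()
        if address not in self.remaining:
            # The address springs into existence the first time it is used.
            self.remaining[address] = self.default_limit
        if self.remaining[address] <= 0:
            return False      # limit reached: the address has self-destructed
        self.remaining[address] -= 1
        return True
```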

--
"When a ball dreams, it dreams it's a frisbee"
Re:Why don't people use catch-all accounts? by eudas · 2004-06-23 04:48 · Score: 1

not to mention, sometimes people try to email you web pages or little hallmark cards or whatnot, and to do so they have to submit your email to some server somewhere, where it gets stored and re-sold to spammers anyway.

or people put it in their outlook contact list, when they get hit with some virus or trojan or whatnot then your email address can get put out that way too.

you can't hide your email address for very long... it's the same model as a "secure computer" ... disconnected from the 'net, turned off, encased in concrete.

eudas

--
Blessed is he who expects the worst, for he shall not be disappointed.
Re:Why don't people use catch-all accounts? by edesio · 2004-06-23 05:50 · Score: 1

Delegate it to someone else.

For years I used user+site@acm.org as my main e-mail address. When a spam came I just had to block the +site address.

Since ACM (http://www.acm.org/) is a big organization they have the resources to handle most of the e-mail for me. They block spam and viruses, and for free since I am an ACM member :-)

P.S.: IEEE (http://www.ieee.org/) does the same!

Another data point. by juuri · 2004-06-22 15:45 · Score: 4, Interesting

OSX's built in mail seems to be pretty close to the accuracy numbers listed in the above summary. I tend to have one to three pieces of spam slip through which are almost always entirely image based with some poetry or equivalent attached.

I must say I've been pleasantly surprised with the spam filtering it provides and it has been a lot easier than the hoops I used to utilize to clean out my inbox.

--
--- I do not moderate.

Re:Another data point. by Matts · 2004-06-22 19:02 · Score: 1

This is not a "data point". You've provided anecdotal evidence at best. You cannot compare this to an 8 month long academic study.

--

Matt. Want XML + Apache + Stylesheets? Get AxKit.
Re:Another data point. by juuri · 2004-06-23 14:37 · Score: 1

Yes yes, I didn't provide detailed stats, then again it is a COMMENT on slashdot.

--
--- I do not moderate.

Re:in related news by Anonymous Coward · 2004-06-22 15:45 · Score: 0

Content-based spam filtering is a waste of time.

Whatever. Your "never-ending battle of updating filters and formulas" works fine.

OPE by Anonymous Coward · 2004-06-22 15:47 · Score: 0

Anyone know that three letter prefix to get through the CRM-114?

DSPAM by More+Trouble · 2004-06-22 15:48 · Score: 4, Insightful

In real world deploys of statistical filters, something like DSPAM's "global user" feature is necessary. The ability to begin with a relatively mature dictionary is critical to the user experience. Personally, DSPAM is filtering around 200 SPAMs per day for me, allowing one through every few days. It's 99.985% effective for me.

:w

Re:DSPAM by Daniel+Quinlan · 2004-06-22 17:30 · Score: 3, Informative

Quoting the (unfinished) paper:
CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of threshold setting. Both exhibit substantial learning throughout the email stream, leading us to conjecture that their performance might asymptotically approach that of the other filters. From a practical standpoint, this learning rate would be too slow for personal email filtering as it would take several years at the observed rate to achieve the same misclassification rates as the other systems.

This is interesting considering the harsh words the DSPAM author directs towards SpamAssassin in the DSPAM FAQ. In contrast, I think, the SpamAssassin developers say they are interested in testing the "dobly" noise reduction technique that DSPAM employs, see SpamAssassin bug 3078.
Re:DSPAM by Anonymous Coward · 2004-06-22 20:33 · Score: 0

CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of threshold setting. Both exhibit substantial learning throughout the email stream, leading us to conjecture that their performance might asymptotically approach that of the other filters. From a practical standpoint, this learning rate would be too slow for personal email filtering as it would take several years at the observed rate to achieve the same misclassification rates as the other systems.
You left out a part:
Both these systems were designed to be used in a train on error configuration, and do not self-train. This configuration could account for a slow learning rate [...]
I don't know about CRM-114, but that is a misconfiguration of DSPAM. The toe learning method is discouraged for the reasons the paper correctly identifies. The teft mode, aka. "train everything", is the recommended one:
--mode=[toe|tum|teft|notrain] Configures the training mode to be used for this process:
teft : Train-Everything. Trains on all messages processed. This is a very thorough training approach and should be considered the standard training approach for most users. TEFT may, however, prove too volatile on installations with extremely high per-user traffic, or prove not very scalable on systems with extremely large user-bases.

From the manual page.
Re:DSPAM by Daniel+Quinlan · 2004-06-23 04:30 · Score: 1

I don't know about DSPAM either, but the paper does note about CRM-114:
In an effort to ensure that we had not misinterpreted the installation instructions, we ran CRM-114 in a train-on-everything configuration and, as predicted by the author, the result was substantially worse.
and earlier:
As with CRM114, we trained DSPAM only on misclassifications, as suggested in the documentation.
and that does indeed seem to be the recommended DSPAM training method.
Re:DSPAM by Anonymous Coward · 2004-06-23 05:38 · Score: 0

TS:7093 TI:16423 SM:278 IM:5 SC:0 IC:0

Been training on misclassification for a couple months now.

Out of 7093 spam messages, 5 have been wrong (0.07% false positive).

Out of 16423 legitimate messages, 278 have been wrong (1.69% false negative).
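
The two percentages can be checked with a few lines of Python. This is just the arithmetic from the comment, pairing the counters the same way the comment does; the stats line does not say what each field means, so the helper below makes no claim about that, and the function names are made up.

```python
def parse_stats(line):
    """Turn 'TS:7093 TI:16423 ...' into a dict of integers."""
    return {key: int(value)
            for key, value in (field.split(":") for field in line.split())}


def percent(errors, total):
    return 100.0 * errors / total


stats = parse_stats("TS:7093 TI:16423 SM:278 IM:5 SC:0 IC:0")

# Same pairing as in the comment: 5 of 7093, and 278 of 16423.
rate_a = percent(stats["IM"], stats["TS"])
rate_b = percent(stats["SM"], stats["TI"])

print("%.2f%% and %.2f%%" % (rate_a, rate_b))
```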

This is much better than Spambayes or Spamassassin ever were! Try them all; use the ones which work the best.
Re:DSPAM by Anonymous Coward · 2004-06-23 08:56 · Score: 0

As with CRM114, we trained DSPAM only on misclassifications, as suggested in the documentation.
and that does indeed seem to be the recommended DSPAM training method.

No. DSPAM should train from everything. That is the so-called TEFT mode. That you should "forward the messages that are spam" means that you should send the misclassified messages back to DSPAM so that it may look up the tokens associated with that particular message and correct them to be "spammy" instead of "innocent". (Or vice versa, but that's much more rare IME.)
So there's no doubt about it as I see it - the author hasn't used DSPAM in an optimal way.
I understand that you've got no first-hand experience with DSPAM and are only quoting the paper, though.
Re:DSPAM by More+Trouble · 2004-06-23 15:13 · Score: 2, Informative

Here's a response from the DSPAM author.

:w

No DSPAM by XMichael · 2004-06-22 15:50 · Score: 2, Interesting

It's unfortunate that DSPAM was left out of this very good report. I have personally used SpamAssassin, SpamProbe and DSPAM

After using each for a couple months at a time, I found DSPAM to be by far the most effective (after it was properly trained)

DSPAM's claim "DSPAM (as in De-Spam) is an extremely scalable, open-source statistical hybrid anti-spam filter. While most commercial solutions only provide a mere 95% accuracy (1 error in 20), a majority of DSPAM users frequently see between 99.95% (1 error in 2000) all the way up to 99.991% (2 errors in 22,786). DSPAM is currently effective as both a server-side agent for UNIX email servers and a developer's library for mail clients, other anti-spam tools, and similar projects requiring drop-in spam filtering. DSPAM has been implemented on many large and small scale systems with the largest systems being reported at about 125,000 mailboxes." was quite accurate for me.

Also check out some Priceless Photos

--
Gamblers Forum

Re:No DSPAM by Xochil · 2004-06-22 16:40 · Score: 1

Does DSPAM have the depth of user contributions (such as lots of great public rule sets) like SA does? SA does a good job...but it's add-ons like antidrug, weeds, backhair, etc. which make SA a great tool.

Does DSPAM have the following SA does?

--Mike
Re:No DSPAM by Anonymous Coward · 2004-06-22 17:52 · Score: 1

DSPAM doesn't need daily contributions by authors because it's not a heuristic spam filter. DSPAM learns the user's email behavior, and learns what is spam by itself. It has an amazing ability to adapt to new types of spam without the need for geeks to sit down and reprogram it every week. And yes, to answer your question, many are using DSPAM.
Re:No DSPAM by Matts · 2004-06-22 19:13 · Score: 1

Umm, DSPAM is in the report. The significant bit you're probably interested in is:

CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of the threshold setting.

--

Matt. Want XML + Apache + Stylesheets? Get AxKit.
Re:No DSPAM by Anonymous Coward · 2004-06-22 22:12 · Score: 0

Dspam is a real pain in the ass to configure. I used it for a while. It does indeed work well, but it's a bear to configure and it seems a new version is released on a weekly basis. It also suffers from code/feature bloat as each weekly release includes piggy features that might only benefit a tiny subset of users of the software at the cost of added complexity and maintenance by the admin.

I'll pass.
Re:No DSPAM by fyonn · 2004-06-22 22:35 · Score: 1

yes, I saw that,

I've just implemented dspam 3 into my email flow because SA was too high maintenance for me and it was letting too much through. I don't know if the docs have changed since the version that was tested there, but certainly now it is recommended that it runs in train everything mode, so it learns your ham and spam as it comes in, and you only make "train on error" moves when you send it an email it mis-classified.

I've only run about 300 or so emails through it so far but it's already doing a better job for me than SA.

dave

Problems with Bayesian filtering by dlevitan · 2004-06-22 15:54 · Score: 4, Informative

Up to this past weekend I was using only bogofilter (which is a pure Bayesian filter). I seem to get about 200 spam a day on my main account. Until about a month or two ago bogofilter was amazing - I'd get maybe 1 or 2 spam a day, if that many. Then recently I suddenly started getting hit with 20 spam messages a day, and I noticed most of those were using lots of common words to bypass bogofilter. Most spam was still being removed by bogofilter, but enough got through to annoy me. This past weekend I also enabled spamassassin (without its bayes filter though), and it's cut down the number of spam to maybe 5 a day, but it's still too much for me. I'm hoping we have the next breakthrough in spam filtering technology soon (akin to bayesian filtering) because it seems that every new technique we use to filter the spam is eventually targeted by the spammers and bypassed.

Re:Problems with Bayesian filtering by swillden · 2004-06-22 17:55 · Score: 2, Informative

Then recently I suddenly started getting hit with 20 spam messages a day, and I noticed most of those were using lots of common words to bypass bogofilter.
This is very surprising to me, and it's not my experience at all (also using bogofilter). My bogofilter doesn't seem to be fooled one bit by those common words, at least not in a way that causes it to misclassify spam. That makes sense, actually, since most common words end up being viewed by the filter as neutral, and if the spammers want to sell their wares, they still have to put the spammy words in. So that big chunk of text from "Huckleberry Finn" at the beginning doesn't fool bogofilter at all.
Well, sort of. What I have noticed is that since lots of spam started putting chunks of non-spammy text in, Bogofilter has begun occasionally misclassifying ham. This also is logical. A word that happens never to have been used in any ham messages may show up in many fool-the-filter blocks in spam messages and therefore be perceived by the filter as a spammy word, with bad results when a ham message shows up that does use it.
One thing that I find very helpful is to use bogofilter's optional three-way classification, which allows you to set two different thresholds. Messages which score above the higher threshold are considered spam, messages which score below the lower threshold are considered ham and messages that fall in between are unknown. Using this system I find that I can pretty safely assume that everything in my Inbox is ham and everything in my Spam folder is spam. About 20 messages per day make it into the "Possible" box, about half spam. So, out of the 2000 e-mail messages that arrive daily (about half spam -- and no I don't read all of my ham), I have to examine 20 for spamminess.
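The two-threshold idea fits in a few lines. This is only an illustration of the logic, not bogofilter's own code or configuration; the function name and the cutoff values 0.10 and 0.95 are made-up placeholders.

```python
def classify(score, ham_cutoff=0.10, spam_cutoff=0.95):
    """Three-way decision on a spamminess score between 0.0 and 1.0.

    The cutoffs are hypothetical defaults; widening the gap between
    them sends more mail to the 'unsure' pile.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if ham_cutoff > spam_cutoff:
        raise ValueError("ham_cutoff must not exceed spam_cutoff")
    if score > spam_cutoff:
        return "spam"
    if score < ham_cutoff:
        return "ham"
    return "unsure"
```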
Another issue I've run into, probably mostly because I set my "possible" range very wide, is the problem of "persistent possibles". When a message shows up in the possible box, I drop it into one of two folders "IdentifiedHam" and "IdentifiedSpam". A cron job grabs the messages out of these folders, retrains bogofilter appropriately and then puts them back into the mail queue for reprocessing. The persistent ones still fall into the possible range even after retraining, and it can be very difficult to get them to finally drop into the right category.
My solution is to automatically continue retraining on a message until it evaluates correctly, up to a point. After trying various limits I've found that a maximum of 20 training cycles gives pretty good results. Going much higher tends to cause overtraining problems, so the cron job will retrain at most 20 times on each message before giving up and just putting the message back into the queue. When it shows up in the possible folder again, I just delete it.
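A sketch of that capped retraining loop, under stated assumptions: `train` and `score` are stand-ins for whatever calls the filter, the cutoffs are the same invented placeholders as any two-threshold setup would need, and none of this is taken from the actual scripts.

```python
MAX_CYCLES = 20   # the limit that worked well in practice, per the text above


def retrain_until_correct(message, is_spam, train, score,
                          ham_cutoff=0.10, spam_cutoff=0.95,
                          max_cycles=MAX_CYCLES):
    """Retrain on one message until the filter gets it right, or give up.

    train(message, is_spam) and score(message) are caller-supplied
    callables (hypothetical interface). Returns the number of cycles
    used, or None if the message is still misjudged after max_cycles.
    """
    for cycle in range(1, max_cycles + 1):
        train(message, is_spam)
        s = score(message)
        if is_spam and s > spam_cutoff:
            return cycle
        if not is_spam and s < ham_cutoff:
            return cycle
    return None   # persistent possible: stop before overtraining sets in
```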
Speaking of overtraining, I've found that to be a more general problem. When I first started using bogofilter, the accuracy was terrible the first day, good after the first week, amazing after the first three weeks, but then started to decline after about three months. The problem was that it was overtrained, and was putting too much weight on some words. There's no perfect way to avoid this problem (and the retraining my scripts do tends to exacerbate it a little), but I've found that cleaning out database entries older than 30 days does a pretty good job of keeping the filter operating at peak performance. A daily cron job keeps my filter clean and fresh.
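The daily cleanup amounts to dropping every token that has not been seen for 30 days. Below is a toy version over a plain dictionary; the layout (token mapped to a record with a `last_seen` date) is invented for the example and is not bogofilter's real database format.

```python
from datetime import date, timedelta


def purge_stale_tokens(db, today, max_age_days=30):
    """Delete tokens whose last_seen date is older than max_age_days.

    db maps token -> {"spam": int, "ham": int, "last_seen": date}
    (hypothetical layout). Returns the number of tokens removed.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = [tok for tok, entry in db.items() if entry["last_seen"] < cutoff]
    for tok in stale:
        del db[tok]
    return len(stale)
```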

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:Problems with Bayesian filtering by Wakkow · 2004-06-22 19:33 · Score: 1

On my server, I have spamassassin enabled along with all the blacklist checks. The blacklists alone are horrible to use as an indicator, but when the sender is on a couple of them, it works wonders.

With the bayesian enabled with auto-learn, many of those spams that try to bypass the bayesian filtering get caught anyways by all the blacklists. It's then auto-learned so those tricks in the body are less effective the next time.

Of course this doesn't work for everyone. All those remote tests take a lot of time, but it works great on my personal server.
Re:Problems with Bayesian filtering by Anonymous Coward · 2004-06-22 23:42 · Score: 0

My SpamAssassin degraded badly about 6 months back. At first I tried to lower the threshold, but after inspecting the spams, I noticed I was getting hit with the "random words" variants and SpamAssassin actually gave them negative ratings.
Then I stopped deleting my spam folder and put in a cron job to teach the Bayes web every night with the contents of the spam folder.
Result: Out of about 200 spams per day, I only see one or two.
Re:Problems with Bayesian filtering by Anonymous Coward · 2004-06-23 01:44 · Score: 0

I'm hoping we have the next breakthrough in spam filtering technology soon (akin to bayesian filtering) because it seems that every new technique we use to filter the spam is eventually targeted by the spammers and bypassed.

Not all bayesian systems are the same... some blindly treat words in the subject as equally important as words in the body, or don't even look at the header lines. Unfortunately, a lot of them don't tell you what they're doing.

SpamBayes, at least the MSOutlook plug-in, will give you a detailed report as to what it found and how it scored the message. So you'll see that it creates separate tokens for words in the subject line (e.g. instead of the token being "foo", it's "subject:foo") and also separate tokens for some other e-mail header items. That alone tends to make it more discriminating.
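
To make the "subject:foo" idea concrete, here is a small tokenizer sketch using only Python's standard `email` module. It is an illustration of prefixing header words, not SpamBayes' actual tokenizer; the regular expression and the function name are my own choices, and multipart messages are simply ignored.

```python
import re
from email import message_from_string

WORD = re.compile(r"[A-Za-z0-9']+")


def tokenize(raw_message):
    """Return tokens, with subject words kept separate from body words."""
    msg = message_from_string(raw_message)
    tokens = []
    # A word in the subject becomes a different token than in the body.
    subject = msg.get("Subject", "")
    tokens += ["subject:" + w.lower() for w in WORD.findall(subject)]
    body = msg.get_payload()
    if isinstance(body, str):          # skip multipart messages in this sketch
        tokens += [w.lower() for w in WORD.findall(body)]
    return tokens
```

With this, "foo" in a subject line and "foo" in the body are learned and scored independently.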

The next techniques for Bayesian are probably along those lines... words in the HTML portion of the body should get a different token than the same word in the plain text. Any HTML tags used should be scored. URLs should be scored. Domain names used in URLs should be scored. Maybe even name servers associated with domains should be scored.

3-way classification systems are also a lot better than binary "spam/ham" designations. Knowing that an e-mail fell into the "maybe" folder is very useful information.

Out of 200/day on my corporate account, SpamBayes misses 1-2 and leaves them in my inbox. And maybe once a week, a non-spam will end up in my "maybe" folder. (The "definite spam" folder is almost not worth checking because it's so picky.)
Re:Problems with Bayesian filtering by barik · 2004-06-23 01:44 · Score: 1

I recently noticed that SpamAssassin has been able to detect these common random dictionary words as well, so it's not that much of a problem anymore:
X-Spam-Report: 1.0 NO_PUNC BODY: Large groupings of dictionary words without punctuation. *
As usual, the game of spam and spam blocking is a constant game of cat and mouse. When spammers realize this, I'm sure they'll start using random words with intermittent punctuation.

--
Titus Barik
Re:Problems with Bayesian filtering by darrylo · 2004-06-23 04:47 · Score: 1

This past weekend I also enabled spamassassin (without its bayes filter though), and it's cut down the number of spam to maybe 5 a day, but it's still too much for me.

You need to tweak your (user account's) spamassassin user_prefs, and tailor it for the kind of email you receive. (You should also enable the Bayes filter to help with spam detection.) I've done that, and I'm down to maybe 5 spams a week getting through (mostly on the weekend, strangely enough). I do, however, have to spend 5-10 minutes once or twice a month tweaking user_prefs. It's a small price to pay for a virtually spam-free inbox, though.
Re:Problems with Bayesian filtering by Emrys · 2004-06-23 08:59 · Score: 1

They already are. I wondered why they thought punctuation would help, since most Bayesian filters tokenize punctuation out. I guess this is why.
Re:Problems with Bayesian filtering by Emrys · 2004-06-23 09:04 · Score: 1

Our corporate Bogofilter installation blocks 1,100 spams per hour and has been doing so consistently for over a year. Very little spam gets through. We've had two real false positives since we installed, and one of those was today.
Re:Problems with Bayesian filtering by WhiteDragon · 2004-06-24 01:24 · Score: 1

I have been having similar problems with bogofilter. I had been using auto-training, but it seems to have really fallen off. Your scripts seem to be quite nice, would you mind posting them?

--
Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?

Re:in related news by Anonymous Coward · 2004-06-22 15:55 · Score: 0

Not everyone is as much of an RBL cheerleader as you are.

Holy Shit.... by Dunarie · 2004-06-22 15:56 · Score: 1

Only 2 messages out of 150 normally get through that are spam? Good god, I normally get 5-10 spam messages a day that get through SpamAssassin. That's 750-1,500 spam e-mails a day! I thought it was bad before I enabled SpamAssassin a few months ago... but Jesus, man am I glad I got SA!

Re:Holy Shit.... by fdiskne1 · 2004-06-22 16:29 · Score: 2, Interesting

It's getting just plain ridiculous. When I started keeping track about a year ago, the email filtering system I set up was blocking about 10,000 spams per week for just under 1500 users. Last week, it blocked over 170,000. That is an average of over 100 spams per user per week, and the vast majority of my users don't get any at all. There are a couple dozen that get the vast majority of it. Of course, these are addresses that would be a major pain in the ass to change because of all the people that would have to be notified, and only if I could convince the user they want to. Of course, with this many users, I can't get a good grasp on the number of spams that make it through, but I do know it's enough to have several people continually complaining about it. It's just plain sickening all the resources and bandwidth that get wasted. I use three different black-hole lists, so about 110,000 of those don't get any further than the initial HELOs, but still. Disgusting. Bring on the protocol change. I've told everyone that I would be willing to work 24 hours a day for an entire weekend to implement a server and/or gateway that uses a new email protocol if it meant most spam would disappear.

--
But why is the rum gone?

the true cause of the majority of spam... by Etaipo · 2004-06-22 15:58 · Score: 3, Interesting

users. those silly, silly users. i was in charge of spam for my company for the greater part of a year. using an outdated KEYWORD based system, I was forced to read every.caught.message to look for false positives. ... did you catch that? yeah...i had to go through EVERY 'spam' tagged e-mail that went through the company. needless to say, after the first week i was ready to gouge my eyes out. but hey, at least i earned that 'i read your e-mail' sticker!

anyways, the point that i'm failing to make here is the cause of the spam... the damn users. whether it be responding to spam, putting their e-mail address in every single webform they encounter while surfing instead of working, signing up for spam voluntarily, or whatever the cause may be.. i ran some numbers on the logs, and came to an astounding find. a few people were getting literally a thousand messages blocked, per month. i, on the other hand, had maybe one or two a month. and i'm not a nazi with my e-mail address....but i do take some care in what places i type it in. an ounce of prevention goes a long way folks.

Re:the true cause of the majority of spam... by stevesliva · 2004-06-22 16:14 · Score: 1

Sure man, blame the victim. She was asking for it.
All sarcasm aside, I DO ask for it with my hotmail account (see above) and that just makes me so glad that I keep my other addresses quiet!

--
Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
Re:the true cause of the majority of spam... by Anonymous Coward · 2004-06-22 17:13 · Score: 0

Such a shame. You really COULD have been replaced by a small perl script. Anyway, be glad you work for such a large company with so much money. Most places would have canned your sifting-through-email ass like solid white albacore...and spent one day of your paycheck on a decent spam filter.
Re:the true cause of the majority of spam... by Apreche · 2004-06-22 17:47 · Score: 1

oh yeah, tell me about it. I barely post my e-mail anywhere dangerous or open and I get relatively little spam. Until recently I got 0 spam, but now I get 1 every other day or so. Thunderbird cleans those out. Every time I tell people to be more careful with their address they tell me they can't do that. Oh well, your loss.

--
The GeekNights podcast is going strong. Listen!
Re:the true cause of the majority of spam... by Etaipo · 2004-06-22 18:50 · Score: 1

it might not matter much, but you do what you're told. /shrug my incompetent manager was fired, someone who is actually qualified for the job was hired, who then implemented SpamAssassin, along with a zillion other improvements.. i get to be useful now. yay.
Re:the true cause of the majority of spam... by norton_I · 2004-06-22 20:25 · Score: 1

Sorry about your job sucking, but I can't stand having to worry about giving out my email address. I *want* people to be able to email me. I don't give my address to people that I think have no use for it, nor do I reply to spam, but I refuse to post obfuscated versions of my email address, which I believe is rude to people that I actually want to contact me.
Re:the true cause of the majority of spam... by MikeBabcock · 2004-06-23 04:35 · Score: 1

Ditto; I give out slightly customized versions of my address on websites and whatnot so I can track where my spam is coming from. Other than that, I want my potential clients to be able to reach me easily.

--
- Michael T. Babcock (Yes, I blog)
Re:the true cause of the majority of spam... by Bleck · 2004-06-23 11:42 · Score: 1

I wish it were that simple.

For myself, for example, I work with a web comic called Sluggy Freelance. It's a popular site, and we need people to contact us. That means that, even if it's not on the front page, we have contact information such as "tom at sluggy dot com" spread around the site.

Now, let's say I'm clever and obfuscate my address in various ways. That still doesn't help, because of the 1,000 people who decide to contact us this week, 800 of them end up with a copy of my address in Outlook's address book. So then, when *they* get hit with a virus, off go the spoofed e-mail notes, and here come the notes telling me that "The note you sent contained a virus!"

Not that I'm saying users don't cause much of the trouble themselves ... but many of us who need to be contacted just don't have the option of not having a visible address.

--Tom

SpamAssassin used to work but recently... by squisher · 2004-06-22 15:58 · Score: 3, Interesting

SpamAssassin used to be super-good for me, but recently it has become a nightmare... even with Bayes filters on and training it with almost 2000 spam messages that have escaped it before, I STILL get an enormous amount of spam every day... maybe I'm doing something wrong with the config, I admit that I haven't spent that much time on that, but it seems like it should be working better :-((.

Spam sucks. Everyone stop buying the products advertised and it'll be over. But then again, people will always be too dumb for an easy solution like that (reminds me of the gooback southpark...)

Re:SpamAssassin used to work but recently... by sploxx · 2004-06-22 23:22 · Score: 1

I also use SpamAssassin and I think it got a bit overtrained on commercial advertisements and scams, i.e. viagra, nigeria, xanax & Co.

Has anyone besides me noticed that shortly before the European elections took place, very nasty POLITICAL spam got through?
No, I don't mean "vote for me and not for him"-type of spam. I mean very right-wing, I'd say nazi spam.

The only spam I'm getting at the moment is this nazi spam (it seems not to be closely related to the elections because it is still being sent) and I have a hard time convincing SpamAssassin that it is indeed spam...

Re:in related news by Anonymous Coward · 2004-06-22 16:00 · Score: 0

In this message you claim that no content-based filter "comes close" to the 95% accuracy of your RBLs, but some of the content-based filters in this story do better than that (which is consistent with my own personal accuracy rate from SpamBayes, with e.g. a spam misclassification rate of maybe ~2%).

Issues with testing corpus by w_mute · 2004-06-22 16:00 · Score: 5, Interesting

I haven't read everything in detail yet, but one of the things that stands out is that their 'gold standard' representing the best result consists of 9,038 ham messages (18.4%) and 40,048 spams (81.6%). While large, the dataset is unbalanced. One of the things that is recommended by many of the filters is training on equal proportions of ham/spam in order to prevent biasing (overfitting).
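
For anyone who wants to check the proportions, here is a quick sketch in Python. Only the two message counts come from the post; the variable names are made up for illustration.

```python
# Message counts quoted for the 'gold standard' corpus.
ham = 9038
spam = 40048

total = ham + spam
ham_share = ham / total      # fraction of the corpus that is ham
spam_share = spam / total    # fraction of the corpus that is spam
spam_per_ham = spam / ham    # how many spams per ham message

print(f"total={total} ham={ham_share:.1%} spam={spam_share:.1%}")
print(f"roughly {spam_per_ham:.1f} spams for every ham")
```

So the corpus holds a bit more than four spams per ham, which is the imbalance being discussed.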

Their train-on-errors approach may simulate what goes on with some filters, but it doesn't reflect the scenario where there is an initial dataset to be trained on _before_ new messages are processed. Instead, each message is in essence 'new'. So in their tests the machine learning filters start out knowing nothing, but SpamAssassin starts out with its inbuilt ruleset. Not exactly fair.

-Greg

Re:Issues with testing corpus by PlusFiveTroll · 2004-06-22 16:40 · Score: 2, Insightful

Not exactly fair.

Huh, since when did spammers start playing fair!. This is about winning, not software political correctness.

Also on the unbalanced dataset, I train my filter with spam corpuses that reflect what I receive in my email. Many accounts receive 10 spams for every ham. The biggest thing that I've had to retrain on is receipts for airplane tickets; spamassassin seems to think they are spam the first time I receive them, and from the article, they had the same issues too.
Re:Issues with testing corpus by w_mute · 2004-06-22 17:36 · Score: 1

Huh, since when did spammers start playing fair!. This is about winning, not software political correctness.

The point about fairness is that for a meaningful performance comparison, one must have fairness.

-Greg
Re:Issues with testing corpus by dubl-u · 2004-06-22 17:49 · Score: 2, Interesting

So in their tests the machine learning filters start out knowing nothing, but SpamAssassin starts out with its inbuilt ruleset. Not exactly fair.

Perhaps for some definitions of "fair". That strikes me as a reasonable scenario for real-world use, which seems pretty fair to me.
Re:Issues with testing corpus by Anonymous Coward · 2004-06-22 17:51 · Score: 0

worse, the version of spamassassin contains rules which have been added since the spam corpus was built which means it will catch messages now which it wouldn't have at the time.

this is a flawed comparison i believe.
Re:Issues with testing corpus by fferreres · 2004-06-22 18:12 · Score: 1

CRM114 works well (extremely well really) when only trained on errors. Exactly NOT fair. SpamAssassin is very good, but I had it immediately replaced with CRM114 after actually trying and training it for a week (not after reading how good the CRM114 author thinks his filter is, or after reading a report from Guy X).

Not training only on errors lowers accuracy significantly for CRM114. I tried pretraining with CRM and it was a mistake.

In any experiment, you first fit the model against the data, then forecast (or test). CRM data must be collected in a teach-on-errors-only mode; the author could have written a simple script to automate train-on-errors if he wanted to get real-life results, not half-assed numbers.
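
A minimal sketch of what such a train-on-errors script could look like, in Python. The `TinyFilter` class is a toy word-count classifier invented here as a stand-in; it is not CRM114's algorithm or interface, and the sample messages are made up.

```python
from collections import Counter


class TinyFilter:
    """Toy word-count classifier (hypothetical stand-in, not CRM114)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def classify(self, text):
        words = text.lower().split()
        spam_score = sum(self.counts["spam"][w] for w in words)
        ham_score = sum(self.counts["ham"][w] for w in words)
        # Ties (including "never seen any of these words") default to ham.
        return "spam" if spam_score > ham_score else "ham"

    def learn(self, text, label):
        self.counts[label].update(text.lower().split())


def train_on_errors(filt, labelled_messages):
    """Feed messages in order; learn only those the filter gets wrong."""
    errors = 0
    for text, label in labelled_messages:
        if filt.classify(text) != label:
            filt.learn(text, label)
            errors += 1
    return errors


# Made-up, pre-labelled messages in arrival order.
messages = [
    ("cheap pills now", "spam"),
    ("lunch at noon", "ham"),
    ("cheap pills today", "spam"),
    ("lunch tomorrow", "ham"),
]

filt = TinyFilter()
first_pass = train_on_errors(filt, messages)
second_pass = train_on_errors(filt, messages)
print(first_pass, second_pass)
```

The point is only the loop: classify first, and train solely when the verdict disagrees with the known label, so the filter's database reflects its mistakes rather than the whole corpus.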

Now I only wish someone would do a plugin for SquirrelMail that could strip the Subject prefix for spam and the CRM114 tag, and could add the command to learn ham/spam.

--
unfinished: (adj.)
Re:Issues with testing corpus by fferreres · 2004-06-22 20:12 · Score: 1

You MUST train CRM114 differently; it learns a different way, by getting to know what it did wrong. You can tune the learning for automatic train-on-error if you know beforehand what is spam and what is not (you have to write your own script for this). Else, you must train it live.

If you train it any other way, it doesn't work as well.

After a week of correct training, you can start comparisons. I don't care about the first week or two, but afterwards I expect few errors, and that is what CRM has done for me up until now.

CRM is also somewhat hard to set up; I had to do strange things. But it sorts spam very nicely and rarely makes mistakes, except that they seem symmetrical (probability of a false positive = probability of a false negative). I don't like false positives... but I prefer that to lots of spam; I don't have time anymore for regular checks of the spam warehouse.

--
unfinished: (adj.)
Re:Issues with testing corpus by miley · 2004-06-23 03:21 · Score: 1

Not fair? The percentages used mirror real life -- 80% of mail is spam if not more. For a spam filter to say that it can't do well unless you run it in an academic environment -- one that does not mirror the real world -- strikes me as a lot more unfair than this unbalanced set.

Re:Not that good by Anonymous Coward · 2004-06-22 16:01 · Score: 0

I have tried a number of Bayesian-type filters and none of them filter the spam when I send it...

why I don't use spam filters by Begemot · 2004-06-22 16:08 · Score: 2, Interesting

just my humble opinion...

i use email for business and receive many letters from clients. i just afraid to loose any of these because of a spam filter. therefore even when i used one, i checked all the emails anyway.

Re:why I don't use spam filters by Anonymous Coward · 2004-06-22 16:26 · Score: 0

The spam filters don't (have) to delete the emails, they can just put them in a Junk/Spam folder.. that way you know the most likely suspects.
Re:why I don't use spam filters by Anonymous Coward · 2004-06-22 17:07 · Score: 0

Your business must be pretty poor if you can't even spell lose correctly. I know I break relations with any business whose members cannot spell common words.
Re:why I don't use spam filters by Anonymous Coward · 2004-06-22 17:11 · Score: 0

I know I break relations with any business whose members cannot spell common words.

on informal web message boards? this ain't exactly letterhead, chief.
Re:why I don't use spam filters by Anonymous Coward · 2004-06-22 17:17 · Score: 0

I had the same attitude, until the owner told me that losing mail was better than spam. I fired away with the RBL's and procmail moving spam into a folder the pop users don't check.

Whatever. Different philosophies, I guess. For my side business, and my hosting clients, I do not delete any mail, no matter how egregious it is, spam-wise. I do tag it with SpamAssassin, however.
Re:why I don't use spam filters by Anonymous Coward · 2004-06-22 18:02 · Score: 0

Then you are not receiving a significant amount of spam yet! Once you are afraid of losing legitimate messages amongst all the crap, then try spamfilters again.

I get about 4000 spams a day and only about 20 good messages. It is impossible for me to find my good mail in all the shit manually. A good spam filter is essential.
Re:why I don't use spam filters by ananke · 2004-06-22 18:31 · Score: 1

How does ability to spell differ from an 'informal web message board' and 'letterhead'? If a person can't spell, they can't spell.

--
--- d'oh
Re:why I don't use spam filters by imroy · 2004-06-23 03:25 · Score: 1

I can imagine that many spams have similar language to business email (e.g. the standard Nigerian spam). I had a similar problem with my mother receiving financial emails (stocks and stuff). Spamassassin initially classed them as spam, so for a while I had to drag the emails out of the Spam folder and specifically re-train Spamassassin's bayesian filter to understand the fine line between legitimate business/financial emails and spam. Now we're getting hit by spam emails with nothing but an image (or several) taken from an external server and a block of benign text. An extra honeypot account and custom perl code has helped (e.g. re-train automatically when messages are dragged into/out of the spam folder). But these spams look like they'll be a challenge for bayesian filters. Hopefully the new heuristics coming in Spamassassin 3.0 will help.

SpamAssassin is a dud by Animats · 2004-06-22 16:10 · Score: 1

My hosting service, EZ Publishing, uses SpamAssassin. Their hosting service is fine, but incoming mail filtering is terrible. SpamAssassin is only filtering out about 25% of the incoming spam. I'm getting about 2000 spams per day after SpamAssassin filtering.

I use Netscape's Bayesian filter as a second tier, and that removes about 60% of the remaining spam.

SpamCop was better, until IronPort bought them and they went black-hat, with Bonded Spammer and the Spam Engine.

Re:SpamAssassin is a dud by sloanster · 2004-06-22 16:21 · Score: 1

No offense, but that's a pretty ignorant statement, unless you know that "spam assassin" is indeed running, and what version, with what added rule packs, and what the scoring threshold is set at.

There's a wide range of things that could be called "spam assassin", but without competent administrators who keep the program and the rulesets up to date, the effectiveness can degrade significantly, especially in a vanilla install of an older version, that's never been trained.
Re:SpamAssassin is a dud by Animats · 2004-06-22 16:51 · Score: 1
The most recent e-mail SpamAssassin botched has this header:
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on main6.ezpublishing.com
X-Spam-Status: No, hits=1.1 required=4.0 tests=HTML_40_50,HTML_MESSAGE, HTML_TAG_EXISTS_TBODY,MIME_HTML_ONLY,NO_REAL_NAME, RCVD_IN_SORBS autolearn=no version=2.60
The mail content is "Major income on eBay", sent via a free account on Netster. If it can't recognize that as spam, it's not doing much.
I turned the threshold down from 5 to 4; at 5, it was useless. I tried "autolearn" at one time, but the "learning" algorithm choked on today's volumes of spam and used up 100MB on the server.
Yeah, maybe a later version is better. But if the product sucks, maybe it just sucks in all versions.
Re:SpamAssassin is a dud by BenjyD · 2004-06-22 21:22 · Score: 1

I use spamassassin on my mail server (procmail pipe |/usr/bin/spamassassin -a) and I see around one spam a week in my inbox; the other 400 go straight to mail/Spam. I haven't had a false positive in the last three months.
The trick is to store up spams that get through to your inbox and then occasionally run sa-learn --spam on them.
Re:SpamAssassin is a dud by imroy · 2004-06-23 03:36 · Score: 1

You have to provide Spamassassin with good samples to train its bayesian filter properly. Autolearning only kicks in when the message gets a suitably high score (along with other criteria) that it can be sure the message really is spam. If you're not training it and autolearning isn't kicking in, then it's no surprise that Spamassassin is not performing well.

Collect up all your false-positives and false-negatives and run these commands:

$ spamassassin -r -L < realspam
$ spamassassin -k -L < notspam

The -L option stops Spamassassin sending off reports to Razor. Razor doesn't seem to like it when you hammer it suddenly with a load of reports. I've had my Razor account revoked several times doing that :P

Why am I so Blessed? by auburnate · 2004-06-22 16:14 · Score: 1

How come I have an @hotmail.com email for 4+ years (pre-MSN) and I only get 15 junk mails a week?

Now I have gmail.

Re:Why am I so Blessed? by Anonymous Coward · 2004-06-22 16:17 · Score: 0

What's your address? I'll look it up and see if it's on my do-not-send list.
Re:Why am I so Blessed? by lewko · 2004-06-22 17:05 · Score: 3, Funny

How come I have an @hotmail.com email for 4+ years (pre-MSN) and I only get 15 junk mails a week?
Because the 15 junk mails put you over quota?

--
Do you or your partner snore? - Visit www.snoring.com.au
Re:Why am I so Blessed? by dasmegabyte · 2004-06-22 17:18 · Score: 3, Insightful

Because you don't put it into weird text boxes, you don't use newsgroups, you don't have any enemies, you don't have any domains, and you don't have it in plaintext on your website.

I do all 4. I get my share of spam. It's not a HUGE deal, but it made it worth my while to get a spam filter.

--
Hey freaks: now you're ju
Re:Why am I so Blessed? by dtfinch · 2004-06-22 17:38 · Score: 1

I won a stuffed monkey at treeloot.com. Now I only get a few spams a week, though partly because I abandoned the address I gave them after the spam load exceeded 100 a day. I was young, foolish. "A little's not going to hurt" I told myself.

Now using Thunderbird to catch the rare spams to my new addresses, which I think come from accidentally using my real address on a list I've been posting to since '97. Giving throwaway email addresses to every site that wants one. Blocking remote images in email to stop address verifiers. And using email obfuscation scripts in my pages containing email addresses.
Re:Why am I so Blessed? by Anonymous Coward · 2004-06-22 22:48 · Score: 0

And you do not communicate with people using windows (either directly or by mailing lists). Or else you would be drowned by worm mails.
Re:Why am I so Blessed? by freezin+fat+guy · 2004-06-23 04:03 · Score: 1

The largest handlers like Yahoo!, Hotmail and AOL simply have more resources to combat spam. Because they handle millions of accounts they can identify mailing patterns in real time. It also gives them better means to hunt down spam sources.

Re:in related news by djmurdoch · 2004-06-22 16:15 · Score: 1

RBLs only work against honest admins, getting them to clean up the holes in their security. Spammers aren't honest, and as you say, will just use worms to invade machines to create proxies.

RBLs have been around for years, but the amount of spam Spamassassin catches on its way in to me is ever-increasing. If RBLs worked, the spam problem would have been solved years ago.

On the other hand, the amount of spam getting past Spamassassin to me is pretty steady. I guess that indicates it's getting better. Mostly what gets past is what the article calls "backscatter": delivery failure messages caused by spammers forging my email address.

Should systems that send backscatter be blacklisted? I'd tend to say yes: they should only send failure notices to senders who pass some sort of verification like SPF. Putting them in an RBL really would encourage them to do that.

No, REAL MEN... by Dimensio · 2004-06-22 16:16 · Score: 2, Insightful

...hammer the spammer's ISP with complaints until the advertised website is DEAD, DEAD, DEAD.

--
STOP MISUSING APOSTROPHES, YOU MORONS!!!

Let me help you by Anonymous Coward · 2004-06-22 16:19 · Score: 0

The shift key is next to the Z on the left of the keyboard, and next to the / on the right.

It's often used on the first letter after a full stop - '.' character.

I'm running SpamAssassin at work. by khasim · 2004-06-22 16:21 · Score: 4, Insightful

People LOVE it.

There are some false positives and some false negatives.

But I have it set to delete anything 12+. That gets rid of the worst of the worst spam. So far, not a single complaint of any email being deleted.

Everything else has the subject re-written so people can run their own rule set against it.

In the past 8 hours
1867 messages received
375 messages deleted
1266 messages flagged as spam

So, only a few hundred actual, good emails.
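
Spelling out the arithmetic behind "a few hundred", as a quick Python sketch. The three counts are the ones listed above; the variable names are only for illustration.

```python
# Counts from the 8-hour window quoted above.
received = 1867
deleted = 375    # scored 12+ and dropped
flagged = 1266   # subject rewritten as spam

good = received - deleted - flagged        # left untouched
spam_share = (deleted + flagged) / received

print(f"good={good} spam_share={spam_share:.1%}")
```

That leaves 226 unflagged messages, with close to nine in ten messages either deleted or tagged.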

Of course, that's only 4 hours during the regular work day (and 4 hours after work). But you can see the proportions. It saves people a TON of time.

And it makes them happier when they don't have to constantly dig through crap to see if any real messages have arrived.

Now, those spam messages are NOT distributed evenly. Our HR manager had her email address posted on the website. So she gets about 20-25% of the spam.

It's not exactly Big Brother 'cause no human sees the deleted spam.

Re:I'm running SpamAssassin at work. by YetAnotherDave · 2004-06-22 17:22 · Score: 1

I have a similar spamassassin setup on the server for my family's email - 5.5 and up gets redirected to a spam box (and I sort thru it - we're family, so BB issues are less, besides which I haven't had a false positive in months) and 10 or greater gets tossed.

The two thresholds have been creeping down as the bayes system gets more trained. I started with 7 or greater getting redirected, and 15 or greater getting tossed...
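
The two-threshold idea can be sketched as a small routing function in Python. This is only an illustration of the policy described above, not SpamAssassin or procmail configuration; the function name, its parameter names, and the folder labels are invented here.

```python
def route(score, redirect_at=5.5, discard_at=10.0):
    """Decide what to do with a message given its spam score.

    Defaults mirror the 5.5 / 10 thresholds mentioned in the post.
    """
    if score >= discard_at:
        return "discard"
    if score >= redirect_at:
        return "spam-box"
    return "inbox"


# Current thresholds.
print(route(1.1), route(5.5), route(12.0))

# The earlier, looser thresholds (7 to redirect, 15 to toss).
print(route(6.0, redirect_at=7.0, discard_at=15.0))
```

Keeping the discard threshold well above the redirect threshold means only the highest-scoring mail is lost outright, while borderline mail can still be reviewed.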

If only I could convince work to use this great, free system. They're using a really expensive commercial product, that simply sucks. I just bought a house, and any email from my real estate agent got nuked silently by the filter (mortgage references => spam). This messed communications up hugely, until I figured out to use my home email for everything...
Re:I'm running SpamAssassin at work. by Anonymous Coward · 2004-06-22 19:27 · Score: 0

Some false positives? That sounds bad. My tolerance for false positives is zero, I'll prefer a few false negatives every now and then. Which is why I never allow any spam filter to automatically delete spam - I want to check them myself (but it's a lot less trouble when they're in a junk folder).

My experience with a well-trained spamoracle, as well as with recent versions of Apple Mail (older versions gave false positives) has showed that it's possible to eliminate false positives altogether, at least for the kind of legitimate mail I receive.

Another thing I would never trust is a filter run by the MTA. All filters that work well require continuous training, and the corrections for the training can only be done by the legitimate recipient.
Re:I'm running SpamAssassin at work. by Robmonster · 2004-06-22 20:57 · Score: 2, Insightful

So far, not a single complaint of any email being deleted

How do they know they are missing any emails to complain about it?

I had a recent argument with my email provider. They introduced blacklist filtering to eliminate the worst of their spam. In the process it also blacklisted some legitimate email. (The mails in question were Topic Reply notifications from a message board)

I don't have a problem with filtering, as long as there is a way to review undelivered mails.

In my case I only realised something was wrong when the mails I regularly received stopped being delivered. I went right up the admin ladder of the message board as I assumed the problem was at their end (after all, my mail provider was supposed to tell me about any changes they make to my mail settings).

My mail provider eventually found the problem and amended the blacklist settings and all was fine. However, without them providing me with a method of finding out if any of my mail is being blocked I have no way of knowing if I am missing any further legitimate mails. Even something as simple as a notification that they blocked a mail, with the senders email address included would be enough.

Spam filtering either needs to be done Client Side, as who better to judge which of my email is spam than me, or Server Side with a mechanism to view and check undelivered emails. Programs like K9 (http://keir.net/k9.html) work very well and are easily trainable. Mine runs at 99.5 % accuracy.

If servers HAVE to delete mail that is intended for me then it should be at the strictest possible setting.

--
I have no sig yet I must scream.
Re:I'm running SpamAssassin at work. by Anonymous Coward · 2004-06-23 00:07 · Score: 0

Are you really suggesting they send you a mail saying that they have blocked a mail?

Sounds like the popup stoppers that display a message saying they stopped a popup!

I think what you meant (and correct me if I am wrong) is a list showing all blocked mails over the last (24hours), with a way to retrieve those you are unsure about.
Re:I'm running SpamAssassin at work. by sTeF · 2004-06-23 01:02 · Score: 2, Insightful

I'm also running spamassassin, but I am absolutely not satisfied with its performance. How long does it take for your SA to scan one message? My mailserver is only an Athlon 600, but still, this does not justify a hit of a few seconds per message.

other than the performance, i'm really happy with SA.
Re:I'm running SpamAssassin at work. by Anonymous Coward · 2004-06-23 04:39 · Score: 0

Do you realize what it takes to get a score of 12 on spamassassin? If someone sends you a legit email that scores that high, someone should inform them to stop sending spammy emails.

I've never seen a legit email score higher than about 7, and I'm the one that gets to review the spam folder for my company.
Re:I'm running SpamAssassin at work. by Cheile · 2004-06-23 05:47 · Score: 1

We were having similar issues at work and it turned out to be a rbl check that was checking a dead server.

Try using timelog_path in your local.cf

I.E. timelog_path /var/log/spamassassin

This will send logfiles to that directory for each message that is processed. You can then see how long each part of the processing took.
Re:I'm running SpamAssassin at work. by Robmonster · 2004-06-23 07:06 · Score: 1

Yes, that was my intent.

There should definitely be a mechanism in place to get details on the mails that were blocked: Subject and Sender. Whether that's via a Daily Digest or a web-based application doesn't matter. If someone is going to delete mail on my behalf I should at least be able to say "Ooooh, I wanted that one. Don't delete it in future".

I would post more but my popup stopper keeps throwing up popups ;)

--
I have no sig yet I must scream.
Re:I'm running SpamAssassin at work. by sTeF · 2004-06-23 22:37 · Score: 1

ty, i will try this.

Re:in related news by alexborges · 2004-06-22 16:22 · Score: 1

RBLs WORK. This is why spammers are forced to use worms to invade users' machines to create proxies. As soon as the authorities wake from their slumber and start prosecuting these scumbags who break into others' machines, the whole spam thing will essentially be over. But don't tell that to the little content-based-filtering-fools. They obviously have money to burn.

In case you haven't heard, most of us with real jobs that require spam control can't wait for 'authorities to wake up' and cannot be expected to take advice from people that do, whatever the fuck it is you do, which is OBVIOUSLY not related at all to protecting people and/or resources from the abuse of spammers.

--
NO SIG

Active Spam Killer by Admiral+Llama · 2004-06-22 16:23 · Score: 1

No false positives, disgusting amounts of spams killed. 'Tis a glorious thing.

Re:Active Spam Killer by nexus987 · 2004-06-23 05:02 · Score: 1

Agreed. It's really a great tool, when it's properly configured.

Re:in related news by Crudely_Indecent · 2004-06-22 16:24 · Score: 4, Interesting

I can certainly see how waiting on our government will decrease the number of messages transmitted through my mail servers daily.

It's reassuring to know that the "authorities" have effectively reduced the number of messages through my server by 10-14k per day......What great guys, those 'authorities', aren't they thoughtful and quick to respond. We've only been waiting for a spam-relief law for....10 years and they finally gave one to us. Oh wait....SpamAssassin is what reduced those messages.

The reason we don't wait for the gov to step in and take care of business is that THEY'VE DONE NOTHING SO FAR. You expect me to believe the government will solve my spam problems? I'm not holding my breath.

A combination of RBLs, DNSBLs, F-Prot, and SpamAssassin is what reduced the number of messages sent through my servers. I'm interested in results NOW, not legislation tomorrow.

--

"Lame" - Galaxar

I've been using SpamAssassin about 6 months by cool_st_elizabeth · 2004-06-22 16:29 · Score: 2, Interesting

And it has just now learned to filter out almost all the spam. IIRC, SpamAssassin said it would learn what to mark as spam after a couple hundred obvious spams and the same number of obvious non-spams. I still get the occasional false positive.

REAL REAL way to block spam by Mad+Bad+Rabbit · 2004-06-22 16:31 · Score: 1

[Ripley] "I say we take off and nuke the entire planet
from orbit. That's the only way to be sure."

[Hudson] "F--kin' A..."

[Burke] "Ho-ho-hold on a second! The Earth has a
very substantial dollar value attached to it!"

[Ripley] "They can BILL me."

--
>;k

Don't you dare say... by MalikChen · 2004-06-22 16:33 · Score: 1

The first person who says gmail is getting shot. By me.

Earthlink stats are different by Anonymous Coward · 2004-06-22 16:34 · Score: 0

This article from the beeb puts human accuracy over machine accuracy...

Yahoo! Spam Protection by letoworm · 2004-06-22 16:38 · Score: 1

Yahoo! allows you to have suspected spam automatically deleted or moved to a spam folder. It also allows you to disable the spam filter completely. (Mail Options -> Spam Protection)

As for SpamAssassin, I've been using it for about a week on my mail server. There have been about 500 filtered spams and one false positive - an AOL greeting card.

Re:Yahoo! Spam Protection by Jedi+Alec · 2004-06-22 17:25 · Score: 1

There have been about 500 filtered spams and one false positive - an AOL greeting card.
[Insert obligatory AOL pun here]

--

People replying to my sig annoy me. That's why I change it all the time.
Re:Yahoo! Spam Protection by Malc · 2004-06-22 17:43 · Score: 1

I don't use their web interface unless I'm on the road for more than a few days at a time. I have them forward everything and base my filtering on the X-YahooFilteredBulk header field. I have to have some filtering... but every couple of months I will find a rash of false-postives from friends and family.

Spamgourmet (antichef) and SpamSieve by dougman · 2004-06-22 16:38 · Score: 4, Informative

Why people don't use disposable accounts is beyond me. Once you start using Spamgourmet you'll never go back. I've been active with them over two years and here's my current stats:

Your message stats: 339 forwarded, 43,796 eaten. You have 155 disposable address(es).

yeah, that's right, thanks to disposable addresses I *haven't* read 43,457 spam emails! When I do need (want) to use my real address, I use SpamSieve (with Entourage X) - very good Bayesian filter (not sure if it's Mac-only or not).

Re:Spamgourmet (antichef) and SpamSieve by Anonymous Coward · 2004-06-23 00:10 · Score: 0

Disposable addresses are a bit of self-advertizing bullshit. First, most of the spam you're not reading wouldn't have been sent in the first place if you hadn't used hundreds of addresses, each of those getting a copy of the spam.

Then, disposable addresses actually and in the long run become a burden themselves. Email addresses are there to make it possible for other people to reach you via email. When you finally have hundreds of addresses floating around in the net, most of them pointing directly into the bitbucket... well, this doesn't make it exactly simpler to reach you.

Mind you, disposable addresses are not a new idea. Lots of people having control over their mail server and domain have tried this years ago. Most of them learned that this was a bad idea and they learned it the hard way. Spamgourmet now is selling this bad idea to the unwashed masses and I fear they will have to learn on their own the same things.

For what it's worth, I'm using the same, one and only true address everywhere since more than 8 years now. Each day 1 to 3 spams are getting past my filters. One false positive a year (always some newsletter). I just can't imagine what sort of trouble I would've had if I had used hundreds of disposable addresses instead to keep track of. Especially since I even forget my /. password regularly...
Re:Spamgourmet (antichef) and SpamSieve by e4ward · 2004-06-23 03:20 · Score: 1

Why people don't use disposable accounts is beyond me.
Me too. Perhaps filters are just technically more interesting. But the best place to stop spam is at the outer walls (the SMTP handshake - 550 user unknown). With a disposable account, if your email address is not in circulation (and is not easily guessable) it simply won't get any traffic of any kind except through your aliases, which can be disposed of if they become compromised. I believe keeping one's mailbox address secret is the only foolproof defense against receiving unsolicited email.
E4ward.com has very flexible aliases allowing any localpart name you want plus you can use your own domain name(s), and the aliases are not restricted by expiration or usage count.
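
To make the "stop it at the SMTP handshake" idea concrete, here is a minimal Python sketch of an alias table. It is an illustration under assumptions, not how E4ward or Spamgourmet are actually implemented: the alias names, the function names, and the simplified reply strings are all made up.

```python
# Hypothetical alias table: local part -> still active?
aliases = {"bookshop": True, "forum-signup": True}


def rcpt_response(localpart, table):
    """Reply the server would give to RCPT TO for this local part."""
    if table.get(localpart, False):
        return "250 OK"
    return "550 user unknown"


def dispose(localpart, table):
    """Retire an alias once it starts attracting spam."""
    table[localpart] = False


before = rcpt_response("forum-signup", aliases)
dispose("forum-signup", aliases)
after = rcpt_response("forum-signup", aliases)
guessed = rcpt_response("tom", aliases)  # never issued, so rejected

print(before, "|", after, "|", guessed)
```

Mail to a retired or never-issued alias is refused before any message body is transferred, so there is nothing left to filter.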

--
http://www.e4ward.com
Re:Spamgourmet (antichef) and SpamSieve by Anonymous Coward · 2004-06-23 03:45 · Score: 0

issue joined! :) spamgourmet is in its fourth year of operation -- nobody's complaining...

Anyway, the goal of using disposable addresses is *not* to make it exactly simpler to reach you, right? And remember that the reason those people learned it was a "bad idea" was because they had to go enable or expire addresses all the time. You don't have to do this with spamgourmet - it's transparent -- and it's free as in beer, btw, so what's with the advertizing stuff?
Re:Spamgourmet (antichef) and SpamSieve by edesio · 2004-06-23 05:55 · Score: 1

Some sites, even online newspapers, refuse addresses from spamgourmet.com :-(

Re:in related news by PlusFiveTroll · 2004-06-22 16:54 · Score: 1

Putting them in an RBL really would encourage them to do that.

Err, you can try that, but I would not recommend it. I think you would quickly find there are not many servers out there that you could talk to.

Maybe once X% of the internet adopts some sort of sender verification, an RBL may stand a chance.

Still, spammers would just send backscatter to you through hosts that are permitted to send for a domain. Ever see how many 0wn3d windows boxes there are out there?

Where there is a hole, spammers will find it. Too bad there isn't a spammer death sentence.

Bayes SHOULD be better than vanilla SpamAssassin by khasim · 2004-06-22 16:55 · Score: 2, Interesting

For an INDIVIDUAL, Bayesian filter works far better than just the regular SpamAssassin rulesets.

That's because the Bayesian system will LEARN from you what you consider to be spam and ham.

I use SpamAssassin with Bayesian filtering turned on and it catches over 90% of the spam. But then I've fed it a decent sized corpus.

SpamBayes + Thunderbird by Anthracks · 2004-06-22 17:03 · Score: 2, Informative

Thunderbird already has integrated significant improvements based on SpamBayes, I believe. See http://bugzilla.mozilla.org/show_bug.cgi?id=230093 , which was closed about a month ago. The test data from that patch is encouraging, although obviously results will be different for everyone since not everyone gets the same type of spam. If you want to keep tabs on upcoming refinements to junk mail filtering, take a look at the dependencies of this meta bug: http://bugzilla.mozilla.org/show_bug.cgi?id=228674 . Please don't "spam" up that bug with comments though; if you have something to say, put it in a specific bug or file a new one if something relevant doesn't exist.

--
Rock over London, Rock on Chicago. Wheaties: Breakfast of Champions.

The Mozilla ThunderBird SPAM filter-MDK10 by Anonymous Coward · 2004-06-22 17:09 · Score: 0

That's funny. Evolution under MDK 10 uses Spamassassin.

Mr. President we can NOT have a Spam Blocking Gap! by vollmerk · 2004-06-22 17:09 · Score: 0

It has to be said: did they set the CRM-114 discriminator to OPE or some other arrangement of P, O, E? 'Cause y'all know unless you specify the code prefix you can't recall the spam and the doomsday device will go off... 'cause that spam can get in there real low, I mean if the spammer is _really_ good he can fly, er, send that e-mail right under their radar.

Re:What d'you think spamassissin would make of thi by dasmegabyte · 2004-06-22 17:15 · Score: 1

No time to read it, son, just email it to me.

--
Hey freaks: now you're ju

POPFile? by gmuslera · 2004-06-22 17:19 · Score: 2, Interesting

I've been using POPFile for months and it has an accuracy of 99.75% over 17k messages. It's not very dependent on the client; it just sits as a POP3 proxy, and it classifies mail into buckets that you can define (so there's no need to split mail into just spam/ham; for some time I've even had categories for viruses, Nigerian-like scams, automated reports, etc).

It would be interesting to see how that message sample fares against more spam filtering technologies, or even webmail services with integrated spam protection.

Re:POPFile? by puppetman · 2004-06-22 17:53 · Score: 4, Interesting

Yah, I ran this for about a year before I switched ISPs (and got a new, spam-free email account).

It was amazingly accurate, with about one mistake per thousand emails once I had it trained. I'll go back to it if I start to get a bunch of crap in my in-box. I remember reading that spammers would test their emails against the most popular anti-spam filters, but they still almost never got through Popfile.

I tried SpamAssassin as well, after I had some issues with PopFile (it would stop responding after a large volume of email), and it was more difficult to set up, and didn't have the nice configuration options of Popfile.
Re:POPFile? by Artful+Codger · 2004-06-23 01:30 · Score: 1

I'm another happy Popfile user - had it for over 8 months now.

It's currently running around 99.2 % correct. Out of roughly 300 messages/day, almost all spam, I see maybe 2 spams a day tagged as good, and there's maybe one or two valid messages tagged as spam per month.

I currently just have Popfile sort into 2 folders - spam and good. About twice a week I look at the spam folder and sort by Subject, which groups the messages nicely (... spam titles are sooo unimaginative), and the valid messages tend to be very visible. Once I've reviewed the spam folder and moved any valid messages over, I empty the spam folder and train Popfile on any errors.

The above procedure takes maybe 10 minutes to complete, even if there are 2000 spam messages to scan. The key is to sort by subject.

I also use Popfile's "magnets" setting for a whitelist of addresses to unconditionally pass.

I just upgraded from Popfile v19 to v21, and it's now about twice as fast booting up and sorting.

--

... plans that either come to naught, or half a page of scribbled lines...
Re:POPFile? by Anonymous Coward · 2004-06-23 01:47 · Score: 0

It was amazingly accurate, with about one mistake per thousand emails once I had it trained. I'll go back to it if I start to get a bunch of crap in my in-box. I remember reading that spammers would test their emails against the most popular anti-spam filters, but they still almost never got through Popfile.

Which is one of the strengths of individually trained Bayesian filters. Since nobody trains their filter the same way as the person down the road, there is no way that spammers can "test" against the filter prior to sending out their e-mails.

(Bayesian doesn't work well for large groups of users, but is very good at the individual desktop or small-group level.)

I keep hearing about how great spamassasssin is... by rsilvergun · 2004-06-22 17:21 · Score: 1

maybe I'm doing something wrong (wouldn't be the first time). I run the spamd command as root (tried it with the -d option too), pointed sa-learn at 3000+ spams and about 200 hams and set up kmail filters to pipe everything less than 250k through spamc and move anything with X-Spam-Flag=Yes to junk. It's slow as heck and only filters about 60% of my spam. Bogofilter was doing about 80% (it's more trouble to set up though). But I keep reading posts of people with 98% filter rates.

--
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/

Re:in related news by Anonymous Coward · 2004-06-22 17:26 · Score: 0

This is probably a good life lesson for you.

Learning not to rely on the government for something as trivial as spam legislation will help ease you into not relying on the government for more critical things that it could screw up, like healthcare.

CRM114 works better for me by wmshub · 2004-06-22 17:35 · Score: 1

I have used it sitewide (small site, about a dozen active mailboxes) for a few months. Currently it has an error rate of about 1 or 2 mistakes per week per mailbox (in mailboxes that get 100+ spams per day). I did have to do a lot of work to configure it properly though, which may be the reason the authors saw poor performance from it; the "forward to yourself to train" didn't work at all because both my IMAP server and my mail reader would slightly reformat my headers, meaning that CRM114 was training on different text than it saw when it was filtering! So I put together my own system to save pristine copies of all inbound mail and train on them as needed. Maybe the reason CRM114 fared so poorly is the difficulty in setting it up properly?

How Apple Mail filters Spam by jjga · 2004-06-22 17:46 · Score: 2, Informative

There is a somewhat interesting article where they more or less explain how the Mac OS X Mail application works regarding Spam:

http://www.macdevcenter.com/pub/a/mac/2004/05/18/spam_pt2.html

Re:in related news by dubl-u · 2004-06-22 17:56 · Score: 2, Insightful

Content-based spam filtering is a waste of time. [...] It's a never-ending battle of updating filters and formulas.

I update my SpamAssassin config file once a year or so. This hardly seems burdensome. And generally my updates have to do with which RBLs it uses for assigning point values. Other than that, I use the defaults plus the Bayesian filter.

Since the filter self-trains based in part on the RBL scores, it autoadjusts to new spam. And if you have spamtrap addresses, you can feed those back in, too.

My setup is well over 99% accurate, with no false positives in months.

RBLs WORK.

Yes, and I use those, too. Some I use for outright rejection of connections, and some count toward the spamminess score. As soon as they get the URL-based RBLs working, I'll use those, too. Why wouldn't you use all the tools at your disposal?

Re: SpamSieve by hondo77 · 2004-06-22 17:57 · Score: 3, Interesting

I'd like to second SpamSieve. If more than one piece of spam gets through in a day (where each day I receive > 500 pieces of email), I am truly surprised. My stats for June are:

1007 Good Messages
13729 Spam Messages (93%)
1 False Positives
24 False Negatives (96%)
99.8% Correct

Works for me. Oh, the false positive was a list that I just signed up for. They sent a confirmation mail, I checked to see if it was caught (it was), and marked it as "good". Piece of cake.
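
For anyone checking the arithmetic, the percentages above can be reproduced from the four raw counts. How each percentage is defined is an assumption on my part (spam as a share of all mail, false negatives as a share of all errors, and overall accuracy); SpamSieve's own definitions are not given here.

```python
# Raw counts quoted in the stats above.
good, spam = 1007, 13729
false_pos, false_neg = 1, 24

total = good + spam                    # all classified messages
errors = false_pos + false_neg         # all misclassifications

# Assumed meaning of the three percentages in the report.
spam_share = spam / total              # "Spam Messages (93%)"
fn_share_of_errors = false_neg / errors  # "False Negatives (96%)"
accuracy = 1 - errors / total          # "99.8% Correct"

print(f"{spam_share:.1%} spam, {fn_share_of_errors:.0%} of errors are FNs, "
      f"{accuracy:.1%} correct")
```

Under that reading all three figures are consistent with the counts.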

--
I live ze unknown. I love ze unknown. I am ze unknown.

Re:I keep hearing about how great spamassasssin is by Anonymous Coward · 2004-06-22 18:06 · Score: 0

Are you sure you trained it on a proper corpus? You have to look at your mail with a real mail reader, e.g. Mutt. You will probably find that your good mail corpus is full of spam that was marked for deletion but isn't really deleted. This will cause the filter to train badly.

Re:I keep hearing about how great spamassasssin is by Champaign · 2004-06-22 18:24 · Score: 1

I think 200 hams is way too small. Keep sorting and it should improve.

Bad taste in my mouth... by bruthasj · 2004-06-22 19:02 · Score: 1

I lost a ton of emails in v2.63 of spamassassin. I use a chain of fetchmail -> postfix -> kmail get -> filter through spamc -> kmail inbox/spam.

I had to turn off spamc processing because I lost a bunch of email. Maybe it was a bad interaction with kmail, but it was disheartening nonetheless. After taking out the spamc filter, I did not run into the problem again.

SpamAssassin to be owned by Microsoft? by tinla · 2004-06-22 19:33 · Score: 1

Firstly, it should be remembered that the 'owned' part is a bit subjective, as most of the project could live on regardless of 'ownership' thanks to it being open source. But regardless of that... am I the only one that finds the prospect of Microsoft buying SpamAssassin a bit odd?
Microsoft to buy Network Associates?

At the very least they'd be buying the name and the tarted up version of SpamAssassin sold as SpamKiller.

--
0daymeme.com: Great stuff.

Re:SpamAssassin to be owned by Microsoft? by Etrigan · 2004-06-22 20:16 · Score: 1

http://www.theinquirer.net/?article=16738 is the actual link for the story in question.

Mind you, The Inquirer is famed through the industry for the accuracy of its reporting, so you might want to take whatever it says as merely one of many possible outcomes.

Alan.

Counterintuitive Advertising by KalvinB · 2004-06-22 19:36 · Score: 4, Interesting

Some guy a few stories back mentioned he was getting 3000 ad impressions and 15 clicks a day or so with AdSense. Which is terrible. At first I assumed he was just oversaturating his visitors with ads. But his ad placement is also terrible. It's at the very bottom of the page where few are going to see it. But he is also over saturating. His pages are very busy with information and the ads are on every single page.

What happens when you constantly shove something in someone's face is that they learn to ignore it. Either consciously or subconsciously. In the case of advertising if someone is shown an ad and they aren't interested and another ad is shown there's a very good chance they won't even notice it. Even if they would have been interested in what it was offering. This is because they were annoyed by the first ad so they just mentally block any additional ads.

This is why the response rate to spam is so terrible. People for the most part just subconsciously ignore it. It's just noise.

Advertisers like radio stations because it tends to be a captive audience. People are very unlikely to turn the station when ads come on. However there is one local station that I've learned to turn the channel on when the ads start because I know I'm going to get to my destination before another song comes on. There are other stations that I don't change the channel on because I know it's just a short break.

Just like the guy pumping out 2985 ads that no one clicks on, spammers would benefit immensely by pulling a large chunk of the ads. People are more likely to notice when they aren't bombarded by ads and the response percentage goes up.

It seems counterintuitive that less advertising means a greater response but that's actually the case.

I normally notice the ad banners on Slashdot because that's pretty much all the advertising there is. I rarely ever notice the text ads. Even though they're placed on the left side in the best position as anyone who scrolls the page is probably going to see them. Slashdot's problem is that the ads blend in with the web-site's color scheme too well so they're pretty much invisible to anyone with a scroll wheel.

On GameDev the site is so littered with advertising that I never notice it anymore. By the time I close the stupid popup ads that circumvent Google's pop up blocker using evil little tricks I'm too annoyed to even look at the other ads.

Web-sites get desperate and think more ads == more money. And the actual result is less valuable ad space because the click thru rate is so low and fewer clicks because users tune the ads out which results in less money than if they had focused on the click thru percentage rather than the number of impressions. If you have a web-site with a high click thru rate advertisers are more likely to pay more because they know that if they show an ad there's a very good chance they'll get a click thru.

But then I'm guessing spammers have never taken a course in marketing or bothered to think about things from their potential customer's perspective.

Keeping ineffective ads visible hurts the effectiveness of the better ads. Spammers are in effect destroying themselves in that area. As are ad happy web-sites.

Ben

--
Work Safe Porn

Re:Counterintuitive Advertising by Tom · 2004-06-22 23:07 · Score: 1

But then I'm guessing spammers have never taken a course in marketing or bothered to think about things from their potential customer's perspective.

Even if they had, they're in a typical dilemma:

Let's assume that response rates would go up. Let's even say it follows a simple rule: halving spam would quadruple the response rate, and vice versa (i.e. the response rate goes with the inverse square of the overall spam level).

Let's say spam today is at what we define 1.

You are spammer A. You currently churn out 1 mio. spam a day and have a response rate of 1 ppm, i.e. 1 per day.
You know that if spam globally would drop to 0.5, response rates would rise to 4 ppm, so you would get your 1 response with only 0.25 mio. However, with 1 mio. you'd get 4 replies! And with 2 mio, you'd get 8!
Your (local) choice would be clear: More spam.

Let's assume everyone does that. Spam rises to 2 and response rates drop to 0.25 ppm. So for your 2 mio spams you only get a response every other day. If you lower your spam level, you'll get even less. But by doubling again, you will get back to your initial level of 1 response per day. Again, your choice is obvious.

This, btw., is a very well-known problem of cooperation. Unless everyone in the system works together, you cannot realize the profits of reduced activity. On the contrary, doing what's good for you individually/locally drags the whole system down a slippery slope, but you can't do the "right thing", because it is a stupid local choice.
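
The numbers in the example can be replayed in a few lines. This is only a sketch of the toy model: the inverse-square rule is my reading of "halving spam quadruples the response rate", and the function names are made up.

```python
def response_rate_ppm(global_level):
    """Responses per million spams, given the global spam level (today = 1)."""
    return 1.0 / global_level ** 2


def responses_per_day(my_volume_millions, global_level):
    """Daily responses for one spammer sending `my_volume_millions` per day."""
    return my_volume_millions * response_rate_ppm(global_level)


# Today: level 1, 1 mio. spams -> 1 response per day.
print(responses_per_day(1, 1))
# If everyone else cut back to level 0.5, cheating pays: 1 mio. -> 4, 2 mio. -> 8.
print(responses_per_day(1, 0.5), responses_per_day(2, 0.5))
# If everyone doubles instead (level 2): 2 mio. -> 0.5, and 4 mio. to get back to 1.
print(responses_per_day(2, 2), responses_per_day(4, 2))
```

At every global level, sending more is the better local choice, even though a lower global level would be better for every spammer. That is the dilemma.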

There are a few simple games made by the UN to teach this to the world leaders and their UN representatives. I'm not kidding you, they're (sometimes) playing games at the UN, because many people don't "get" this and similar problems without experiencing them. I've played a few of these games; it is revealing to realize mid-game that the only way for you to win is to cooperate with your "enemy", so that both of you win.

--
Assorted stuff I do sometimes: Lemuria.org
Re:Counterintuitive Advertising by angle_slam · 2004-06-23 06:47 · Score: 1

Advertisers like radio stations because it tends to be a captive audience. People are very unlikely to turn the station when ads come on.
In contrast, I always change the station when an ad comes on. The only time I stay on a station is when I know a traffic report is coming on. (Traffic on the 8s). Otherwise, I switch stations as soon as a commercial comes on.

Re:in related news by Anonymous Coward · 2004-06-22 19:46 · Score: 0

RBLs and DNSBLs have way too many false positives on their own! - Especially if you use the big lists like SPEWS and SpamHaus. As they list all IP space of ISPs regardless of whether they're spam-sources or not, you'd end up blocking 99%+ ham from ISPs who simply (allegedly) provide spam-support (often just DNS hosting or less).

Using them in conjunction with SpamAssassin is a much better idea. Then ham will not score above the threshold (only spam-characteristic is the source of the mail), while true spam will get a boost in score and thus pass the threshold with much more certainty.
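
A rough sketch of that idea: the blacklist hit is just one more contribution to a score, not a verdict on its own. The threshold of 5.0 and the 2.0 points per listing are illustrative values chosen for this example, and the function names are hypothetical rather than anything from SpamAssassin itself.

```python
THRESHOLD = 5.0  # illustrative cut-off; mail scoring at or above this is tagged as spam


def spam_score(content_points, listed_on_dnsbl, dnsbl_points=2.0):
    """Add a fixed (hypothetical) number of points when the sender's IP is listed."""
    return content_points + (dnsbl_points if listed_on_dnsbl else 0.0)


def is_spam(content_points, listed_on_dnsbl):
    return spam_score(content_points, listed_on_dnsbl) >= THRESHOLD


# Ham from a listed ISP: the listing is its only spam characteristic, so it passes.
print(is_spam(0.5, True))    # False
# Borderline spam from a listed source: the listing pushes it over the threshold.
print(is_spam(4.0, True))    # True
# The same message from an unlisted source would have slipped through.
print(is_spam(4.0, False))   # False
```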

Remember that SPEWS and SpamHaus are not listing spam-sources, they're actually listing the opposite, as about 98% of the listed IP-space is not (and has never been) a spam-source. The purpose is blackmail of course, and the people behind these lists clearly don't realize (or refuse to realize) that the victims cannot do anything to change an ISP's policy. Often they cannot switch ISPs either, either for practical reasons (no unlisted alternatives) or for financial reasons (it can easily cost far more than the yearly proceeds to move hundreds of servers, renumber many thousands of domains etc.).

Is SpamAssassin being counterattacked? by jcjewell · 2004-06-22 20:06 · Score: 2, Insightful

I've been getting spams lately that seem to be trying to get around the highly effective statistical solutions, such as SpamAssassin, that have been implemented. Spammers seem to be adding random, or possibly even carefully selected dictionary words to skew their statistical rating. Here is an example from the several I've received lately--has anyone seen information about this on /. or elsewhere?

[spammers irritating message snipped]

Thu, 17 Jun 2004 19:42:34 -0500

No Thanks

beatify

sacred atom drank deprecate cathodic thermionic sherman delinquent hanley swum wooster asteroidal bilayer haiti saudi wink bijective reserpine baronial gloss ambrose threadbare chianti predatory earmark bilingual angora palazzi chartres alveolar phosphate civet radish barricade diem laurie minutemen crusty

camilla jade lineman bendix masonic dublin incontrovertible defecate generous buddhist yesterday endow bitten conley trunk pitchfork beret bloat gelatine dovetail gambia medea niggardly blackburn suey dialogue ilyushin anastigmatic berth abort bodied contractor of ridden embarcadero corset trademark

ID: W993gt72

carnation

constructor maltese bantam airfield pique douglas pungent criterion cloudburst illiterate sausage career stile pebble bonnie shim carbonium

magnesite pembroke abrade jogging dynast physiochemical stochastic sumac conference obtain villain midwinter incompetent eradicable madhouse airline antony household cursory instinctual gratuitous clown shaven des cornflower

Re:Is SpamAssassin being counterattacked? by murky_lurker · 2004-06-22 21:36 · Score: 1

There's a good Wired article on it here. You were on the money with your guess - some spam filters weigh rarely-used words more heavily (considering them more likely to be legitimate email) than commonly used ones. This also is why few emails will offer you "Viagra", but many offer "V1aGr/\" or "s1ld1nafr1l citr/\t3" - the spam filter is likely to take a dim view of any stranger that emails you about erectile dysfunction in their first email :)

DSPAM. by asackett · 2004-06-22 20:20 · Score: 4, Interesting

I've been using DSPAM for nearly a year now, and it's just kept on getting better. I can't imagine life without it now.

I have 17 DNS-based blacklists in front of it, because I would rather block the messages at the network interface than filter them with my own resources, but those that slip through don't stand much of a chance of reaching my inbox. I have had my current email address out there on the web and in Usenet for six years, so I see a lot of junk -- DSPAM stops all but one or two per month. SpamAssassin can't even come close to that.

--

Warning: This signature may offend some viewers.

Spam Filtering for Exchange 2003? by Robmonster · 2004-06-22 21:06 · Score: 1

Can anyone suggest the best way of filtering spam received into a mail server running Exchange 2003?

--
I have no sig yet I must scream.

Re:Spam Filtering for Exchange 2003? by imroy · 2004-06-23 03:45 · Score: 1

Like the AC said, put Exchange behind a proper MTA. Keep your Exchange server inside the firewall for the suits to fiddle with their calendars and crap. Set up Postfix, Qmail, Sendmail, Exim or some other MTA as your internet-facing email server. I use Postfix with Amavis forming a nice interface to Clam-AV and SpamAssassin. I don't run Exchange though. Can't help you there.
Re:Spam Filtering for Exchange 2003? by Anonymous Coward · 2004-06-23 05:07 · Score: 0

Microsoft released a spam filtering plug in for Exchange 2003 a few weeks ago. IMF (Intelligent Message Filter) is free to download and install.

http://www.microsoft.com/exchange/downloads/2003/imf/default.asp

150/day is trivia by anaplasmosis · 2004-06-22 21:16 · Score: 1

150 spams a *day*?????

Hell, that's what gets through my filters. On a bad day I get 1000/hour.

May sound weird but: by R.Caley · 2004-06-22 21:17 · Score: 1

I run both spamprobe and bogofilter and find that the OR of the two is noticeably better than either alone. Haven't managed to spot why, but the moral is that if you have CPU to burn it can be worth getting a second opinion.
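
One possible explanation, offered only as a guess: if the two filters tend to miss different messages, a spam has to fool both to get through. The sketch below assumes the two filters' errors are independent, which real filters trained on the same mail probably are not, and the rates in the example are made up.

```python
def combined_or(verdict_a, verdict_b):
    """Flag the message as spam if either filter says so."""
    return verdict_a or verdict_b


def or_rates(miss_a, miss_b, fp_a, fp_b):
    """Miss rate and false-positive rate of the OR, assuming independent errors."""
    miss = miss_a * miss_b                # spam must slip past both filters
    fp = 1 - (1 - fp_a) * (1 - fp_b)      # ham is lost if either filter flags it
    return miss, fp


# Hypothetical filters: each misses 5% of spam and flags 0.1% of ham.
miss, fp = or_rates(0.05, 0.05, 0.001, 0.001)
print(f"combined miss rate {miss:.2%}, combined false positives {fp:.2%}")
```

The catch is visible in the second number: the OR roughly doubles the false positives, so it only pays off when both filters are already very careful with ham.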

--
_O_ .|< The named which can be named is not the true named

GPG and PGP. by oliverthered · 2004-06-22 21:25 · Score: 1

I didn't see results for how much spam GPG and PGP block.

It's normally around 100% on my pc, but sometimes about 110%.

--
thank God the internet isn't a human right.

CRM114 works much better for me by agi · 2004-06-22 21:27 · Score: 1

I've been using it since March and the stats speak for themselves:
My spamfilters stats.
It's worth mentioning that I don't get false positives with SA, and CRM114 gets 1 every now and then. On a daily basis I get 70 spams caught by CRM114 and not by SA.

--
EOF

Disposable accounts don't mix with business by Xconnect · 2004-06-22 21:47 · Score: 0

I can empathise with that. The other problem with using disposable accounts as far as business contacts or clients is the potential fall-out from the LACK OF TRUST! What would your contact or client think if you give them a spamgourmet address and they know what spamgourmet does? Or, if you give them a sneakemail address... "Can you spell your sneak e-mail address to me again please? That's a-5-b-z-what?"

--
--- root@127.0.0.1

SpamAssassin not to be owned by Microsoft? by Anonymous Coward · 2004-06-22 21:48 · Score: 0

Even if Microsoft buys NAI, they would not get the SpamAssassin trademark.

SpamAssassin is in the process of becoming a project in the Apache Software Foundation. That process requires the trademark to be assigned to the ASF, which is already in progress as can be seen in this status report.

You've got it misconfigured (or buggy version) by Anonymous Coward · 2004-06-22 22:02 · Score: 0

I'm also using CRM114. On a bad week, maybe 4-5 spam messages sneak by and I probably get 200-300 messages a day (which are overwhelmingly spam). There was a bug in previous versions of CRM that would cause the filter to claim it had learned when you trained on an error, and would give you an "I already know that's spam, I don't need to relearn that" message upon training. You can fix that by getting the new version and using a training command like this:

command password spam force

That VASTLY improved performance for me.

Re:You've got it misconfigured (or buggy version) by johnstoj · 2004-06-23 00:38 · Score: 1

That isn't entirely correct. If you send the message back to crm114 *exactly* as it came in (the original headers must be included), you should never have to use the force switch. I have shied away from forwarding the email back to myself and have implemented a way of having the original email sent directly to the mailfilter program with the appropriate learn command.
Re:You've got it misconfigured (or buggy version) by klevin · 2004-06-23 05:43 · Score: 1

It doesn't say that it already knows that it's a spam message. I could run the exact same message through a hundred times, and it would act like it had never seen it before. I'm beginning to think that it's not actually saving the data to the spam.css file (although it is accessing it; `stat spam.css` shows the access, modify & change times all being updated when I run the mailfilter program).

use postfix by Anonymous Coward · 2004-06-22 22:32 · Score: 0

Seriously. Exposing an Exchange server directly to the net is just asking for trouble. Your best bet is to put the exchange server behind your firewall and relay all incoming mail through a hardened unix machine running your favourite email transport (like postfix). Then you can use any of the well documented spam sifters discussed here and offer your exchange server more protection from the elements.

Re:use postfix by Robmonster · 2004-06-23 07:03 · Score: 1

I believe my boss's plan was to keep Exchange behind the firewall, but route the incoming/outgoing SMTP to it through our Cisco firewall.

I must admit that the details are a little fuzzy at this stage...

--
I have no sig yet I must scream.

Re:I keep hearing about how great spamassasssin is by gonkem · 2004-06-22 22:49 · Score: 1

You really need to make sure Spam Assassin is using DNS RBL and the like. I was seeing the same kind of thing - lots of spam getting through. Once I turned on the RBL checkers, spam levels dropped immediately. The other thing to do is make sure SA is the latest version - the newest spam techniques beat old SA versions.

Delete is for sissies by clone22 · 2004-06-22 23:27 · Score: 1

Make trash your inbox. 100% effective.

--
Ask me about my vow of silence!

CRM114 Author Response by Anonymous Coward · 2004-06-22 23:56 · Score: 3, Informative

I am the author of CRM114 and I corresponded with Professor Cormack for setup assistance during this study; he did have some problems with CRM114 that he brought to my attention and which were possibly never quite resolved.

I can also state that I *do* run CRM114 myself; I also run SpamAssassin (regularly maintained by the systems staff) on a parallel account. I find that SA gets about 90+ percent of what makes it past the firewall's immediate RBL lists (which matches Prof. Cormack's Figure 8 pretty closely); CRM114 nails 99.9% or more (this week, ending June 21, 2004, my CRM114 stats are 2528 nonspam and 1114 spam messages, with just 1 error (a false reject), which is 99.972% accuracy).

I have gotten reports from some very happy users who are seeing similar accuracies; I've also gotten sad reports similar to Prof. Cormack's that show very weak accuracy.

I can conclude from this (and other reports) that filter performance varies _greatly_ with spam mix - that is to say, Your Mileage Will Vary.

Further, consider Fig 15, which compares CRM114's accuracy with respect to nonspam v. spam. Note that the two curves are displaced considerably, by a factor of accuracy between 3 and 5 times!

This is odd, because CRM114 is _entirely_ symmetrical; it does NOT have any predisposition toward (or against) erring on the side of caution; the only difference between nonspam and spam is the names of their files, which could be changed to "foo.css" and "bar.css" (or even interchanged) without affecting anything else.

Therefore, the two accuracy curves _should_ lie on top of each other; there is no difference in the processing. The fact that the nonspam v. spam curves seem to differ by a factor of 3 to 5 in magnitude gives me some reason to believe that the setup issues Prof. Cormack encountered never really were completely addressed.

-Bill Yerazunis

Re:CRM114 Author Response by gvc · 2004-06-23 01:32 · Score: 1

Bill,

First, I'd like to thank you for your correspondence with me. The information you provided me with was very helpful.

I believe that you said the number of ham errors and spam errors should be roughly equal. They are. It is the proportions that are unequal, because there is much more spam than ham in the test suite.
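
To illustrate with made-up numbers (these are NOT the figures from the study): the same count of errors on each side turns into very different error rates once the two classes differ in size.

```python
def error_rate(errors, total):
    """Fraction of messages in a class that were misclassified."""
    return errors / total


# Hypothetical test suite: five times as much spam as ham, equal error counts.
ham_total, spam_total = 1000, 5000
ham_errors, spam_errors = 40, 40

ham_rate = error_rate(ham_errors, ham_total)     # 4.0% of ham misclassified
spam_rate = error_rate(spam_errors, spam_total)  # 0.8% of spam misclassified

print(f"ham {ham_rate:.1%}, spam {spam_rate:.1%}, ratio {ham_rate / spam_rate:.1f}x")
```

So a symmetric classifier can still show curves that sit a constant factor apart when plotted as proportions.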

An interesting issue arose in our correspondence as to how to introduce bias into CRM-114 so it would place more emphasis on ham accuracy. My understanding is that this is not an easy task, and obvious approaches like double-training don't work.

GVC

Why just spam? by Anonymous Coward · 2004-06-23 00:01 · Score: 0

Personally, I look forward to Bayesian categorization. Not just Spam, but Personal, Work, Bills, etc. It would be splendid if I could have some more dynamic rules instead of doing this stuff manually.

CRM-114 didn't work out for me by Lemuel · 2004-06-23 00:07 · Score: 1

I tried CRM-114 after the previous Slashdot article. I paid a lot of attention to my email and did all the required training. After getting over the initial hump of misclassified email it got to a steady low level. Once it made a mistake and I had to train it, though, I would get a run of false positives and negatives for a bit until it settled out again.

What sent me back to SA was that a number of CRM-114 misclassifications were marking ham as spam. Losing a real message in the sea of spam is much more of a concern for me than getting a bit of spam with the regular stuff. It is very rare that I get ham classified as spam in SA.

Postfix Address Verification by DispassionateObserve · 2004-06-23 00:10 · Score: 2, Informative

Turning on Postfix 2.1's "address verification" feature immediately eliminated 90% of the spam that my company was receiving! (SpamAssassin + ClamAV + CRM114 catch the rest). This feature confirms that the incoming email is coming from an account that also accepts email. (Spambots don't normally accept mail, of course...) The spam email never even makes it into your system this way, because the SMTP transaction is deferred until the address is verified. - Mike

And SpamAssassin is just getting better by KjetilK · 2004-06-23 00:22 · Score: 3, Informative

I've been using SA 2.63 for some time now. At first, my statistics were about 90% rejected at SMTP-time, 0.1% false negatives and 0.01% false positives. Spammers have learned to adapt, so now I have about 2% false negatives.

But SpamAssassin is just getting better and better. Version 3.0 is coming up, and 3.0-pre1 was recently released. I do not have a test system available for it, but those who have may want to take it for a spin.

Especially for large sites, this is extremely interesting. It adds relational database support for the Bayes database, so it should be a lot easier to set up on a large site.

I find individual training the main reason why SA works so well for me, and the lack of it the reason it didn't work very well at my old university.

--
Employee of Inrupt, Project Release Manager and Community Manager for Solid

spamassassin is sloooooooow... by Wdomburg · 2004-06-23 01:16 · Score: 1

I evaluated SA as a possible filtering solution where I work, and it was a full order of magnitude slower than bogofilter even with every test disabled. And that *is* running spamc/spamd. Without the daemon it was even worse.

So while it may be a nice solution for people who are running it on a small scale, for large installations (e.g. we get over six million SMTP connections a day) it requires a lot more hardware thrown at it.

Re:spamassassin is sloooooooow... by Anonymous Coward · 2004-06-23 01:27 · Score: 0

Did you use the spamd daemon version of spamassassin? It's extremely fast.

Re: False positives by nil0lab · 2004-06-23 01:54 · Score: 1

One thing I really like about SA is how they are very careful to make sure their error rate is on the right side. It's better to let some spam get through than to mark good mail as spam.

My ISP implemented postini the other day and it had collected 30000 messages before I realized that it was blocking my Mexican cousin's email- his trip to visit was almost fubar'd.

And the only way to get the messages back was via frickin scroll, click click several hundred times. (Or open the ssl client scripting can of worms)

Give POPFile to a Friend... by endofoctober · 2004-06-23 02:15 · Score: 1

I've also been using POPFile for about a year, and it's done an amazing job - 99.87% accuracy, very few false-positives, and great summary info with six email accounts collectively filtered through it.

I recently helped a few friends install it on their machines, and, rather than just having them start from scratch, I copied my Spam corpus for them. With the spam corpus already in place, all of them noticed spam drop to close to zero while they trained their other buckets.

--
- Jack

What About ASSP? by SlipJig · 2004-06-23 02:18 · Score: 1

I (and the company I work for) use ASSP and have been very impressed with its results. Spam in my boss's inbox went from 100-200 messages per day down to a handful... I'd like to see it compared to the other anti-spam packages mentioned.

--
Read my keyboard review.

Re:What About ASSP? by Anonymous Coward · 2004-06-23 05:02 · Score: 0

Amen! ASSP rocks.

Re:in related news by Anonymous Coward · 2004-06-23 03:37 · Score: 0

The problem with using embedded URLs in spam is that the spammers have already adapted to address this method. They create new domains every day to get around this type of content filtering. For instance, I might receive a spam message with 239e29.23ijei.com and the next day I'll receive the same spam message with hsh9x.39u329.com

I found a much more reliable way to detect spam, unfortunately I will not share it here because I am sure spammers will read this post and adapt.

oversimplified comparison? by ubiquitin · 2004-06-23 04:39 · Score: 1

No comparison of Bayesian systems would be complete without some method to normalize the training of them. In other words, different Bayesian approaches to anti-spam will learn differently from a different training set. So ironically, this comparison is only as good as the completeness of the spam used to train the filters.

--
http://tinyurl.com/4ny52

Re:in related news by Crudely_Indecent · 2004-06-23 04:52 · Score: 1

amen brother....or sister.....or whoever you are.

--

"Lame" - Galaxar

No ASSP mentioned either by Anonymous Coward · 2004-06-23 04:52 · Score: 0

We've been using ASSP for just over 200 days now. http://assp.sourceforge.net.

785621 messages processed
334565 messages rejected as spam
159278 viruses blocked (attachments of .exe, .pif, etc)

Major points in ASSP's favor include the fact that it blocks the email at the network interface (it takes over port 25 and forwards on to sendmail only the stuff that isn't spam), it's easy to install at the server-wide level, anyone on the whitelist can help train the spam filter by emailing it, and it rejects most viruses immediately, which keeps the machine running smoothly even during M$ virus blitzkriegs.

But what about the emails I'm missing, you say? They get a message telling them they were rejected and why. Better yet, they get the message even if they are a spammer using a fake return address, which gives you a chance to "opt out" in a fashion they can't legally ignore (yeah, like they care, but still...). We've gotten no complaints from valid users so far and the message tells them to use our phone number to get whitelisted.

What still gets through? Bayesian poisoned emails do occasionally make it past--usually about 3-5 a day, but the spam rate is quite low compared to the deluge we would be under without it.
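
For scale, the counts quoted above can be turned into percentages. This is only a back-of-the-envelope sketch: it assumes the spam and virus counts are both subsets of the 785621 processed messages, which the post does not actually state.

```python
# Figures exactly as quoted in the post above.
processed = 785621
rejected_spam = 334565
viruses_blocked = 159278

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Assumption: both counts are measured against the same "processed" total.
spam_share = share(rejected_spam, processed)
virus_share = share(viruses_blocked, processed)
print(f"spam: {spam_share}%  viruses: {virus_share}%")
```

Under that assumption, a bit over two fifths of the processed mail was rejected as spam and about a fifth was blocked as a virus.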

Humans don't make the big mistakes. by ron_ivi · 2004-06-23 04:53 · Score: 1

Humans may make mistakes on the borderline cases, but will not make the *important* mistakes.

Examples.

A mom reading an email from her daughter saying "help, i'm being sexually assaulted by a football team" is far less likely to go "gee, that contains the word rape so it's spam"

A CEO reading an email from his biggest customer saying "you're getting rich. we're placing the order you need to survive" won't dismiss it because of spam words.

Spam filters have a higher chance of deleting the important emails than these overall percentages suggest.

From the article "'The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.'"

I don't give a damn if they reduce 150 spams to 2 or 3 or 4 or 5. I care that they do NOT delete the one important email hidden there. Spam filter writers -- start focusing on avoiding false positives, not on trying to delete everything.

Re:Humans don't make the big mistakes. by BagOBones · 2004-06-23 10:23 · Score: 1

Most spam filters don't delete.

They put all the spam in a spam folder for you to look through later, and only delete messages that are very old or that match your settings.

--
EA David Gardner -"... but the consumers have proven that actually what they want is fun."

Unfair headline by nil0lab · 2004-06-23 05:10 · Score: 1

The article admits that they didn't follow the training guidelines for CRM114. Its HOWTO and FAQ clearly indicate that training of the type used in the Shootout decreases accuracy significantly. I followed the author's recommendations carefully (having found his rationale for them very rational) and have had very good results.

"The more spam you get the less you read"... by nil0lab · 2004-06-23 05:19 · Score: 1

"The more spam you get the less you read" is what somebody told me at a recent user group meeting.

The trick is to train the spamfilter against the spamtrap addrs so that when they hit the good addrs the spamfilter knows they're spam.

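I use CRM114 train-on-error, so my .procmailrc includes something like this...

# Only act on mail that CRM114 scored as good...
:0
* ^X-CRM114-Status: Good
{
    # ...but that was sent to a known-compromised (spamtrap) address.
    :0
    * ^TO_compromisedaddr@mydomain.org
    {
        # Pipe a copy to the learn-as-spam script.
        :0 c
        | $HOME/my/etc/crm114/learnspam

        # File the original in a folder (with a lockfile) for later review.
        :0:
        $MAILDIR/checkspam-learnt
    }
}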
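
The train-on-error idea in that recipe can be sketched in Python. This is a toy illustration only: `ToyFilter` is a hypothetical word-count classifier standing in for the real thing, not CRM114's algorithm, and `handle_spamtrap_mail` is a made-up name for the step the recipe performs.

```python
from collections import Counter

class ToyFilter:
    """Tiny word-count classifier; a stand-in, NOT CRM114's algorithm."""

    def __init__(self):
        self.spam = Counter()
        self.good = Counter()

    def classify(self, text):
        words = text.lower().split()
        spam_score = sum(self.spam[w] for w in words)
        good_score = sum(self.good[w] for w in words)
        # Ties go to "good", so an untrained filter lets everything through.
        return "spam" if spam_score > good_score else "good"

    def learn(self, text, label):
        target = self.spam if label == "spam" else self.good
        target.update(text.lower().split())

def handle_spamtrap_mail(flt, text):
    """Train-on-error: learn only when trap mail was wrongly called good."""
    if flt.classify(text) == "good":
        flt.learn(text, "spam")
        return True   # the filter made an error, so it was trained
    return False      # already caught, nothing to learn
```

The point of training only on errors is that mail the filter already classifies correctly adds nothing, so only the misses hitting the spamtrap address get fed back.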

RBLs are the way to go by Aggrazel · 2004-06-23 06:33 · Score: 1

Of the million or so emails I process per day, 80% are marked as spam. Of those, approximately 75% are caught by the RBLs before they even reach the spamassassin engine.

I highly recommend RBLs to anyone. Not only are they fast and usually pretty accurate, but they are very fast learners usually. :)

One of my favorites is the SURBL which seems to catch a good chunk of it. Bayes filters are always gonna be thrown off by the dummy words thrown in there but the minute they try to link the person to their site BAM the surbl gets them.
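
For readers who haven't seen how an IP-based RBL check works, here is a minimal sketch of the usual DNS convention: reverse the IPv4 octets, append the list's zone, and see whether the name resolves. The zone `dnsbl.example.org` is a placeholder, not a real list, and the function names are made up for illustration. URI lists such as SURBL look up domains found in the message body rather than the connecting IP, which is not shown here.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone):
    """Return True if the query name resolves (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on this list.
        return False

print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Because the check is a single DNS lookup against the connecting address, it can run before the message body is ever handed to a content filter.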

Re:greylisting by nexus987 · 2004-06-23 06:53 · Score: 1

I'm surprised greylisting hasn't become more widely used... I've not used it personally, but it sounds effective & fairly benign for non-spam mails.
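
A rough sketch of the greylisting idea, for anyone unfamiliar with it: the first delivery attempt from an unseen (client IP, sender, recipient) triplet gets a temporary failure, and a retry after a minimum delay is accepted. The `Greylist` class and the 300-second delay are assumptions for illustration, not taken from any particular implementation.

```python
class Greylist:
    """In-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a sender must wait before retrying
        self.first_seen = {}         # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        key = (client_ip, sender.lower(), recipient.lower())
        # Record the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"   # a real server would answer with a 4xx code
```

Legitimate mail servers queue and retry after a temporary failure, so their mail is only delayed, while senders that never retry are dropped.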

Monty? by Atario · 2004-06-23 07:30 · Score: 1

SpamBayes - Spambayes version 1.061. A Python Bayes filter inspired by the
proposals of Graham and Robinson.

Why do I suddenly have the urge to invent a silly walk?

--
"A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt

Re:DSPAM Author Response by Ragica · 2004-06-23 13:51 · Score: 1

And here is a very long and detailed response on the DSpam site by Jonathan A. Zdziarski himself..

Re:in related news by humankind · 2004-06-23 15:38 · Score: 1

Without RBLs, your content-based system wouldn't work nearly as well. It's like adding caviar to kool-aid. It might make the drink more palatable, but it's more efficient to cut out the kool-aid.

Re:in related news by dubl-u · 2004-06-23 16:28 · Score: 1

Without RBLs, your content-based system wouldn't work nearly as well. It's like adding caviar to kool-aid. It might make the drink more palatable, but it's more efficient to cut out the kool-aid.

Well, according to my data, it would work a bit less well. And my data doesn't support your kool-aid analogy at all. Why don't you show us your data? You do have data to back up your claims, right?

Re:in related news by humankind · 2004-06-23 17:03 · Score: 1

I do have data. My RBL knocks out about 97% of all spam. And that's without much maintenance. When I get proactive and start monitoring worm-infected PCs, I can up this rate to 99.5%. This is with virtually no measurable legitimate mail being blocked.. something the content-based systems can't say without whitelisting.

Re:in related news by dubl-u · 2004-06-23 20:45 · Score: 1

I do have data. My RBL knocks out about 97% of all spam.

Thanks, I already have data that RBLs can help get rid of spam. That's why I use them. I also have data that content-sensitive approaches can help get rid of spam. What I don't have is any data to back up your claim that RBLs are to SpamAssassin's content-related filtering as caviar is to koolaid.

And that's without much maintenance. When I get proactive and start monitoring worm-infected PCs, I can up this rate to 99.5%.

As I said in my original post, I can get the same rates, including the lack of false positives, using a combination approach. I get 99.5+% with minimal maintenance.

If you don't know how to make use of content-related tools like bayesian filters, fine. Don't use them. But I'm telling you that they work great for me as part of a combination approach, and I have the data to back it up.

Re:I keep hearing about how great spamassasssin is by SweenyTod · 2004-06-24 12:38 · Score: 1

I have exactly the same setup as you do. As some of the others said, you need to keep running sa-learn and it will eventually work.

I was doing this for about two weeks with no noticeable effect, and all of a sudden it started to catch well over 90% of all spam.

With the razor and other remote site checking in place it is very slow though.

--
Alas gallinaceas de urbe bovis volo

330 comments