SpamAssassin Gets a Promotion

Posted by michael on Friday June 25, 2004 @06:04PM from the now-more-assassinating-power dept.

darthcamaro writes "The folks at internetnews.com are reporting that the Spam Assassin project has been promoted to a full top-level Apache Software Foundation project. The project has been in incubation for a while and has finally made it through. The article also reveals that Apache is now using Spam Assassin itself: 'I think spam filtering is now a critical part of the network infrastructure and Spam Assassin is a leader in the area,' said Daniel Quinlan, chairman of the Apache Spam Assassin Project Management Committee."

22 of 168 comments

Bout Time! by Irie+Brother · 2004-06-25 18:07 · Score: 3, Interesting

A well configured installation of SA got me employee of the month way back when. Sadly, UCE/UBE is/has ruined the Internet. Finally.

--
"To deny our own impulses, is to deny the very thing that makes us human." - Mouse
1. Re:Bout Time! by jest3r · 2004-06-25 18:50 · Score: 4, Interesting
  
  Today SpamAssassin flagged 19,246 of the 20,145 incoming emails on my mail server. Absolutely no false positives since I installed it a year ago, and only a few false negatives. I silently drop anything with a score over 13. My customers are happy, my qmail remote queue has been happy, and SpamAssassin is a quality app. Spam is really not a concern anymore.
2. Re:Bout Time! by tzanger · 2004-06-25 23:41 · Score: 3, Interesting
  
  I do the exact same thing, but with a score of 12. Anything that trips the filter as spam gets dumped into a spam folder off the main maildir and they can use IMAP or check with webmail to see what spam they have. A cron script erases anything in the spam folder older than 2 weeks. Oh yeah, and individual users can alter their own white/blacklists and scores since I pull the username and match the scores in a postgres database. Combined with clamd and qmail-scanner, it's heaven. :-)
  
  As for the incoming mail I found that checking against a couple of RBLs has made ALL the difference in keeping the system load down. tcpserver checks against a .cdb file filled with entries from CBL and DNSRBL. Anything matching doesn't even see the real SMTP server and thus doesn't get scanned at all.
3. Re:Bout Time! by paperguy · 2004-06-26 05:48 · Score: 2, Interesting
  
  That's a 95.5% spam rate. What are your users doing to generate so much spam?
DSpam by Pinball+Wizard · 2004-06-25 18:21 · Score: 5, Interesting

After using SpamAssassin for quite a while, it just wasn't cutting it - 75%-80% accuracy is still a lot of spam to go through and delete. I added DSpam to my mail server and my spam catching rate is now better than 99%.

DSpam also came with much better directions for integrating with Exim than did SpamAssassin. As fond as I was of SpamAssassin, they have some catching up to do.

--
No, Thursday's out. How about never - is never good for you?
1. Re:DSpam by Anonymous Coward · 2004-06-25 18:53 · Score: 5, Interesting
  
  DSpam 3.0 is definitely not easy to set up. Add to that there is a database that needs to be set up on the back-end, and lots of configure flags at compile-time, plus permissions issues, etc. etc.
  It's also not very easy to understand how it works, or configure your mail client to easily train it, or to configure procmail how to properly call it (there are a lot of command-line flags as well).
  
  That being said, IT IS WORTH IT. A properly set up and trained DSPAM filter will SOLVE your spam problem. Training time usually takes about 2 weeks and the results are fantastic after that.
  
  You can also set it up a number of ways - server-side, user-side, with postfix or another mail server, with procmail or without. Relay or not. It's up to you.
2. Re:DSpam by fyonn · 2004-06-25 20:35 · Score: 3, Interesting
  
  I've only had dspam installed for a week or so but my stats are as follows: I've taught it 43 spams (i.e. starting from an empty database, 43 got through and I trained on them) and 1 false positive (an iTMS receipt, also taught to the system), and since then it's been pretty damn good. It's flagged 632 spams and correctly let 730 innocent messages through.
  
  I've got my system set to deliver spam to a spambox which I check nightly for false positives.
  
  And the docs say that I ought to do a lot more training before it's up to standard. It's already better for me than SA was.
  
  dave
3. Re:DSpam by Huge+Pi+Removal · 2004-06-25 21:17 · Score: 2, Interesting
  
  I have to say I had the same problem with SA missing a lot (mind you, I have yet to upgrade to newer versions), and Dspam solved it. Having said that, I still use SA as a "first pass", and delete any mail with a score of >9 or so (I would put it lower, but any false positives and users would complain). This leads to less mail in the dspam quarantine.
  
  It's a bugger to set up with Procmail, but if anyone wants a peek at my config file, just e-mail... One thing I did do was forget about that whole "forward spam to this e-mail address" thing: just too much trouble for users. Instead I created a special IMAP folder into which users could save spam, then a simple script corpus-feeds the contents of that folder into Dspam each night.
  
  Oliver.
  
  --
  - Oliver
  
  The right to bear arms is only slightly less stupid than the right to arm bears...
4. Re:DSpam by Chief+Typist · 2004-06-26 03:42 · Score: 3, Interesting
  
  The best feature of DSPAM, in my opinion, is that the SPAM never leaves the mail server.
  
  The bad messages go into a quarantine on the server and can be reviewed by the end user using a web-based interface (looking for false positives.) In the press of a button, that quarantine can be emptied, freeing up disk resources on the server.
  
  Other SPAM solutions (like SpamAssassin) mark the message and continue with delivery. What's the point in downloading the SPAM to your mail client just to throw it away?
  
  -ch
Re:erm by simoniker · 2004-06-25 18:30 · Score: 3, Interesting

Fixed, sorry about that.
what to do with spam after it's id'd? by Hollins · 2004-06-25 18:32 · Score: 5, Interesting

What do you do with mail SA has flagged?

I like SA, and find it is very good for identifying around 95% of my incoming spam. However, I also have around 0.1% false positive rate, which means at some point I have to look through all the filtered spam messages and make sure none of them were legit.

I need a better tool for handling mail SA has identified as spam, either server-side or client-side. I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.

A good set of procmail and formail rules will accomplish this, but my hosting company has a weird procmail setup and I'd prefer something easier to implement.

Any ideas?
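
For anyone who can run a small filter script instead of procmail, here is a minimal Python sketch of the triage described above. It is only an illustration under stated assumptions: it reads the score from the `X-Spam-Status` header that SpamAssassin adds (written as `hits=` by 2.x releases and `score=` by 3.x), and the function names `spam_score` and `triage` are hypothetical, not part of SpamAssassin. The thresholds are the ones from the post: delete above 15, store above 5, and auto-reply when the score is between 5 and 10.

```python
import re

# Matches "score=7.3" (SpamAssassin 3.x) or "hits=7.3" (2.x) in X-Spam-Status.
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(status_header):
    """Return the numeric score from an X-Spam-Status value, or None."""
    match = _SCORE_RE.search(status_header)
    return float(match.group(1)) if match else None


def triage(status_header, delete_above=15.0, keep_above=5.0, reply_below=10.0):
    """Map a score to a set of actions (hypothetical action names)."""
    score = spam_score(status_header)
    if score is None or score <= keep_above:
        return {"deliver"}          # not flagged, or no score found
    if score > delete_above:
        return {"delete"}           # very high score: drop outright
    actions = {"store"}             # flagged: keep it in a spam folder
    if score < reply_below:
        actions.add("auto-reply")   # borderline: tell the sender
    return actions
```

A wrapper would read the header from each message, call `triage`, and then file, drop, or answer the message accordingly; that delivery part depends on the host's setup and is left out here.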
1. Re:what to do with spam after it's id'd? by Twirlip+of+the+Mists · 2004-06-25 19:10 · Score: 3, Interesting
  
  I need a better tool for handling mail SA has identified as spam, either server-side or client-side.
  
  Yes, you sure do.
  
  Odds are that this doesn't apply to you, but the Mac OS X mail program, Mail, does a brilliant job. It recognizes the YES or NO header that SpamAssassin adds to filtered messages and, depending on your preferences, filters accordingly. By default it merely flags spam messages with a little trash-bag icon and leaves them in your inbox. At the flip of a switch, you can have the program automatically move spams into a Junk folder that (again, depending on your prefs) can be automatically emptied every week or month or day or whatever.
  
  If your mail program doesn't already do this, then your mail program sucks. ;-)
  
  --
  
  I write in my journal
2. Re:what to do with spam after it's id'd? by Cato · 2004-06-25 20:46 · Score: 2, Interesting
  
  Auto replies would also get your address marked as 'confirmed valid' i.e. able to receive emails, even if you don't read the spam, so you'll probably just get even more spam.
Novell NetMail by Anonymous Coward · 2004-06-25 18:47 · Score: 1, Interesting

Novell NetMail even supports SpamAssassin now.

http://netmail.sourceforge.net/
Challenge-Response schemes are more effective by cpghost · 2004-06-25 20:59 · Score: 1, Interesting

Filtering spam generates way too many false positives. Challenge/Response schemes are IMHO much more effective. TMDA and similar programs can be configured with whitelists for your regular mail partners, auto-whitelists for everyone who confirms their e-mail identity, and, if necessary, with blacklists too.

--
cpghost at Cordula's Web.
Re:Great News! by NigritudeUltramarine · 2004-06-25 20:59 · Score: 5, Interesting

A success rate of 95% really sucks when (like me) you get just over 2,500 spams a day. That'd still mean around 125 spams a day would be getting through. (I've had the same email address since the early 1990's, back when there was no reason to keep your email address "secret.")

Personally I do use SpamAssassin, but as an intermediate step.

First step: Check a whitelist of known senders. Deliver if the sender is on the list, AND the message originated from an IP subnet that I allow for them personally.

Second step: Scan with SpamAssassin. If the score is really high (above 20) throw it the hell out.

Third step: If the score is less than 20, and the person wasn't whitelisted, run the message through TMDA and politely tell the sender I'm not sure who they are, and I get a lot of spam, and could you please click this link to prove that you're a real person.
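
The three steps above can be sketched roughly as follows. This is an illustration in Python, not the actual code (which is PHP and not shown here); the whitelist contents, the names `is_whitelisted` and `route`, and the handling of a score of exactly 20 (treated as "challenge") are all assumptions made for the example.

```python
import ipaddress

# Hypothetical whitelist: sender address -> subnets that sender may mail from.
WHITELIST = {
    "alice@example.org": ["192.0.2.0/24"],
}

DISCARD_ABOVE = 20.0  # the "really high" score from the second step


def is_whitelisted(sender, origin_ip, whitelist=WHITELIST):
    """True if the sender is known AND the message came from an allowed subnet."""
    subnets = whitelist.get(sender.lower(), [])
    ip = ipaddress.ip_address(origin_ip)
    return any(ip in ipaddress.ip_network(net) for net in subnets)


def route(sender, origin_ip, sa_score, whitelist=WHITELIST):
    """Return 'deliver', 'discard' or 'challenge' for one message."""
    # Step 1: known sender from an expected network goes straight through.
    if is_whitelisted(sender, origin_ip, whitelist):
        return "deliver"
    # Step 2: very high SpamAssassin score is thrown out.
    if sa_score > DISCARD_ABOVE:
        return "discard"
    # Step 3: everything else gets a challenge (TMDA in the setup described).
    return "challenge"
```

Note that a whitelisted address sent from an unexpected subnet falls through to the later steps, which is what makes a forged return address much less useful to a spammer.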

I've been using this three-step system for eighteen months now, and out of over one million messages that have come into my mailbox (really), exactly FOUR spam messages have made it all the way through. Apparently the spammers decided to go ahead and click on the little link, or they used a real person's return address, and when that person got their autoreply, they were too stupid to understand what was going on.

Even better, I have not received ANY indication that I've lost any messages; at least, no one has ever mentioned anything about an email that I didn't get.

I've got five other people at my domain using the same system, although for not quite as long (one for fifteen months, three for about a year, and one for just a month now); they have all had similar success.

So based on those numbers I'd estimate a success rate of 99.9997% for eliminating spam (which is, admittedly, COMPLETELY INSANE), and a false-positive (or at least "lost message") rate of 0% so far (fingers crossed). A few people have had to confirm their messages, of course, but I've whitelisted them as that happens.

I actually wrote all the connecting code in PHP, believe it or not, with a MySQL database as a backend. It's invoked using .qmail files. PHP is indeed good for things other than web pages; and was a little bit easier for me to maintain and deal with than Perl. The whole thing is less than 25KB of code. There is also a web backend which I use to configure it; that adds another 40KB.

The whole system took about twelve hours of programming to set up, on one Saturday.

Now, for correspondence with companies (such as Microsoft, or Amazon.com), I use a different scheme (although it's handled by the same PHP code). I create a unique email address for each of them, which ONLY allows mail to or from that domain (for example "rptamazon@mydomain.com" only allows messages from amazon.com). Those addresses are also easily cancellable, individually, if the company starts to annoy me with spam. Basically, each email address can be assigned its own unique whitelist, and can be cancelled individually at any time, through the little web interface.
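
A rough Python sketch of the per-company address idea, for illustration only. The table layout, the names `sender_domain` and `accepts`, and the choice to also accept subdomains of the allowed domain are assumptions; only the "rptamazon@mydomain.com" example comes from the description above.

```python
# Hypothetical table: local address -> sender domains allowed to reach it.
ADDRESS_RULES = {
    "rptamazon@mydomain.com": {"amazon.com"},
}
CANCELLED = set()  # addresses that have been switched off


def sender_domain(sender):
    """Return the lowercased domain part of an email address."""
    return sender.rsplit("@", 1)[-1].lower()


def accepts(recipient, sender, rules=ADDRESS_RULES, cancelled=CANCELLED):
    """True if this per-company address should accept mail from this sender."""
    recipient = recipient.lower()
    if recipient in cancelled:
        return False
    allowed = rules.get(recipient, set())
    domain = sender_domain(sender)
    # Assumption: accept the domain itself or any subdomain of it.
    return any(domain == d or domain.endswith("." + d) for d in allowed)
```

Cancelling an address is then just a matter of adding it to the cancelled set; nothing else in the table needs to change.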

I also have a number of email addresses for things such as customer support for our company (I write computer software). I'm using the same system for those, also, but instead of checking whitelists based on the sender, I've found a simple way to do it is to check for ANY of our product names anywhere in the message body or subject. If the message doesn't mention any of them, it sends a simple autoreply back similar to that in (3) above, but mentioning that the message didn't seem to be about any of our products, but if it was, please click here, blah blah. We don't have a high volume of support messages (about one or two a day; we're a small company) but in the last year only three or four people have had to click through like that, and, honestly, their support requests were so f*cked up anyways that I'd rather it just dropped them on the floor. ;-)
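
The product-name check is simple enough to show in a few lines. Again, this is a Python sketch under assumptions, not the real code: the product names are made up, and `mentions_product` and `support_action` are hypothetical names.

```python
# Hypothetical product names; the real list is not given in the post.
PRODUCT_NAMES = ["WidgetMaker", "ReportBuilder"]


def mentions_product(subject, body, names=PRODUCT_NAMES):
    """True if any product name appears in the subject or body (case-insensitive)."""
    text = (subject + "\n" + body).lower()
    return any(name.lower() in text for name in names)


def support_action(subject, body, names=PRODUCT_NAMES):
    """Deliver messages that mention a product; challenge everything else."""
    return "deliver" if mentions_product(subject, body, names) else "challenge"
```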

Then, as a very last ste
Re:I prefer my method - sacrificial subdomains by cpghost · 2004-06-25 21:30 · Score: 2, Interesting

As a side note, I don't use these email addresses for personal emails - I can hopefully trust that the people I personally send emails to are not, or are not going to become spammers.

Well, that is not a very secure assumption. Unless you know that all those people are not using an MUA/OS combination that is vulnerable to viruses or worms. Harvesting addresses is done that way nowadays...

--
cpghost at Cordula's Web.
throws away ANY bulk mail by gfody · 2004-06-25 22:05 · Score: 4, Interesting

not all bulk mail is spam. spam assassin gives 2.4 points if it finds anything that looks like a unique identifier for X-Sender, and another 1.4 points for anything that looks like a tracking image or tracked link.

that plus the points for any non-safe html colors or any html at all, SA effectively tags ANY bulk mail as spam!
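
To make the arithmetic concrete, here is a small Python sketch of additive scoring against a threshold. The rule names are hypothetical, only the 2.4 and 1.4 values are the ones quoted above, the two HTML scores are invented for illustration, and 5.0 is assumed as the required score (SpamAssassin's default).

```python
# Hypothetical rule names. 2.4 and 1.4 are the figures quoted above;
# the two HTML scores are made up purely for illustration.
RULE_SCORES = {
    "unique_id_in_sender_header": 2.4,
    "tracking_image_or_link": 1.4,
    "non_web_safe_html_color": 0.8,
    "html_message": 0.5,
}
REQUIRED = 5.0  # assumed threshold (SpamAssassin's default required score)


def total_score(hits, scores=RULE_SCORES):
    """Sum the scores of every rule that fired, rounded to one decimal."""
    return round(sum(scores[name] for name in hits), 1)


def is_spam(hits, required=REQUIRED, scores=RULE_SCORES):
    """A message is tagged once its total reaches the required score."""
    return total_score(hits, scores) >= required
```

With these made-up numbers the two bulk-mail rules alone give 3.8, which stays under the threshold; it is the extra HTML points that push a typical newsletter over it.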

For an end user to set up on their client (as a "junk mail" folder) that's great. I like to have bulk mail separated from my personal mail, but for an ISP to throw it away before it even gets to the intended recipient is fucking ridiculous and should be illegal.

The only email an ISP should be allowed to discard are the ones with attached viruses or some known email worm. The only reason your customers are happy with you throwing away their email is because you don't fucking tell them.

--

bite my glorious golden ass.
1. Re:throws away ANY bulk mail by hyperlinx · 2004-06-25 22:46 · Score: 1, Interesting
  
  Now that emails are used in court as evidence, ISPs or Webmail providers should never auto-delete emails. ISPs should at least offer a link to a quarantine folder and let you choose to delete or not. Webmail services could generate an automatic folder called SPAM? that users could review occasionally if they want to rule out any misfiled emails. With the results others are getting with SpamAssassin, I would appreciate some filtering, especially for my Webmail account. The filters currently used by several of my Webmail account providers certainly don't catch anywhere near the percentage of spam reported here.
  
  --
  In /.space, no one can hear you SCREAM!
Re:Great News! by WebCrapper · 2004-06-25 22:21 · Score: 2, Interesting

I'd be interested in seeing the scripts you have setup for a project I'm involved with. Any thought of sharing?
Re:Great News! by mkettler · 2004-06-26 02:12 · Score: 4, Interesting

Word salad I can understand (if your bayes isn't aggressively trained at least). I don't have problems with it, but my bayes is very heavily trained. (100-300 spams a day manual training)

What I don't understand is the base64 problem. One of the first things SA does is decode base64. Even "rawbody" rules get base64 decoding, so really base64 encoding shouldn't make a difference at all, as SA never examines the encoded text.
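
To illustrate the general idea (this is not SpamAssassin's code, which is Perl), here is a Python sketch using the standard `email` package: the body is decoded first, and only the decoded text is matched. The sample message and the names `decoded_body` and `body_matches` are made up for the example.

```python
import base64
import re
from email import message_from_string

# Made-up single-part message whose body is base64-encoded.
RAW = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=\"us-ascii\"\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(b"Buy cheap pills now").decode("ascii") + "\n"
)


def decoded_body(raw_message):
    """Return the body text of a single-part message with its encoding undone."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload(decode=True)  # undoes base64 / quoted-printable
    charset = msg.get_content_charset() or "ascii"
    return payload.decode(charset, errors="replace")


def body_matches(raw_message, pattern):
    """Apply a body pattern to the decoded text, never to the encoded form."""
    return re.search(pattern, decoded_body(raw_message), re.IGNORECASE) is not None
```

The phrase does not appear anywhere in the raw message, yet the pattern still matches, because matching happens after decoding.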

As for the intentional mis-spellings of V!agr0, check out antidrug.cf (use google) or wait for SA 3.0 which includes this set of rules as a part of the standard distribution.

Disclaimer: I am the author of antidrug, and thus do have a bias here.

--
-Matt
Re:Great News! by NigritudeUltramarine · 2004-06-26 04:14 · Score: 3, Interesting

Yes, I would definitely like to make this stuff publicly available; I know a lot of people would be interested. I need to find a good way to do it. I'm a bit worried about drawing needless attention to myself by releasing such a thing--for example, the system is NOT foolproof, so I could certainly see myself becoming a target for attacks and such.

Hopefully I'll find some free time later this summer (two big programming projects I'm working on now are ending next month) and I'll see if I can take a weekend and put a site together. I'll submit it as a story to Slashdot (and if it doesn't make it, post it in my signature and leave comments about it every time someone mentions spam here).

The unfortunate thing is that making this public will increase work for me, of course (people needing help with installations, or submitting patches, etc.), so I'd like to find a way to mitigate the work involved. I don't really know what's involved in setting up an open source project; perhaps I'll look into SourceForge and see what the deal is. Normally I write commercial software; I don't know whether something like this could be sold. Obviously, if people were paying for it, providing support and taking time away from paying projects wouldn't be as big a problem for me since I'd be compensated. :-)

Alternatively, I've also gotten suggestions that I should keep the software to myself, and offer a paid service where my servers are the MX (mail) hosts for people's domains, giving them POP and IMAP access. I've actually been doing exactly that for my friends over the past six months or so; it's worked out well (four domains for friends currently) but I'm not sure how much the system can scale before I start running out of resources (bandwidth, CPU time, etc.). I'd really have to calculate everything carefully and work out the economics in order to do something like that as a real commercial venture.