SpamAssassin Gets a Promotion

Posted by michael on Friday June 25, 2004 @06:04PM from the now-more-assassinating-power dept.

darthcamaro writes "The folks at internetnews.com are reporting that the SpamAssassin project has been promoted to a full top-level Apache Software Foundation project. The project has been in incubation for a while and it's finally made it through. The article also reveals that Apache is now using SpamAssassin themselves: 'I think spam filtering is now a critical part of the network infrastructure and SpamAssassin is a leader in the area,' said Daniel Quinlan, chairman of the Apache SpamAssassin Project Management Committee."

25 of 168 comments

Great News! by Anonymous Coward · 2004-06-25 18:09 · Score: 5, Informative

This is great news! I have been running SpamAssassin on my box for quite a while, just to filter my own mail. I recently installed it on my mother's Windows 98 box to filter her mail when she checks it with Outlook Express, and she hasn't complained about spam since. With a bit of tweaking, it's been catching 95% with no false positives. Hopefully the SpamAssassin project will keep on getting better :)
1. Re:Great News! by NigritudeUltramarine · 2004-06-25 20:59 · Score: 5, Interesting
  
  A success rate of 95% really sucks when (like me) you get just over 2,500 spams a day. That'd still mean around 125 spams a day would be getting through. (I've had the same email address since the early 1990's, back when there was no reason to keep your email address "secret.")
  
  Personally I do use SpamAssassin, but as an intermediate step.
  
  First step: Check a whitelist of known senders. Deliver if the sender is on the list, AND the message originated from an IP subnet that I allow for them personally.
  
  Second step: Scan with SpamAssassin. If the score is really high (above 20) throw it the hell out.
  
  Third step: If the score is less than 20, and the person wasn't whitelisted, run the message through TMDA and politely tell the sender I'm not sure who they are, and I get a lot of spam, and could you please click this link to prove that you're a real person.
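
  The three steps above can be sketched roughly as follows. This is only an illustration in Python (the poster's actual glue code is PHP and is not shown); the `WHITELIST` table, the `triage` function, and the example addresses are hypothetical, and sending a score of exactly 20 to the challenge step is an assumption, since the post leaves that case open.

```python
from ipaddress import ip_address, ip_network

# Hypothetical whitelist: sender address -> subnets that sender may mail from.
WHITELIST = {
    "alice@example.org": [ip_network("192.0.2.0/24")],
}

DISCARD_SCORE = 20.0  # the "really high" cutoff named in the second step


def triage(sender: str, origin_ip: str, sa_score: float) -> str:
    """Return 'deliver', 'discard', or 'challenge' for one message."""
    # Step 1: known sender AND the message came from a subnet allowed for them.
    subnets = WHITELIST.get(sender.lower(), [])
    if any(ip_address(origin_ip) in net for net in subnets):
        return "deliver"
    # Step 2: a very high SpamAssassin score is thrown out.
    if sa_score > DISCARD_SCORE:
        return "discard"
    # Step 3: everything else gets a challenge (TMDA does this in the post).
    return "challenge"
```

  Note that a whitelisted address arriving from the wrong subnet falls through to steps two and three, which is what keeps forged sender addresses from bypassing the filter.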
  
  I've been using this three-step system for eighteen months now, and out of over one million messages that have come into my mailbox (really), exactly FOUR spam messages have made it all the way through. Apparently the spammers decided to go ahead and click on the little link, or they used a real person's return address, and when that person got the autoreply, they were too stupid to understand what was going on.
  
  Even better, I have not received ANY indication that I've lost any messages; at least, no one has ever mentioned anything about an email that I didn't get.
  
  I've got five other people at my domain using the same system, although for not quite as long (one for fifteen months, three for about a year, and one for just a month now); they have all had similar success.
  
  So based on those numbers I'd estimate a success rate of 99.9997% for eliminating spam (which is, admittedly, COMPLETELY INSANE), and a false-positive (or at least "lost message") rate of 0% so far (fingers crossed). A few people have had to confirm their messages, of course, but I've whitelisted them as that happens.
  
  I actually wrote all the connecting code in PHP, believe it or not, with a MySQL database as a backend. It's invoked using .qmail files. PHP is indeed good for things other than web pages; and was a little bit easier for me to maintain and deal with than Perl. The whole thing is less than 25KB of code. There is also a web backend which I use to configure it; that adds another 40KB.
  
  The whole system took about twelve hours of programming to set up, on one Saturday.
  
  Now, for correspondence to companies (such as Microsoft, or Amazon.com), I use a different scheme (although it's handled by the same PHP code). I create a unique email address for each of them, which ONLY allows mail to or from that domain (for example "rptamazon@mydomain.com" only allows messages from amazon.com). Those addresses are also easily cancellable, individually, if the company starts to annoy me with spam. Basically, each email address can be assigned its own unique whitelist, and can be cancelled individually at any time, through the little web interface.
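
  A minimal sketch of the per-company address idea, assuming a simple lookup table. The names `ADDRESS_RULES`, `CANCELLED`, and `accepts` are hypothetical, and accepting subdomains of the allowed domain is an assumption the post does not confirm.

```python
# Hypothetical table: local address -> the only domain allowed to write to it.
ADDRESS_RULES = {
    "rptamazon@mydomain.com": "amazon.com",
}
CANCELLED = set()  # addresses that have been switched off


def accepts(recipient: str, sender: str) -> bool:
    """True if mail from `sender` may be delivered to `recipient`."""
    recipient = recipient.lower()
    if recipient in CANCELLED:
        return False
    allowed = ADDRESS_RULES.get(recipient)
    if allowed is None:
        return False
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    # Assumption: the domain itself or any subdomain of it is acceptable.
    return sender_domain == allowed or sender_domain.endswith("." + allowed)
```

  Cancelling an address is then just `CANCELLED.add("rptamazon@mydomain.com")`, after which every sender is refused for that address.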
  
  I also have a number of email addresses for things such as customer support for our company (I write computer software). I'm using the same system for those, also, but instead of checking whitelists based on the sender, I've found a simple way to do it is to check for ANY of our product names anywhere in the message body or subject. If the message doesn't mention any of them, it sends a simple autoreply back similar to that in (3) above, but mentioning that the message didn't seem to be about any of our products, but if it was, please click here, blah blah. We don't have a high volume of support messages (about one or two a day; we're a small company) but in the last year only three or four people have had to click through like that, and, honestly, their support requests were so f*cked up anyways that I'd rather it just dropped them on the floor. ;-)
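
  The product-name check described above might look something like this. The product names are made up, and a case-insensitive substring match is an assumption about how the check is done.

```python
# Hypothetical product names; the real list is not given in the post.
PRODUCT_NAMES = ["WidgetPro", "GadgetSuite"]


def mentions_product(subject: str, body: str) -> bool:
    """True if any product name appears anywhere in the subject or body."""
    text = f"{subject}\n{body}".lower()
    return any(name.lower() in text for name in PRODUCT_NAMES)
```

  A message for which this returns False would get the "please click here" autoreply rather than being delivered straight away.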
  
  Then, as a very last ste
2. Re:Great News! by mkettler · 2004-06-26 02:12 · Score: 4, Interesting
  
  Word salad I can understand (if your bayes isn't aggressively trained at least).. I don't have problems with it, but my bayes is very heavily trained. (100-300 spams a day manual training)
  
  What I don't understand is the base64 problem.. One of the first things SA does is decode base64. Even "rawbody" rules get base64 decoding, so really base64 encoding shouldn't make a difference at all, as SA never examines the encoded text.
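
  To illustrate why decoding first makes base64 harmless as an evasion trick, here is a small sketch using Python's standard `email` module. It is not SpamAssassin's own code (SA is written in Perl); the sample message and the `pattern` rule are invented for the example.

```python
import base64
import re
from email import message_from_string

# A tiny invented message whose body is base64-encoded.
RAW = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(b"Cheap pills here").decode("ascii") + "\n"
)

msg = message_from_string(RAW)
encoded = msg.get_payload()                              # text as sent on the wire
decoded = msg.get_payload(decode=True).decode("utf-8")   # text a body rule would see

pattern = re.compile(r"cheap pills", re.IGNORECASE)      # hypothetical body rule

print(bool(pattern.search(encoded)), bool(pattern.search(decoded)))
```

  The rule misses the encoded text but matches the decoded text, so a filter that always decodes before matching is unaffected by the encoding.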
  
  As for the intentional mis-spellings of V!agr0, check out antidrug.cf (use google) or wait for SA 3.0 which includes this set of rules as a part of the standard distribution.
  
  Disclaimer: I am the author of antidrug, and thus do have a bias here.
  
  --
  -Matt
Here is the real link to SpamAssassin's site by vespazzari · 2004-06-25 18:09 · Score: 4, Informative

For those looking for the official SpamAssassin site, here it is

The link in the text goes to some search page

--
"Alcohol, cause of, and solution to, all of life's problems" -Homer Simpson
DSpam by Pinball+Wizard · 2004-06-25 18:21 · Score: 5, Interesting

After using SpamAssassin for quite a while, it just wasn't cutting it - 75%-80% accuracy is still a lot of spam to go through and delete. I added DSpam to my mail server and my spam catching rate is now better than 99%.

DSpam also came with much better directions for integrating with Exim than did SpamAssassin. As fond as I was of SpamAssassin, they have some catching up to do.

--
No, Thursday's out. How about never - is never good for you?
1. Re:DSpam by Anonymous Coward · 2004-06-25 18:53 · Score: 5, Interesting
  
  DSpam 3.0 is definitely not easy to set up. Add to that there is a database that needs to be set up on the back-end, and lots of configure flags at compile-time, plus permissions issues, etc. etc.
  It's also not very easy to understand how it works, or to configure your mail client to easily train it, or to configure procmail to call it properly (there are a lot of command-line flags as well).
  
  That being said, IT IS WORTH IT. A properly set up and trained DSPAM filter will SOLVE your spam problem. Training time usually takes about 2 weeks and the results are fantastic after that.
  
  You can also set it up a number of ways - server-side, user-side, with postfix or another mail server, with procmail or without. Relay or not. It's up to you.
If Only It Was For Real by Nom+du+Keyboard · 2004-06-25 18:30 · Score: 5, Funny

If only it truly assassinated spammers.

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
The problem is... by Lord_Slepnir · 2004-06-25 18:30 · Score: 4, Funny

See, I'm not interested in Assassinating Spam. Now if there was a SpammerAssassin, then I'd be all over using that.
what to do with spam after it's id'd? by Hollins · 2004-06-25 18:32 · Score: 5, Interesting

What do you do with mail SA has flagged?

I like SA, and find it is very good for identifying around 95% of my incoming spam. However, I also have a false positive rate of around 0.1%, which means at some point I have to look through all the filtered spam messages and make sure none of them were legit.

I need a better tool for handling mail SA has identified as spam, either server-side or client-side. I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.

A good set of procmail and formail rules will accomplish this, but my hosting company has a weird procmail setup and I'd prefer something easier to implement.

Any ideas?
1. Re:what to do with spam after it's id'd? by dasunt · 2004-06-25 19:03 · Score: 4, Insightful
  
  I need a better tool for handling mail SA has identified as spam, either server-side or client-side. I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.
  
  Procmail can do it, but please reconsider the auto-replies. What happens if I'm pissed at bob and decide to send out 1m spams with the return address of bob@example.com? More common, what about viruses that forge headers?
  
  I would consider auto-whitelisting instead.
2. Re:what to do with spam after it's id'd? by Anonymous Coward · 2004-06-25 20:04 · Score: 5, Informative
  
  Sending an auto-reply on scores between 5 and 10 (or any other range) makes you part of the problem, not part of the solution.
  
  I have a very well known address (which is why I'm posting as an Anonymous Coward :-) that receives many hundreds of messages every day. My mail server deals with about half of the spam I get. Well over half of the rest is autoreply responses from idiots who don't understand that *I* never sent that message in the first place -- the from address was forged by a virus.
  
  The correct response to spam is to throw it away. Trying to reply to it makes the world worse, not better.
3. Re:what to do with spam after it's id'd? by antsquish · 2004-06-25 20:57 · Score: 4, Informative
  
  I know you mentioned procmail, but for those using Courier IMAP's maildrop, here's what I use in my ~/.mailfilter for SpamAssassin. I've just pasted the relevant sections, but it logs all deliveries, I then filter known recipients into their own folders (not shown here), then any unknown messages are filtered through Spam Assassin. Messages with a score > 10 are sent to /dev/null, while others are delivered to a spam folder.
  
      logfile "/path/to/my/home/dir/maildrop.log"
  
      ###
      ### Maildrop variable substitution
      ###
      MAILBOX="./Maildir"
      DEFAULT="$MAILBOX"
      SPAM="$MAILBOX/.Spam"
  
      ###
      ### SpamAssassin :: filter out spam mail
      ###
  
      # Filter through SpamAssassin
      xfilter "/usr/local/bin/spamc"
  
      # Handle messages marked as spam
      if ( /^X-Spam-Flag: YES/ )
      {
          # Store messages flagged as spam in another folder; uncomment
          # this during testing just in case any legit mail gets sent
          # to /dev/null
          #cc "./spam-store"
  
          # Delete messages with a score of 10 or higher, filter all other
          # spam messages into a spam folder
          /^X-Spam-Status: yes, hits=![:digit:]+\.[:digit:]+!.*/
          if ( $MATCH2 >= 10.0 )
              to "/dev/null"
          else
              to $SPAM
      }
What's the big deal? by FireBreathingDog · 2004-06-25 18:46 · Score: 4, Funny

Everyone on Slashdot always seems to be complaining about spam. I don't see what the big deal is. I enjoy receiving e-mail from people and companies I don't know. Each morning when I run my e-mail program, it starts downloading, and the unexpected e-mail is a pleasant surprise that brightens my day. Well, a few hundred pleasant surprises that is, and they brighten my day in the same way that stepping in a pile of dogshit brightens my day. A few hundred times. So what the fuck? Why are all you whiny bitches on Slashdot always complaining about spam? Don't waste your time writing or deploying spam blockers. Enjoy life. And relax. Assholes.

--
Shame on Google.
Re:Bout Time! by jest3r · 2004-06-25 18:50 · Score: 4, Interesting

Today spam assassin filtered (flagged) 19,246 incoming emails out of 20,145 total on my mail server. Absolutely no false positives since I installed it a year ago .. and only a few false negatives. I silently drop anything with a score over 13 ... my customers are happy .. my qmail remote queue has been happy .. spam assassin is a quality app .. spam is really not a concern anymore.
You people need to stop being so cynical by Enlarge+Your+Penis · 2004-06-25 18:55 · Score: 5, Funny

I don't employ Spamassassin or any other spam blocker. As a result, I now have a penis that will make her scream, hot lesbian schoolgirls lusting after my every move, a wide range of generic drugs, 2 PhDs and a completely clean credit record

A step up from living in your parents' basement and whacking off to an inflatable doll, right?

I'd stay and chat, but I have to get back to a Nigerian man about a bank transfer
1. Re:You people need to stop being so cynical by cryms0n · 2004-06-25 19:28 · Score: 4, Funny
  
  I am no expert on inflatable dolls, but I think you are supposed to make love to them, not whack off looking at them.
sorting mail by spamassassin score by David+Jao · 2004-06-25 19:08 · Score: 4, Informative

I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.
I can't speak for auto-replies, but you can do the sorting part client-side. The key is that spamassassin adds a line like "X-Spam-Level: *****" where the number of *'s is the score of the email. Almost any email client can filter mail to different folders based on headers. The unary representation of the spam score ensures that even a primitive filter can work.
For example, one popular client is Microsoft Outlook, and there are several web pages in google (such as this one) that explain how to reroute mail to specific folders depending on the spamassassin score.
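
As a rough sketch of the same idea in code: counting the asterisks in `X-Spam-Level` gives the integer part of the score, which is enough to pick a folder. The function names and folder names are hypothetical, and the cutoffs of 5 and 15 are borrowed from the question quoted above (treating "more than 15" as 15 or more stars is an approximation).

```python
def spam_level(header_value: str) -> int:
    """Count the leading asterisks in an X-Spam-Level header value."""
    stars = header_value.strip()
    return len(stars) - len(stars.lstrip("*"))


def folder_for(header_value: str) -> str:
    """Pick a folder from the star count; thresholds are illustrative."""
    level = spam_level(header_value)
    if level >= 15:
        return "trash"
    if level >= 5:
        return "spam"
    return "inbox"
```

A mail client that can only do substring matching gets the same effect with a "header contains `*****`" rule, which is the point of the unary encoding.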
Don't worry by KalvinB · 2004-06-25 19:20 · Score: 4, Funny

they'll get it when they post the story again.

Ben

--
Work Safe Porn
Get the owner, not the dog..... by Univac_1004 · 2004-06-25 20:13 · Score: 5, Insightful

Spam Assassin, while a very clever program, is as misdirected as the "Canned Spam" legislation. It has no effect on the real economics of spam: who pays for it.

Somebody is paying for the spamming, and we know exactly who it is. The URL of that organization is prominently displayed in every item of spamail. It is the advertiser.

The advertiser is right there out in the open, easy to locate. If they're not, the spam isn't doing its job, and wouldn't have been sent. And easy to locate means easy to go after, easy to sue, to fine, DoS or whatever.

Dinging the advertisers, and dinging them hard, will instantly put the spammers out of business.

Spamming can be eliminated without blocking, white lists, or anti-spoofing RFC's. Just go to where it's pointing.

To draw an [ugly, graphic] picture: a dog comes and poops on the sidewalk in front of my house, and I step in it. Yelling at the dog is going to be only moderately successful; building a poop filter is difficult, messy, and leaky (as Spam Assassin demonstrates). Following the dog's leash and fining the owner is what works.

The owner doesn't bring the dog back since s/he doesn't want to pay another fine.

No owner, no dog, no spam.

Get the owner.

Kill the spam.
Re:Bout Time! by Mazem · 2004-06-25 20:21 · Score: 5, Insightful

Absolutely no false positives since I installed it a year ago ..
... that you know of.
3.0, late-July, early August by chathamhouse · 2004-06-25 20:31 · Score: 4, Informative

3.0.0pre1 was made available last week.

It will apparently take another month or so to finalize the weighting of the rules.

I've put 3.0.0pre1 on a production system that filters ~350k messages per day. With some tweaking of the RBL, bayes, and AWL rules, it is much (~10%) more efficient at tagging spam than 2.63, which I'm running on a parallel server that also sees ~350k messages/day (load balancing is your friend).

More info: http://www.au.spamassassin.org/full/3.0.x/dist/build/3.0.0_change_summary
Re:Challenge-Response schemes are more effective by Vellmont · 2004-06-25 21:35 · Score: 4, Informative

I've been running SA since February, and have had a grand total of ONE false positive out of a few thousand emails. The message was from a new account, very short, and in HTML. That address has since been added to my autowhitelist. SA coupled with Amavisd-new and clamav has reduced my spam volume by about 95%, and my virus emails to zero. It's a great product and I'm looking forward to 3.0.

--
AccountKiller
Re:3.0? by Brian+the+Bold · 2004-06-25 22:03 · Score: 4, Informative

Have a look at the Rules Emporium at:

I use the rules there, and even minor spam gets obliterated with no problems of catching real mail.

I recommend it!

--
-- BtB
throws away ANY bulk mail by gfody · 2004-06-25 22:05 · Score: 4, Interesting

not all bulk mail is spam. spam assassin gives 2.4 points if it finds anything that looks like a unique identifier for X-Sender, and another 1.4 points for anything that looks like a tracking image or tracked link.

that plus the points for any non-safe html colors or any html at all, SA effectively tags ANY bulk mail as spam!
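
To make the arithmetic concrete, here is a sketch of how additive rule scores reach the tagging threshold. The 2.4 and 1.4 figures are the ones quoted above and 5.0 is SpamAssassin's default required score; the rule names and the two HTML scores are invented for illustration and are not SpamAssassin's real values.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default tagging threshold

# First two scores are the figures quoted in this comment; the rule names
# and both HTML scores are hypothetical.
RULE_SCORES = {
    "UNIQUE_ID_IN_HEADER": 2.4,
    "TRACKING_IMAGE_OR_LINK": 1.4,
    "HTML_MESSAGE": 0.5,
    "HTML_NONSAFE_COLORS": 0.9,
}


def total_score(hits):
    """Sum the scores of every rule that fired."""
    return round(sum(RULE_SCORES[name] for name in hits), 1)


def is_spam(hits):
    """A message is tagged once its total reaches the required score."""
    return total_score(hits) >= REQUIRED_SCORE
```

With only the two tracking-related rules the total is 3.8, which stays under the default threshold; it takes additional hits (such as the HTML rules in this made-up table) to push a bulk mailing over 5.0.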

For an end user to set up on their client (as a "junk mail" folder) that's great.. I like to have bulk mail separated from my personal mail, but for an ISP to throw it away before it even gets to the intended recipient is fucking ridiculous and should be illegal.

The only email an ISP should be allowed to discard are the ones with attached viruses or some known email worm. The only reason your customers are happy with you throwing away their email is because you don't fucking tell them.

--

bite my glorious golden ass.
Re:Bout Time! by Just+Some+Guy · 2004-06-28 01:45 · Score: 4, Informative
I "augmented" SpamAssassin with an extremely tight Postfix ruleset. A remote server has to jump through these hoops before SA ever gets a crack at it:
1. HELO Filtering
   1. Reject any connection that doesn't start with HELO or EHLO.
   2. Allow any host on my LAN to continue on to step 2.
   3. Reject any host not on my LAN that sends a hostname or IP of a machine on my LAN.
   4. Reject non-FQDN hostnames (ala "mailserver").
   5. Reject invalid hostnames (ala "432$@@112").
   6. Let everyone who makes it this far continue on to step 2.
2. Sender Filtering
   1. Allow authenticated senders to continue on to step 3.
   2. Allow hosts on my LAN to continue on to step 3.
   3. Reject non-FQDN sender domains ("foo@bar").
   4. Reject unknown sender domains ("foo@imaginarydomain.com") - after all, if I can't resolve their domain, then I couldn't reply to them anyway, right?
   5. Let everyone who makes it this far continue on to step 3.
3. Recipient Filtering
   1. Reject non-FQDN recipient domains (they'd bounce anyway).
   2. Reject unknown recipient domains (same as above).
   3. Allow authenticated users to send their mail and stop processing.
   4. Allow hosts on my LAN to send their mail and stop processing.
   5. Reject mail from anyone else that isn't to one of my domains, or one I'm an MX for.
   6. Use SPF to reject spoofed email.
   7. Use the relays.ordb.org, list.dsbl.org, and sbl-xbl.spamhaus.org DNS blackhole lists.
   8. Greylist all email not coming in from or going out to peer MXes.
   9. Pass everything else to step 4.
4. Content Filtering and Delivery
   1. Use ClamAV to reject viruses. This takes a big load off SpamAssassin.
   2. Use SpamAssassin to tag messages.
   3. Use Cyrus's Sieve to reject high-probability spam, put medium-probability messages into a "review" folder, and filter everything else into the appropriate folders.
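
The HELO checks in step 1 can be sketched in a few lines. This is a Python illustration of the logic only, not Postfix configuration; `LOCAL_NAMES`, `LABEL`, and `check_helo` are hypothetical names, the example hostnames are placeholders, and bracketed address literals are not handled.

```python
import re

# Hypothetical names/addresses of machines on the local network.
LOCAL_NAMES = {"mail.example.com", "192.0.2.1"}

# One DNS label: letters, digits, and inner hyphens only.
LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")


def check_helo(helo: str, client_on_lan: bool) -> str:
    """Return 'ok' or a short rejection reason for a HELO/EHLO argument."""
    if client_on_lan:
        return "ok"  # LAN hosts skip the remaining HELO checks
    name = helo.strip().rstrip(".").lower()
    if name in LOCAL_NAMES:
        return "reject: claims to be one of our hosts"
    labels = name.split(".")
    if not all(LABEL.match(label) for label in labels):
        return "reject: invalid hostname"
    if len(labels) < 2:
        return "reject: not fully qualified"
    return "ok"
```

Checks like these cost almost nothing per connection, which is why they sit in front of the expensive content filters in step 4.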
I reject over 95% of all incoming mail before it ever gets to SpamAssassin. This means that SA's success rate isn't as good as on other systems (since I weed out all of the obvious spam), but my mailbox is happy and shiny.
SpamAssassin is a brilliant last line of defense, but I wouldn't advise just dumping your raw incoming stream into it. Much of the useful information about a message isn't available to spamd (such as your list of local domain names, relay domains, etc.) and you should consider using a set of cheaper filters to flush out the blatant chaff.
--
Dewey, what part of this looks like authorities should be involved?