Critical Eye on SpamAssassin
ErrorBase writes "In this Infoworld article, Logan G. Harbaugh makes a great deal about an ancient (2.44) version of SpamAssassin, comparing it with newer commercial variants.
Quote: 'You get what you pay for. [...] However, it took more than 10 times as long to install and configure SpamAssassin as it did any of the other products.'
Why did he not ask Kevin Railsback, who had the whole thing working some while ago?"
"SpamAssassin 2.44, an open source spam filter included with Red Hat Linux 9." Included with RedHat 9 is spamassassin v. 2.54, not 2.44
SpamBayes, by far.
Webmin is great for setting up just about anything you can think of.
A psychopath can't tell the difference between right and wrong. A sociopath knows the difference - he just doesn't care.
Great - compare a generation-or-more-older open source release to fresh shrinkwrap. Who's zooming (or shilling) for who?
My ISP (southern NH) runs SpamAssassin 2.6 - and I can tell you that at the default settings it catches 90-95% with <.01% (yes Bucko, less than 1/1000) false positives. When they implemented it several versions ago it was just as good.
I've got one client where they run NO filter - some folks (the names GOTTA be on the web site) get up to 100 spams a day. IT are basically monkeys with hands. I have no idea what the CEO thinks. They wouldn't even think OS as they're a total MS shop.
I don't understand why he's so critical of a free product. I upgraded to 2.60 and it's running near flawless, and since the program is so simple, you just upgrade it, no need to change configuration options if you don't need to, you just call it from procmail.
Yeah all those GUI options look nice, but 90% of the time, why do I need to change my spamblocking settings? The Bayesian filter autoadjusts itself with little or no user intervention -- it's near transparent.
I run a mail server at home on a Linux box, with Postfix and Spamassassin 2.60. I have it configured to label mail as spam once it hits 8 points, and to automatically chuck it into /dev/null once it hits 12 (using Postfix's header_checks).
It works pretty well for me -- the mail server's only for my personal use so I don't really have to worry about irate subscribers suing me for dropping their legit mail =p and the 8-12 point range in the spam marking gives me a chance to vet those suspicious mails briefly before deleting them.
I've never tried any other spam filters on the server-side, so I can't really compare. I guess I'm also a bit of a Linux hacker so I don't mind tweaking all those config files along the lines of the FAQ and other hints on forums to get it to work the way I want it to.
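The two-threshold setup described in this post (label at 8 points, discard at 12) can be sketched as a small decision function. This is only an illustration of the logic, not the poster's actual Postfix header_checks rules; the function name and the action labels are made up for the example.

```python
def action_for_score(score, tag_at=8.0, discard_at=12.0):
    """Decide what happens to a message given its SpamAssassin score.

    The thresholds default to the values quoted in the post; the
    action names are hypothetical labels for this sketch.
    """
    if score >= discard_at:
        return "discard"   # dropped before it reaches any mailbox
    if score >= tag_at:
        return "tag"       # delivered, but labelled as spam for a quick manual check
    return "deliver"       # treated as ordinary mail


for s in (2.5, 8.0, 11.9, 12.0):
    print(s, action_for_score(s))
```

The band between the two thresholds is the part that still gets a human look, which is what makes the hard discard at 12 tolerable on a single-user server.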
Gan Family Homepage
I know people have been recommending SpamBayes but be warned - it is very slow to parse and move the emails. Only bother with this if you receive only a small volume of spam or have a pretty fast computer.
He sent a long open letter to SAtalk. You can find it in the mailing list archive
you need to change them because the easy install solutions suck (and have default installs that somebody can try to get around and test until it goes through).
world was created 5 seconds before this post as it is.
Each product was tested with a different stream of mail, so the number of messages received varied, but all received enough messages to assess their capabilities.
Can you imagine someone writing "Oracle, Sybase and Postgres were compared. While the data and workloads were different, all products performed enough work to assess their capabilities"?
All the products except Brightmail and SpamAssassin allow end-users to add senders to the domain whitelist themselves.
I don't know anything about Brightmail. Spamassassin end user whitelists entries can be set up in a number of ways.
And all the products but SpamAssassin use dynamic updates to keep up with the evolving technologies spammers use to circumvent less sophisticated filters.
As alluded to in the summary, this is false with modern versions of Spamassassin, which use Bayesian filtering. (The author later says he couldn't get it working.)
However, it took more than 10 times as long to install and configure SpamAssassin as it did any of the other products. [...] But just because the software is installed does not mean it will work -- filtering criteria must be added manually, and until that's done nothing is filtered out. Getting the various configuration files edited properly so that the whole package worked was not simple. Documentation was difficult to find, and not always easy to follow.
While it is true that one must be comfortable with a text editor to configure Spamassassin, thus perhaps putting it out of reach of point-and-click admins and technical journalists, I also wouldn't be prone to put my mail servers in the hands of either of those groups of people.
It looks for keywords in the subject or body of e-mails, but is frustrated by words not in the dictionary, such as "V!agra," or words that contain invisible HTML characters.
While I am not sure what tests appeared in which version, I'm pretty sure 2.44 handled off-by-one words such as V!agra. I have no idea what he's talking about when he says "invisible HTML characters", but it does seem to point to a certain technical incompetence, similar to the ostrich belief - "If I can't see you, then you can't see me."
This is not to say Spamassassin is the easiest thing in the world to deal with. I happen to love it, because of the extreme flexibility.
I just get sick of tech journos who decide that because a tool doesn't have a gui and they don't want to take the time to configure it, it sucks.
I forget what 8 was for.
You think 2.44 is ancient? Feh - Debian 'stable' is still stuck with 2.20.
i have been using spamassassin for a year and it works great! granted, in the beginning about 18% of the spam (in my case 18% of about 30 emails per day) would get through. BUT if you read the manpage and tweak the different scores a bit, you can get that down to 1 - 2% with about the same amount of false positives. as an admin, you should be able to tweak any spam filter to match your needs best.
what i can highly recommend is to increase the score of MICROSOFT_EXECUTABLE as it generally is a piece of spam. in addition the bayesian statistics are a great idea: a spam filter that learns!
as for the reviewer: if it takes this person 10 times longer to read a manpage and punch in some trivial scores into a trivially set up configuration file, then you should take his review with a HUGE grain of salt... especially since he reviewed an ancient version of the software.
finally a general comment about spamassassin: EXCELLENT software, especially for the bargain price of $0.
The version he's using might make the difference.
I was using 2.20 until recently. After updating to 2.60, the level of spam still coming through the filter dropped right off. It's about 1 msg. per day now, used to be at least 5 times that.
Assorted stuff I do sometimes: Lemuria.org
I've found the easiest way to implement SpamAssassin is to invoke it through MailScanner. MailScanner uses third-party virus scanners and can optionally invoke SpamAssassin as well. With the free ClamAV antivirus product, you can build a powerful open source mail scanner. Even without a virus scanner, MailScanner detects and quarantines executable attachments and other dangerous content which represent the most common types of mail-borne viruses and worms.
RedHat installs the daemonized version of SA as well as the SA Perl scripts. Using the daemon, the easiest implementation is to invoke SA in /etc/procmailrc on the mail delivery host; for mail gateways running sendmail, you need to use the milter interface. I've found the MailScanner+SpamAssassin approach much easier to configure than either of these methods, and you get virus scanning to boot!
I suspect if the reviewer had compared SA 2.60+ to the commercial products, rather than the older 2.44 version used in the review, SA would have shown better results.
I'd agree with the reviewer that one of the things SA lacks is an easy method for users to interact directly with the program. (Part of the issue has to do with security; SA runs as root. As I read the review, I wondered how the other products allow users to interact directly with the scanners without sacrificing security.) It's not easy to maintain per-user Bayesian filtering, for instance, but I generally recommend having the mail client, e.g., Mozilla, handle these tasks.
Not only is this somewhat old news, it's been discussed on the spamassassin mailing list. Apparently, the article was edited so that it's more anti-spamassassin than the reviewer intended, but Mr. Harbaugh also defends his review of an older version of spamassassin as "it came with my Redhat 9" (NOT a direct quote). He also claims it took nearly an hour to install and set up. (I counter that it took seconds to install and minutes to set up).
The current version of spamassassin is 2.60.
Since then, I've downloaded a bunch of rules from The SA Custom Rule Emporium and almost nothing gets through.
If this guy had trouble, it is the fault of the documentation, not the product. Either that, or he was dumb enough not to upgrade to perl 5.8 or above, and spent forever installing modules.
He says:
Funny how when you install an old version of the product, it seems outmoded, hmmm?
Sheesh.
Pixie
don't mess with those geekgrrls
Spamassassin didn't seem that hard to install. I just typed "apt-get install spamassassin" and just piped my mail through it with a procmail recipe:
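# Filter every message through SpamAssassin
:0fw
| spamassassin -P

# File anything it marked as spam into the "spam" mailbox
:0:
* ^X-Spam-Status: Yes
spam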
Seemed simple and straightforward. Granted, if you're doing it on an entire machine basis you'd just use spamd/spamc and set up a filter on the mail server itself. For one user though I'm not sure how it could be any simpler. If I want to whitelist people I just add them to my ~/.spamassassin/whitelist file. *shrug*
So if you want to whinge at anyone, whinge at RH. At least this shows that reviewers now think they should include FOSS in their reviews.
Justin.
You're only jealous cos the little penguins are talking to me.
I don't know anything about SpamBayes so I cannot comment on it at all.
POPFile is easy to use. It also performs Bayesian filtering. It is what I use.
http://popfile.sourceforge.net/
My current POPFile statistics:
Messages classified: 1,440
Classification errors: 19
Accuracy: 98.68%
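As a quick check, the accuracy figure follows directly from the two counts above it: accuracy is simply the share of classified messages that were not errors.

```python
classified = 1440   # "Messages classified" from the POPFile statistics
errors = 19         # "Classification errors" from the same report

accuracy = (classified - errors) / classified
print(f"{accuracy:.2%}")   # 98.68%
```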
saconf works for the Windows versions of spam assassin.
http://www.openhandhome.com/saconf.html
Spamagogo doesn't have quite the same setup, but it is good, and free for now.
Time for a snack.
indeed - I've been using this for a while now. No false positives, I see bits and pieces in my unsure folder - including the "Hi, here's that link you asked for http://spam.spam.spamcorp, cheers .." that Paul Graham reckons is the future of spam.
Given I get over 100 spams a day and I see none of them I am very happy with this indeed.
We replaced an SMTP relay/spam filter/virus scanner based on Exchange and a commercial product (not one of the reviewed products) about a month ago with one using PostFix and SpamAssassin (and amavisd) on RH. Incoming spam levels have been reduced by about a factor of ten with no false positives to date. This solution was not much of a challenge to implement - for a primarily Windows-oriented admin for whom it was a learning exercise. I haven't tried the products reviewed, but am more than impressed with what we now have.
I'll third that - SpamBayes ROCKS. I use it at work where our IT department just wasted huge amounts of money on a back-end solution that stops less than half my spam while at the same time giving me trouble with blocking legitimate messages. SpamBayes cleans up what the back-end commercial solution misses every time.
Takes about 2 seconds per message on my 1 GHz Mini-ITX based machine.
The Bayes filter in SA 2.6 works very well but unfortunately is not well-suited to site-wide learning.
-- casual readers may skip the following details
In an attempt to mitigate this, SA makes an unfortunate mistake in its unsupervised learning algorithm - it uses a different set of rules for training than it uses for marking mail as spam or not. So you can easily have email marked as spam but have the system trained as non-spam (or vice versa). This introduces systematic bias into the learning so that spam detection can get worse in the long run. As a further attempt to mitigate this problem, the learner uses a higher spam threshold, so many spams that are correctly marked do not contribute to the learning process. There is no way to set the SA configuration parameters to eliminate these biases (setting the learn threshold does *not* do it).
--- end of gory details
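The threshold mismatch described in the details above can be illustrated with a toy sketch. The numbers here are illustrative assumptions, not confirmed SpamAssassin defaults, and the sketch uses one score for both decisions, so it does not model the separate rule set the post says is used for training.

```python
def marked_as_spam(score, required_hits=5.0):
    """Delivery-time verdict: is the message tagged as spam?"""
    return score >= required_hits


def auto_learn_label(score, learn_ham_below=0.1, learn_spam_above=12.0):
    """Unsupervised-learning verdict (thresholds are assumed values).

    Returns "spam", "ham", or None when the message is not confident
    enough either way to be fed to the Bayes database.
    """
    if score >= learn_spam_above:
        return "spam"
    if score <= learn_ham_below:
        return "ham"
    return None


for score in (-1.0, 3.0, 8.0, 15.0):
    print(score, marked_as_spam(score), auto_learn_label(score))
```

A message scoring 8.0 is tagged as spam under these assumptions but never reaches the learner, which is the kind of gap the post is complaining about.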
It is not too difficult to set up SA for personalized learning. Just pipe your mail to the following command:
spamassassin -e
If the return code is 0 (non-spam) also pipe the mail to
sa-learn --ham --single
If the return code is 1 (spam) pipe to
sa-learn --spam --single
If you do this you are guaranteed that the statistics recorded in your personal bayes db correspond exactly to the judgements made by SA.
In addition to this you must correct SA when it makes a mistake, by piping the message to sa-learn again with the right flag. You may be able to set up a macro in your mail reader to do this.
This isn't as easy to set up as it should be, but it is *very* effective.
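The classify-then-train loop described in this post can be sketched roughly as follows. It assumes `spamassassin` and `sa-learn` are on the PATH and accept the flags quoted above. The post says the spam exit code is 1, but the exact non-zero value may differ between versions, so this sketch treats any non-zero code as spam (which also means a crashed run would be mistaken for spam). The function names and the injectable `run` parameter are inventions for the example.

```python
import subprocess


def default_run(argv, message):
    """Run a command with the message (bytes) on stdin; return its exit code."""
    result = subprocess.run(argv, input=message, stdout=subprocess.DEVNULL)
    return result.returncode


def classify_and_learn(message, run=default_run):
    """Classify with `spamassassin -e`, then train on that same verdict."""
    code = run(["spamassassin", "-e"], message)
    # Assumption: exit code 0 means non-spam, anything else means spam.
    verdict = "ham" if code == 0 else "spam"
    run(["sa-learn", "--" + verdict, "--single"], message)
    return verdict


def correct(message, right_verdict, run=default_run):
    """Re-train a misclassified message with the right flag ("ham" or "spam")."""
    run(["sa-learn", "--" + right_verdict, "--single"], message)
```

Passing the runner in as a parameter is only there so the flow can be exercised without a SpamAssassin install; in real use the defaults would shell out to the actual tools.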
In the last year I've received 20,000 non-spam and over 100,000 spam messages & viruses (30,000 if you eliminated the "Cumulative Update" messages, which SA caught just fine.) About 100 spams have gotten through (a couple a week) and about 10 false positives have occurred. All of the false positives have been 'weird' - advertising, automatic responses, or web pages that were forwarded to me. As far as I know (and I do check periodically) I've had no false positives in the last 50,000 spams.
My preliminary analysis indicates that personalized learning reduces both false negatives and false positives by a factor of ten. I'll report more systematic analysis in due course.
That is called scoring. Gnus and other good email/news clients have this. Very useful for reading high-volume lists and avoiding USENET kooks.
I, my wife, and yes - even the inlaws - run PopFile
It can be used locally, or used at the mail server. Either way, I'm over 98% all-time accuracy - with thousands of mails checked, and it's very easy to config via its web interface.
I am sure he was as disappointed as me that the installation didn't follow the ./configure && make && make install standard procedure, and that it defaulted to /usr instead of /usr/local as installation directory.
- su -
- perl -MCPAN -e shell
- cpan> install Mail::SpamAssassin
Nice easy way to install and keep up to date with the latest version of SA. This might be why the [...]. And if I remember correctly, the CPAN method does install the programs to /usr/local/.
Two things. First, it is probably more proper to match the X-Spam-Flag: YES header than the number of asterisks in the X-Spam-Level header. Then you can tweak your cutoff level for X-Spam-Flag: YES in the SA config.
Also, rather than running SA from procmail or other means, it is much more efficient and clean to run it from a separate daemon like amavisd-new and then configure postfix to use amavisd-new as a content_filter. There are several advantages to this approach, the greatest one being that you do not have process startup penalties for incoming mails to be scanned, since amavisd-new is written in perl, references the SA engine through the perl module rather than the commandline, and has a scalable child process architecture similar to apache and many other network server daemons. Other nice things about amavisd-new are that you can integrate many different virus scanners with it as well as SA, and it will handle all the subject rewriting, mail deleting, etc. for you.
Are you running the newest version? 2.60 is much improved over previous versions.
If you are running 2.60, have you trained and enabled the bayesian filters? By default you need to feed SpamAssassin about 300 spam and 300 ham (non-spam) messages for it to learn the difference. It will auto-train itself over time but it only auto-learns on messages that are very obviously (to it) spam or ham.
If you normally only get email from a select list of people then you may want to lower your threshold. For people you routinely receive email from, SpamAssassin will remember that they usually don't send you spam, so if you occasionally get something with a high score from them it will automatically lower it a bit. So, you can lower your threshold and still not get any false positives.
I have my required_hits set to 3 and the only false positives I've seen (since switching to 2.60) have been mailing lists (one was from LinuxWorld, the other from another news site) and not person-to-person email. I receive 50-60 spam messages a day and only one or two a week get into my inbox.
spam catching - >99%
false positives (normal email) - 0%
false positives (mailing lists) - <.5%
I do some stuff to keep SpamAssassin's bayesian filters well trained. Every couple of weeks I will go into my spam folder and quickly page through it. If I see a spam that the bayesian filters gave a low score (less than 90% sure it is spam) I will pipe it (I use pine) to sa-learn to train the bayesian filters (unless it was autotrained).
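The sender-history adjustment mentioned earlier in this post (a known correspondent's occasional high score being pulled down) can be sketched like this. The formula is my recollection of how SpamAssassin's auto-whitelist documentation describes it, with a factor of 0.5, so treat both the formula and the factor as assumptions rather than a statement of what 2.60 does.

```python
def awl_adjust(score, sender_mean, factor=0.5):
    """Pull this message's score part of the way toward the sender's
    long-term mean score (assumed auto-whitelist behaviour)."""
    return score + (sender_mean - score) * factor


# A regular correspondent whose mail normally scores around 0.0
# sends one message that scores 5.0 on its own.
print(awl_adjust(5.0, 0.0))   # 2.5
```

Under these assumptions the adjusted score of 2.5 stays below a required_hits of 3, which is why a lowered threshold need not produce false positives for people you hear from regularly.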
We've just started using MailScanner on a box running Fedora Core 1 here. So far MailScanner with SpamAssassin, DCC, Razor and Pyzor is doing a good job, but it is too early for us to get meaningful statistics. A nice web front end for MailScanner is MailWatch, and we monitor the throughput and performance of the box with MailScanner-MRTG.
Phil
I wrote an article about the open source tools that I use to keep Spam out of my inbox here:
http://www.involution.com/spamstats.php
Here's how I catch false positives. But basically you should just learn to live with either false positives or spam. Take your pick.
I turned subject rewriting on:
rewrite_subject 1
Then I set the subject tag to include the hit number:
# Text to prepend to subject if rewrite_subject is used
subject_tag *****SPAM****:*_HITS_*
Then in your email client you can sort your JUNK messages based on subject. This will put the tagged spam messages with the fewest hits at the top. That way you can easily look at messages with the fewest hits.
I added another level of filtering to avoid looking at totally bogus spam messages. I set up two folders in my email client, "SPAM" and "EVILSPAM". I have a procmail filter that pipes spam messages with 10 or more hits to EVILSPAM, so that I don't even look at them. All other spam goes to SPAM:
:0 H
* ^X-Spam-Status: Yes, hits=[0-9][0-9]
mail/EVILSPAM

:0 H
* ^X-Spam-Status: Yes
mail/SPAM
Your email client can probably do this for you, instead of a procmail filter. But this way I can use webmail and all my rules are on my server, not on my client.
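For anyone who would rather do the same split outside procmail, here is a rough sketch of the routing logic. It assumes the `X-Spam-Status: Yes, hits=N required=M` header layout shown in the recipes above; the `INBOX` name and the function itself are hypothetical, and the cutoff follows the recipe's two-digit match (10 or more hits).

```python
import re
from email import message_from_string

# Matches the start of a header value such as "Yes, hits=14.2 required=5.0"
STATUS_RE = re.compile(r"^(Yes|No),\s*hits=(-?\d+(?:\.\d+)?)")


def folder_for(raw_message, evil_at=10.0):
    """Pick a folder from the X-Spam-Status header of a raw message."""
    msg = message_from_string(raw_message)
    match = STATUS_RE.match(msg.get("X-Spam-Status", ""))
    if not match or match.group(1) != "Yes":
        return "INBOX"            # hypothetical name for normal delivery
    hits = float(match.group(2))
    return "mail/EVILSPAM" if hits >= evil_at else "mail/SPAM"


sample = "X-Spam-Status: Yes, hits=14.2 required=5.0\nSubject: test\n\nbody\n"
print(folder_for(sample))
```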
joe.
Thank you to Mr. Harbaugh for replying. His second paragraph still indicates that he doesn't realize that the current release of SA has all the features he said were missing. I look forward to this being corrected in a future article. I didn't go into much of a free vs commercial debate in my reply; however it seems that some folks did. I also didn't touch on the support issue. Frankly I find that support really isn't needed as long as the admin is competent. I was involved in a discussion yesterday with a company I consult with. The topic of the discussion was which Linux distro we should use in the future now that RH is going towards an enterprise distribution and support contracts. Many seemed to believe that we should have technical support for whatever distro we chos