Critical Eye on SpamAssassin
ErrorBase writes "In this InfoWorld article, Logan G. Harbaugh makes a great deal about an ancient (2.44) version of SpamAssassin, comparing it with newer commercial variants.
Quote: "You get what you pay for. [...] However, it took more than 10 times as long to install and configure SpamAssassin as it did any of the other products."
(Why did he not ask Kevin Railsback, who had the whole thing working some while ago?)"
What is a good, free client-side spam filter for Outlook?
All my incoming mail comes through SpamAssassin (can't remember which version off the top of my head), and once in a blue moon a single piece of spam will manage to find its way through. When it does, I guess I should just applaud the spammer for being so devious.
TrollAssassin would be nice. Imagine seeing post subjects as *****TROLL*****, heh.
"Why did he not ask Kevin Railsback who had the whole thing working some while ago?"
He expected to get the results that he normally gets with most commercial software. Click Setup.exe, answer a question or two and it's done, up and running. Further configuration is not required though it may be desired.
The commercial vendors of SpamAssassin have not improved the core product in any way. What they have improved is the packaging, the installation, the default configuration, and the interface to modify that configuration. The stock SpamAssassin does not offer that, although SpamAssassin setup is far simpler than some other packages out there.
versus
The first found SpamAssassin easy, the second found it hard. Hmmm.
What really aggravates me is the typical "There are blacklists available that you can subscribe to, and some are updated regularly, but these are noncommercial lists with no guarantees." I'd like to see what guarantees the commercial lists come with.
[SpamAssassin] filtered only 62 percent of spam, whereas the other products produced great results, blocking 90 percent to 96 percent of all the spam they encountered with few, if any, legitimate messages blocked.
To me, this statement is pretty telling. Harbaugh must get some completely different kinds of spam than me, because, even though I receive about 60 spam mails a day (directed to my "spam" folder, so I never see them until I scan the "From:" field and then delete them), maybe one per week makes it through the filter. And seeing as how I can't even remember the last time I got a false positive, that's a pretty damn good number.
I can believe that if you receive a variety of mail and if you took no time to configure SpamAssassin other than cranking it up, maybe then it'll only catch 80% of the spam. But 62%? I'm not sure if Harbaugh is skewing the benchmarks or if he just doesn't know what he's doing.
There are some legitimate issues with SpamAssassin that might not make it ready for the enterprise, but for a handful of users, I have been more than satisfied. And the price is right.
-- "Complacency is a far more dangerous attitude than outrage." -Naomi Littlebear
I was using version 2.44, and I was able to compile and upgrade SpamAssassin before the number of posted replies hit 60! Can't be too hard!
All my mail comes through SpamAssassin as well, but I am not having nearly the success you are...
I get about 60-70% of my spam correctly tagged, and about .2-.5% false positives. Don't get me wrong, I am WAY happier now than before SpamAssassin, but if I could be getting better performance, that would be great...
"I'll have a Guinness, no wait, make that a Coors Light" -Grad student I work with, who shall remain anonymous...
While his review was perhaps not scientifically conducted, I think there was a point to be made with the SpamAssassin blurb.
Notice that he deliberately took a standard install from RedHat 9, something some IT person (Not a tr00 g33k) might buy at CompUSA. He then tried to install the provided product. Clearly, a tr00 g33k would go and download the latest release, but keep in mind that not everyone is so comfortable with being on the bleeding edge - I believe that this was a point he tried to make. There is also the perception that the release provided with a "product" such as RedHat 9 will be up to the same standards as the OS.
While it's true the latest version has default rules and whatnot, it's quite likely that his older, more out-of-date version does not. In fact, going briefly to the SpamAssassin home page, the links for the 2.5 and 2.4 release documentation are broken.
The point to be made was: OSS needs to be more buttoned up. Notice that he said that he had no trouble installing RedHat 9. That's because the installer is rather good.
I know you're just joking, but to be serious for a minute, the reason not to do that is because you'd be transparently altering someone else's copyrighted property. Overzealous and/or overworked sysadmins misconfigure SA to globally analyze all incoming content and then to alter email subjects based on its opinion. This is an invasion of content, certainly prone to false positives because antispam scanning is an individually trained process, and it breaks the trail of reply threads, at least on a visual basis. There are always going to be tons of misconfigured or RFC-ignorant SMTP servers out there, and being compatible with them is what makes the Internet work. That would include corporate servers, legitimate opt-in bulk mail, and opt-in mailing lists run by Some Dude. There will be people on a mailing list whose personal content is always publicly marked by certain recipients as spam! It's confusing, insulting, and unnecessary. SMTP has invisible meta-tags in its headers to allow for that, and agents are supposed to respect them.
This is fine for using SA's global config as your personal config for your own little systems, but not for an ISP or business.
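A rough sketch of the header-based approach, in Python: the delivery side reads the tag the scanner left in the headers and never touches the Subject. X-Spam-Flag is the header SpamAssassin adds to tagged mail; the folder names and the sample message are made up for illustration.

```python
from email import message_from_string


def choose_folder(raw_message: str) -> str:
    """Pick a delivery folder from the scanner's header, leaving Subject untouched."""
    msg = message_from_string(raw_message)
    # "Junk" and "INBOX" are hypothetical folder names for this sketch.
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Junk" if flag == "YES" else "INBOX"


# Made-up message that a scanner has already tagged in the headers.
sample = (
    "From: list@example.org\n"
    "Subject: Weekly digest\n"
    "X-Spam-Flag: YES\n"
    "\n"
    "Body text.\n"
)

print(choose_folder(sample))
```

Each recipient's client can then decide what to do with flagged mail, and reply threads keep their original subject line.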
According to spamassassin.org:
I believe the article is a bit unfair to SpamAssassin. SpamAssassin does fairly well at what it is good at: filtering spam. The commercial products seem to be total solution packages, which not only filter spam but also let you configure them so that, for example, you could have special spam folders with an auto-expiry date.
I would be more interested in seeing how it compares with the commercial products on the success rate of identifying spam email (false positives would also be quite interesting).
Having said that, I agree that it would be nice if there were some programs or scripts that would automate the setting up of these nice "extra" features for you.
A final note: it seems that the article is not very accurate. I am quite sure that SpamAssassin would allow you to define whitelists; however, that requires running it as root, and that has security implications.
Does he by any chance love Outlook rules as well?
:)
SpamAssassin is on my server and is absolutely brilliant. It catches 99.9% of all my spam, and has only on 5-10 occasions in the past month (I get about 50-60 emails a day) counted 'innocent' mail as spam... and even those were newsletters....
Anyone who slates SpamAssassin is one very deluded person... it's Open Source, constantly improved... open to editing by its users, rules can be added.... marvellous.
Commercial variants I've seen have been painfully badly implemented and not worked properly. Get SpamAssassin and fight the closed source lovers.
Humorous how the guy who liked SpamAssassin (Kevin Railsback) was a tech who actually set it up for use at InfoWorld and the guy who didn't like it is an "IT consultant the author of two books on networking." Always trust a tech.
The heat from below can burn your eyes out
Exactly. I had SA integrated into exim with custom rules and whatnot, but it would break on upgrading the Debian package. It happened twice, and each time I needed to tweak exim.
:( I really should re-enable the bayes stuff, and figure out how to teach it what isn't spam.
Then I found out about the beauty of procmail once I looked into filtering all spam to its own folder without email client filters. So now, I have different emails filtered to specific folders before they ever hit my inbox. Oh, and I had to disable the Bayesian filter; it was catching way too many non-spam emails. Stuff that didn't have any keywords in it at all. One was just a couple of quick sentences from a friend; who knows why it thought it was spam.
Here's a watered down version of my procmail file for those interested: http://gid0ze.net/dl/dot.procmailrc
Bayesian filtering is a bit like fuzzy-logic. Right now, it's best known for filtering spam. SpamAssassin uses a whole long list of tests and assigns +ve or -ve scores to each test that comes out positive (a bit like Slashdot's moderation).
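To make the scoring idea concrete, here is a minimal Python sketch of "run a list of tests, add up the points, compare against a cutoff". The rule names, the point values, and the message fields are all invented for illustration; they are not SpamAssassin's real rules. The 5.0 cutoff matches the stock default as far as I know, but it is configurable, so treat it as an assumption.

```python
# Hypothetical rules: name -> (test function, points). Negative points favor ham.
RULES = {
    "SUBJECT_ALL_CAPS": (lambda m: m["subject"].isupper(), 1.5),
    "MENTIONS_FREE_MONEY": (lambda m: "free money" in m["body"].lower(), 3.0),
    "HTML_ONLY": (lambda m: m["html_only"], 1.0),
    "SENDER_IN_ADDRESS_BOOK": (lambda m: m["known_sender"], -4.0),
}

THRESHOLD = 5.0  # assumed cutoff for this sketch


def score(message):
    """Return the total score and the list of (rule name, points) that fired."""
    hits = [(name, pts) for name, (test, pts) in RULES.items() if test(message)]
    return sum(pts for _, pts in hits), hits


def is_spam(message):
    total, _ = score(message)
    return total >= THRESHOLD


# Made-up message represented as a plain dict.
spammy = {
    "subject": "FREE MONEY NOW",
    "body": "Claim your free money today",
    "html_only": True,
    "known_sender": False,
}

total, hits = score(spammy)
print(total, [name for name, _ in hits], is_spam(spammy))
```

The same message from a sender in the address book would pick up the negative rule and drop well under the cutoff, which is why a single test firing rarely decides the outcome on its own.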
I know someone who did a project on classifying video using Bayesian filtering. It looked at stuff like brightness, contrast, volume, basically everything they could extract from the movie file and give a value to. The concept itself is quite powerful; the difficulty is getting a list of tests that can accurately predict / classify what you have (spam/non-spam, or for video, thriller/drama/etc).
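For anyone who wants to see the classification idea in code, below is a tiny plain naive Bayes text classifier in Python with add-one smoothing. This is a textbook sketch, not SpamAssassin's Bayes implementation, and the training sentences are invented. The "tests" here are simply which words appear; a video classifier would plug in other features instead.

```python
import math
from collections import Counter


class NaiveBayes:
    """Word-count naive Bayes over two labels, 'spam' and 'ham'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def log_score(self, label, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        # Start from the log prior: the fraction of training documents with this label.
        result = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the product.
            count = self.word_counts[label][word] + 1
            result += math.log(count / (total_words + len(vocab)))
        return result

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(label, text))


# Invented training data, far too small for real use.
nb = NaiveBayes()
nb.train("spam", "cheap pills online")
nb.train("spam", "win money now")
nb.train("ham", "lunch tomorrow at noon")
nb.train("ham", "project meeting notes")

print(nb.classify("cheap money pills"))
print(nb.classify("meeting tomorrow"))
```

With so little training data the results are only as good as the examples, which is the same difficulty described above: the method is simple, and the hard part is feeding it features and examples that actually separate the classes.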
If you're interested in finding out more about actually coding Bayesian filters, you can check out the Bayes++ project page.
Gan Family Homepage
SpamPal does the trick for me.
:)
Quick and effective identification. It can check the online black hole lists for IP ranges to block, and you can manually set the thing up to ignore email from any country.
goooooodbye china!
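For the curious, a small Python sketch of the two checks mentioned here. DNS-based black hole lists are queried by reversing the octets of the sender's IPv4 address and appending the list's zone; a manual block is just a membership test against a network range. The zone name and the blocked range below are placeholders (documentation-only addresses), not a real list, and the sketch only builds the query name without doing a DNS lookup.

```python
import ipaddress

# Hypothetical manually blocked ranges (reserved documentation networks).
BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname a DNSBL client would look up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def manually_blocked(ip: str) -> bool:
    """True if the address falls inside any of the locally configured ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)


print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))  # 10.2.0.192.dnsbl.example.org
print(manually_blocked("203.0.113.77"))
```

If the constructed name resolves, the address is listed; if the lookup fails with "no such host", it is not.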
A problem we had here at Netmar was that SpamAssassin, in conjunction with MIMEDefang, really slams the system. We have several clients who run listserv-type email lists (for various reasons, all verified non-spam, most for non-profit orgs), and when you send a 500k listserv digest email to 2,000 people, in the default SpamAssassin config, it would spawn a perl process for each attempted email. So, for about 3 minutes, our mail server would be swamped (load creeping up over 10ish), even though it's a 1.2 GHz Duron.
So, we solved it by figuring out how to run SpamAssassin / MIMEDefang as daemons. Works great now, and when someone tries to send 2,000 messages, it just queues them and delivers them as it can. Takes less time to get through them one at a time than it did to spawn max_file_descripters perl processes.
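The general shape of that fix, sketched in Python rather than as our actual setup: one long-lived worker pulls messages off a queue, so a burst of 2,000 messages costs one interpreter startup instead of 2,000. The scan function here is a stand-in, and a thread stands in for the daemon process.

```python
import queue
import threading


def run_daemon(messages, scan):
    """Feed every message through a single long-lived worker, in arrival order."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            msg = pending.get()
            if msg is None:  # sentinel: the burst is over
                break
            results.append(scan(msg))

    daemon = threading.Thread(target=worker)
    daemon.start()
    for msg in messages:
        pending.put(msg)  # enqueueing is cheap; nothing is spawned per message
    pending.put(None)
    daemon.join()
    return results


# Hypothetical burst of 2,000 digest copies and a trivial stand-in for the scan.
burst = [f"digest copy {i}" for i in range(2000)]
scanned = run_daemon(burst, scan=lambda m: m.upper())
print(len(scanned))
```

The queue absorbs the spike, and the worker drains it at whatever rate the machine can sustain.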
~Wx
sig?
And you better change that simple, straightforward procmail recipe to use ":0fw:" on the first line. That trailing ":" is important if you are not running spamd, as it makes procmail use a lock file and only run 1 instance of SpamAssassin at a time. Otherwise, if you get 30 messages, you'll get 30 instances of SpamAssassin, which is 30 instances of Perl, etc. Large load spike.
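What the lock file buys you can be shown with a short Python sketch (Unix only, since it uses fcntl). This is an illustration of the locking idea, not how procmail implements it: 30 deliveries arrive at once, each one takes an exclusive lock on a shared file before "scanning", and the peak number of simultaneous scans is recorded. The lock path and the sleep standing in for the scan are assumptions.

```python
import fcntl
import os
import tempfile
import threading
import time

# Hypothetical lock file location for this demo.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "spamfilter-demo.lock")


def deliver_burst(count):
    """Deliver `count` messages at once; return the peak number of concurrent scans."""
    state = {"running": 0, "peak": 0}
    guard = threading.Lock()  # protects the counters only, not the scan itself

    def deliver(msg):
        with open(LOCK_PATH, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # wait until no other delivery holds it
            try:
                with guard:
                    state["running"] += 1
                    state["peak"] = max(state["peak"], state["running"])
                time.sleep(0.001)  # stand-in for the expensive scan of `msg`
                with guard:
                    state["running"] -= 1
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

    threads = [threading.Thread(target=deliver, args=(f"msg {i}",)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


print(deliver_burst(30))
```

Without the flock calls the peak can climb toward the size of the burst, which is the load spike described above.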
As for maintenance, there isn't any. I set up exim two or three years ago and have hardly touched it since.