Spamassassin Beats CRM-114 In Anti-Spam Shootout
Simon Lyall writes "A new study of antispam software shows that SpamAssassin performed well in various configurations, that Spamprobe, Bogofilter and Spambayes also did well, and that CRM-114 failed to live up to its previous claims. The study shows: 'The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.'"
I must admit that I am not up to date on these new anti-spam software packages, which operate on the server side. However, what is the probability of real mail getting rejected by these things? It seems almost like an invasion of privacy to block my own email, even if it is from a "benevolent big brother" perspective.
I guess that is why there are privacy policies though.
aj
-------
artlu.net
How exactly does the US (or any other first-world country) go about writing a code of law that puts Nigerian spammers in jail?
Except for making a new standard that's a requirement for doing business with federal agencies, just what do you think government's capable of doing regarding replacing protocols?
-PM
Maybe I'm missing something, but isn't human accuracy always going to be 100%? I tell the computer what is spam, and it learns. I may decide that, regardless of what it thinks, this last message is OK. So aside from clicking too fast or changing your mind (which is a common thing to do), how can a filter ever suggest it is better than people at deciding what people want to see?
Filtering tools work fairly well, but more importantly they work right now. Waiting for the authorities to "wake from their slumber" might take years, if it ever even happens.
In real world deploys of statistical filters, something like DSPAM's "global user" feature is necessary. The ability to begin with a relatively mature dictionary is critical to the user experience. Personally, DSPAM is filtering around 200 SPAMs per day for me, allowing one through every few days. It's 99.985% effective for me.
:w
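The point about starting from a mature dictionary can be sketched like this. It is a hypothetical illustration of the idea, not DSPAM's actual data model: a shared "global" set of token counts answers for a new user until that user has trained enough messages of their own. `TokenStats`, `token_spam_prob` and the 500-message cut-over are all made up for the example.

```python
from collections import Counter


class TokenStats:
    """Per-corpus token counts (hypothetical structure, not DSPAM's)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        unique = set(tokens)  # count each token once per message
        if is_spam:
            self.spam.update(unique)
            self.n_spam += 1
        else:
            self.ham.update(unique)
            self.n_ham += 1


def token_spam_prob(token, user, global_stats, min_user_msgs=500):
    """P(spam | token), falling back to the global dictionary for new users.

    min_user_msgs is an assumed cut-over point, not a real DSPAM setting.
    """
    mature = user.n_spam + user.n_ham >= min_user_msgs
    stats = user if mature else global_stats
    if stats.n_spam == 0 or stats.n_ham == 0:
        return 0.5  # no basis for a judgement yet
    s = stats.spam[token] / stats.n_spam  # rate of the token in spam
    h = stats.ham[token] / stats.n_ham    # rate of the token in ham
    if s + h == 0:
        return 0.5  # token never seen in either corpus
    return s / (s + h)
```

A brand-new user gets useful verdicts on day one from the global counts; once their own corpus is large enough, their personal counts take over.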
...hammer the spammer's ISP with complaints until the advertised website is DEAD, DEAD, DEAD.
STOP MISUSING APOSTROPHES, YOU MORONS!!!
People LOVE it.
There are some false positives and some false negatives.
But I have it set to delete anything 12+. That gets rid of the worst of the worst spam. So far, not a single complaint of any email being deleted.
Everything else has the subject re-written so people can run their own rule set against it.
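The delete/tag policy above can be sketched as a small triage function. The 12-point delete threshold is from the post; the 5.0 spam threshold is SpamAssassin's default `required_score`, assumed (not stated) to be what this site uses, and the subject tag format is illustrative. Putting the score in the tag is one way to let people "run their own rule set" against it.

```python
DELETE_AT = 12.0  # from the post: anything scoring 12+ is deleted
SPAM_AT = 5.0     # assumed: SpamAssassin's default required_score


def triage(score: float, subject: str):
    """Return (action, subject) for a message with the given spam score."""
    if score >= DELETE_AT:
        return "delete", subject
    if score >= SPAM_AT:
        # Tag rather than drop, including the score so users can apply
        # their own, stricter or looser, client-side rules.
        return "deliver", f"[SPAM {score:.1f}] {subject}"
    return "deliver", subject
```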
In the past 8 hours
1867 messages received
375 messages deleted
1266 messages flagged as spam
So, only a few hundred actual, good emails.
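The arithmetic behind "a few hundred" can be checked directly. This treats every unflagged message as legitimate, which ignores the false positives and negatives mentioned above.

```python
received = 1867
deleted = 375    # scored 12 or more, removed outright
flagged = 1266   # tagged as spam and delivered

legit = received - deleted - flagged
spam_share = (deleted + flagged) / received

print(legit)                # 226
print(f"{spam_share:.1%}")  # 87.9%
```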
Of course, that's only 4 hours during the regular work day (and 4 hours after work). But you can see the proportions. It saves people a TON of time.
And it makes them happier when they don't have to constantly dig through crap to see if any real messages have arrived.
Now, those spam messages are NOT distributed evenly. Our HR manager had her email address posted on the website. So she gets about 20-25% of the spam.
It's not exactly Big Brother 'cause no human sees the deleted spam.
Because you will always have one main 'obvious' address - be it something that goes on your business card, or something you tell to people you meet. For example, I use glen at glenmurphy.com.
Now all it takes is one slip (someone you know gets a virus, whatever) and your address is 'out there' for the taking. Your only possible recourse then is to stop using that address, but for some people that's just not an option, and it's just a bit defeatist to sit there surrendering email address after email address.
Not exactly fair.
Huh, since when did spammers start playing fair? This is about winning, not software political correctness.
Also, on the unbalanced dataset: I train my filter with spam corpora that reflect what I receive in my email. Many accounts receive 10 spams for every ham. The biggest thing I've had to retrain on is airline ticket receipts; SpamAssassin seems to think they are spam the first time I receive them, and according to the article, the testers had the same issue.
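One reason a 10:1 spam-to-ham corpus need not skew per-token probabilities: compare how often a token appears in each corpus (its rate), not its raw counts. Dividing by corpus size is the approach in Paul Graham's "A Plan for Spam"; the sketch below simplifies it (no ham bias or clamping) and is not SpamAssassin's own Bayes code.

```python
def raw_prob(spam_count, ham_count):
    # Ignores corpus sizes: biased toward spam when spam outnumbers ham.
    return spam_count / (spam_count + ham_count)


def normalized_prob(spam_count, ham_count, n_spam_msgs, n_ham_msgs):
    # Compare the rate at which the token appears in each corpus.
    s = spam_count / n_spam_msgs
    h = ham_count / n_ham_msgs
    return s / (s + h)


# A token seen in 10% of 1000 spams and 10% of 100 hams is neutral,
# but raw counts would call it about 91% spammy.
print(raw_prob(100, 10))
print(normalized_prob(100, 10, 1000, 100))
```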
Because you don't put it into weird text boxes, you don't use newsgroups, you don't have any enemies, you don't have any domains, and you don't have it in plaintext on your website.
I do all 4. I get my share of spam. It's not a HUGE deal, but it made it worth my while to get a spam filter.
Hey freaks: now you're ju
Content-based spam filtering is a waste of time. [...] It's a never-ending battle of updating filters and formulas.
I update my SpamAssassin config file once a year or so. This hardly seems burdensome. And generally my updates have to do with which RBLs it uses for assigning point values. Other than that, I use the defaults plus the Bayesian filter.
Since the filter self-trains based in part on the RBL scores, it autoadjusts to new spam. And if you have spamtrap addresses, you can feed those back in, too.
My setup is well over 99% accurate, with no false positives in months.
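The self-training described above can be sketched as a score-based auto-learn rule. The thresholds are SpamAssassin's documented defaults (`bayes_auto_learn_threshold_nonspam` 0.1, `bayes_auto_learn_threshold_spam` 12.0); the real implementation applies extra conditions that are not modelled here, and `auto_learn_label` is a hypothetical name.

```python
HAM_BELOW = 0.1  # bayes_auto_learn_threshold_nonspam default
SPAM_ABOVE = 12.0  # bayes_auto_learn_threshold_spam default


def auto_learn_label(score, is_spamtrap=False):
    """Return 'spam', 'ham', or None (don't learn) for a scored message."""
    if is_spamtrap:
        return "spam"  # mail to a trap address is spam by definition
    if score > SPAM_ABOVE:
        return "spam"
    if score < HAM_BELOW:
        return "ham"
    return None  # uncertain middle ground: leave it out of training
```

Only confident verdicts feed the Bayes database, which is what lets it track new spam without manual retraining.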
RBLs WORK.
Yes, and I use those, too. Some I use for outright rejection of connections, and some count toward the spamminess score. As soon as they get the URL-based RBLs working, I'll use those, too. Why wouldn't you use all the tools at your disposal?
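Using some RBLs for outright rejection and others as score inputs might look like the sketch below. The query convention (reverse the IPv4 octets, append the zone, look up an A record) is the standard DNSBL scheme; the zone names and the 1.5-point weight are hypothetical placeholders.

```python
import ipaddress
import socket

# Hypothetical zones: substitute the DNSBLs you actually use.
REJECT_ZONES = {"reject.dnsbl.example"}
SCORE_ZONES = {"score.dnsbl.example": 1.5}


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """True if the zone returns an A record for this IP (needs network)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN (not listed) or lookup failure: treat as not listed.
        return False


def rbl_policy(listed_zones):
    """Return ('reject', 0.0), or ('accept', extra_score) for scoring zones."""
    if listed_zones & REJECT_ZONES:
        return "reject", 0.0
    return "accept", sum(SCORE_ZONES.get(z, 0.0) for z in listed_zones)
```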
Better to do spam filtering with your MTA/MDA anyway, if possible. That way, the same filter is used no matter which email client you use from which computer. Plus, it means you don't have to download spams to your MUA when on a slow connection.
Now if only I could get the rest of my mail configuration to be shared between evolution, mutt, and squirrelmail.
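Filtering at delivery time means every client sees the same result. As a minimal sketch, a delivery agent can file mail by the `X-Spam-Flag: YES` header that SpamAssassin adds to messages it classifies as spam. The folder names are assumptions, and real setups usually do this in procmail, maildrop or Sieve rather than Python.

```python
from email import message_from_bytes


def choose_folder(raw: bytes) -> str:
    """Pick a delivery folder from the header added by a server-side filter."""
    msg = message_from_bytes(raw)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Junk" if flag == "YES" else "INBOX"
```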
I've been getting spams lately that seem to be trying to get around the highly effective statistical solutions, such as SpamAssassin, that have been implemented. Spammers seem to be adding random, or possibly even carefully selected, dictionary words to skew their statistical rating. Here is an example from the several I've received lately. Has anyone seen information about this on /. or elsewhere?
[spammer's irritating message snipped]
Thu, 17 Jun 2004 19:42:34 -0500
No Thanks
beatify
sacred atom drank deprecate cathodic thermionic sherman delinquent hanley swum wooster asteroidal bilayer haiti saudi wink bijective reserpine baronial gloss ambrose threadbare chianti predatory earmark bilingual angora palazzi chartres alveolar phosphate civet radish barricade diem laurie minutem! en crusty
camilla jade lineman bendix masonic dublin incontrovertible defecate generous buddhist yesterday endow bitten conley trunk pitchfork beret bloat gelatine dovetail gambia medea niggardly blackburn suey dialogue ilyushin anastigmatic berth abort bodied contractor of ridden embarcadero corset trademark
ID: W993gt72
carnation
constructor maltese bantam airfield pique douglas pungent criterion cloudburst illiterate sausage career stile pebble bonnie shim carbonium
magnesite pembroke abrade jogging dynast physiochemical stochastic sumac conference obtain villain midwinter incompetent eradicable madhouse airline antony household cursory instinctual gratuitous clown shaven des cornflower
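Why random filler like this often does less damage than spammers hope, under a Graham-style scheme (one common statistical approach, not necessarily SpamAssassin's): never-seen words get a near-neutral 0.4, and only the 15 tokens furthest from 0.5 are combined, so a few strongly spammy tokens dominate. Filler words that are common in your own ham would pull the other way, which is presumably the goal. The token probabilities in the usage example are made up.

```python
from math import prod

UNKNOWN = 0.4    # Graham's probability for never-seen tokens
MAX_TOKENS = 15  # only the most "interesting" tokens are combined


def combined_spam_prob(tokens, probs):
    """Graham-style combination of per-token spam probabilities."""
    ps = [probs.get(t, UNKNOWN) for t in set(tokens)]
    # Most interesting = furthest from a neutral 0.5.
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = ps[:MAX_TOKENS]
    s = prod(top)
    h = prod(1 - p for p in top)
    return s / (s + h)


# Hypothetical trained probabilities; the filler comes from the sample above.
probs = {"viagra": 0.99, "refinance": 0.99, "unsubscribe": 0.95,
         "meeting": 0.05, "agenda": 0.05}
filler = ("sacred atom drank deprecate cathodic thermionic "
          "sherman delinquent hanley swum wooster asteroidal").split()
spam_score = combined_spam_prob(["viagra", "refinance", "unsubscribe"] + filler, probs)
ham_score = combined_spam_prob(["meeting", "agenda"], probs)
```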
just a different button...
;-) There is more to life than hitting the delete key.
I assume you are not referring to the delete key.
The truth shall set you free!
Wait till the spammers decide to spam your whole domain.
That's exactly when I decided to disable the "catch-all" and allow only specific addresses. Some spammer sent several hundred identical messages, in a few hours, to made-up names at my domain.
Catch-all is no longer a good idea in my opinion...
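Dropping the catch-all amounts to validating recipients at SMTP RCPT time, so mail to made-up names is refused before the message body is ever accepted. A minimal sketch, with a hypothetical domain and address list; the reply codes are standard SMTP (250/550) with enhanced status codes 2.1.5 (destination valid) and 5.1.1 (bad mailbox).

```python
DOMAIN = "example.com"  # hypothetical
VALID_LOCAL_PARTS = {"alice", "postmaster", "abuse"}  # hypothetical


def rcpt_reply(address: str) -> str:
    """SMTP reply for RCPT TO: accept only known mailboxes, no catch-all."""
    local, _, domain = address.rpartition("@")
    # Local parts are compared case-insensitively, as most servers do.
    if domain.lower() != DOMAIN or local.lower() not in VALID_LOCAL_PARTS:
        return "550 5.1.1 No such user here"
    return "250 2.1.5 OK"
```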