DSPAM v3.2 Released
Nuclear Elephant writes "After four months of development DSPAM v3.2 has been released, bringing many new enhancements and filtering technologies. These include distributed computing support, implementation of Bill Yerazunis' Sparse Binary Polynomial Hashing algorithm (from CRM114), and v1.2 of Bayesian Noise Reduction. Other enhancements include SQLite support and many significant performance enhancements for PostgreSQL. DSPAM's official release is next week, but you can download the preview release now. Users of the project have also contributed towards creating a new logo for this release."
I am using DSPAM on a qmail/vpopmail server and I find that it's great in terms of accuracy. Most of my users have never had a false positive, and many haven't seen a spam after a couple of weeks of training.
The problem that I have with DSPAM is the integration side. I'm not sure how it goes with other mail systems, but integrating it with vpopmail was a major pain. It seems easy, you just put the command in the dotfiles, but in practice getting it to work was quite a trial. Even now it doesn't integrate properly with the web administration, etc., despite some scripting and minor code changes.
Because of this I've been thinking of switching to SpamAssassin simply because of its integration with qmail-scanner. Has anyone else had similar problems or been in a similar situation and found a good solution?
I would have thought that running two Bayesian filters would cause more trouble than good. The first filter would be OK, as it would be trained as usual.
The second filter would probably have problems because it would only see a small subset of all your mail as the first filter would have removed most of the spam. The second filter's sample would therefore be skewed and it would have far less data to accurately classify spam.
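To put rough numbers on that skew, here is a small sketch. The mail volumes and the 99% catch rate are made-up figures for illustration, and it assumes the first filter passes all legitimate mail through untouched.

```python
def second_filter_sample(spam, ham, first_catch_rate):
    """Return (spam_seen, ham_seen) for a filter sitting behind another one.

    Assumption for illustration: the first filter removes
    `first_catch_rate` of the spam and lets every legitimate mail through.
    """
    caught = round(spam * first_catch_rate)
    return spam - caught, ham


spam_seen, ham_seen = second_filter_sample(spam=1000, ham=40, first_catch_rate=0.99)
print(spam_seen, ham_seen)  # 10 40
```

Under those assumptions the second filter trains on 10 spams instead of 1000, and its spam-to-legitimate ratio flips from 25:1 to 1:4.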
Just my thoughts on the subject anyway...
I'm running Postfix with RBLs, looking at SpamCop, Spamhaus, and SORBS. It auto-rejects all e-mail coming from banned IPs. This brings me down to 1 spam a day. If your IP is blocked, tough, find a new ISP (these lists tend to be more self-expiring and not 'permanent ban' types, which is good).
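For anyone wondering what an RBL check actually does on the wire: the client reverses the octets of the connecting IPv4 address, appends the list's DNS zone, and does an ordinary A-record lookup. An answer means "listed". A minimal sketch follows; the zone `dnsbl.example.org` is a placeholder and not a real list, and `is_listed` is a hypothetical helper that needs working DNS.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name a DNSBL client looks up for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers with an A record for this IP.

    Needs working DNS; a failed lookup is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Placeholder zone name, used only to show the query format.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

The mail server does this lookup at connection time, which is why the rejection happens before the message body is ever accepted.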
What I dislike is centralized antispam. What is spam for me might not be spam for you. I was using the antispam filter in Thunderbird, but at least in previous versions it was not good, so I switched to K9 ( http://keir.net/k9.html ). Is there nothing like K9 around for Linux? K.
A few months ago those features were available, too. While dspam is great at filtering mail, I faced two crucial problems, which forced me back to SpamAssassin. I haven't heard that they fixed any of those:
- The database grew huge. When my single-user server with 128 MB had to use a 512 MB spam token database, performance was terrible. Even with the tools included I could not do anything to fix the issue.
- dspam knows only yes or no; there is no usable value that gives you some grey information. As a result, I had to check all those spam postings for false positives. SpamAssassin, on the other hand, has that spam result 0..10, so I can check 0..4, where 0 is OK (few false negatives) and 1..4 is spam (few false positives), and I can directly delete thousands of mails in 5..10 without looking at them.
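The score-based triage described here can be sketched in a few lines. The function name `triage` and the cut-offs are illustrative, taken from the 0 / 1..4 / 5..10 split in the post; they are not SpamAssassin defaults or part of its API.

```python
def triage(score, review_from=1.0, discard_from=5.0):
    """Sort a message into a bucket by its numeric spam score.

    Cut-offs mirror the poster's split and are assumptions, not defaults:
    below 1 goes to the inbox, 1 up to 5 gets a manual look, 5+ is dropped.
    """
    if score >= discard_from:
        return "discard"
    if score >= review_from:
        return "review"
    return "inbox"


scores = [0.2, 1.5, 4.9, 5.0, 9.7]
print([triage(s) for s in scores])
# ['inbox', 'review', 'review', 'discard', 'discard']
```

With a plain yes/no verdict there is no middle bucket, so everything flagged as spam ends up in "review".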
I won't go back to dspam unless someone can offer specific help for those issues. I believe everyone will face them sooner or later.
The DSPAM site mentioned that it can be compiled on Mac OS X, but what about Winblows? I only have one box (go ahead and laugh) and it is an older Pentium III Winblows machine. I'd like to have a separate box to act as a mail server but it just isn't currently feasible (translation: I'm broke.) Is there any way they can compile DSPAM for Win9X?
Even false positives are not important. If I get 1000 spams a day but only 40 legitimate mails, then marking everything as spam is 96% correct. If 35 of my mails are easily identified as legitimate (a procmail rule could do it: a closed and filtered mailing list), then marking those as good and everything else as bad is 99.5% correct. Note that all 5 personal mails I would get are still marked as spam.
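The arithmetic behind those percentages, as a quick check. The counts are the ones from the post; the helper name `accuracy` is just for this sketch.

```python
def accuracy(correct, total):
    """Percentage of messages classified correctly."""
    return 100.0 * correct / total


spam, ham, list_mail = 1000, 40, 35
total = spam + ham
personal = ham - list_mail

# Rule 1: mark everything as spam -> every spam right, every real mail wrong.
print(round(accuracy(spam, total), 1))              # 96.2
# Rule 2: pass the 35 list mails, mark the rest as spam -> 5 real mails wrong.
print(round(accuracy(spam + list_mail, total), 1))  # 99.5
# Share of personal mail that actually gets delivered under rule 2.
print(accuracy(0, personal))                        # 0.0
```

So a filter can score 99.5% overall while losing 100% of the mail that matters, which is why the overall number says little on its own.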
The big question for me is: how many mails do I need to check for false detection? And here is the dspam issue: it doesn't give you a grey marking, so you either check none of the mails marked as spam and could possibly lose many important mails, or you need to check all of the spam messages, which lowers the advantage of a spam filter.
I get a huge volume of spam, some mailing-list traffic, and a few other postings. If I get one Word document from some Windows-using relative every few weeks, it still must not be marked as spam. With dspam I had to check the mails marked as spam to find such false positives. Because I had to check all spam mails, dspam was useless. With SpamAssassin I only check level 1..5 and can ignore thousands of mails in spam level 5..10, which gives me a very good middle way between the very unlikely case of overlooking a false positive and thus losing a mail (hasn't happened as far as I know) and looking at only a few spam mails for false positives.
No. Since spammers mostly use fake addresses, it's pretty pointless trying to send mail back to them. All that would achieve would be that you would receive all the bounces back and you'd get double the junk mail.
Netblock blacklisting is a really poor solution.
It is the only solution when the ISP will do nothing to stop the spammer on their network.
In some cases a single spammer causes a /24 and then a /16 to be blocked.
That is rather difficult without the ISP's assistance (or them repeatedly ignoring the complaints).
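For a sense of scale, here is how many addresses each of those block sizes covers. The two prefixes are reserved documentation/benchmark ranges picked only as examples, and `block_size` is a helper name made up for this sketch.

```python
import ipaddress


def block_size(prefix):
    """Number of addresses covered by a CIDR prefix."""
    return ipaddress.ip_network(prefix).num_addresses


# Example prefixes from reserved ranges, not real blacklist entries.
for prefix in ("192.0.2.0/24", "198.18.0.0/16"):
    print(prefix, block_size(prefix))
# 192.0.2.0/24 256
# 198.18.0.0/16 65536
```

Escalating from a /24 to a /16 widens the block by a factor of 256.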
Btw, do you understand that changing ISP may not be an option?
Sometimes that is true. In which case, you should get on the phone and make sure that your ISP understands that they have customers who will be upset if the ISP doesn't handle its spammer problem.
Those lists, by themselves, do not block any email at all. Those lists are used by people who are fed up with trying to get ISPs to deal with their spammers.
Having users sort their mail and train a statistical filter from scratch is just way too much to ask - you'll get inundated with support calls and executives just don't have time to sort out the crud - they hired YOU to do it - passing the buck back to them ain't gonna fly...
The system should get rid of 99.9% of the crud by default, then let the users who feel like doing it report the remaining 0.1% to a central mailbox where you can sort it and retrain the statistical filter if necessary.
Oh well, what the hell...