DSPAM v3.0 RC1 Spam Filter Released
Nuclear Elephant writes "DSPAM v3.0 RC1 is now available for download, with a stable release scheduled for June 13. DSPAM has appeared on Slashdot and in Wired News in the past for its high levels of accurate spam filtering. v3.0 is the product of three solid months of work. Some of the highlights include a very sleek redesigned interface, PostgreSQL support, many mathematical enhancements, and support for many of Gary Robinson's algorithms (such as Chi-Square, Geometric Mean Test, and Robinson's technique for combining P-Values)."
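For readers who have not met the chi-square combining mentioned above, here is a minimal Python sketch of the general idea as Gary Robinson described it: per-token spam probabilities are combined with Fisher's method into a single score. This is not DSPAM's code. The function names (`chi2q`, `combine`) and the clamping constant are assumptions made for illustration.

```python
import math

def chi2q(x2, dof):
    """Upper-tail probability of a chi-square value; dof must be even."""
    assert dof > 0 and dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Clamp so log() never sees 0 (the bound is an arbitrary choice).
    probs = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probs]
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam - ham + 1.0) / 2.0
```

A score near 1 means the tokens agree the message is spam, near 0 that it is ham, and near 0.5 that the evidence is weak or conflicting.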
I am using this filter, and after some training it is very effective. Especially useful is the inoculation feature: you can register a spam-only address with spam-sending sites so that the filter trains faster.
My heart is pure, but make no mistake, it's pure evil
Been looking for a new spam filter; hope this one does the trick. I tend to have a lot of false positives with most spam filters I have tried. I would rather have a few spam slip through than have to weed through all my spam just because the filter may have blocked a real email.
A Fatal OE Exception has occurred, Sig will now reboot.
The spam filter in my copy of t-bird (0.6) seems to suck more and more lately (perhaps it's just that the spammers are getting better at bypassing the filters). I just switched to server-side spam filtering (just adding a tag to the subject), and then I key off that in t-bird.
The Doormat
If you're not outraged, then you're not paying attention.
Warning: it seems to be designed more for high-volume use than individual sites. I've fed DSPAM almost 3000 spams and it is still only catching 80%; it does seem to be getting better, though.
The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
Well, using Gentoo Linux and evolution you don't really need to do too much in the way of configuring... I just emerged the package and added a piped filter rule to evolution. Unfortunately, it didn't seem very usable to me... no easy way to train it from within evolution, and it was taking like one to three seconds per message to process, which is kind of frustrating when your account tends to receive 80+ spams a day. (I know, that's still fairly minor, but that gives me like a 100:1 spam to real mail ratio.)
(\(\
(^v^)
(")")
This is the cute vorpal bunny virus, copy to your sig or runaway, runaway in fear!
While I agree with you that it does lack integration with Evolution (I have a similar setup to yours), regarding the time it takes to process each message: you can add '--local' (or '-L') to your SA command line and it will speed things up considerably, since it skips the network tests. As for training it, I set up a cron job to have it read and learn from my Spam and Inbox nightly. Not the most elegant solution, but it works okay.
Don't call me a cowboy, and don't tell me to slow down!
Spam filters compared, here. This article was linked from Slashdot a few months ago. Good info, too.
As far as I know, the main difference is DSPAM does not use weighted filter rules at all like SpamAssassin's hybrid approach does - DSPAM is designed to purely rely on analysis of spam's properties (Bayesian, etc).
The other cool thing about DSPAM is that it is designed to let users add to and modify their own spam database. Every email DSPAM processes is tagged with an identifier and logged in a server-side database. If a delivered email is in fact spam but wasn't tagged as such, the user can forward the email to the designated spam-sorting address, and DSPAM will automatically update that user's spam corpus (i.e., because it's tagged with an identifier, you don't have to worry about the user forwarding the full headers, as the server already has that info on file).
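A rough Python sketch of that identifier scheme, under stated assumptions: the class name `SignatureStore`, the whitespace tokenizer, and the `!DSPAM:<hex>!` marker pattern are illustrative only and are not taken from DSPAM's source. The point is just that the server keeps the tokens on file under the ID, so a forwarded copy only needs to carry the ID.

```python
import re
from collections import Counter

# Assumed marker format; the real one depends on the DSPAM build and options.
SIG_RE = re.compile(r"!DSPAM:([0-9a-f]+)!")

class SignatureStore:
    def __init__(self):
        self.signatures = {}   # signature id -> (tokens, class as delivered)
        self.spam = Counter()  # token -> times seen in spam
        self.ham = Counter()   # token -> times seen in legitimate mail
        self._next = 0

    def deliver(self, body, is_spam):
        """Record the tokens under a fresh id and return the tagged body."""
        tokens = sorted(set(body.lower().split()))
        self._next += 1
        sig = format(self._next, "08x")
        self.signatures[sig] = (tokens, is_spam)
        (self.spam if is_spam else self.ham).update(tokens)
        return f"{body}\n\n!DSPAM:{sig}!"

    def retrain(self, forwarded_text):
        """Flip the stored classification of a forwarded message.

        Returns False when no known signature is found in the text.
        """
        match = SIG_RE.search(forwarded_text)
        if not match or match.group(1) not in self.signatures:
            return False
        sig = match.group(1)
        tokens, was_spam = self.signatures[sig]
        old, new = (self.spam, self.ham) if was_spam else (self.ham, self.spam)
        old.subtract(tokens)
        new.update(tokens)
        self.signatures[sig] = (tokens, not was_spam)
        return True
```

Because the lookup goes through the ID, it does not matter that the forwarding client quoted the body or dropped the original headers.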
AFAIK you can't do that with SpamAssassin.
An excellent spam filter for Windows is K9 found here.
if you know how to bounce, you can --enable-signature-headers
You can configure DSPAM to not use the ID, but this requires users to "bounce" the incorrect e-mails instead of forwarding them (as forwarding strips the headers).
Is the ID really that inconvenient?
Easy is a relative term, but I think it's safe to say that if you found SpamAssassin a hassle, you will not have an easy time with DSPAM.
Like most good server-side software, it requires a moderately good understanding of its general operation and at least a passing familiarity with its command line arguments and such. Having a handle on how to make your MTA do whatever you want, and the willingness to do some reading of FAQs, mailing lists, etc. doesn't hurt either.
In short, it does take some mucking around to tweak it all out properly. Also of note: if you intend to use the CGI pictured in the screenshots, you should know something about setting up a webserver with proper exec privileges for CGI.
If you're thinking about using it only for yourself, I would recommend a client-side solution like Mozilla Thunderbird or Eudora (win32 only) instead. They both have Bayesian spam filtering built in and they're *really* easy to set up.
No sig.
How does DSPAM compare to other OSS projects like SpamAssassin?
In short:
I am currently running an older version of DSPAM, which I switched to after the last time it hit /. I had been using SpamAssassin for years, and lately my SA false negatives had been creeping up, to the point where I could expect to see 3-10 spam a day in my inbox.
With DSPAM, my false negatives have dropped to a trickle - something like 5 messages in the last month. My false positives are a bit higher; it tends to trigger more easily on various kinds of mass email - Daily Shark, alumni association events, Amazon.com email, DOD briefing transcripts. At the moment, that's less of a burden than the high false negatives were with SA.
I had more trouble wedging DSPAM into my configuration, but that's because I didn't want to do it DSPAM's way (e.g., signatures in message body, forward email to an address when it is a false result, web interface for management). I basically want it to update the message headers, then let procmail/maildrop filter accordingly, and if it's a false pos/neg I want to just drop it into an IMAP folder which is emptied via the "learn from this mistake" program on a regular basis. YMMV but I think fitting into the mail pipeline is something DSPAM could do better.
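To make that pipeline concrete, here is a small Python sketch of the two steps: route on a result header, then periodically drain a corrections folder through a learning hook. The header name `X-DSPAM-Result`, the folder names, and the `learn` callback are assumptions for illustration, not a description of how DSPAM or procmail must be configured.

```python
from email import message_from_string

RESULT_HEADER = "X-DSPAM-Result"  # assumed header name

def pick_folder(raw_message):
    """Choose a delivery folder from the filter's verdict header."""
    msg = message_from_string(raw_message)
    verdict = (msg.get(RESULT_HEADER) or "").strip().lower()
    return "Spam" if verdict == "spam" else "INBOX"

def drain(folder, learn):
    """Feed every message in a corrections folder to `learn`, emptying it.

    `folder` is a plain list of raw messages standing in for an IMAP folder;
    `learn` stands in for whatever "learn from this mistake" program is used.
    Returns the number of messages processed.
    """
    count = 0
    while folder:
        learn(folder.pop(0))
        count += 1
    return count
```

A message with no verdict header falls through to INBOX, which errs on the side of not hiding real mail.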
I trained off my existing corpus - e.g., let my SA-generated spam folder build up a bit, removed any false positives, removed SA markups, and ran that into DSPAM as spam corpus; did the same with all the normal mail that came in over a week or so, THEN switched. I've also set my wife up without as much training, and it took DSPAM longer to learn what was spam for her and what wasn't. So I think training it up beforehand with a corpus is a good idea.
Overall, it was worth it to switch, and if I was good about upgrading to the newest I'd hopefully see my false positive rate drop.
Just my .02.
You can now set DSPAM to add headers with signatures etc instead of a tag in the body.
The only thing to note is that users forwarding mail back to DSPAM for training must include the X-DSPAM headers. Apparently, some email clients do not do this by default.
No sig.
Otherwise your weights will be all wrong.
Equal parts ham and spam will yield good spam catching. RTFAQ.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
I've been using DSPAM for about three months. A few criticisms:
First, by default DSPAM wants to run as the "root" user and usurp delivery of e-mails. (With Exim, they actually want it to recursively reinvoke the mail server for actual delivery!) It took quite a bit of configuring to get it to work like SpamAssassin from procmail.
This software is somewhat buggy, so running DSPAM as root would also introduce security concerns. For example, I'm using 2.10.6 because 3.0.0 compiled and installed with no problems but failed to classify anything. (Even with several hours of gdb tracing I was unable to determine why.) Another bug is that if I run "--falsepositive" on an e-mail that's lacking the "!DSPAM" signatures, the message should be ignored, but apparently this is not the case because the statistics counters are incremented.
From the FAQ:
"Q. Does DSPAM support whitelists?
A. DSPAM doesn't have a whitelist manager, rather whitelisting is an automatic function of DSPAM's Bayesian filtering mechanism."
This is crazy -- the whole point of whitelists is for when the Bayesian filtering fails! And DSPAM does fail. Twice now I've had to reset my database because the classifications were wrong and training wasn't helping. All I can say is I'm glad I've got procmail to rescue the important e-mails.
I think one source of my problems was that the default training mode ("train on everything") causes incorrect learning when you fail to report a false positive. This was a big problem for me, since I get around 700-800 spams/day. While false negatives are easily caught, the false positives go unnoticed unless I happen to wonder why someone never responded, and invest some time to search my spam folders. (I'm still trying to figure out exactly how to deal with this problem. E.g. maybe I could have it challenge the sender with a Turing test or something.)
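A toy Python sketch of why unreported false positives hurt under "train on everything": each misfiled legitimate message gets learned as spam, so tokens that only ever appeared in real mail drift toward looking spammy. The simple count ratio and both function names are assumptions for illustration; DSPAM's actual token statistics are more involved.

```python
def token_spam_prob(spam_count, ham_count):
    """Naive spam probability of a token from raw counts."""
    total = spam_count + ham_count
    return 0.5 if total == 0 else spam_count / total

def after_unreported_false_positives(ham_count, n, train_on_everything):
    """Spam probability of a token seen ham_count times in real mail, after
    n legitimate messages containing it were misfiled as spam and never
    corrected by the user."""
    # Train-on-everything learns each message as whatever it was classified;
    # a mode that trains only on reported errors leaves the counts untouched.
    spam_count = n if train_on_everything else 0
    return token_spam_prob(spam_count, ham_count)
```

With 10 legitimate sightings and 10 unnoticed misfilings, the token moves from 0.0 to 0.5 under train-on-everything and stays at 0.0 otherwise.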
I will say that DSPAM's basic technology is quite good. It's just that the software still has a "prototype" feel, and I'd caution you to do some experiments before unleashing it on your users. (For example, there's no manpage, and there isn't even a command-line option to print out the current version number!)
-Gonz
I ^H^H a guy I know used to retaliate, stopped for a while when the spammers built up their defenses, and then tried it again last week against some spams which started leaking thru his filters.
They are wide open again, brothers, because apparently no one else is DoSing them anymore either and they have let down their guard.
I would guess that they lost money when they overprotected their forms against that type of "response," which made too many legit buyers say fuck it instead of filling out some bossy form.