DSPAM v3.0 RC1 Spam Filter Released
Nuclear Elephant writes "DSPAM v3.0 RC1 is now available for download, with a stable release scheduled for June 13. DSPAM has appeared on Slashdot and in Wired News in the past for its highly accurate spam filtering. v3.0 is the product of three solid months of work. Highlights include a sleek redesigned interface, PostgreSQL support, many mathematical enhancements, and support for many of Gary Robinson's algorithms (such as Chi-Square, the Geometric Mean Test, and Robinson's technique for combining p-values)."
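For readers unfamiliar with Robinson's chi-square technique: each token in a message gets a spam probability p, and Fisher's method turns -2 Σ ln p over n tokens into a chi-square statistic with 2n degrees of freedom. This is done once in the "hammy" direction and once in the "spammy" direction, and the two results are merged into one score. The sketch below follows the form popularized by the SpamBayes project. It is not taken from DSPAM's source, the function names are hypothetical, and it assumes the per-token probabilities have already been computed.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    """P(X >= x2) for a chi-square variable X with an even number of degrees of freedom."""
    if dof % 2:
        raise ValueError("this closed form needs even degrees of freedom")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Sum of e^-m * m^i / i! for i = 0 .. dof/2 - 1
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_square_combine(probs: list[float], eps: float = 1e-6) -> float:
    """Combine per-token spam probabilities into a single score in [0, 1].

    Scores near 1 mean spam, near 0 mean ham, and near 0.5 mean the
    evidence is weak or contradictory.
    """
    if not probs:
        return 0.5
    # Clamp so log() never sees exactly 0
    ps = [min(max(p, eps), 1.0 - eps) for p in probs]
    dof = 2 * len(ps)
    # Close to 1 when many tokens have p near 0 (ham evidence)
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), dof)
    # Close to 1 when many tokens have p near 1 (spam evidence)
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), dof)
    return (spam - ham + 1.0) / 2.0
```

A message whose tokens all score 0.99 comes out close to 1.0, while equal and opposite evidence (say 0.9 and 0.1) lands at 0.5. That middle ground is what lets chi-square combining report "unsure" instead of forcing a verdict.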
I'm all for throwing technology at the problem, but I hope people still realise that having a complex (and effective) spam filter does nothing about the millions of megabits of traffic wasted on UCE (unsolicited commercial email) while it's in transit.
When you run your own mail server, or administer a mail server for a large number of people, server-side anti-spam filters and countermeasures start making a lot more sense. Do the math for a company with 100 employees (at $25/hr) who check mail twice a day and spend 5 minutes each time hassling with anti-spam measures in client-side mail apps. That is 100 × 2 × 5 = 1,000 minutes, or about 16.7 hours of staff time per day, so a seamless anti-spam solution is conservatively worth $400 per day, or about $100k per year over 250 working days, not counting bandwidth savings. There are definitely cases when client-side filtering makes sense, but if you can handle it at the server, email-based business methods scale better.
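The estimate in the post can be checked with a few lines of arithmetic. The employee count, wage, and minutes are the post's own figures; the 250 working days per year is an assumption chosen to match its $100k total, and `daily_cost` is a hypothetical helper.

```python
WORKDAYS_PER_YEAR = 250  # assumption: roughly 50 weeks x 5 days

def daily_cost(employees: int, hourly_wage: float,
               checks_per_day: int, minutes_per_check: float) -> float:
    """Dollar cost per day of staff time spent handling spam by hand."""
    staff_minutes = employees * checks_per_day * minutes_per_check
    return staff_minutes / 60.0 * hourly_wage

per_day = daily_cost(employees=100, hourly_wage=25.0,
                     checks_per_day=2, minutes_per_check=5)
per_year = per_day * WORKDAYS_PER_YEAR
print(f"${per_day:,.2f} per day, ${per_year:,.2f} per year")
# prints "$416.67 per day, $104,166.67 per year"
```

Rounding down to $400 and $100k, as the post does, keeps the estimate conservative.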
http://tinyurl.com/4ny52
I have not actually used DSPAM, but have just read the specs.
Yawn. Yet another content-based filter, albeit a well-designed one. While content-based filters are a valuable tool, let's not forget that the spam problem is one of antisocial behavior and consent and has nothing to do with content. Using content as a factor in deciding what is or is not spam will always be flawed. Even if you tweak your favorite filter from 99% to 99.9%, the spammers can just up the ante by sending more. Scaling up costs them little on an individual basis. It saddens me to see really brilliant people put so much work into a project whose underlying premise is flawed.
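The "just send more" argument is simple arithmetic: the expected number of spams that get through is the volume sent times the miss rate (1 - catch rate). The sketch below uses illustrative round numbers and assumes the catch rate does not change with volume; the helper names are hypothetical.

```python
def delivered(sent: float, catch_rate: float) -> float:
    """Expected number of spams that slip past a filter with the given catch rate."""
    return sent * (1.0 - catch_rate)

def volume_for(target_delivered: float, catch_rate: float) -> float:
    """How many spams must be sent so that `target_delivered` still get through."""
    return target_delivered / (1.0 - catch_rate)

for rate in (0.99, 0.999):
    print(rate, round(delivered(1_000_000, rate)), round(volume_for(10_000, rate)))
```

A 99% filter lets 10,000 of a million messages through; at 99.9% only 1,000 get through, and the spammer needs ten times the volume (ten million messages) to land the same 10,000.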
With all the time spent on making spam filters, why don't we spend that time working out a new protocol for email transfers, one that could not be spoofed,
Because there's nothing wrong with SMTP. SMTP already has extensions that allow authentication, but it still requires a central authority to say "He is Senior Frac, we verify it." No one would trust such an authority even if it were scalable enough. If you think spam is caused by a lack of authentication, you're sadly misinformed. The cause is a lack of responsibility on the part of the sending networks to enforce proper behavior by their users.
or spend that time installing server-side programs that put a small time delay between messages as well as bandwidth restrictions on all outgoing mail?
These technologies exist. Unfortunately, most of those who install them stop monitoring them. That work is seen as a drain on resources the ISP would much rather spend on signing up new customers. Bandwidth restrictions on a customer who runs their own MTA make things much more complex and much less scalable.
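The "small time delay between messages" idea is usually implemented as a rate limiter. Below is a minimal per-sender token-bucket sketch; the `OutboundThrottle` class and its parameters are hypothetical and do not correspond to any particular MTA's configuration or API.

```python
from dataclasses import dataclass, field

@dataclass
class OutboundThrottle:
    """Per-sender token bucket for outgoing mail (hypothetical, not any MTA's API).

    Each sender may send a burst of up to `capacity` messages; after that,
    sending is limited to `rate_per_sec` messages per second on average.
    """
    capacity: float
    rate_per_sec: float
    # sender -> (tokens remaining, timestamp of last update)
    _buckets: dict[str, tuple[float, float]] = field(
        default_factory=dict, init=False, repr=False)

    def allow(self, sender: str, now: float) -> bool:
        tokens, last = self._buckets.get(sender, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond the burst size
        tokens = min(self.capacity, tokens + (now - last) * self.rate_per_sec)
        if tokens >= 1.0:
            self._buckets[sender] = (tokens - 1.0, now)
            return True
        self._buckets[sender] = (tokens, now)
        return False
```

A caller would pass something like `time.monotonic()` as `now`; taking the clock as a parameter keeps the logic testable. Refused messages could be queued and retried later rather than bounced, which produces the delay the poster describes without losing legitimate mail.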
I wanted to try DSPAM some time ago, but I stopped as soon as I read that DSPAM puts an ID string in every mail it processes. In the mail body, that is. I have no problems with a program that adds headers, but it should leave the message body alone.
Does DSPAM do that now? Can't find anything about it...
I'm the one running the spam filter (SpamAssassin) at work. Overall, it has been VERY popular with everyone else. They don't receive the most obnoxious sex spams any more.
On the other hand, there are a few false positives that reduce the savings estimated in your post. I auto-delete anything above 10 and flag anything above 5.
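The two-threshold policy described above can be written as a small routing function. This is a hypothetical sketch: it assumes the SpamAssassin score has already been extracted from the message, and it reads "above" as strictly greater than.

```python
def route_by_score(score: float, flag_at: float = 5.0, delete_at: float = 10.0) -> str:
    """Two-threshold policy: delete obvious spam, flag the gray zone, deliver the rest."""
    if score > delete_at:
        return "delete"   # confident spam: never shown to the user
    if score > flag_at:
        return "flag"     # gray zone: user reviews for false positives
    return "inbox"        # delivered normally; missed spam is reported by hand
```

Only the "flag" band costs users review time, so widening or narrowing the gap between the two thresholds trades false positives against manual effort.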
But the end users still have to look through the flagged stuff to see if there are any false positives. Then they drop them into the false positive folder. The users also have to identify all the missed spam and drop that into the spam folder.
It's still work for them, so the savings aren't as clear-cut as your post suggests. But the intangible benefits are also important.
I think we're at the point of diminishing returns on simple scanning processes. I think we need to look at actively seeding spammers' lists with false names and tuning the spam filters with those.
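The seeding idea works because any mail sent to a planted address that no real person uses is unsolicited by construction, so it can be fed to a filter as labeled spam. The sketch below assumes messages are available as (recipients, body) pairs; the helper names and the `example.com` domain are placeholders.

```python
import secrets
from collections import Counter

def make_trap_addresses(domain: str, count: int) -> set[str]:
    """Random local parts that no real person uses, so any mail to them is unsolicited."""
    return {f"{secrets.token_hex(6)}@{domain}" for _ in range(count)}

def harvest_spam_tokens(messages, traps: set[str]) -> Counter:
    """Count tokens in messages that hit a trap address.

    `messages` is an iterable of (recipients, body) pairs. The resulting
    counts could be used as spam-side training data for a statistical filter.
    """
    spam_tokens = Counter()
    for recipients, body in messages:
        if any(r.lower() in traps for r in recipients):
            spam_tokens.update(body.lower().split())
    return spam_tokens
```

The trap addresses would then be published somewhere address harvesters crawl; mail reaching them never needs human review.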
I find that the spam messages that do get through T-Bird's junk mail filter are the ones padded with random strings of letters. My guess is that T-Bird can identify the spam words (e.g., debt consolidation, enlargement), but the misspelled words (e.g., peni5) are unknown to it, so T-Bird makes the conservative decision not to mark the e-mail as spam. I figure a simple filter criterion that requires correct spellings for at least half the words in the body (for unknown senders) should get rid of this problem. Can anyone tell me whether such a rule exists in T-Bird or is in the works? At the very least, it would have the side effect of encouraging people to spellcheck their e-mails before sending. :)
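The proposed rule is easy to express outside any mail client. This is a standalone sketch, not a Thunderbird feature or API: the function names are hypothetical, the dictionary stands in for a real word list, and "unknown sender" is modeled as a simple allowlist lookup.

```python
import re

WORD_RE = re.compile(r"[a-z0-9']+")

def dictionary_ratio(body: str, dictionary: set[str]) -> float:
    """Fraction of words in the body that appear in the dictionary."""
    words = WORD_RE.findall(body.lower())
    if not words:
        return 1.0  # nothing to judge, so treat as fine
    known = sum(1 for w in words if w in dictionary)
    return known / len(words)

def looks_like_padded_spam(body: str, sender: str, known_senders: set[str],
                           dictionary: set[str], min_ratio: float = 0.5) -> bool:
    """Flag mail from unknown senders when fewer than `min_ratio` of its words are real words."""
    if sender.lower() in known_senders:
        return False  # the rule only applies to unknown senders
    return dictionary_ratio(body, dictionary) < min_ratio
```

One caveat with this design: legitimate mail heavy in jargon, code, or non-English text would also score low, which is why the post limits the rule to unknown senders.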