DSPAM v3.0 RC1 Spam Filter Released
Nuclear Elephant writes "DSPAM v3.0 RC1 is now available for download, with a stable release scheduled for June 13. DSPAM has appeared on Slashdot and in Wired News in the past for its high levels of accurate spam filtering. v3.0 is the product of three solid months of work. Some of the highlights include a very sleek redesigned interface, PostgreSQL support, many mathematical enhancements, and support for many of Gary Robinson's algorithms (such as Chi-Square, Geometric Mean Test, and Robinson's technique for combining P-Values)."
I'm all for throwing technology at the problem, but I hope people still realise that having a complex (and effective) spam filter does not take away the millions of megabits of traffic wasted on UCE when it's in transit.
How does DSPAM compare to other OSS projects like SpamAssassin?
Laugh while you can, monkey-boy.
When you run your own mail server, or administrate a mail server for a large number of people, server-side anti-spam filters and countermeasures start making a lot more sense. Do the math on a company with 100 employees (at $25/hr) who check mail twice a day and spend 5 minutes each time hassling with anti-spam measures in client-side mail apps. In this scenario, a seamless anti-spam solution is worth conservatively $400 per day, or $100k/year not counting bandwidth savings. There are definitely cases when client-side filtering makes sense, but if you can handle it at the server, email-based business methods scale better.
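For anyone who wants to check the arithmetic, here is a rough sketch of that estimate. The employee count, wage, and time figures come from the post above; the 250 working days per year is my own assumption, and the function name is made up for illustration.

```python
def daily_cost(employees, hourly_rate, checks_per_day, minutes_per_check):
    """Cost per day of staff time spent on client-side spam handling."""
    hours_lost = employees * checks_per_day * minutes_per_check / 60
    return hours_lost * hourly_rate

# Figures from the post: 100 employees, $25/hr, 2 checks a day, 5 minutes each.
per_day = daily_cost(100, 25.0, 2, 5)

# ASSUMPTION: 250 working days per year (not stated in the post).
WORKING_DAYS = 250
per_year = per_day * WORKING_DAYS

print(f"per day:  ${per_day:,.2f}")
print(f"per year: ${per_year:,.2f}")
```

That works out to a bit over $400 a day and a bit over $100k a year, so the rounded numbers in the post hold up under these assumptions.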
http://tinyurl.com/4ny52
I have not actually used DSPAM, but have just read the specs.
Yawn. Yet another, albeit well designed, content-based filter. While content-based filters are a valuable tool, let's not forget that the spam problem is one of anti-social behavior and consent and has nothing to do with content. Using content as a factor in deciding what is spam or not spam will always be flawed. Even if you tweak your favorite filter from 99% to 99.9%, the spammers can just up the ante by sending more. Scaling up costs them little on an individual basis. It saddens me to see really brilliant people put great amounts of work into a project whose underlying premise is flawed.
With all the time spent on making spam filters, why don't we spend that time working out a new protocol for email transfers, one that could not be spoofed,
Because there's nothing wrong with SMTP. SMTP already has extensions to allow authentication, but it still requires a central authority to say "He is Senior Frac, we verify it." No one would trust such an authority even if it were scalable enough. If you think spam is caused by a lack of authentication, you're sadly misinformed. The cause is a lack of responsibility by the sending networks to enforce proper behavior of their users.
or spend that time installing server side programs that put a small time delay between messages as well as bandwidth restrictions for all outgoing mail?
These technologies exist. Unfortunately, most who install them stop monitoring them. Such work is considered a drain on resources that the ISP would much rather spend on signing up new customers. Bandwidth restrictions on a customer who is running their own MTA make things much more complex and much less scalable.
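To make the "small time delay between messages" idea concrete, here is a minimal sketch of a per-sender throttle. It is only an illustration: the class name, the two-second interval, and the idea of keying on a sender string are all my assumptions, and it is not how any particular MTA implements rate limiting.

```python
class SendThrottle:
    """Enforce a minimum gap between outgoing messages per sender.

    Hypothetical helper for illustration; times are plain seconds.
    """

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._next_slot = {}  # sender -> time the last accepted message goes out

    def delay_needed(self, sender, now):
        """Return how long this message must wait before it may be sent."""
        last = self._next_slot.get(sender)
        if last is None:
            wait = 0.0
        else:
            wait = max(0.0, last + self.min_interval_s - now)
        # Record when this message will actually leave, so bursts queue up.
        self._next_slot[sender] = now + wait
        return wait


throttle = SendThrottle(min_interval_s=2.0)  # ASSUMED interval
first = throttle.delay_needed("user@example.com", now=0.0)
second = throttle.delay_needed("user@example.com", now=0.5)
other = throttle.delay_needed("someone.else@example.com", now=0.5)
```

A burst from one sender gets spread out while other senders are unaffected. As the reply points out, the hard part is not the mechanism but having someone keep watching it.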
I wanted to try DSPAM some time ago, but I stopped as soon as I read that DSPAM puts an ID string in every mail it processes. In the mail body, that is. I have no problems with a program that adds headers, but it should leave the message body alone.
Does DSPAM do that now? Can't find anything about it...
I'm the one running the spam filter (SpamAssassin) at work. Overall, it has been VERY popular with everyone else. They don't receive the most obnoxious sex spams any more.
On the other hand, there are a few false positives that reduce the overall savings in your post. I auto-delete anything above 10 and flag anything above 5.
But the end users still have to look through the flagged stuff to see if there are any false positives. Then they drop them into the false positive folder. The users also have to identify all the missed spam and drop that into the spam folder.
It's still work for them so the costs aren't as clear as in your post. But the non-tangible benefits are also important.
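As a rough sketch of the two-threshold setup described above (flag above 5, auto-delete above 10): the function and label names here are hypothetical, and this does not use SpamAssassin's own API, it just takes a numeric score as input.

```python
# Thresholds taken from the post; the names are made up for illustration.
FLAG_THRESHOLD = 5.0
DELETE_THRESHOLD = 10.0

def triage(score):
    """Map a spam score to one of three actions."""
    if score > DELETE_THRESHOLD:
        return "delete"   # dropped without the user ever seeing it
    if score > FLAG_THRESHOLD:
        return "flag"     # delivered, but marked for the user to review
    return "deliver"      # treated as normal mail

for s in (2.1, 7.4, 12.3):
    print(s, triage(s))
```

Only the middle band needs a human to look at it, which is where the remaining per-user work comes from.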
I think we're at the point of diminishing returns on simple scanning processes. I think we need to look at actively seeding the spammers' lists with false names and tuning the spam filters with those.
I find that the spam letters that do get through T-Bird's junk mail filter are the ones padded with random strings of letters. My guess is that T-Bird is able to identify the spam words (e.g. debt consolidation, enlargement) but the misspelled words (e.g. peni5) are unknown to T-Bird. So T-Bird makes the conservative decision not to mark the e-mail as spam. I figure a simple filter criterion that requires correct spelling for at least half the words in the body (for unknown senders) should get rid of this problem. Anyone care to enlighten me if such a rule is in T-Bird or is in the works? At the very least, this will have the side effect of encouraging people to spellcheck their e-mails before sending. :)
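Here is a minimal sketch of the "at least half the words must be real words" rule proposed above. It is not anything Thunderbird is known to do; the tiny word set stands in for a real dictionary, and the function names and tokenizer are assumptions for illustration.

```python
import re

# ASSUMPTION: a toy dictionary; a real filter would load a full word list.
KNOWN_WORDS = {"please", "send", "the", "report", "cheap", "pills", "now"}

def known_word_ratio(body, dictionary):
    """Fraction of tokens in the body that appear in the dictionary."""
    words = re.findall(r"[a-z0-9']+", body.lower())
    if not words:
        return 1.0  # an empty body gives no evidence either way
    known = sum(1 for w in words if w in dictionary)
    return known / len(words)

def looks_padded(body, dictionary, min_ratio=0.5):
    """True when fewer than min_ratio of the words are recognised."""
    return known_word_ratio(body, dictionary) < min_ratio

print(looks_padded("peni5 xqzv lkjh wrtp pills", KNOWN_WORDS))
print(looks_padded("Please send the report", KNOWN_WORDS))
```

The obvious weakness is that names, jargon, and non-English mail would also score low, which is presumably why the post limits the rule to unknown senders.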
Friend, you need to take a look at the specs on CRM114 at crm114.sourceforge.net. While the interface and initial setup are fairly painful for people who don't build their own email setups, various folks are publishing that they get over 99.9% correct detection of both spam and non-spam. That's far better than any other single filter out there.
Others I've had direct experience with are spamprobe, spambayes, and CRM114.
My best experience has been with spamprobe, because it compiles as a standalone app, is very fast (at one point I was filtering over 10,000 emails a day on a Pentium 200 MHz) and is completely command-line oriented, best for scripting/custom mail systems. Colleagues of mine who use CRM114 are very happy with it, but I got discouraged by its large database files. I'm now experimenting with spambayes, the only difficulty so far being installing the python/bsddb environment.
...is that spammers have access to the anti-spam tools.
They have access to DSPAM. They have access to SpamAssassin. They have access to the Bayesian filters found in Mozilla and other products.
When crafting their spams, they run them through these tools, and they keep obfuscating their spams until they get one through. Once they've got it perfect, they send a hundred million copies out to the world, and whammo! Your mo.rt-gage has been ap.prov/ed, and your v1ag---ra is ordered!
Since this is a spam subject, this is at least partly relevant:
I am a Direcway subscriber, and I was accustomed (angry, but accustomed) to receiving about 15-20 spams per day for as long as I can remember.
Slashdot ran a story within the last 6 months (I don't remember which one exactly) about the FBI raiding one or two of the largest spammers and confiscating their setup.
Almost to the day that the raid was to have occurred, all spam to my inbox instantly stopped. I haven't gotten a single spam message since about the same time as the second raid.
It seems to me that those guys may have been the sole sources of all the spam going through Direcway to my account. Are there any other Direcway subscribers here that had the same experience, was the whole thing just an extraordinary coincidence, or did Direcway find the holy grail of anti-spam?
As far as I can tell, all my regular email is getting through and going out. No email that I knew was coming has yet failed to arrive, so any filtering at Direcway's servers, if such a tactic is being employed, is doing a great job.
So T-Bird makes the conservative decision not to mark the e-mail as spam.
T-Bird makes the mistake of making spam/ham a binary decision. I really wish it would work more like SpamBayes which has a trinary system (spam / unsure / ham). That works well because the stuff it tags as spam is almost always spam, and the false positives usually end up in the unsure pile. The "unsure" pile is also usually 1/10th the size of the "spam" pile, so it takes a lot less time to verify before tagging all of the "unsure" as spam.
T-Bird has a ways to go before its system is as easy to use as SpamBayes for MS Outlook is (e.g. moving messages back to the original folder if they were mis-tagged and then un-flagged by the user).
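The three-way split described above can be sketched like this. The cutoff values are illustrative assumptions, and the function is a hypothetical stand-in, not SpamBayes code; it only assumes the filter produces a spam probability between 0 and 1.

```python
def classify(spam_probability, ham_cutoff=0.20, spam_cutoff=0.90):
    """Three-way decision: 'ham', 'unsure', or 'spam'.

    ASSUMPTION: both cutoffs are example values chosen for illustration.
    """
    if spam_probability >= spam_cutoff:
        return "spam"
    if spam_probability <= ham_cutoff:
        return "ham"
    return "unsure"  # the small pile the user actually has to review

for p in (0.02, 0.55, 0.97):
    print(p, classify(p))
```

Widening the gap between the two cutoffs trades a bigger "unsure" pile for fewer outright mistakes in the other two.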
Wolde you bothe eate your cake, and have your cake?