DSPAM v3.0 RC1 Spam Filter Released
Nuclear Elephant writes "DSPAM v3.0 RC1 is now available for download, with a stable release scheduled for June 13. DSPAM has appeared on Slashdot and in Wired News in the past for its high levels of accurate spam filtering. v3.0 is the product of three solid months of work. Some of the highlights include a very sleek redesigned interface, PostgreSQL support, many mathematical enhancements, and support for many of Gary Robinson's algorithms (such as Chi-Square, Geometric Mean Test, and Robinson's technique for combining P-Values)."
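Robinson's chi-square technique for combining p-values, mentioned in the announcement, can be sketched as follows. This is a minimal Python sketch. It assumes per-token spam probabilities are already available and lie strictly between 0 and 1. It follows the Fisher-style formulation popularized by SpamBayes and is not taken from DSPAM's source. The function names are hypothetical.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of the chi-square distribution (even dof only)."""
    if dof % 2:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_square_combine(probs: list[float]) -> float:
    """Combine per-token spam probabilities into one score in [0, 1].

    Near 1 means spam, near 0 means ham, near 0.5 means unsure or conflicting evidence.
    Each p must be strictly between 0 and 1, or math.log() will fail.
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # Fisher's method applied twice: once to the probabilities, once to their complements.
    ln_p = sum(math.log(p) for p in probs)
    ln_not_p = sum(math.log(1.0 - p) for p in probs)
    spamminess = 1.0 - chi2q(-2.0 * ln_not_p, 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * ln_p, 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0

print(chi_square_combine([0.99, 0.97, 0.95, 0.9]))  # close to 1 (spammy)
```

One useful property of this combining rule is that evenly split evidence lands near 0.5 instead of being forced toward either extreme.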
I don't get it.
I am using this filter and after some training it is very effective. Especially useful is the inoculation feature, which you can use to register a spam-only address with spam-sending sites so that it trains faster.
My heart is pure, but make no mistake, it's pure evil
I'm all for throwing technology at the problem, but I hope people still realise that having a complex (and effective) spam filter does not take away the millions of megabits of traffic wasted on UCE when it's in transit.
But will it find out who sent the SPAM and hurl them into the Sun? Until I get this feature, I don't think it'll be perfect :)
DSPAM has a strong focus on providing better data to already existing algorithms (Bayesian, Chi-Square, etc.). Combination algorithms work inherently well, but they depend on the quality of the data. Some of the approaches deployed in DSPAM toward this goal include Chained Tokens, Inoculation Groups, Classification Groups, advanced de-obfuscation techniques, and a new noise reduction algorithm called Bayesian Noise Reduction. The goal is to incorporate processing algorithms that can withstand the long haul of ever-increasing message complexity. So far we're doing a great job.
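Chained tokens can be illustrated with a minimal sketch. It assumes a chained token is simply a pair of adjacent words, treated as one extra feature alongside the individual words. The tokenizer regex and the `+` separator are hypothetical choices for illustration, not DSPAM's actual implementation.

```python
import re

def tokenize(text: str) -> list[str]:
    """Very simple word tokenizer (lowercase letters, digits, and a few spam-relevant symbols)."""
    return re.findall(r"[a-z0-9$!']+", text.lower())

def chained_tokens(text: str) -> list[str]:
    """Return single-word tokens followed by chained pairs of adjacent words."""
    words = tokenize(text)
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs

print(chained_tokens("Free money now"))
# ['free', 'money', 'now', 'free+money', 'money+now']
```

The pair `free+money` can carry a much stronger spam signal than either word alone. That is the kind of improved input data the combining algorithms described above depend on.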
The idea of combining more than one anti-spam heuristic is not new. But one thing that can't be denied is that all the other methods are merely complementary to Bayesian analysis, which can reach up to 95% precision by itself. Chi-Square by itself can reach up to 85% precision.
Look! We came out with this great filter so nobody else gets spam! This solves the problem of spam once and for all! Even though spam is still clogging our networks and wasting bandwidth, this filter will solve all of our problems.
With all the time spent on making spam filters, why don't we spend that time working out a new protocol for email transfer, one that could not be spoofed? Or spend it installing server-side programs that put a small time delay between messages and impose bandwidth restrictions on all outgoing mail?
Unless the mail-sending protocol is redesigned (for example, so that your fingerprints have to be recognized when you type a message), we will have to accept that spam will be in our daily news. Soon Slashdot will run an article comparing the three best spam filters, like a normal product review.
"The quality of life is inversely proportional to the number of keys on your keyring."
Been looking for a new spam filter, hope this one does the trick. I tend to get a lot of false positives with most spam filters I have tried. I would rather have a few spams slip through than have to weed through all my spam just because the filter may have blocked a real email.
A Fatal OE Exception has occurred, Sig will now reboot.
I tried to set up SpamAssassin a couple of months back and found it to be too much of a hassle. Could someone who has used both SpamAssassin and DSPAM comment on how easy or difficult DSPAM is to set up?
Do not read this
Warning: it seems to be designed more for high-volume use than for individual sites. I've fed DSPAM almost 3000 spams and it is still only catching 80%, though it does seem to be getting better.
The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
When you run your own mail server, or administrate a mail server for a large number of people, server-side anti-spam filters and countermeasures start making a lot more sense. Do the math on a company with 100 employees (at $25/hr) who check mail twice a day and spend 5 minutes each time hassling with anti-spam measures in client-side mail apps. In this scenario, a seamless anti-spam solution is worth conservatively $400 per day, or $100k/year not counting bandwidth savings. There are definitely cases when client-side filtering makes sense, but if you can handle it at the server, email-based business methods scale better.
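The arithmetic in this estimate can be checked with a short sketch. The 250 working days per year is an assumption, because the post does not state it. The other figures come from the post.

```python
def daily_cost(employees: int, hourly_wage: float, checks_per_day: int, minutes_per_check: float) -> float:
    """Dollar cost per day of time spent handling spam by hand."""
    hours_per_person = checks_per_day * minutes_per_check / 60
    return employees * hourly_wage * hours_per_person

WORKING_DAYS_PER_YEAR = 250  # assumption, not stated in the post

per_day = daily_cost(employees=100, hourly_wage=25, checks_per_day=2, minutes_per_check=5)
per_year = per_day * WORKING_DAYS_PER_YEAR
print(f"${per_day:,.2f}/day, ${per_year:,.0f}/year")  # $416.67/day, $104,167/year
```

Ten minutes a day for 100 people at $25/hr works out to about $417. That is why $400/day and $100k/year read as conservative round-downs.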
http://tinyurl.com/4ny52
I have not actually used DSPAM, but have just read the specs.
Yawn. Yet another, albeit well designed, content-based filter. While content-based filters are a valuable tool, let's not forget that the spam problem is one of anti-social behavior and consent and has nothing to do with content. Using content as a factor in deciding what is spam or not spam will always be flawed. Even if you tweak your favorite filter from 99% to 99.9%, the spammers can just up the ante by sending more. Scaling up costs them little on an individual basis. It saddens me to see really brilliant people put great amounts of work into a project whose underlying premise is flawed.
Wonderful, if you just want to stop seeing the spam. I, however, would enjoy not having to pay for its delivery. This is the ostrich method of spam fighting.
would be to publicly humiliate/boycott the companies that use the spammers services. Like drug dealers, as long as there is a market, the spammers will be around. Remove the demand, and the suppliers will eventually move onto selling something else.
If you can't kill the leeches because the water is too murky, then boil off the pond!
CodeTrap (www.codetrap.net)
So how does this help me reduce the amount of bandwidth and server resources used by spammers who continue to try sending spam to me and my users?
now we need to go OSS in diesel cars
Be careful what you wish for.
My mail hosting used to block spam outright, and their filter wasn't very well maintained, so it blocked lots of legitimate mailing list mail (like SecurityFocus and NANOG).
They've gone to tagging mail now instead of dropping it, which is a lot better.
ISP/mail-server-based blocking isn't really a good idea; even with ultra-conservative blocking, you'll still block legitimate emails.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
I wanted to try DSPAM some time ago, but I stopped as soon as I read that DSPAM puts an ID string in every mail it processes. In the mail body, that is. I have no problems with a program that adds headers, but it should leave the message body alone.
Does DSPAM do that now? Can't find anything about it...
As far as I know, the main difference is that DSPAM does not use weighted filter rules at all, unlike SpamAssassin's hybrid approach; DSPAM is designed to rely purely on analysis of spam's properties (Bayesian, etc.).
The other cool thing about DSPAM is that it is designed to let users add to and modify their own spam database. Every email DSPAM processes is tagged with an identifier and logged in a server-side database. If a delivered email is in fact spam but wasn't tagged as such, the user can forward it to the designated spam-sorting address, and DSPAM will automatically update that user's spam corpus. Because the message is tagged with an identifier, you don't have to worry about the user forwarding the full headers, as the server already has that information on file.
AFAIK you can't do that with SpamAssassin.
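The signature mechanism described above can be sketched as a small, hypothetical model. The class and method names, the random hex ID, and the in-memory dictionaries are all assumptions standing in for DSPAM's real server-side database. The sketch shows why a forwarded copy does not need its original headers: the ID alone is enough to look up what was learned at delivery and reverse it.

```python
import uuid
from collections import Counter

class SignatureStore:
    """Hypothetical server-side store illustrating signature-based retraining."""

    def __init__(self) -> None:
        self.spam = Counter()   # token -> count in messages learned as spam
        self.ham = Counter()    # token -> count in messages learned as ham
        self._history = {}      # signature ID -> (tokens, was_spam) as learned at delivery

    def deliver(self, tokens: list[str], is_spam: bool) -> str:
        """Learn a message under the filter's verdict and return the ID to embed in it."""
        sig = uuid.uuid4().hex
        self._history[sig] = (list(tokens), is_spam)
        (self.spam if is_spam else self.ham).update(tokens)
        return sig

    def report_mistake(self, sig: str) -> None:
        """User forwarded a misclassified message; only its ID is needed, not its headers."""
        tokens, was_spam = self._history.pop(sig)
        wrong, right = (self.spam, self.ham) if was_spam else (self.ham, self.spam)
        wrong.subtract(tokens)  # undo what was learned at delivery
        right.update(tokens)    # relearn under the corrected label

store = SignatureStore()
sig = store.deliver(["cheap", "pills"], is_spam=False)  # a missed spam
store.report_mistake(sig)
print(store.spam["cheap"], store.ham["cheap"])  # 1 0
```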
An excellent spam filter for Windows is K9 found here.
I'm the one running the spam filter (SpamAssassin) at work. Overall, it has been VERY popular with everyone else. They don't receive the most obnoxious sex spams any more.
On the other hand, there are a few false positives that reduce the overall savings in your post. I auto-delete anything above 10 and flag anything above 5.
But the end users still have to look through the flagged stuff to see if there are any false positives. Then they drop them into the false positive folder. The users also have to identify all the missed spam and drop that into the spam folder.
It's still work for them so the costs aren't as clear as in your post. But the non-tangible benefits are also important.
I think we're at the point of diminishing returns on simple scanning processes. We need to look at actively seeding the spammers' lists with false names and tuning the spam filters with those.
I find that the spam that does get through T-Bird's junk mail filter is the spam padded with random strings of letters. My guess is that T-Bird is able to identify the spam words (e.g. debt consolidation, enlargement), but the misspelled words (e.g. peni5) are unknown to T-Bird. So T-Bird makes the conservative decision not to mark the e-mail as spam. I figure a simple filter criterion that requires correct spellings for at least half the words in the body (for unknown senders) should get rid of this problem. Would anyone care to enlighten me on whether such a rule is in T-Bird or in the works? At the very least, it would have the side effect of encouraging people to spellcheck their e-mails before sending. :)
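The proposed rule can be sketched as follows. The sketch assumes a plain set of correctly spelled words and a separate, already-known flag for whether the sender is recognized. The function names are hypothetical. The 0.5 threshold comes from "at least half the words". This is not an existing Thunderbird feature.

```python
import re

def known_word_fraction(body: str, dictionary: set[str]) -> float:
    """Fraction of words in the body that appear in the dictionary."""
    words = re.findall(r"[A-Za-z0-9]+", body)
    if not words:
        return 1.0
    known = sum(1 for w in words if w.lower() in dictionary)
    return known / len(words)

def looks_obfuscated(body: str, dictionary: set[str], sender_known: bool, threshold: float = 0.5) -> bool:
    """Flag mail from unknown senders when fewer than `threshold` of its words are spelled correctly."""
    if sender_known:
        return False
    return known_word_fraction(body, dictionary) < threshold

words = {"cheap", "pills", "for", "you"}
print(looks_obfuscated("cheap peni5 xqzt rrw", words, sender_known=False))  # True
print(looks_obfuscated("cheap pills for you", words, sender_known=False))   # False
```

A real deployment would need a large dictionary. Otherwise, legitimate messages full of jargon, names, URLs, or non-English text would trip the rule.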
You can configure DSPAM to not use the ID, but this requires users to "bounce" the incorrect e-mails instead of forwarding them (as forwarding strips the headers).
Is the ID really that inconvenient?
How does DSPAM compare to other OSS projects like Spamassassin?
In short:
I am currently running an older version of DSPAM, which I switched to after the last time it hit /. I had been using SpamAssassin for years, and lately my SA false negatives had been creeping up, to the point where I could expect to see 3-10 spam a day in my inbox.
With DSPAM, my false negatives have dropped to a trickle - something like 5 messages in the last month. My false positives are a bit higher; it tends to trigger more easily on various kinds of mass email - Daily Shark, alumni association events, Amazon.com email, DOD briefing transcripts. At the moment, that's less of a burden than the high false negatives were with SA.
I had more trouble wedging DSPAM into my configuration, but that's because I didn't want to do it DSPAM's way (e.g., signatures in message body, forward email to an address when it is a false result, web interface for management). I basically want it to update the message headers, then let procmail/maildrop filter accordingly, and if it's a false pos/neg I want to just drop it into an IMAP folder which is emptied via the "learn from this mistake" program on a regular basis. YMMV but I think fitting into the mail pipeline is something DSPAM could do better.
I trained off my existing corpus - i.e., let my SA-generated spam folder build up a bit, removed any false positives, removed SA markups, and ran that into DSPAM as spam corpus; did the same with all the normal mail that came in over a week or so, THEN switched. I've also set my wife up without as much training, and it took DSPAM longer to learn what was spam for her and what wasn't. So I think training it up beforehand with a corpus is a good idea.
Overall, it was worth it to switch, and if I was good about upgrading to the newest I'd hopefully see my false positive rate drop.
Just my .02.
Others I've had direct experience with are spamprobe, spambayes, and CRM114.
My best experience has been with spamprobe, because it compiles as a standalone app, is very fast (at one point I was filtering over 10,000 emails a day on a Pentium 200 MHz) and is completely command-line oriented, best for scripting/custom mail systems. Colleagues of mine who use CRM114 are very happy with it, but I got discouraged by its large database files. I'm now experimenting with spambayes, the only difficulty so far being installing the python/bsddb environment.
Otherwise your weights will be all wrong.
Equal parts ham and spam will yield good spam catching. RTFAQ.
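The point about balanced training can be illustrated with a small worked example. This sketch assumes a token's spam probability is estimated from how many spam and ham messages contain it. The normalized version divides by corpus size, which is one common correction. It is not necessarily how DSPAM weights tokens. The function names are hypothetical.

```python
def raw_spam_prob(spam_hits: int, ham_hits: int) -> float:
    """Unnormalized: fraction of the token's occurrences that were in spam."""
    if spam_hits + ham_hits == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_hits / (spam_hits + ham_hits)

def normalized_spam_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int) -> float:
    """Compare the token's frequency within each corpus instead of raw counts."""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5
    return spam_freq / (spam_freq + ham_freq)

# A token found in 1% of spam and 1% of ham, trained on 3,000 spams but only 1,000 hams.
print(raw_spam_prob(30, 10))                     # 0.75: looks spammy only because of the imbalance
print(normalized_spam_prob(30, 10, 3000, 1000))  # 0.5: neutral, as it should be
```

Training on roughly equal parts ham and spam keeps even the raw estimate close to the normalized one.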
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
...is that spammers have access to the anti-spam tools.
They have access to DSPAM. They have access to SpamAssassin. They have access to the Bayesian filters found in Mozilla and other products.
When crafting their spams, they run them through these tools, and they keep obfuscating their spams until they get one through. Once they've got it perfect, they send a hundred million copies out to the world, and whammo! Your mo.rt-gage has been ap.prov/ed, and your v1ag---ra is ordered!
Tired of FB/Google censorship? Visit UNCENSORED!
Your post advocates a
(*) technical ( ) legislative ( ) market-based ( ) vigilante ( ) lack of an
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(*) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(*) Users of email will not put up with it
(*) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(*) Requires immediate total cooperation from everybody at once
(*) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
(*) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(*) Huge existing software investment in SMTP
(*) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(*) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) No-lists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(*) Countermeasures must work if phased in gradually
( ) Sending email should be free
(*) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(*) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your
house down!
I've been using DSPAM for about three months. A few criticisms:
First, by default DSPAM wants to run as the "root" user and usurp delivery of e-mails. (With Exim, they actually want it to recursively reinvoke the mail server for actual delivery!) It took quite a bit of configuring to get it to work like SpamAssassin from procmail.
This software is somewhat buggy, so running DSPAM as root would also introduce security concerns. For example, I'm using 2.10.6 because 3.0.0 compiled and installed with no problems but failed to classify anything. (Even with several hours of gdb tracing I was unable to determine why.) Another bug: if I run "--falsepositive" on an e-mail that's lacking the "!DSPAM" signatures, the message should be ignored, but apparently it isn't, because the statistics counters are incremented.
From the FAQ:
"Q. Does DSPAM support whitelists?
A. DSPAM doesn't have a whitelist manager, rather whitelisting is an automatic function of DSPAM's Bayesian filtering mechanism."
This is crazy -- the whole point of whitelists is for when the Bayesian filtering fails! And DSPAM does fail. Twice now I've had to reset my database because the classifications were wrong and training wasn't helping. All I can say is I'm glad I've got procmail to rescue the important e-mails.
I think one source of my problems was that the default training mode ("train on everything") causes incorrect learning when you fail to report a false positive. This was a big problem for me, since I get around 700-800 spams/day. While false negatives are easily caught, the false positives go unnoticed unless I happen to wonder why someone never responded and invest some time searching my spam folders. (I'm still trying to figure out exactly how to deal with this problem. E.g. maybe I could have it challenge the sender with a Turing test or something.)
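The failure mode described above can be sketched by contrasting two simplified training policies. The mode names and the `learn` function are hypothetical stand-ins for illustration, not DSPAM's actual options or code. In the "everything" mode, an unnoticed false positive is learned as spam, which reinforces the error. In the "on_error" mode, nothing is learned until the user reports a mistake.

```python
from collections import Counter
from typing import Optional

def learn(mode: str, tokens: list[str], verdict: str, reported: Optional[str],
          spam: Counter, ham: Counter) -> None:
    """Update token counts for one message.

    verdict:  the filter's decision, "spam" or "ham".
    reported: the user's correction, or None if the user never reported a mistake.
    """
    if mode == "everything":
        # Learn every message; an unreported mistake is learned as if it were correct.
        label = reported or verdict
    elif mode == "on_error":
        # Learn only messages the user has explicitly corrected.
        if reported is None:
            return
        label = reported
    else:
        raise ValueError(f"unknown mode: {mode}")
    (spam if label == "spam" else ham).update(tokens)

# A legitimate message wrongly filtered as spam, which the user never notices.
for mode in ("everything", "on_error"):
    spam, ham = Counter(), Counter()
    learn(mode, ["meeting", "tuesday"], verdict="spam", reported=None, spam=spam, ham=ham)
    print(mode, dict(spam))
# everything {'meeting': 1, 'tuesday': 1}
# on_error {}
```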
I will say that DSPAM's basic technology is quite good. It's just that the software still has a "prototype" feel, and I'd caution you to do some experiments before unleashing it on your users. (For example, there's no manpage, and there isn't even a command-line option to print out the current version number!)
-Gonz
Your post advocates a
( ) technical (*) legislative ( ) market-based ( ) vigilante ( ) lack of an
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
(*) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
(*) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
(*) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
(*) Asshats
(*) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
(*) Extreme profitability of spam
( ) Joe jobs and/or identity theft
(*) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
(*) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(*) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
(*) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) No-lists suck
(*) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
(*) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
(*) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(*) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your
house down!
Your post advocates a
( ) technical ( ) legislative ( ) market-based (*) vigilante ( ) lack of an
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
(*) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
(*) The police will not put up with it
(*) Requires too much cooperation from spammers
(*) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
(*) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
(*) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
(*) Asshats
(*) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
(*) Eternal arms race involved in all filtering approaches
(*) Extreme profitability of spam
(*) Joe jobs and/or identity theft
( ) Technically illiterate politicians
(*) Extreme stupidity on the part of people who do business with spammers
(*) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(*) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) No-lists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
(*) Countermeasures should not involve sabotage of public networks
(*) Countermeasures must work if phased in gradually
( ) Sending email should be free
(*) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(*) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your
house down!
Since this is a spam subject, this is at least partly relevant:
I am a Direcway subscriber, and I was accustomed (angry, but accustomed) to receiving about 15-20 spams per day for as long as I can remember.
Slashdot ran a story within the last 6 months (I don't remember which one exactly) about the FBI raiding one or two of the largest spammers and confiscating their setup.
Almost to the day that the raid was to have occurred, all spam to my inbox instantly stopped. I haven't gotten a single spam message since about the same time as the second raid.
It seems to me that those guys may have been the sole sources of all the spam going through Direcway to my account. Are there any other Direcway subscribers here who had the same experience? Was the whole thing just an extraordinary coincidence, or did Direcway find the holy grail of anti-spam?
As far as I can tell, all my regular email is getting through and going out. No email that I knew was coming has yet failed to arrive, so any filtering at Direcway's servers, if such a tactic is being employed, is doing a great job.
I ^H^H a guy I know used to retaliate, stopped for a while when the spammers built up their defenses, and then tried it again last week against some spams which started leaking thru his filters.
They are wide open again, brothers, because apparently no one else is dossing them anymore either and they have let down their guard.
I would guess that they lost money when they overprotected their forms against that type of "response," which made too many legit buyers say fuck it instead of filling out some bossy form.
So T-Bird makes the conservative decision not to mark the e-mail as spam.
T-Bird makes the mistake of making spam/ham a binary decision. I really wish it would work more like SpamBayes which has a trinary system (spam / unsure / ham). That works well because the stuff it tags as spam is almost always spam, and the false positives usually end up in the unsure pile. The "unsure" pile is also usually 1/10th the size of the "spam" pile, so it takes a lot less time to verify before tagging all of the "unsure" as spam.
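The three-way scheme can be sketched as a pair of cutoffs on a score between 0 and 1. The cutoff values here are assumptions chosen for illustration. SpamBayes lets users configure its own cutoffs, and its defaults may differ.

```python
def triage(score: float, ham_cutoff: float = 0.20, spam_cutoff: float = 0.90) -> str:
    """Map a spam score in [0, 1] to one of three piles instead of a yes/no decision."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"

for s in (0.97, 0.55, 0.03):
    print(s, triage(s))
# 0.97 spam
# 0.55 unsure
# 0.03 ham
```

Widening the gap between the two cutoffs moves more borderline mail into "unsure". That pile costs a little reading time, but it keeps false positives out of the spam pile.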
T-Bird has a ways to go before their system is as easy to use as SpamBayes for MSOutlook is. (e.g. moving messages back to the original folder if they were mis-tagged and then un-flagged by the user)
Wolde you bothe eate your cake, and have your cake?