Slashdot Mirror


DSPAM v3.2 Released

Nuclear Elephant writes "After four months of development DSPAM v3.2 has been released, bringing many new enhancements and filtering technologies. These include distributed computing support, implementation of Bill Yerazunis' Sparse Binary Polynomial Hashing algorithm (from CRM114), and v1.2 of Bayesian Noise Reduction. Other enhancements include SQLite support and many significant performance enhancements for PostgreSQL. DSPAM's official release is next week, but you can download the preview release now. Users of the project have also contributed towards creating a new logo for this release."

36 of 157 comments

  1. DSpam with qmail / vpopmail by hayds · · Score: 4, Interesting

    I am using DSPAM on a qmail/vpopmail server and I find that it's great in terms of accuracy. Most of my users have never had a false positive and many haven't seen a spam after a couple of weeks of training.

    The problem that I have with DSPAM is the integration side. I'm not sure how it goes with other mail systems, but integrating it with vpopmail was a major pain. It seems easy, you just put the command in the dotfiles, but in practice getting it to work was quite a trial. Even now it doesn't integrate properly with the web administration, etc., despite some scripting and minor code changes.

    Because of this I've been thinking of switching to SpamAssassin simply because of its integration with qmail-scanner. Has anyone else had similar problems or been in a similar situation and found a good solution?

    1. Re:DSpam with qmail / vpopmail by Inda · · Score: 3, Interesting

      I'll admit I don't really understand your post.

      All these new spam removal programs are all very well and good but from an end user's point of view, all I would like to know is:

      How long am I going to have to put up with emails like this?

      Hi. This is the qmail-send program at somewhere.com.
      I'm afraid I wasn't able to deliver your message to the following addresses.
      This is a permanent error; I've given up. Sorry it didn't work out.

      info@somewhere.com
      This address no longer accepts mail.

      --- Below this line is a copy of the message.

      ...

      COMPLETE COPY OF NETSKY VIRUS

      ...

      ------=_NextPart_000_001B_01C0CA80 .6B015D10--

      I've had well over a thousand of these types of email in the last 30 days.

      DSPAM v3.2 is probably a rock solid application in the right hands.

      --
      This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
    2. Re:DSpam with qmail / vpopmail by Anonymous Coward · · Score: 2, Informative

      What you want is ClamAV:
      http://clamav.sourceforge.net/

    3. Re:DSpam with qmail / vpopmail by hayds · · Score: 4, Informative

      This is a legit message from someone's mail system. You are receiving this because someone has been infected with a virus. Their computer is sending messages from your email address, and some of these messages are going to nonexistent mail addresses. Because they are spoofing your mail address in the From: you are receiving all the bounces.

      So technically, this isn't spam or junk mail. It's someone's email system doing what it's supposed to, returning 'your' email because the recipient didn't exist.

      Unfortunately, probably not much you can do about this without blocking all such legit system messages.

    4. Re:DSpam with qmail / vpopmail by Anonymous Coward · · Score: 3, Insightful

      Administrators really shouldn't configure their systems to return mail that contains viruses. Most of these are sent from spoofed addresses anyway and don't make it to the system that is actually infected. They just annoy people who are not responsible for the original message. On top of that, it just generates an unnecessary amount of traffic, and I really just consider this to be spam.

    5. Re:DSpam with qmail / vpopmail by DaMeatGrinder · · Score: 3, Interesting
      Unfortunately, probably not much you can do about this without blocking all such legit system messages.

      Here's a crazy idea: if you crypto-sign all messages you send, it should be possible to check the signature in bounced messages and filter any unsigned bounced messages.
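      The signing idea above can be sketched roughly as follows. This is a minimal illustration, not a feature of DSPAM or qmail: it assumes you control the Message-ID of outgoing mail and that bounces include a copy of the original headers. The secret, the domain, and both function names are hypothetical.

```python
import hashlib
import hmac
import re

SECRET = b"example-secret-key"  # hypothetical key; a real one must stay private
DOMAIN = "example.org"          # hypothetical sending domain


def signed_message_id(unique_part: str, secret: bytes = SECRET) -> str:
    """Build a Message-ID whose last label is an HMAC tag over the unique part."""
    tag = hmac.new(secret, unique_part.encode(), hashlib.sha256).hexdigest()[:16]
    return f"<{unique_part}.{tag}@{DOMAIN}>"


# Matches Message-IDs of the shape produced above: <unique.tag@DOMAIN>
_MSGID = re.compile(r"<([^<>@\s]+)\.([0-9a-f]{16})@" + re.escape(DOMAIN) + r">")


def bounce_is_ours(bounce_text: str, secret: bytes = SECRET) -> bool:
    """Return True if the bounce quotes a Message-ID that carries a valid tag."""
    for unique_part, tag in _MSGID.findall(bounce_text):
        expected = hmac.new(secret, unique_part.encode(), hashlib.sha256).hexdigest()[:16]
        if hmac.compare_digest(tag, expected):
            return True
    return False
```

      A bounce that quotes no validly tagged Message-ID was not triggered by mail you actually sent, so it could be filed away without blocking genuine delivery failures.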

  2. Re:second post? by hayds · · Score: 5, Interesting

    I would have thought that running two Bayesian filters would do more harm than good. The first filter would be OK, as it would be trained like usual.

    The second filter would probably have problems because it would only see a small subset of all your mail as the first filter would have removed most of the spam. The second filter's sample would therefore be skewed and it would have far less data to accurately classify spam.

    Just my thoughts on the subject anyway...
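    To put rough numbers on the skew described above, here is a small sketch. The figures (1000 spam, 40 legitimate messages, a first filter catching 99% of spam) are purely illustrative assumptions, and `second_stage_mix` is a hypothetical helper.

```python
def second_stage_mix(spam: int, ham: int, first_catch_rate: float):
    """Return (spam seen, ham seen, spam share) for a filter placed behind another one.

    Assumes the first filter removes first_catch_rate of the spam and no ham.
    """
    leaked = round(spam * (1 - first_catch_rate))
    return leaked, ham, leaked / (leaked + ham)


# The first filter trains on a stream that is mostly spam; the second one
# sees only the few spams that leaked through, mixed into all the ham.
first_share = 1000 / (1000 + 40)
leaked, ham, second_share = second_stage_mix(1000, 40, 0.99)
print(f"first filter spam share:  {first_share:.0%}")
print(f"second filter spam share: {second_share:.0%} ({leaked} spam, {ham} ham)")
```

    Under these assumptions the second filter gets about a hundredth of the spam examples the first one does, which is the "far less data" problem in concrete terms.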

  3. Is DSPAM... by DLR · · Score: 2, Funny

    ...any better than CSPAN?

    --
    "Like fire and fusion, government is a dangerous servant and a terrible master."~RAH
  4. Re:second post? by atrus · · Score: 2, Interesting

    I'm running Postfix with RBLs. Looking at SpamCop, SpamHaus, and SORBS. It auto-rejects all e-mail coming from banned IPs. This brings me down to 1 spam a day. If your IP is blocked, tough, find a new ISP (these lists tend to be more self-expiring and not 'permanent ban' types, which is good).
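    For readers wondering how such a lookup works under the hood: a DNS blacklist is queried by reversing the octets of the connecting IPv4 address and resolving that name inside the list's zone. A minimal sketch follows; the zone `dnsbl.example.org` is a placeholder assumption (each real list publishes its own zone name), and both function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Treat any successful A-record lookup as 'listed'; a failed lookup as 'not listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

    In practice the MTA does this itself at connection time, so a sketch like this is mainly useful for checking whether your own address has ended up on a list.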

  5. DSPAM version 3.2 has _NOT_ been released by TheMysteriousFuture · · Score: 3, Informative
    Check out the download page

    Here's what it shows.


    October 1, 2004 3.2 Release Candidate 1
    October 8, 2004 3.2 Release Candidate 2
    October 14, 2004 Devel Frozen - Critical Changes only
    October 15, 2004 3.2 Preview Release 1
    October 20, 2004 Devel Absolutely Frozen. Release to packagers.
    October 22, 2004 3.2-STABLE Official Release


    ONLY the 3.2 Preview Release 1 is currently out!
    --
    .sig
    1. Re:DSPAM version 3.2 has _NOT_ been released by Anonymous Coward · · Score: 2, Informative

      Oh.. is that why the article says, "DSPAM's official release is next week, but you can download the preview release now"? I never, ever would have guessed.

  6. What about false positives. by Anonymous Coward · · Score: 5, Insightful
    From TFA, "around 99.95% (1 error in 2000)"

    I'm sick of spam filters bragging about their overall error rate. All of them do OK at getting rid of the bulk of spams and saving the bulk of time.

    The really important differentiating factor is how many false positives they mistakenly accuse of being spam.

    The consequences of a spam message getting through are minimal - under a second of time, on average, to skip them.

    The consequences of a non-spam getting blocked can be huge - loss of a customer - a mom not knowing her kid is in trouble.

    I wish the spam filters focused entirely on reporting how few false positives they produce.

    1. Re:What about false positives. by Scaba · · Score: 4, Funny
      The consequences of a non-spam getting blocked can be huge - loss of a customer - a mom not knowing her kid is in trouble.

      Dear Mom,

      I hope this email finds you well. All is fine here, out in your garage. As you know, I love working on my cars. I'm currently replacing the engine block in my '76 Trans Am. Well, wouldn't you know it, but just moments ago, this 550 lb engine block fell on my legs and I cannot stand up, and in fact, am probably bleeding to death. Luckily, I have my cell phone handy and so am able to send you this email - the marvels of technology!! Anyway, I know you only check your email about twice weekly, but when you do, please send help.

      Your loving son,

      Dexter

    2. Re:What about false positives. by Anonymous Coward · · Score: 2, Interesting

      even false positives are not important. if I get 1000 spams a day, but only 40 legitimate mails, then marking everything as spam is 96% correct. if 35 of my mails are easily identified as legitimate mail (a procmail rule could do - closed and filtered mailing list) then marking those as good and everything else as bad is 99.5% correct. note that still all 5 personal mails I would get are marked as spam.

      the big question for me is: how many mails do I need to check for false detection? and here is the dspam issue: it doesn't give you a grey marking, so you either check none of the mails marked as spam, and could possibly lose many important mails, or you need to check all of the spam messages, which lowers the advantage of a spam filter.

      i get a huge volume of spam, some ML and a few other postings. If i get one Word document from some Windows-using relative every few weeks, it still must not be marked as spam. with dspam i had to check the mails marked as spam to find such false positives. because i had to check all spam mails, dspam was useless. with spamassassin i only check level 1..5 and can ignore thousands of mails in spam level 5..10, which gives me a very good middle way between the very unlikely case of overlooking a false positive and thus losing a mail (hasn't happened as far as i know) and looking at only a few spam mails for false positives.
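      The arithmetic at the start of this comment is easy to reproduce. The sketch below uses the comment's own numbers (1000 spam and 40 legitimate mails a day); the function names are hypothetical. It shows why a headline accuracy figure can look excellent while every personal mail is still lost.

```python
def accuracy(spam: int, ham: int, ham_delivered: int) -> float:
    """Overall accuracy when all spam is caught and only ham_delivered ham gets through."""
    return (spam + ham_delivered) / (spam + ham)


def false_positive_rate(ham: int, ham_delivered: int) -> float:
    """Share of legitimate mail wrongly marked as spam."""
    return (ham - ham_delivered) / ham


# Mark everything as spam: high accuracy, but every legitimate mail is lost.
print(f"{accuracy(1000, 40, 0):.1%} accurate, {false_positive_rate(40, 0):.0%} of ham lost")

# Deliver only the 35 mailing-list mails: even higher accuracy,
# yet all 5 personal mails are still false positives.
print(f"{accuracy(1000, 40, 35):.1%} accurate, {false_positive_rate(40, 35):.1%} of ham lost")
```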

    3. Re:What about false positives. by -noefordeg- · · Score: 2, Informative

      I'm running a mailserver with postfix, dspam, squirrelmail, courier pop/imap, amavis and Postfix Admin where I also integrated the DSPAM phpControlCenter.

      DSPAM has currently given me 0 false positives.
      The clue with dspam is to start with a clean database for each user and let them start to 'sort out their spam'. For imap it's stupidly simple. Everyone has two folders "spam" and "notspam", where you can drag&drop an email to the right folder. A script picks up any emails in each folder every hour and does the necessary add-spam/not-spam processes.
      For pop it's just a matter of forwarding the email to add-spam/not-spam addresses.

      This works so very well because each user gets to decide which emails he thinks are spam and which emails he would like to receive.

      Also, if they log on to their webmail they can control what emails are marked as spam from their DSPAM phpControlCenter, and also correct any false positives, if there are any, or choose to block sender addresses and more.
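      The hourly pickup script is not shown in the comment; the sketch below is one way it might look, assuming the IMAP folders are stored as Maildir directories. `retrain_folder` and `dspam_feeder` are hypothetical names, and the `dspam` flags in `dspam_feeder` are an assumption that should be checked against the documentation of the installed version before use.

```python
import mailbox
import subprocess


def retrain_folder(maildir_path: str, feed) -> int:
    """Pass every message in a Maildir folder to feed(raw_bytes), then delete it.

    Returns the number of messages processed.
    """
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    count = 0
    for key in list(box.keys()):
        feed(box.get_bytes(key))
        box.remove(key)  # only reached if feed() did not raise
        count += 1
    box.close()
    return count


def dspam_feeder(user: str, label: str):
    """Return a feed function that pipes a message to the dspam CLI.

    ASSUMPTION: the flags below match your dspam version; verify before relying on them.
    """
    cmd = ["dspam", "--user", user, f"--class={label}", "--source=error"]

    def feed(raw: bytes) -> None:
        subprocess.run(cmd, input=raw, check=True)

    return feed
```

      Run from cron once an hour, something like `retrain_folder(path_to_spam_folder, dspam_feeder(user, "spam"))` would cover the "spam" folder, with a second call for "notspam"; the paths and labels here are placeholders.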

    4. Re:What about false positives. by BasilBrush · · Score: 2, Insightful

      Why isn't that relative in your address book? / Why don't you have whitelisting set up?

    5. Re:What about false positives. by BasilBrush · · Score: 2, Insightful
      With a 99.9% accuracy on spam filters, and better performance on false positives, it just isn't worth the time. On the occasional chance that you are sent something from an address that isn't in your address book, and also happens to be a false positive, the chance of it also being vitally important are slim. And if it is vitally important, the sender will in all probability chase you when you don't respond.

      There's a story about a CEO that used to sweep his pile of memos into the waste bin every morning. The theory being that 99% of them were about things that were irrelevant, and for the 1% of important stuff, people would chase him. I can't remember which CEO it was supposed to be though, and it's probably apocryphal. But it does hint at a truth. People who manually go through any amount of spam to search for false positives are probably being too anally retentive. Life is too short.

  7. Filters? by Anonymous Coward · · Score: 3, Funny

    Me've always found that the best filter still is the humble (and the not so humble) human :p

  8. Re:second post? by kalman5 · · Score: 4, Interesting

    What I dislike is centralized antispam. What is spam for me might not be spam for you. I was using the antispam filter in Thunderbird, but at least in previous versions it was not good, so I switched to K9 ( http://keir.net/k9.html ). Is there nothing like K9 around for Linux? K.

  9. did they fix the problems? by Anonymous Coward · · Score: 4, Interesting

    a few months ago those features were available, too. while dspam is great at filtering mail, I faced two crucial problems, which forced me back to spamassassin. I haven't heard that they fixed any of those:
    - the database did grow huge. when my single user server with 128 mb had to use a 512 mb spam token database, performance was terrible. even with the tools included I could not do anything to fix the issue.
    - dspam knows only yes or no, there is no usable value that gives you some grey information. as a result, I had to check all those spam postings for false positives. Spamassassin on the other hand has that spam result 0 .. 10, so I can check 0..4 where 0 is ok (few false negatives) and 1..4 spam (few false positives), and I can directly delete thousands of mails in 5..10 without looking at them.

    i won't go back to dspam unless someone can offer specific help for those issues. I believe everyone will face them sooner or later.
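    The score-band triage described in the second point can be sketched in a few lines. The cut-offs of 1 and 5 follow the comment's 0 / 1..4 / 5..10 bands; the `triage` function and the sample scores are made up for illustration and are not SpamAssassin's API.

```python
from collections import Counter


def triage(score: float) -> str:
    """Map a spam score to an action using the three bands from the comment."""
    if score < 1:
        return "inbox"    # treated as legitimate
    if score < 5:
        return "review"   # grey zone: a human checks for false positives
    return "discard"      # confident spam, never looked at


# Hypothetical scores for one batch of mail.
scores = [0.2, 0.0, 3.1, 4.9, 7.5, 9.8, 6.0]
print(Counter(triage(s) for s in scores))
```

    The point of the middle band is that only a small slice of the mail ever needs a human look, which is exactly what a plain yes/no verdict cannot offer.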

    1. Re:did they fix the problems? by Anonymous Coward · · Score: 2, Informative

      There are a lot of things you can (and should) do to keep small databases in DSPAM when disk is an issue. The problem is some of this is in the FAQ rather than the docs...but you can change your training mode to TOE (which only trains on error), set up merged groups (which uses a global db and then each user only stores corrections, almost as accurate), do some creative purging, and if you're really paranoid about disk, turn off some features like chained tokens (although I don't think it's necessary).

      As for a gray area, DSPAM has a confidence level (has for many versions now) which you can use to greylist messages, or you can set up classification networks and neural networks to have DSPAM consult other users' dictionaries (neural networks is kind of cool because it seeks out the most reliable users for classifying your mail).

      So yeah, it's done what you want for quite a while now. I've managed to get my system down to about 5MB per user using merged groups and TOE, and most of my users get 99.9% or better.

    2. Re:did they fix the problems? by hacker · · Score: 2, Informative
      "the database did grow huge. when my single user server with 128 mb had to use a 512 mb spam token database, performance was terrible. even with the tools included I could not do anything to fix the issue."

      Did you run the nightly and weekly purge scripts, as documented? (purge.sql for your DBI driver)

      Did you also change the model to TUM from the default? (MUCH more accurate results than TOE or TEFT in our case, and we get a lot of spam!)

      "dspam knows only yes or no, there is no usable value that gives you some grey information. as a result, I had to check all those spam postings for false positives."

      I'm not sure what this means, but I've never personally had this problem. dspam gives each spam a percentage, which I can sort on using the web interface. Those with a lower percentage "might be" spam, but need to be checked. Those with a higher percentage (confidence), ARE spam. After 6 months of running dspam, I hardly ever check the quarantine now, because they're all spam. It's learned what is and what is not spam, and delivers accordingly.

      I, like you, used SA for a year or two, and had it trained down to a 2.0 threshold (from the default of 5.0). I also had over 300 custom rulesets that blocked based on incoming subject at the MTA side, before even accepting the mail message and sending it to SA. I also used 13 RBLs. We were getting over 5,000 incoming spam a day, and about a dozen would slip through to the users' mailboxes. After 2 years and all of that, we were only at about 90% effectiveness (and yes, my SA rulesets were kept updated all the time).

      After 2 weeks of using dspam, we were already at 98%, and not a single spam had slipped through to any user's mailbox. Granted, in the early period of using it for us, some messages were marked as False Positives, but that hasn't happened for ANY user in several months now.

      We also stopped using the custom MTA rulesets, and don't use any RBLs either.

      dspam absolutely blows away SA (currently, until/unless SA changes) in our particular subset of the mail we receive.

    3. Re:did they fix the problems? by SendBot · · Score: 2, Informative

      the database did grow huge... ...performance was terrible.

      Did you try TOE mode? Instead of analyzing everything, it just uses the errors. That means significantly less utilization of your data backend. From the FAQ:

      Switch to TOE Mode. DSPAM v2.10 supports TOE (Train-On-Error) mode, which only performs writes to the database in the event that a misclassification has occurred (or if a user has fewer than 4000 innocent messages in corpus). Train-on-error mode should make a significant reduction in the number of writes (and therefore locks) being performed on your database, and may actually improve accuracy as TOE has been known to do so. The default mode of learning is TEFT (Train Everything). This performs a much more detailed training of incoming messages and can more easily adapt to new types of email behavior for users, but does use up a significant number of resources. This is a definite thing to try if you're bottlenecked!
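      To illustrate why TOE writes so much less than TEFT, here is a toy model. It is not DSPAM's algorithm: the classifier just compares raw token counts, the true label is assumed to be known immediately, and the "fewer than 4000 innocent messages" rule from the FAQ is ignored. `ToyFilter` is a hypothetical name.

```python
from collections import Counter


class ToyFilter:
    """Toy token-count filter that tracks how many database writes it performs."""

    def __init__(self, mode: str):
        self.mode = mode      # "TEFT" (train on everything) or "TOE" (train on error)
        self.spam = Counter()
        self.ham = Counter()
        self.writes = 0

    def classify(self, tokens) -> str:
        spam_score = sum(self.spam[t] for t in tokens)
        ham_score = sum(self.ham[t] for t in tokens)
        return "spam" if spam_score > ham_score else "ham"

    def process(self, tokens, truth: str) -> str:
        guess = self.classify(tokens)
        # TEFT records every message; TOE records only misclassified ones.
        if self.mode == "TEFT" or guess != truth:
            (self.spam if truth == "spam" else self.ham).update(tokens)
            self.writes += 1
        return guess


# Hypothetical stream: the same spam and the same legitimate mail, five times each.
stream = [(["cheap", "pills"], "spam"), (["meeting", "notes"], "ham")] * 5
for mode in ("TEFT", "TOE"):
    f = ToyFilter(mode)
    for tokens, truth in stream:
        f.process(tokens, truth)
    print(mode, "writes:", f.writes)
```

      Once the toy TOE filter has seen its first mistake it classifies the rest of the stream correctly and stops writing, which mirrors the reduction in writes and locks that the FAQ describes.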

  10. Re:second post? by flynn_nrg · · Score: 5, Informative

    It's your server and hopefully you'll never have to suffer the 'collateral damage' of living near a spammer (network neighbourhood wise). It has happened to me a couple of times. The first time I actually spent time sending my reply from my gmail account, and told the guy about it. The second time I didn't even bother.

    Netblock blacklisting is a really poor solution. In some cases a single spammer causes a /24 and then a /16 to be blocked. It doesn't make sense to me. OTOH, I discovered some time ago that blocking Windows boxes works wonderfully, and it's extremely easy to do with OpenBSD's pf :-)

    Btw, do you understand that changing ISP may not be an option?

  11. Does DSPAM inform the sender? by Axoiv · · Score: 3, Funny

    Does DSPAM inform the sender that his/her e-mail has been filtered out?

    1. Re:Does DSPAM inform the sender? by hayds · · Score: 4, Interesting

      No. Since spammers mostly use fake addresses, it's pretty pointless trying to send mail back to them. All that would achieve would be that you would receive all the bounces back and you'd get double the junk mail.

  12. Platforms... by Anonymous Coward · · Score: 2, Interesting

    The DSPAM site mentioned that it can be compiled on Mac OSX, but what about Winblows? I only have one box (go ahead and laugh) and it is an older Pentium III Winblows machine. I'd like to have a separate box to act as a mail server but it just isn't currently feasible (translation: I'm broke.) Is there any way they can compile DSPAM for Win9X?

  13. Re:second post? by Neophytus · · Score: 2, Informative

    Spamcop and Spamhaus I agree with. SORBS demand payment for removal of clean servers (albeit not to them). That just doesn't chime when people spam through an isp's smtp server and get caught.

  14. A little Harsh! by andyfaeglasgow · · Score: 2, Insightful

    Didn't your mother tell you that if you haven't anything nice to say, then don't say anything at all?

  15. Re:None of the above by iBod · · Score: 2, Insightful

    Well that may work for you but it doesn't work for businesses. Change your name every 6-9 months? I don't think so.

  16. Informative, yes... by warrax_666 · · Score: 2, Insightful

    but somewhat beside the point.

    I have to disagree with you on whether it's spam, however. Just making up statistics here, but I'd guesstimate that the sender address of >99.99% (probably even more) of all virus emails is forged and probably points at an innocent third party. That means that the message from the virus scanner is completely and utterly worthless to the recipient (i.e. the "sender" of the virus email). That makes it "junk" or "spam" in my book.

    You're right that there isn't much you can do, but I usually check to see if the mailer-daemon/postmaster address in the message looks legit and send off a boilerplate message saying something to the effect of "what you're doing is stupid and counterproductive, please stop".

    Hopefully SPF can stop some of this sender spoofing.

    --
    HAND.
  17. Call me bitter, but... by JohnGrahamCumming · · Score: 4, Informative

    Why does DSPAM get front page treatment when the latest POPFile release (which now handles POP3, IMAP, SMTP and NNTP filtering, has an XML-RPC external interface, supports different databases, etc. etc.) gets rejected as a story?

    Perhaps it's because I don't tend to make super-wild claims about POPFile's accuracy? Or come up with cool marketing names for the internal technology?

    POPFile's the only Bayesian filter that can:

    1. Do more than spam vs. anti-spam and
    2. Filter POP3, IMAP, SMTP and NNTP (that's right, Usenet news)

    Do I have an axe to grind with Jonathan and DSPAM? No, it's a cool project. Does it annoy me that /. has recently turned into some combination of Freshmeat and PC Magazine? Yes.

    John.

    1. Re:Call me bitter, but... by rsax · · Score: 2, Funny
      Do I have an axe to grind with Jonathan and DSPAM? No, it's a cool project. Does it annoy me that /. has recently turned into some combination of Freshmeat and PC Magazine? Yes.

      Do I like to ask questions aloud and then answer them myself? You bet.

      ;)

  18. Before filtering by Phatmanotoo · · Score: 2, Informative

    I got nothing against content-filtering measures, as long as one is aware that this should be just the last layer of defense against spam. Think about it, if your SMTP has already swallowed the spammer's email content, you have already lost precious bandwidth.

    Especially if you host your own SMTP, you should put up a layered system of defenses: RBL lists, maybe tarpitting, white/graylisting, and then content filtering.
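    A layered setup like this can be pictured as a chain of checks, cheapest first, where the first layer with an opinion decides. The sketch below is only a schematic: the blocked IP, the known sender, and the keyword test are toy stand-ins for a real RBL lookup, whitelist, and content filter, and tarpitting and greylisting are left out because they act on the SMTP conversation itself. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Layer = Tuple[str, Callable[[dict], Optional[str]]]

BLOCKED_IPS = {"192.0.2.66"}             # stand-in for an RBL lookup
KNOWN_SENDERS = {"friend@example.org"}   # stand-in for a whitelist


def rbl_layer(msg: dict) -> Optional[str]:
    return "reject" if msg["client_ip"] in BLOCKED_IPS else None


def whitelist_layer(msg: dict) -> Optional[str]:
    return "accept" if msg["sender"] in KNOWN_SENDERS else None


def content_layer(msg: dict) -> Optional[str]:
    # Stand-in for a statistical filter; a keyword test keeps the sketch short.
    return "quarantine" if "viagra" in msg["body"].lower() else None


LAYERS: List[Layer] = [
    ("rbl", rbl_layer),
    ("whitelist", whitelist_layer),
    ("content", content_layer),
]


def run_layers(msg: dict, layers: List[Layer] = LAYERS) -> str:
    """Return the verdict of the first layer that has an opinion."""
    for name, check in layers:
        verdict = check(msg)
        if verdict is not None:
            return f"{verdict} ({name})"
    return "accept (default)"
```

    Because the RBL layer runs before any message body is read, a rejected connection never reaches the content filter, which is the bandwidth argument made above.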

  19. Complaints come first. by khasim · · Score: 2, Interesting

    Netblock blacklisting is a really poor solution.

    It is the only solution when the ISP will do nothing to stop the spammer on their network.

    In some cases a single spammer causes a /24 and then a /16 to be blocked.

    That is rather difficult without the ISP's assistance (or them repeatedly ignoring the complaints).

    Btw, do you understand that changing ISP may not be an option?

    Sometimes that is true. In which case, you should get on the phone and make sure that your ISP understands that they have customers who will be upset if the ISP doesn't handle its spammer problem.

    Those lists, by themselves, do not block any email at all. Those lists are used by people who are fed up with trying to get ISPs to deal with their spammers.

  20. Re:Just use RBLs by HermanAB · · Score: 2, Interesting
    Yup - agreed - the best solution is a combination attack.

    Having users sort their mail and train a statistical filter from scratch is just way too much to ask - you'll get inundated with support calls and executives just don't have time to sort out the crud - they hired YOU to do it - passing the buck back to them ain't gonna fly...

    The system should get rid of 99.9% of the crud by default, then let the users who feel like doing it report the remaining 0.1% to a central mailbox where you can sort it and retrain the statistical filter if necessary.

    --
    Oh well, what the hell...