Slashdot Mirror


Working Bayesian Mail Filter

zonker writes "A real, working honest to god Bayesian spam filter. I've been waiting for something like this for a while (since I first read Paul Graham's research paper on this very topic a few weeks ago). Well here's POPFile, a small but extremely effective Perl script that runs on just about any system Perl does. After just a little training was I able to get very effective filtering out of it. From what I understand the new email client that comes with OS X Jaguar has a feature similar to this, but I don't know if it is true Bayesian. Hopefully this kind of feature will become more prevalant in client software as I see the Google results for it are growing."

6 of 312 comments (clear)

  1. Sure it's promising by bigberk · · Score: 4, Insightful

    And I'm going to check it out right now :) But one long standing I fear with such solutions is spammer's adapting to new environments (changing wording used, making the emails look more professional). Sure, they're dumb shits but they're still humans with brains.

    1. Re:Sure it's promising by tsg · · Score: 4, Insightful

      Any solution that requires spammers to be more clever is going to reduce the number of spammers. And that is the end goal.

      --
      People's desire to believe they are right is much stronger than their desire to be right.
  2. That Google search... by Jugalator · · Score: 4, Insightful

    Try searching for "bayesian email filter" instead of just "bayes email filter" (as in the news post). You'll get better results and more hits since Google doesn't match "*bayes*" (as one would think) when searching for "bayes", but only the actual word "bayes".

    --
    Beware: In C++, your friends can see your privates!
  3. Professional Looking Spam May Be Impossible by Bob9113 · · Score: 4, Insightful

    This may be self-regulating. Consider the Skinner box; if something is capable of perfectly emulating recognition of Chinese, then it can be said to recognize Chinese. Likewise, if a spammer becomes sufficiently skilled at writing undetectable prose, he or she will have reached a skill level at which he or she can pursue more profitable writing ventures. The margins in spam are pretty small. Those spams are being written by morons because morons are cheap.

  4. this battle cannot be won by mboedick · · Score: 4, Insightful

    These technologies are interesting, but the problem of spam should be solved at the source. Why should we waste our time, money, CPU and drive space trying to outwit spam with clever software? As has been said before, if you filter spam at the inbox, a lot of resources have already been wasted by the time it arrives.

    Spam is anti-social behavior - a perversion of technology to make a quick buck. It's a cancer, and we should try to kill it. If you try to fight it any other way, you will constantly be playing catch-up, as the spammers have technology on their side too.

    1. Re:this battle cannot be won by shayne321 · · Score: 4, Insightful

      These technologies are interesting, but the problem of spam should be solved at the source.

      And how do you propose we solve the problem at its source? Make it illegal? They'll just find loopholes in the law and/or move to a country where it is legal. Hunt them down and murder their wife and kids in front of them then hang them from a tree? Satisfying though it may be, last I checked murder was illegal.

      Techniques like this CAN eventually solve the problem.. As others have pointed out, for someone to buy something from a spammer they have to READ the spam. If they send out 1 million spams and 500,000 read them and 20 of them buy something, they'll keep doing it. If they send out 1 million and only 500 people read it and 1 person buys something, they'll loose their source of income and have to find a new line of work.

      Also, for each obstacle we put in their way (checksum databases, open relay databases, filters, etc) it costs them more time, effort and therefore, money to send their crap - all for less income.

      Shayne

      --
      Today I didn't even have to use my AK; I got to say it was a good day -- Icecube