Slashdot Mirror


Sorting the Spam from the Ham

MrClever writes "The Sydney Morning Herald (Aust) is running an article about the merits of Bayesian filtering and a good plain-english description of how it works. Might be handy if you need to explain it to non-technophiles. The main thing that may be useful is a Bayesian spam filter written to drop straight into Outlook 2k/XP available here and written in Python by Mark Hammond." Math buffs might enjoy reading these pages or browsing this writeup and its many links.

6 of 249 comments (clear)

  1. Re:But without spam... by IthnkImParanoid · · Score: 3, Insightful

    You're getting modding as funny, but I just figured out exactly how true this is. My main email account is used primarily for work, so it was very easy to set up white lists for 30 or so email addresses with a few family and friends thrown in, and route to a special folder. I still check the default folder, of course, but I turned off notification for everything except the white folder.

    I went from checking my email every 5-10 minutes to a handful of times a day.

    --
    It's nothing but crumpled porno and Ayn Rand.
  2. So-so article by scottme · · Score: 3, Insightful

    For an article in an "IT tech" section of a paper, this is really very weak.

    It really doesn't do much more than precis Paul Graham's arguments, then ends in a blatant plug for just one Outlook addon.

    I suppose if there are still people in the column's audience who haven't heard this all before, and it gets the message out that spam can be effectively filtered, it's a minor goodness.

  3. Why stop at classifying spam? Why not all e-mail? by Anonymous Coward · · Score: 5, Insightful

    As I wrote only late last night, using Bayesian classification with only two categories (spam and "non-spam") is somewhat short-sighted, since if properly trained, a Bayes classifier can do a much better job than ordinary mail filtering (procmail, Mozilla or Mail.app filters, you name it).

    In fact, if I had to bet on the next "killer apps", mail sorting and RSS filtering based on Bayesian classification would be right at the top of my list, based solely on the actual time-saving benefits for users. And I can't see any reason for Bayesian filtering not being included in Mozilla Mail and Apple's own (revamped) Mail.app.

    I have to use Outlook at work, and after setting up Outclass (which requires POPfile) with several "buckets" to classify my corporate e-mail by project and field, I'm definetly not going back. Outlook, even with extensive use of Rules Wizard and categories, simply cannot cope with the diverse kinds of project-related e-mail I swap with colleagues, and Outclass is the only thing I could find that could deal with Exchange, PST folders and multiple Bayesian "buckets" categories.

    Come on, do the right thing and tell Apple and The Mozilla Project that you want configurable Bayesian filtering on their mail clients.

  4. Re:Spambayes by AssFace · · Score: 4, Insightful

    I have seen all of the local client software and I personally have never bothered with it.

    I always felt that the whole point of spam being annoying was that it wasted bandwidth. It gets sent to my server, and then I have to download it all from my server, and then it gets sorted away from my eyes in my client.

    It is fairly trivial if you get enough regular mail for it to matter, and you are on a fast connection.

    But I can't tell you how annoying it is to be on a slow dial-up connection and download 50 messages and then see that they all got filtered into the spam folder and that there were no "real" messages.
    While there is a nice feeling of seeing them all get caught, it is annoying to have to wait for a download (and pay for it) and then get no return on the investment.

    That is why I always try to have the spam blocking on the server side. Although I now spend most of my time using ssh into my server and that way it isn't downloading all of the mail until I want to see something.

    Perhaps if I combine the fact that I have SA on the server, and then if I also had a client side option, I would get everything properly blocked that way (the only reason stuff gets through my server setup right now is if the server is under a high load, then my SA script will time out and the mail gets through).

    --

    There are some odd things afoot now, in the Villa Straylight.
  5. Re:This is bad news!!! by Mikey-San · · Score: 4, Insightful

    I know your post was meant to be funny, but it brings up a point:

    So what? If more computer products benefit, don't we all? Anything that makes Outlook better is good in my book. Perhaps this will eliminate some virus-and-worm-carrying spam--and that's good for /all/ of us on teh intarweb. ;-)

    --
    Mikey-San
    Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
  6. Not it's not... by Goonie · · Score: 4, Insightful
    Client side filtering is not an ideal spam solution, but it's a good thing on both a micro and macro scale.
    • For the 99% of people who don't respond to spam, it makes no difference to the spammer whether they filter it or delete it manually. At an individual level, it reduces the amount of spam I have to deal with to managable levels.
    • For the 1% that *do* respond to spam, having a filter might reduce the amount of spam they respond to and thus reduces the financial rewards for spammers. Anything that reduces the financial rewards for spammers is going to help reduce the spam levels.
    • If spammers are spending all their time and money figuring out how to beat filters, that's time and money that they're not using to send spam.

    As for your indictment of spam filtering providers, could you please explain where the spamassassin devteam is making money?

    My choices with regards to spam at the moment are simple. Use spamassassin or something like it, or wade through spam myself. I know which I'd prefer.

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)