Slashdot Mirror


Mozilla Adding Spam Filters

ksheka writes "Mozilla Mail now has spam filters, using the Bayesian filtering method, no less. This is a very good thing, because it learns from the spam you receive and constantly modifies itself based on new spammer techniques!"

11 of 464 comments

  1. A little misleading by TobyWong · · Score: 5, Informative

    The news article makes it sound like this feature is up and running; in reality it is only partially phased in: alpha-stage stuff.

    It will be great when it's more complete but there is a lot of work to do yet.

    --
    - Toby
    1. Re:A little misleading by DeadSea · · Score: 5, Informative

      It is up and running; it just may have a few bugs.

      I just downloaded the latest nightly build and enabled the feature for my mail. So far I have noticed that the icons are kind of funky, the dialog box is way oversized, and there doesn't appear to be a good way to mark multiple messages as spam or not spam.

      On the other hand, it does seem to be doing a good job of filtering my messages. If you were one of the folks who complained about Mozilla until Mozilla 1.1 or 1.2, I wouldn't go near it with a ten-foot pole. If you are one of the folks, like me, who have used Mozilla since Milestone 11, when it crashed every hour and couldn't render a heck of a lot of pages, you'll probably want to try it, especially if you already use Mozilla for mail.

  2. Re:Filtering by Gabe+Garza · · Score: 5, Informative

    Actually, using only the body isn't just a hack; it's a relatively new technique described by Paul Graham that seems to produce excellent results. It makes a lot of sense: spam is spam because the body contains commercial or otherwise unwanted material, so it's only natural that the most direct and accurate spam filters will analyze the body. Bayesian classification like this is computationally tractable and appears to work. You can read more about it here.
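
A minimal sketch of the kind of body-only Bayesian scoring described above, loosely following the scheme in Graham's essay as I understand it: estimate a per-token spam probability from counts in spam and non-spam corpora, then combine the most "interesting" tokens (those furthest from 0.5). The tokenizer, the 5-occurrence cutoff, the 0.4 default for unseen tokens, and the 15-token window are assumptions for illustration, not Mozilla's actual code.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of letters, digits, apostrophes, dollar signs and dashes.
TOKEN_RE = re.compile(r"[a-z0-9'$-]+")

def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())

def token_probs(spam_bodies: list[str], ham_bodies: list[str],
                min_count: int = 5) -> dict[str, float]:
    """Estimate P(spam | token) for each sufficiently common token."""
    bad = Counter(t for body in spam_bodies for t in tokenize(body))
    good = Counter(t for body in ham_bodies for t in tokenize(body))
    nbad, ngood = max(len(spam_bodies), 1), max(len(ham_bodies), 1)
    probs = {}
    for tok in bad.keys() | good.keys():
        g = 2 * good[tok]            # double-count ham to bias against false positives
        b = bad[tok]
        if g + b < min_count:        # too rare to say anything useful
            continue
        rb = min(1.0, b / nbad)
        rg = min(1.0, g / ngood)
        probs[tok] = max(0.01, min(0.99, rb / (rg + rb)))
    return probs

def spam_score(body: str, probs: dict[str, float], unknown: float = 0.4,
               n_interesting: int = 15) -> float:
    """Combine the most 'interesting' token probabilities (furthest from 0.5)."""
    ps = sorted((probs.get(t, unknown) for t in set(tokenize(body))),
                key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spamminess = math.prod(ps)
    hamminess = math.prod(1 - p for p in ps)
    return spamminess / (spamminess + hamminess)
```

A message would then be treated as spam when its score exceeds some threshold (0.9, say, which is also an assumption here). Doubling the ham counts is a deliberate bias: misfiling a real message is worse than letting one spam through.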

  3. Since some of us run Windows, by Dot.Com.CEO · · Score: 5, Informative
    I dare to submit myself to the rage of the Slashdot crowd. I use Outlook, and "Spamnet" is a way to stop most spam on Windows. Based on the Razor project (distributed spam detection), it is a great solution for whoever cannot or does not want to move to Mozilla. Granted, it is beta quality, but the Mozilla feature is still in the alpha stage.

    --
    Mother is the best bet and don't let Satan draw you too fast.
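
Razor's core idea, as I understand it, is that users report spam to a shared catalog and other clients look up incoming messages against it. The toy sketch below simulates that locally; the exact SHA-1 of a whitespace- and case-normalized body and the `SharedCatalog` class are hypothetical stand-ins, not Razor's real signature scheme or network protocol.

```python
import hashlib

def body_signature(body: str) -> str:
    # Collapse whitespace and case so trivial reformatting doesn't change the hash.
    normalized = " ".join(body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

class SharedCatalog:
    """Hypothetical in-memory stand-in for a networked catalog of spam reports."""

    def __init__(self) -> None:
        self._reports: dict[str, int] = {}

    def report(self, body: str) -> None:
        # A user who received this message marks it as spam.
        sig = body_signature(body)
        self._reports[sig] = self._reports.get(sig, 0) + 1

    def is_known_spam(self, body: str, min_reports: int = 2) -> bool:
        # Require several independent reports before trusting a signature.
        return self._reports.get(body_signature(body), 0) >= min_reports
```

An exact hash is easy to defeat by varying a few words per copy, which is why real systems of this kind use fuzzier signatures.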
  4. Re:zilla by Neon+Spiral+Injector · · Score: 5, Informative

    It is so annoying to get an e-mail without a subject. My spam filters actually bump you a little bit closer to being considered spam if there is no subject. I consider it to be a required header.

    For one, I sort my mail by thread. While Mozilla uses the References header to thread messages, the fallback is the subject. Without a subject, your message would be tossed into the thread with the other losers who also forgot their subject.

    The easy way to keep that dialog box from popping up when you send mail is to... put a subject on the message.

    If you want a spell checker, go to the Netscape FTP server, find the XPI file for the spell checker, and install it.
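
To illustrate the threading fallback described above, here is a simplified sketch: a message joins the thread of any earlier message listed in its References header, and otherwise falls back to a normalized subject. The `build_threads` function and its dict-based message format are hypothetical; Mozilla's real threading logic is more involved. Note how two unrelated messages with empty subjects end up in the same thread.

```python
import re

def normalize_subject(subject: str) -> str:
    # Strip any leading "Re:" prefixes and compare case-insensitively.
    return re.sub(r"^\s*(re:\s*)+", "", subject, flags=re.IGNORECASE).strip().lower()

def build_threads(messages: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group Message-IDs into threads: References first, Subject as a fallback."""
    thread_of: dict[str, str] = {}       # Message-ID -> thread key
    threads: dict[str, list[str]] = {}   # thread key -> Message-IDs in arrival order
    for msg in messages:
        key = None
        # Walk References from the nearest ancestor back toward the root.
        for ref in reversed(msg.get("References", "").split()):
            if ref in thread_of:
                key = thread_of[ref]
                break
        if key is None:
            # No known ancestor: fall back to the normalized subject.
            key = "subject:" + normalize_subject(msg.get("Subject", ""))
        thread_of[msg["Message-ID"]] = key
        threads.setdefault(key, []).append(msg["Message-ID"])
    return threads
```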

  5. So you really want... by dpilot · · Score: 5, Informative

    You really want server-side filtering. I do that on my IMAP server with procmail, though not Bayesian. A quick Google search for "procmail bayesian filter" turns up quite a bit of interesting stuff to sift through. Of course, if it's not your IMAP server, you're back to client-side solutions.

    --
    The living have better things to do than to continue hating the dead.
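
A minimal sketch of a server-side filter step in Python, assuming procmail pipes each incoming message through it and then files on the added header. The keyword list is a crude, hypothetical stand-in for a real (possibly Bayesian) classifier, the missing-Subject bump echoes the previous comment, and the `X-Spam-Flag` header name is a common convention chosen here as an assumption.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default

# Hypothetical stand-in for a real classifier; swap in something Bayesian.
SPAM_WORDS = ("viagra", "mortgage", "unsubscribe")

def spam_score(msg: EmailMessage) -> int:
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content().lower() if body is not None else ""
    score = sum(word in text for word in SPAM_WORDS)
    if not msg.get("Subject", "").strip():
        score += 1                   # a missing Subject nudges the message toward spam
    return score

def tag_message(raw: str, threshold: int = 2) -> str:
    """Return the message with an X-Spam-Flag header added (YES or NO)."""
    msg = message_from_string(raw, policy=default)
    flag = "YES" if spam_score(msg) >= threshold else "NO"
    del msg["X-Spam-Flag"]           # never trust a flag the sender set
    msg["X-Spam-Flag"] = flag
    return msg.as_string()
```

To use it as a script, the entry point would call `sys.stdout.write(tag_message(sys.stdin.read()))`; procmail could then run it through a filter recipe (`:0 fw` followed by `| python3 tag_spam.py`, where the script name is hypothetical) and file on `^X-Spam-Flag: YES` in a later recipe.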
  6. "Bayesian filtering" aka "Naive Bayes" by ghamerly · · Score: 5, Informative

    This approach is more commonly called "Naive Bayes" classification in the field of machine learning. Each word is treated as a feature (dimension), and the method is naive because it assumes each word in an email is conditionally independent of all other words in the document, given the class (which is not true, but really useful in practice).

    The author of the web page on using this technique to classify spam (Paul Graham) has a better explanation of Naive Bayes on this web page.

    I've written my own naive Bayes classifier to identify spam, with less positive results than he reports. However, naive Bayes can be a very effective technique, and I can believe his results.

    The two things you have to beware of when using it are "smoothing" the probabilities of words you've never seen (you don't want them to be zero, as straight naive Bayes will give you, because a single zero wipes out the whole product of word probabilities), and needing LOTS of training data for naive Bayes to work well. That means you need to already have a fair amount of spam to identify spam well.

    You can see a paper I wrote on using naive Bayes to classify hard drive failures here, or look for more on naive Bayes on Google. Also, don't reinvent the wheel: Andrew McCallum has written a very good toolkit for doing these sorts of things, called Bow.
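
The smoothing pitfall above can be sketched in a small multinomial naive Bayes classifier. Add-one (Laplace) smoothing is one common choice and is an assumption here; the comment doesn't say which smoothing its author used. Log probabilities are summed instead of multiplying raw probabilities, which avoids floating-point underflow on long messages.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes over words, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, words: list[str], label: str) -> None:
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1

    def log_posterior(self, words: list[str], label: str) -> float:
        # Unnormalized log P(label | words); both classes must have training data.
        vocab = self.word_counts["spam"].keys() | self.word_counts["ham"].keys()
        denom = sum(self.word_counts[label].values()) + self.alpha * len(vocab)
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            # Smoothing keeps a word never seen in this class from zeroing the product.
            logp += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return logp

    def classify(self, words: list[str]) -> str:
        return max(("spam", "ham"), key=lambda c: self.log_posterior(words, c))
```

With tiny training sets like the one in a quick test, results swing on a single word, which is the "LOTS of training data" point in practice.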

  7. Re:Arms Race by Spock+the+Vulcan · · Score: 5, Informative

    Use Gotmail, which downloads your Hotmail messages to an mbox-style file. Or use hotwayd, which acts as a POP3 server running on localhost and uses WebDAV to fetch messages from Hotmail (as Outlook Express does). Either way, no web bugs will get activated.

    The added advantage is that you can pipe these through procmail/spamassassin just like ordinary incoming mail, and not have to manually delete all that spam.
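
A minimal sketch of pulling messages from a local POP3 server such as hotwayd with Python's standard `poplib`, so they can then be handed to procmail or SpamAssassin. The host, port 110, and credentials are assumptions (check how your hotwayd is set up); `pop_factory` is a hypothetical parameter that exists only so the function can be exercised without a server.

```python
import poplib

def fetch_all(host: str = "localhost", port: int = 110, user: str = "",
              password: str = "", pop_factory=poplib.POP3) -> list[bytes]:
    """Download every message from a POP3 server, leaving them on the server."""
    conn = pop_factory(host, port)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _mailbox_size = conn.stat()
        messages = []
        for i in range(1, count + 1):        # POP3 message numbers start at 1
            _resp, lines, _octets = conn.retr(i)
            messages.append(b"\n".join(lines))
        return messages
    finally:
        conn.quit()
```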

  8. Re:How? by mark_lybarger · · Score: 5, Informative

    Under Preferences -> Privacy & Security -> Images, you can turn off images in Mozilla entirely, or only in mail/news.

  9. Re:Microsoft's Patent by DaveAtFraud · · Score: 5, Informative
    This is from Paul Graham's site, regarding the Microsoft patent. Patents tend to be very narrow in scope, so if some aspects change, the patent may no longer apply. Pick any typical consumer product, such as hair dryers, stereos, you name it. They all have patents, they're all different, and they don't "infringe" on each other unless they're virtually identical.

    --
    They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
    Ben
  10. Re:Microsoft's Patent by DeadSea · · Score: 5, Informative
    Specifically in this case:
    ...then stored in a corresponding folder for subsequent retrieval by and display to the recipient.
    So it looks as if this patent only covers server-side implementations. A client-side implementation (like Mozilla's) retrieves the message and then filters and displays it.