Slashdot Mirror


Mozilla Adding Spam Filters

ksheka writes "Mozilla mail now has Spam Filters, using a Bayesian filtering method, no less. This is a very good thing, because it learns from the spam you receive and constantly modifies itself based on new spammer techniques!"

41 of 464 comments

  1. A little misleading by TobyWong · · Score: 5, Informative

    The news article makes it sound like this feature is up and running, in reality it is partially phased in - alpha stage stuff.

    It will be great when it's more complete but there is a lot of work to do yet.

    --
    - Toby
    1. Re:A little misleading by Jucius+Maximus · · Score: 1, Informative
      "The news article makes it sound like this feature is up and running, in reality it is partially phased in - alpha stage stuff."

      I wouldn't expect it to be in the next milestone release (1.2) either, which btw should be out any day now.

    2. Re:A little misleading by DeadSea · · Score: 5, Informative

      It is up and running, it just may have a few bugs.

      I just downloaded the latest nightly build and enabled the features for my mail. So far I have seen that the icons are kind of funky, the dialog box is way oversized, and there doesn't appear to be a good way of marking multiple messages as spam or not spam.

      On the other hand, it does seem to be doing a good job of filtering my messages. If you were one of the folks that complained about mozilla until mozilla 1.1 or 1.2, then I wouldn't go near it with a ten foot pole. If you are one of the folks, like me, who used mozilla since milestone 11 when it crashed every hour and couldn't render a heck of a lot of pages, you'll probably want to try it, especially if you use mozilla for mail anyway.

  2. Re:didn't k5 already run a story on this? by Junks+Jerzey · · Score: 3, Informative

    Here [kuro5hin.org]. Yeah, it's basically the same thing.

    Yes, and your point is? Hint: Slashdot gets most of its stories from elsewhere.

  3. Re:didn't k5 already run a story on this? by Anonymous Coward · · Score: 1, Informative

    what the hell? That's a totally unrelated story! Has nothing to do with computers either.

  4. Re:Arms Race by Anonymous Coward · · Score: 2, Informative

    It doesn't work that way. Each person has their own Bayesian 'filter' so each person's tolerance for spam will be completely different.
    For instance, someone who often receives links to web pages from strangers will have a filter that lets through more spam than the filter of someone who never receives links from strangers.

  5. Re:102 Features IE doesn't have by Yuan-Lung · · Score: 2, Informative

    No, it is not, but Outlook Express, an application distributed with IE, is.

  6. Re:Filtering by Gabe+Garza · · Score: 5, Informative

    Actually, using only the body isn't just a hack, it's a relatively new technique invented by Paul Graham that seems to produce excellent results. It makes a lot of sense: Spam is Spam because the body contains commercial or otherwise unwanted material--it's only natural that the most direct and accurate Spam filters are going to analyze the body. Bayesian classification like this is computationally tractable and appears to work. You can read more about it here.

  7. Re:I just started up popfile by netringer · · Score: 2, Informative

    I've been running popfile for just a couple of weeks. It's working amazingly well.

    The fun thing is when it works on its own, like when you get a message from a subscribed list that it has never seen before and it knows that it ISN'T spam.

    With popfile working so well I'm not in a hurry to have Bayesian filters built into the mail client.

    Has anybody tried sharing the history data between Windows and Linux clients on a dual boot machine?

    --
    Ever dream you could fly? Get up from the Flight Sim. I Fly
  8. Since some of us run Windows, by Dot.Com.CEO · · Score: 5, Informative
    I dare submit myself to the rage of the Slashdot crowd. I use Outlook and "Spamnet" is a way to stop most spam in Windows. Based on the Razor project (distributed spam detection), it is a great solution for whoever cannot or does not want to move to Mozilla. Granted, it is beta quality, but the Mozilla feature is still in the alpha stage.

    --
    Mother is the best bet and don't let Satan draw you too fast.
  9. Re:Mozilla mail / browser by SethJohnson · · Score: 4, Informative


    The site-specific white list feature of Mozilla's pop-up blocking seems to work well enough. The number of sites you actually want popups from is far smaller than the number offering popups, so manually adding these exceptions to the white list is not such an annoying task. I think bayesian filtering would be overkill in this case.
  10. Re:Mozilla mail / browser by pVoid · · Score: 2, Informative

    It would be much harder because an image doesn't have 'content'. At least text content.

    URLs are generally cryptic numbers, so that even humans can't decipher what they are.

    There are certain apps out there, though (such as Norton Personal FW), that let you block a certain ad from ever popping up again, which I find very cool.

  11. Re:Arms Race by Camel+Pilot · · Score: 3, Informative
    Actually they do have your data. If you preview any e-mail they typically have something like
    <img src=/spamcity/tracker.pl?id=177729299>
    Where 177729299 is your personal id number.

    Now they have the feedback and they know what works and what does not.
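
    As a rough illustration of the tracking trick described above, here is a minimal Python sketch that lists the images in an HTML mail body whose URL carries query parameters, such as the id in the example. The names ImageFinder and find_tracking_images are made up for this sketch, and treating every image URL with a query string as suspect is an assumption, not how any particular mail client works.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ImageFinder(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_tracking_images(html_body):
    """Return (src, query parameters) for images whose URL has a query string.

    Assumption: an image URL with query parameters may identify the
    recipient, so it is flagged rather than fetched.
    """
    finder = ImageFinder()
    finder.feed(html_body)
    suspects = []
    for src in finder.sources:
        params = parse_qs(urlparse(src).query)
        if params:
            suspects.append((src, params))
    return suspects
```

    For the example above, find_tracking_images('<img src=/spamcity/tracker.pl?id=177729299>') should flag the image and report the id parameter. A mail client that never fetches such images on preview gives the sender no feedback.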

  12. Re:Great gob Mozilla, but... by RAMMS+EIN · · Score: 3, Informative

    I partially agree with you. Compiling does allow you to get a slimmer lizard. However, compiling it from scratch is a real pain, and takes a long time and a lot of disk space. My point is that it's probably not worth the effort for most people. Why waste time and disk space on building a slimmed-down Mozilla if you can download a more functional precompiled version? This is why I love modularity so much; every module can be offered precompiled, and nobody needs to waste disk space.

    --
    Please correct me if I got my facts wrong.
  13. Re:zilla by Neon+Spiral+Injector · · Score: 5, Informative

    It is so annoying to get an e-mail without a subject. My spam filters actually bump you a little bit closer to being considered spam if there is no subject. I consider it to be a required header.

    For one, I sort my mail by thread. While Mozilla will use reference headers to thread messages, the fallback is the subject. Without a subject your message would be tossed in the thread with the other losers who also forgot their subject.

    The easy way to keep that dialog box from popping up when you send a mail is to...put a subject on the message.

    If you want a spell checker, go to the Netscape FTP server, find the XPI file for the spell checker, and install it.

  14. Outlook is part of the IE Package by yerricde · · Score: 4, Informative

    E-mail is Outlook's domain. Not IE.

    It's possible to net-install Mozilla without installing Mozilla Mail, but the default setting includes both. It's possible to net-install IE without installing Outlook Express, but the default setting includes both. Thus, it is a fair comparison.

    100. Bugzilla - OK, lots of people use this, but Bugzilla != Mozilla. So it's not like Mozilla has built-in Bugzilla features... This is unrelated to the list.

    I think the point of that entry was that unlike IE's bug database, which only Microsoft employees see, Mozilla's bug database is 99% open to the public (the other 1% primarily covers unfixed security vulnerabilities).

    --
    Will I retire or break 10K?
  15. Re:102 Features IE doesn't have by GreyPoopon · · Score: 2, Informative
    However, I've heard that popup blockers and tabbed browsing are making their way into IE

    It'll be nice to have this, but this is really just another good argument for competition and choice. If Mozilla (and Opera) didn't have this first, how long would it have been before the features came to IE? The same can be said for things that appeared in IE first and finally made their way to Netscape / Mozilla. This is why it's really nice to have some choices.

    --

    GreyPoopon
    --
    Why is it I can write insightful comments but can't come up with a clever signature?

  16. Eudora finally has the filter I need by Continental+Drift · · Score: 3, Informative
    Eudora's latest version, 5.2, includes the ability to filter mail against your address book. If someone sends me mail and they are not in that address book or they don't use a special key word in the subject line, they get an automatic reply telling them to try again with that key word. Spammers will ignore that reply, so only real people will include the key word, and then I can add them to my address book.

    This, combined with some clever regex filters I already had, means that I can reliably get the 10% of my mail that I actually want to read.
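
    The address-book check described above could be sketched roughly like this in Python. This is not Eudora's implementation; the function name triage, the return values "deliver" and "challenge", and the sample key word are all hypothetical.

```python
def triage(sender, subject, address_book, keyword):
    """Decide what to do with an incoming message (illustrative only).

    Known senders and messages carrying the agreed key word are
    delivered; everyone else gets the automatic "try again" reply.
    """
    known = {address.strip().lower() for address in address_book}
    if sender.strip().lower() in known:
        return "deliver"
    if keyword.lower() in subject.lower():
        return "deliver"
    return "challenge"


# Example: only the stranger without the key word is challenged.
book = {"alice@example.com"}
for sender, subject in [
    ("alice@example.com", "lunch?"),
    ("bob@example.net", "[pineapple] long time no see"),
    ("deals@example.org", "Buy now"),
]:
    print(sender, "->", triage(sender, subject, book, "pineapple"))
```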

  17. Re:MSN 8 rules, Mozilla Sucks by lovebyte · · Score: 2, Informative

    I get spam filtering for free on Yahoo!

    --

    I'll do it for cheesy poofs.

  18. Re: Mozilla bloat [...] Gentoo by delta407 · · Score: 3, Informative
    Before you Gentoo zealots get out here and plug your so-loved-distro, remember that even you don't have as much control as you could.
    I disagree. See the Mozilla 1.1 ebuild for details. I can write:
    # export USE="moznomail"; emerge mozilla
    Or, if the ebuild still doesn't provide enough customization, I can just manually remove a config option (say, --enable-xsl) and "emerge mozilla" to get exactly what I want.
  19. So you really want... by dpilot · · Score: 5, Informative

    You really want server-side filtering. I do that on my IMAP server with procmail, though not Bayesian. A quick google with "procmail bayesian filter" turns up quite a bit of interesting stuff to sift through. Of course if it's not your IMAP server, you're back to client-side solutions.

    --
    The living have better things to do than to continue hating the dead.
  20. "Bayesian filtering" aka "Naive Bayes" by ghamerly · · Score: 5, Informative

    This approach is more commonly called "Naive Bayes" classification in the field of machine learning. It is naive because it considers each word to be a feature (dimension), but it also considers each word in an email to be conditionally independent of all other words in the document (which is not true, but really useful in practice).

    The author of the web page on using this technique to classify spam (Paul Graham) has a better explanation of Naive Bayes on this web page.

    I've written my own naive Bayes classifier to identify spam, with less positive results than he reports. However, naive Bayes can be a very effective technique, and I can believe his results.

    The two things you have to watch when using it are "smoothing" the probabilities of words you've never seen (you don't want them to always be zero, as straight naive Bayes will give you), and the need for LOTS of training data for naive Bayes to work well. That means that you need to already have a fair amount of spam to identify spam well.

    You can see a paper I wrote on using naive Bayes to classify hard drive failures here, or look for more stuff on naive Bayes on Google. Also, don't reinvent the wheel: Andrew McCallum has written a very good toolkit for doing these sorts of things in Bow.
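
    To make the smoothing point concrete, here is a small Python sketch of a naive Bayes classifier with add-one (Laplace) smoothing. It is an illustration only, not Paul Graham's exact formula or Mozilla's implementation; the class name, the "spam"/"ham" labels, and the whitespace tokenization are assumptions made for the sketch.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy two-class naive Bayes with additive smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is add-one smoothing
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, label, text):
        # Requires at least one training message per label,
        # otherwise the prior below would be log(0).
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        # The extra +1 reserves probability mass for never-seen words.
        denom = total_words + self.alpha * (len(self.vocab) + 1)
        for word in text.lower().split():
            count = self.word_counts[label][word]
            # Each word is treated as independent of the others: the
            # "naive" assumption. Smoothing keeps unseen words above zero.
            score += math.log((count + self.alpha) / denom)
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(label, text))
```

    Working in log space avoids underflow on long messages. Without the alpha term, a single word never seen in one class would drive that class's probability to zero no matter what the rest of the message says.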

  21. Re:Arms Race by Spock+the+Vulcan · · Score: 5, Informative

    Use Gotmail, which downloads your hotmail messages to an mbox-style file. Or use hotwayd which appears like a POP3 server running on localhost, and uses WebDAV to get messages from hotmail (like Outlook Express). Either way, no web-bugs will get activated.

    The added advantage is that you can pipe these through procmail/spamassassin just like ordinary incoming mail, and not have to manually delete all that spam.

  22. Re:How? by mark_lybarger · · Score: 5, Informative

    Preferences -> Privacy & Security -> Images, you can turn off images in mozilla, or only in mail/news.

  23. Re:interesting idea... by Dunkirk · · Score: 3, Informative

    It's called vipul's razor.

    --
    Acts 17:28, "For in Him we live, and move, and have our being."
  24. Re:Microsoft's Patent by DaveAtFraud · · Score: 5, Informative
    This is from Paul Graham's site with regard to the Microsoft patent. Patents tend to be very narrow in scope such that, if some aspects change, the patent may no longer apply. Pick on any typical consumer product such as hair dryers, stereos, you name it. They all have patents and they're all different and they don't "infringe" on each other unless they're virtually identical.

    --
    They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
    Ben
  25. Re:Its still too slow... by casio282 · · Score: 3, Informative

    IE starts up quickly in Windows because it is loaded into memory at system start up and runs in the background. When you "start" the program you are simply creating a new browser window. So you suffer the program start-up overhead when the system boots, instead of each time you create a new instance.

    The good news, for those inclined to sacrifice system performance for quick browser load times, is that this option is also available in Mozilla. Look under "Preferences...Advanced" for the Quick-launch option.

    --

    :wq
  26. Re:Good example of MS's monopoly abuse by bamm · · Score: 2, Informative

    I see your bet and raise you an infinite number of software and hardware developers.

    Installed anything on an MS platform lately? Everyone wants your email address, so they can give you better "support" by selling your info to hordes of spam artists. For instance, my Mom doesn't use Windows because it's easier to use or crashes less often. She uses Windows because CreateCard12 doesn't run on Linux. Her new printer didn't come with Linux drivers and neither did her scanner. MS retains its monopoly status for those reasons and it isn't about to jeopardize its relationship with these software and hardware companies by helping prevent spam (unless they get $30 a month from you).

    Bammkkkk

    --
    www.sguil.net
    The Analyst Console for NSM
  27. Re:102 Features IE doesn't have by bugbear · · Score: 2, Informative

    IE doesn't do real spam filtering yet, but MSN 8 now does content-based filtering that learns by example. Since they brag that it uses a "patented" algorithm, I assume they're using this Bayesian filtering algorithm.

    Before everyone starts worrying that MSFT has patented Bayesian filtering, (a) I don't think the patent would hold up well in court, because e.g. ifile is older and (b) patents are not a problem for open-source projects anyway.

  28. But will it be in Evolution? by mshiltonj · · Score: 4, Informative

    I may drop Evolution in favor of Mozilla Mail.

    I tried to find out if the Evolution dev team was going to do this. The only thread I could find on the topic is here:

    http://lists.helixcode.com/archives/public/evolution/2002-August/020845.html

    Doesn't look like it's part of their vision.

  29. Here's the link by ChrisCampbell47 · · Score: 3, Informative
    A dozen or more replies and yet no link to it .. OK, I'll spend the 1.5 minutes posting it ...

    101 things that the Mozilla browser can do that IE cannot

  30. Re:Not impressed by self+assembled+struc · · Score: 3, Informative

    if i'm not mistaken you can edit the SPAM rule in mac os 10.2 mail and add additional properties to its rules.

    the default is "if not in address book and it's SPAM" send to SPAM folder.

    you should be able to add a property to that rule that says

    "if not in address book and FROM: doesn't contain XYX.COM and it's SPAM" send to SPAM folder

    you just add the properties before the SPAM one.

  31. Re:Microsoft's Patent by DeadSea · · Score: 5, Informative
    Specifically in this case:
    ...then stored in a corresponding folder for subsequent retrieval by and display to the recipient.
    So it looks as if this patent only covers server side implementations. A client side (Mozilla's) implementation retrieves it and then filters and displays it.
  32. Server-side filtering by scarhill · · Score: 3, Informative

    The big problem with bayesian server-side filtering (as opposed to rule-based tools like SpamAssassin) is that bayesian filtering requires a UI. The user must classify email as spam/not-spam to provide fodder for the filter. Having that UI in the mail client is the right thing to do. It would be nice if there were some protocol that the client could use to communicate that info to a server-side filter, but AFAIK no such protocol exists.

    So client-side seems like the right place for bayesian filtering right now.

  33. Re:Good example of MS's monopoly abuse by Anonymous Coward · · Score: 1, Informative

    And if you've ever seen the cards you get from bluemountain, you'd understand why the filter classified them as spam...

    It wasn't that the filter was malicious or even targeted at bluemountain - the cards were just 95% ad, 5% content.

  34. popfile creator's response to the ms patent... by zonker · · Score: 1, Informative

    if you are curious as to one view of the patent situation you might find this interesting...

  35. popfile has a windows version by Anonymous Coward · · Score: 1, Informative

    There is a Windows version of POPFile out now and is only getting better.

  36. Re:My Problem with Mozilla sorta OT by wizarddc · · Score: 3, Informative

    There are people working on this. Currently, Phoenix is the browser-only app. It's lean, quick, and efficient. Bugs are still being worked out, but it's very usable right now. Also, K-Meleon is a browser that uses the Gecko rendering engine, but not the Mozilla XUL interface.

    As for email/news clients, there are two, I believe: Thunderbird and Minotaur. Neither is available to use yet.

    --
    Th
  37. Re:Sort by Spam Probability by brw215 · · Score: 2, Informative

    Actually, in theory you could "set" a threshold for SPAM detection with a Bayes filter.

    Bayes' theorem is something like (note the Pr(mail) term is dropped):

    Pr("SPAM" | mail) = Pr(mail | "SPAM") * Pr("SPAM")
    vs.
    Pr("LEGIT" | mail) = Pr(mail | "LEGIT") * Pr("LEGIT")

    A Bayes classifier always picks the label (spam, not spam) with the higher probability, i.e. it compares

    Pr("SPAM" | mail) vs. Pr("LEGIT" | mail)

    The spread between these two numbers defines the "certainty" that any given mail is in fact SPAM. You could either sort your incoming mails by this spread or color the almost definite ones red, the most likely ones yellow, etc.
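
    A minimal Python sketch of that idea, assuming some classifier has already produced log scores for the two labels. The function names and the 0.99 / 0.5 color thresholds are invented for illustration and would need tuning.

```python
import math


def spam_probability(log_spam, log_legit):
    """Turn the two unnormalized log scores into Pr("SPAM" | mail).

    Dividing by the sum of both scores restores the dropped Pr(mail) term:
    p = spam / (spam + legit) = 1 / (1 + exp(log_legit - log_spam)).
    """
    diff = log_legit - log_spam
    if diff > 700:  # math.exp would overflow; the mail is clearly legit
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def color(probability, red=0.99, yellow=0.5):
    """Map a spam probability to a display color (thresholds are guesses)."""
    if probability >= red:
        return "red"
    if probability >= yellow:
        return "yellow"
    return "none"


def sort_by_spamminess(scored_mails):
    """Sort (subject, log_spam, log_legit) tuples, most spam-like first."""
    return sorted(
        scored_mails,
        key=lambda mail: spam_probability(mail[1], mail[2]),
        reverse=True,
    )
```

    Moving the red and yellow thresholds is what "setting" the filter's tolerance amounts to: a higher red threshold means fewer legitimate mails flagged at the cost of more spam left unmarked.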

  38. Re:One question... by biostatman · · Score: 2, Informative

    Spam Assassin in combination with procmail has worked well on the server side for me. You can tune how sensitive it is, but my informal assessment is that it catches about 95% of the spam, with only 1 false positive in about 3 weeks of use (the false positive's address, like any other email address, can be put in a whitelist of addresses that are let through automatically). Great stuff. Saves me from having to constantly update my ~/.procmailrc for new spammers.

    --
    For the love of $DEITY, loose != not win!!!!!
  39. Re:Client-Side Filtering is Wasteful by divide+overflow · · Score: 2, Informative


    1) Decentralized database servers that communicate P2P-like to track and exchange statistics about what is spam and what is not....

    Like Vipul's Razor...

    2) Mail Server Plug-In/Filter that uses (1) to decide whether to deliver/mark/throw out mail based on a....

    Like SpamAssassin...

    3) Mail Client Plug-In/Filter that receives mail from (2) according to a level of filtering you specify. Oh, and you can also vote on the mail that does get through to ID it as spam so the rest of the system gets its statistics updated from your misfortune.

    Although this takes more effort due to the need to support a number of different mail clients, it appears that this may be doable on some platforms using software that supports SpamAssassin.