Slashdot Mirror


Mozilla Adding Spam Filters

ksheka writes "Mozilla mail now has Spam Filters, using a Bayesian filtering method, no less. This is a very good thing, because it learns from the spam you receive and constantly modifies itself based on new spammer techniques!"

27 of 464 comments

  1. 102 Features IE doesn't have by Squeezer · · Score: 2, Insightful

    Now the list of 101 Mozilla features that IE doesn't have can be amended to 102 features! :)

    --
    Does the name Pavlov ring a bell?
    1. Re:102 Features IE doesn't have by Shippy · · Score: 3, Insightful
      Not really. E-mail is Outlook's domain. Not IE. I think that list of 101 things is a great way to show the power and flexibility over IE, but some of them are just filler. For example:

      • 98. Supports IRC Protocol - This is something I don't even use. This is just another program which should be separate but isn't and gives rise to the "mozilla is bloated" argument.
      • 99. Open Source - Yeah, but good luck sifting through it ;)
      • 100. Bugzilla - OK, lots of people use this, but Bugzilla != Mozilla. So it's not like Mozilla has built-in Bugzilla features... This is unrelated to the list.
      • 101. Giant Lizards are Cool - 'Nuff said.

      So, that brings it down to, what, 97? Still a pretty good list. However, I've heard that popup blockers and tabbed browsing are making their way into IE (and MS employees can already use these features), but we'll see if they're actually integrated.
      --
      -Shippy
    2. Re:102 Features IE doesn't have by gabec · · Score: 3, Insightful
      Microsoft playing "catch-up"? Nonsense. Only, what, 9% of the internet users out there use browsers other than IE? Of that, how many of those alternate browsers have tabbed browsing and of those clients using those browsers how many actually *use* tabs?

      I agree that Microsoft is scanning around and implementing good features, but no one other than /.'ers will ever know they got the idea from someone else. You're only playing 'catch-up' if there's something to catch-up to. IE has over 90% of the internet userbase, I'd say *that* was something to catch-up to.

  2. Re:Arms Race by TamMan2000 · · Score: 5, Insightful

    Interesting thought, but they would have to have a large sample of YOUR valid email to train on...

    --
    "I'll have a Guinness, no wait, make that a Coors Light" -Grad student I work with, who shall remain anonymous...
  3. Re:Arms Race by jpetts · · Score: 3, Insightful

    But the spammers will develop Bayesian filters of their own to find the best content that will sneak by your filters

    No they won't, unless the pattern of replies they receive (if there is one discernible in the S/N ratio) changes. As far as spammers are concerned, most spam disappears into a black hole, so they have no way of learning how your filters are working.

    And that's good filterin'!

    --
    Call me old fashioned, but I like a dump to be as memorable as it is devastating - Bender
  4. Re:Great gob Mozilla, but... by xanadu-xtroot.com · · Score: 3, Insightful

    its size is getting bigger and bigger.

    Compile Mozilla from scratch, and you'll see that you can custom tailor the build and cut out a lot of cruft.

    The source package is far larger than the binaries! Then there's the wait in compiling the damn thing. No (L)User is going to do that. Maybe us geeks (and I do use the source, Luke), but certainly not a "normal" user.

    The problem here is that binary distributions package it all together

    So download the Net installer and choose only what you want?

    --
    I'm not a prophet or a stone-age man,
    I'm just a mortal with potential of a super man.
  5. Re:Arms Race by ichimunki · · Score: 5, Insightful

    Nonsense. It's impossible. First of all, they don't have access to much of the mail I want to let through-- although my mailing list traffic certainly qualifies, so let's assume that's the only mail I get and that they know I am receiving it.

    There will still need to be header information and actual spam content in the spams themselves for those mails simply to not be repeats or dada-esque cutups of posts to the mailing list. That is, there must be content unique to the spam that no normal sender on the list will include.

    Because of this, and the fact that so-called Bayesian spam filtering works by scoring all the words in an email and then evaluating the email based only on the extremes, there is little likelihood-- since the spam must still contain spam words to have any point at all-- of those words not being on the extreme word list. After all, if the same words are appearing in both spam and not-spam mails, they will be given a spam-probability that is not extreme. So all those words in common will be ignored and only the spam words will be looked at-- and the spam will still be filtered.

    --
    I do not have a signature
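
The scoring ichimunki describes (score every word, then judge the message only on the most extreme scores) can be sketched roughly as follows. This is a minimal illustration, not Mozilla's actual code: the function names, the 0.01/0.99 clamps, the 0.4 score for unseen words, and the `keep` count are all assumptions chosen for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(spam_msgs, ham_msgs):
    """Return a per-word spam probability, clamped away from 0 and 1."""
    spam_counts = Counter(w for m in spam_msgs for w in set(tokenize(m)))
    ham_counts = Counter(w for m in ham_msgs for w in set(tokenize(m)))
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        s = spam_counts[word] / max(len(spam_msgs), 1)  # share of spam containing word
        h = ham_counts[word] / max(len(ham_msgs), 1)    # share of good mail containing word
        probs[word] = min(0.99, max(0.01, s / (s + h)))
    return probs


def spam_score(text, probs, keep=5, unknown=0.4):
    """Combine only the `keep` word probabilities farthest from 0.5."""
    scores = [probs.get(w, unknown) for w in set(tokenize(text))]
    extremes = sorted(scores, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam_product, ham_product = 1.0, 1.0
    for p in extremes:
        spam_product *= p
        ham_product *= 1.0 - p
    return spam_product / (spam_product + ham_product)
```

In this sketch a word that shows up equally often in spam and in good mail lands at 0.5, so it never makes the "extreme" cut, which is the point of the comment above: padding a spam with mailing-list vocabulary does not hide the words that only spam contains.
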
  6. Re:didn't k5 already run a story on this? by ergo98 · · Score: 2, Insightful

    Really, eh? I mean, I turned on CNN today and they were reporting a story that I'd already heard on ABC News! The nerve! I sent them a letter saying "Um, excuse me, but I already heard that on ABC l053rZ!" They haven't replied yet.

    To make matters even worse, when I was on the train I overheard two people talking about the Israeli conflict. I couldn't believe it! I mean, I heard someone talking about that LAST YEAR for crying out loud! That is so 2001! I told them that they're l4m3rZ for being so dated. They just seemed to ignore me though.

  7. Good example of MS's monopoly abuse by SethJohnson · · Score: 5, Insightful


    Sorry if this comes off as a MS-bashing rant. It's not intended as such.

    The fact that MS doesn't seem hard at work implementing spam filters in Outlook or popup blockers in IE is a good example of consumers suffering due to Microsoft's monopoly. It also demonstrates how Microsoft is able to leverage its monopoly in one area (mail and web clients) to build profit in another market.

    This other market is its aspiring ISP services. The app and mail client development teams aren't implementing these features because the Microsoft ISP wants to be able to tout the ability to filter spam and block popups. If the browsers and email clients used by 90%+ of internet users had these features, then they wouldn't be a selling point for the ISP. This is a clear example of the company withholding features from the free products so it can profit from the antidote.

    It also demonstrates the lack of competitive pressures in the market that normally drives a company to implement features at a rapid pace. Consumers are stagnating with a product for which the developer has no competitive pressure to improve. Hence that list of 102 things Mozilla can do that IE can't do.
    1. Re:Good example of MS's monopoly abuse by schon · · Score: 5, Insightful

      Sorry if this comes off as a MS-bashing rant.

      No need to apologize - I love a good MS-bashing rant as much as the next /.'er.. :o)

      I do, however, feel that it's not as big a problem as you do..

      The app and mail client development teams aren't implementing these features because the Microsoft ISP wants to be able to tout the ability to filter spam and block popups.

      This may (or may not - although I'm inclined to agree with your views) be true, but the important thing to understand is that the MTA (ISP)-level is where spam blocking belongs.

      The real problem with spam is that it steals bandwidth - blocking spam after it's already sitting in your mailbox is like closing the barn door after the horses have eaten your children - the bandwidth has already been used, so you don't gain anything... having your email client "block" spam isn't really blocking it, it's just an automatic "delete key".. which is what the spammers want (how many of them say spam isn't a problem because you can "just hit delete")

      MS's intentions aside, the solution they have is the correct one, even if their motives are suspect.

    2. Re:Good example of MS's monopoly abuse by novakreo · · Score: 2, Insightful

      I for one am quite happy for Internet Explorer to never implement tabbed browsing, pop-up blocking, mouse gestures, or anything else it currently lacks. It makes it much easier to convince people to switch browsers (if they don't care about security), and the fact that 90% (or whatever the exact stat is) of the world uses IE and sees the pop-up ads means that advertisers aren't rushing about trying to circumvent the pop-up blocking.

      In short, Microsoft is the open-source movement's greatest asset :-).

      --
      O frabjous day! Callooh! Callay!
    3. Re:Good example of MS's monopoly abuse by ebyrob · · Score: 3, Insightful

      is like closing the barn door after the horses have eaten your children

      Ya, you should have shot those man-eating horses to begin with. Seriously though, don't you think we should have laws against this type of mail fraud (forging headers and the like) instead of simply trying to "block" the fraud at the ISP level? I suppose blocking as well can't hurt, but freedom requires punishing the guilty and only the guilty.

      The last thing I want is Microsoft deciding which emails destined to me are "spams". (subscription email from FSF? Must be spam!)

  8. Client-Side Filtering is Wasteful by divide+overflow · · Score: 5, Insightful


    Since you must first download the content for client-side filtering to work you waste bandwidth. If you are truly bombarded by spam you still lose...your mail spool still gets filled up with stuff you don't want, your data transfers compete for bandwidth with the spam, storage hardware works harder storing data that will only be deleted. It raises everyone's costs, including yours.

    We need to block undesired mail at the host, not filter it at the client. That way the spam never gets sent, the spammer gets the message that their attempt was futile, and bandwidth is conserved. Many ISPs already provide this service...we need to improve on it. And we need better tools for identifying and dealing with spammers. The current mail standards are woefully inadequate to this task.

  9. The ultimate filters by TigerTime · · Score: 5, Insightful

    There needs to be a tiered structure with filters. The main one would be at the ISP level. It would only filter out obvious spam (like spam going to 2000 users at that ISP). The second tier would be at the client side and would have a certain level of intelligence in identifying spam. One feature that I'd like (it might already be available) is for it to automatically send an email back to the sender saying the email address doesn't exist. This should be done at the server level and/or client level. This could possibly help in removing your email from such lists. As for what to do with the spam at the client level, I think that it should be sent to your main inbox but just marked as spam (maybe greyed out or something). New mail is always bold, and once you read it it goes to a regular font; spam could just be greyed out. That way you would never miss something that the spam filter had a false hit on.
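
A rough sketch of the two tiers described above, assuming a hypothetical `Message` record: the ISP tier drops only bulk mail (using the 2000-recipient figure from the comment), and the client tier flags suspected spam for greying out rather than deleting it. The field names and the 0.9 threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    local_recipients: int  # mailboxes at this ISP that received a copy
    spam_score: float      # client-side filter score in [0, 1]
    flagged: bool = False  # True means "show greyed out"


def isp_tier(messages, max_recipients=2000):
    """Tier 1: drop only obvious bulk mail at the ISP."""
    return [m for m in messages if m.local_recipients < max_recipients]


def client_tier(messages, threshold=0.9):
    """Tier 2: keep everything in the inbox, but flag likely spam."""
    for m in messages:
        m.flagged = m.spam_score >= threshold
    return messages
```

Because the client tier only sets a flag, a false hit still sits in the inbox where the user can see it.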

  10. Re:Filtering by Anonymous Coward · · Score: 1, Insightful

    Some of us don't even keep an address book; then again, nowadays 80% of my mail is spam. I guess that means a spam filter that compares against my address book would not only be 100% effective in eliminating spam, but also only 20% of the mails it wipes would be false positives. Good stuff, that filtering software :P

  11. SpamCop! by JediTrainer · · Score: 3, Insightful

    How about a spamcop-like plugin? Or something that can submit my message plus contents to SpamCop?

    If using SpamCop, there should be a way to still show the site's banners, because they deserve to get paid for their bandwidth I'm using up.

    I'd love to just be able to right-click on a message and report it to the various abuse/postmaster accounts without having to copy my whole message plus headers, and pasting such into their web form. SpamCop seems to be pretty good at tracing the origins of messages, so I'd love to be able to leverage that sort of functionality.

    --

    You can accomplish anything you set your mind to. The impossible just takes a little longer.
  12. Re:interesting idea... by SandSpider · · Score: 3, Insightful

    That's a really cool idea in theory. In reality, you have to trust that everybody on the internet is trustworthy enough to decide what your spam is and isn't.

    I mean, you've been on the internet before, right? You've seen the other people here, too? Think about it.

    =Brian

    --
    There is nothing so good that someone, somewhere, will not hate it.
  13. Re:Filtering by garymcm · · Score: 3, Insightful

    I would like to understand the choice of Bayesian more. As far as I know, Bayesian is good for classifying based on *belief* and can be pretty good when only partial evidence is available to the network. This is great for marketing activities, e.g. sending out mass emails to a segment of a database :) . However, as this is _my_ email and mission critical to me, just a simple belief that something is spam is not enough.

    In my experience, 99% of spam can be caught with static rules (am I in the TO or CC line gets a bit under half the spam I receive). Taxonomical analysis of the subject and body can get the rest.

    Bayesian seems like overkill, or maybe even a bad fit. Let's face it, the other well known use for Bayesian is the famous Microsoft Office Paper Clip!!! And that is about as useful as the proverbial ashtray on a motorbike!!

    Gary
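
The static rule Gary mentions ("am I in the TO or CC line") could look something like the sketch below, using Python's standard `email` package. The function name `addressed_to_me` and the example addresses are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def addressed_to_me(raw_message, my_addresses):
    """Return True if any of my addresses appears in the To or Cc headers."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _name, addr in getaddresses(fields)}
    return any(a.lower() in recipients for a in my_addresses)
```

A rule like this is cheap and easy to reason about, but mail that reaches you through a mailing list or a Bcc would fail the check as well, so it probably needs an exception list alongside it.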

  14. Re:zilla by cduffy · · Score: 2, Insightful

    I care. I'm busy, and if one of my friends needs a ride tonight I'll read it. If that same friend is just wondering how I'm doing, I won't -- unless I'm not at all busy.

    Further, some of us actually have multiple threads of conversation going with our friends, or archive our messages and occasionally go back through them. I may be simultaneously talking with someone about (say) some PHP problems they're having and discussing motorcycle riding. If I want to go back and reread what exactly the problem he was having with PHP is, I don't want to have to sort through the messages where he's trying to convince me I should be riding a crotch rocket instead of a cruiser.

    My friends understand this, and are polite enough to use the subject header in their emails. If they don't do that once, I'll ask politely that they start. If they don't do it again, I may well be rightfully a bit annoyed.

  15. My Problem with Mozilla sorta OT by pneuma_66 · · Score: 3, Insightful

    I love mozilla, and use it as my main browser. However my biggest complaint is that all the components (browser, mail, composer, etc) should be separate apps. I don't like the fact that if my browser crashes, so does my email reader, and vice versa.

    I tried to find some documentation on how to achieve this; however, there was none to be found. Does anyone know how to do this, so that I can use Mozilla's mail rather than the flaky mail app that comes with OSX?

  16. Dealing with Spam by mabu · · Score: 2, Insightful

    I am completely against all client-based spam filters. They essentially do nothing to address the most serious repercussion of spamming, which is the exploitation of third-party networks & bandwidth. That is aside from the fact that client-based spam filtering is most likely the least effective solution, and more likely to stop legitimate mail than other methods such as known spam relay blocking.

    Ultimately, the only way we're going to really curtail spam is by enacting harsh *criminal* penalties for mail relay and server hijacking, which is the standard method by which most spam is distributed. It's true that these activities are already considered illegal but the law enforcement agencies are either unwilling to take action because there's a minimum threshold of monetary damages required, or they're ill-equipped knowledge and technology-wise to aggressively go after these people.

    And Puleeze don't even bother with the ineffective, "let the industry regulate itself" argument, which doesn't work. Most spammers are small "cell groups" that move around a lot; most don't have any money in the first place; only criminal penalties are going to work, and client-side and industry regulated efforts don't stop their efforts at all and just drive bandwidth charges up for the rest of us.

  17. Sort by Spam Probability by Krellan · · Score: 5, Insightful

    It seems too many people distrust spam filters because of the chance of accidentally blocking an important legitimate message as if it were spam.

    Many spam filters are strictly binary: a message is either spam, or not spam. This is not ideal, because "gray area" messages - between these two extremes - will likely not be sorted correctly.

    I propose adding a new sort option to email clients.

    Sort by Spam Probability

    This would be an additional field that can be displayed in a message list, similar to "To", "From", "Subject", and the like. As in the article, probabilities would range from 99% (almost certain spam) to 1% (most likely an innocent message). Notice that 100% accuracy either way is not claimed.

    This way, the user can see up front the messages that are most likely not spam. The spam messages will be relegated to the bottom of the list, possibly colored to indicate their likelihood of being spam. If there is a message in the "gray area", it will most likely appear in the list between the legitimate messages and the spam, so the user will have a chance to see the message and make a decision, without the message being lost in the shuffle.

    This would be a great feature. I hope this gets into Mozilla's mail client.

    (BTW, another feature that would be great to see in mail clients would be datestamping of the actual time the message was downloaded. Many spammers, and innocent people with misconfigured clocks, send emails with wild dates that are not to be trusted. You can see this in yearly archives of GNU "mailman" mailing lists! Datestamping emails as they are downloaded will also keep mailboxes in order when sorted by date, as newly arrived messages will always be at the bottom, instead of being scattered throughout the inbox. But sorting by spam probability will probably become more popular than sorting by date....)
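
The "Sort by Spam Probability" column proposed above amounts to a clamp and a sort. A minimal sketch, assuming each message is a hypothetical `(subject, probability)` pair and that the filter's raw score is a float between 0 and 1:

```python
def clamp_probability(p, low=0.01, high=0.99):
    """Keep scores within 1%..99% so certainty is never claimed."""
    return min(high, max(low, p))


def sort_by_spam_probability(messages):
    """Order messages from most likely legitimate to most likely spam."""
    scored = [(subject, clamp_probability(p)) for subject, p in messages]
    return sorted(scored, key=lambda item: item[1])
```

With this ordering, likely spam sinks to the bottom and the "gray area" messages land in the middle of the list, where the user still gets a look at them.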

  18. Bayes filters can't adapt to text in images by DuSTman31 · · Score: 4, Insightful

    As a popfile user, I'm quite impressed with the catch rate possible with Bayes' theorem spam filters; however, I suspect their effectiveness will decrease over the long term.

    Spammers are likely to respond to filters like this by encoding text in ways the filters can't read but humans can (eg having a .gif file of the text, loaded by a HTML statement in the message).

    Statistical filters would need some kind of built-in OCR routine before they could be effective against that trick, and some respectable mailing lists are using images as well, so you can't just filter all mails with images attached.

    In the long term, therefore, I suspect that filters that use a network database of spam will be more successful.
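
To illustrate the image trick described above: a word-based filter only sees the text it can extract, so a message whose entire pitch lives in a .gif gives it nothing to score. The sketch below uses Python's standard `html.parser`; the `visible_tokens` helper and the example URL are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_tokens(html):
    """Return the word tokens a text-only filter would get to see."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
```

For a body that is just `<img src="http://example.com/offer.gif">`, this returns an empty list. A real filter could still score headers or the image URL itself, so this shows the weakness rather than proving the filter is helpless.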

  19. brain fart... block HTML in e-mail? by Micah · · Score: 3, Insightful

    The big problem with this is spam still gets to the server. :(

    Just thought of this now... but it seems like almost all spam these days contains a whole bunch of HTML tags. Maybe someone should write a server plugin to instantly reject all mail containing , instantly adding the sending IP to an iptables DROP rule.

    There's little legitimate e-mail with tables, unless you count paypal, datek, and travelocity news and that kind of crap. But we could always add a list of "good" IPs.

    I know there are server solutions, but all make me a bit queasy. I just want something that will detect funky activity on the fly and instantly deny all access to that IP.
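
A hedged sketch of the server-side idea above. The check for a `text/html` MIME part is a stand-in for whatever markup test the server would really use, and `handle_incoming`, `drop_rule`, and the trusted-IP set are hypothetical names. The function only builds the iptables command as an argument list; it does not run it.

```python
from email import message_from_string


def contains_html_part(raw_message):
    """Return True if any MIME part of the message is text/html."""
    msg = message_from_string(raw_message)
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def drop_rule(ip_address):
    """Build (but do not run) an iptables command dropping traffic from ip_address."""
    return ["iptables", "-A", "INPUT", "-s", ip_address, "-j", "DROP"]


def handle_incoming(raw_message, sender_ip, trusted_ips=frozenset()):
    """Accept plain-text or trusted mail; otherwise reject and propose a DROP rule."""
    if sender_ip in trusted_ips or not contains_html_part(raw_message):
        return ("accept", None)
    return ("reject", drop_rule(sender_ip))
```

Returning the command instead of executing it leaves room for logging or review first, since dropping an IP outright also blocks any legitimate mail relayed through the same host.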

  20. My Bayesian Adventures by unorthod0x · · Score: 3, Insightful

    After collecting 87 megs worth of spam and a similar amount of non-spam I decided to implement the so-called 'Bayesian' method of spam filtering by way of popfile - it's a pretty slick concept; Perl code that acts as a POP3 server on your own machine - simply drop your collected spam and non-spam in to the appropriate bucket, have popfile go through them and create its indices and set up your mail client to connect to 127.0.0.1 with your username being 'my.pop.server:loginname'.

    I know I've got a particularly difficult task for this filtering technique; I get an awful lot of spam every day (~100 messages per 24 hour period), some of which I actually want (I run an underground music site, and in some cases I subscribe to opt-in lists that result in something that looks like spam); the rest I could care less about.

    My results have been decent for the most part; 100% of my spam ends up in my Spam folder, however there is a handful of messages that I wish to keep that end up there as well. For the most part they are the above-mentioned 'borderline' pieces of spam (which I have been careful to put aside and have indexed by popfile anyway); I can only hope that more time and samples will yield better results. I was however surprised to find that some of the e-mails I was getting from friends were falling in to the Spam mailbox anyway; after taking a closer look, I can see why: they use an awful lot of otherwise unmentionable words. My suspicion is that I haven't gotten enough of these 'good-emails-with-bad-words' to make the filtering truly effective.

    Nonetheless, it is nice to have all of my spams seemingly guaranteed to drop in to my "Spam" folder, but my usual task of manually filtering messages that made it past my existing filters in to my Spam folder has been replaced with a different (albeit quicker) task; taking messages out of my spam folder and putting them where they really belong.

    Bottom-line: I still have to visually scan through my mail for legitimate messages amongst the thicket of items informing me about the exciting exploits of women at the farm, wonderful business opportunities from Nigeria and suggestions that I should buy Viagra by the boatload.. all this despite having collected a well organized and rather large collection of spam/non-spam mails. I'll stick with it for a while as I'd like to try it out and give it a proper chance, but I suspect that if you're in a similar situation then you should be prepared to tough it out..

  21. Re:Filtering by swdunlop · · Score: 5, Insightful

    1) How much time do you spend training your paperclip in Office?

    How much time are you going to spend on training your spam filter? If you are unwilling to invest a little time and effort in developing a solid set of values that fit your personal pattern of behavior, then Bayesian filters are indeed a poor match for you.

    2) What harm is a false positive?

    If you are automatically deleting anything that is marked as a positive for spam, then you are playing roulette with your email. I would generally recommend diverting email classified as spam to a folder, especially if your filter is relatively new and has had very little experience with your patterns of use. Set an expiry on your spam folder, and check it from time to time to see if something fell through the cracks. Mozilla has a handy feature that allows you to simply conceal spam from view, which works adequately, although I dislike the potential performance hit in a large folder.

    Considering how important your email is to you, you should certainly consider applying a little diligence to how you manage it.
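
The "divert to a folder and set an expiry" advice above might be sketched like this. The folder is modeled as a plain list of `(filed_at, message)` pairs, and the 0.9 threshold and 30-day lifetime are arbitrary assumptions for illustration.

```python
from datetime import datetime, timedelta


def route(message, spam_probability, inbox, spam_folder, now, threshold=0.9):
    """File suspected spam in a folder instead of deleting it."""
    if spam_probability >= threshold:
        spam_folder.append((now, message))
    else:
        inbox.append(message)


def expire(spam_folder, now, max_age_days=30):
    """Return the spam folder without entries older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [(filed, msg) for filed, msg in spam_folder if filed >= cutoff]
```

Nothing is lost until `expire` runs, so a false positive can be rescued any time within the expiry window.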

  22. Suggested Feature: "Block Plugins from This Site" by Maul · · Score: 3, Insightful

    I like the ability to block images from a server, but it'd also be nice to have a similar feature for plugins and Java applets.

    A lot of ad companies are now using really annoying flash. Blocking images doesn't stop these.

    --

    "You spoony bard!" -Tellah