Slashdot Mirror


Mozilla Adding Spam Filters

ksheka writes "Mozilla mail now has Spam Filters, using the Bayesian filtering method, no less. This is a very good thing, because it learns from the spam you receive, and constantly modifies itself, based on new spammer techniques!"

36 of 464 comments

  1. Arms Race by Camel+Pilot · · Score: 3, Interesting

    But the spammers will develop Bayesian filters of their own to find the best content that will sneak by your filters.

    1. Re:Arms Race by Schubert · · Score: 2, Interesting

      actually there is a great big gob of it out there... public mailing list archives.

      --
      -- schubert
    2. Re:Arms Race by Lionel+Hutts · · Score: 3, Interesting

      That's an arms race the spammers can't win. Sending spam is an ultra-low-margin business: with response rates of a fraction of a basis point, and probably only a fraction of them actually spending any money, the cost and effort per message sent must be very, very, very low for the spammers to make any money at all. Most spam recipients would gladly put in, say, $20 worth of effort to spamproof their addresses; there is no way even a spammer with huge scale could invest even $5 worth of effort for one more address. We will all have different Bayesian rules, remember. Combine that with the fact that I have perfect information about what spam and nonspam I get, and the sender has little or no information about what gets through, and it's clear that even hours of effort by senders wouldn't do much.

      And, even if they could afford to keep it up for a while, my spam filter will get better faster than their spam. This is the "Ambassador's criterion" from SDI (briefly: Star Wars won't lead to an arms race if it gets to the point where shooting down the marginal missile is cheaper than building the marginal missile).

      I think we may just win the Spam Wars yet.

      --
      I Can't Believe It's A Law Firm, LLP does not necessarily endorse the contents of this message.
  2. Re:102 Features IE doesn't have by crossseyed · · Score: 4, Interesting
    It doesn't mean they're not thinking about it, though...

    http://research.microsoft.com/~horvitz/junkfilter.htm

    --
    -- Outside of a dog, a book is man's best friend. Inside a dog, it's too dark to read
  3. Filtering by Transient0 · · Score: 5, Interesting

    Bayesian technique is very good for the sort of abstract classification task that spam represents. It would be an interesting hack to try and train a network to categorize based solely on message body... I do, however, hope that their team has opted for practicality over just hack value and the network will also use such extremely relevant data as header information and comparing address versus address book (an e-mail from someone not in your address book is not necessarily spam... but it is more likely to be).

    1. Re:Filtering by Anonymous Coward · · Score: 1, Interesting

      I use a procmail filter that doesn't pass anything not sent directly To: one of my e-mail addresses. (actually, it sends it to a junk folder)

      Now, maybe it's just the spam that I get, but just this simple filter blocks 99% of all the spam I get (10-20 messages a day) and has never sent a real message to the junk folder (OK, maybe once or twice in 2 years it has sent a message that someone from the office "spammed" out to multiple people; but that really was spam, it just happened to be spam I wanted to read).
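
      For anyone who wants to try the same rule outside procmail, here is a rough Python sketch of it. The addresses and the folder names are made up for illustration, and it only looks at To: as described above (checking Cc: as well would be a small change).

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical addresses; substitute your own.
MY_ADDRESSES = {"me@example.com", "me@example.org"}


def is_addressed_to_me(raw_message, my_addresses=MY_ADDRESSES):
    """Return True if any To: address matches one of my addresses."""
    msg = message_from_string(raw_message)
    recipients = getaddresses(msg.get_all("To", []))
    return any(addr.lower() in my_addresses for _name, addr in recipients)


def choose_folder(raw_message):
    """File directly addressed mail in the inbox, everything else in junk."""
    return "inbox" if is_addressed_to_me(raw_message) else "junk"
```

      Nothing is deleted here; mail that fails the check just lands in a junk folder that can be skimmed later.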

    2. Re:Filtering by Shamashmuddamiq · · Score: 3, Interesting
      I don't believe it was "invented" by Paul Graham. Thoughts of separating spam from real email based on the statistical properties of its content is something that has come to my mind, as well as the minds of many people over the last few years. Just because Paul's page was the first one that you've seen explain it in detail doesn't mean he invented it.

      BTW, there are ways of getting around Bayesian filtering. For instance, if you take random words from a large dictionary of long, normal conversational but not-often-used-in-spam words and splatter them throughout your spam, it's easy to convince the Bayesian filter that it's not spam. Not only will this increase your false negatives, it has the capability of increasing your false positives. This is because your new spam will be training your Bayesian filter, and putting lots of non-spam-like words into its vocabulary. If the spammers keep up with their dictionaries as well as the filters keep up with theirs (and I must assume this will happen), we've still got a big problem on our hands.

      Don't get me wrong. I have bogofilter installed on my mail server at home, and it works great for now. But don't expect it to work forever.
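
      To make the word-padding trick concrete, here is a toy naive Bayes scorer in Python. It is only a sketch: the training messages are invented, the priors are assumed equal, and it uses plain add-one smoothing rather than whatever bogofilter or Mozilla actually do.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences across a list of message strings."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spam_probability(text, spam_counts, ham_counts):
    """Naive Bayes P(spam | words) with equal priors and add-one smoothing."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    log_odds = 0.0
    for word in text.lower().split():
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Invented training data, purely for illustration.
spam_counts = train(["cheap pills buy now", "buy cheap watches now"])
ham_counts = train(["meeting agenda for thursday",
                    "lunch on thursday then review agenda"])

plain = "buy cheap pills now"
# Same pitch, padded with words that normally show up in real mail.
padded = plain + " meeting agenda thursday lunch review agenda thursday"

# If the padded message is later marked as junk, its filler words get
# counted as spam evidence and start hurting real mail.
poisoned_counts = spam_counts + train([padded])
```

      Scoring plain and padded against the same counts shows the padded copy getting a much lower spam score, and scoring an ordinary "meeting agenda thursday" message against poisoned_counts shows it creeping toward spam.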

      --
      ...just my 2 gil.
  4. Mozilla mail / browser by FrostedWheat · · Score: 4, Interesting

    I wonder if a similar technique could be used in the browser. Automatically block images or popups based on previous ones you have blocked.

    Now that would be very nifty!

  5. zilla by sstory · · Score: 3, Interesting

    I just switched to Mozilla. Happy to be free of Microsoft for email. It's skinnable, and there are some cool skins--like one which sort of emulates Evolution. I noticed an annoying 'feature' though, which is still there from Netscrap days--if you send an email without a subject, a dialog pops up and goes blah blah blah. I asked the Mozilla newsgroup if there was a way around this, but all I got was the sort of adolescent yammerings that keep me out of unmoderated newsgroups. Nice to see it has a spamfilter now. The only major improvement remaining is to add a spell-check (the Netscrap one was licensed from a 3rd party, and can't be freely distributed).

  6. Hope it doesn't have false positives by tedgyz · · Score: 2, Interesting

    This is really great technology.

    I had the benefit of working with this technology for a classification problem here at work. I was amazed at how well it worked. We were using it to replace a purely human process.

    However, there is one huge problem. Incorrect classification. Blind tests against a known dataset showed 80%+ correctness. The problem is, you don't know which 20% is wrong. Thus, you still need 100% inspection to validate the results.

    When applied to mail filters, I wonder how the technology avoids dumping your good mail? Like when your friend sends you a URL to a good pr0n site.

    --
    "No matter where you go, there you are." -- Buckaroo Banzai
  7. One question... by Hard_Code · · Score: 5, Interesting

    I assume the filtering statistics live on the client side. What about IMAP? If I open up Mozilla on a new machine, are all my spam statistics lost (presumably rendering the junk mail filtering statistics I've accumulated useless on the new machine)?

    It would be neat if, with IMAP accounts, Mozilla just stored the statistics in a file on the IMAP server instead of on the client.

    --

    It's 10 PM. Do you know if you're un-American?
    1. Re:One question... by BroadbandBradley · · Score: 3, Interesting

      someday you'll be able to backup and restore your Mozilla Profile, and when that day comes, I hope you'll remember that Mozilla has a House online at ZillaVilla.com

  8. SpamAssassin + Mozilla = Schweet! by Noryungi · · Score: 5, Interesting

    Well, most of my spam is already sent to /dev/null by the SpamAssassin ninja.

    But, for those that make it past the email shadow warrior, I guess Bayesian filters are a double whammy they'll never survive... Mwahahahaha!

    Kudos to the Mozilla programmers!

    --
    The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
  9. Microsoft's Patent by woboz · · Score: 5, Interesting

    What happens when Microsoft attempts to enforce this patent?

  10. My only complaint... by Mustang+Matt · · Score: 3, Interesting

    In Outlook Express, I can set up 100 different email accounts and not have a giant list of mail folders.

    In Mozilla (last I checked) for every account you set up it creates a new set of folders.

    Since I've got a catchall account, I'd like to tie multiple email addresses to one set.

    Anybody out there on the Mozilla team listening?

    --
    The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
  11. Not enough by sulli · · Score: 2, Interesting

    Spammers don't use relays these days, they use spam tools that directly SMTP the receiving mail server. So the receiver still needs to filter.

    --

    sulli
    RTFJ.
  12. Re:No, too obvious by saider · · Score: 2, Interesting

    This new law will force you to leave your radio and TV on even while you aren't paying attention to it. Furthermore junk mail will no longer be able to be discarded without an affidavit that states the recipient has read and understands the offer. Street mimes and homeless people wearing commercial signs must be paid attention to by anyone within a 10 foot radius. You will be required to sample every free offering at the Food Court in the mall and surveyors cannot be ignored. All fliers distributed on your vehicle must be followed up with a phone call or your vehicle will be impounded. You will be required to contact every business that advertises at sporting events if you choose to attend.

    Failure to abide by these rules will result in the forfeiture of all assets and the garnishing of all wages earned, which will be deposited into the Federal Marketing Enforcement Fund. Monies from this fund are distributed to companies whose marketing campaigns are not successful.

    --


    Remember, You are unique...just like everyone else.
  13. interesting idea... by Lumpy · · Score: 5, Interesting

    what if in addition to this someone put together a company that the mozilla email client can report back to about what is labelled as spam and the filters it created along with the headers of the message (or even the entire spam) and grab filters from others that receive some spam that you have yet to receive? it would be like a big distributed computing anti-spam project.. then if we were able to make the filters usable by sendmail to block at the server...

    I'm almost thinking a distributed and automated anti-spam system like that could completely crush the spam problem within a 12 month period.

    or I may be completely out of my mind.

    --
    Do not look at laser with remaining good eye.
  14. Not impressed by macdaddy · · Score: 4, Interesting

    Well, ok I am impressed that Mozilla is implementing spam filtering abilities in their MUA. I AM NOT impressed with Bayesian spam filters AT ALL. I've been using Mac OS X's Mail.app since I switched to OS X. It's not my primary MUA but I am letting it POP out a copy of all my mail and "learn" from it. It does a pretty good job of finding maybe 80% of the spam I get. However it has a BAD false-positive rate. I mean hell it's been flagging CERT advisories as spam. That kind of crap is really annoying. It's flagged co-workers' mail as spam numerous times (and even though I happen to agree... :) ). The biggest problem I have with Bayesian as a mail admin is that I am constantly dealing with spam. Users forward it to me. I receive a number of spam bounces. I work in spam all the damned time. That's the problem. I need an MUA with Bayesian filters that are smart enough for me to tell them to ignore all mail from certain domains or that went to certain accounts. All of the Bayesian filters built into MUAs I've worked with so far can't do things like that. It's really annoying given the position that I'm in.

    1. Re:Not impressed by tbmaddux · · Score: 4, Interesting
      However it has a BAD false-positive rate. I mean hell it's been flagging CERT advisories as spam. That kind of crap is really annoying. It's flagged co-workers' mail as spam numerous times..
      I had this problem early-on as well. I fixed it by marking the false positives as "Not Junk." You can do this even when it's in "Automatic" mode as opposed to "Training." All the "Automatic" does is enable the filter that sends the marked messages to the "Junk" folder.

      But it still learns in either mode! Early on my shipping notices from Amazon.com (and even Apple.com, ha ha) were being flagged as Junk, but not anymore. I think it's great and will only improve with time, with others' caveats about client-side email spam checking being flawed noted.

      --
      Can't you see that everyone is buying station wagons?
  15. Emacs! by MosesJones · · Score: 4, Interesting


    This is something that Emacs has in the GNUS client, you score emails up and down and it starts adding filtering rules. Using LISP you could extend this to do some pretty funky moderating.

    Every problem is reducible to a previously solved problem or by definition is unsolvable - Church Turing Thesis.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  16. Hmm, my spam experiences by krappie · · Score: 5, Interesting

    I personally don't really care about all the junk emails I get. I don't get that many, and I can pretty much tell without looking at them. They go straight to /dev/null.

    Spam is such a horrible thing though. I work at a webhosting company. I'm the one that has to track down the site with the old formmail.pl, removing 'aol.com' and 'yahoo.com' from the hosts to relay for, trying to find out who the hell added them so I can murder them. I'm the one clearing out the mail queue with 100,000 mails. I'm the one clearing the mail queues of people who thought it was a good idea to check the 'open relay' option in plesk. I'm the one that has to deal with people bitching about how their mail isn't working or didn't get through.

    Just the other day, I had a raq2 where someone had apparently put yahoo.com and excite.com in the hosts to relay for. Yay! That's what attracted the spammers. Now I get a request every second to send mail to 50 people at once. Now that I've removed them, none of them are getting through. But it's a raq2, 133 MHz. It has to go through all 50 addresses and say 'relaying denied' and log it. It can't keep up! syslogd is taking up all the cpu and logging things from hours ago because it's behind. Quickly, sendmail quits listening on port 25 (but the spam attempts keep coming somehow).

    So I get the idea to block their IPs, they seem to be using the same IPs. But oh guess what, they're using open proxies and have about 400 IPs. Well, I did this for about 5 hours, writing scripts to grab the repeated IPs out of the maillog, adding them all to my sendmail access lists. Now every time they try to send mail, it blocks them instead of saying relaying denied 50 times for each request. But a minute later, I get a few new IPs and it starts all over again. I have an access list about 6 pages long. It's doing ok, blocking about 90% of them, but every once in a while, they get a new IP and sendmail is brought to a stop.
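
    A script of the kind described might look roughly like the Python sketch below. The log line shape (a "relay=... [a.b.c.d]" field plus the text "Relaying denied") and the threshold of 3 are assumptions, so the regex will likely need adjusting for a real maillog; the output is meant as plain "address REJECT" lines for a sendmail access list.

```python
import re
from collections import Counter

# Assumed log shape: a rejected attempt carries "relay=... [a.b.c.d]" and
# the text "Relaying denied". Real sendmail logs may differ.
RELAY_IP = re.compile(r"relay=[^,]*\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def repeat_offenders(log_lines, threshold=3):
    """Return sorted IPs with at least `threshold` denied relay attempts."""
    hits = Counter()
    for line in log_lines:
        if "Relaying denied" not in line:
            continue
        match = RELAY_IP.search(line)
        if match:
            hits[match.group(1)] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)


def access_entries(ips):
    """Format IPs as tab-separated 'address REJECT' access list lines."""
    return [f"{ip}\tREJECT" for ip in ips]
```

    Feeding it the tail of the log on a timer would keep the list growing as new proxies show up, although it does nothing about the logging load itself.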

    Oh yeah, and my /var/ partition is only 200MB, 50MB free. And the maillog is growing at about 10MB a day. So now I'm babysitting this server every day until the spam attempts stop. I don't think there's any way around it unless I get sendmail to check for open proxies. But I don't know how to do that, and I don't think they trust me enough to make such changes to sendmail.

    So oh well, mail is getting lost every day on this server and it's been rendered horribly slow for its users.. just because some moron noticed it would send some emails for him and started up his scripts.

    Spam causes so many problems on the server level. It's what is making mail an unreliable service. I couldn't care less about spam filters on my mail client. These are the things that make spam evil!

    1. Re:Hmm, my spam experiences by Anonymous Coward · · Score: 1, Interesting

      I too had this problem.

      In my case, I persevered and am now working with the Feds to go after the person who exploited a client's form-mail script. With any luck, we'll make an example of this group. We froze all logs and evidence and have a very strong case to nail the attackers. Most importantly, we've got the authorities to burn a case number and we will be moving forward taking down the perpetrators. The spammers think that they are untouchable. They're about to get a rude awakening.

      Let me also state this: If you PROBE for a network vulnerability, that is considered an attack! In my case, the authorities are less interested in finding the spammer than they are in tracking down the initial probes of the network looking for the exploitable script. They figure the scanner will lead them to the spammer, and they're probably right.

      I encourage all ISPs to burn their logs and mail queue to CDROM and immediately open an FBI complaint to their computer crime division. The Feds are beginning to take this stuff seriously as they finally are understanding that the spam industry is one of the most innovative groups in advancing computer-terrorism.

  17. Personalised solution by 5lash · · Score: 2, Interesting

    I personally don't think that systems like this can work that well. Everyone seems to get different types of spam, and your best bet is to create your own filters. About 80% of my spam messages have weird foreign characters in them (like Á), so I've got filters in Eudora to delete anything with one of these characters in the Subject or Body. Then obviously anything with "porn", "sex" etc, although spammers don't seem that stupid anymore. This way I only get 5-10 spam messages in my inbox per day, maximum. And this takes me about 20-30 seconds to deal with, I don't see what all the fuss is about.

  18. That's great, but... by hawkbug · · Score: 2, Interesting

    I'm running a sendmail server, and I access via webmail accounts, pine, and Mozilla. I would like to add this new type of spam filtering to sendmail directly. Does anyone know if this is something that can be added to sendmail, rather than a specific mail client like Mozilla?

  19. Real spam control.. by grub · · Score: 3, Interesting


    .. should start at the server preventing the offending mail from ever coming into the network in the first place.

    Not that localized spam filters are a bad thing (they aren't!) but refusing connections from known spammer IPs and the proper use of blacklists would cut down on a lot of the email traffic. Once the spam is in your inbox, it's just an annoyance to you. The cost to the net has already been incurred.

    --
    Trolling is a art,
  20. Re:102 Features IE doesn't have by afidel · · Score: 3, Interesting

    Popup killing and tabbed browsing are the two killer features that have allowed me to spread mozilla widely through my office. People see me surfing and ask what the tabs are or ask where the popups have gone. I tell them about mozilla and show them how easy it is to stop popups. Yes I know about crazybrowser which does both of these, but it does popup killing badly (it's an all or nothing thing, not just unsolicited popups).

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  21. Spam filters should bust the spammers, also. by Futurepower(R) · · Score: 5, Interesting


    Software that only does mail filtering encourages spammers. The technically knowledgeable people don't get spam, so they stop worrying about it.

    All mail filters should also use a service like SpamCop, so that the spammers lose their internet service accounts as the spam is filtered.

    I send Spamcop all my spam. Spamcop analyzes it automatically and sends a message to the Internet Service Provider. I use the free Reporting only service.

  22. Re:102 Features IE doesn't have by wkitchen · · Score: 2, Interesting
    Microsoft playing "catch-up"? Nonsense. Only, what, 9% of the internet users out there use browsers other than IE? Of that, how many of those alternate browsers have tabbed browsing...
    I suspect that Netscape, Mozilla, and Opera collectively make up most of that 9%. Given that all of those have tabbed browsing, then the answer is "nearly all of them". Or maybe not. I guess it really depends on how many of those 9% are surfing around using old versions.
    and of those clients using those browsers how many actually *use* tabs?
    Good question. I don't have any statistics, but I suspect the percentage is pretty high. Of the few Mozilla users I know, ALL use and love tabs. In fact, tabbed browsing has influenced many, including myself, to use Mozilla as their primary browser, resorting to IE only to deal with those increasingly rare sites that don't work with Mozilla (and a significant portion of those fail only because of really stupid browser detection that causes the page to refuse even to try to load if it detects something other than IE). Mozilla is good enough at this point that I now use IE for less than 1% of my web browsing.
    I agree that Microsoft is scanning around and implementing good features, but no one other than /.'ers will ever know they got the idea from someone else. You're only playing 'catch-up' if there's something to catch-up to. IE has over 90% of the internet userbase, I'd say *that* was something to catch-up to.
    Of course MS isn't playing catch up in user share. No one claimed otherwise. But when it comes to features, MS definitely has some catching up to do.
  23. Re:"Bayesian filtering" aka "Naive Bayes" by standards · · Score: 3, Interesting

    Well, I certainly have a large volume of SPAM that I plan to use for training purposes. I'm not a big user of personal email, but somehow about 70% of all my incoming personal mail is SPAM. My Dad is much worse off.

    I'm glad to see that the software industry is taking the SPAM problem seriously. And it's great to hear that more and more states, like Massachusetts, are enacting laws to curb the abuse of email systems.

    I've been dependent on some static rules to curb SPAM (about 90% effective), but I think now it's time to implement more serious anti-spam measures.

  24. Re:"Bayesian filtering" aka "Naive Bayes" by ceswiedler · · Score: 3, Interesting

    Based on the last /. article on Bayesian filtering, I installed SpamProbe. I gave it a folder of about 70 spam emails, and a few hundred good emails I had in various folders. In the past few weeks, it's had one false negative, and a few false positives which were 'semi-spam' mailing list emails from Dell, RedHat, and Amazon. When I moved those emails into the 'recheck as good' folders, it learned its lesson.

    It may be naive, but I was very surprised at how well it worked. It's better than SpamAssassin IMO, especially at foreign-language spam.

  25. tmda.net? by Sludge · · Score: 3, Interesting
    Has anyone tried Tagged Message Delivery Agent out? I would be curious to hear the mileage of others who have tried this.

    Essentially, it throws the parsing problem right back in the spammers' faces: They must answer a fuzzy logic question in order to get into your inbox once and for all. It is similar to challenge/response routines in network connection code to prevent spoofing. The most interesting part from the intro:

    The way TMDA thwarts incoming junk-mail is simple yet extremely effective. You maintain a "whitelist" of trusted contacts which are allowed directly into your mailbox. Messages from unknown senders are held in a pending queue until they respond to a confirmation request sent by TMDA. Once they respond to the confirmation, their original message is deemed legitimate and is delivered to you.
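
    As a rough picture of that flow (not TMDA's actual code or API, just a Python sketch of the whitelist, pending queue and confirmation steps from the intro), something like this would do. The class and method names are invented, and adding a confirmed sender to the whitelist is an assumption on my part.

```python
class ChallengeResponseInbox:
    """Toy whitelist plus pending-queue mailbox (hypothetical names)."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self.pending = {}  # sender -> list of held messages
        self.inbox = []

    def receive(self, sender, message):
        """Deliver mail from trusted senders, hold everything else."""
        if sender in self.whitelist:
            self.inbox.append(message)
            return "delivered"
        first_contact = sender not in self.pending
        self.pending.setdefault(sender, []).append(message)
        # Only the first held message triggers a confirmation request.
        return "challenge-sent" if first_contact else "held"

    def confirm(self, sender):
        """Release held mail once the sender answers the confirmation."""
        released = self.pending.pop(sender, [])
        if released:
            # Assumption: a confirmed sender is trusted from now on.
            self.whitelist.add(sender)
        self.inbox.extend(released)
        return len(released)
```

    A spammer who never reads replies never calls confirm, so the mail just sits in the pending queue.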

    Bayesian filters, to me, seem to work if you are a dull person without many changes in your life. For ex, if you constantly get spams with the word Madam in it and you later on get a sex change, you will need to recalibrate your filters. (Probably not the most pressing thing on your mind, so you'd lose a few authentic mails.)

    Just some thoughts.

  26. Re:Sort by Spam Probability by ghamerly · · Score: 2, Interesting

    Since naive Bayes gives probabilities, this is easy to get out of what Mozilla (and Paul Graham, and others) are trying to do. However, it is well-known that the probabilities that naive Bayes classifiers give are typically exaggerated (too close to either 0 or 1). This is partly because of the naive assumption (conditional independence of features).

    However, while the probabilities themselves may be exaggerated, they are also usually found to be ranked correctly, which would give you what you want here -- a ranked list of possible spams.
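
    A small Python sketch of why the ranking survives: any distortion that pushes scores toward 0 or 1 without swapping their order leaves the sorted list unchanged. The exaggerate function below is a made-up stand-in for that effect, not a model of what naive Bayes really does to the numbers, and it assumes scores strictly between 0 and 1.

```python
def rank_by_spam_score(scored_messages):
    """Return message ids sorted from most to least spam-like."""
    ordered = sorted(scored_messages, key=lambda pair: pair[1], reverse=True)
    return [message_id for message_id, _score in ordered]


def exaggerate(p, sharpness=5.0):
    """Push a probability in (0, 1) toward 0 or 1 while keeping the order.

    Hypothetical stand-in for naive Bayes overconfidence: raising the odds
    to a power greater than 1 is monotone, so the ranking is preserved.
    """
    odds = (p / (1 - p)) ** sharpness
    return odds / (1 + odds)


# Invented scores, purely for illustration.
calibrated = [("msg-a", 0.2), ("msg-b", 0.9), ("msg-c", 0.6)]
overconfident = [(mid, exaggerate(p)) for mid, p in calibrated]
```

    Both lists sort into the same order, which is all a sort-by-spam-probability column needs.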

  27. Re:102 Features IE doesn't have by Anonymous Coward · · Score: 2, Interesting

    > Is it? I thought Outlook Express was a virus-support API.

    No, no, Outlook Express is for Internet Explorer what Composer is
    for Mozilla or Netscape -- if you don't know HTML, you can use it
    to create web pages. They won't be particularly well-designed, and
    they won't validate, but the major legacy browsers everyone seems to
    still use will display them, so you can put them up on your website.

    The reason it sends email is not a bug, but a feature (albeit one
    that tends to be abused). It's not for sending general email, but
    so that you can easily upload your web pages you create to certain
    free website engines that can receive them by email (on the theory
    that most people don't know how to use ftp, or else because ftp is
    considered insecure). The usenet engine was included so that
    multiple people can use it in a peer-to-peer fashion to collaborate
    on the creation of a web page. For example, if your mom and grandma
    want to create a web page, but they aren't sure how to get the
    pictures of the family dog scanned in, you can let them write the
    text about the dog, and you can put in the picture. You can pass
    it back and forth on your private family news server until it's
    ready for the family website.

    The reason people started using Outlook Express for regular email
    is because the email software that shipped with Windows 95 (called
    Microsoft Internet Mail) was _so_ bad that it was more convenient
    to use _anything_ else, including telnet, and so when Outlook
    Express came out people jumped on that, and the rest is history;
    Outlook Express now handles (on one end or the other) nearly 40%
    of the internet's email, more than anything else except sendmail.

    The virus API, as you suspected, was not a bug but a feature, but
    the reasons for its inclusion are complicated and involve both
    particle physics and JFK.

  28. Combine this with open relay databases... by cardshark2001 · · Score: 2, Interesting

    And you'll have a real winner. Probably several other techniques could be combined as well, but back when I wrote a program just to check all of the from IPs in an email to see if any of them were open relays, I got around 80% filtering with very few false positives.

    Furthermore, you can assign a pretty good probability number based on what sort of open relay it is (i.e. verified, unverified, spam server, merely unsecured server, etc). If it comes from a spam server, the chances are 100% that it's spam. If it comes from a dialup server, the chances are about 99.9999%. If it comes from an automatically verified open relay, that's merely unsecured, the chances are more like 60%.

    The open relay thing really intrigued me because it has NOTHING to do with the message body, and it was my belief at the time that there was no good way to filter based on message content.

    However, combine this with bayes, and I'll bet you'll have something grand.

    Also, a great feature would be a multi-tiered identifier, so that you could have the 99.999% sure spam filtered into one folder, and the 75% sure spam filtered into another. You'd have to sift through the 75%, but probably could just leave the 99% alone.
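
    Here is a minimal Python sketch of both ideas: combining a relay-based probability with a content-based one, then routing by tier. It assumes the two pieces of evidence are independent with equal priors (which is optimistic), and the thresholds and folder names are invented.

```python
def combine(p_relay, p_content):
    """Combine two spam probabilities, assuming independent evidence
    and equal priors (a simplification)."""
    spam = p_relay * p_content
    ham = (1 - p_relay) * (1 - p_content)
    if spam + ham == 0:
        # One source says certainly spam, the other certainly not.
        return 0.5
    return spam / (spam + ham)


def route(p_spam, sure=0.999, likely=0.75):
    """Pick a folder by tier; thresholds and names are hypothetical."""
    if p_spam >= sure:
        return "spam-certain"
    if p_spam >= likely:
        return "spam-probable"
    return "inbox"
```

    With these numbers, a message from a merely unsecured relay (0.6) with a content score of 0.9 lands in the folder you still sift through, while only near-certain spam goes to the one you can leave alone.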

    --
    WWJD? JWRTFA!
  29. Re:Good example of MS's monopoly abuse by Refrag · · Score: 3, Interesting
    The real problem with spam is that it steals bandwidth - blocking spam after it's already sitting in your mailbox is like closing the barn door after the horses have eaten your children - the bandwidth has already been used, so you don't gain anything... having your email client "block" spam isn't really blocking it, it's just an automatic "delete key".. which is what the spammers want (how many of them say spam isn't a problem because you can "just hit delete")

    I'd argue that the time wasted on filtering spam is more valuable than the bandwidth wasted delivering it. This is why I am glad that Apple was able to bring good client-side spam filtering to the people with Mail and that Mozilla will soon provide this feature as well.
    --
    I have a website. It's about Macs.