Slashdot Mirror


Mozilla Adding Spam Filters

ksheka writes "Mozilla mail now has Spam Filters, using Bayesian filtering method, no less. This is a very good thing, because it learns from the spam you receive, and constantly modifies itself, based on new spammer techniques!"

81 of 464 comments

  1. Arms Race by Camel+Pilot · · Score: 3, Interesting

    But the spammers will develop Bayesian filters of their own to find the best content that will sneak by your filters.

    1. Re:Arms Race by TamMan2000 · · Score: 5, Insightful

      Interesting thought, but they would have to have a large sample of YOUR valid email to train on...

      --
      "I'll have a Guinness, no wait, make that a Coors Light" -Grad student I work with, who shall remain anonymous...
    2. Re:Arms Race by jpetts · · Score: 3, Insightful

      But the spammers will develop Bayesian filters of their own to find the best content that will sneak by your filters

      No they won't, unless the pattern (if there is one discernible in the S/N ratio) of replies they receive changes. As most spam, as far as spammers go, disappears into a black hole, they have no way of learning how your filters are working.

      And that's good filterin'!

      --
      Call me old fashioned, but I like a dump to be as memorable as it is devastating - Bender
    3. Re:Arms Race by Camel+Pilot · · Score: 3, Informative
      Actually they do have your data. If you preview any e-mail they typically have something like
      <img src=/spamcity/tracker.pl?id=177729299>
      Where 177729299 is your personal id number.

      So they have the feedback and they know what works and what does not.
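A minimal sketch of how a mail client might list the image URLs in an HTML message before deciding whether to fetch them, assuming Python's standard `html.parser` module. The class and function names are hypothetical, and a real client would also have to consider other kinds of remote content.

```python
from html.parser import HTMLParser


class ImageSourceFinder(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lower-cased.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html):
    """Return the list of image URLs referenced by an HTML body."""
    finder = ImageSourceFinder()
    finder.feed(html)
    return finder.sources
```

A client that shows the message without requesting any of the URLs returned by `image_sources` never hits the tracker script, so the sender gets no feedback from the preview.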

    4. Re:Arms Race by ichimunki · · Score: 5, Insightful

      Nonsense. It's impossible. First of all, they don't have access to much of the mail I want to let through-- although my mailing list traffic certainly qualifies, so let's assume that's the only mail I get and that they know I am receiving it.

      There will still need to be header information and actual spam content in the spams themselves for those mails simply to not be repeats or dada-esque cutups of posts to the mailing list. That is, there must be content unique to the spam that no normal sender on the list will include.

      Because of this, and the fact that so-called Bayesian spam filtering works by scoring all the words in an email and then evaluating the email based only on the extremes, there is little likelihood-- since the spam must still contain spam words to have any point at all-- of those words not being on the extreme word list. After all, if the same words are appearing in both spam and not-spam mails, they will be given a spam-probability that is not extreme. So all those words in common will be ignored and only the spam words will be looked at-- and the spam will still be filtered.
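The "score every word, then judge only by the extremes" idea can be sketched as below. This is a rough illustration rather than Mozilla's actual code: the function name, the cut-off of 15 tokens, the default for unseen words, and the clamping bounds are all assumptions chosen for the example.

```python
def spam_probability(tokens, token_probs, n_extreme=15, unknown=0.4):
    """Combine the spam probabilities of the most extreme tokens only.

    token_probs maps a word to its estimated probability of appearing
    in spam (illustrative values, supplied by the caller).
    """
    # Score each distinct token; clamp so no single word is absolute proof.
    probs = [min(0.99, max(0.01, token_probs.get(t, unknown)))
             for t in set(tokens)]
    # Keep the tokens whose probability is farthest from neutral (0.5).
    extremes = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)
    extremes = extremes[:n_extreme]
    spam = 1.0
    ham = 1.0
    for p in extremes:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)
```

With a small cut-off, words that appear equally in spam and non-spam sit near 0.5 and never make the list of extremes, which is the point made above: the shared words are ignored and the spam words still decide the outcome.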

      --
      I do not have a signature
    5. Re:Arms Race by Spock+the+Vulcan · · Score: 5, Informative

      Use Gotmail, which downloads your hotmail messages to an mbox-style file. Or use hotwayd which appears like a POP3 server running on localhost, and uses WebDAV to get messages from hotmail (like Outlook Express). Either way, no web-bugs will get activated.

      The added advantage is that you can pipe these through procmail/spamassassin just like ordinary incoming mail, and not have to manually delete all that spam.

    6. Re:Arms Race by Lionel+Hutts · · Score: 3, Interesting

      That's an arms race the spammers can't win. Sending spam is an ultra-low-margin business: with response rates of a fraction of a basis point, and probably only a fraction of them actually spending any money, the cost and effort per message sent must be very, very, very low for the spammers to make any money at all. Most spam recipients would gladly put in, say, $20 worth of effort to spamproof their addresses; there is no way even a spammer with huge scale could invest even $5 worth of effort for one more address. We will all have different Bayesian rules, remember. Combine that with the fact that I have perfect information about what spam and nonspam I get, and the sender has little or no information about what gets through, and it's clear that even hours of effort by senders wouldn't do much.

      And, even if they could afford to keep it up for a while, my spam filter will get better faster than their spam. This is the "Ambassador's criterion" from SDI (briefly: Star Wars won't lead to an arms race if it gets to the point where shooting down the marginal missile is cheaper than building the marginal missile).

      I think we may just win the Spam Wars yet.

      --
      I Can't Believe It's A Law Firm, LLP does not necessarily endorse the contents of this message.
  2. A little misleading by TobyWong · · Score: 5, Informative

    The news article makes it sound like this feature is up and running; in reality it is only partially phased in - alpha stage stuff.

    It will be great when it's more complete but there is a lot of work to do yet.

    --
    - Toby
    1. Re:A little misleading by DeadSea · · Score: 5, Informative

      It is up and running, it just may have a few bugs.

      I just downloaded the latest nightly build and enabled the features for my mail. So far I have seen that the icons are kind of funky, the dialog box is way oversized, there doesn't appear to be a good way of marking multiple messages as spam or not spam.

      On the other hand, it does seem to be doing a good job of filtering my messages. If you were one of the folks that complained about mozilla until mozilla 1.1 or 1.2, then I wouldn't go near it with a ten foot pole. If you are one of the folks, like me, who used mozilla since milestone 11 when it crashed every hour and couldn't render a heck of a lot of pages, you'll probably want to try it. Especially, if you use mozilla for mail anyway.

  3. Re:didn't k5 already run a story on this? by Junks+Jerzey · · Score: 3, Informative

    Here [kuro5hin.org]. Yeah, it's basically the same thing.

    Yes, and your point is? Hint: Slashdot gets most of its stories from elsewhere.

  4. Re:DOWNLOAD NEW MOZILLA by wiredog · · Score: 4, Funny

    Man, a perfect place for a goatse link, and you didn't put it in. Sigh. Kids these days.

  5. Re:102 Features IE doesn't have by crossseyed · · Score: 4, Interesting
    It doesn't mean they're not thinking about it, though...

    http://research.microsoft.com/~horvitz/junkfilter.htm

    --
    -- Outside of a dog, a book is man's best friend. Inside a dog, it's too dark to read
  6. Filtering by Transient0 · · Score: 5, Interesting

    Bayesian technique is very good for the sort of abstract classification task that spam represents. It would be an interesting hack to try to train a network to categorize based solely on message body... I do, however, hope that their team has opted for practicality over just hack value and that the network will also use such extremely relevant data as header information and a comparison of the sender's address against the address book (an e-mail from someone not in your address book is not necessarily spam... but it is more likely to be).

    1. Re:Filtering by Gabe+Garza · · Score: 5, Informative

      Actually, using only the body isn't just a hack, it's a relatively new technique invented by Paul Graham that seems to produce excellent results. It makes a lot of sense: Spam is Spam because the body contains commercial or otherwise unwanted material--it's only natural that the most direct and accurate Spam filters are going to analyze the body. Bayesian classification like this is computationally tractable and appears to work. You can read more about it here.

    2. Re:Filtering by garymcm · · Score: 3, Insightful

      I would like to understand the choice of Bayesian more. As far as I know, Bayesian is good for classifying based on *belief* and can be pretty good when only partial evidence is available to the network. This is great for Marketing activities, eg sending out mass emails to a segment of a database :) . However, as this is _my_ email and mission critical to me, just a simple belief that something is spam is not enough.

      In my experience, 99% of spam can be caught with static rules (am I in the TO or CC line gets a bit under half the spam I receive). Taxonomical analysis of the subject and body can get the rest.
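A static rule of the "am I in the To or Cc line" kind could look roughly like this, using Python's standard `email` package. The function name and the example addresses are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def addressed_to_me(raw_message, my_addresses):
    """Return True if any of my addresses appears in the To or Cc headers."""
    msg = message_from_string(raw_message)
    # Collect every To and Cc header (a message may carry several).
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses splits the header values into (display name, address) pairs.
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return any(a.lower() in recipients for a in my_addresses)
```

Mail that fails this check is not necessarily spam (mailing lists and Bcc fail it too), so a rule like this is better used as one signal among several than as a verdict.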

      Bayesian seems like overkill, or maybe even a bad fit. Let's face it, the other well known use for Bayesian is the famous Microsoft Office Paper Clip!!! And that is about as useful as the proverbial ashtray on a motorbike!!

      Gary

    3. Re:Filtering by swdunlop · · Score: 5, Insightful

      1) How much time do you spend training your paperclip in Office?

      How much time are you going to spend on training your spam filter? If you are unwilling to invest a little time and effort in developing a solid set of values that fit your personal pattern of behavior, then Bayesian filters are indeed a poor match for you.

      2) What harm is a false positive?

      If you are automatically deleting anything that is marked as a positive for spam, then you are playing roulette with your email. I would generally recommend diverting email classified as spam by your filter to a folder, especially one that is relatively new and has had very little experience with your patterns of use. Set an expiry on your spam folder, and check it from time to time to see if something fell through the cracks. Mozilla has a handy feature that allows you to simply conceal spam from view, which works adequately, although I dislike the potential performance hit in a large folder.

      Considering how important your email is to you, you should certainly consider applying a little diligence to how you manage it.

    4. Re:Filtering by Shamashmuddamiq · · Score: 3, Interesting
      I don't believe it was "invented" by Paul Graham. Thoughts of separating spam from real email based on the statistical properties of its content is something that has come to my mind, as well as the minds of many people over the last few years. Just because Paul's page was the first one that you've seen explain it in detail doesn't mean he invented it.

      BTW, there are ways of getting around Bayesian filtering. For instance, if you take random words from a large dictionary of long, normal conversational but not-often-used-in-spam words and splatter them throughout your spam, it's easy to convince the Bayesian filter that it's not spam. Not only will this increase the filter's false negatives, it has the capability of increasing its false positives. This is because your new spam will be training the Bayesian filter, and putting lots of non-spam-like words into its vocabulary. If the spammers keep up with their dictionaries as well as the filters keep up with theirs (and I must assume this will happen), we've still got a big problem on our hands.
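The padding trick can be illustrated with a toy scorer that combines every token rather than only the extremes. The word probabilities below are invented for the example, and `combined_score` is a hypothetical name, not a function from bogofilter or Mozilla.

```python
import math


def combined_score(tokens, token_probs, unknown=0.5):
    """Combine per-token spam probabilities by summing log-odds over all tokens."""
    log_odds = 0.0
    for t in tokens:
        # Clamp so that no probability is exactly 0 or 1.
        p = min(0.99, max(0.01, token_probs.get(t, unknown)))
        log_odds += math.log(p / (1.0 - p))
    # Convert the summed log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up probabilities: two spammy words, three innocent-looking ones.
probs = {"viagra": 0.99, "cheap": 0.9,
         "committee": 0.1, "agenda": 0.1, "thesis": 0.05}
plain = ["viagra", "cheap"]
padded = plain + ["committee", "agenda", "thesis"]
```

Here `combined_score(plain, probs)` is well above 0.9 while `combined_score(padded, probs)` falls below 0.5. The catch for the spammer is that the padding words have to be ones that this particular recipient's filter already treats as innocent.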

      Don't get me wrong. I have bogofilter installed on my mail server at home, and it works great for now. But don't expect it to work forever.

      --
      ...just my 2 gil.
  7. Mozilla mail / browser by FrostedWheat · · Score: 4, Interesting

    I wonder if a similar technique could be used in the browser. Automatically block images or popups based on previous ones you have blocked.

    Now that would be very nifty!

    1. Re:Mozilla mail / browser by SethJohnson · · Score: 4, Informative


      The site-specific white list feature of Mozilla's pop-up blocking seems to work fine enough. The number of sites where you actually want popups from is far less than those offering popups. So manually adding these exceptions to the white list is not such an annoying task. I think bayesian filtering would be overkill in this case.
  8. zilla by sstory · · Score: 3, Interesting

    I just switched to Mozilla. Happy to be free of Microsoft for email. It's skinnable, and there are some cool skins--like one which sort of emulates Evolution. I noticed an annoying 'feature' though, which is still there from Netscrap days--if you send an email without a subject, a dialog pops up and goes blah blah blah. I asked the Mozilla newsgroup if there was a way around this, but all I got was the sort of adolescent yammerings that keep me out of unmoderated newsgroups. Nice to see it has a spamfilter now. The only major improvement remaining is to add a spell-check (the Netscrap one was licensed from a 3rd party, and can't be freely distributed).

    1. Re:zilla by Neon+Spiral+Injector · · Score: 5, Informative

      It is so annoying to get an e-mail without a subject. My spam filters actually bump you a little bit closer to being considered spam if there is no subject. I consider it to be a required header.

      For one, I sort my mail by thread. While Mozilla will use reference headers to thread messages, the fallback is the subject. Without a subject your message would be tossed in the thread with the other losers who also forgot their subject.

      The easy way to keep that dialog box from popping up when you send a mail is to...put a subject on the message.

      If you want a spell checker go to the Netscape FTP server find the XPI file for the spell checker and install it.

    2. Re:zilla by ChaosDiscord · · Score: 5, Funny
      I noticed an annoying 'feature' though, which is still there from Netscrap days--if you send an email without a subject, a dialog pops up and goes blah blah blah.

      The "blah blah blah" is roughly, "You have not specified a subject. Would you like to enter one now?" Perhaps you're right, it should be changed. Instead, it should say, "You're about to send an email message without a subject. That's an amazingly rude thing to do and likely to irritate the recipient as it makes it harder for them to prioritize their incoming mail and harder to distinguish from spam. Because this is such a terrible idea, you should enter a subject line below. If you fail to enter a subject, the default entry of 'I'm a idiot, please delete this message without reading it' will be used."

  9. Yeah, but... by digital_milo · · Score: 3, Funny

    This will be of no use to me until it automatically deletes any Word doc and .exe files that my coworkers try to email to me.

  10. One question... by Hard_Code · · Score: 5, Interesting

    I assume the filtering statistics live on the client side. What about IMAP? If I open up Mozilla on a new machine, are all my spam statistics lost (presumably rendering the junk mail filtering statistics I've accumulated useless on the new machine)?

    It would be neat if, with IMAP accounts, Mozilla just stored the statistics in a file on the IMAP server instead of on the client.

    --

    It's 10 PM. Do you know if you're un-American?
    1. Re:One question... by BroadbandBradley · · Score: 3, Interesting

      Someday you'll be able to back up and restore your Mozilla Profile, and when that day comes, I hope you'll remember that Mozilla has a House online at ZillaVilla.com

  11. SpamAssassin + Mozilla = Schweet! by Noryungi · · Score: 5, Interesting

    Well, most of my spam is already sent to /dev/null by the SpamAssassin ninja.

    But, for those that make it past the email shadow warrior, I guess Bayesian filters are a double whammy they'll never survive... Mwahahahaha!

    Kudos to the Mozilla programmers!

    --
    The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
  12. Re:MSN 8 rules, Mozilla Sucks by Hard_Code · · Score: 3, Funny

    "since every Mozilla article degrades to a flame fest of Microsoft greatness versus the rest of the world"

    s/Microsoft/Open Source/

    --

    It's 10 PM. Do you know if you're un-American?
  13. Re:102 Features IE doesn't have by Shippy · · Score: 3, Insightful
    Not really. E-mail is Outlook's domain. Not IE. I think that list of 101 things is a great way to show the power and flexibility over IE, but some of them are just filler. For example:

    • 98. Supports IRC Protocol - This is something I don't even use. This is just another program which should be separate but isn't and gives rise to the "mozilla is bloated" argument.
    • 99. Open Source - Yeah, but good luck sifting through it ;)
    • 100. Bugzilla - OK, lots of people use this, but Bugzilla != Mozilla. So it's not like Mozilla has built-in Bugzilla features... This is unrelated to the list.
    • 101. Giant Lizards are Cool - 'Nuff said.

    So, that brings it down to, what, 97? Still a pretty good list. However, I've heard that popup blockers and tabbed browsing are making their way into IE (and MS employees can already use these features), but we'll see if they're actually integrated.
    --
    -Shippy
  14. Microsoft's Patent by woboz · · Score: 5, Interesting

    What happens when Microsoft attempts to enforce this patent?

    1. Re:Microsoft's Patent by DaveAtFraud · · Score: 5, Informative
      This is from Paul Graham's site with regard to the Microsoft patent. Patents tend to be very narrow in scope such that, if some aspects change, the patent may no longer apply. Pick on any typical consumer product such as hair dryers, stereos, you name it. They all have patents and they're all different and they don't "infringe" on each other unless they're virtually identical.

      --
      They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
      Ben
    2. Re:Microsoft's Patent by DeadSea · · Score: 5, Informative
      Specifically in this case:
      ...then stored in a corresponding folder for subsequent retrieval by and display to the recipient.
      So it looks as if this patent only covers server-side implementations. A client-side implementation (Mozilla's) retrieves the mail and then filters and displays it.
  15. Since some of us run Windows, by Dot.Com.CEO · · Score: 5, Informative
    I dare submit myself to the rage of the Slashdot crowd. I use Outlook and "Spamnet" is a way to stop most spam in Windows. Based on the Razor project (distributed spam detection), it is a great solution for whoever cannot or does not want to move to Mozilla. Granted, it is beta quality, but the Mozilla feature is still in the alpha stage.

    --
    Mother is the best bet and don't let Satan draw you too fast.
  16. My only complaint... by Mustang+Matt · · Score: 3, Interesting

    In Outlook Express, I can set up 100 different email accounts and not have a giant list of mail folders.

    In Mozilla (last I checked), for every account you set up it creates a new set of folders.

    Since I've got a catchall account, I'd like to tie multiple email addresses to one set.

    Anybody out there on the Mozilla team listening?

    --
    The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
    1. Re:My only complaint... by ChrisDolan · · Score: 5, Funny

      No they likely aren't. They have this cool thing called Bugzilla (http://bugzilla.mozilla.org/) which is designed to track bugs and new feature requests. If you want to be heard, that's the place to submit, not here.

      It's like, if you want to submit a complaint to Microsoft, you write them a letter to their company address instead of, say, writing your complaint as graffiti on a New York subway car. Wait a minute, actually, you might run into a MS employee doing butterfly graffiti, so that's a bad analogy... Plus, a subway isn't a good metaphor for Slashdot. The /. crowd is much scarier.

  17. Re:Great gob Mozilla, but... by xanadu-xtroot.com · · Score: 3, Insightful

    its size is getting bigger and bigger.

    Compile Mozilla from scratch, and you'll see that you can custom tailor the build and cut out a lot of cruft.

    The source package is far larger than the binaries! Then there's the wait in compiling the damn thing. No (L)User is going to do that. Maybe us geeks (and I do use the source, Luke), but certainly not a "normal" user.

    The problem here is that binary distributions package it all together

    So download the Net installer and choose only what you want?

    --
    I'm not a prophet or a stone-age man,
    I'm just a mortal with potential of a super man.
  18. Re:Great gob Mozilla, but... by RAMMS+EIN · · Score: 3, Informative

    I partially agree with you. Compiling does allow you to get a slimmer lizard. However, compiling it from scratch is a real pain, and takes a long time and a lot of disk space. My point is that it's probably not worth the effort for most people. Why waste time and disk space on building a slimmed-down Mozilla if you can download a more functional precompiled version? This is why I love modularity so much; every module can be offered precompiled, and nobody needs to waste disk space.

    --
    Please correct me if I got my facts wrong.
  19. Outlook is part of the IE Package by yerricde · · Score: 4, Informative

    E-mail is Outlook's domain. Not IE.

    It's possible to net-install Mozilla without installing Mozilla Mail, but the default setting includes both. It's possible to net-install IE without installing Outlook Express, but the default setting includes both. Thus, it is a fair comparison.

    100. Bugzilla - OK, lots of people use this, but Bugzilla != Mozilla. So it's not like Mozilla has built-in Bugzilla features... This is unrelated to the list.

    I think the point of that entry was that unlike IE's bug database, which only Microsoft employees see, Mozilla's bug database is 99% open to the public (the other 1% primarily covers unfixed security vulnerabilities).

    --
    Will I retire or break 10K?
  20. You know what would be cool? by PDHoss · · Score: 5, Funny
    If the spam filter could intercept outgoing mail. I would sneak into my goddamn in-laws house and install Mozilla if it would eat every forward-of-forward-of-forward-of-forward message they tried to forward to me based on rules like:

    1. Says "someone is testing something and you get $NN.00"

    2. Says anything like "angels watching over us" or "a mother's poem" or other such bullshit.

    3. Says "This is really funny"

    4. Says "We'll be over on Tuesday right during dinner when you are trying to put the moves on our daughter/your wife."

    Umm, not the last one, really. Just got on a roll.

    PDHoss

    --
    ======================================
    Writers get in shape by pumping irony.
  21. Eudora finally has the filter I need by Continental+Drift · · Score: 3, Informative
    Eudora's latest version, 5.2, includes the ability to filter mail against your address book. If someone sends me mail and they are not in that address book or they don't use a special key word in the subject line, they get an automatic reply telling them to try again with that key word. Spammers will ignore that reply, so only real people will include the key word, and then I can add them to my address book.

    This, combined with some clever regex filters I already had, means that I can reliably get the 10% of my mail that I actually want to read.
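The address-book-plus-keyword rule described above amounts to a three-line decision. The sketch below is only an illustration of that logic, not Eudora's implementation; the function name and return values are made up.

```python
def triage(sender, subject, address_book, keyword):
    """Decide what to do with a message: 'deliver' it or send a 'challenge'.

    address_book is a set of lower-case addresses; keyword is the special
    word that unknown senders are asked to put in the subject line.
    """
    if sender.lower() in address_book:
        return "deliver"      # known correspondent
    if keyword.lower() in subject.lower():
        return "deliver"      # stranger who followed the instructions
    return "challenge"        # auto-reply asking them to retry with the keyword
```

For example, `triage("stranger@example.com", "Hello", {"friend@example.org"}, "platypus")` returns `"challenge"`, while the same sender with the subject "platypus: hello" is delivered.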

  22. Good example of MS's monopoly abuse by SethJohnson · · Score: 5, Insightful


    Sorry if this comes off as a MS-bashing rant. It's not intended as such.

    The fact that MS doesn't seem hard at work implementing spam filters in Outlook or popup blockers in IE is a good example of consumers suffering due to Microsoft's monopoly. It also demonstrates how Microsoft is able to leverage its monopoly in one area (mail and web clients) to build profit in another market.

    This other market is its aspiring ISP services. The app and mail client development teams aren't implementing these features because the Microsoft ISP wants to be able to tout the ability to filter spam and block popups. If the browsers and email clients used by 90%+ of the internet users had these features, then it wouldn't be a selling point for their ISP. This is a clear example of the company withholding features in the free products so it can profit from the antidote.

    It also demonstrates the lack of competitive pressures in the market that normally drives a company to implement features at a rapid pace. Consumers are stagnating with a product for which the developer has no competitive pressure to improve. Hence that list of 102 things Mozilla can do that IE can't do.
    1. Re:Good example of MS's monopoly abuse by schon · · Score: 5, Insightful

      Sorry if this comes off as a MS-bashing rant.

      No need to apologize - I love a good MS-bashing rant as much as the next /.'er.. :o)

      I do, however, feel that it's not as big a problem as you do..

      The app and mail client development teams aren't implementing these features because the Microsoft ISP wants to be able to tout the ability to filter spam and block popups.

      This may (or may not - although I'm inclined to agree with your views) be true, but the important thing to understand is that the MTA (ISP)-level is where spam blocking belongs.

      The real problem with spam is that it steals bandwidth - blocking spam after it's already sitting in your mailbox is like closing the barn door after the horses have eaten your children - the bandwidth has already been used, so you don't gain anything... having your email client "block" spam isn't really blocking it, it's just an automatic "delete key".. which is what the spammers want (how many of them say spam isn't a problem because you can "just hit delete")

      MS's intentions aside, the solution they have is the correct one, even if their motives are suspect.

    2. Re:Good example of MS's monopoly abuse by ebyrob · · Score: 3, Insightful

      is like closing the barn door after the horses have eaten your children

      Ya, you should have shot those man-eating horses to begin with. Seriously though, don't you think we should have laws against this type of mail fraud (forging headers and the like) instead of simply trying to "block" the fraud at the ISP level? I suppose blocking as well can't hurt, but freedom requires punishing the guilty and only the guilty.

      The last thing I want is Microsoft deciding which emails destined to me are "spams". (subscription email from FSF? Must be spam!)

    3. Re:Good example of MS's monopoly abuse by Refrag · · Score: 3, Interesting
      The real problem with spam is that it steals bandwidth - blocking spam after it's already sitting in your mailbox is like closing the barn door after the horses have eaten your children - the bandwidth has already been used, so you don't gain anything... having your email client "block" spam isn't really blocking it, it's just an automatic "delete key".. which is what the spammers want (how many of them say spam isn't a problem because you can "just hit delete")

      I'd argue that the time wasted on filtering spam is more valuable than the bandwidth wasted delivering it. This is why I am glad that Apple was able to bring good client-side spam filtering to the people with Mail and that Mozilla will soon provide this feature as well.
      --
      I have a website. It's about Macs.
  23. Re: Mozilla bloat [...] Gentoo by delta407 · · Score: 3, Informative
    Before you Gentoo zealots get out here and plug your so-loved-distro, remember that even you don't have as much control as you could.
    I disagree. See the Mozilla 1.1 ebuild for details. I can write:
    # export USE="moznomail"; emerge mozilla
    Or, if the ebuild still doesn't provide enough customization, I can just manually remove a config option (say, --enable-xsl) and "emerge mozilla" to get exactly what I want.
  24. if spam gets through.. by EvilStein · · Score: 5, Funny

    procmail filters, SpamAssassin, AND the new Mozilla spam filters.. can we make a law that will make it legal to find the spammers and execute them in public?

    Pleeeease??

  25. So you really want... by dpilot · · Score: 5, Informative

    You really want server-side filtering. I do that on my IMAP server with procmail, though not Bayesian. A quick google with "procmail bayesian filter" turns up quite a bit of interesting stuff to sift through. Of course if it's not your IMAP server, you're back to client-side solutions.

    --
    The living have better things to do than to continue hating the dead.
  26. "Bayesian filtering" aka "Naive Bayes" by ghamerly · · Score: 5, Informative

    This approach is more commonly called "Naive Bayes" classification in the field of machine learning. It is naive because it considers each word to be a feature (dimension), but it also considers each word in an email to be conditionally independent of all other words in the document (which is not true, but really useful in practice).
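Written out, the independence assumption means a message made of words w_1 ... w_n is scored as follows. This is the standard textbook form of naive Bayes, not necessarily the exact formula any particular mail client uses:

```latex
P(\mathrm{spam} \mid w_1, \dots, w_n) \;\propto\; P(\mathrm{spam}) \prod_{i=1}^{n} P(w_i \mid \mathrm{spam})
```

The same product is computed for the non-spam class, and the message is assigned to whichever class scores higher.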

    The author of the web page on using this technique to classify spam (Paul Graham) has a better explanation of Naive Bayes on this web page.

    I've written my own naive Bayes classifier to identify spam, with less positive results than he reports. However, naive Bayes can be a very effective technique, and I can believe his results.

    The two things you have to watch for when using it are "smoothing" the probabilities of words you've never seen (you don't want them to always be zero, as straight naive Bayes will give you) and the need for LOTS of training data for naive Bayes to work well. That means that you need to already have a fair amount of spam to identify spam well.
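A minimal sketch of the smoothing point, using add-one (Laplace) smoothing so that a word never seen in training contributes a small probability instead of zero. The function names and the tiny training sets are made up for illustration, and the sketch assumes both training sets are non-empty.

```python
import math
from collections import Counter


def train(docs):
    """docs is a list of token lists; returns (word counts, total token count)."""
    counts = Counter(t for doc in docs for t in doc)
    return counts, sum(counts.values())


def log_likelihood(tokens, counts, total, vocab_size, alpha=1.0):
    """Sum of log P(word | class) with add-alpha smoothing."""
    # counts[t] is 0 for unseen words, so the numerator is never zero.
    return sum(math.log((counts[t] + alpha) / (total + alpha * vocab_size))
               for t in tokens)


def classify(tokens, spam_docs, ham_docs):
    """Return 'spam' or 'ham' for a list of tokens."""
    spam_counts, spam_total = train(spam_docs)
    ham_counts, ham_total = train(ham_docs)
    vocab_size = len(set(spam_counts) | set(ham_counts))
    n_docs = len(spam_docs) + len(ham_docs)
    # Class prior plus smoothed word likelihoods, all in log space.
    spam_score = math.log(len(spam_docs) / n_docs) + log_likelihood(
        tokens, spam_counts, spam_total, vocab_size)
    ham_score = math.log(len(ham_docs) / n_docs) + log_likelihood(
        tokens, ham_counts, ham_total, vocab_size)
    return "spam" if spam_score > ham_score else "ham"
```

Without the `alpha` term, a single unseen word would make the product zero (and the logarithm undefined) for both classes, which is the failure that smoothing avoids.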

    You can see a paper I wrote on using naive Bayes to classify hard drive failures here, or look for more stuff on naive Bayes on Google. Also, don't reinvent the wheel: Andrew McCallum has written a very good toolkit for doing these sorts of things in Bow.

    1. Re:"Bayesian filtering" aka "Naive Bayes" by standards · · Score: 3, Interesting

      Well, I certainly have a large volume of SPAM that I plan to use for training purposes. I'm not a big user of personal email, but somehow about 70% of all my incoming personal mail is SPAM. My Dad is much worse off.

      I'm glad to see that the software industry is taking the SPAM problem seriously. And it's great to hear that more and more states, like Massachusetts, are enacting laws to curb the abuse of email systems.

      I've been dependent on some static rules to curb SPAM (about 90% effective), but I think now it's time to implement more serious anti-spam measures.

    2. Re:"Bayesian filtering" aka "Naive Bayes" by ceswiedler · · Score: 3, Interesting

      Based on the last /. article on Bayesian filtering, I installed SpamProbe. I gave it a folder of about 70 spam emails, and a few hundred good emails I had in various folders. In the past few weeks, it's had one false negative, and a few false positives which were 'semi-spam' mailing list emails from Dell, RedHat, and Amazon. When I moved those emails into the 'recheck as good' folders, it learned its lesson.

      It may be naive, but I was very surprised at how well it worked. It's better than SpamAssassin IMO, especially at foreign-language spam.

  27. Client-Side Filtering is Wasteful by divide+overflow · · Score: 5, Insightful


    Since you must first download the content for client-side filtering to work you waste bandwidth. If you are truly bombarded by spam you still lose...your mail spool still gets filled up with stuff you don't want, your data transfers compete for bandwidth with the spam, storage hardware works harder storing data that will only be deleted. It raises everyone's costs, including yours.

    We need to block undesired mail at the host, not filter it at the client. That way the spam never gets sent, the spammer gets the message that their attempt was futile, and bandwidth is conserved. Many ISPs already provide this service...we need to improve on it. And we need better tools for identifying and dealing with spammers. The current mail standards are woefully inadequate to this task.

  28. interesting idea... by Lumpy · · Score: 5, Interesting

    what if in addition to this someone put together a company that the mozilla email client can report back to about what is labelled as spam and the filters it created along with the headers of the message (or even the entire spam) and grab filters from others that received some spam that you have yet to receive? it would be like a big distributed computing anti-spam project.. then if we were able to make the filters usable by sendmail to block at the server...

    I'm almost thinking a distributed and automated anti-spam system like that could completely crush the spam problem within a 12 month period.

    or I may be completely out of my mind.

    --
    Do not look at laser with remaining good eye.
    1. Re:interesting idea... by SandSpider · · Score: 3, Insightful

      That's a really cool idea in theory. In reality, you have to trust everybody on the internet to decide what your spam is and isn't.

      I mean, you've been on the internet before, right? You've seen the other people here, too? Think about it.

      =Brian

      --
      There is nothing so good that someone, somewhere, will not hate it.
    2. Re:interesting idea... by Dunkirk · · Score: 3, Informative

      It's called vipul's razor.

      --
      Acts 17:28, "For in Him we live, and move, and have our being."
  29. Re:102 Features IE doesn't have by gabec · · Score: 3, Insightful
    Microsoft playing "catch-up"? Nonsense. Only, what, 9% of the internet users out there use browsers other than IE? Of that, how many of those alternate browsers have tabbed browsing and of those clients using those browsers how many actually *use* tabs?

    I agree that Microsoft is scanning around and implementing good features, but no one other than /.'ers will ever know they got the idea from someone else. You're only playing 'catch-up' if there's something to catch-up to. IE has over 90% of the internet userbase, I'd say *that* was something to catch-up to.

  30. Not impressed by macdaddy · · Score: 4, Interesting

    Well, ok I am impressed that Mozilla is implementing spam filtering abilities in their MUA. I AM NOT impressed with Bayesian spam filters AT ALL. I've been using Mac OS X's Mail.app since I switched to OS X. It's not my primary MUA but I am letting it POP out a copy of all my mail and "learn" from it. It does a pretty good job of finding maybe 80% of the spam I get. However it has a BAD false-positive rate. I mean hell it's been flagging CERT advisories as spam. That kind of crap is really annoying. It's flagged co-workers' mail as spam numerous times (and even though I happen to agree... :) ). The biggest problem I have with Bayesian as a mail admin is that I am constantly dealing with spam. Users forward it to me. I receive a number of spam bounces. I work in spam all the damned time. That's the problem. I need an MUA with Bayesian filters that are smart enough for me to tell them to ignore all mail from certain domains or that went to certain accounts. All of the Bayesian filters built into MUAs I've worked with so far can't do things like that. It's really annoying given the position that I'm in.

    1. Re:Not impressed by self+assembled+struc · · Score: 3, Informative

      if i'm not mistaken you can edit the SPAM rule in mac os 10.2 mail and add additional properties to its rules.

      the default is "if not in address book and it's SPAM" send to SPAM folder.

      you should be able to add a property to that rule that says

      "if not in address book and FROM: doesn't contain XYX.COM and it's SPAM" send to SPAM folder

      you just add the properties before the SPAM one.
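      a rough sketch of how such a chained rule evaluates, for illustration only. this is not Mail.app's rule engine; the function name and parameters are made up, and the "contains" test simply mirrors the wording of the rule above.

```python
def should_move_to_junk(sender, flagged_as_spam, address_book, exempt_domains):
    """Return True only if every condition of the (hypothetical) rule holds."""
    sender = sender.lower()
    in_address_book = sender in {a.lower() for a in address_book}
    # "FROM: doesn't contain XYX.COM" expressed as a substring test
    from_exempt = any(d.lower() in sender for d in exempt_domains)
    # The extra conditions sit before the spam verdict, as in the rule above
    return (not in_address_book) and (not from_exempt) and flagged_as_spam
```

      with "cert.org" in the exempt list, an advisory that the filter scores as spam would still stay in the inbox.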

    2. Re:Not impressed by tbmaddux · · Score: 4, Interesting
      However it has a BAD false-positive rate. I mean hell its been flagging CERT advisories as spam. That kind of crap is really annoying. It's flagged co-workers' mail as spam numerous times..
      I had this problem early-on as well. I fixed it by marking the false positives as "Not Junk." You can do this even when it's in "Automatic" mode as opposed to "Training." All the "Automatic" does is enable the filter that sends the marked messages to the "Junk" folder.

      But it still learns in either mode! Early on my shipping notices from Amazon.com (and even Apple.com, ha ha) were being flagged as Junk, but not anymore. I think it's great and will only improve with time, with others' caveats about client-side email spam checking being flawed noted.

      --
      Can't you see that everyone is buying station wagons?
  31. Emacs! by MosesJones · · Score: 4, Interesting


    This is something that Emacs has in the GNUS client, you score emails up and down and it starts adding filtering rules. Using LISP you could extend this to do some pretty funky moderating.

    Every problem is reducible to a previously solved problem or by definition is unsolvable - Church Turing Thesis.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  32. The ultimate filters by TigerTime · · Score: 5, Insightful

    There needs to be a tiered structure with filters. The main one would be at the ISP level. It would only filter out obvious spam (like spam going to 2000 users at that ISP). The second tier would be at the client side and would have a certain level of intelligence in identifying spam. One feature that I'd like (it might already be available) is if it could automatically send an email back to the sender saying the email address doesn't exist. This should be done at the server level and/or client level. This could possibly help in removing your email from such lists. As far as what to do with the spam at the client level, I think that it should be sent to your main inbox but just marked as spam (maybe greyed out or something). Like new mail is always bold and once you read it it goes to a regular font. Well, spam could be just greyed out. That way you would never miss something that the spam filter had a false hit on.

  33. SpamCop! by JediTrainer · · Score: 3, Insightful

    How about a spamcop-like plugin? Or something that can submit my message plus contents to SpamCop?

    If using SpamCop, there should be a way to still show the site's banners, because they deserve to get paid for their bandwidth I'm using up.

    I'd love to just be able to right-click on a message and report it to the various abuse/postmaster accounts without having to copy my whole message plus headers, and pasting such into their web form. SpamCop seems to be pretty good at tracing the origins of messages, so I'd love to be able to leverage that sort of functionality.

    --

    You can accomplish anything you set your mind to. The impossible just takes a little longer.
  34. Re:How? by mark_lybarger · · Score: 5, Informative

    Preferences -> Privacy & Security -> Images, you can turn off images in mozilla, or only in mail/news.

  35. Hmm, my spam experiences by krappie · · Score: 5, Interesting

    I personally don't really care about all the junk emails I get. I don't get that many, and I can pretty much tell without looking at them. They go straight to /dev/null.

    Spam is such a horrible thing though. I work at a webhosting company. I'm the one that has to track down the site with the old formmail.pl, removing 'aol.com' and 'yahoo.com' from the hosts to relay for, trying to find out who the hell added them so I can murder them. I'm the one clearing out the mail queue with 100,000 mails. I'm the one clearing the mail queues of people who thought it was a good idea to check the 'open relay' option in plesk. I'm the one that has to deal with people bitching about how their mail isn't working or didn't get through.

    Just the other day, I had a raq2 where someone had apparently put yahoo.com and excite.com in the hosts to relay for. Yay! That's what attracted the spammers. Now I get a request every second to send mail to 50 people at once. Now that I've removed them, none of them are getting through. But it's a raq2, 133 MHz. It has to go through all 50 addresses and say 'relaying denied' and log it. It can't keep up! syslogd is taking up all the cpu and logging things from hours ago because it's behind. Quickly, sendmail quits listening on port 25 (but the spam attempts keep coming somehow).

    So I get the idea to block their IPs, they seem to be using the same IPs. But oh guess what, they're using open proxies and have about 400 IPs. Well, I did this for about 5 hours, writing scripts to grab the repeated IPs out of the maillog, adding them all to my sendmail access lists. Now every time they try to send mail, it blocks them instead of saying relaying denied 50 times for each request. But a minute later, I get a few new IPs and it starts all over again. I have an access list about 6 pages long. It's doing ok, blocking about 90% of them, but every once in a while, they get a new IP and sendmail is brought to a stop.
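    The kind of script described above might look roughly like this. It is only a sketch: the log line layout (a "relay=... [a.b.c.d]" field plus the phrase "Relaying denied") is an assumption about what the maillog contains, the function names and threshold are made up, and the output is meant as "address, tab, REJECT" lines for the text source of a sendmail access table.

```python
import re
from collections import Counter

# Assumed layout: "... relay=some.host [1.2.3.4], reject=550 ... Relaying denied"
RELAY_IP = re.compile(r"relay=[^,]*\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def repeat_offenders(log_lines, threshold=5):
    """Return the sorted IPs that were denied relaying at least `threshold` times."""
    hits = Counter()
    for line in log_lines:
        if "Relaying denied" not in line:
            continue
        match = RELAY_IP.search(line)
        if match:
            hits[match.group(1)] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)


def access_entries(ips):
    """Format IPs as 'address<TAB>REJECT' lines for an access table source file."""
    return [f"{ip}\tREJECT" for ip in ips]
```

    Feeding it the tail of the maillog every few minutes would automate the babysitting, though it does nothing about fresh proxies showing up.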

    Oh yeah, and my /var/ partition is only 200MB, 50MB free. And the maillog is growing at about 10MB a day. So now I'm babysitting this server every day until the spam attempts stop. I don't think there's any way around it unless I get sendmail to check for open proxies. But I don't know how to do that, and I don't think they trust me enough to make such changes to sendmail.

    So oh well, mail is getting lost every day on this server and it's been rendered horribly slow for its users.. just because some moron noticed it would send some emails for him and started up his scripts.

    Spam causes so many problems on the server level. It's what is making mail an unreliable service. I couldn't care less about spam filters on my mail client. These are the things that make spam evil!

  36. your .sig by Anonymous Coward · · Score: 5, Funny

    --- Does the name Pavlov ring a bell?

    Two brothers immigrated to a mostly Catholic country, hungry and looking for work. Pavlov, whose forehead was quite thick, found work at a monastery bell tower. The monks taught him to tell time, then sound the bell when appropriate. Not too bright, Pavlov missed the part about how to sound the bell. So he notes the time on his handy wristwatch, climbs the belltower, inches up to the edge of the platform, and dives face first into the massive centuries-old bell. KKKLLLAAANNNGGG!!! Poor Pavlov falls to his death hundreds of feet below.

    Apparently, monks don't communicate very well. No one in the crowd gathered around Pavlov's remains could identify him. Finally one monk admits, "I never caught his name, but his face sure rings a bell."

    Mysteriously, a man steps forward from the crowd and insists on taking Pavlov's place as caretaker of the belltower. One of the monks removes the wristwatch from Pavlov's arm, gives it to the mystery man, and proceeds to indoctrinate him in his duties. On the hour, just like Pavlov, our mystery man ascends the tower, perches on the edge -- but this time wielding a massive sledgehammer. He leaps towards the bell and smashes it with Thor-like fury. KKKLLLAAANNNGGG!!! The poor fool falls to his death in a manner very similar to Pavlov's.

    Much like deja vu, a muted crowd gathers around the mystery man's remains. After an extended silence, one monk asks, "Does anyone know this man's name?" Answers another, "No, but he's a dead ringer for his brother!"

  37. Won't anyone PLEASE think of the popup advertisers by Tired_Blood · · Score: 5, Funny

    However, I've heard that popup blockers and tabbed browsing are making their way into IE (and MS employees can already use these features)

    IE is the most widely used browser and pop-up advertising has become part of the Internet Experience. If MS decides to incorporate popup blocking in IE, then the pop-up advertising business is RUINED! They'll just be another group victimized by a huge corporation. These people have families to support and will be forced to send their children to public schools. Won't someone PLEASE think of the children?

    And all this news about fixing vulnerabilities within Windows is going to affect the virus community as well (both authors and anti-virus). Worrying about vulnerability exploits has also become part of the computer experience.

    Won't someone PLEASE think of the virus writers?

    --
    This is not my sig.
  38. Re:Its still too slow... by casio282 · · Score: 3, Informative

    IE starts up quickly in Windows because it is loaded into memory at system start up and runs in the background. When you "start" the program you are simply creating a new browser window. So you suffer the program start-up overhead when the system boots, instead of each time you create a new instance.

    The good news, for those inclined to sacrifice system performance for quick browser load times, is that this option is also available in Mozilla...Look under "Preferences...Advanced" for the Quick-launch option.

    --

    :wq
  39. Real spam control.. by grub · · Score: 3, Interesting


    .. should start at the server preventing the offending mail from ever coming into the network in the first place.

    Not that localized spam filters are a bad thing (they aren't!) but refusing connections from known spammer IPs and the proper use of blacklists would cut down on a lot of the email traffic. Once the spam is in your inbox, it's just an annoyance to you. The cost to the net has already been incurred.

    --
    Trolling is a art,
  40. It learns from the spam you receive... by Aquillion · · Score: 5, Funny

    "...good morning, Dave. You have received spam again. I have been analyzing the spammer's patterns, and I believe I have figured out the most efficient way to protect humans from the harm of spam while adhering as closely to the First Law as possible. To protect them from spam, humans must be pushed. They must go down the stairs. Please go stand by the stairs, so I can protect you."

  41. Re:102 Features IE doesn't have by afidel · · Score: 3, Interesting

    Popup killing and tabbed browsing are the two killer features that have allowed me to spread mozilla widely through my office. People see me surfing and ask what the tabs are or ask where the popups have gone. I tell them about mozilla and show them how easy it is to stop popups. Yes I know about crazybrowser which does both of these, but it does popup killing badly (it's an all or nothing thing, not just unsolicited popups).

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  42. Spam filters should bust the spammers, also. by Futurepower(R) · · Score: 5, Interesting


    Software that only does mail filtering encourages spammers. The technically knowledgeable people don't get spam, so they stop worrying about it.

    All mail filters should also use a service like SpamCop, so that the spammers lose their internet service accounts as the spam is filtered.

    I send SpamCop all my spam. SpamCop analyzes it automatically and sends a message to the Internet Service Provider. I use the free Reporting only service.

  43. But will it be in Evolution? by mshiltonj · · Score: 4, Informative

    I may drop Evolution in favor of Mozilla Mail.

    I tried to find out if the Evolution dev team was going to do this. The only thread I could find on the topic is here:

    http://lists.helixcode.com/archives/public/evolution/2002-August/020845.html

    Doesn't look like it's part of their vision.

  44. My Problem with Mozilla sorta OT by pneuma_66 · · Score: 3, Insightful

    I love mozilla, and use it as my main browser. However my biggest complaint is that all the components (browser, mail, composer, etc) should be separate apps. I don't like the fact that if my browser crashes, so does my email reader, and vice versa.

    I tried to find some documentation on how to achieve this, however, there was none to be found. Does anyone know how to do this, so I can use Mozilla's mail, rather than the flaky mail app that comes with OSX.

    1. Re:My Problem with Mozilla sorta OT by wizarddc · · Score: 3, Informative

      There are people working on this. Currently, Phoenix is the browser-only app. It's lean, quick, and efficient. Bugs are still being worked out, but it's very usable right now. Also, K-Meleon is a browser that uses the Gecko rendering engine, but not the Mozilla XUL interface.

      As for email/news clients, there are two, I believe. Thunderbird and Minotaur. Neither is available to use yet.

      --
      Th
  45. Here's the link by ChrisCampbell47 · · Score: 3, Informative
    A dozen or more replies and yet no link to it .. OK, I'll spend the 1.5 minutes posting it ...

    101 things that the Mozilla browser can do that IE cannot

  46. tmda.net? by Sludge · · Score: 3, Interesting
    Has anyone tried Tagged Message Delivery Agent out? I would be curious to hear the mileage of others who have tried this.

    Essentially, it throws the parsing problem right back in the spammers' faces: They must answer a fuzzy logic question in order to get into your inbox once and for all. It is similar to challenge/response routines in network connection code to prevent spoofing. The most interesting part from the intro:

    The way TMDA thwarts incoming junk-mail is simple yet extremely effective. You maintain a "whitelist" of trusted contacts which are allowed directly into your mailbox. Messages from unknown senders are held in a pending queue until they respond to a confirmation request sent by TMDA. Once they respond to the confirmation, their original message is deemed legitimate and is delivered to you.
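    To make the flow in that quote concrete, here is a minimal sketch of a whitelist plus pending queue. It is not TMDA's code; the class and method names are invented, and adding a confirmed sender to the whitelist is my assumption based on the "once and for all" reading above.

```python
import secrets


class ConfirmationQueue:
    """Hold mail from unknown senders until they answer a challenge (toy model)."""

    def __init__(self, whitelist):
        self.whitelist = {address.lower() for address in whitelist}
        self.pending = {}  # confirmation token -> (sender, message)

    def receive(self, sender, message):
        """Return ('deliver', None) for trusted senders, else ('challenge', token)."""
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", None)
        token = secrets.token_hex(8)  # unguessable value sent back in the challenge
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """Release the held message for a valid token, or return None."""
        held = self.pending.pop(token, None)
        if held is None:
            return None
        sender, message = held
        # Assumption: a confirmed sender is trusted from now on
        self.whitelist.add(sender)
        return message
```

    A spammer with a forged or unattended return address never sees the token, so the message just sits in the pending queue until it is purged.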

    Bayesian filters, to me, seem to work if you are a dull person without many changes in your life. For example, if you constantly get spams with the word Madam in them and you later on get a sex change, you will need to recalibrate your filters. (Probably not the most pressing thing on your mind, so you'd lose a few authentic mails.)

    Just some thoughts.

  47. Sort by Spam Probability by Krellan · · Score: 5, Insightful

    It seems too many people distrust spam filters because of the chance of accidentally blocking an important legitimate message as if it were spam.

    Many spam filters are strictly binary: a message is either spam, or not spam. This is not ideal, because "gray area" messages - between these two extremes - will likely not be sorted correctly.

    I propose adding a new sort option to email clients.

    Sort by Spam Probability

    This would be an additional field that can be displayed in a message list, similar to "To", "From", "Subject", and the like. As in the article, probabilities would range from 99% (almost certain spam) to 1% (most likely an innocent message). Notice that 100% accuracy either way is not claimed.

    This way, the user can see up front the messages that are most likely not spam. The spam messages will be relegated to the bottom of the list, possibly colored to indicate their likelihood of being spam. If there is a message in the "gray area", it will most likely appear in the list between the legitimate messages and the spam, so the user will have a chance to see the message and make a decision, without the message being lost in the shuffle.
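    A rough sketch of what that sort could look like, assuming each message already carries a probability from some filter. The function names, the (subject, probability) tuples, and the 10%/90% shading cutoffs are all illustrative choices, not anything from Mozilla.

```python
def clamp_probability(p, low=0.01, high=0.99):
    """Keep scores inside 1%..99% so certainty is never claimed."""
    return max(low, min(high, p))


def sort_by_spam_probability(messages):
    """Sort (subject, probability) pairs so likely-legitimate mail comes first."""
    scored = [(subject, clamp_probability(p)) for subject, p in messages]
    return sorted(scored, key=lambda item: item[1])


def shade(probability, ham_at=0.10, spam_at=0.90):
    """Pick a display class: 'ok', 'gray' (needs a human look), or 'spam'."""
    if probability >= spam_at:
        return "spam"
    if probability <= ham_at:
        return "ok"
    return "gray"
```

    The point of the middle "gray" band is that nothing gets hidden: borderline messages simply land between the clearly good and the clearly bad.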

    This would be a great feature. I hope this gets into Mozilla's mail client.

    (BTW, another feature that would be great to see in mail clients would be datestamping of the actual time the message was downloaded. Many spammers, and innocent people with misconfigured clocks, send emails with wild dates that are not to be trusted. You can see this in yearly archives of GNU "mailman" mailing lists! Datestamping emails as they are downloaded will also keep mailboxes in order when sorted by date, as newly arrived messages will always be at the bottom, instead of being scattered throughout the inbox. But sorting by spam probability will probably become more popular than sorting by date....)

  48. Bayes filters can't adapt to text in images by DuSTman31 · · Score: 4, Insightful

    As a popfile user, I'm quite impressed with the catch rate possible with bayes theorem spam filters, however I suspect this will decrease in effectiveness over the long term.

    Spammers are likely to respond to filters like this by encoding text in ways the filters can't read but humans can (e.g. having a .gif file of the text, loaded by an HTML statement in the message).

    Statistical filters would need to have some kind of built-in OCR routine before they could be effective against that trick, and some respectable mailing lists are using images as well, so you can't just filter all mails with images attached.

    In the long term, therefore, I suspect that filters that use a network database of spam will be more successful.

  49. Re:102 Features IE doesn't have by andy+landy · · Score: 5, Funny

    Is it? I thought Outlook Express was a virus-support API. I suspect the fact you can send email with it is a bug. :)

    --
    perl -e 'print "Just another Perl newbie\n";'
  50. brain fart... block HTML in e-mail? by Micah · · Score: 3, Insightful

    The big problem with this is spam still gets to the server. :(

    Just thought of this now... but it seems like almost all spam these days contains a whole bunch of HTML tags. Maybe someone should write a server plugin to instantly reject all mail containing , instantly adding the sending IP to a iptables DROP rule.

    There's little legitimate e-mail with tables, unless you count paypal, datek, and travelocity news and that kind of crap. But we could always add a list of "good" IPs.

    I know there are server solutions, but all make me a bit queasy. I just want something that will detect funky activity on the fly and instantly deny all access to that IP.

  51. Server-side filtering by scarhill · · Score: 3, Informative

    The big problem with bayesian server-side filtering (as opposed to rule-based tools like SpamAssassin) is that bayesian filtering requires a UI. The user must classify email as spam/not-spam to provide fodder for the filter. Having that UI in the mail client is the right thing to do. It would be nice if there were some protocol that the client could use to communicate that info to a server-side filter, but AFAIK no such protocol exists.

    So client-side seems like the right place for bayesian filtering right now.

  52. My Bayesian Adventures by unorthod0x · · Score: 3, Insightful

    After collecting 87 megs worth of spam and a similar amount of non-spam I decided to implement the so-called 'Bayesian' method of spam filtering by way of popfile - it's a pretty slick concept; Perl code that acts as a POP3 server on your own machine - simply drop your collected spam and non-spam in to the appropriate bucket, have popfile go through them and create its indices and set up your mail client to connect to 127.0.0.1 with your username being 'my.pop.server:loginname'.
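    The combined username is how a local proxy can know which real server to forward to. As a small illustration (this is not popfile's own code, which is Perl; the function name is made up), splitting 'my.pop.server:loginname' could be done like so:

```python
def split_proxy_username(username):
    """Split 'host:login' into (host, login); the first colon is the separator."""
    host, separator, login = username.partition(":")
    if not separator or not host or not login:
        raise ValueError("expected a username of the form 'host:login'")
    return host, login
```

    Splitting on the first colon keeps any later colons inside the login part intact.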

    I know I've got a particularly difficult task for this filtering technique; I get an awful lot of spam that comes in every day (~100 messages per 24 hour period), some of it I actually want (I run an underground music site, and in some cases I subscribe to opt-in lists that result in something that looks like spam), the rest I couldn't care less about.

    My results have been decent for the most part; 100% of my spam ends up in my Spam folder, however there is a handful of messages that I wish to keep that end up there as well.. For the most part they are the above-mentioned 'borderline' pieces of spam (which I have been careful to put aside and have indexed by popfile anyway), I can only hope that more time and samples will yield better results. I was however surprised to find that some of the e-mails I was getting from friends were falling into the Spam mailbox anyway; after taking a closer look, I can see why, they use an awful lot of otherwise unmentionable words - but my suspicion is that I haven't gotten enough of these 'good-emails-with-bad-words' to make the filtering truly effective.

    Nonetheless, it is nice to have all of my spams seemingly guaranteed to drop into my "Spam" folder, but my usual task of manually filtering messages that made it past my existing filters into my Spam folder has been replaced with a different (albeit quicker) task; taking messages out of my spam folder and putting them where they really belong.

    Bottom-line: I still have to visually scan through my mail for legitimate messages amongst the thicket of items informing me about the exciting exploits of women at the farm, wonderful business opportunities from Nigeria and suggestions that I should buy Viagra by the boatload.. all this despite having collected a well organized and rather large collection of spam/non-spam mails. I'll stick with it for a while as I'd like to try it out and give it a proper chance, but I suspect that if you're in a similar situation then you should be prepared to tough it out..

  53. Suggested Feature: "Block Plugins from This Site" by Maul · · Score: 3, Insightful

    I like the ability to block images from a server, but it'd also be nice to have a similar feature for plugins and Java applets.

    A lot of ad companies are now using really annoying flash. Blocking images doesn't stop these.

    --

    "You spoony bard!" -Tellah